Clarifying Hacking with XSS

Disclaimer: The ideas below are my own and may not reflect those of my employer.

Clarifying Hacking with...

The purpose of this post is to help cybersecurity professionals explain 'hacking' to lay-people. This might be useful when communicating with individuals totally outside of the infosec space, or with businessy folk inside the space who want a more intuitive understanding of what attackers can really do and how they can be so dangerous.

...Cross-Site Scripting

Cross-Site Scripting is an excellent vulnerability to showcase the reach and power of attacks for at least three reasons:

1) The client is the victim and the explainee is the client

Since XSS targets a visiting client's browser, the attack can be made much more personally relevant than something like SQL Injection or Directory Traversal. Most people don't have experience with managing databases or organizing web directory structures, but they do have a lot of experience operating a browser. Since the browser is itself the target, we can skip a bunch of abstractions and get right to what happens between the attacker and the victim.

2) The explainee experiences user-generated web content often

The first central concept used in the explanation below is that web pages often allow users to alter the experience of other users. Luckily, people are very familiar with this type of web content. It is everywhere in the form of things like social media, blog post comments, and web forums. While XSS doesn't need to target these kinds of sites, they are a great starting point for helping people realize that other users can change the content that their browser renders.

3) The code can be descriptive and concise

When we start getting into the details of how an attack works, it's important that any code we want to employ is easy to read and low on symbols / syntax. Fortunately, XSS requires very little code to get started with, even for someone who has never read or written a line of code before.

Clients and Canvases

To proceed with the explanation, we'll use three different levels of description. The first will be to introduce a magical XSS-like interaction in the physical world. The second will provide the intuition that the content consumed on web pages can be authored by other users, and show how malicious actors can make someone's experience unpleasant. Finally, the third will dive deeper into the mechanics of XSS properly.

To begin, let's introduce the concept of a Public Canvas. Imagine going to the middle of a populated city square where a giant Oil Painting Canvas is at its center. This Canvas is peculiar since its content is entirely crowdsourced. Anyone can walk up to it and paint a single brushstroke with the conveniently available magic paintbrushes that never run out of paint. In addition, brushstrokes can never be removed or painted over with other colors.

Anyone who interacts with the Canvas is called a "Canvas Client", or a User. Users have two essential actions that they can take with regard to the Canvas:

  1. They can "read" the Canvas, which they do by default just by observing it
  2. They can "write" to the Canvas by adding a brushstroke of their choice to it

Notice that Users can inherently affect the experience of other Users. If Alice "writes" a red square onto the Canvas, then Bob will "read" the red square when he observes it. In some sense, this transferability of experience is the purpose of the Canvas. It exists so that Users can collectively share artistry and alter the reality of other Users.

But Alice can also affect Bob's experience in a more subtle sense. One day Alice realizes that she can paint actual words onto the Canvas. Of course, these words would need to be cursive since Users must paint a single brushstroke at a time. If Alice assumes that Bob knows English, then she will realize that Bob cannot help but read whatever words Alice decides to paint. So if Alice paints the phrase "Think of an Orange Dolphin", then Bob will read the words and subsequently think of an Orange Dolphin.

Alice has "hacked" Bob's mind by forcing him to think of something that he didn't intend to think about via a "command", i.e. the command to think of an Orange Dolphin. Bob has no choice but to run the concept of an Orange Dolphin in his mind when he reads the Canvas.

Content as a Corruptible Canvas

We've broken the ice with a contrived-yet-physical scenario where Users can impact the experience of other Users by writing specific inputs on a Public Canvas. In this part of the explanation, we aim to show that this general pattern is nowhere near as weird as the story of Alice and Bob. Indeed, it's a scene we each experience almost every day on social media sites like Facebook and Twitter.

When we as Users of something like Twitter browse to twitter.com, what happens? In simple terms, our browser asks Twitter's web server for some content to display. The web server responds and our browser shows us what it sent back, but what content is that? The algorithms sites like Twitter use are very complex, but the important part to note is that in Twitter's case, the vast majority of content is generated by other Twitter Users. In fact, almost all of the content we observe on Twitter comes from other Users and not from the company itself.

Viewed in this light, Twitter is a Canvas upon which any User can read and write. However, just because a User writes (a tweet) does not mean that other Users will necessarily read it. Who gets to read what is dictated by follower count and by who follows whom. Still, we can imagine a simpler version of Twitter where every tweet gets read by every other User.

On such a site, it would be trivial to corrupt the Canvas with offensive content and therefore negatively influence the experience of other Users. In real-Twitter's case, rules against certain content prevent Users from posting particularly unpleasant items. The important thing to note is that the regular intended use case for Twitter Users -- i.e. the posting of data to the site -- allows them to impact what happens to other Users.

From here we only need to take one more step before we can get to Cross-Site Scripting for real. The important aha moment is this: when Cathy writes a tweet and Derek loads the webpage, it isn't only Derek-the-Human that is forced to read whatever Cathy wrote. Crucially, Derek's browser has to load the content produced by Cathy, and that is the mechanism by which something like XSS can work.
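For explainees who are comfortable seeing a little code, the "simpler version of Twitter" can be sketched in a few lines. This is only an illustrative model and not how any real site works: the `PublicFeed` class and its `write` and `read` methods are names made up for this example.

```javascript
// Hypothetical model of the "simpler Twitter": one shared feed that every
// User can write to and that every visiting browser loads in full.
class PublicFeed {
  constructor() {
    this.posts = []; // posts are only ever appended, like brushstrokes
  }

  // "Write": any User can add content to the shared feed.
  write(author, content) {
    this.posts.push({ author, content });
  }

  // "Read": every visitor receives everything that was ever written.
  read() {
    return this.posts.map((post) => `${post.author}: ${post.content}`);
  }
}

const feed = new PublicFeed();
feed.write("Cathy", "Think of an Orange Dolphin");

// Derek never chose this content, but his browser loads it anyway.
const whatDerekLoads = feed.read();
console.log(whatDerekLoads);
```

The point of the sketch is that nothing in `read` asks Derek what he wants to receive: whatever Cathy wrote is simply handed to his browser.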

Canvas to Code

At this point the explainee should have a pretty strong intuition about how a User can influence and negatively impact the experience of other Users. They should also understand how the content that a User writes will get loaded up and read by the browsers belonging to other Users.

From here we must roll up our sleeves and get into the details of what XSS actually is. Hopefully, by this point the listener is able to grasp a bit of web logic. Since this part can be rather long to explain in a real back-and-forth conversation, I'll briefly outline the rough steps I would take.

First, I'd introduce a simplified version of HTML. I'd explain that HTML describes web pages, and that it can be written with a series of tags like <html>, <head>, <body>, and <script>.

Second, I'd explain that the <script> tag allows the programmer to insert some code into the web page. I wouldn't necessarily talk about JavaScript, or explain the difference between a markup language and a programming language. Instead, I would focus on what <script> does at a high level: it instructs the browser to actually do things other than just display text.

Third, I'd start talking about the concept of injection. In a site's normal use case, the User is expected to post content that will usually appear as text in another User's browser. But what happens if they post something that looks like an HTML tag? For example, what happens if they post <b>this text is bold</b>, and another User's browser displays the words "this text is bold" in bold type instead of showing the tags?

(EDIT: Ironically, when I initially wrote this post I did not use backticks (`) to put the above HTML tags in code blocks and the website would not publish any content from the tags onward. This shows that Hashnode is filtering for HTML syntax. Nice!)

Fourth, if I could convince the explainee that it's usually unintended to cause other Users to load bold text in their browsers via HTML injection, then it's not a big stretch to show how an attacker could be capable of using <script> to perform code injection. And so, here we finally arrive at the inevitable and ubiquitous <script>alert('XSS')</script> example. When I as the attacker insert the above string into a webpage, and the webpage accepts the input as code (and not merely as text), then when your browser loads the page, it will execute the code (instead of merely displaying it as text).
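If the explainee wants to see where things go wrong, a minimal sketch of a vulnerable site might look like the following. It assumes a hypothetical server-side function, `renderCommentUnsafely`, that builds a page fragment by pasting the User's comment directly into the HTML. The function name and the sample authors are invented for illustration and are not taken from any real framework.

```javascript
// Hypothetical vulnerable renderer: the comment is pasted into the page
// as-is, so the browser cannot tell the User's text from the site's markup.
function renderCommentUnsafely(author, comment) {
  return `<p>${author} says: ${comment}</p>`;
}

// Ordinary text behaves as the site intended.
const friendly = renderCommentUnsafely("Cathy", "Nice post!");

// HTML injection: the <b> tag becomes part of the page's markup.
const bold = renderCommentUnsafely("Cathy", "<b>this text is bold</b>");

// Code injection: the <script> tag also becomes part of the page's markup.
const attack = renderCommentUnsafely("Attacker", "<script>alert('XSS')</script>");

console.log(friendly);
console.log(bold);
console.log(attack);
```

In the last case the string sent to the victim's browser contains a real `<script>` element, which the browser has no reason to treat differently from a script the site's own developers wrote.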

To summarize, a given site might expect Users to write text that it will later display to other Users. A properly programmed site will treat any text that looks like code just like any other text, or it will filter it out. If the site isn't properly written, it might interpret text that looks like code as code. In such a scenario, an attacker can inject code into the web page which will later be loaded and run by other Users' browsers.
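One common way for a site to treat code-like text as plain text is to escape the characters that HTML considers special before placing the text in the page. The sketch below is a simplified illustration under that assumption: `escapeHtml` and `renderCommentSafely` are hypothetical helper names, and the escaping shown only covers text placed in the body of an HTML element. Real applications would normally rely on the escaping built into their templating system rather than a hand-written helper.

```javascript
// Replace the characters that HTML treats as markup with their entity
// equivalents, so the browser shows them instead of interpreting them.
function escapeHtml(text) {
  return text
    .replace(/&/g, "&amp;") // must run first so later entities are not re-escaped
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

// Hypothetical safe renderer: everything a User supplied is escaped
// before it is placed into the page.
function renderCommentSafely(author, comment) {
  return `<p>${escapeHtml(author)} says: ${escapeHtml(comment)}</p>`;
}

const rendered = renderCommentSafely("Attacker", "<script>alert('XSS')</script>");
console.log(rendered);
// prints: <p>Attacker says: &lt;script&gt;alert(&#39;XSS&#39;)&lt;/script&gt;</p>
```

A browser that receives this output displays the attacker's text literally, angle brackets and all, and never runs it.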

While the alert example shown above is harmless in itself, it should do the trick of showing the explainee how an attacker might be able to control someone else's browser just by entering some text. At this point, they will hopefully be sufficiently spooked so as to understand the potential gravity of the situation.

Caveats and Considerations

I'll end this post with a few details about XSS that make it somewhat cumbersome to explain if we are trying to be precise. First and foremost, we might want to inform our audience that XSS is really poorly named: it's usually not Cross-Site, it requires barely any scripting, and its acronym doesn't even make use of the first letters of its component words. It's also not always a useful attack for penetration testers.

1) XSS isn't necessarily Cross-Site

XSS is often carried out through text inputs such as the comments section of a blog post or other similar fields. Usually we don't need to enlist a second website to communicate with the target site (i.e. the "Cross-Site" in "Cross-Site Scripting" usually doesn't apply).

2) XSS doesn't require Scripting

As shown above, the attacker only needs a tiny amount of code to hijack other Users' browsers. This means that the "Scripting" in "Cross-Site Scripting" also isn't usually true.

So if Cross-Site Scripting is neither Cross-Site nor in need of much Scripting... then what exactly is it? My friend and colleague Dejan Zelic prefers the term "output injection", because what the attacker is really doing is manipulating the output loaded and subsequently executed by a victim's browser.

3) XSS is not CSS

To make matters more confusing, the acronym XSS starts with an X and not a C, which can certainly make people furrow their eyebrows. "CSS" is already taken by Cascading Style Sheets, and Xs are edgy and scary.

Finally, a useful XSS attack for a penetration tester usually requires that the victim have higher permissions than the attacker, or else there's usually little purpose to the attack. Explaining this point, however, is beyond the scope of this blog post.

In spite of these details, I believe that XSS is visceral and visual enough to be easily understood by non-security and even non-tech people, thanks to the familiarity of its target. Next time you're chatting with a friend or a colleague about the reach of hacking, please feel free to try out the explanation above and let me know how it goes!