An accessible overview of browser security

2011-09-20 by . 2 comments

Post to Twitter

So the purpose of this post is to introduce you, an IT-aware but perhaps not software-engineering person, to the various issues surrounding browser security. In order to do this I’ll go quickly through some theory you need to know first and there is some simple C-like pseudocode and the odd bad joke, but otherwise, this isn’t too much of a technical post and you should be able to read it without having to dive into 50 or so books or google every other word. At least, that’s the idea.

What is a browser?

Before we answer that we really need a crash course on kernel internals. It’s not rocket science though, it’s actually very simple. An application on your system runs and is called a process. Each process, for most systems you’re likely to be running, has it’s own address space, which is where everything lives (memory). A process has some code and the operating system (kernel) gives it a set amount of time to run before giving another process some time. How to do that so everything keeps going really would be a technical blog post, so we’ll just skip it. Suffice to say, it’s being constantly improved.

Of course, sometimes you might want to get more than one thing done at once, in which case threads come in. How threads are implemented across OSes is different – on Windows, each process also contains a single, initial thread. If you want to do more things, you can add threads to the process. On Linux, threads and processes are all the same, but they share memory. That’s the real important common idea behind a thread and is also true of Windows – threads share memory, processes do not.

How does my process get stuff done then?

So now we’ve got that out the way, how does your program get stuff done? Well, you can do one of these three things:

  1. If you can do it yourself, as a program, you just do it.
  2. If another library can do it, you might use that.
  3. Or alternatively, you might ask the operating system to do it.

In reality, what might happen is your program asks a library to do it which then asks the operating system to do it. That’s exactly how the C standard library works, for example. The end result, however, and the one we care about, is whether you end up asking the operating system. So you get this:

Operating system  Program

Later on, I’ll make the program side of things slightly more complicated, but let’s move on.

How do we do plugins generically?

Firstly, a word on “shared objects”. Windows calls these dynamic link libraries and they’re very powerful, but all we’re concerned with here really is how they end up working. On the simplest possible level, you might have a function like this:

void parse_html(char* input)

A shared library might implement a function like this:

void decode_and_play_swf(char* data)

When a shared object loads with your code, the result, in your program’s memory, is this:

void decode_and_play_swf(char* data)

<p>void parse_html(char* input)

Told you it was straightforward. Now, making a plugin work is a little more complicated than that. The program code needs to be able to load the shared object and must expect a standard set of functions be present. For example, a browser might expect a function like this in a plugin:

int plugin_iscompatible(int version)
    if ( version <= min_required_version && version <= max_supported_version )
        return 0;
        return -1;

Similarly, the app we’re plugging into might offer certain functions that a plugin can call to do things. An example of this would be exactly a browser plugin for drawing custom content to a page – the browser needs to provide the plugin the functionality to do that.

So can all this code do what it likes then?

Well that depends. Basically, when you do this:

Operating system   Program

the operating system doesn’t just do whatever you asked for. In most modern OSes, that program is also run in the context of a given user. So that program can do exactly what the user can do.

I should also point out that for the purposes of loading shared libraries, they appear to most reporting tools as the program they’re running as. After all, they are literally loaded into that program. So when the plugin is doing something, it does it on behalf of the program.

I should also point out that when you load the plugin, it literally gets joined to the memory space of the program. Yes, this is an attack vector called DLL injection, yes you can do bad things but it is also highly useful too.

I read this blog post to know about browsers! What’s all this?

One last point to make. Clearly, browsers run on lots of different platforms. Browsers have to be compiled on each platform, but forcing plugin writers to do that is a bit hostile really. It means many decent plugins might not get written, so browser makers came up with a solution: embed a programming language.

That sounds a lot crazier than it is. Technically, you already have one quite complicated parser already in the browser; you’ve got to support this weird HTML thing (without using regex) – and CSS, and Javascript… wait a minute, given we’ve stuck a programming language parser for javascript into the browser, why don’t we extend that to support plugins?

This is exactly what Mozilla did. See their introductory guide (and as a side note on how good Stack Overflow really is, here’s a Mozilla developer on SO pointing someone to the docs). Actually in truth it’s a little bit more complicated than I made out. Since Firefox already includes an XML/XHTML/HTML parser, the developers went all out – a lot of the user interface can be extended using XUL, an XML variant for that purpose. Technically, you could write an entire user interface in it.

Just as a quick recap, in order for your XML, XHTML, HTML, Javascript, CSS to get things done, it has to do this:

Operating system   Browser   Javascript or something

So, now the browser makes decisions based on what the Javascript asks before it asks the operating system. This has some security implications, which we’ll get to in exactly one more diagram’s time. It’s worth noting that on a broad concept level, languages like Python and Java also work this way:

Operating system   Interpreter/JVM etc  Script/Java Bytecode

Browsers! Focus, please!

Ok, ok. All of that was to enable you to understand the various levels on which a browser can be attacked – you need to understand the difference in locations and the various concepts for this explanation to work, so hopefully we’re there. So let’s get to it.

So, basically, security issues arise for the most part (ignoring some more complicated ones for now) because of bad input. Bad input gets into a program and hopefully crashes it, because the alternative is that specially crafted input makes it do something it shouldn’t and that’s pretty much always a lot worse. So the real aim of the game is evaluating how bad bad input really is. The browser accepts a lot of input on varying levels, so let’s take them turn by turn:

  1. The address bar. You might seriously think, wait, that can’t be a problem can it, but if the browser does not correctly pass the right thing to the “page getting” bit, or the “page getting” bit doesn’t error when it gets bad input, you have a problem, because all the links all over the input are going to break your browser. Believe it or not, in the early days of Internet Explorer, this was actually an issue that could lead you to believe you were on a different site to the one you thought you were on. See here. These days, this particular attack vector is more of a phishing/spoofing style problem than browser manufacturers getting it wrong.

  2. Content parsers. For content parsing I mean all the bits that make it up – html, xhtml, javascript, css. If there are bugs in these, the best case scenario is really that the browser crashes. However, you might be able to get the browser to do something it shouldn’t with a specially crafted web page, javascript or CSS file. Technically speaking, the browser should prevent the javascript it is executing from doing anything too malicious to your system. That’s the idea. The browser will also prevent said javascript from doing anything too malicious to other pages too. A full discussion of web vulnerabilities like this is a blog post in itself, really so we won’t cover it here.

  3. Extensions in Javascript/whatever the browser uses. Again, this comes down to are the vulnerabilities in the parser/enforcement rules. Otherwise, an extension of this nature will have more “power” than an in page Javascript, but still won’t be able to do too much “bad stuff”.

  4. Binary formats in the rendering engine. Clearly, a web page is more than just parseable programming-language like things – it’s also images in a whole variety of complicated formats. A bug in these processors might be exploitable; given they’re written as native functions inside the browser, or as shared libraries the browser uses, the responsibility for access control is at the OS level and so a renderer that is persuaded to do something bad could do anything the user running the process can do.

  5. Extensions (Firefox plugins)/File format handlers. These enable the browser to do things it can’t do in and of itself and are often developed externally to the core “browsing” part. Examples of this are embedding adobe reader in web pages and displaying flash applications. Again, these have decoders or parsers, so bugs in these can persuade the plugin to ask the operating system to do what we want it to do, rather than what it should.

You might wonder if 4 and 5 really should be subject to the same protections afforded to the javascript part. Well, the interesting thing is, that is part of the solution, but you can’t really do it in Javascript, because compared to having direct access to memory and the operating system, it would be much slower. Clearly, that’s a problem for video decoding and really heavy duty graphical applications such as flash is often used for, so plugins are and were the solution.

Ok, ok, tell me, what’s the worst the big bad hacker can do?

Depends. There are two really lucrative possible exploits from bugs of this sort; either, arbitrary code execution (do what you want) or drive by downloading (which is basically, send code that downloads something, then run it). In and of itself, the fact that you’re browsing as a restricted user and not an administrator (right?) means the damage the attacker can do is limited to what your user can do. That could be fairly critical, including infecting all your files with malware, deleting stuff, whatever it is they want. You might then unwittingly pass on the infection.

The real problem, however, is that the attacker can use this downloaded code as a launch platform to attack more important parts. A full discussion of botnets, rootkits and persistent malware really ought to also be a separate blog post, but let’s just say the upshot is your computer could be taking part in attacks on other computers orchestrated by criminal gangs. Serious then.

Oh my unicorns, why haven’t you developers fixed this?

First up, no piece of code is entirely perfect. The problem with large software projects is that they become almost too complicated for one person to understand every part and every change that is going on. So the project is split up with maintainers for each part and coding standards, rules, guidelines, tools you must run and that must pass, tests you must run and must pass etc. In my experience, this improves things, but no matter how much you throw at it in terms of management, you’re still dealing with humans writing programs. So mistakes happen.

However, our increasing reliance on the web has got armies of security people of the friendly, helpful sort trying to fix the problem, including people who build browsers. I went at great lengths to discuss threads and processes – there was a good reason for that, because now I can discuss what google chrome tries to do.


So as I said, the traditional way you think about building an application when you want to do more than one thing at once is to use threads. You can store your data in memory and any thread can get it (I’m totally ignoring a whole series of problems you can encounter with this for simplicity, but if you’re a budding programmer, read up on concurrency before just trying this) reasonably easily. So when you load a page and run a download, both can happen at once and the operating system makes it look like they’re both happening at the same time.

Any bugs from 2-5, however, can affect this whole process. There’s nothing to stop a plugin waltzing over and modifying the memory of the parser, or totally replacing it, or well, anything. Also, a crash in a single thread in a program tends to bring the whole thing down, so from a stability point of view, it can also be a problem. Here’s a nice piece of ascii art:

| Process no 3412  Thread/parsing             |
|  Big pile of     Thread/rendering PNG                                 |
|  shared memory   Thread/running flash plugin - video from youtube     |
So, all that code is in the same space (memory address) as each other, basically. Simple enough.

The Chrome developers looked at this and decided it was a really bad idea. For starters, there’s the stability problem – plugins do crash fairly frequently, especially if you’ve ever used the 64-bit flash plugin for linux. That might take the whole process down, or it might cause something else to happen, who knows.

The Chrome model executes sub-processes for each site to be rendered (displayed) and for every plugin in use, like this:

   |--------------------------|       |-----------------------------------|
   | Master chrome            |-------| Rendering process for security....|
   | process. User interface  |       |-----------------------------------|
   | May also use threads     |
                |                     | Rendering process for somesite... |
   |--------------------------|       |-----------------------------------|
   | Flash plugin process     |
So now what happens is the master process becomes only the user interface. Child processes actually do the work, but if they get bad input, they die and don’t bring the whole browser down.

Now, this also helps solve the security problem to some extent. Those sub-processes cannot alter each other’s memory easily at all, so there’s not much for them to exploit. In order to get anything done, they have to use IPC – interprocess communication, to talk to the master process and ask for certain things to happen. So now the master process can do some filtering based on a policy of who should be able to access what. Child processes are then denied access to things they can’t have, by the master process.

But, can’t sub processes just ask the operating system directly? Well, not on Windows for certain. When the master process creates the child process, it also asks Windows for a new security “level” if you like (token is the technical term) and creates the process in that level, which is highly restricted. So the child process will actually be denied certain things by the operating system unless it needs them, which is harder to do on a single process where parts of it do need unrestricted access.

In defence to the Windows engineers, this functionality (security tokens) has been part of the Windows API for a while; it’s just developers haven’t really used it. Also, it applies to threads too, although there remains a problem with access to the process’s memory.

Great! I can switch to Chrome and all will be well!

Hold up a second… yes, this is an excellent concept and hopefully other browser writers will start looking at integrating OS security into their browsers too. However, any solution like this will also under certain conditions be circumventable, it’s just a matter of finding a scenario. Also, chrome and any other browser cannot defend you from yourself – some basic rules still apply. They are, briefly, do not open dodgy content and do keep your system/antivirus/firewall up to date.

Right, so what’s the best browser?

I can’t answer that, nor will I attempt to try. All browser manufacturers know that “security” bugs happen and all of them release fixes, as do third parties. A discussion on which has the best architecture is not a simple one – we’ve outlined the theory here, but how it is implemented will really determine whether it works – the devil is in the detail, as they say.

So that concludes our high level discussion of the various security issues with browsers, plugins and drive by downloads. However, I would really like to stress there is a whole other area of security which you might broadly call “web application” security, which concerns in part what the browser allows to happen, but also encompasses servers and databases and many other things. We’ve covered a small corner of the various problems today; hopefully some of our resident web people will over time cover some of the interesting things that happen in building complex web sites. Here is a list of all the questions here tagged web-browser.

Filed under Uncategorized


Subscribe to comments with RSS.

  • Tony Meyer says:

    There’s a dated (pre Safari 5.1) but quite readable explanation of how the WebKit2 model is both similar to and different to the Chrome model (built on WebKit1) here:

  • ninefingers says:

    @Tony That’s a really good explanation of webkit’s architecture, thanks!