Adding code to an existing ELF file


Recently I was reverse-engineering an Android app[a]. The relevant details are as follows:

My goal was to dump a buffer I found that contained some data, and I was able to write a Python/GDB script to do just that. This worked fine on an (x86_64) emulator, but I faced a problem when I finally got ahold of an Android device: Python/GDB is very slow on hardware, especially when you're writing to disk at each breakpoint invocation.

So what's a gal to do with a laggy debugger and an app that crashes if it gets too far behind?

Translate her Python script to AArch64 assembly and patch it into the app, of course!

Because I had reverse-engineered the x86_64 binary and not the AArch64 one, I had to find the right registers to pull my buffer data from again. I also had to learn how to read (and write) ARMv8-A assembly. Thankfully, ARM is both a RISC and load-store architecture, so it was fairly easy to pick up on. I had a patch written fairly quickly[b].

Applying it was another matter. My first attempt went as follows:

This prevented the app from being able to load the ELF file, because it would attempt to access a string in .rodata and would instead pull a different string.

As it turns out, you can't just stick code onto the end of the .text section, because relative addressing to later sections would break (and in this and most ELF files, .data, .rodata, .bss, etc. are all stored after .text). In order to get this to work, I would have to find and modify every single relative address in the binary. Alternatively, I could try to somehow add my code after those sections. I decided on the latter, for what I'm sure are obvious reasons.
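
To make the failure concrete, here is a minimal sketch of the arithmetic. Every address in it is invented for illustration and none comes from the actual binary. A PC-relative reference only stores the distance from the instruction to its target, so if the target moves and the instruction doesn't, the stored distance lands on whatever now occupies the old address.

```python
# Hypothetical layout: all addresses below are invented for illustration.
INSN_ADDR = 0x1000      # an instruction in .text that references a string
STRING_ADDR = 0x5000    # where that string lives in .rodata before patching
PATCH_SIZE = 0x200      # bytes appended to the end of .text


def pc_relative_target(insn_addr: int, displacement: int) -> int:
    """Address that a PC-relative reference resolves to."""
    return insn_addr + displacement


# The displacement is baked into the instruction when the binary is linked.
displacement = STRING_ADDR - INSN_ADDR

# Growing .text pushes .rodata (and the string with it) further along...
moved_string_addr = STRING_ADDR + PATCH_SIZE

# ...but the instruction still encodes the old displacement.
resolved = pc_relative_target(INSN_ADDR, displacement)
off_by = moved_string_addr - resolved

print(hex(resolved), hex(moved_string_addr), hex(off_by))
```

The reference still resolves to the string's old address, which is PATCH_SIZE bytes short of where the string now lives, so the code reads some other string instead.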

Time for a tour of the ELF format! (For simplicity, I'll be focusing on 64-bit ELF files.)

An ELF file has "segments" and "sections". The program header table describes segments, which hold runtime info: they tell the loader which chunks of the file to map into memory, and where. The section header table describes sections, which map out the file contents. In order to add in our patch, we'd have to add it to the file and describe it with a section. Then, we'd have to make a segment cover that section.
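
As a rough illustration of where those two tables live, here is a small Python sketch that reads the 64-byte ELF64 file header (the Elf64_Ehdr layout from elf(5)) and pulls out the offsets and entry counts of both tables. The function name read_table_locations is my own, and the sketch only handles 64-bit little-endian files.

```python
import struct

# Elf64_Ehdr from elf(5): 16-byte e_ident, then e_type, e_machine, e_version,
# e_entry, e_phoff, e_shoff, e_flags, e_ehsize, e_phentsize, e_phnum,
# e_shentsize, e_shnum, e_shstrndx. Little-endian, no padding: 64 bytes.
ELF64_EHDR = struct.Struct("<16sHHIQQQIHHHHHH")


def read_table_locations(header_bytes: bytes) -> dict:
    """Return where the program and section header tables are (hypothetical helper)."""
    (ident, _type, _machine, _version, _entry, phoff, shoff, _flags,
     _ehsize, phentsize, phnum, shentsize, shnum, _shstrndx) = \
        ELF64_EHDR.unpack_from(header_bytes)
    if ident[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    if ident[4] != 2 or ident[5] != 1:  # EI_CLASS == ELFCLASS64, EI_DATA == ELFDATA2LSB
        raise ValueError("this sketch only handles 64-bit little-endian ELF")
    return {
        "program_headers": {"offset": phoff, "entry_size": phentsize, "count": phnum},
        "section_headers": {"offset": shoff, "entry_size": shentsize, "count": shnum},
    }
```

Calling it as read_table_locations(open(path, "rb").read(64)) on a shared object should give roughly the same numbers that readelf -h reports for the two tables.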

Adding a section is easy, because the section header table is stored at the end of this file (by no means is this a requirement). The contents of this new section should just be the .text section of the compiled patch. However, we do need to make sure to set some flags so it's executable:

as patch.s -o patch.o
objcopy patch.o --dump-section .text=patch.text
objcopy --add-section .patch=patch.text --set-section-flags .patch=code,readonly,alloc patch.out

Now we need to map this new .patch section to a segment. Sections are mapped to segments by checking which sections fall inside the segment's chunk of the file. Unfortunately, the program header table is stored immediately after the ELF file header itself, so there's no room to append a new entry without shifting everything that follows it. This means we'll have to commandeer an existing segment header.
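
Here is a simplified sketch of that containment check, looking only at file offsets. The real rule readelf applies has more cases (it also looks at virtual addresses, and .bss takes up no space in the file), so treat this as an approximation. The segment numbers come from the program header listing in this post, while the section offsets and sizes are made up.

```python
from typing import List, NamedTuple


class Segment(NamedTuple):
    offset: int  # p_offset
    filesz: int  # p_filesz


class Section(NamedTuple):
    name: str
    offset: int  # sh_offset
    size: int    # sh_size


def sections_in_segment(segment: Segment, sections: List[Section]) -> List[str]:
    """Names of the sections that lie entirely inside the segment's file range."""
    seg_end = segment.offset + segment.filesz
    return [s.name for s in sections
            if s.size > 0
            and segment.offset <= s.offset
            and s.offset + s.size <= seg_end]


# Section offsets and sizes are hypothetical; segment values are from readelf.
sections = [Section(".text", 0x100000, 0x1000), Section(".patch", 0x5d0000, 0x100)]
first_load = Segment(offset=0x0, filesz=0x589c18)
note_before = Segment(offset=0x200, filesz=0x24)     # NOTE segment as shipped
note_after = Segment(offset=0x5d0000, filesz=0x100)  # same header, re-pointed

print(sections_in_segment(first_load, sections))
print(sections_in_segment(note_before, sections))
print(sections_in_segment(note_after, sections))
```

No existing segment covers the file range where .patch sits, which is why simply adding the section isn't enough: a segment header has to be re-pointed at it.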

Here is the program header table of the ELF file I'm working with:

Program Headers:
  Type           Offset             VirtAddr           PhysAddr
                 FileSiz            MemSiz              Flags  Align
  LOAD           0x0000000000000000 0x0000000000000000 0x0000000000000000
                 0x0000000000589c18 0x0000000000589c18  R E    0x1000
  LOAD           0x000000000058a530 0x000000000058b530 0x000000000058b530
                 0x000000000003a418 0x000000000003c268  RW     0x1000
  DYNAMIC        0x00000000005ba368 0x00000000005bb368 0x00000000005bb368
                 0x0000000000000320 0x0000000000000320  RW     0x8
  NOTE           0x0000000000000200 0x0000000000000200 0x0000000000000200
                 0x0000000000000024 0x0000000000000024  R      0x4
  NOTE           0x0000000000589b80 0x0000000000589b80 0x0000000000589b80
                 0x0000000000000098 0x0000000000000098  R      0x4
  GNU_EH_FRAME   0x0000000000509244 0x0000000000509244 0x0000000000509244
                 0x000000000000f97c 0x000000000000f97c  R      0x4
  GNU_STACK      0x0000000000000000 0x0000000000000000 0x0000000000000000
                 0x0000000000000000 0x0000000000000000  RW     0x10
  GNU_RELRO      0x000000000058a530 0x000000000058b530 0x000000000058b530
                 0x0000000000038ad0 0x0000000000038ad0  R      0x1

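Each segment has a type, which describes what it stores. These are the ones in the file:

LOAD: a chunk of the file that the loader maps into memory[c].
DYNAMIC: points to the .dynamic section, which holds the information the dynamic linker needs.
NOTE: points to auxiliary notes, such as vendor or build information.
GNU_EH_FRAME: points to .eh_frame_hdr, the lookup table used to find unwinding information.
GNU_STACK: tells the loader what permissions the stack should have.
GNU_RELRO: marks a region that the dynamic linker makes read-only after applying relocations.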

A NOTE segment is the best (read: only) choice we can make here, but how do we choose which one?

 Section to Segment mapping:
  Segment Sections...
   00 .hash .gnu.hash .dynsym .dynstr .gnu.version .gnu.version_r .rela.dyn .rela.plt .plt .text .rodata .eh_frame_hdr .eh_frame .gcc_except_table
   01     .init_array .fini_array .dynamic .got .data cfstring .bss
   02     .dynamic
   05     .eh_frame_hdr
   07     .init_array .fini_array .dynamic .got

The first NOTE segment (segment 03) seems the safest bet here, as that's only used to provide a unique identifier for the binary. So my new process was:

Modifying the segment header is the meat of this process. Here is the segment header definition, from elf(5):

// typedef uint64_t Elf64_Off
// typedef uint64_t Elf64_Addr
typedef struct {
    uint32_t   p_type;
    uint32_t   p_flags;
    Elf64_Off  p_offset;
    Elf64_Addr p_vaddr;
    Elf64_Addr p_paddr;
    uint64_t   p_filesz;
    uint64_t   p_memsz;
    uint64_t   p_align;
} Elf64_Phdr;
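
For readers who would rather poke at this from Python than C, here is a sketch of the same 56-byte layout using the struct module. It is not the tool described in this post. The field values for the NOTE entry come from the readelf output above and the new values mirror the build script; PATCH_SIZE is a placeholder, since the real size depends on the assembled patch.

```python
import struct
from typing import NamedTuple

# Same field order as Elf64_Phdr: two 32-bit fields, then six 64-bit fields.
ELF64_PHDR = struct.Struct("<IIQQQQQQ")

PT_LOAD, PT_NOTE = 1, 4        # p_type values
PF_X, PF_W, PF_R = 1, 2, 4     # p_flags bits


class Phdr(NamedTuple):
    p_type: int
    p_flags: int
    p_offset: int
    p_vaddr: int
    p_paddr: int
    p_filesz: int
    p_memsz: int
    p_align: int


def unpack_phdr(raw: bytes) -> Phdr:
    return Phdr(*ELF64_PHDR.unpack(raw))


def pack_phdr(phdr: Phdr) -> bytes:
    return ELF64_PHDR.pack(*phdr)


PATCH_SIZE = 0x100  # placeholder: the real value is the size of patch.text

# The first NOTE entry, as listed by readelf.
note = Phdr(PT_NOTE, PF_R, 0x200, 0x200, 0x200, 0x24, 0x24, 0x4)

# The same entry turned into an executable LOAD segment covering the patch.
patched = note._replace(p_type=PT_LOAD, p_flags=PF_R | PF_X,
                        p_offset=0x5d0000, p_vaddr=0x5e0000, p_paddr=0x5e0000,
                        p_filesz=PATCH_SIZE, p_memsz=PATCH_SIZE, p_align=1)
raw = pack_phdr(patched)
```

Writing raw back into the file at e_phoff + 3 * e_phentsize would overwrite the fourth program header in place, which is the "commandeering" step.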

One problem: as far as I can tell, objcopy doesn't support modifying the program header table. Rather than edit it by hand, I wrote a small C program to do it for me.

My final build script looks like this:

as patch.s -o patch.o
objcopy patch.o --dump-section .text=patch.text
objcopy --add-section .patch=patch.text --set-section-flags .patch=code,readonly,alloc patch.out
./modelf patch.out                          \
    --segment 3                             \
        --type   1                          \
        --offset 0x5d0000                   \
        --vaddr  0x5e0000                   \
        --paddr  0x5e0000                   \
        --filesz $(stat -c '%s' patch.text) \
        --memsz  $(stat -c '%s' patch.text) \
        --align  1                          \
        --flags  0x5                        \
    --section 26                            \
        --addr 0x5e0000                     \
    --segment 1                             \
        --memsz 0x3c270                     \
    --section 24                            \
        --size 0x1e50
./ modelf-out.elf
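
Before loading the result, it can be worth sanity-checking the numbers passed to the tool. The sketch below checks two things the ELF specification asks of loadable segments: the file offset and virtual address must be congruent modulo the page size, and the new segment's address range shouldn't collide with an existing LOAD segment. The 4 KiB page size and the 0x100-byte patch size are assumptions; the other values are taken from the readelf output and the build script.

```python
PAGE_SIZE = 0x1000  # assumption: 4 KiB pages


def is_congruent(offset: int, vaddr: int, modulus: int = PAGE_SIZE) -> bool:
    """True if the file offset and virtual address agree modulo the page size."""
    return offset % modulus == vaddr % modulus


def overlaps(a_start: int, a_size: int, b_start: int, b_size: int) -> bool:
    """True if the half-open ranges [a_start, a_start + a_size) and b's intersect."""
    return a_start < b_start + b_size and b_start < a_start + a_size


# (vaddr, memsz) of the existing LOAD segments, with segment 1 already grown
# to the 0x3c270 passed to --memsz in the build script.
existing_loads = [(0x0, 0x589c18), (0x58b530, 0x3c270)]

# The commandeered segment; 0x100 stands in for the real size of patch.text.
new_offset, new_vaddr, new_size = 0x5d0000, 0x5e0000, 0x100

offset_ok = is_congruent(new_offset, new_vaddr)
collisions = [load for load in existing_loads
              if overlaps(new_vaddr, new_size, *load)]

print(offset_ok, collisions)
```

With these values the offset check passes and the collision list is empty, since the second LOAD segment ends at 0x5c77a0, well before 0x5e0000.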

And there you have it! This method produces a working, patched shared object ready to be loaded by the Android app.

If the ELF file didn't happen to have a spare NOTE segment, I would have needed to do something much uglier. I plan on experimenting with adding new segments in the future.

a. ^ I plan to do a writeup of this project in the future, but until then, I won't give specifics on it (for legal reasons, if nothing else).

b. ^ Of course, I had to fix bugs after successfully patching it in, but writing the patch and fixing the bugs therein is outside the scope of this post and better suited for the aforementioned planned future writeup.

c. ^ Only sections in LOAD segments are mapped to memory, but other segments can also contain those sections (e.g. here).