Assignment: RE
(Edit 30 Jan 2017: clarify what is required for question 9.)
Purpose
The purpose of this assignment is to help students understand the process of reverse engineering and to refresh their knowledge of x86 assembly.
Materials to Review
You should understand the basic operation of x86 assembly language and the call stack. You may also find materials from 2150, this guide for 64-bit x86 assembly from CMU and this guide for 32-bit x86 assembly helpful.
A simple C program:
Consider the following two-file C program:
foo1.c:
#include <stdio.h>
#define BUFSIZE 12
int foo(char vector[], int len, int value);
int main() {
    int i, sum, x;
    char buffer[BUFSIZE];
    x = foo(buffer, BUFSIZE, 5);
    sum = 0;
    for (i = 0; i < BUFSIZE; i++)
        sum += buffer[i];
    printf ("Sum is %d\n", sum);
    return 0;
}
foo2.c:
int foo(char vector[], int len, int value) {
    int i;
    for (i = 0; i < len; i++)
        vector[i] = value;
    return len;
}
A result of compiling this with gcc -O2 foo1.c foo2.c, then disassembling the result with objdump -sRrd, is included in foo-disasm.txt. See below for some notes about interpreting this output format.
Task
Look at the dump of the information in the executable and answer the following questions.
When asked for an address, provide the form used in the assembly to access the value, e.g. 0x18(%rsp) or (%rax). If a value is stored directly in a register, indicate that register instead. If a value is located both in registers and in memory, provide both locations. If a value was eliminated by optimizations, say so.
1. What is the address or register of the local variable i in main()?
2. What is the address or register of the local variable sum in main()?
3. What is the address or register of the local variable x in main()?
4. What is the address or register of the local variable buffer in main()?
5. What is the address or register of the parameter len in foo()?
6. What is the address or register of the parameter value in foo()?
7. What is the address or register of the local variable i in foo()?
8. What needs to happen for the jne at address 0x40052f to jump to 0x400536?
9. __libc_start_main is passed several (constant) addresses. What do these appear to represent? (It is sufficient to identify what things they are addresses of.)
Put your answers in a text file and submit it on Collab.
Notes on interpreting the objdump file
General format
The objdump output we provide contains several parts corresponding to several parts of the executable, which are described in more detail below:
- Information about the type of executable file. This indicates what architecture it is for, that it is in Linux’s ELF format, and the address at which execution of the program starts.
- The actual contents of each “section” of the executable that will be loaded into memory. The sections have names like “.text” and “.dynstr” depending on their purpose.
  These will look something like:
      Contents of section .text:
       4004d0 4883ec28 ba050000 00be0c00 00004889  H..(..........H.
       4004e0 e764488b 04252800 00004889 44241831  .dH..%(...H.D$.1
       4004f0 c0e83a01 0000488d 74240c48 89e031d2  ..:...H.t$.H..1.
  The leftmost column indicates the address (in hexadecimal) where this data will be loaded in memory. The next four columns are the hexadecimal values actually placed in memory. These values are written in the order the bytes appear in memory, so the value 0x12345678 in little endian will appear as 78563412. The final column shows the same values represented as characters, except that a period (.) is used to represent bytes which do not correspond to a printable ASCII character.
- Disassembled versions of sections that contain executable code.
  These will look something like:
      0000000000400460 <_init>:
        400460: 48 83 ec 08             sub    $0x8,%rsp
        400464: 48 8b 05 8d 0b 20 00    mov    0x200b8d(%rip),%rax        # 600ff8 <_DYNAMIC+0x1d0>
        40046b: 48 85 c0                test   %rax,%rax
        40046e: 74 05                   je     400475 <_init+0x15>
        400470: e8 3b 00 00 00          callq  4004b0 <__gmon_start__@plt>
        400475: 48 83 c4 08             add    $0x8,%rsp
        400479: c3                      retq
  This indicates that there is a label called _init which has the address 0x400460 when the executable is loaded. Each following line is an instruction. The value before the colon is the memory address, in hexadecimal, of the first byte of the instruction. The hexadecimal values after the colon are the bytes of the instruction. Following this is the disassembled instruction itself.
  Within the disassembled instructions, objdump attempts to provide information about addresses in addition to showing the addresses encoded in the instruction. In cases where the address is exactly equal to the address of a label, like for the label __gmon_start__@plt in the example above, the format is address <LABEL> with the address in hexadecimal. In cases where the address does not correspond exactly to a label, the format is address <LABEL+offset>. For example, 400475 <_init+0x15> indicates the address 0x400475, which is 0x15 bytes after the label _init.
  On 64-bit x86, some instructions specify an address relative to %rip. %rip represents the “instruction pointer”, which in 2150 and 3330 we have called the “program counter”. While an instruction executes, %rip holds the address of the next instruction, so 0x200b8d(%rip) means the memory 0x200b8d bytes after the end of the current instruction. objdump’s disassembly includes a comment indicating what address is computed. In the example above, the instruction at 0x400464 is 7 bytes long, so the address is 0x40046b + 0x200b8d = 0x600ff8, which is 0x1d0 bytes after the label _DYNAMIC.
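The byte order described for the section contents can be checked with a short C sketch. The function names here are hypothetical and not part of the assignment. The sketch stores a 32-bit value least significant byte first, as x86 does, and formats the bytes the way one column of the objdump -s output shows them.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Store a 32-bit value into buf in little-endian order (least significant
 * byte first), regardless of the byte order of the machine running this. */
void store_le32(uint32_t value, unsigned char buf[4]) {
    assert(buf != NULL);
    for (int i = 0; i < 4; i++)
        buf[i] = (unsigned char)((value >> (8 * i)) & 0xff);
}

/* Format 4 bytes like one column of `objdump -s`: two hex digits per byte,
 * in the order the bytes appear in memory. out needs room for 9 chars. */
void format_dump_column(const unsigned char buf[4], char out[9]) {
    assert(buf != NULL && out != NULL);
    snprintf(out, 9, "%02x%02x%02x%02x", buf[0], buf[1], buf[2], buf[3]);
}
```

Calling store_le32 with 0x12345678 and then format_dump_column on the result yields the string 78563412, matching the example in the text.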
Note that this is not all the information in the executable and not all the information that objdump is capable of providing.
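The address arithmetic behind objdump’s annotations can be sketched in C. The function names below are hypothetical. The sketch assumes the usual x86-64 rule that a relative displacement (in a disp(%rip) operand or a relative call or jump) is added to the address of the following instruction.

```c
#include <assert.h>
#include <stdint.h>

/* Address referred to by a displacement that is relative to the end of the
 * instruction holding it (assumed rule for %rip-relative operands and for
 * relative call/jump instructions). */
uint64_t relative_target(uint64_t insn_addr, unsigned insn_len, int32_t disp) {
    assert(insn_len >= 1 && insn_len <= 15); /* x86 instructions are 1 to 15 bytes */
    /* Sign-extend the displacement so negative values move backwards. */
    return insn_addr + insn_len + (uint64_t)(int64_t)disp;
}

/* The offset objdump prints in its <LABEL+offset> form. */
uint64_t offset_from_label(uint64_t addr, uint64_t label_addr) {
    assert(addr >= label_addr);
    return addr - label_addr;
}
```

With the values from the _init example, relative_target(0x400464, 7, 0x200b8d) gives 0x600ff8, relative_target(0x400470, 5, 0x3b) gives 0x4004b0, and offset_from_label(0x400475, 0x400460) gives 0x15.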
On dynamic linking
This executable is dynamically linked, so it doesn’t include code for functions in the C standard library like printf. These are loaded at runtime by the dynamic linker, which is contained in /lib64/ld-linux-x86-64.so.2. Linux implements dynamic linking by having this program act as an interpreter that handles loading every dynamically linked executable.
As part of Linux’s implementation of dynamic linking, there is a Procedure Linkage Table (PLT). This contains “stubs” for each function the executable expects to find in a dynamically linked library, like the C standard library. One of the “stubs” looks like:
00000000004004c0 <__printf_chk@plt>:
4004c0: ff 25 6a 0b 20 00 jmpq *0x200b6a(%rip) # 601030 <_GLOBAL_OFFSET_TABLE_+0x30>
4004c6: 68 03 00 00 00 pushq $0x3
4004cb: e9 b0 ff ff ff jmpq 400480 <_init+0x20>
This stub is called __printf_chk@plt and is loaded into the program’s memory at address 0x4004c0. The first instruction in this function reads the address of a function from memory at 0x601030, then jumps to that function. As indicated by the comment added by objdump, this address is part of the “global offset table”. This is an array of pointers used to find functions like printf, which are loaded every time the executable runs. Using this table allows the same program to work with different implementations of printf, where printf may end up at different locations in memory. For example, in this case the global offset table will eventually contain the address at which __printf_chk, part of the Linux C library’s implementation of printf, is loaded into memory.
By default, the values in this global offset table are initialized to point to the instruction following the jump; for example, 0x601030 contains 0x4004c6. This means that the first time the “stub” is called, it will “fall through” to the code after the global offset table jump. This code pushes an indicator of which function was called onto the stack, then transfers control to part of the dynamic linker. (The dynamic linker’s code is not included in the executable file, and is therefore not present in the objdump output.) The dynamic linker will then locate the actual routine (the implementation of __printf_chk in the standard library, in this case) and update the global offset table to contain its address.
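The lazy lookup described above can be imitated in plain C with a function pointer standing in for one global offset table slot. This is only a sketch of the idea: every name in it is hypothetical, and the real dynamic linker finds routines by symbol name rather than through a hard-coded pointer.

```c
#include <assert.h>
#include <stddef.h>

typedef int (*func_ptr)(int);

/* Hypothetical stand-in for the real library routine (the role played by
 * __printf_chk inside the C library). */
int real_routine(int x) { return x + 1; }

int resolver_calls = 0;      /* counts how often the slow path runs */
int lazy_resolver(int x);    /* the role of the code after the stub's jump */

/* One table slot. It starts out pointing at the slow path, much as
 * 0x601030 initially contains 0x4004c6. */
func_ptr got_slot = lazy_resolver;

/* Slow path: locate the real routine, patch the slot, then call it. */
int lazy_resolver(int x) {
    resolver_calls++;
    got_slot = real_routine;
    return got_slot(x);
}

/* The stub: a single indirect jump through the table slot. */
int stub(int x) {
    assert(got_slot != NULL);
    return got_slot(x);
}
```

The first call to stub goes through lazy_resolver and patches the slot; later calls reach real_routine directly, so resolver_calls stays at 1.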
On _start
Execution of the program does not actually start in main but in a function called _start that is provided by the compiler; this is the start address specified in the program header. This function calls a special function in the C standard library called __libc_start_main. It is this function that actually calls main and takes care of exiting when main returns.
On %fs
x86 has a feature called “segmentation”. As part of this feature, the processor has several “segment registers” which specify a region of memory; essentially, the segment register acts as a pointer. %fs:0x28 specifies to use segment register %fs and access a value 0x28 bytes from the beginning of the memory region it identifies.
On Linux, the %fs segment register is used for “thread-local storage”: it points to a block of data particular to a thread, even in a multithreaded process. On Windows, the %gs segment register is used for something similar. A segment register is used for this purpose instead of a normal register just to keep as many registers as possible available to the program.
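From the C side, thread-local storage is usually requested with a storage-class keyword. The sketch below uses C11’s _Thread_local; the variable and function names are hypothetical. It assumes GCC on x86-64 Linux, where accesses to such a variable in an executable are typically compiled to %fs-relative memory operands. That is an assumption about the toolchain, not something shown in the assignment’s dump.

```c
#include <assert.h>
#include <limits.h>

/* Each thread gets its own copy of this variable. On x86-64 Linux the copy
 * is typically reached through the %fs segment register (assumption). */
_Thread_local int per_thread_counter = 0;

/* Increment and return the calling thread's private counter. */
int bump_counter(void) {
    assert(per_thread_counter < INT_MAX); /* avoid signed overflow */
    per_thread_counter++;
    return per_thread_counter;
}
```

Compiling this with gcc -O2 -S and looking for %fs in the generated assembly is one way to check the assumption on a given system.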
Segmentation was originally intended to provide functionality similar to virtual memory. These days, it is rarely used for this purpose, and its primary use is to support thread-local storage, as occurs briefly in the assembly in this assignment. It is still universally present on x86, however, and is entangled with x86’s implementation of kernel mode and exceptions.