Assignment: RE
Changelog:
- 11 Feb 2021: remove irrelevant hints section about modifying executables
Your Task
- Download the executable version of a “Go Fish” game written in C (based on the version distributed with OpenBSD, which is based on a version written by Muffy Brockey before 1990). Also download the source code here.
- Answer the questions about the executable fish.exe linked above on the answer sheet. (Do not answer the questions about a recompiled version of the executable.)
- Download the mystery executable here.
- Answer the questions about the executable mystery.exe linked above on the answer sheet.
Hints for examining the executables
- My recommended way of decoding an executable is the objdump command. A command like
    objdump -sRrd --file-offset something.exe > output.txt
  will provide a fairly complete dump of information about something.exe and write it to output.txt. Each of the options s, R, r, and d includes different data in the output:
    s: full contents of each section
    R: dynamic relocation records
    r: relocation records
    d: disassembly of sections containing code
  If you omit one of them, you’ll get output with less information. See objdump --help or man objdump for information about these options and other options we do not use. See “Interpreting objdump output” below for information on what to expect from this objdump output.
- Some other useful objdump options include:
    objdump -x   to include all “program header” information
- On Linux, programs are typically built so that execution does not actually start in main. Instead it starts in a function called _start, which is supplied by the compiler toolchain. The address of _start is the start address (entry point) recorded in the executable’s ELF header. _start calls a special function in the C standard library called __libc_start_main. It is __libc_start_main that actually calls main, and it takes care of exiting when main returns.
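The toy C model below illustrates that call chain. Every name in it (model_start, model_libc_start_main, recording_exit, example_main) is hypothetical; none of it is the real toolchain or glibc code. The exit routine is passed in as a parameter only so the model can be run without ending the program. The real __libc_start_main does considerably more work before and after it calls main.

```c
#include <assert.h>

/* Hypothetical, heavily simplified model of the startup chain
 * _start -> __libc_start_main -> main -> exit. Not real glibc code. */

typedef int (*main_fn)(int argc, char **argv);
typedef void (*exit_fn)(int status);

/* Stand-in for __libc_start_main: call main, then hand its return value
 * to the exit routine. This is why returning from main ends the program. */
static void model_libc_start_main(main_fn user_main, int argc, char **argv,
                                  exit_fn do_exit)
{
    assert(user_main && do_exit);
    int status = user_main(argc, argv);
    do_exit(status);
}

/* Stand-in for _start, the entry point recorded in the ELF header.
 * In a real program _start never returns. */
static void model_start(main_fn user_main, int argc, char **argv,
                        exit_fn do_exit)
{
    model_libc_start_main(user_main, argc, argv, do_exit);
}

/* Example pieces used to exercise the model. */
static int last_exit_status = -1;

/* Records the status instead of terminating the process. */
static void recording_exit(int status)
{
    last_exit_status = status;
}

/* A pretend main: returns 2 if given any arguments, 0 otherwise. */
static int example_main(int argc, char **argv)
{
    (void)argv;
    return argc > 1 ? 2 : 0;
}
```

For example, model_start(example_main, 3, argv, recording_exit) leaves 2 in last_exit_status. In a real program, that value would become the process's exit status.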
Interpreting objdump output
When run using the command we suggest above, the objdump output includes the following parts in this order:
File format information
Information like “elf64-x86-64” about the format of the executable.
Section contents
Objdump will dump the contents of “sections” in a format like:
Contents of section .text:
4011d0 f30f1efa 41544c8d 257b1000 005589fd ....ATL.%{...U..
4011e0 31ff5348 89f3e865 ffffff48 89c7e88d 1.SH...e...H....
4011f0 ffffff4c 89e24889 de89efe8 a0ffffff ...L..H.........
In this example:
- “.text” is the name of the section. Generally, the section name indicates its purpose; “.text” usually means the section containing machine code.
- The leftmost column is the address (in hexadecimal) where this data will be loaded in memory.
- The next four columns are the hexadecimal values actually placed in memory, four bytes per column. These values are written in the order the bytes appear in memory. For example, the 32-bit value 0x12345678 stored in little-endian order appears as 78563412.
- The final column shows the same bytes as characters. A period (.) represents any byte that does not correspond to a printable ASCII character.
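A short C sketch can reproduce the layout of these dump lines. The helper names format_dump_line and bytes_of_u32 are hypothetical. The spacing only approximates objdump's: it matches the single-spaced example above, not necessarily real objdump output. The ASCII column uses isprint in the default C locale, so bytes outside 0x20 to 0x7e print as '.'. bytes_of_u32 copies a 32-bit value into a byte array in memory order. On a little-endian machine such as x86-64, this yields 78 56 34 12 for 0x12345678.

```c
#include <assert.h>
#include <ctype.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Large enough for a 64-bit address, 16 bytes of hex, spaces and ASCII. */
#define DUMP_LINE_MAX 80

/* Copy a 32-bit value into out[] in the order its bytes sit in memory
 * (little-endian on x86-64, so 0x12345678 -> 78 56 34 12). */
static void bytes_of_u32(uint32_t value, uint8_t out[4])
{
    memcpy(out, &value, sizeof value);
}

/* Format up to 16 bytes like one line of an objdump -s dump: the load
 * address, four groups of four bytes in memory order, then an ASCII
 * column where non-printable bytes are shown as '.'. */
static void format_dump_line(char out[DUMP_LINE_MAX], uint64_t addr,
                             const uint8_t *bytes, size_t n)
{
    assert(n <= 16);
    char *p = out;
    p += sprintf(p, "%llx", (unsigned long long)addr);
    for (size_t i = 0; i < 16; i++) {
        if (i % 4 == 0)
            *p++ = ' ';                 /* space before each 4-byte group */
        if (i < n)
            p += sprintf(p, "%02x", (unsigned)bytes[i]);
        else
            p += sprintf(p, "  ");      /* pad a short final line */
    }
    *p++ = ' ';
    for (size_t i = 0; i < n; i++)
        *p++ = isprint(bytes[i]) ? (char)bytes[i] : '.';
    *p = '\0';
}
```

Feeding the first 16 bytes of the .text example to format_dump_line with address 0x4011d0 reproduces the first line shown above. Feeding it the four bytes from bytes_of_u32(0x12345678, ...) on a little-endian machine produces a first group of 78563412.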
Disassembly
For each section that is marked as containing machine code, objdump will attempt to turn the machine code into assembly.
The assembly will be split into what objdump
guesses are the functions (but it may not always correctly
identify where a function starts and ends).
dynamic linking-related functions
You may notice some name@plt
functions in the disassembly that are unusual and deserve special
discussion:
00000000004011b0 <exit@plt> (File Offset: 0x11b0):
  4011b0:   f3 0f 1e fa             endbr64
  4011b4:   f2 ff 25 b5 2e 00 00    bnd jmpq *0x2eb5(%rip)        # 404070 <exit@GLIBC_2.2.5>
  4011bb:   0f 1f 44 00 00          nopl 0x0(%rax,%rax,1)
- These are an artifact of dynamic linking. “plt” stands for “procedure linkage table”. Such a function stands in for a standard library function; in this case, that function is exit.
- The instruction bnd jmpq *0x2eb5(%rip) reads a pointer from memory at 0x2eb5 + %rip, then jumps to the location that pointer points to. objdump inserts the comment # 404070 <exit@GLIBC_2.2.5>. This indicates that 0x2eb5 + %rip will be the address 0x404070. That address should contain a pointer to the symbol exit@GLIBC_2.2.5, which will be loaded from another file. The code for exit is not included in the executable. That pointer is therefore filled in as part of running the executable, either when the executable is loaded or sometime later:
  - Running objdump -R exec.exe lists the “dynamic relocation records” in the executable, which the dynamic linker processes when the executable is loaded. For example, the output for fish.exe includes:
        fish.exe:     file format elf64-x86-64
        DYNAMIC RELOCATION RECORDS
        OFFSET           TYPE               VALUE
        ...
        0000000000404070 R_X86_64_JUMP_SLOT  exit@GLIBC_2.2.5
        ...
    This indicates that address 0x404070 should be replaced with the address of exit@GLIBC_2.2.5. The TYPE field R_X86_64_JUMP_SLOT indicates how that address should be computed and stored; for a jump slot, the full 64-bit address of the symbol is stored.
  - Running objdump -p exec.exe shows general executable headers, which include a “Dynamic Section”. Among other things, this section lists the shared libraries in which external functions like exit@GLIBC_2.2.5 will be found. For example, the output for fish.exe includes:
        fish.exe:     file format elf64-x86-64
        ...
        Dynamic Section:
          NEEDED               libc.so.6
          INIT                 0x0000000000401000
          FINI                 0x0000000000401e28
          ...
    The NEEDED entry indicates that libc.so.6 is the only file this executable expects to find extra functions in.
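The toy C model below makes the “filled in later” idea concrete: a PLT-like stub calls through a pointer slot, and the slot is resolved on first use. Everything in it is hypothetical. square stands in for a library function such as exit, which could not be called more than once in a test. fake_libc_exports stands in for the symbol table of a NEEDED library like libc.so.6, and resolver_then_call plays the dynamic linker's role. In the real executable, the slot is the 8 bytes at an address like 0x404070, the stub is the name@plt code shown above, and resolution may instead happen when the program is loaded.

```c
#include <assert.h>
#include <string.h>

/* Toy model of lazy binding through a PLT stub (not real glibc code). */

typedef int (*int_fn)(int);

/* The "library" function that lives in another file. */
static int real_square(int x) { return x * x; }

/* Stand-in for the exported symbol table of a NEEDED library. */
struct symbol {
    const char *name;
    int_fn addr;
};
static const struct symbol fake_libc_exports[] = {
    { "square", real_square },
};

/* Look a symbol up by name, as a dynamic linker does for a relocation. */
static int_fn lookup_symbol(const char *name)
{
    size_t count = sizeof fake_libc_exports / sizeof fake_libc_exports[0];
    for (size_t i = 0; i < count; i++)
        if (strcmp(fake_libc_exports[i].name, name) == 0)
            return fake_libc_exports[i].addr;
    return 0;
}

static int resolve_count = 0;
static int resolver_then_call(int x);

/* The pointer slot: it starts out pointing at the resolver,
 * not at the real function. */
static int_fn square_slot = resolver_then_call;

/* The "square@plt" stub: always jumps through whatever the slot holds. */
static int square_plt(int x) { return square_slot(x); }

/* Runs only on the first call: find the real function, patch the slot so
 * later calls skip the resolver, then complete the original call. */
static int resolver_then_call(int x)
{
    resolve_count++;
    int_fn target = lookup_symbol("square");
    assert(target != 0);
    square_slot = target;
    return target(x);
}
```

After the first call to square_plt, square_slot holds real_square, so later calls go straight to it. The stub never changes; only the pointer it reads does, which mirrors how the jmpq in exit@plt reads its target from 0x404070.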
“normal functions”
For example, the disassembly of a function _init might look like:
0000000000401000 <_init> (File Offset: 0x1000):
  401000:   f3 0f 1e fa             endbr64
  401004:   48 83 ec 08             sub $0x8,%rsp
  401008:   48 8b 05 d9 2f 00 00    mov 0x2fd9(%rip),%rax        # 403fe8 <__gmon_start__> (File Offset: 0x3fe8)
  40100f:   48 85 c0                test %rax,%rax
  401012:   74 02                   je 401016 <_init+0x16>
  401014:   ff d0                   callq *%rax
  401016:   48 83 c4 08             add $0x8,%rsp
  40101a:   c3                      retq
In this example:
- The first line indicates that there is a label called _init, which has the address 0x401000 when the executable is loaded. Some executables can be loaded at multiple addresses; these are called “position independent executables”. For such an executable, this address is consistent with the other addresses in the objdump output, but the operating system will most likely choose a different address when the executable is run. The “File Offset” is the number of bytes into the executable file at which the machine code is located. Within a section, the machine code is stored contiguously, so if 0x401000 is at offset 0x1000, then 0x401008 is at offset 0x1008.
- Each following line is an instruction. The value before the colon is the memory address (in hexadecimal) of the first byte of the instruction. The hexadecimal values after the colon are the bytes of the instruction. The disassembled instruction itself follows these bytes. (Some long instructions may require multiple lines for their hexadecimal values.)
  Within the disassembled instructions, objdump tries to relate each address encoded in the instruction to a nearby label:
    - When an address exactly equals a label, like the label __gmon_start__ in the example above, the format is address <LABEL>, with the address in hexadecimal.
    - When an address does not correspond exactly to a label, the format is address <LABEL+offset>. For example, 401016 <_init+0x16> indicates the address 0x401016, which is 0x16 bytes after the label _init.
  On 64-bit x86, some instructions specify an address relative to %rip. %rip is the “instruction pointer”, which in 2150 and 3330 we have called the “program counter”. For this calculation, %rip holds the address of the next instruction, that is, the current instruction’s address plus its length. So 0x2fd9(%rip) means the memory 0x2fd9 bytes after the end of the current instruction. objdump’s disassembly includes a comment showing the computed address. In the example above, the mov at 0x401008 is 7 bytes long, so %rip is 0x40100f. The computed address is 0x40100f + 0x2fd9 = 0x403fe8, which objdump labels __gmon_start__.
- This function does not call any @plt functions. If a function does, that is, if you see symbols whose names contain @plt, see the section on dynamic linking above.
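A small C sketch can check the address arithmetic in these bullets. The helper names are hypothetical. file_offset_of assumes that the address and the known (address, offset) pair lie in the same contiguously stored region. rip_relative_target applies the rule that %rip holds the address of the instruction after the current one. read_le32 decodes a little-endian, signed 32-bit displacement. In the two instructions used here (the mov in _init and the bnd jmpq in exit@plt), the displacement happens to be the last four bytes. That is not true of every instruction.

```c
#include <assert.h>
#include <stdint.h>

/* Map a load address to a file offset using one known (address, offset)
 * pair from the same contiguous region, e.g. <_init> at 0x401000 / 0x1000. */
static uint64_t file_offset_of(uint64_t addr, uint64_t known_addr,
                               uint64_t known_offset)
{
    assert(addr >= known_addr);
    return known_offset + (addr - known_addr);
}

/* Decode a signed 32-bit little-endian value (e.g. "d9 2f 00 00" -> 0x2fd9)
 * without relying on the host's byte order. */
static int32_t read_le32(const uint8_t *p)
{
    uint32_t v = (uint32_t)p[0] | (uint32_t)p[1] << 8 |
                 (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
    if (v <= INT32_MAX)
        return (int32_t)v;
    return (int32_t)(v - 0x80000000u) + INT32_MIN;  /* negative displacement */
}

/* Target of a %rip-relative operand: %rip is the address of the *next*
 * instruction, i.e. this instruction's address plus its length. */
static uint64_t rip_relative_target(uint64_t insn_addr, unsigned insn_len,
                                    int32_t disp)
{
    uint64_t next_insn = insn_addr + insn_len;
    return next_insn + (uint64_t)(int64_t)disp;  /* sign-extend, then wrap */
}
```

The same next-instruction rule explains the je in _init. The instruction at 0x401012 is 2 bytes long with displacement 0x02, so its target is 0x401014 + 2 = 0x401016.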