Assignment: TRICKY
Changelog:
- 18 Feb 2021: adjust phrasing about how infect programs should work on modified targets; clarify what we want in answers.txt; change suggestion of .s file to .S file (so // comments work)
- 18 Feb 2021: correct e8 (call opcode) to e9 (jmp opcode in hints)
- 18 Feb 2021: adjust question phrasing
- 19 Feb 2021: adjust requirement for output to be output only when run in a terminal (to avoid buffering issues); add note about output order not to a terminal being different in hints with suggested virus assembly code (after originally placing this note in the “Your Task” section)
- 20 Feb 2021: indicate that we will use g++ instead of gcc for .cc files
- 22 Feb 2021: make it clearer that we want a modified copy of target1.exe, etc.
- 24 Feb 2021: adjust phrasing about needing to determine location in the file to insert machine code/jump to make it harder to read it as implying that the inserted machine code needs to use offsets within the file
- 26 Feb 2021: specify use of -std=gnu++17 for C++
Your Task
-
Create a C, C++, or Python program called infect1.c or infect1.cc or infect1.py that will produce a modified copy of target1.exe so that, when run with its output going to a terminal, instead of producing:
    Initialize application.
    Begin application execution.
    Terminate application.
it will produce the output:
    Initialize application.
    You have been infected with a virus!
    Begin application execution.
    Terminate application.
and still run the same application code as the original version of target1.exe. (That is, your modified version should replace or add to the existing code in target1.exe, not replace target1.exe with a new program. If we supplied an alternate version of target1.exe with the same layout but slightly different application code, your infect1 program should still work on that modified version without changing the different application code.)
Given a copy of infect1.c (or infect1.cc) and target1.exe, we should be able to do:
    gcc -Os -o infect1 infect1.c
    ./infect1 target1.exe target1-infected.exe
    chmod +x target1-infected.exe
    ./target1-infected.exe
(or the same with g++ for .cc files, adding an option like -std=gnu++17 if needed) and see the output above. Similarly, given infect1.py, we should be able to do:
    python3 infect1.py target1.exe target1-infected.exe
    chmod +x target1-infected.exe
    ./target1-infected.exe
We have given what we hope are very extensive hints below on one way to accomplish this (but you are welcome to find alternate strategies).
-
Do the same for target2.exe in a program called infect2.c or infect2.cc or infect2.py. (If you end up writing one program that works for both, you may submit the same infect program twice.)
-
In a file called answers.txt, briefly answer the following questions. (We do not expect more than a sentence or two for the first two questions.)
-
How did you identify the file offsets in the target executables to overwrite (for the jump and for the virus code)?
-
How did you produce the machine code to insert for the jump to the virus code?
-
If your infect programs have hard-coded offsets or something similar, how would you automate finding the locations in target executables to overwrite so that it would work on other target programs?
-
Upload your three files to the submission site.
Hints
General advice
-
You can divide this task into two parts:
- write and insert code for the “virus” (which prints out “You have been infected with a virus!”)
- add a jump to the virus code you inserted
-
To simplify the assignment, you can hardcode the input and output file names in your infect program. That is, infect1.c opens and reads target1.exe and opens and writes target1-infected.exe.
-
You should use the utility objdump to examine the executable target.exe. The option --disassemble is useful. In particular, you need to determine the starting address of the virus code. The disassembly will also help you determine the opcodes of the instructions that you need to insert (i.e., a push instruction and a ret instruction). You may wish to consult the objdump manual.
-
A very useful program to examine the file is a hex editor such as ghex or bless. If it is not already installed, you can install ghex using sudo apt-get install ghex and similar for bless.
Encoding a jump operation
-
One way to encode a jump to AddressOfVirusFunction is using:
    pushq $AddressOfVirusFunction
    ret
This results in 6 bytes of machine code on x86-64. Compared to using a normal jump instruction, this has the advantage of not being dependent on where the machine code is placed.
-
Alternatively, one could encode a jump using a conventional jump instruction:
jmp AddressOfVirusFunction
but the machine code for this will encode the address relatively (as an offset from the address of the jump instruction), so you will need to change the machine code based on where it is inserted.
-
A push of a 32-bit constant (on 32- or 64-bit x86) can be encoded as an 0x68 byte followed by the (little-endian) constant. A ret is encoded as c3. A jump can be encoded as an 0xe9 byte followed by a 32-bit offset from the address of the following instruction.
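As a rough illustration of the encodings just described, the following Python sketch builds both byte sequences. The function names and the example addresses are made up for illustration; you would substitute the addresses you find in your own disassembly.

```python
import struct

def encode_push_ret(target_addr):
    """Machine code for: pushq $target_addr ; ret."""
    # 0x68, then the 32-bit little-endian constant, then 0xc3 (ret).
    # The constant is sign-extended to 64 bits by the CPU, so this
    # sketch assumes target_addr is below 0x80000000.
    return b"\x68" + struct.pack("<I", target_addr) + b"\xc3"

def encode_jmp_rel32(jmp_addr, target_addr):
    """Machine code for: jmp target_addr, when the jmp is loaded at jmp_addr."""
    # The offset is measured from the instruction after the 5-byte jmp.
    offset = target_addr - (jmp_addr + 5)
    return b"\xe9" + struct.pack("<i", offset)

# Hypothetical addresses, for illustration only.
print(encode_push_ret(0x400662).hex())             # 6862064000c3
print(encode_jmp_rel32(0x400661, 0x400700).hex())  # e99a000000
```

Note that the push/ret bytes depend only on the destination, while the jmp bytes change whenever the jump itself is moved.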
Placing the jump/returning to the application
-
Identify where the constant strings “Initialize application.” and “Begin application execution.” are referenced to locate relevant parts of the application code.
-
A problem with placing either version of the jump is that:
- we need to do whatever the code that was replaced with the jump did, so the application still functions normally; and
- we need to make sure the virus code returns back to the application afterwards.
A convenient solution for this is to replace a function return with a jump.
This allows us to solve the second issue by having the virus code end with a ret instruction that will substitute for the original code’s ret.
-
It may seem like a problem that a ret instruction in x86-64 is one byte, while either version of our tricky jump is at least 5 bytes. However, there is often padding after a function return, like:
    400661:  c3                      retq
    400662:  66 66 66 66 66 2e 0f    data32 data32 data32 data32 nopw %cs:0x0(%rax,%rax,1)
    400669:  1f 84 00 00 00 00 00
The instruction at address 0x400662 is an unreachable no-op instruction, which is inserted only because the compiler and/or linker wanted to ensure that the next function started at an aligned address (here the padding ends at 0x400670, a multiple of 16). This is a “cavity” that gives a virus writer some room to work.
-
To insert both the virus code and the jump itself, you must determine the location in the executable file at which to insert them, which is not the same as the address at which they will be loaded in memory. The location that objdump most prominently shows is the expected memory address.
- One option is to look at the program headers objdump can output to figure out what offsets in the executable are loaded into what bytes of memory.
- Another option is to pass options to objdump to get it to display the offset of code within the executable file.
- Another option is to get a hexadecimal dump of the raw file and look for bytes shown in objdump output in the actual executable file to find their location.
- Yet another option would be for your infect.c to search for particular bytes in the executable file itself.
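The first option (using program headers) amounts to a small lookup, sketched here in Python. The table is copied from the target2.exe LOAD headers quoted under “Find space for the virus code”; the names LOAD_SEGMENTS and vaddr_to_file_offset are invented for this sketch, and target1.exe would need its own table.

```python
# (file offset, virtual address, file size) for each LOAD header,
# copied from the target2.exe program headers quoted in this handout.
LOAD_SEGMENTS = [
    (0x0000, 0x400000, 0x520),
    (0x1000, 0x401000, 0x5d5),
    (0x2000, 0x402000, 0x218),
    (0x2e10, 0x403e10, 0x240),
]

def vaddr_to_file_offset(vaddr, segments=LOAD_SEGMENTS):
    """Return the file offset of the byte loaded at vaddr, or None."""
    for file_off, seg_vaddr, filesz in segments:
        if seg_vaddr <= vaddr < seg_vaddr + filesz:
            # Same distance from the start of the segment, in file and in memory.
            return file_off + (vaddr - seg_vaddr)
    return None  # vaddr is not backed by file contents listed in a LOAD header

print(hex(vaddr_to_file_offset(0x401234)))  # 0x1234
```

This only covers addresses inside the filesz range of a header; it deliberately returns None for anything else.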
Writing virus code
-
The “virus” code we want you to insert could be written as follows (assuming the strategy of replacing a ret instruction with the jump to this code; you will need to modify this if you use a different strategy):
    virus:
        movl $1, %eax              // 1 = SYS_write
        movl $1, %edi              // system call first argument = stdout
        leal string(%rip), %esi    // system call second argument = string
        movl $37, %edx             // system call third argument = length of string
        syscall
        retq
    string:
        .asciz "You have been infected with a virus!\n"
This assembly code is carefully written to avoid dependencies on its location in the executable.
(This code makes a Linux system call to write a string to stdout. We could also have made a function call to the puts function by calling the puts@plt stub, but the location of puts@plt varies between the two executables.)
(When using the above assembly, if you redirect the infected executable’s output to a file instead of sending it to the terminal normally, you may see the output in an unexpected order: This assembly code outputs via a write system call directly and the application code uses buffered stdio.h functions. When stdout isn’t a terminal, stdio.h’s buffering will not write lines of output immediately. For this assignment, we only require you to support stdout being a terminal, so you should not need to worry about this issue.)
-
You can put this assembly in a .S file and then use something like gcc -c file.S to get a .o file. You can then examine this .o file with objdump --file-offset -d file.o or similar.
Find space for the virus code
-
Look for a large area of nops in the disassembly to determine where to insert the virus code. Record the address of this location in memory to generate the “tricky jump” code you will insert elsewhere in the executable.
-
In target1.exe, there is a large segment of padding (nops) between functions that is a suitable place to put the virus code.
-
In target2.exe, there is not any obvious space to put a non-trivial amount of virus code based on the disassembly. But you can take advantage of how programs are loaded in “page” sized units. On our x86-64 Linux systems, pages are 4096 bytes (= 0x1000 bytes), and typically whole pages of an executable are loaded into memory.
This is reflected in the program headers of target2.exe:
    LOAD off    0x0000000000000000 vaddr 0x0000000000400000 paddr 0x0000000000400000 align 2**12
         filesz 0x0000000000000520 memsz 0x0000000000000520 flags r--
    LOAD off    0x0000000000001000 vaddr 0x0000000000401000 paddr 0x0000000000401000 align 2**12
         filesz 0x00000000000005d5 memsz 0x00000000000005d5 flags r-x
    LOAD off    0x0000000000002000 vaddr 0x0000000000402000 paddr 0x0000000000402000 align 2**12
         filesz 0x0000000000000218 memsz 0x0000000000000218 flags r--
    LOAD off    0x0000000000002e10 vaddr 0x0000000000403e10 paddr 0x0000000000403e10 align 2**12
         filesz 0x0000000000000240 memsz 0x0000000000789e78 flags rw-
The first three LOAD entries use offsets 0x0, 0x1000, 0x2000 and target addresses 0x400000, 0x401000, 0x402000 because these align with the beginning of 4096-byte pages in memory and in the executable file. Because this is done, there are some additional cavities in the executable file: for example, the second LOAD entry only loads bytes 0x1000 through 0x15d5 of the executable and the third loads bytes 0x2000 through 0x2218, so apparently bytes 0x15d5 through 0x2000 are not used.
Although these bytes appear to be unused and unloaded from the program headers, they are actually loaded because of how x86-64 Linux memory management works. For example, though the program headers of target2.exe request to load 0x5d5 bytes from offsets 0x1000 through 0x15d5 into addresses 0x401000 through 0x4015d5:
    LOAD off    0x0000000000001000 vaddr 0x0000000000401000 paddr 0x0000000000401000 align 2**12
         filesz 0x00000000000005d5 memsz 0x00000000000005d5 flags r-x
because x86-64 Linux manages memory in page-sized chunks, it implements this by loading 0x1000 bytes from the file into addresses 0x401000 through 0x402000. It doesn’t know how to load less than 0x1000 bytes.
This means that, even if the “LOAD” program header is not modified, data placed at offset 0x15d6 will be loaded into memory the same as data placed at offset 0x1500 would be, because it’s part of the same page.
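The cavity arithmetic for the second LOAD header can be checked with a few lines of Python. This is only a sketch: page_tail_cavity is a made-up helper name, and it does not check whether some other part of the file already occupies those bytes, so confirm in a hex editor that the range really is unused.

```python
PAGE_SIZE = 0x1000  # 4096-byte pages, as on the x86-64 Linux systems described above

def page_tail_cavity(file_offset, filesz, page_size=PAGE_SIZE):
    """Return (start, end) file offsets of the bytes between the end of a
    LOAD segment's data and the end of the page holding its last byte."""
    start = file_offset + filesz
    end = -(-start // page_size) * page_size  # round start up to a page boundary
    return start, end

# Second LOAD header of target2.exe: off 0x1000, filesz 0x5d5.
start, end = page_tail_cavity(0x1000, 0x5d5)
print(hex(start), hex(end), end - start)  # 0x15d5 0x2000 2603
```

Because the file offset and the virtual address of this segment differ by a constant (0x400000), the same range corresponds to memory addresses 0x4015d5 through 0x402000.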
File I/O
-
We are reading and writing binary files, not text files. You may need to open files in binary mode, not text mode.
-
To read from and write to a binary file in C, you can use fopen, fread, and fwrite. You can run man fopen, man fread, etc. to get documentation for how these functions are called, or search online. An example usage of a program that copies “input.dat” to “output.dat” is the following:
    #include <stdio.h>
    #include <stdlib.h>

    int main(void) {
        FILE *in;
        FILE *out;
        char *buffer;
        long size;

        in = fopen("input.dat", "rb");
        /* get size of input.dat, by moving to the end of the file */
        fseek(in, 0, SEEK_END);
        size = ftell(in);
        /* then, return to the beginning of the file */
        fseek(in, 0, SEEK_SET);
        buffer = malloc(size);
        fread(buffer, 1, size, in);
        fclose(in);

        out = fopen("output.dat", "wb");
        fwrite(buffer, 1, size, out);
        fclose(out); /* close the output file; the input was closed above */
        free(buffer);
        return 0;
    }
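For those writing infect1.py, a comparable Python sketch is shown here. It copies a file while overwriting bytes at chosen file offsets; patch_copy and the demo file names are invented for illustration, and the demo runs on a throwaway file of zero bytes rather than on a real executable.

```python
import os
import tempfile

def patch_copy(in_path, out_path, patches):
    """Copy in_path to out_path, overwriting bytes at the given file offsets.

    patches is a list of (file_offset, replacement_bytes) pairs.
    """
    with open(in_path, "rb") as f:       # "rb": binary mode, not text mode
        data = bytearray(f.read())
    for offset, new_bytes in patches:
        if offset + len(new_bytes) > len(data):
            raise ValueError("patch extends past the end of the file")
        # Overwrite in place; the file length does not change.
        data[offset:offset + len(new_bytes)] = new_bytes
    with open(out_path, "wb") as f:
        f.write(data)

# Demo on a throwaway 16-byte file.
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "input.dat")
    dst = os.path.join(tmp, "output.dat")
    with open(src, "wb") as f:
        f.write(bytes(16))
    patch_copy(src, dst, [(4, b"\xc3")])
    with open(dst, "rb") as f:
        print(f.read().hex())
```

Overwriting rather than inserting keeps every other byte at its original offset, which matters because the program headers refer to fixed file offsets.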