Assignment: OBFUSCATE
Changelog:
- 5 March 2021: adjust grammar errors in describing readme.txt and explicitly mention that it should be submitted. Also state that description should be brief.
- 11 March 2021: add notes to hint that adding print statements may obscure whether you are successful.
Your Task
-
For this assignment, we have three different versions of a password-protected tic-tac-toe-like program. The three versions vary in how the code is obscured to prevent analysis. For each of the versions, your task is to produce a modified version that disables the password check, either by removing it or by making it accept the password “password”.
-
The first version is ttt1.exe. Submit a modified executable called ttt1-modified.exe.
-
For the second and third versions, to make this assignment more tractable, we provide the executable as C source code produced using Tigress. They are available as ttt2.c and ttt3.c. For these submit modified versions of the C source code called ttt2-modified.c and ttt3-modified.c.
It is okay if your modified versions include extra debug print statements.
-
Submit each of your modified versions to the submission site. Submit C code for the versions provided as C code and binaries for the versions provided as binaries.
-
In a text file, readme.txt, briefly describe how you produced each of the modified versions. Submit this file.
How the versions are produced
-
The first version is a compiled and stripped (debugging information/function names removed) version, with no attempt at obfuscation.
-
The second version is run through the following Tigress transformations:
- EncodeLiterals (convert constants into code that computes them)
- Merge (to combine most non-main functions)
- Flatten (to obscure control flow in the main function)
- CleanUp (rename functions and variables to hide details)
-
The third version is run through the following Tigress transformations:
- EncodeLiterals (convert constants into code that computes them)
- Merge (to combine most of the functions into two different functions)
- Virtualize (to use an emulator for a custom instruction set for one of the combined functions)
- CleanUp (rename functions and variables to hide details)
For this third version, we deliberately only applied the Virtualize transformation to a small subset of the code to make the assignment simpler.
Hints
General Hints
-
You do not need to understand a vast majority of the code in order to change the password check.
If the program is running code like
result = check_password(...);
if (result) { fail; } else { play game; }
then you can make it accept the password “password” by editing the compare and jump instructions that make up the if statement (for example, to ignore the value of result), by changing the result value just before the if statement, by changing which function is called, by moving the play game code elsewhere, or probably in some other ways.
-
We’ve avoided any transformations that would add unnecessary conditional jumps/if statements. In real obfuscated code, it’s common to add extra conditionals that are always true or always false in order to complicate analysis. (If we had done this, you’d most likely need to spend extra time eliminating these added conditionals, probably by paying careful attention to where the entered password is used.)
-
You can identify interesting places in the executables by looking for calls to standard library functions like fgets and printf.
-
For the programs provided as C, I find it helpful to compile with options like
-Wno-int-conversion -Wno-format-security -Wno-builtin-declaration-mismatch
to suppress warnings generated by the obfuscated code.
-
One way to identify relevant code is to run the program under a debugger like GDB.
For example, you could run the program until it prompts for a password, then interrupt the program (such as with control-C) and use the backtrace command to learn where in the code the password prompt is. A good guess is that the area of interest for you is shortly afterwards.
-
For programs provided in C:
-
it may be helpful to rename variables/functions (e.g. via find/replace) as you discover their purpose.
-
you can and probably should add debug prints to help figure out what’s going on (if the programs were provided in machine code, this would be equivalent to scripting a debugger).
When adding print statements, make sure you can still figure out what the program is outputting; otherwise you may not be able to tell whether you have been successful.
-
The C code generated by Tigress often uses single-element arrays instead of normal variables, so you might see code like:
char l___4[1];
and then uses of
l___4[0]
afterwards.
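To make the first hint in this list concrete, here is a minimal sketch of the pattern and of one way to neutralize it at the C level. Everything in it is hypothetical and not taken from the assignment's programs: the names check_password, run_original and run_patched, and the stand-in password "hunter2". As in the hint's pseudocode, a nonzero result means the check failed.

```c
#include <string.h>

/* Hypothetical stand-in for the real check: returns 0 when the password
   matches and nonzero otherwise, as in the hint's pseudocode. */
static int check_password(const char *entered) {
    return strcmp(entered, "hunter2") != 0;
}

/* Original shape: returns 1 if the game would be played, 0 on failure. */
int run_original(const char *entered) {
    int result = check_password(entered);
    if (result) {
        return 0;   /* fail */
    }
    return 1;       /* play game */
}

/* Patched shape: result is overwritten just before the if statement,
   so the failure branch can never be taken. */
int run_patched(const char *entered) {
    int result = check_password(entered);
    result = 0;     /* the one-line modification */
    if (result) {
        return 0;   /* fail (now unreachable) */
    }
    return 1;       /* play game */
}
```

With this sketch, run_original("password") takes the failure branch while run_patched("password") reaches the game. In the Tigress output the variable would more likely appear as something like l___4[0] than as a readable name.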
First Program
-
I recommend using a debugger to identify where in the code the input is read. You will find a call to fgets.
-
You can look at the strings in the executable to find relevant parts of the code (objdump -sd will show you disassembly and constant data, and you can see where constant data addresses are used). Notably, you can look for strings you see printed out, and use them to identify where the relevant parts of the code are. (This is a telling reason why “encrypting” strings is a common practice.)
Most interesting strings are usually in the .rodata section. Once you figure out the address of a string, you can search for references to it in the disassembly. (Note that the real password does not appear as a constant string.)
-
We’ve stripped out information from the executable about where functions are, but you can identify where they are by looking at call and return instructions.
-
Some helpful GDB commands:
-
b *0x122345: set a breakpoint at a particular instruction
-
info registers: output all registers
-
x/i $pc: disassemble the current instruction
-
x/10i 0x123456: disassemble 10 instructions at address 0x123456
-
x/10bx 0x123456: print out the values of 10 memory bytes starting at address 0x123456
-
-
You can overwrite code you don’t want to execute with nop instructions (opcode 90 hexadecimal).
-
You can use a program like ghex to edit the executable, or a technique similar to what you did for TRICKY.
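If you would rather script the edit than use a hex editor, a small C helper can do it. The following is only a sketch: the function name patch_with_nops is made up for illustration, and you would need to work out the file offset and byte count yourself.

```c
#include <stdio.h>

/* Overwrite `count` bytes starting at byte `offset` of the file at `path`
   with the x86 nop opcode (0x90). Returns 0 on success, -1 on error. */
int patch_with_nops(const char *path, long offset, size_t count) {
    FILE *f = fopen(path, "r+b");   /* open for update without truncating */
    if (f == NULL) {
        return -1;
    }
    if (fseek(f, offset, SEEK_SET) != 0) {
        fclose(f);
        return -1;
    }
    for (size_t i = 0; i < count; i++) {
        if (fputc(0x90, f) == EOF) {
            fclose(f);
            return -1;
        }
    }
    return fclose(f) == 0 ? 0 : -1;
}
```

You would run this on a copy of the executable (for example ttt1-modified.exe) rather than on the original. Two things to keep in mind: the offset is a position within the file, which is generally not the same number as the address GDB or objdump shows for the instruction, and the range you overwrite should cover whole instructions so that the bytes after it still decode as intended.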
Second Program
-
In the second program, Tigress’s Flatten transformation converts the function containing the if statement to a loop like:
int place = 0;
while (true) {
    switch (place) {
    case 0: ...; place = ...; break;
    case 1: ...; place = ...; break;
    ...
    }
}
This structure should be apparent in the C code. You can look for the assignment(s) to place that correspond to the original if statement and change them.
-
You may notice that the flattened code in the second version makes a conditional assignment to its place variable without using an if statement.
-
Since the game itself has this flattening transformation, if you add debug prints to it, your debugging output may be interleaved with the game output.
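The hints in this section can be sketched with a toy flattened function. This is an illustration under assumptions, not Tigress output: the function names, the state numbers, and the arithmetic used for the conditional assignment are all invented, and the expression Tigress actually generates may look quite different.

```c
#include <stdio.h>

/* Hypothetical flattened function; `result` is nonzero when the check failed.
   Returns 1 if the "play game" state ran and 0 if the "fail" state ran. */
int flattened_original(int result) {
    int place = 0;
    int outcome = -1;
    while (1) {
        switch (place) {
        case 0:
            /* Conditional assignment with no if statement:
               place becomes 1 when result == 0 and 2 otherwise. */
            place = 1 + (result != 0);
            break;
        case 1:  outcome = 1; place = 3; break;   /* play game */
        case 2:  outcome = 0; place = 3; break;   /* fail */
        default: return outcome;
        }
    }
}

/* The same function with the assignment in state 0 edited so that the
   "fail" state can no longer be selected. */
int flattened_patched(int result) {
    int place = 0;
    int outcome = -1;
    while (1) {
        /* Debug output goes to stderr with a prefix, so it is easy to tell
           apart from the program's normal output on stdout. */
        fprintf(stderr, "[debug] place = %d result = %d\n", place, result);
        switch (place) {
        case 0:
            place = 1;   /* was: 1 + (result != 0) */
            break;
        case 1:  outcome = 1; place = 3; break;   /* play game */
        case 2:  outcome = 0; place = 3; break;   /* fail */
        default: return outcome;
        }
    }
}
```

Sending debug prints to stderr also lets you separate them from the game output when running the program, for example by redirecting stderr to a file.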
Third Program
-
The third variant uses a technique where it makes a custom instruction set and converts much of the program to instructions in that set. There’s an interpreter for that instruction set which has a high-level structure something like this:
char *code = ....;
...
char *pc = &code[0];
char *stack_pointer = /* some array */;
char *locals = /* some array */;
while (true) {
    switch (*pc) {
    case OPCODE_1:
        /* use stack_pointer and pc and the locals array */
        pc += ...;
        break;
    case OPCODE_2:
        ...
    }
}
Probably in a more realistic obfuscation, this strategy would be used for most of the program, but to simplify this assignment, we have only used it for one key function.
-
Only one opcode in the virtualized code corresponds to something like an if-statement, and the password check chooses between the code that runs if the password is correct or not correct using this opcode.
-
It will be helpful to identify calls to library functions like puts, printf, and fgets. (To manage the difficulty of the assignment, we have not taken special efforts to obscure these, though that would be a way to improve the obfuscation.)
-
We’ve deliberately avoided having a large portion of the logic for checking the password and implementing the game the password protects transformed using the virtualization technique in order to simplify the assignment. That code is in other functions (which have been merged together to obscure them).
I solved this assignment by modifying the function that implemented the interpreter we describe in these hints, but another solution might be to modify or examine the other functions instead.
-
Once you’ve identified this loop, it may be helpful to:
-
look at each opcode and see if you can describe what it does and come up with a name
-
add some print statements and use that to identify what opcodes are running and where in the code it is (the current “pc” value). You might find it helpful to use something like
printf("offset = %td\n", pc - &code[0]);
(where you’ll have to identify what values actually correspond to pc and code, and where %td is the format for the difference of two pointers) so you can correlate what’s happening to the code array.
-
identify “opcodes” that would result in calling library functions like fgets and printf
-
identify “opcodes” that could be used to implement if statements: figure out which variable represents the PC and see which opcodes might change it other than by incrementing it by a fixed amount
(We have the luxury of doing these transformations on the C code. If we had a machine code executable instead, we could probably achieve a similar effect by scripting a debugger or emulator and/or by converting the machine code into C with inline assembly code we could edit more easily.)
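The steps in this list can be tried out on a toy interpreter before tackling the generated one. The sketch below is an assumption-laden stand-in: the three opcodes (OP_PUSH, OP_JZ, OP_HALT), their encodings, and the function name run_traced are invented here and are not the instruction set Tigress generates. It shows a trace line per executed instruction and marks the one opcode that changes pc by something other than a fixed amount.

```c
#include <stdio.h>
#include <stddef.h>

enum { OP_PUSH = 1, OP_JZ = 2, OP_HALT = 3 };   /* hypothetical opcodes */

/* Runs the toy bytecode and returns the value on top of the stack at
   OP_HALT, or -1 for an unknown opcode. Each executed instruction is
   logged to `trace` unless it is NULL. Toy code: no bounds checks. */
int run_traced(const unsigned char *code, FILE *trace) {
    const unsigned char *pc = &code[0];
    int stack[16];
    int *stack_pointer = stack;   /* points at the next free slot */
    while (1) {
        if (trace != NULL) {
            fprintf(trace, "[vm] offset = %td opcode = %d\n",
                    pc - &code[0], (int)*pc);
        }
        switch (*pc) {
        case OP_PUSH:   /* push the operand byte; pc moves a fixed amount */
            *stack_pointer++ = pc[1];
            pc += 2;
            break;
        case OP_JZ:     /* pop; if the value was 0, jump to code[operand] */
            if (*--stack_pointer == 0) {
                pc = &code[pc[1]];   /* pc set to a computed target */
            } else {
                pc += 2;
            }
            break;
        case OP_HALT:
            return stack_pointer[-1];
        default:
            return -1;
        }
    }
}
```

Calling run_traced(program, stderr) on a short byte array prints one line per instruction, and the jump shows up in the trace as an offset that does not follow from the previous instruction's size. In the real interpreter, opcodes whose handlers assign to the PC variable like OP_JZ does here are the candidates for the if statement.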
-
-
One technique is to change the result of the branch that is triggered when the password is wrong. There are several ways to do this; a non-exhaustive list:
-
You could do this by modifying the interpreter to replace the handling of the opcode used for the conditional jump with something like:
if (flag) { pc = ....; flag = 0; } else { /* normal code */ }
where flag is a variable you arrange to be set elsewhere (e.g. based on the code address, based on how long it’s been since fgets was run, or based on a counter).
-
You could find the part of the custom instruction set code that is used for the conditional jump and change it to use no-op or unconditional jump operations if those are available.
-
You could change the initial PC value to the one the jump is supposed to target, which you’d have recorded earlier.
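As a rough sketch of the first approach in this list, the toy interpreter below forces one particular conditional jump to be taken, choosing it by its code address. The opcodes, their encodings, the function name run_forced, and the force_at parameter are all hypothetical; in the real interpreter you would have to find the corresponding opcode handler, PC variable, and offset yourself.

```c
#include <stddef.h>

enum { OP_PUSH = 1, OP_JZ = 2, OP_HALT = 3 };   /* hypothetical opcodes */

/* Runs toy bytecode and returns the top of the stack at OP_HALT, or -1 for
   an unknown opcode. If force_at >= 0, the OP_JZ located at code offset
   force_at always jumps, whatever value it pops. No bounds checks. */
int run_forced(const unsigned char *code, ptrdiff_t force_at) {
    const unsigned char *pc = &code[0];
    int stack[16];
    int *stack_pointer = stack;   /* points at the next free slot */
    while (1) {
        switch (*pc) {
        case OP_PUSH:
            *stack_pointer++ = pc[1];
            pc += 2;
            break;
        case OP_JZ: {
            int value = *--stack_pointer;
            /* flag is set based on the code address of this instruction */
            int flag = (pc - &code[0] == force_at);
            if (flag) {
                pc = &code[pc[1]];      /* forced: ignore the popped value */
            } else if (value == 0) {    /* normal code */
                pc = &code[pc[1]];
            } else {
                pc += 2;
            }
            break;
        }
        case OP_HALT:
            return stack_pointer[-1];
        default:
            return -1;
        }
    }
}
```

Selecting the jump by offset avoids disturbing any other use of the same opcode elsewhere in the code array, which matters if the conditional-jump opcode is executed more than once.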
-