Assignment: OVER
Changelog:
- 13 March 2017: add reminder that we tell you the overflow is in gets in hints.
- 24 March 2017: Correct typo of dumbedore for dumbledore.
In this assignment, you learn and demonstrate how buffer overflow vulnerabilities can be exploited.
Assignment Resources
-
The articles “Detection and Prevention of Stack Buffer Overflow Attacks” (VPN may be required off-campus) and “Smashing the Stack for Fun and Profit”
-
The binary to attack: dumbledore.exe.
-
Please ask any clarifying questions on Piazza.
Assignment details
-
Your submission on the assignment must work in a 64-bit Ubuntu 16.04 LTS environment like you installed on your VM. This is the environment where we will test your submitted code.
-
Ubuntu (and most current Linux environments) have ASLR (Address Space Layout Randomization) enabled as a mitigation against buffer overflow and similar exploits. For this assignment, you will work with this feature disabled. To turn it off run the command:
setarch x86_64 -v -RL bash
This will run a shell (i.e. command prompt) with ASLR disabled. It does not affect any other shells. We also have provided you a binary without other mitigations that would usually be used these days including:
- Non-executable stacks; and
- Stack canaries
Later in the semester we will talk about these mitigations and how they can be defeated.
-
Examine the supplied file. It contains an obvious buffer overrun vulnerability in the GetGradeFromInput function, which calls the C standard library function gets. As its manpage documents, gets does not check the length of the buffer supplied as an argument and is therefore unsafe.
-
Create a file named data.txt containing your name and run dumbledore.exe:
./dumbledore.exe <data.txt
Thank you, Charles Reiss. I recommend that you get a grade of F on this assignment.
-
Your goal is to produce an input file so that the output of the program execution is as follows:
./dumbledore.exe <data.txt
Thank you, Charles Reiss. I recommend that you get a grade of A on this assignment.
- To do this, you will use the stack smashing technique we discussed in class. There are several strategies for writing the machine code run by this attack, all of which have extensive hints below:
  - The first is to call a convenient PrintGradeAndExit function in the supplied executable. To do this, you should be careful to set the stack pointer to an address less than the address of your machine code, so this function does not corrupt your machine code/data when it executes. This is probably the easiest solution.
  - The second is to write code that directly prints out the string, without calling any application functions. We give examples of how to make direct calls to the operating system to print strings and to exit below. A challenge with this approach is that you cannot include the newline character directly in the middle of your attack string.
  - The third is to use shellcode that actually executes a shell, and then send commands to that shell to print out the appropriate strings (for example using “echo”). A challenge with this approach is that the supplied application does buffered I/O: if input is available, it will read in (from the OS) more than just the one line you input, assuming it will be needed later by the application anyway. This means that when the newly executed shell tries to read its input, some bytes after the buffer-overflowing line may have already been consumed.
-
Note that the location of the stack pointer can vary slightly when your environment changes. See the section “Variations in the location of the stack pointer” under hints below. Because of this, you should plan on using a NOP sled so you don’t have to precisely predict the address of the stack pointer.
-
Rather than submit the input file alone, we’d like you to submit a C program, attack-gradeA.c, that will generate the input. This C file can include comments that explain how the exploit works (which might make any sort of partial credit, figuring out whether our test environment disagrees with your environment, etc. possible). An example file which produces a normal (non-exploit) input is:
#include <stdio.h>

int main(void) {
    /* Just have the name */
    printf("Thomas Jefferson\n");
    return 0;
}
This would be used to run the program like
$ ./attack-program.exe >data.txt
$ ./dumbledore.exe <data.txt
Thank you, Thomas Jefferson. I recommend that you get a grade of F on this assignment.
Tips
General
- You will find it very confusing if you are not running your commands from a shell started with
setarch x86_64 -RL bash
. In particular, the stack will have inconsistent addresses.
Disassembly and Debugging
-
A useful starting point is using
objdump
to disassemble the executable file. -
Using the debugger
gdb
can be helpful for debugging and refining your buffer overflow payload. See this list of useful GDB commands. But see the warning below about the debugger’s environment slightly changing the location of the stack pointer. -
In particular, after looking over
objdump
output, a good second step is running the program in GDB to find the address of the stack pointer at a relevant time. -
Since we tell you the buffer overflow occurs in gets, it is helpful to find the call to gets and examine the state of the program at that time in the debugger.
-
Drawing a picture of the state of the stack is helpful.
Variations in the location of the stack pointer
-
The stack can start at slightly different locations depending on how the program is run. One cause of this is that Linux stores program arguments and “environment variables” on the stack, so the location of the stack pointer on entry to main depends on how much space these take up.
Environment variables include things like information about the terminal the program is being run in. You can see a list of environment variables by running printenv. Note that the shell commonly sets environment variables depending on what program is being run, like _=/usr/bin/printenv or OLDPWD.
-
For example, the program
#include <stdio.h>

int main(void) {
    int x;
    printf("%p\n", (void *)&x);
    return 0;
}
has different output on my system depending on the environment variables:
$ setarch x86_64 -RL bash
$ ./stackloc # run normally
0x7ffffffffe034
$ env - ./stackloc # run with no environment variables
0x7ffffffffed84
$ gdb ./stackloc
...
(gdb) run
0x7ffffffffe004
-
A particular case where this is a problem is running the program in the debugger versus not. The debugger may set a few environment variables itself, and when you run the program in the debugger, it may set
-
The best way to avoid problems with the stack starting in different locations is to use a “NOP sled”. Please place a large string of NOPs before your exploit code and try to “aim” the return address in the middle of this string. This will prevent you from being sensitive to small differences in the location of the stack. We’ve made the buffer that is overflowed particularly large to make a NOP sled more reliable.
-
An encoding for a 1-byte NOP instruction on x86 is
0x90
. -
You could also try to figure out how to keep the debugger from changing the environment (likely with some unset env commands), but this is less preferable, because it means your exploit is less reliable.
Shellcode production
-
You can run objdump on .o files. I would recommend using objdump -dr file.o, which will show disassembly and unresolved relocations, so you can tell if you accidentally generated machine code which needs the linker to complete it. (Recall that relocations are addresses the linker needs to fill in later.)
-
On 64-bit x86, you can use RIP-relative addressing (that is, program counter-relative addressing) to load addresses within your machine code without worrying about the location at which your machine code is placed in memory:
code:
    movq value(%rip), %rax
    leaq value(%rip), %rbx
    ...
value:
    .quad 42
will place the value 42 in %rax and the address of the value 42 in %rbx. But, unlike code that does not use (%rip), the resulting machine code will not have any dependencies on the memory addresses eventually assigned to code and value. It will only depend on how far apart code and value are in memory.
Other techniques for finding the address of your code include using a sequence like:
    call next
next:
    popq %rax
to load the current program counter into %rax. The call instruction uses an address relative to the current program counter, so the resulting machine code does not include hard-coded addresses.
-
Since
gets
reads until a newline, you need to make sure your machine code does not contain newlines. -
The objcopy utility can be used to extract a particular section of an object file. For example
objcopy -O binary --only-section=.text compiled_code.o compiled_code.raw
will take the .text section of the object file compiled_code.o and put it in compiled_code.raw. (compiled_code.o might be a file generated by gcc -c some_assembly_file.s.) You might then look at the resulting file with a tool like ghex or od to extract the machine code in a less cluttered way than looking at the objdump output.
-
Here is a Python 3 program that outputs a binary file as a C array declaration.
Running an executable function
-
The executable contains a PrintGradeAndExit function. To figure out what its arguments mean, look at the arguments of its call to printf.
-
A challenge with calling the PrintGradeAndExit function is that our machine code and data are on the stack and could be corrupted by our call to PrintGradeAndExit if we are not careful. To avoid this, you can explicitly set the stack pointer. For example, you might use
leaq label-0x100(%rip), %rsp
to set the stack pointer to point 0x100 bytes before a label in your shellcode. (label-0x100 is assembly syntax for 0x100 bytes before label.)
-
Recall that pushq followed by ret allows you to jump to a fixed location from machine code without worrying about where that machine code ends up in memory relative to it.
Alternate print/exit
-
If you don’t call
PrintGradeAndExit
, you could instead print out the output you want directly, then exit. This is more realistic but a little more challenging. -
Instead of including a newline in your buffer overflow, you can, instead, include code to compute a newline (e.g., by adding or subtracting from another value) or to copy one from elsewhere in the application.
-
To print something out from your machine code, you could call the printf@plt “stub” (hard-coding its address) or make a write() system call directly. An example assembly snippet to make a write system call is:
mov $1, %eax /* system call number 1 = write */
mov $1, %edi /* arg 1: file descriptor number 1 = "standard output" */
lea string, %rsi /* arg 2: pointer to string */
mov $length_of_string, %rdx /* arg 3: length of string */
syscall
-
If you decide that your attack code should exit directly, you can do this by calling the exit@plt “stub” or by making an exit_group system call directly. An example assembly snippet to make an exit_group system call is:
mov $231, %eax /* system call number 231 = exit_group */
xor %rdi, %rdi /* arg 1: exit code = 0 */
syscall
Executing a shell
-
You can find an example of shellcode that runs the execve system call to execute /bin/sh in this archive of shellcode. Note that some of the shellcode you find may make assumptions about the initial contents of registers or location of the stack pointer. If you use prebuilt shellcode like this, you must clearly cite its source.
-
On Linux,
execve
replaces the current program with the executed program. The new program inherits the same input and output as the prior program. -
Standard I/O functions read ahead in their input. For example, gets may read part of the next line, saving it in a buffer for future calls to gets or other <stdio.h> functions. These buffers are not passed to the new program by execve. To compensate for this, you may need to include padding in your input.
-
You can print out a string from the shell using the
echo
command. -
By default, the shell won’t print out a command-prompt when its input is not a terminal.
Submission
- Submit a C file called attack-gradeA.c, which will write to stdout a data.txt that will cause the supplied program to output your name and a recommendation for a grade of A. Make sure your C file includes comments that describe how it works and any special resources you used.
Credit
This assignment was adapted from Jack Davidson’s Fall 2016 assignment, which was adapted from one given previously by Andrew Appel in Princeton’s COS 217.