Assignment: LEX
Last edited: 13 Feb 2017
Changelog:
- 23 Feb 2017: change “disassembled code” to “the program”
- 20 Feb 2017: correct flex command from “flex test.l -o test.c” to “flex -o test.c test.l”
- 20 Feb 2017: correct flex command from “flex virus-patterns.l -o virus-patterns.c” to “flex -o virus-patterns.c virus-patterns.l”
- 13 Feb 2017: add several flex examples under resources; adjust order of instructions to put example false positives where they are actually needed
- 10 Feb 2017: remove requirement to submit virus-patterns2.l
This assignment will explore constructing a pattern-based scanner for a particular pattern we inserted in:
pushq $AddressOfVirusFunction
retq
We are not concerned with the function address. Instead, we take advantage of the fact that legitimate code rarely performs a jump with this instruction sequence instead of the more typical:
jmp AddressOfVirusFunction
We will use this “signature” to distinguish virus code from ordinary executable code. Other plausible signatures include “payload” code, code patterns used to scan for infection targets, and so on.
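The sample output in the Task section (68.f0.14.41.00.c3.) shows what this signature looks like as raw bytes. The C sketch below is illustrative only and is not part of the required flex solution. It spells out that encoding: opcode 0x68 with a 4-byte little-endian address, followed by 0xc3. The helper names encode_push_ret and is_push_ret are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* x86-64 opcodes involved in the signature:
 *   68 id   pushq $imm32  (4-byte little-endian immediate, sign-extended
 *                          to 64 bits when pushed)
 *   c3      retq          (pops the pushed value into the instruction pointer)
 * For comparison, a direct "jmp AddressOfVirusFunction" normally assembles
 * to e9 (or eb for a short jump) followed by a displacement relative to the
 * next instruction, so it does not contain the absolute target address. */
enum { OP_PUSH_IMM32 = 0x68, OP_RET = 0xc3, SIG_LEN = 6 };

/* Fill out[] with the 6-byte push/ret sequence that "jumps" to target. */
void encode_push_ret(uint32_t target, unsigned char out[SIG_LEN]) {
    out[0] = OP_PUSH_IMM32;
    for (int i = 0; i < 4; i++)
        out[1 + i] = (unsigned char)(target >> (8 * i)); /* low byte first */
    out[5] = OP_RET;
}

/* Return nonzero if the SIG_LEN bytes at p have the push/ret shape,
 * whatever address the push carries. */
int is_push_ret(const unsigned char *p) {
    return p[0] == OP_PUSH_IMM32 && p[SIG_LEN - 1] == OP_RET;
}
```

For example, encode_push_ret(0x004114f0, b) produces the bytes 68 f0 14 41 00 c3, which matches the sample output. Only the first and last bytes are fixed; the four bytes in between depend on the address.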
Resources
- The flex manual.
- Example flex files:
  - count_foos.l: counts bytes, newlines, and foos in its input
  - a_words.l: outputs words starting with “a”
  - states.l: example using non-exclusive lex states
  - states2.l: example using exclusive lex states
You can build any of these flex files into an executable using:
flex -o file.c file.l
gcc -Wall -g -O -o file.exe file.c -lfl
They all read input from stdin and write output to stdout. If you type input to them manually, you can send end-of-file by pressing Control-D. You can also run them on input from a file using:
./file.exe <input.txt
Task
In this assignment, we will use the regular expression tool flex to write a pattern that detects this “push” followed by “ret” sequence.
Create the scanner
- Write a flex pattern file to scan the program and detect the virus code pattern described above. You should detect every instance of the pattern, regardless of the function address pushed by the first instruction. Keep track of the byte number (the offset into the input), starting at byte number 0. For each match, print a warning message, then print the matched bytes in hexadecimal, two digits per byte, each followed by a “.”, and then a blank line. Do not write anything else to the output stream. A sample run and output would be:
$ ./scanner <input.exe
WARNING! Tricky jump: byte number: 2057
68.f0.14.41.00.c3.

An example printf statement that produces output in the correct format is:
printf("WARNING! Tricky jump: byte number: %d\n%02x.%02x.%02x.%02x.%02x.%02x.\n\n", byte_number, match_bytes[0], match_bytes[1], match_bytes[2], match_bytes[3], match_bytes[4], match_bytes[5]);
where match_bytes is an array of unsigned chars holding the matched bytes.
Note: the output above is for illustrative purposes only; the byte number does not come from any actual input.
- Call your flex file virus-patterns.l. Compile and link the flex code into an executable. Here are some commands that can do this:
flex -o virus-patterns.c virus-patterns.l
gcc -Wall -g -O -o scanner virus-patterns.c -lfl
The -lfl option to gcc tells the linker to link the flex runtime library (libfl) into the executable. libfl supplies default main() and yywrap() functions, so your .l file does not have to define them.
- Here are some positive test cases: 1, 2, 3. Make sure your scanner works on these test cases.
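When testing a flex solution, it can help to compare it against an independent reference that prints the same output. The following is a minimal brute-force C sketch. It is not a flex scanner and not the required solution, and the function names print_warning and scan_buffer are hypothetical. It assumes the reported byte number is the 0-based offset of the first matched byte. It prints each warning in the format described above, including the trailing blank line.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

enum { SIG_LEN = 6 }; /* 68, four address bytes, c3 */

/* Print one warning: header line, dotted hex bytes, blank line.
 * The bytes are taken as unsigned char. In a flex action yytext is a
 * char *, and where char is signed, a byte such as 0xc3 would be
 * sign-extended and print as "ffffffc3" instead of "c3". */
void print_warning(FILE *out, int byte_number, const unsigned char *bytes) {
    fprintf(out, "WARNING! Tricky jump: byte number: %d\n", byte_number);
    for (int i = 0; i < SIG_LEN; i++)
        fprintf(out, "%02x.", (unsigned)bytes[i]);
    fputs("\n\n", out); /* end the byte line, then a blank line */
}

/* Report every offset i where buf[i] == 0x68 and buf[i + 5] == 0xc3,
 * checking every starting offset (so overlapping matches are all found).
 * Returns the number of matches. */
int scan_buffer(FILE *out, const unsigned char *buf, size_t len) {
    int matches = 0;
    for (size_t i = 0; i + SIG_LEN <= len; i++) {
        if (buf[i] == 0x68 && buf[i + SIG_LEN - 1] == 0xc3) {
            print_warning(out, (int)i, buf + i);
            matches++;
        }
    }
    return matches;
}
```

There is one design difference to keep in mind. A flex scanner resumes after the text each rule consumes, so with a 6-byte rule and the input 68 68 00 00 00 c3 c3, it would report only the match at byte 0, while this loop also reports byte 1. Also, in flex the pattern . does not match a newline (0x0a) byte. A rule that uses . for the address bytes would therefore miss addresses containing 0x0a, and comparing the two outputs is one way to catch that.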
False Positives and Refinement
- Run your solution on this copy of /bin/ping from an Ubuntu 16.04 LTS system. You should notice a false positive. Examine this ping binary to determine the cause of the false positive, and describe it in a text file, answers.txt.
- Propose a strategy that fixes this false positive, and describe the changes you would make in answers.txt. Be specific enough that someone could implement it consistently from your description.
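To work out why a match is a false positive, it helps to look at the bytes surrounding the reported byte number. Because the scanner reads the raw file, that number can be used directly as a file offset. Below is a minimal C sketch of a hex viewer for this purpose. It assumes the file has already been read into memory, and dump_window is a hypothetical helper name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Print the bytes around buf[off] as "offset: hex ..." rows of 16 bytes.
 * The window covers at least `radius` bytes before off (the first row is
 * aligned down to a 16-byte boundary) and up to `radius` bytes after it,
 * clamped to the buffer. The byte at off is prefixed with '*' instead of
 * a space so it stands out. */
void dump_window(FILE *out, const unsigned char *buf, size_t len,
                 size_t off, size_t radius) {
    size_t start = off > radius ? off - radius : 0;
    size_t end = off + radius < len ? off + radius : len;
    start -= start % 16; /* align rows to 16-byte boundaries */
    for (size_t row = start; row < end; row += 16) {
        fprintf(out, "%08zx:", row);
        for (size_t i = row; i < row + 16 && i < end; i++)
            fprintf(out, "%c%02x", i == off ? '*' : ' ', (unsigned)buf[i]);
        fputc('\n', out);
    }
}
```

For example, dump_window(stdout, data, size, 2057, 32) would show the neighborhood of a match reported at byte number 2057. Reading the surrounding bytes, or disassembling that region, can show whether the 68 ... c3 bytes really are a push followed by a ret.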
Submission
- Upload your virus-patterns.l file.
- Upload your answers to the questions under “False Positives and Refinement” in a text file called answers.txt.