Changelog:
- 14 September 2022: in description of relocations use “place in the machine code” instead of “relocation” for clarity; similar updates for symbols, avoid using “to fill in” in the list of symbols description.
- 16 September 2022: clarify that parsing tool is just about checking answer formats
This is documentation for the object file and executable file formats to be used in the linking tasks in the linking+ISA tradeoffs lab and homework.
Parsing tool
To aid in avoiding typos that create syntax issues, etc., we have supplied a web-based parsing tool for the object file and executable formats described below. This tool will check that your answers are formatted correctly. (It is not for reformatting them.)
Object file format
Each file has 3 parts, separated by “***”:
- Machine code/data. Each file’s machine code is written in hexadecimal as a sequence of bytes, each byte written with two hexadecimal “nibbles” (even if its value is less than 16). The bytes are space-separated in our examples, but this is not required. Each line of hexadecimal is prefixed with its starting offset in hexadecimal followed by a “:”. Relocations to be filled in are represented with 0 bytes. Omitted bytes are assumed to be zeroes.
  Optionally, you may place comments on lines of machine code and data by placing a | followed by the comment. (This, incidentally, makes it easier to paste machine code generated by yas.)
- A list of relocations. Each relocation is a comma-separated line containing:
  - the offset (as a hexadecimal number) of the place in the machine code to be filled in (this should be the location of the first byte of the address placeholder to fill in, which usually will not be the first byte of the instruction containing that address), and
  - the symbol whose address should be filled in at that location
- A list of symbols. Each symbol is a comma-separated line containing:
  - the offset (as a hexadecimal number) of the machine code/data for the symbol (that is, where the corresponding label was declared), and
  - the name of the symbol
Example:
0x0: 11 22 33 44
0x4: 55 66 77 88
0x8: 99 AA BB CC
0xC: 00 00 00 00
0x10: 00 00 00 00
***
0xC,foo
***
0x0,start
specifies an object file containing 20 bytes of machine code and/or data, with a ‘start’ symbol at offset 0x0 and a relocation (whose corresponding machine code/data bytes happen to all be 0s before the replacement, though this is not required) to be replaced with the address of ‘foo’ at offset 0xC.
This would be the object file one would expect from assembling
start:
.quad 0x8877665544332211
.byte 0x99
.byte 0xAA
.byte 0xBB
.byte 0xCC
.quad foo
where
- .quad is an assembler directive that inserts a 64-bit little-endian number at the specified location; as a special case, .quad foo inserts the 64-bit little-endian number that corresponds to the label foo (which must be in another file), and
- .byte is an assembler directive that inserts a particular byte.
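To make the byte order concrete, here is a small Python sketch that reproduces the first eight bytes of the example from the .quad value. The helper name quad_bytes is hypothetical and is not part of any course tool.

```python
def quad_bytes(value: int) -> str:
    """Return the 8 bytes of a 64-bit value in little-endian order, as hex text."""
    raw = value.to_bytes(8, byteorder="little")
    return " ".join(f"{b:02X}" for b in raw)


# The least significant byte (0x11) comes first in memory.
print(quad_bytes(0x8877665544332211))  # prints 11 22 33 44 55 66 77 88
```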
The object file for the above example could also be written with less whitespace and different numbers of bytes per line, like:
0x00: 11223344556677
0x07: 8899AABBCC00000000
0x10: 00000000
***
0x0C,foo
***
0x00,start
Or with a comment to aid reading as in:
0x00: 112233445566778899AABBCC
0x0c: 0000000000000000 | placeholder for foo
***
0x0C,foo
***
0x00,start
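The three variants above all describe the same object file. As an illustration of the format rules (and not a substitute for the supplied parsing tool), here is a minimal Python sketch of a reader for this format. The function name parse_object_file and its return shape are assumptions made for this example.

```python
def parse_object_file(text: str):
    """Parse the three-part format into (data bytes, relocations, symbols)."""
    # Split into sections on lines consisting only of "***".
    sections = [[]]
    for line in text.splitlines():
        if line.strip() == "***":
            sections.append([])
        else:
            sections[-1].append(line)
    if len(sections) != 3:
        raise ValueError("expected 3 parts separated by ***")
    code_lines, reloc_lines, symbol_lines = sections

    memory = {}  # offset -> byte value
    for line in code_lines:
        line = line.split("|", 1)[0].strip()  # drop an optional comment
        if not line:
            continue
        offset_text, hex_text = line.split(":", 1)
        offset = int(offset_text, 16)
        digits = "".join(hex_text.split())  # spaces between bytes are optional
        for i, byte in enumerate(bytes.fromhex(digits)):
            memory[offset + i] = byte

    # Omitted bytes are assumed to be zero.
    size = max(memory) + 1 if memory else 0
    data = bytes(memory.get(i, 0) for i in range(size))

    def parse_pairs(lines):
        pairs = []
        for line in lines:
            line = line.strip()
            if not line:
                continue
            offset_text, name = line.split(",", 1)
            pairs.append((int(offset_text.strip(), 16), name.strip()))
        return pairs

    return data, parse_pairs(reloc_lines), parse_pairs(symbol_lines)


example = """\
0x00: 112233445566778899AABBCC
0x0c: 0000000000000000 | placeholder for foo
***
0x0C,foo
***
0x00,start
"""
data, relocations, symbols = parse_object_file(example)
print(len(data), relocations, symbols)  # prints 20 [(12, 'foo')] [(0, 'start')]
```

Storing bytes by offset before building the final byte string is one simple way to honor the rule that omitted bytes are zero.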
Executable file format
Executable files (for the homework only) consist of lines of hexadecimal byte values, each prefixed with the address (in hexadecimal) to which the bytes should be loaded, followed by a “:”. For example:
0x0: 11 22 33 44
0x4: 55 66 77 88
0x8: 99 AA BB CC
0xC: EE FF 00 11
specifies an executable that would load 16 bytes into memory starting at address 0x0.
As with object files, it is also okay for this to be written with less whitespace and different numbers of bytes per line, like:
0x00: 1122334455
0x05: 66778899AABBCCEE
0x0D: FF0011
The whitespace between the hexadecimal for each byte can be omitted, and more than four bytes can be placed per line.
This is essentially the same as the format that tools/yas produces.
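As a sketch of how such lines can be produced from a sequence of bytes, the following Python function writes a byte string in the executable format. The name format_executable and its parameters are hypothetical and chosen only for this example.

```python
def format_executable(image: bytes, base: int = 0x0, per_line: int = 4) -> str:
    """Format bytes as 'address: hex bytes' lines, per_line bytes per line."""
    lines = []
    for start in range(0, len(image), per_line):
        chunk = image[start:start + per_line]
        hex_bytes = " ".join(f"{b:02X}" for b in chunk)
        lines.append(f"0x{base + start:X}: {hex_bytes}")
    return "\n".join(lines)


image = bytes.fromhex("112233445566778899AABBCCEEFF0011")
print(format_executable(image))
# prints:
# 0x0: 11 22 33 44
# 0x4: 55 66 77 88
# 0x8: 99 AA BB CC
# 0xC: EE FF 00 11
```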
As in the code we use for HCLRS, executables should arrange for the address of main to be 0x0, and then include the machine code/data from the other object files used to construct them in any order.
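Combining object files this way also means filling in each relocation with the final address of its symbol. The following is a minimal Python sketch of that one step, assuming 8-byte little-endian addresses (as with the .quad foo placeholder in the object file example) and a made-up final address of 0x20 for foo. The helper apply_relocation is hypothetical and not part of the course tools.

```python
def apply_relocation(image: bytearray, offset: int, address: int) -> None:
    """Overwrite the 8-byte placeholder at offset with address, little-endian."""
    # Assumption: address placeholders are 8 bytes wide.
    image[offset:offset + 8] = address.to_bytes(8, byteorder="little")


# Machine code/data from the object file example, with a zeroed placeholder at 0xC.
image = bytearray.fromhex("112233445566778899AABBCC" + "00" * 8)
apply_relocation(image, 0xC, 0x20)  # 0x20 is a made-up address for foo
print(image[0xC:].hex())  # prints 2000000000000000
```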