Assignment: OOO
- Infrastructure
- VM issues
- Deliverables
- gem5 Explanation/Tutorial
- Supplied Benchmark Programs
- Tasks
- Part A: General attributes of benchmark programs (required for checkpoint)
- Part B: Use the pipeline viewer (required for checkpoint)
- Part C: Effective cache miss penalty versus miss latency (required for checkpoint)
- Part D: Branch Prediction Benefits
- Part E: Achievable bandwidth
- Part F: Miscellaneous Resource
- Note on errors
- Building benchmarks and gem5
- Linux quick start
In this homework, you will use the gem5 simulator to explore how the performance and behavior of an out-of-order processor changes as various implementation factors are adjusted.
Infrastructure
- prebuilt gem5
- patch to original gem5 source (if you’re building it yourself)
- prebuilt benchmarks
- sample inputs to the benchmarks are included in the archive (as of 6 October; apologies for the error) and available separately
- VirtualBox compatible 64-bit Ubuntu 16.04 image with no software preinstalled. You should be able to use almost any 64-bit x86 Linux installation; this is just provided as an example. (If you want a more minimal, faster GUI-less VM, a vagrant box (mentioned below) will do it.)
We recommend a 64-bit Linux machine or virtual machine for
this assignment. The image above is a virtual machine image
suitable for importing into VirtualBox. (It does not
have any software installed on it beyond the Ubuntu defaults.) That VM image
has a user called vmuser
whose password is password
. If you use the supplied
VM and have the RAM, I would suggest increasing the amount of memory allocated to it.
Alternately, you might consider using vagrant
and its ubuntu/xenial64 ‘box’ (virtual machine template).
If you’d like to install this natively rather than in a virtual machine, we provide installation instructions below which should work on many Unix-like systems. If you’d like to run it without a virtual machine on Windows, this should also be possible. In the event of technical difficulties, however, our recommendation is going to be to use a Linux virtual machine.
After obtaining a suitable environment, download a prebuilt copy of gem5 (which we have patched with this patch) and of our benchmarks. For this assignment, you should have some familiarity with the Linux command line. If you do not, see the brief guide below.
VM issues
To run a 64-bit virtual machine in VirtualBox, you may need to enable VT-x, Intel’s name for its extra hardware support for virtual machines. This is sometimes also called names like “Intel Virtualization Technology”. For reasons I do not understand, laptops and desktops often ship with this feature disabled by default. If it is disabled, you can typically re-enable it in the BIOS or ‘Setup Utility’. For example, on my laptop (a Thinkpad T460), I did this by pressing Enter while the machine was booting and choosing the “Setup tool” from the resulting prompt. From the resulting menu, I selected “Config”, then “Security”, then “Virtualization”, which gave me an option to enable or disable “Intel Virtualization Technology”. Instructions for how to do this will vary between machines, so I cannot give a universal guide here. But other common ways of entering Setup include pressing F12 or Del while booting, and you can probably find instructions online given the model of your laptop or desktop. If you have trouble figuring out how to do this on your system, please ask the course staff for assistance.
Fighting against Linux or virtual machine software is not an objective of this homework. If you have trouble with such issues, please don’t hesitate to ask us.
Deliverables
Submit a zip or tar archive containing:
- an HTML, PDF, or text document containing answers to questions in and explanations required by the Tasks listed below;
- supporting simulation results (raw data files)
For the checkpoint, you need only supply the tasks labelled “(required for checkpoint)” below. For the final submission, you must do all tasks.
gem5 Explanation/Tutorial
An Example Simulation
After installing gem5, download our benchmarks directory. This includes several benchmarks and their source code, as well as a script to run a gem5 simulation called `gem5script.py’. The gem5 executable expects to be passed a Python script like this, which is responsible for configuring the simulation environment. The script we have supplied is suitable for you to modify for this assignment.
To start out, first build the benchmark programs using the supplied Makefile
by running the make
command from the command-line. Try running the blocked-matmul
prorgam out of the simulation using
./blocked-matmul
This program performs a 84x84 register-blocked matrix multiply and times it four times. You can read its source code in blocked-matmul.c. Now let us run this program under the processor simulator using the supplied script.
First, modify gem5script.py
to point to where you downloaded gem5. At the top of the
script is a line like
GEM_DIR = '../gem5'
Either place your downloaded copy of gem5 in the same directory that contains the benchmarks
directory or modify this to set GEM_DIR
to the location of your copy of gem5 (that you downloaded
or built). Then find the location of the gem5.opt binary for your copy of gem5, which
is in build/X86/gem5.opt. I suggest creating a symbolic link to this binary using
a command like:
ln -s PATH/TO/GEM5_DIR/build/X86/gem5.opt ./
This command creates a symbolic link to PATH/TO/GEM5_DIR/build/X86/gem5.opt
in the current directory.
Further instructions in this homework assume this link exists and will run gem5 as ./gem5.opt.
Note that you will need to refer to the gem5 source code repeatedly throughout this assignment to find out about what options are supported. The source code is included with our prebuilt version of gem5 for this reason.
Then run gem5script.py with gem5.opt, pointing it to the blocked-matmul program:
./gem5.opt gem5script.py \
--cmd=./blocked-matmul \
--directory=blocked-matmul-output
Some notes on this simulation:
-
It runs much slower than the original blocked-matmul program; around ten thousand times slower. This is the primary reason that our benchmark programs are `toy’ programs in this assignment. For “real” architecture research, you would use larger benchmarks, but this would involve waiting much longer for simulation results.
-
The program run is an ordinary user-space program, but the simulated processor does not have an OS. This is gem5’s ‘system call emulation’ (SE) mode. This acts like a simulated processor except that there is no virtual memory, and the system call instructions magically do what the operating system would do rather than triggering an exception and calling the operating system. gem5 also supports a ‘full system’ (FS) mode, which can boot some real operating systems, which we will not be using in this assignment.
-
The simulator will output some spurious warning messages like:
info: Entering event queue @ 0. Starting simulation...
warn: ignoring syscall access(140737352001731, ...)
warn: ignoring syscall access(140737352012752, ...)
warn: ignoring syscall access(140737352001731, ...)
warn: ignoring syscall mprotect(140737349496832, ...)
warn: x86 cpuid family 0x0000: unimplemented function 7
warn: ignoring syscall mprotect(140737351593984, ...)
warn: ignoring syscall mprotect(6295552, ...)
warn: ignoring syscall mprotect(140737354121216, ...)
These are harmless.
-
The simulator outputs a message like:
Done simulation @ tick = 939334500: target called exit()
This indicates on which simulation tick the program completed. By default, each simulation tick represents 1 picosecond of simulated time, and the simulated CPU has a clock rate of 2 GHz, so this simulation represents around 939 million ticks / (500 ticks / clock cycle) ≈ 1.9M clock cycles of simulated time.
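As a sanity check on that arithmetic, the tick-to-cycle conversion can be written out directly (this is just the conversion described above: 1 ps ticks and a 2 GHz clock, nothing gem5-specific):

```python
# Convert a gem5 tick count to simulated cycles and seconds, assuming
# the defaults described above: 1 tick = 1 picosecond, 2 GHz clock.
TICKS_PER_SECOND = 10**12                        # 1 ps per tick
CLOCK_HZ = 2 * 10**9                             # 2 GHz simulated clock
TICKS_PER_CYCLE = TICKS_PER_SECOND // CLOCK_HZ   # = 500

final_tick = 939334500    # from "Done simulation @ tick = 939334500"
cycles = final_tick / TICKS_PER_CYCLE
print(cycles)             # ~1.88 million cycles, i.e. roughly 1.9M
```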
The more important outputs from the simulation are in the blocked-matmul-output directory specified via the --directory option. This will contain the following files:
- config.ini, config.json: contain the full configuration of the components of the simulation.
- program.err, program.out: contain the output of the program. From program.out, you can read the amount of simulated time that blocked-matmul took.
- stats.txt: contains numerous statistics from the simulation. Interesting statistics include:
  - sim_seconds: simulation time
  - system.cpu.ipc: instructions per cycle achieved by the simulated CPU
  - system.cpu.commit.loads, system.cpu.commit.branches, etc.: number of loads, branches, etc. finished
  - system.cpu.iew.exec_branches: number of branches executed (regardless of whether taken), including ones only executed because of a misprediction of a prior branch
  - system.cpu.iew.branchMispredicts: number of branch mispredictions
  - system.cpu.iq.fu_full::IntALU: number of times a reservation buffer for integer ALU operations was not available but could have been used
  - system.cpu.dcache.overall_miss_rate::cpu.data: data cache miss rate
gem5script.py
takes some parameters that affect the simulation. For example, you can
adjust the data cache size with the --l1d_size
option. Try running:
./gem5.opt gem5script.py \
--cmd=./blocked-matmul \
--directory=blocked-matmul-output-hugecache \
--l1d_size=2MB
[The \ represents a line continuation. This indicates that this is meant to all be one command, even though there are newlines in the middle of it.]
Notice that the value of system.cpu.dcache.overall_miss_rate::cpu.data in stats.txt from this simulation is much lower than the first simulation.
Note that the simulator takes on the order of a minute to simulate 5 milliseconds of simulated runtime, a slowdown of around 10000 times. For this reason, we will generally be only running very short benchmark programs in this assignment.
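If you find yourself comparing statistics across many runs, a short script can save time. This is only a sketch: it assumes the usual stats.txt layout of one statistic per line (name, value, then a # description), and the directory names are just the ones from the example above.

```python
import os

def read_stat(stats_path, name):
    """Return the value of the first statistic with this name in a gem5 stats.txt."""
    with open(stats_path) as f:
        for line in f:
            fields = line.split()
            # Lines look like: "system.cpu.ipc  0.534881  # IPC: ..."
            if len(fields) >= 2 and fields[0] == name:
                return float(fields[1])
    return None

# Example: compare the data cache miss rate of the two runs above
# (skipping output directories that do not exist yet).
for d in ['blocked-matmul-output', 'blocked-matmul-output-hugecache']:
    path = os.path.join(d, 'stats.txt')
    if os.path.exists(path):
        print(d, read_stat(path, 'system.cpu.dcache.overall_miss_rate::cpu.data'))
```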
Supplying Command-Line Arguments
To supply command-line arguments to a program you run under gem5, you can pass
the --options
option, for example:
./gem5.opt gem5script.py \
--cmd=./queens \
--directory=queens-default-output \
--options='-c 10'
runs the command queens -c 10
in the simulator.
This is handled by the code in the create_process()
function in gem5script.py.
When supplying filenames as arguments, you should generally provide the full path to the file.
The program will be run inside the output directory instead of in your current working
directory. This is why we suggest using realpath
in the benchmark instructions below.
Varying Simulation Parameters
Rather than vary cache parameters, you will primarily be varying parameters of the simulated
out-of-order CPU. Most of these parameters don’t have convenient command-line options. Instead,
you will need to modify the create_cpu() function in gem5script.py to change the parameters as
you choose. A comment before this function shows some examples of how to modify parameters,
including some which are more complicated to set.
(The original version of the benchmarks archive also has an error in the first
paragraph of this comment: it claims the function varies the number of reorder buffer entries
by default, as described below, but it does not as supplied.)
If you aren’t familiar with Python, note that text between triple quotes (""") in this file is effectively a comment, including several paragraphs of text before def create_cpu.
You should edit parameters by modifying gem5script.py
’s create_cpu
function,
around where there is a comment reading YOUR CUSTOMIZATION CODE HERE
.
The code you add can look like:
the_cpu.numROBEntries = 100
to run with 100 ROB (reorder buffer) entries instead of the default.
For a full list of parameters in the out-of-order CPU model, look at the
gem5 source file src/cpu/o3/O3CPU.py
.
You can also edit the get_options()
function to make the script support additional
command-line options which you can access via the ‘options’ parameter to the create_cpu()
function. This can allow you to avoid changing the python file every time you want to run
a different simulation.
For example, if you add code like:
parser.add_option('--vary', type=str, default='none')
you could do something like
if options.vary == 'rob100':
    the_cpu.numROBEntries = 100
elif options.vary == 'none':
    pass # python for do nothing
else:
    eprint("ERROR: unrecognized --vary option")
    sys.exit(1)
to let you run
./gem5.opt gem5script.py \
--cmd=./blocked-matmul \
--directory=blocked-matmul-output-rob100 \
--vary=rob100
to try 100 ROB entries, writing output to blocked-matmul-output-rob100
and:
./gem5.opt gem5script.py \
--cmd=./blocked-matmul \
--directory=blocked-matmul-output-default
to try with the default configuration.
Stages of the Simulated Processor
The simulated processor executes instructions in several stages, which are important to understand to reason about the statistics reported:
- fetch: instructions are fetched from the instruction cache. By default, the processor fetches up to eight instructions at a time; the number fetched is controlled by the fetchWidth parameter. This stage does branch prediction and branch target prediction to determine what to fetch.
- decode: instructions from the fetch stage are preprocessed. This stage handles execution of unconditional branches (whose target address is not in a register, etc.). In a real processor, this is where instruction register numbers, etc. would be identified (but the simulator does not work this way internally; it only simulates the timings). The maximum number of instructions processed per cycle is controlled by the decodeWidth parameter.
- rename: entries in the re-order buffer and the instruction queue (approximately a shared reservation buffer) are allocated for each instruction. Register operands of the instruction are renamed, updating a renaming map (blocking if not enough free registers are available). The maximum number of instructions processed per cycle is controlled by the renameWidth parameter.
- dispatch/issue: instructions whose renamed operands are available are dispatched to functional units. Loads and stores are dispatched to the Load/Store Queue (LSQ). The simulated processor has a single instruction queue from which all instructions issue; ordinarily, instructions are taken in order from this queue. The maximum number of instructions processed per cycle is controlled by the dispatchWidth parameter.
- execute: the functional units actually process their instructions. Each functional unit may have a different latency (number of cycles until it produces a result). Conditional branch mispredictions are identified here. The maximum number of instructions processed per cycle is determined by the configured types of operations, latencies, and counts of the functional units available.
- writeback: the result of each instruction is sent to the corresponding physical register (if any), marking the register as available and permitting the issue of dependent instructions, and the instruction's reorder buffer entry is updated. The maximum number of instructions processed per cycle is controlled by the wbWidth parameter.
- commit: the reorder buffer is processed in order, freeing up reorder buffer entries. A second renaming map is updated. The maximum number of microops processed per cycle is controlled by the commitWidth parameter.
In the event of branch misprediction, trap, or other speculative execution event, “squashing” can occur at all stages of this pipeline. When a pending instruction is squashed, it is removed from the instruction queues, reorder buffers, requests to the instruction cache, etc.
The fetch, decode, rename and commit stages process instructions in program order. Other stages process instructions out-of-order based on availability of operands and results.
The simulated processor also lets one configure the latency between many of these stages — how many clock cycles it takes an instruction to pass from one phase to another in the best case.
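These latencies can be set in create_cpu() just like the width parameters. The parameter names below are taken from the O3 CPU model's parameter file; this is an untested fragment, so check src/cpu/o3/O3CPU.py for the exact names and defaults in your copy of gem5:

```python
# Untested fragment for create_cpu() in gem5script.py: set inter-stage
# latencies (in cycles). Parameter names come from src/cpu/o3/O3CPU.py;
# verify them against that file before relying on this.
the_cpu.fetchToDecodeDelay = 2   # fetch -> decode
the_cpu.decodeToRenameDelay = 1  # decode -> rename
the_cpu.renameToIEWDelay = 2     # rename -> issue/execute/writeback
the_cpu.iewToCommitDelay = 1     # writeback -> commit
```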
To deal with the complicated instructions that are very common in X86, like a single instruction that performs a load and an add, the simulated processor splits many instructions into multiple ‘microoperations’. Confusingly, it is often not clear whether statistics refer to microops (while calling them instructions) or real instructions. Generally, statistics about the issue, execute, and writeback phases always concern microops (even if their descriptions use the word “instructions”), while statistics about the commit phase will make clear which is meant.
Some Terminology in the Program Statistics
- IEW: issue and execute and writeback. The portion of the simulated processor that reads the queue from rename and manages the load/store queue and the functional units.
- IQ: instruction queue, where instructions are placed when they are ready to be executed on functional units or the caches.
- LSQ, LQ, SQ: Load/Store Queue, Load Queue, Store Queue: queues that hold pending memory operations. These are also responsible for tracking accessed memory addresses to ensure that out-of-order load/store instructions that refer to the same memory address produce the correct results.
- non-speculative instruction: An instruction that cannot be executed speculatively. These instructions must be executed when their reorder buffer entry is committed and no earlier. An example of such an instruction is the system call instruction.
- squashing: undoing the effect of instructions due to a branch misprediction, fault, or other kind of incorrect speculative execution. Note that this occurs at all stages of the pipeline, rather than waiting for instructions to reach the reorder buffer.
Supplied Benchmark Programs
I have selected several benchmark programs that should explore a range of demands on the simulated processor. For each program, I have a suggested way to run the program that should take not much more than a minute to simulate each time. If the program takes command-line arguments, you should pass the arguments when using them in simulations as described above.
-
blocked-matmul: a 2x2 register-blocked matrix multiplication of two 84x84 matrices. The matrices are pseudorandomly generated and all sizes are hard-coded. Source code for this benchmark is in
blocked-matmul.c.
Our suggested command-line for this program is
./blocked-matmul
It takes no command-line arguments.
This program was selected because it should have a mix of cache accesses and floating point operations.
-
BFS: computes a breadth-first search problem. This is taken from the Problem Based Benchmark Suite. Source code for this benchmark, along with utilities for generating graph data, is in the
breadthFirstSearch directory.
We supply some example graphs in the inputs directory. Our suggested command-line for this program is
./BFS path/to/RL3k.graph
where path/to/RL3k.graph is the full path to the RL3k.graph file supplied in the inputs directory of the benchmarks archive and via this link. You may get this with a command like:
realpath inputs/RL3k.graph
This program was selected because it should have poor data cache locality.
-
sha: computes the SHA-1 cryptographic hash of its input. Source code for this benchmark is in the
sha-src directory.
Our suggested command-line for this program is
./sha path/to/example-sha-input.txt
where path/to/example-sha-input.txt is the full path to the example-sha-input.txt supplied in the inputs directory. You may get this with a command like:
realpath inputs/example-sha-input.txt
This program was selected because it should be integer-operation intensive and very friendly for branch prediction and cache.
-
queens: solves the N queens problem for an N specified as an argument. This is taken from the LLVM (a compiler toolkit) test suite, but based on comments in the source file queens.c, it is much older. Source code is in the
queens.c file.
Our suggested command-line for this program is
./queens -c 10
The -c option indicates to count solutions instead of printing them out.
This program was selected because it should be very friendly to the cache, but very challenging for branch prediction.
Tasks
Part A: General attributes of benchmark programs (required for checkpoint)
Test each program with the suggested command-line above and create a table reporting the following with the default settings for the simulated processor:
- Portion of microops run that are:
- integer instructions other than multiplies/divides
- integer multiplies or divides
- floating-point instructions
- branch and function call instructions
- memory/cache accessing instructions
- Mean number of microops generated per program instruction
- Portion of branches correctly predicted
- Average instructions executed per cycle
- Average microops executed per cycle
- Instruction cache hit rate
- Data cache hit rate
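One way to organize this table is to compute the derived rows from raw counters with a small script. This sketch uses hypothetical dictionary keys standing in for values you read out of stats.txt yourself; the statistic names in the comment are suggestions to verify against your own stats.txt, not guaranteed names.

```python
# Sketch of deriving some Part A rows from raw gem5 counters. The keys
# below are placeholders for values read from stats.txt (candidates to
# verify: system.cpu.commit.committedInsts, system.cpu.commit.committedOps,
# system.cpu.numCycles, system.cpu.iew.exec_branches,
# system.cpu.iew.branchMispredicts).

def part_a_rows(stats):
    rows = {}
    rows['microops per instruction'] = (
        float(stats['committed_ops']) / stats['committed_insts'])
    rows['instructions per cycle'] = (
        float(stats['committed_insts']) / stats['cycles'])
    rows['microops per cycle'] = (
        float(stats['committed_ops']) / stats['cycles'])
    rows['branch prediction accuracy'] = (
        1.0 - float(stats['branch_mispredicts']) / stats['branches_executed'])
    return rows

# Made-up example numbers, just to show the shape of the output:
example = {'committed_insts': 1000000, 'committed_ops': 1300000,
           'cycles': 800000, 'branches_executed': 120000,
           'branch_mispredicts': 6000}
for name, value in sorted(part_a_rows(example).items()):
    print('%-28s %.3f' % (name, value))
```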
Part B: Use the pipeline viewer (required for checkpoint)
gem5 includes a pipeline viewer, briefly described here. Try viewing the pipeline during the execution of the blocked-matmul program. First run blocked-matmul with debugging options to record pipeline activity from simulation tick 500000000 to 501000000:
./gem5.opt --debug-flags=O3PipeView --debug-start=500000000 --debug-file=trace.out \
gem5script.py --directory=matmul-trace -c ./blocked-matmul -m 501000000
The -m option specifies to terminate the simulation after running 501000000 ticks. After running this command, you will have the usual output in the matmul-trace directory, plus a file called trace.out in that directory. If you look at this file, you will see that it contains a list of instructions, along with the clock tick at which they completed each phase of their execution. If an instruction was speculatively executed and did not complete some phase of its execution, ‘0’ is listed instead.
You can pass this trace.out to gem5’s pipeline viewer to get an easier to read output:
PATH/TO/GEM5/util/o3-pipeview.py --cycle-time=500 --color ./trace.out -o pipeview.out
The --cycle-time option specifies the clock period of the simulated CPU in simulation ticks. This command produces an output file called pipeview.out, which you can view with a command like:
less -r pipeview.out
Note that the output is very wide by default. You can adjust the width of the output by supplying a value for the -w option:
PATH/TO/GEM5/util/o3-pipeview.py -w 40 --cycle-time=500 --color ./trace.out -o pipeview.out
For each instruction, this indicates what the address of the instruction was, the kind of instruction it was (using gem5’s internal name) and how long each phase of its completion took.
Some lines have instruction names prefixed with ‘-----’. These represent instructions
which were executed speculatively and ‘squashed’ (their effects cancelled), for example
because a branch was mispredicted.
- Include your pipeview.out in your submission.
Part C: Effective cache miss penalty versus miss latency (required for checkpoint)
For this section, use only the BFS program. Notice that it has a relatively high data cache miss rate. The default configuration of the processor not only overlaps cache misses with other computation but can have many outstanding cache misses.
The simulated cache has a number of MSHR (Miss Status Handling Registers)
that each can make up to one request at a time to the memory system. The
number of MSHRs is controlled by the mshrs
parameter of each cache object
(like the dcache
variable produced by the supplied code).
-
Adjust the number of MSHRs in the data cache to 1 (from the default of
84) for the BFS program. Compare the performance to a higher number of MSHRs, and compare the
system.cpu.dcache.blocked_cycles::no_mshrs
statistic from each run. What was the difference in performance?
-
Try adjusting the data cache size to decrease the number of cache misses to a negligible number. What was the difference in performance?
-
Examine the
system.cpu.dcache.overall_miss_latency
statistic, which shows the total number of simulation ticks that memory accesses triggered by cache misses took. Based on this (and any other statistics), estimate the performance gained by the ability to overlap cache misses with other instructions (both other cache accesses and non-cache instructions like integer arithmetic). Identify any major limitations of your estimate.
The overall_miss_latency
statistic counts time when there are two active cache misses twice. For example, if there is a miss at time 1 that takes until time 11 to resolve and a miss at time 5 that takes until time 14, the overall_miss_latency
will be (11-1) + (14-5) = 19.
When making your estimate, note that the simulated processor always overlaps cache misses with other operations, even when it is configured with one MSHR and so can only handle one outstanding cache miss at a time.
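As a worked illustration of the kind of estimate involved (all numbers here are invented placeholders, not from a real run): dividing overall_miss_latency by the miss count gives an average latency per miss, and the summed miss latency suggests what a fully serialized, non-overlapped execution might have paid.

```python
TICKS_PER_CYCLE = 500  # default: 1 ps ticks, 2 GHz clock

# Hypothetical numbers standing in for values from stats.txt:
overall_miss_latency_ticks = 4.0e9  # system.cpu.dcache.overall_miss_latency
num_misses = 80000.0                # total data cache misses
total_cycles = 5.0e6                # cycles the run actually took

miss_latency_cycles = overall_miss_latency_ticks / TICKS_PER_CYCLE
avg_latency_per_miss = miss_latency_cycles / num_misses

# If no miss overlapped with anything, the program might stall for the
# full summed latency; comparing that against the actual runtime bounds
# the benefit of overlap. Note the limitations you should identify in
# your writeup: overlapping misses are double-counted in this statistic,
# and some of each miss's latency is hidden behind useful work anyway.
serialized_estimate = miss_latency_cycles
print('avg latency per miss (cycles):', avg_latency_per_miss)
print('summed miss latency (cycles): ', serialized_estimate)
print('actual run length (cycles):   ', total_cycles)
```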
Part D: Branch Prediction Benefits
Examine the counts of `squashed’ instructions and memory requests ignored due to squashing.
-
Based on the counters, estimate what portion of throughput is lost due to branch misprediction for each benchmark program. Identify any major limitations of your estimate.
-
Based on the rate of branches, estimate the total benefit of correct branch prediction for each benchmark. Identify any major limitations of your estimate.
-
Change the branch predictor to the NeverTakenBP we have provided (by setting
the_cpu.branchPred = NeverTakenBP()
) and run each benchmark again. This branch predictor predicts every conditional branch as not taken, resulting in a very low rate of correct branch prediction. How does the performance difference compare to your estimates?
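For the first estimate, one crude starting point is the fraction of executed microops that were later squashed. This sketch uses invented placeholder numbers; the relevant squash counters in stats.txt have names along the lines of system.cpu.iew.iewExecSquashedInsts, which you should verify in your own output.

```python
# Fraction of executed microops wasted on squashed work -- a crude
# lower bound on throughput lost to misprediction. All numbers here
# are hypothetical placeholders for values from stats.txt.
executed_ops = 2.0e6   # microops executed, including squashed ones
squashed_ops = 3.0e5   # microops executed and then squashed
committed_ops = executed_ops - squashed_ops

wasted_fraction = squashed_ops / executed_ops
print('fraction of executed microops squashed: %.3f' % wasted_fraction)
# Note this ignores fetch/decode bandwidth spent on wrong-path work and
# cycles spent refilling the pipeline after a squash, so it understates
# the true cost -- a limitation worth mentioning in your answer.
```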
Part E: Achievable bandwidth
Examine the instruction mixes for each benchmark program based on committed microoperations
(the system.cpu.commit.op_class_0::
counters).
Note that program instructions may be split into multiple
microoperations, each of which is executed individually on a functional unit.
Examine the available functional units in the gem5 source code in src/cpu/o3/FuncUnitConfig.py.
In FuncUnitConfig.py
, the counts indicate the “width” of each type of functional unit
available. Unless pipelined=False is specified, each functional
unit can accept count new operations per cycle. Unless an opLat
option is specified, each functional unit takes 1 cycle to produce a result after an operation is
dispatched to it.
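The first question below reduces to a small bottleneck computation: for each functional-unit class, the fraction of microops needing that class divided into the class's total width bounds the achievable rate, and the tightest bound wins. A sketch with invented numbers (the op-class names follow gem5's, but the mix and widths here are placeholders, not the real configuration):

```python
# mix: fraction of committed microops per op class (from the
# system.cpu.commit.op_class_0:: counters). fu_width: total microops per
# cycle each class of functional unit can accept. Placeholder values.
mix = {'IntAlu': 0.60, 'MemRead': 0.25, 'MemWrite': 0.10, 'FloatAdd': 0.05}
fu_width = {'IntAlu': 6, 'MemRead': 2, 'MemWrite': 2, 'FloatAdd': 2}

# If R microops dispatch per cycle overall, class c consumes mix[c]*R of
# its width fu_width[c]. The largest sustainable R keeps every class
# within its width, i.e. R <= fu_width[c]/mix[c] for all c.
max_rate = min(fu_width[c] / mix[c] for c in mix)
print('bottleneck-limited microops/cycle: %.2f' % max_rate)
```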
-
Based on the instruction mix and the available functional units, if the other stages were not bottlenecks, what is the maximum average number of microoperations which could be dispatched per cycle? Assume that the mix of instructions is approximately constant throughout each benchmark’s execution.
-
How close is each program to the maximum possible average computation rate it could achieve given its instructions and the available functional units?
- Computation speed is also limited by the widths of the non-execute phases. Examine the counters
and histograms in the stats.txt files for:
- instructions fetched per cycle
- instructions decoded per cycle
- instructions renamed per cycle
- instructions dispatched per cycle
- instructions issued per cycle
- instructions committed per cycle
- instructions squashed per cycle (when undoing a mispredicted branch in the commit phase)
Note that, by default, the maximum number of instructions (usually actually microops) per cycle for each of these phases is 8.
Consider changing all these widths to 4. Which benchmark seems like it should be most affected and why? Which benchmark seems like it should be least affected and why?
-
Now try running each benchmark with all these widths changed to 4. How does performance actually change?
-
A fundamental limit on out-of-order processors is the “dependence limit” — the inability to issue instructions faster than their dependencies can be computed. The processor in question tries to evade the dependence limit somewhat by speculatively executing past unexecuted branches, and speculatively executing memory accesses before it knows if their addresses will conflict, but there is a limit on how far it can go.
Try to run the simulated processor much closer to this dependence limit by increasing all of the following:
- all the widths from Part E.3
- the number of functional units of each type
- the number of reorder buffer entries
- the number of physical registers of every type, and
- the size of the load/store queues.
How much does the instructions per cycle increase by?
Part F: Miscellaneous Resource
Choose two of following parameters:
a. the latency of the available functional units;
b. the size of the reorder buffer;
c. the size of the instruction queue;
d. the sizes of the load and store queues;
e. the number of physical registers (internal register names);
f. the size of the branch predictors
-
For each parameter you choose, look for evidence in benchmark statistics from benchmarks you already ran that changing it is likely to have a small or large effect on performance. What do you find? Do you think it is possible to infer the effects of the parameter from the statistics? Explain why or why not.
-
Try varying the parameter and running the benchmark programs. How much does performance change? Does this match your expectations?
Note on errors
If you get an error about gettid being unimplemented, check the program.err file. Often this means the program failed and tried to execute an error handling routine that needed to know some process information that gem5 does not simulate properly.
Building benchmarks and gem5
You should not need the instructions in this section if you are using our prebuilt archives on a 64-bit Linux system.
Rebuilding the benchmarks
The benchmarks archive includes a Makefile.
If you have GNU make
and gcc and g++ installed, running make clean
, then make
should rebuild all
the benchmark programs for your system.
If you are using one of our supplied VMs, you may need to install these packages. One way
to do this is sudo apt-get install build-essential
.
If you’d like to build the benchmarks with a different compiler and/or different compiler
options, you will need to edit the Makefile
and the Makefile in
breadthFirstSearch/deterministicBFS
and the Makefile in sha
.
Rebuilding gem5
These instructions are for building gem5 on a Unix-like system.
To build gem5 from source, first make sure the following dependencies (listed with Ubuntu/Debian package names) are installed (from the video on this page):
- mercurial
- scons
- swig
- gcc
- g++
- m4
- python-dev (development packages for python 2)
- libgoogle-perftools-dev (development packages for google-perftools)
Then acquire the gem5 source. You can take our prebuilt package and remove
the build
directory. Alternatively, you can check out gem5 from mercurial yourself using
hg clone http://repo.gem5.org/gem5
and then apply our patch using
patch -p 1 < ./gem5.patch
Then use scons build/X86/gem5.opt
to actually build gem5.
Linux quick start
From a default graphical install of Ubuntu, log in and use alt-F2 and type
gnome-terminal
to get a command line window. (Alternately, if using vagrant,
vagrant ssh
will give you a command-line window in the virtual machine.)
Then, here are some useful commands (adapted from our advice in CS 3330):
- pwd: identify the current directory you are in
- ls: list files in the current directory
- mkdir: create a new directory
- man ls: display a manual page for ls. This also works for essentially every other command.
- cd dirname: enter the dirname directory
- cd ..: enter the parent directory of the current directory; for example, if you are in /home/cr4bd/benchmarks, the parent directory is /home/cr4bd. You can also generally use .. to refer to files in the parent directory; for example, ../foo.txt names a file called foo.txt in the parent directory.
- ./foo: runs an executable called foo in the current directory.
- history: show commands you ran recently.
- wget URL: download the file from URL to the current directory.
- tar -zxvf FILE.tar.gz: extract the tar archives we provide.
- gedit &: open a graphical text editor. Other possible graphical editors include geany and kate. The & indicates to run the command in the background.
- nano or pico: open an editor that operates in the terminal (likely necessary if you are SSHing into your VM).
- sudo apt install PACKAGE: install a software package. If you’re using Ubuntu 16.04, you can search possible packages on this page, selecting the xenial distribution. The sudo indicates to run the specified command as root. It will require your password. In the VM image we supply, the password is password.
In the terminal, you can use up-arrow and down-arrow to repeat recent commands. If you press TAB, this will
try to complete your current command, for example cd be
TAB will probably result in cd benchmarks
if a directory
named benchmarks
exists in the current directory. If there were multiple directories starting with be
, then
it would not complete to any name, but pressing TAB twice would give you a list of the possible completions.