This schedule will be updated as the semester progresses; it is likely to be accurate only about a week in advance.
H&P refers to Hennessy and Patterson, Computer Architecture: A Quantitative Approach, 5th edition. Textbook readings are provided primarily for reference; most discussion will be based on the assigned papers and material presented in lecture.
Some papers can only be accessed through the UVa network (or that of another institution that subscribes to the appropriate service). If you are off-Grounds, you can use the UVa proxy or UVa VPN.
For all paper readings not marked with an *, please submit a paper review on Collab by 1 PM on the day of class. See this description of what should be in a paper review.
Other assignments are due at 11:59 PM.
no class
blank
blank
Logistics / Tech Trends (slides 1up/4up)
Progress in Digital Integrated Electronics
Validity of the single processor approach to achieving large scale computing capabilities
nothing assigned
Memory Hierarchy 1 (slides 1up/4up)
Cache memories, 1982
Cache-timing attacks on AES, 2005
Homework 1 (Membench) out
Memory Hierarchy 2 (slides 1up/4up)
Improving Direct-Mapped Cache Performance by the Addition of a Small Fully-Associative Cache and Prefetch Buffers, 1990
A Hardware Evaluation of Cache Partitioning to Improve Utilization and Energy-Efficiency while Preserving Responsiveness, 2013
nothing assigned
Memory Hierarchy 3 (slides 1up/4up)
Anatomy of a High-Performance Matrix Multiplication, 2008
Locality Exists in Graph Processing: Workload Characterization on an Ivy Bridge Server, 2015
The PhiPAC v1.0 Matrix-Multiply Distribution, 1998
nothing assigned
Performance from architecture: comparing a RISC and CISC with similar hardware organization, 1991
The RISC-V Instruction Set Manual, Volume I: User-Level ISA, Chapter 1 only (including commentary)
nothing assigned
Out-of-Order 0: Static Scheduling / Branch Prediction (slides 1up/4up)
A Study of Scalar Compilation Techniques for Pipelined Supercomputers, 1990
Combining Branch Predictors, 1993
Homework 1 Checkpoint DUE
Out-of-Order 1: Multiple Issue (slides 1up/4up)
Very Long Instruction Word Architectures and the ELI-512, 1983
A VLIW Architecture for a Trace Scheduling Compiler, 1987
nothing assigned
Out-of-Order 2: Dynamic Issue I / Precise Interrupts (slides)
Implementation of Precise Interrupts for Pipelined Processors
blank
Out-of-Order 3: Dynamic Issue II (slides)
An Effective Algorithm for Exploiting Multiple Arithmetic Units
Homework 1 DUE Friday
Out-of-Order 4: Dynamic Issue III (slides)
The MIPS R10000 Superscalar microprocessor
Intel’s Haswell CPU Microarchitecture
Homework 1 DUE NOON
Out-of-Order 5: SMT (slides)
Exploiting choice: instruction fetch and issue on an implementable simultaneous multithreading processor
The Tera Computer System
Homework 2 (OOO) out
reading day
blank
blank
Multicore 1: Processor networks (slides audio screencapture)
C.mmp—A multi-mini-processor
Synchronization and Communication in the T3E Multiprocessor
Fat-Trees: Universal Networks for Hardware-Efficient Supercomputing
nothing assigned
Multicore 2: Snooping cache coherence (slides audio screencapture)
Using cache memory to reduce processor-memory traffic
Cache coherence protocols: evaluation using a multiprocessor simulation model
nothing assigned
Multicore 3: Directory-based cache coherence (slides audio screencapture)
The Directory-Based Cache Coherence Protocol for the DASH Multiprocessor
IBM POWER6 Microarchitecture
Homework 2 (OOO) Checkpoint DUE SATURDAY
Multicore 4: Memory models (slides audio screencapture)
Shared Memory Consistency Models: A Tutorial
Foundations of the C++ Concurrency Memory Model, section 1 only
nothing assigned
Multicore 5: Synchronization support (slides audio screencapture)
The Performance of Spin Lock Alternatives for Shared-Memory Multiprocessors
Multicore Locks: The Case Is Not Closed Yet
Everything you always wanted to know about synchronization but were afraid to ask
nothing assigned
Multicore 6: Transactional Memory (slides audio)
Transactional Memory: Architectural Support for Lock-Free Data Structures
Why The Grass May Not Be Greener On The Other Side: A Comparison of Locking vs. Transactional Memory
Intel Disables TSX Instructions: Erratum Found in Haswell, Haswell-E/EP, Broadwell-Y
Homework 2 (OOO) Due TUESDAY
Vector 1: Vector supercomputers / GPUs (slides audio screencapture)
The CRAY-1 Computer System
A User-Programmable Vertex Engine
blank
Vector 2: Vector Programming Interfaces 1 (slides audio)
Guest lecture (Jack Wadden)
blank
Vector 3: Vector Programming Interfaces 2 (slides audio)
Guest lecture (Jack Wadden)
Homework 3 (GPGPU) Out
Homework 2 Post-Mortem / Vector 4: GPGPU Case Studies (slides audio video)
Debunking the 100X GPU vs. CPU myth: an evaluation of throughput computing on CPU and GPU
Exploring the Design Space of SPMD Divergence Management on Data-Parallel Architectures
Benchmarking GPUs to Tune Dense Linear Algebra
blank
Architecture of FPGAs and CPLDs: A Tutorial. Sections 1.3, 2.2.1-7 optional.
A Reconfigurable Fabric for Accelerating Large-Scale Datacenter Services
blank
ASIC accelerators (slides audio)
Minerva: Enabling Low-Power, Highly-Accurate Deep Neural Network Accelerators
The Aladdin Approach to Accelerator Design and Modeling
EIE: Efficient Inference Engine on Compressed Neural Networks
Homework 3 Part 1 Due
Warehouse Scale Computers (slides audio screencapture)
blank
Security (slides audio screencapture)
Building a high-performance, programmable secure coprocessor, 1998, Sections 1-6, 10
Intel SGX Explained (probably too long/detailed to read in full; focus on sections 1, 3, 4, 6)
blank
no class
blank
blank
Exam Review (slides audio screencapture)
TBA
Homework 3 (GPGPU) due TUESDAY
Exam Review (slides audio screencapture)
blank
blank
Exam (in-class)
blank
Date | Topic | Readings | Due |
---|