Unless otherwise noted, all required readings require a paper review.
"H&P" refers to Hennessy and Patterson, "Computer Architecture: A Quantitative
Approach", 5th edition.
Week 1 |
Mon 21 Aug |
(no class)
|
|
Wed 23 Aug |
Overview / Tech Trends
Readings (Optional)
|
HIERARCHY released
|
Week 2 |
Mon 28 Aug |
Memory Hierarchy 1: Cache Design Tradeoffs
Readings (Required)
-
Smith,
"Cache Memories",
1982
Read H&P Appendix B first if you are not familiar with set-associative caches.
Readings (Optional)
|
|
Wed 30 Aug |
Memory Hierarchy 2: Auxiliary Caches / Timing Leaks
Readings (Required)
-
Jouppi,
"Improving Direct-Mapped Cache Performance by the Addition of a Small Fully-Associative Cache and Prefetch Buffers",
1990
H&P briefly covers the two topics in this paper: prefetching and victim caches.
-
Bernstein,
"Cache Timing Attacks on AES",
2005
You are not expected to understand how AES works; focus on the other sections of the paper.
Readings (Optional)
|
|
Week 3 |
Mon 04 Sep |
Memory Hierarchy 3: Access Patterns
Readings (Required)
-
Goto and van de Geijn,
"Anatomy of a High-Performance Matrix Multiplication",
2008
-
Beamer et al,
"Locality Exists in Graph Processing: Workload Characterization on an Ivy Bridge Server",
2015
|
|
Wed 06 Sep |
Pipelining / RISC versus CISC
Readings (Required)
-
H&P C, 3.3 (no paper review)
You may skip parts you are already familiar with from your undergraduate preparation.
-
Waterman et al,
"The RISC-V Instruction Set Manual: Volume I: User-Level ISA: Chapter 1"
(no paper review)
Read chapter 1 (including commentary) ONLY.
Readings (Optional)
|
|
Fri 08 Sep |
—
|
HIERARCHY checkpoint due
|
Week 4 |
Mon 11 Sep |
Out-of-Order 1: Static Scheduling / Multiple Issue
Readings (Required)
-
Fisher,
"Very Long Instruction Word Architectures and the ELI-512",
1983
-
Colwell et al.,
"A VLIW Architecture for a Trace Scheduling Compiler",
1987
Readings (Optional)
|
|
Wed 13 Sep |
Out-of-Order 2: Dynamic Issue I
Readings (Required)
-
H&P 3.4-5, 3.8 (no paper review)
-
Tomasulo,
"An Efficient Algorithm for Exploiting Multiple Arithmetic Units"
Reading at least sections 3.4-5 of the textbook first, or at the same time, is strongly recommended. This paper is difficult to understand, and our textbook covers it in a lot of detail.
|
|
Week 5 |
Mon 18 Sep |
Out-of-Order 3: Precise Interrupts / Re-order Buffers
Readings (Required)
-
Smith and Pleszkun,
"Implementation of Precise Interrupts in Pipelined Processors",
1985
Section 3.6 of the textbook covers the same material (even though it appears to tackle a different problem), and you may find it easier to read first.
-
H&P 3.6 (no paper review)
-
McFarling,
"Combining Branch Predictors",
1993
Readings (Optional)
|
|
Wed 20 Sep |
Out-of-Order 4: Dynamic Issue II
Readings (Required)
-
Yeager,
"The MIPS R10000 Superscalar Microprocessor",
1996
-
Li et al,
"Performance Impacts of Non-blocking Caches in Out-of-order Processors",
2011
Readings (Optional)
|
|
Fri 22 Sep |
—
|
HIERARCHY due
OOO released
|
Week 6 |
Mon 25 Sep |
Out-of-Order 5: SMT
Readings (Required)
-
Alverson et al,
"The Tera Computer System",
1990
-
Tullsen et al,
"Exploiting choice: instruction fetch and issue on an implementable simultaneous multithreading processor",
1996
Readings (Optional)
|
|
Wed 27 Sep |
Multicore 1: Processor Networks
Readings (Required)
Readings (Optional)
|
|
Week 7 |
Mon 02 Oct |
Exam Review
|
|
Wed 04 Oct |
Midterm Exam
|
|
Week 8 |
Mon 09 Oct |
(no class)
|
OOO checkpoint due
|
Wed 11 Oct |
Multicore 2: Snooping cache coherence
Readings (Required)
-
Archibald and Baer,
"Cache coherence protocols: evaluation using a multiprocessor simulation model",
1986
This paper summarizes the technique in Goodman, among others, so you may find it helpful to read first. The strategies compared in this paper are also described in Section 5.2 of our textbook.
-
Goodman,
"Using cache memory to reduce processor-memory traffic",
1983
Readings (Optional)
|
|
Week 9 |
Mon 16 Oct |
Multicore 3: Directory-based cache coherence
Readings (Required)
-
Lenoski et al,
"The Directory-Based Cache Coherence Protocol for the DASH Multiprocessor",
1990
Readings (Optional)
|
|
Wed 18 Oct |
Multicore 4: Memory models
Readings (Required)
-
H&P 5.6 (no paper review)
-
Adve and Gharachorloo,
"Shared Memory Consistency Models: A Tutorial",
1995
-
Boehm and Adve,
"Foundations of the C++ Concurrency Memory Model"
(no paper review)
|
|
Fri 20 Oct |
—
|
OOO due
|
Week 10 |
Mon 23 Oct |
Multicore 5: Synchronization support
Readings (Required)
-
Anderson,
"The Performance of Spin Lock Alternatives for Shared-Memory Multiprocessors",
1990
-
Guiroux and Lachaize,
"Multicore Locks: The Case Is Not Closed Yet",
2016
Readings (Optional)
|
|
Wed 25 Oct |
Multicore 6: Transactional Memory
Readings (Required)
-
Herlihy and Moss,
"Transactional Memory: Architectural Support for Lock-Free Data Structures",
1993
-
McKenney et al,
"Why The Grass May Not Be Greener On The Other Side: A Comparison of Locking vs. Transactional Memory",
2010
Readings (Optional)
|
|
Week 11 |
Mon 30 Oct |
Vector 1: Vector Processors
Readings (Required)
-
Russell,
"The CRAY-1 Computer System",
1978
-
H&P 4.2 (no paper review)
Readings (Optional)
|
|
Wed 01 Nov |
Vector 2: GPUs
Readings (Required)
-
Lindholm et al,
"A User-Programmable Vertex Engine",
2001
A brief introduction:
- The images constructed by this processor are composed notably of polygons,
which are specified with lists of vertices, and textures, which are
matrices usually representing the pixels of an image. The paper mostly talks about a language
for writing programs intended to run on the GPU (a separate chip) that manipulate
a list of vertices (and their attributes) produced by a program running on a normal CPU.
- Swizzle means to rearrange the components of a vector (section 3.3 of the paper),
for example to convert a vector containing
x, y, z, w to one containing
x, x, y, y or containing w, z, y, x.
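As a rough illustration of the swizzle operation described above, here is a minimal Python sketch. The function name `swizzle` and the string-pattern interface are hypothetical, chosen only for this example; the paper's vertex engine expresses swizzles in its own instruction syntax, not in Python.

```python
# Hypothetical helper for illustration only; not the paper's notation.
COMPONENT_INDEX = {"x": 0, "y": 1, "z": 2, "w": 3}

def swizzle(vec, pattern):
    """Return a new tuple built by selecting components of vec by name.

    vec is a 4-component vector ordered (x, y, z, w); pattern is a string
    such as "xxyy" naming which source component fills each output slot.
    """
    return tuple(vec[COMPONENT_INDEX[name]] for name in pattern)

v = (1.0, 2.0, 3.0, 4.0)       # components x, y, z, w
print(swizzle(v, "xxyy"))      # (1.0, 1.0, 2.0, 2.0)
print(swizzle(v, "wzyx"))      # (4.0, 3.0, 2.0, 1.0)
```

Components may be repeated or reordered freely, which is why a single swizzle can both duplicate values (x, x, y, y) and reverse them (w, z, y, x).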
-
Smith,
"AMD Graphics Core Next Preview: AMD's New GPU, Architected For Compute",
2011
Readings (Optional)
|
|
Week 12 |
Mon 06 Nov |
Vector 3: Vector Programming Interfaces
Readings (Required)
-
Nickolls et al,
"Scalable Parallel Programming with CUDA",
2008
-
NVIDIA,
"CUDA C Programming Guide, Sections 1 through 4",
2017
Readings (Optional)
|
GPU released
|
Wed 08 Nov |
Vector 4: Vector Case Studies
Readings (Required)
-
Lee et al,
"Debunking the 100X GPU vs. CPU myth: an evaluation of throughput computing on CPU and GPU"
-
Stephens et al,
"The ARM Scalable Vector Extension",
2017
Readings (Optional)
|
|
Week 13 |
Mon 13 Nov |
FPGAs and custom ASIC design
Readings (Required)
-
Brown and Rose,
"Architecture of FPGAs and CPLDs: A Tutorial"
(no paper review)
You may skip sections 1.3 and 2.2.1-7.
-
Shao et al,
"The Aladdin Approach to Accelerator Design and Modeling",
2015
|
|
Wed 15 Nov |
Accelerators
Readings (Required)
-
Jouppi et al,
"In-Datacenter Performance Analysis of a Tensor Processing Unit",
2017
-
Putnam et al,
"A Reconfigurable Fabric for Accelerating Large-Scale Datacenter Services",
2014
|
|
Fri 17 Nov |
—
|
GPU checkpoint due
|
Week 14 |
Mon 20 Nov |
Warehouse Scale Computers
Readings (Required)
-
Barroso et al,
"The Datacenter as a Computer, chapters 1, 3 and 6"
|
|
Wed 22 Nov |
(no class)
|
|
Week 15 |
Mon 27 Nov |
Security
Readings (Required)
-
H&P B.5 (no paper review)
-
Smith and Weingart,
"Building a high-performance, programmable secure coprocessor",
1998
You may skip sections 7-9.
Readings (Optional)
|
|
Wed 29 Nov |
TBA
|
|
Fri 01 Dec |
—
|
GPU due
|
Week 16 |
Mon 04 Dec |
Exam Review
|
|