CS 6354: Schedule

This schedule is tentative more than one week in advance.

Unless otherwise noted, all required readings require a paper review.

"H&P" refers to Hennessy and Patterson, "Computer Architecture: A Quantitative Approach", 5th edition.

Assignments are due at 11:59PM on the day indicated on the schedule unless otherwise noted. Note that there may be some changes to assignments that are not yet released; we still link to their writeups in the hopes that they will be useful.

Date	Topic	Assignment
Week 1
Mon 21 Aug	(no class)
Wed 23 Aug	Overview / Tech Trends Readings (Optional) H&P 1.4-6, 1.8-10 Moore, "Progress in Digital Integrated Electronics" Amdahl, "Validity of the single processor approach to achieving large scale computing capabilities"	HIERARCHY released
Week 2
Mon 28 Aug	Memory Hierarchy 1: Cache Design Tradeoffs Readings (Required) Smith, "Cache Memories", 1982 Read H&P appendix B first if you are not familiar with set associative caches. Readings (Optional) H&P appendix B
Wed 30 Aug	Memory Hierarchy 2: Auxiliary Caches / Timing Leaks Readings (Required) Jouppi, "Improving Direct-Mapped Cache Performance by the Addition of a Small Fully-Associative Cache and Prefetch Buffers", 1990 H&P briefly covers the two topics in this: prefetching and victim caches Bernstein, "Cache Timing Attacks on AES", 2005 You are not expected to understand how AES works; focus on the other sections of the paper. Readings (Optional) H&P 2.2-4 Cook et al, "A Hardware Evaluation of Cache Partitioning to Improve Utilization and Energy-Efficiency while Preserving Responsiveness", 2013
Week 3
Mon 04 Sep	Memory Hierarchy 3: Access Patterns Readings (Required) Goto and van de Geijn, "Anatomy of a High-Performance Matrix Multiplication", 2008 Beamer et al, "Locality Exists in Graph Processing: Workload Characterization on an Ivy Bridge Server", 2015
Wed 06 Sep	Pipelining / RISC versus CISC Readings (Required) H&P C, 3.3 (no paper review) You may skip parts you are already familiar with from your undergraduate preparation. Waterman et al, "The RISC-V Instruction Set Manual: Volume I: User-Level ISA: Chapter 1" (no paper review) Read chapter 1 (including commentary) ONLY. Readings (Optional) Bhandarkar and Clark, "Performance from architecture: comparing a RISC and CISC with similar hardware organization", 1991
Fri 08 Sep	—	HIERARCHY checkpoint due
Week 4
Mon 11 Sep	Out-of-Order 1: Static Scheduling / Multiple Issue Readings (Required) Fisher, "Very Long Instruction Word Architectures and the ELI-512", 1983 Colwell et al., "A VLIW Architecture for a Trace Scheduling Compiler", 1987 Readings (Optional) H&P 3.7, H.3-4 Weiss and Smith, "A Study of Scalar Compilation Techniques for Pipelined Supercomputers", 1990
Wed 13 Sep	Out-of-Order 2: Dynamic Issue I Readings (Required) H&P 3.4-5, 3.8 (no paper review) Tomasulo, "An Effective Algorithm for Exploiting Multiple Arithmetic Units" Reading at least sections 3.4-5 of the textbook first or at the same time is strongly recommended. This paper is difficult to understand and our textbook goes over it in a lot of detail.
Week 5
Mon 18 Sep	Out-of-Order 3: Precise Interrupts / Re-order Buffers Readings (Required) Smith and Pleszkun, "Implementation of Precise Interrupts for Pipelined Processors", 1985 Section 3.6 of the textbook covers the same material (even though it appears to be tackling a different problem), and you may find it easier to read first. H&P 3.6 (no paper review) McFarling, "Combining Branch Predictors", 1993 Readings (Optional) H&P 3.1-3
Wed 20 Sep	Out-of-Order 4: Dynamic Issue II Readings (Required) Yeager, "The MIPS R10000 Superscalar microprocessor", 1996 Li et al, "Performance Impacts of Non-blocking Caches in Out-of-order Processors", 2011 Readings (Optional) Kanter, "Intel's Haswell CPU Microarchitecture", 2012
Fri 22 Sep	—	HIERARCHY due OOO released
Week 6
Mon 25 Sep	Out-of-Order 5: SMT Readings (Required) Alverson et al, "The Tera Computer System", 1990 Tullsen et al, "Exploiting choice: instruction fetch and issue on an implementable simultaneous multithreading processor", 1996 Readings (Optional) H&P 3.12
Wed 27 Sep	Multicore 1: Processor Networks Readings (Required) Scott, "Synchronization and Communication in the T3E Multiprocessor", 1996 Bell, "Supercomputers: The Amazing Race", 2015 Readings (Optional) H&P 5.1
Week 7
Mon 02 Oct	Exam Review
Wed 04 Oct	Midterm Exam
Week 8
Mon 09 Oct	(no class)	OOO checkpoint due
Wed 11 Oct	Multicore 2: Snooping cache coherence Readings (Required) Archibald and Baer, "Cache coherence protocols: evaluation using a multiprocessor simulation model", 1986 This paper summarizes the technique in Goodman, among others, so you may find it helpful to read first. The strategies compared in this paper are also described in Section 5.2 of our textbook. Goodman, "Using cache memory to reduce processor-memory traffic", 1983 Readings (Optional) H&P 5.2-3
Week 9
Mon 16 Oct	Multicore 3: Directory-based cache coherence Readings (Required) Lenoski et al, "The Directory-Based Cache Coherence Protocol for the DASH Multiprocessor", 1990 Readings (Optional) H&P 5.4 Le et al, "IBM POWER6 Microarchitecture", 2007
Wed 18 Oct	Multicore 4: Memory models Readings (Required) H&P 5.6 (no paper review) Adve and Gharachorloo, "Shared Memory Consistency Models: A Tutorial", 1995 Boehm and Adve, "Foundations of the C++ Concurrency Memory Model" (no paper review) Read section 1 only
Fri 20 Oct	—	OOO due
Week 10
Mon 23 Oct	Multicore 5: Synchronization support Readings (Required) Anderson, "The Performance of Spin Lock Alternatives for Shared-Memory Multiprocessors", 1990 Guiroux and Lachaize, "Multicore Locks: The Case Is Not Closed Yet", 2016 Readings (Optional) H&P 5.5
Wed 25 Oct	Multicore 6: Transactional Memory Readings (Required) Herlihy and Moss, "Transactional Memory: Architectural Support for Lock-Free Data Structures", 1993 McKenney et al, "Why The Grass May Not Be Greener On The Other Side: A Comparison of Locking vs. Transactional Memory", 2010 Readings (Optional) Cutress, "Intel Disables TSX Instructions: Erratum Found in Haswell, Haswell-E/EP, Broadwell-Y"
Week 11
Mon 30 Oct	Vector 1: Vector Processors Readings (Required) Russell, "The CRAY-1 Computer System", 1978 H&P 4.2 (no paper review) Readings (Optional) H&P 4.1
Wed 01 Nov	Vector 2: GPUs Readings (Required) Lindholm et al, "A User-Programmable Vertex Engine", 2001 A brief introduction: The images constructed by this processor are composed notably of polygons, which are specified with lists of vertices, and textures, which are matrices usually representing the pixels of an image. The paper mostly talks about a language for writing programs intended to run on the GPU (a separate chip) that manipulate a list of vertices (and their attributes) produced by a program running on a normal CPU. Swizzle means to rearrange the components of a vector (section 3.3 of the paper), for example to convert a vector containing `x, y, z, w` to one containing `x, x, y, y` or containing `w, z, y, x`. Smith, "AMD Graphics Core Next Preview: AMD's New GPU, Architected For Compute", 2011 Readings (Optional) H&P 4.4
Week 12
Mon 06 Nov	Vector 3: Vector Programming Interfaces Readings (Required) Nickolls et al, "Scalable Parallel Programming with CUDA", 2008 NVIDIA, "CUDA C Programming Guide, Sections 1 through 4", 2017 Readings (Optional) H&P 4.4 Lee et al, "Exploring the Design Space of SPMD Divergence Management on Data-Parallel Architectures"	GPU released
Wed 08 Nov	Vector 4: Vector Case Studies Readings (Required) Lee et al, "Debunking the 100X GPU vs. CPU myth: an evaluation of throughput computing on CPU and GPU" Stephens et al, "The ARM Scalable Vector Extension", 2017 Readings (Optional) H&P 4.3, 4.7-8
Week 13
Mon 13 Nov	FPGAs and custom ASIC design Readings (Required) Brown and Rose, "Architecture of FPGAs and CPLDs: A Tutorial" (no paper review) You may skip sections 1.3 and 2.2.1-7. Shao et al, "The Aladdin Approach to Accelerator Design and Modeling", 2015
Wed 15 Nov	Accelerators Readings (Required) Jouppi et al, "In-Datacenter Performance Analysis of a Tensor Processing Unit", 2017 Putnam et al, "A Reconfigurable Fabric for Accelerating Large-Scale Datacenter Services", 2014
Fri 17 Nov	—	GPU checkpoint due
Week 14
Mon 20 Nov	Warehouse Scale Computers Readings (Required) Barroso et al, "The Datacenter as a Computer, chapters 1, 3 and 6"
Wed 22 Nov	(no class)
Week 15
Mon 27 Nov	Security Readings (Required) H&P B.5 (no paper review) Smith and Weingart, "Building a high-performance, programmable secure coprocessor", 1998 You may skip sections 7-9. Readings (Optional) Costan and Devadas, "Intel SGX Explained", 2016
Wed 29 Nov	TBA
Fri 01 Dec	—	GPU due
Week 16
Mon 04 Dec	Exam Review
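
Note on "swizzle" (Vector 2: GPUs, Lindholm et al): the rearrangement described in that entry can be illustrated with a minimal Python sketch. The function name `swizzle`, the tuple representation, and the string pattern are illustrative assumptions for this note, not the paper's notation or any GPU API.

```python
def swizzle(vec, pattern):
    """Return a new 4-tuple whose components are picked from vec by name.

    vec is assumed to hold (x, y, z, w); pattern is a string such as "xxyy".
    """
    index = {"x": 0, "y": 1, "z": 2, "w": 3}
    return tuple(vec[index[name]] for name in pattern)


v = (1.0, 2.0, 3.0, 4.0)   # components x, y, z, w
print(swizzle(v, "xxyy"))  # duplicates x and y, drops z and w
print(swizzle(v, "wzyx"))  # reverses the component order
```

Each output component simply names which input component it copies, so a swizzle can repeat, drop, or reorder components without doing any arithmetic.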
