This schedule is tentative more than one week in advance.

Unless otherwise noted, all required readings require a paper review.

"H&P" refers to Hennessy and Patterson, "Computer Architecture: A Quantitative Approach", 5th edition.

Assignments are due at 11:59PM on the day indicated on the schedule unless otherwise noted. Note that there may be some changes to assignments that are not yet released; we still link to their writeups in the hopes that they will be useful.

Date | Topic | Assignment
Week 1
Mon 21 Aug
(no class)
Wed 23 Aug

Overview / Tech Trends

Readings (Optional)

  • H&P 1.4-6, 1.8-10
  • Moore, "Progress in Digital Integrated Electronics"
  • Amdahl, "Validity of the single processor approach to achieving large scale computing capabilities"
HIERARCHY released
Week 2
Mon 28 Aug

Memory Hierarchy 1: Cache Design Tradeoffs

Readings (Required)

  • Smith, "Cache Memories", 1982

    Read H&P appendix B first if you are not familiar with set-associative caches.

Readings (Optional)

  • H&P appendix B
Wed 30 Aug

Memory Hierarchy 2: Auxiliary Caches / Timing Leaks

Readings (Required)

  • Jouppi, "Improving Direct-Mapped Cache Performance by the Addition of a Small Fully-Associative Cache and Prefetch Buffers", 1990

    H&P briefly covers the two topics in this paper: prefetching and victim caches.

  • Bernstein, "Cache Timing Attacks on AES", 2005

    You are not expected to understand how AES works; focus on the other sections of the paper.

Readings (Optional)

  • H&P 2.2-4
  • Cook et al, "A Hardware Evaluation of Cache Partitioning to Improve Utilization and Energy-Efficiency while Preserving Responsiveness", 2013
Week 3
Mon 04 Sep

Memory Hierarchy 3: Access Patterns

Readings (Required)

  • Goto and van de Geijn, "Anatomy of High-Performance Matrix Multiplication", 2008
  • Beamer et al, "Locality Exists in Graph Processing: Workload Characterization on an Ivy Bridge Server", 2015
Wed 06 Sep

Pipelining / RISC versus CISC

Readings (Required)

  • H&P C, 3.3 (no paper review)

    You may skip parts you are already familiar with from your undergraduate preparation.

  • Waterman et al, "The RISC-V Instruction Set Manual: Volume I: User-Level ISA: Chapter 1" (no paper review)

    Read chapter 1 (including commentary) ONLY.

Readings (Optional)

  • Bhandarkar and Clark, "Performance from architecture: comparing a RISC and a CISC with similar hardware organization", 1991
Fri 08 Sep
HIERARCHY checkpoint due
Week 4
Mon 11 Sep

Out-of-Order 1: Static Scheduling / Multiple Issue

Readings (Required)

  • Fisher, "Very Long Instruction Word Architectures and the ELI-512", 1983
  • Colwell et al., "A VLIW Architecture for a Trace Scheduling Compiler", 1987

Readings (Optional)

  • H&P 3.7, H.3-4
  • Weiss and Smith, "A Study of Scalar Compilation Techniques for Pipelined Supercomputers", 1990
Wed 13 Sep

Out-of-Order 2: Dynamic Issue I

Readings (Required)

  • H&P 3.4-5, 3.8 (no paper review)
  • Tomasulo, "An Efficient Algorithm for Exploiting Multiple Arithmetic Units"

    Reading at least sections 3.4-5 of the textbook first, or at the same time, is strongly recommended. This paper is difficult to understand, and our textbook covers it in detail.

Week 5
Mon 18 Sep

Out-of-Order 3: Precise Interrupts / Re-order Buffers

Readings (Required)

  • Smith and Pleszkun, "Implementation of Precise Interrupts in Pipelined Processors", 1985

    Chapter 3.6 of the textbook covers the same material (even though it appears to be tackling a different problem), and you may find it easier to read first.

  • H&P 3.6 (no paper review)
  • McFarling, "Combining Branch Predictors", 1993

Readings (Optional)

  • H&P 3.1-3
Wed 20 Sep

Out-of-Order 4: Dynamic Issue II

Readings (Required)

  • Yeager, "The MIPS R10000 Superscalar Microprocessor", 1996
  • Li et al, "Performance Impacts of Non-blocking Caches in Out-of-order Processors", 2011

Readings (Optional)

  • Kanter, "Intel's Haswell CPU Microarchitecture", 2012
Fri 22 Sep
HIERARCHY due
OOO released
Week 6
Mon 25 Sep

Out-of-Order 5: SMT

Readings (Required)

  • Alverson et al, "The Tera Computer System", 1990
  • Tullsen et al, "Exploiting choice: instruction fetch and issue on an implementable simultaneous multithreading processor", 1996

Readings (Optional)

  • H&P 3.12
Wed 27 Sep

Multicore 1: Processor Networks

Readings (Required)

Readings (Optional)

  • H&P 5.1
Week 7
Mon 02 Oct

Exam Review

Wed 04 Oct

Midterm Exam

Week 8
Mon 09 Oct
(no class)
OOO checkpoint due
Wed 11 Oct

Multicore 2: Snooping cache coherence

Readings (Required)

  • Archibald and Baer, "Cache coherence protocols: evaluation using a multiprocessor simulation model", 1986

    This paper summarizes the technique in Goodman, among others, so you may find it helpful to read first. The strategies compared in this paper are also described in Section 5.2 of our textbook.

  • Goodman, "Using cache memory to reduce processor-memory traffic", 1983

Readings (Optional)

  • H&P 5.2-3
Week 9
Mon 16 Oct

Multicore 3: Directory-based cache coherence

Readings (Required)

  • Lenoski et al, "The Directory-Based Cache Coherence Protocol for the DASH Multiprocessor", 1990

Readings (Optional)

  • H&P 5.4
  • Le et al, "IBM POWER6 Microarchitecture", 2007
Wed 18 Oct

Multicore 4: Memory models

Readings (Required)

  • H&P 5.6 (no paper review)
  • Adve and Gharachorloo, "Shared Memory Consistency Models: A Tutorial", 1995
  • Boehm and Adve, "Foundations of the C++ Concurrency Memory Model" (no paper review)

    Read section 1 only

Fri 20 Oct
OOO due
Week 10
Mon 23 Oct

Multicore 5: Synchronization support

Readings (Required)

  • Anderson, "The Performance of Spin Lock Alternatives for Shared-Memory Multiprocessors", 1990
  • Guiroux and Lachaize, "Multicore Locks: The Case Is Not Closed Yet", 2016

Readings (Optional)

  • H&P 5.5
Wed 25 Oct

Multicore 6: Transactional Memory

Readings (Required)

  • Herlihy and Moss, "Transactional Memory: Architectural Support for Lock-Free Data Structures", 1993
  • McKenney et al, "Why The Grass May Not Be Greener On The Other Side: A Comparison of Locking vs. Transactional Memory", 2010

Readings (Optional)

  • Cutress, "Intel Disables TSX Instructions: Erratum Found in Haswell, Haswell-E/EP, Broadwell-Y"
Week 11
Mon 30 Oct

Vector 1: Vector Processors

Readings (Required)

  • Russell, "The CRAY-1 Computer System", 1978
  • H&P 4.2 (no paper review)

Readings (Optional)

  • H&P 4.1
Wed 01 Nov

Vector 2: GPUs

Readings (Required)

  • Lindholm et al, "A User-Programmable Vertex Engine", 2001

    A brief introduction:

    • The images constructed by this processor are composed notably of polygons, which are specified with lists of vertices, and textures, which are matrices usually representing the pixels of an image. The paper mostly describes a language for writing programs that run on the GPU (a separate chip) and manipulate a list of vertices (and their attributes) produced by a program running on a normal CPU.
    • Swizzle means to rearrange the components of a vector (section 3.3 of the paper), for example to convert a vector containing x, y, z, w to one containing x, x, y, y or containing w, z, y, x.
  • Smith, "AMD Graphics Core Next Preview: AMD's New GPU, Architected For Compute", 2011
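
As a rough illustration of the swizzle operation described in the note on Lindholm et al. above, the following Python sketch rearranges the components of a four-element vector. The function name `swizzle` and the string-pattern interface are assumptions made for this example; they are not the syntax used in the paper.

```python
def swizzle(vec, pattern):
    """Return a new tuple whose components are selected from vec by name.

    vec     -- a 4-element sequence holding the x, y, z, w components
    pattern -- a string of component names, e.g. "xxyy" or "wzyx"
    (Both the function and its interface are hypothetical.)
    """
    index = {"x": 0, "y": 1, "z": 2, "w": 3}
    return tuple(vec[index[name]] for name in pattern)


v = (1.0, 2.0, 3.0, 4.0)   # components x, y, z, w
print(swizzle(v, "xxyy"))  # (1.0, 1.0, 2.0, 2.0)
print(swizzle(v, "wzyx"))  # (4.0, 3.0, 2.0, 1.0)
```

The two calls correspond to the two examples in the note: duplicating x and y, and reversing the component order.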

Readings (Optional)

  • H&P 4.4
Week 12
Mon 06 Nov

Vector 3: Vector Programming Interfaces

Readings (Required)

  • Nickolls et al, "Scalable Parallel Programming with CUDA", 2008
  • NVIDIA, "CUDA C Programming Guide, Sections 1 through 4", 2017

Readings (Optional)

  • H&P 4.4
  • Lee et al, "Exploring the Design Space of SPMD Divergence Management on Data-Parallel Architectures"
GPU released
Wed 08 Nov

Vector 4: Vector Case Studies

Readings (Required)

  • Lee et al, "Debunking the 100X GPU vs. CPU myth: an evaluation of throughput computing on CPU and GPU"
  • Stephens et al, "The ARM Scalable Vector Extension", 2017

Readings (Optional)

  • H&P 4.3, 4.7-8
Week 13
Mon 13 Nov

FPGAs and custom ASIC design

Readings (Required)

  • Brown and Rose, "Architecture of FPGAs and CPLDs: A Tutorial" (no paper review)

    You may skip sections 1.3 and 2.2.1-7.

  • Shao et al, "The Aladdin Approach to Accelerator Design and Modeling", 2015
Wed 15 Nov

Accelerators

Readings (Required)

  • Jouppi et al, "In-Datacenter Performance Analysis of a Tensor Processing Unit", 2017
  • Putnam et al, "A Reconfigurable Fabric for Accelerating Large-Scale Datacenter Services", 2014
Fri 17 Nov
GPU checkpoint due
Week 14
Mon 20 Nov

Warehouse Scale Computers

Readings (Required)

  • Barroso et al, "The Datacenter as a Computer, chapters 1, 3 and 6"
Wed 22 Nov
(no class)
Week 15
Mon 27 Nov

Security

Readings (Required)

  • H&P B.5 (no paper review)
  • Smith and Weingart, "Building a high-performance, programmable secure coprocessor", 1998

    You may skip sections 7-9.

Readings (Optional)

  • Costan and Devadas, "Intel SGX Explained", 2016
Wed 29 Nov

TBA

Fri 01 Dec
GPU due
Week 16
Mon 04 Dec

Exam Review