Reading List for CS 6501, Special Topics in Computer Architecture: CPU/GPU Memory Systems and Near-Data Processing

Spring 2025

Kevin Skadron

University of Virginia

 

(2/27) DRAM and Rowhammer:

·       H. Nam, et al. "DRAMScope: Uncovering DRAM Microarchitecture and Characteristics by Issuing Memory Commands." ISCA 2024. https://ieeexplore.ieee.org/document/10609668
(This paper explores many DRAM implementation details, partly through the lens of understanding Rowhammer attacks, which motivates the next paper!)

·       O. Mutlu and J. Kim. "RowHammer: A Retrospective." arXiv, 2019. https://arxiv.org/abs/1904.09724

 

(3/4) GPUs:

·       Lee et al., "Debunking the 100X GPU vs. CPU myth: an evaluation of throughput computing on CPU and GPU", ISCA 2010, https://dl.acm.org/doi/10.1145/1815961.1816021

·       Chatterjee et al. "Architecting an Energy-Efficient DRAM System for GPUs", HPCA 2017, https://www.cs.utexas.edu/~skeckler/pubs/HPCA_2017_Subchannels.pdf

 

(3/6) Some important accelerators to know about – TPU, FPGA…

·       Jouppi et al, "In-Datacenter Performance Analysis of a Tensor Processing Unit," ISCA 2017. https://dl.acm.org/doi/10.1145/3079856.3080246

·       Cong et al, "Understanding Performance Differences of FPGAs and GPUs," FCCM 2018. https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=845763

 

(3/18) More important accelerators to know about – Xeon Phi (Intel’s GPU-like competitor) and Google’s Protobuf accelerator

·       Sodani et al, "Knights Landing: Second-Generation Intel Xeon Phi Product," IEEE Micro 2016, https://ieeexplore.ieee.org/abstract/document/7453080

·       Karandikar et al, "A Hardware Accelerator for Protocol Buffers," MICRO'21, https://dl.acm.org/doi/10.1145/3466752.3480051

 

(3/20) PIM 1, Newton and Aquabolt:

·       M. He et al. "Newton: A DRAM-maker’s Accelerator-in-Memory (AiM) Architecture for Machine Learning," MICRO'20. https://ieeexplore.ieee.org/document/9251855

·       S. Lee et al. "Hardware Architecture and Software Stack for PIM Based on Commercial DRAM Technology : Industrial Product," ISCA'21. https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=9499894

 

(3/25) PIM 2, SIMDRAM and Gearbox/Fulcrum:

·       Hajinazar et al, SIMDRAM, ASPLOS’21 – we will read a slightly extended version on arXiv: https://arxiv.org/pdf/2105.12839.pdf

·       Lenjani et al, Gearbox, ISCA'22 - http://www.cs.virginia.edu/~ml2au/papers/GearboxISCAFinalVersion.pdf

 

(3/27) PIM+ML:

·       Zhou et al. "TransPIM: A Memory-based Acceleration via Software-Hardware Co-Design for Transformer," HPCA'22, https://ieeexplore.ieee.org/document/9773212  

·       He et al. “PAPI: Exploiting Dynamic Parallelism in Large Language Model Decoding with a Processing-In-Memory-Enabled Computing System,” to appear in ASPLOS 2025, https://arxiv.org/pdf/2502.15470

 

(4/1) PIM 3, UPMEM:

·       Gómez-Luna et al. “Benchmarking Memory-Centric Computing Systems: Analysis of Real Processing-in-Memory Hardware”, IGSC Oct. 2021, https://people.inf.ethz.ch/omutlu/pub/Benchmarking-Memory-Centric-Computing-Systems_cut21.pdf

·       Ortega et al. “PIM-AI: A Novel Architecture for High-Efficiency LLM Inference,” https://arxiv.org/abs/2411.17309

 

(4/3) Stream processing:

·       Kapasi et al, "The Imagine Stream Processor", ICCD 2002. https://ieeexplore.ieee.org/abstract/document/1106783

·       Dally et al. "Merrimac: Supercomputing with Streams." SC 2003. https://dl.acm.org/doi/abs/10.1145/1048935.105018

 

(4/8) Non-volatile memory technologies:

·       TBD

 

(4/10) Non-volatile memory technologies, cont.:

·       TBD

 

(4/15) CGRAs:

·       Ansaloni et al, "EGRA: A Coarse Grained Reconfigurable Architectural Template," TVLSI 2011 - https://uweb.engr.arizona.edu/~ece506/readings/project-reading/3-coarse-grain/EGRA.pdf

·       Prabhakar et al, "Plasticine: A Reconfigurable Architecture For Parallel Patterns," ISCA 2017 - https://stanford-ppl.github.io/website/papers/isca17-raghu-plasticine.pdf

 

(4/17) SmartNICs:

·       Lin et al, "PANIC: A High-Performance Programmable NIC for Multi-tenant Networks," OSDI'20, https://wisr.cs.wisc.edu/papers/osdi20-panic.pdf

·       Shashidhara et al, "FlexTOE: Flexible TCP Offload with Fine-Grained Parallelism," NSDI 2022, https://www.usenix.org/system/files/nsdi22-paper-shashidhara.pdf

 

(4/22) Graph Analytics:

·       Mukkara et al, "Exploiting Locality in Graph Analytics through Hardware-Accelerated Traversal Scheduling," MICRO 2018, https://www.cs.cmu.edu/~beckmann/publications/papers/2018.micro.hats.pdf

·       Zhang et al, "GraphP: Reducing Communication for PIM-based Graph Processing with Efficient Data Partition," HPCA 2018, https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=8327036

 

(4/24) Computational Storage:

·       Paper 1

·       Paper 2

(4/27) Project presentations 2