Reading List for CS 6501, Special Topics in Computer Architecture: CPU/GPU Memory Systems and Near-Data Processing
Spring 2025
Kevin Skadron
University of Virginia
(2/27)
DRAM and Rowhammer:
· H. Nam et al. "DRAMScope: Uncovering DRAM Microarchitecture and Characteristics by Issuing Memory Commands." ISCA 2024. https://ieeexplore.ieee.org/document/10609668
(This paper explores many DRAM implementation details, partly through the lens of understanding Rowhammer attacks, which makes the next paper necessary reading!)
· O. Mutlu and J. Kim. "RowHammer: A Retrospective." arXiv, 2019. https://arxiv.org/abs/1904.09724
(3/4)
GPUs:
· Lee et al., "Debunking the 100X GPU vs. CPU myth: an evaluation of throughput computing on CPU and GPU", ISCA 2010, https://dl.acm.org/doi/10.1145/1815961.1816021
· Chatterjee et al., "Architecting an Energy-Efficient DRAM System for GPUs", HPCA 2017, https://www.cs.utexas.edu/~skeckler/pubs/HPCA_2017_Subchannels.pdf
(3/6)
Some important accelerators to know about – TPU, FPGA…
· Jouppi et al., "In-Datacenter Performance Analysis of a Tensor Processing Unit," ISCA 2017. https://dl.acm.org/doi/10.1145/3079856.3080246
· Cong et al., "Understanding Performance Differences of FPGAs and GPUs," FCCM 2018. https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=845763
(3/18)
More important accelerators to know about – Xeon Phi (Intel’s GPU-like competitor) and Google’s Protobuf accelerator
· Sodani et al., "Knights Landing: Second-Generation Intel Xeon Phi Product," IEEE Micro 2016, https://ieeexplore.ieee.org/abstract/document/7453080
· Karandikar et al., "A Hardware Accelerator for Protocol Buffers," MICRO'21, https://dl.acm.org/doi/10.1145/3466752.3480051
(3/20)
PIM 1, Newton and Aquabolt:
· M. He et al., "Newton: A DRAM-maker’s Accelerator-in-Memory (AiM) Architecture for Machine Learning," MICRO'20. https://ieeexplore.ieee.org/document/9251855
· S. Lee et al., "Hardware Architecture and Software Stack for PIM Based on Commercial DRAM Technology: Industrial Product," ISCA'21. https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=9499894
(3/25)
PIM 2, SIMDRAM and Gearbox/Fulcrum:
· Hajinazar et al., SIMDRAM, ASPLOS’21 – we will read a slightly extended version on arXiv: https://arxiv.org/pdf/2105.12839.pdf
· Lenjani et al., Gearbox, ISCA'22 - http://www.cs.virginia.edu/~ml2au/papers/GearboxISCAFinalVersion.pdf
(3/27)
PIM+ML:
· Zhou et al., "TransPIM: A Memory-based Acceleration via Software-Hardware Co-Design for Transformer," HPCA'22, https://ieeexplore.ieee.org/document/9773212
· He et al., “PAPI: Exploiting Dynamic Parallelism in Large Language Model Decoding with a Processing-In-Memory-Enabled Computing System,” to appear in ASPLOS 2025, https://arxiv.org/pdf/2502.15470
(4/1)
PIM 3, UPMEM:
· Gómez-Luna et al., “Benchmarking Memory-Centric Computing Systems: Analysis of Real Processing-in-Memory Hardware,” IGSC, Oct. 2021, https://people.inf.ethz.ch/omutlu/pub/Benchmarking-Memory-Centric-Computing-Systems_cut21.pdf
· Ortega et al., “PIM-AI: A Novel Architecture for High-Efficiency LLM Inference,” https://arxiv.org/abs/2411.17309
(4/3)
Stream processing:
· Kapasi et al., "The Imagine Stream Processor," ICCD 2002. https://ieeexplore.ieee.org/abstract/document/1106783
· Dally et al., "Merrimac: Supercomputing with Streams." SC 2003. https://dl.acm.org/doi/abs/10.1145/1048935.105018
(4/8)
Non-volatile memory technologies:
· TBD
(4/10)
Non-volatile memory technologies, cont.:
· TBD
(4/15)
CGRAs:
· Ansaloni et al., "EGRA: A Coarse Grained Reconfigurable Architectural Template," TVLSI 2011 - https://uweb.engr.arizona.edu/~ece506/readings/project-reading/3-coarse-grain/EGRA.pdf
· Prabhakar et al., "Plasticine: A Reconfigurable Architecture For Parallel Patterns," ISCA 2017 - https://stanford-ppl.github.io/website/papers/isca17-raghu-plasticine.pdf
(4/17)
SmartNICs:
· Lin et al., "PANIC: A High-Performance Programmable NIC for Multi-tenant Networks," OSDI'20, https://wisr.cs.wisc.edu/papers/osdi20-panic.pdf
· Shashidhara et al., "FlexTOE: Flexible TCP Offload with Fine-Grained Parallelism," NSDI 2022, https://www.usenix.org/system/files/nsdi22-paper-shashidhara.pdf
(4/22)
Graph Analytics:
· Mukkara et al., "Exploiting Locality in Graph Analytics through Hardware-Accelerated Traversal Scheduling," MICRO 2018, https://www.cs.cmu.edu/~beckmann/publications/papers/2018.micro.hats.pdf
· Zhang et al., "GraphP: Reducing Communication for PIM-based Graph Processing with Efficient Data Partition," HPCA 2018, https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=8327036
(4/24)
Computational Storage:
· Paper 1
· Paper 2
(4/27)
Project presentations 2