Research
My research currently focuses on specialized and
heterogeneous architectures and on how to design architectures in the
presence of severe physical constraints, especially thermal limits, power
delivery, process variations, and wear-out. We are chiefly focusing
on these issues in the context of asymmetric and heterogeneous
designs, which provide the best balance between high single-thread
performance, high throughput for parallel tasks, and high
performance on critical functions. To address these challenges, we
are taking a variety of approaches.
Currently, our focus is on these areas:
- New processing-in-memory architectures, including Fulcrum in
HPCA'20, Gearbox in ISCA'22, Sieve in ISCA'21, DRAM-CAM in CAL'22, DRAM-AP (digital bit-serial PIM) in IISWC'24, and our Sunder in-memory automata architecture in MICRO'21; and our PIMeval simulation framework, PIMbench benchmark suite, and PIM programming API in IISWC'24
- New capabilities and programming models for regular-expression
and automata processing, including runtime monitoring using temporal logic, as part of the UVA
Center for Automata Processing (CAP). We have
developed novel architectures (e.g., Sunder, in MICRO'21, and Impala,
in HPCA'20), FPGA implementations (e.g., Grapefruit, in FCCM'20),
new applications (e.g., frequent-itemset mining in IPDPS'15
and sequential pattern mining in CF'16),
a new benchmark suite (AutomataZoo),
new modeling tools (e.g., MNCaRT),
a new programming language for inexact pattern
matching (RAPID, in ASPLOS'16),
a new automata-based method for string kernels on FPGAs (IEEE Micro, Sep.-Oct. 2022),
acceleration for LTLf specifications (FMCAD'20),
and new debugging support (in ASPLOS'19).
- New heterogeneous architectures, including accelerators and
reconfigurable units (e.g., our work on crypto
processors), and new design-space exploration tools such
as Lumos to
help understand the right mixture of heterogeneous units
- Developing new programming abstractions to simplify
programming for heterogeneous systems, such as
our ICS'09/IJPP'11,
JPDC'13-Trellis,
and ASPLOS'16-RAPID
papers, and our PIM API in IISWC'24
We also continue to maintain several research tools we developed,
including:
In prior work, my group has:
- Explored scaling implications for power delivery (e.g., our
ISCA'14 paper) and fault tolerance (e.g., our IEEE
Micro'13 paper), and explored new runtime reliability management
techniques to balance performance and wear-out (e.g., our IEEE
Micro'05 paper), cope with transient faults (e.g., our
GH'06
and GH'07
papers for GPUs), and take advantage of coarse-grained
reconfigurable resources (e.g., our DATE'11
and CASES'11
papers). We also evaluated technology scaling limits and
implications (e.g., 2007 presentation
to the NRC CSTB study on "Sustaining
Growth in Computing Performance," our
paper in IEEE Micro on scaling with design constraints (preprint
pdf), and our "Implications of Dim Silicon" paper (preprint
pdf)).
- Developed new power delivery modeling capabilities (e.g., see
our VoltSpot
ISCA'14 paper) and explored new optimization techniques
(e.g., our ASP-DAC'14
and DAC'14
papers)
- Developed new, efficient cryptography accelerators (e.g., in the VLSI Journal)
- Developed new design-space exploration capabilities that
reduce simulation requirements, such as genetically programmed
response surfaces (e.g. our DAC'08
paper and software)
and Lumos.
- Developed new cache organizations for many cores (e.g., our SC'10
and ICCD'09
papers), cache-conscious thread scheduling (e.g., our IPDPS'10
paper), and cache-conscious data layout for heterogeneous
systems (SC'11)
- Explored how to most effectively use texture, constant,
per-block shared memory, and other features that GPUs and GPU
languages such as CUDA provide (e.g., see our ACM
Queue'08, JPDC'08,
IPDPS'09,
and ICS'09/IJPP'11
papers)
- Developed new techniques to make SIMD architectures more
effective in the presence of irregular data structures or
irregular parallelism (see our IPDPS'12,
ISCA'10
and SC'09
papers)
- Developed the first publicly available architectural simulator
for GPUs, Qsilver
(see our GH'04
paper)
- Developed improved power map derivation using thermal maps
(e.g., our ICCD'10
paper)
- Developed a new, pre-RTL floorplanning algorithm (see our ArchFP
software)
- Developed new temperature sensing and thermal-management
capabilities (e.g. our ITEHRM'06
and follow-on IEEE
Trans. Computers papers; see also our ACM
Computing Surveys paper on dynamic thermal management and
"prolegomenon" in IEEE Micro), as well as novel
temperature-aware design techniques (e.g. our HPCA'06,
DAC'08,
and SEMI-THERM'10
papers)
- Described new techniques for coherence, including bypassing
coherence for private data and a simplified form of "sharing
tracker" coherence for avoiding refetch of shared,
read-only data
- Proposed dynamic combination or "federation" of scalar cores
to support runtime variations in ILP and DLP (e.g. our DAC'08
and follow-on ACM
TACO papers)
- Developed new, lightweight out-of-order execution techniques with much
better performance/mm² and performance/watt (see our
ACM
TACO paper; lightweight OO was an enabling technique for
federation)
- Described new power management techniques, especially in the
context of real-time constraints, spanning a variety of
application types from multimedia (e.g. our Asilomar'06
paper) to multi-tier e-commerce workloads (e.g. our PACT'08
paper)
- Described a
new form of hybrid neural branch predictor
- Described power-management
techniques for branch prediction that do not impede
prediction accuracy or performance and shown the importance of
branch prediction for energy efficiency
- Evaluated the optimal energy-efficient scaling of microarchitectural
structure sizes for simultaneous multithreading (SMT)
- Evaluated
whether trace caches are energy efficient
- Explored how current power
and thermal management features may expose security
vulnerabilities
- Developed new reliability modeling capabilities (e.g.
our IEEE
TVLSI'07 paper)
- Shown the value of control theory in managing adaptive
hardware structures, including controlling
the DVS setting for a multimedia workload; setting
the decay interval for leakage-power management using cache
decay; and thermal
regulation
- Developed new techniques for fast
and provably accurate warm-up when moving between many smaller
samples in cycle-accurate simulations
These research projects have stimulated several innovations in
our computer architecture courses, including the development of a
Microprocessor
Survey Course (also described in a paper
at SIGCSE), and we were one of the first departments to use
CUDA to
teach both concurrency and parallel architecture.
This work is currently supported by the National
Science Foundation under grant nos. PPoSS-2217071 and CCF-2312740, by grants from the Laboratory for Physical Sciences, by grants from the SRC Global Research Collaboration CADT program, and by PRISM, one of seven centers in JUMP 2.0, a Semiconductor
Research Corporation program sponsored by MARCO and DARPA.
Prior support has come from the National
Science Foundation under grant nos. ITR-0082671, CCR-0133634
(CAREER), CCR-0105626, EIA-0224434, DOS-0306404, CCF-0429765,
CNS-0509245, CNS-0551630 (CRI), IIS-0612049, CNS-0615277,
CCF-0903471 (MCDA), CNS-0916908 (ARRA), CCF-1451571, EF-1124931,
CCF-1116673, CCF-1629450 (XPS) and CCF-1619127; a grant from the joint NSF-Intel FoMR program, CCF-1912608; CRISP, one of six centers in JUMP 1.0, and C-FAR, one of six centers of STARnet, both Semiconductor Research Corporation programs sponsored by MARCO and
DARPA; DARPA MTO (PERFECT program) under contract
HR0011-13-C-0022; DARPA IIO (VMR program) under contract
FA8750-12-C-0181; the Army
Research Office under grant no. W911NF-04-1-0288; grants from Brookhaven National Labs; the Semiconductor
Research Corporation under task nos. 1607, 1972 and 2042; the
Virginia CIT CRCF program under grants MF14S-021-IT, MF16-032-IT
and MF17-027-IT; grants from AMD Research, Intel Research, NVIDIA
Research, NEC Labs, IBM
Research, and Xilinx; and an Excellence Award from the
University of Virginia Fund for Excellence in Science and
Technology (FEST). Additional support has been provided by
William A. Ballard Fellowships for John W. Haskins and David
Tarjan, a University of Virginia Award for Excellence in
Scholarship in the Sciences & Engineering for David Tarjan and
Jiayuan Meng, an ATI graduate fellowship for Jeremy Sheaffer, an
NVIDIA Ph.D. fellowship for Jiayuan Meng, a GRC/AMD Ph.D.
fellowship for Michael Boyer, and an ARCS Scholarship for Jack
Wadden. We have also received equipment donations and extended
loans from Micron, Xilinx,
NVIDIA, AMD, and Hewlett
Packard. The U.Va.
Center for Automata Processing (CAP) was supported in part
by Micron.
Please note that any opinions, findings, and conclusions or
recommendations expressed in this material are those of the
author(s) and do not necessarily reflect the views of the funding
agencies.
Current Graduate Students and Research Staff:
- Morteza Baradaran
- Zhenxing Fan
- Beenish Gul
- Mohammadhosein Gholamrezaei
- Deyuan Guo
- Khyati Kiyawat
- Abdullah Mughrabi (Postdoctoral research associate)
- Yasas Seneviratne, co-advised with Samira Khan
- Akhil Shekar
- Farzana Siddique
- Tom Tracy (Research Scientist)
Undergraduate and Master's Researchers:
- Hugo Abbot
- William Bradford
- Raul Cancho
- Kyle Durrer
- Ethan Ermovick
- Reid Farmer
- Han Yan
- Ruiyang Zhou
Recently Graduated PhD Students:
- Alif Ahmed, defended Oct. 2023, official diploma conferral Dec. 2023 - now with IBM Systems
- Oluwole Jaiyeoba, defended Aug. 2023, official diploma conferral Dec. 2023 - now with AMD Research
- Lingxi Wu, defended Sep. 2023, official diploma conferral Dec. 2023 - now with Black Sesame
- Marzieh Lenjani, May 2022 - now with Apple
- Elaheh Sadredini, May 2019 - now faculty at UC Riverside
- Complete list of LAVA alumni - postdocs, graduate students, undergraduate students
Selected Publications
Please note that papers linked here represent author
preprints. The official, published version must be
obtained from the publisher's website or the published print
copy. This material is presented here to ensure timely
dissemination of scholarly and technical work. Copyright and
all rights therein are retained by authors or by other
copyright holders. All persons copying this information
are expected to adhere to the terms and constraints invoked by
each document's copyright terms. In most cases, these works may
not be reposted without the explicit permission of the copyright
holder. Permission is given to make digital or hard copies of
all or part of this material without fee for personal or
classroom use, provided that the copies are not made or
distributed for profit or commercial advantage, and that copies
bear the appropriate copyright notice and the full bibliographic
citation on the first page. Copyrights for components of
this work owned by others must also be honored. To copy
otherwise, to republish, to post on servers, to redistribute to
lists, etc. requires specific permission and/or a fee. In
particular, permission to reprint/republish this material for
advertising or promotional purposes or for creating new
collective works for resale or redistribution to servers or
lists, or to reuse any copyrighted component of this work in
other works, must be obtained from the copyright owner.
Please note further that any opinions, findings,
conclusions, or recommendations expressed in this material are
those of the authors and do not necessarily reflect the views
of the sponsoring agencies, employers, or publishers.
A complete list of my publications can be found on my Google
Scholar Page (which will also provide
paper links) or DBLP
page.
Recent Highlights
(graph processing, multi-FPGA)
O. Jaiyeoba, A. Mughrabi, M. Baradaran, B. Gul, and K. Skadron. "Swift: A Multi-FPGA Framework for Scaling Up Accelerated Graph Analytics." In Proceedings of the IEEE International Conference on Field Programmable Technology (FPT), Dec. 2024, to appear. (pdf)
(processing in memory/PIM, simulation, benchmarks)
F. A. Siddique, D. Guo, Z. Fan, M. Gholamrezaei, M. Baradaran, A. Ahmed, H. Abbot, K. Durrer, K. Nandagopal, E. Ermovick, K. Kiyawat, B. Gul, A. Mughrabi, A. Venkat, and K. Skadron. "Architectural Modeling and Benchmarking for Digital DRAM PIM." In Proceedings of the IEEE International Symposium on Workload Characterization (IISWC), Sep. 2024. (pdf)
(dynamic graph processing)
A. Ahmed, F. Siddique, and K. Skadron. "GraphTango: A Hybrid Representation Format for Efficient Streaming Graph Updates and Analysis." International Journal of Parallel Programming, Springer, to appear.
(dynamic graph processing, FPGA)
O. Jaiyeoba and K. Skadron. "Dynamic-ACTS - A Dynamic Graph Analytics Accelerator For HBM-Enabled FPGAs." ACM Transactions on Reconfigurable Technology and Systems, 17(3), 29 pp., Sept. 2024.
(processing in storage/computational storage, accelerators, bioinformatics)
L. Wu, M. Zhou, W. Xu, A. Venkat, T. Rosing, and K. Skadron. "Abakus: Accelerating k-mer Counting With Storage Technology." ACM Transactions on Architecture and Code Optimization (TACO), 21(1), 26 pp., Jan. 2024.
(graph processing, FPGA)
O. Jaiyeoba and K. Skadron. "ACTS: A Near-Memory FPGA Graph Processing Framework." Proceedings of the IEEE International Symposium on Field-Programmable Gate Arrays (FPGA), Feb. 2023.
(processing in memory/PIM, sorting)
M. Lenjani, A. Ahmed, and K. Skadron. "Pulley: An Algorithm/Hardware Co-optimization for In-memory Sorting." IEEE Computer Architecture Letters, 21(2):109-112, July-Dec. 2022.
-
(processing in memory/PIM)
L. Wu, R. Sharifi, A. Venkat, and K. Skadron. "DRAM-CAM: General-Purpose Bit-Serial Exact Pattern Matching." IEEE Computer Architecture Letters, 21(2):89-92, July-Dec. 2022.
(automata processing, FPGAs, heterogeneous architectures, accelerators)
K. Angstadt, T. Tracy, J.-B. Jeannin, K. Skadron, and W. Weimer. "Synthesizing Legacy String Code for FPGAs Using Bounded Automata Learning." IEEE Micro special issue on Compiling for Accelerators, 42(5):70-77, Sep.-Oct. 2022.
(microarchitecture, microcode, runtime optimization)
L. Moody, W. Qi, A. Sharifi, L. Berry, J. Rudek, J. Gaur, J. Parkhurst, S. Subramoney, K. Skadron, A. Venkat. "Speculative Code Compaction: Eliminating Dead Code via Speculative Microcode Transformations." In Proceedings of the ACM/IEEE International Symposium on Microarchitecture (MICRO), Oct. 2022.
(processing in memory/PIM)
M. Lenjani and K. Skadron. "Supporting Moderate Data Dependency, Position Dependency, and Divergence in PIM-based Accelerators." IEEE Micro special issue on Processing in Memory, published online Dec. 2021, in print Jan/Feb. 2022, 42(1):108-115. DOI 10.1109/MM.2021.3136189
-
(processing in memory/PIM) M. Lenjani, A. Ahmed, M. R. Stan, and K. Skadron. "Gearbox: A Case for Supporting Accumulation Dispatching and Hybrid Partitioning in PIM-based Accelerators." In Proceedings of the IEEE/ACM International Symposium on Computer Architecture (ISCA), June
2022.
-
(processing in memory/PIM, FPGA) S. Mosanu, M. N. Sakib, T. Tracy II, E. Cukurtas, A. Ahmed, P. Ivanov, S. Khan, K. Skadron, M. Stan. "PiMulator: A Fast and Flexible Processing-in-Memory Emulation Platform." In Proceedings of the ACM/IEEE/EDAA/EDAC Conference on Design, Automation and Test in Europe (DATE), Mar.
2022.
-
(automata processing,
heterogeneous architectures, accelerators) E.
Sadredini, R. Rahimi, M. Imani, and K. Skadron. “Sunder:
Enabling Low-Overhead and Scalable Near Data Pattern
Matching Acceleration.” In Proceedings of the IEEE/ACM
International Symposium on Microarchitecture (MICRO), Oct.
2021.
-
(processing in memory/PIM,
bioinformatics) M. Zhou, L. Wu, M. Li, N.
Moshiri, K. Skadron, and T. Rosing. “Ultra Efficient
Acceleration for De Novo Genome Assembly via Near-Memory
Computing.” In Proceedings of the ACM/IEEE/IFIP International
Conference on Parallel Architectures and Compiler Techniques
(PACT), Sept. 2021
(thermal modeling)
J. Han, R. E. West, K. Skadron, and M. R. Stan. "Thermal Simulation of Processing-in-Memory Devices Using HotSpot 7.0." In Proceedings of the 27th International Workshop on Thermal Investigations of ICs (THERMINIC), Sept. 2021.
-
(processing in memory/PIM,
bioinformatics) L. Wu, R. Sharifi, M.
Lenjani, K. Skadron, and A. Venkat. “Sieve: Scalable In-situ
DRAM-based Accelerator Designs for Massively Parallel k-mer
Matching.” In Proceedings of the ACM/IEEE International
Symposium on Computer Architecture (ISCA), June 2021.
-
(automata processing,
heterogeneous architectures, accelerators, FPGAs, linear
temporal logic, verification) T. Tracy II,
L. Tabajara, M. Vardi, and K. Skadron. “Runtime Verification
on FPGAs with LTLf Specifications.” In Proceedings of the
Formal Methods in Computer-Aided Design (FMCAD), Sept. 2020. (pdf)
-
(automata processing,
FPGAs, heterogeneous architectures, accelerators) R.
Rahimi, E. Sadredini, M. Stan, and K. Skadron. “Grapefruit: An
Open-Source, Full-Stack, and Customizable Automata Processing
on FPGAs.” In Proceedings of the IEEE International Symposium
on Field Customizable Computing Machines (FCCM), May 2020.
Nominated for best paper.
-
(automata processing,
heterogeneous architectures, accelerators) E.
Sadredini, R. Rahimi, M. Lenjani, M. Stan, and K. Skadron.
“FlexAmata: A Universal and Efficient Adaption of Applications
to Spatial Automata Processing Accelerators.” In Proceedings
of the ACM International Symposium on Architectural Support
for Programming Languages and Operating Systems (ASPLOS), Mar.
2020.
-
(automata processing,
heterogeneous architectures, accelerators) R.
Rahimi, E. Sadredini, M. Lenjani, M. Stan, and K. Skadron.
“Impala: Algorithm/Architecture Co-Design for In-Memory
Multi-Stride Pattern Matching.” In Proceedings of the IEEE
International Symposium on High Performance Computer
Architecture (HPCA), Feb. 2020. Nominated for best paper.
-
(processing in memory/PIM) M.
Lenjani, P. Gonzalez, E. Sadredini, S. Li, Y. Xie, A. Akel, S.
Eilert, M. R. Stan, and K. Skadron. “Fulcrum: a Simplified
Control and Access Mechanism toward Flexible and Practical
In-Situ Accelerators.” In Proceedings of the IEEE
International Symposium on High Performance Computer
Architecture (HPCA), Feb. 2020.
- (automata processing,
heterogeneous architectures, accelerators, on-chip
interconnect) E. Sadredini, R. Rahimi, V.
Verma, M. Stan, and K. Skadron. “eAP: A Scalable and Efficient
In-Memory Accelerator for Automata Processing.” In Proceedings
of the ACM/IEEE International Symposium on Microarchitecture
(MICRO), Oct. 2019.
-
(memory performance
benchmarking) A. Ahmed, K. Skadron.
“Hopscotch: A Micro-benchmark Suite for Memory
Performance Evaluation.” In Proceedings of the International
Symposium on Memory Systems (MEMSYS), Sep.-Oct. 2019.
-
(dynamic graph processing) O.
Jaiyeoba and K. Skadron. “GraphTinker: A High Performance Data
Structure for Dynamic Graph Processing.” In Proceedings
of the IEEE International Parallel and Distributed
Processing Symposium (IPDPS), May 2019.
- (automata processing,
FPGAs, heterogeneous architectures, accelerators) C.
Bo, V. Dang, T. Xie, J. Wadden, M. Stan, and K. Skadron.
"Automata Processing in Reconfigurable Architectures:
In-the-cloud Deployment, Cross-platform Evaluation, and Fast
Symbol-only Reconfiguration." ACM Transactions on
Reconfigurable Technology and Systems (TRETS), 12(2), May
2019, DOI 10.1145/3314576.
-
(automata processing,
heterogeneous architectures, accelerators, debugging) M.
Casias, K. Angstadt, T. Tracy II, K. Skadron, and W.
Weimer. “Debugging Support for Pattern-Matching
Languages and Accelerators.” In Proceedings of the
ACM International Conference on Architectural Support for
Programming Languages and Operating Systems (ASPLOS),
Apr. 2019.
-
(automata processing,
heterogeneous architectures, accelerators) K.
Angstadt, A. Subramaniyan, E. Sadredini, R. Rahimi, W. Weimer,
K. Skadron, and R. Das. “ASPEN: A Scalable In-SRAM
Architecture for Pushdown Automata.” In Proceedings
of the ACM/IEEE International Symposium on Microarchitecture
(MICRO), Oct. 2018.
-
(automata processing,
heterogeneous architectures, accelerators, benchmarks) J.
Wadden, T. Tracy II, E. Sadredini, L. Wu, C. Bo, J. Du,** Y.
Zhou, M. Wallace,* J. Udall,* M. Stan, and K. Skadron.
“AutomataZoo: A Modern Automata Processing Benchmark Suite.”
In Proceedings of the IEEE International Symposium on
Workload Characterization (IISWC), Oct. 2018.
-
(automata processing,
heterogeneous architectures, accelerators, natural
language processing) E. Sadredini, D. Guo,
C. Bo, R. Rahimi, K. Skadron, and H. Wang. “A Scalable
Solution for Rule-Based Part-of-Speech Tagging on Novel
Hardware Accelerators.” In Proceedings of the ACM
SIGKDD Conference on Knowledge Discovery and Data Mining
(KDD), Applied Data Science track, full paper with
poster presentation, Aug. 2018.
-
(automata processing,
heterogeneous architectures, accelerators, simulation) K.
Angstadt, J. Wadden, V. Dang, T. Xie, D. Kramp, W. Weimer, M.
Stan, and K. Skadron. “MNCaRT: An Open-Source,
Multi-Architecture Automata-Processing Research and Execution
Ecosystem.” IEEE Computer Architecture Letters,
published online Dec. 2017, published in print 17(1):84-87,
Jan.-June 2018. DOI 10.1109/LCA.2017.2780105.
-
(automata processing,
heterogeneous architectures, accelerators) J.
Wadden, K. Angstadt, and K. Skadron. “Characterizing and
Mitigating Output Reporting Bottlenecks in
Spatial-Reconfigurable Automata Processing Architectures.” In
Proceedings of the IEEE International Symposium on High
Performance Computer Architecture (HPCA), Feb. 2018.
-
(automata processing,
heterogeneous architectures, accelerators, bioinformatics)
C. Bo, V. Dang, E. Sadredini, K.
Skadron. “Searching for Potential gRNA Off-Target Sites
for CRISPR/Cas9 using Automata Processing across Different
Platforms.” In Proceedings of the IEEE International Symposium
on High Performance Computer Architecture (HPCA), Feb. 2018.
-
(automata processing,
FPGAs, heterogeneous architectures, accelerators) T.
Xie, V. Dang, J. Wadden, K. Skadron, and M. R. Stan. “REAPR:
Reconfigurable Engine for Automata Processing.” In Proceedings
of the International Conference on Field-Programmable Logic
and Applications (FPL), Sept. 2017. (pdf)
-
(automata processing,
heterogeneous architectures, accelerators, frequent tree
mining, pattern mining) E. Sadredini, K.
Wang, and K. Skadron. “Frequent Subtree Mining on the Automata
Processor: Challenges and Opportunities.” In Proceedings
of the ACM International Conference on Supercomputing (ICS),
June 2017. (pdf)
-
(automata processing,
heterogeneous architectures, spatial architectures,
accelerators, routing) J. Wadden, S.
Khan, and K. Skadron. “Automata-to-Routing: An Open-Source
Toolchain for Design-Space Exploration of Spatial Automata
Processing Architectures.” In Proceedings of the IEEE
International Symposium on Field-Programmable Custom
Computing Machines (FCCM), Apr. 2017. (pdf)
-
(automata
processing, heterogeneous architecture, accelerators,
association rule mining, frequent itemset mining) K.
Wang, E. Sadredini, and K. Skadron. "Hierarchical
Pattern Mining with the Automata Processor." International
Journal of Parallel Programming, Springer, published
online Jan. 2017. DOI:10.1007/s10766-017-0489-y
(preprint
pdf)
- (automata processing,
heterogeneous architectures, accelerators, entity
resolution) C. Bo, K. Wang, J. Fox, and K.
Skadron. “Entity Resolution Acceleration using Micron’s
Automata Processor.” In Proceedings of the 2016 IEEE
International Conference on Big Data (BigData), Dec.
2016. (pdf)
-
(automata processing,
heterogeneous architectures, accelerators, programming
languages, compilers) K. Angstadt, W. Weimer, and K. Skadron. "RAPID Programming
of Pattern-Recognition Processors." In Proceedings of the
ACM International Symposium on Architectural Support for
Programming Languages and Operating Systems (ASPLOS),
Apr. 2016. (pdf
| software)
Highlights from Prior Work
(specialized architectures, encryption)
X. Guo, M. El-Hadedy, S. Mosanu, X. Wei, K. Skadron, and M. R. Stan. "Implementation of Configurable AES Primitive with Agile Design Approach." Integration, the VLSI Journal, Elsevier, 85(C):87-96, July 2022.
-
(fuzzing, security) A.
Ahmed, J. Hiser, A. Nguyen-Tuong, J. W. Davidson, and K.
Skadron. “BigMap: Future-proofing Fuzzers with Efficient Large
Maps.” In Proceedings of the IEEE/IFIP International
Conference on Dependable Systems and Networks (DSN), June
2021.
- (power delivery, IR
drop, thermal) S. Rahimipour, R. Zhang, K. Wang,
K. Skadron, F. Z. Rokhani, and M. Stan. "MTTF Enhancement Power-C4 Bump Placement Optimization." IEEE
Transactions on Very Large Scale Integration Systems (TVLSI),
27(7):1633-39, July 2019, DOI 10.1109/TVLSI.2019.2904048.
-
(specialized architectures, encryption) M. El-Hadedy, H.
Mihajloska, D. Gligoroski, A. Kulkarni, D. Stroobandt and K.
Skadron. "A 16-bit Reconfigurable Encryption Processor for
Pi-Cipher." In Proceedings of the 23rd Reconfigurable
Architectures Workshop (RAW), in conjunction with IPDPS, May
2016. Best paper award! (pdf)
- (specialized
architectures, transpose memory, signal processing) M.
El-Hadedy, X. Guo, M. Margala, M. R. Stan, and K. Skadron.
“Dual-Data Rate Transpose-Memory Architecture Improves the
Performance, Power and Area of Signal-Processing Systems.”
Journal of Signal Processing Systems, Springer,
published online Nov. 2016. DOI 10.1007/s11265-016-1199-1.
-
(automata
processing, heterogeneous architecture, accelerators,
association rule mining) K. Wang, E. Sadredini, and
K. Skadron. "Sequential Pattern Mining with the Micron
Automata Processor." In Proceedings of the ACM
International Conference on Computing Frontiers, May
2016. Best paper award! (pdf)
-
(benchmark suites,
accelerators) G. Juckeland, W. Brantley, S.
Chandrasekaran, B. Chapman, S. Che, M. Colgrove, H. Feng, A.
Grund, R. Henschel, W-M. W. Hwu, H. Li, M. S. Mueller, M.
Perimov, P. Shelepugin, K. Skadron, J. Stratton, A. Titov, K.
Wang, M. van Waveren, B. Whitney, S. Wienke, R. Xu, and K.
Kumaran. "SPEC ACCEL: A Standard Application Suite for
Measuring Hardware Accelerator Performance." In Proceedings of the Fifth International Workshop on Performance
Modeling, Benchmarking and Simulation of High Performance
Computer Systems (PMBS14), in conjunction with SC, Nov.
2014.
-
(heterogeneous architectures,
reconfigurable logic, design space exploration) L.
Wang and K. Skadron. "Lumos+: Rapid, Pre-RTL Design Space
Exploration on Accelerator-Rich Heterogeneous Architectures
with Reconfigurable Logic." In Proceedings of the
IEEE International Conference on Computer Design (ICCD),
Oct. 2016. (pdf)
- (power
delivery, thermal) K. Wang, K. Skadron, and M. R.
Stan. “Closing the Power Delivery/Heat Removal Cycle for
Heterogeneous Multi-Scale Systems.” In Proceedings of
the 22nd International Workshop on Thermal
Investigations of ICs (THERMINIC), Sept. 2016 (pdf).
-
(power
delivery, transient voltage noise) R. Zhang, K.
Mazumder, B. H. Meyer, K. Wang, K. Skadron, and M. R. Stan.
"Transient Voltage Noise in Charge-Recycled Power
Delivery Networks for Many-Layer 3D-IC."
In Proceedings
of the ACM/IEEE International Symposium on Low Power
Electronics and Design (ISLPED), July 2015. (pdf)
-
(GPGPU,
reliability, redundant multithreading) J.
Wadden, A. Lyashevsky, S. Gurumurthi, V. Sridharan, and K.
Skadron. "Real-World Design and Evaluation of
Compiler-Managed GPU Redundant Multithreading." In Proceedings
of the ACM/IEEE International Symposium on Computer
Architecture (ISCA), June 2014. (pdf)
-
(power
delivery, transient voltage noise) K.
Wang, R. Zhang, B. H. Meyer, M. R. Stan, and K. Skadron.
"Managing C4 Placement for Transient Voltage Noise
Minimization." In Proceedings of the ACM/IEEE Conference
on Design Automation (DAC), June 2014 (please download
from this
listing).
-
(power
delivery, IR drop) K.
Wang, R. Zhang, B. H. Meyer, K. Skadron, and M. R. Stan.
"Walking Pads: Fast Power-Supply Pad-Placement
Optimization." In Proceedings of the ACM/IEEE Asia and
South Pacific Design Automation Conference (ASP-DAC),
Jan. 2014. Best paper candidate. (preprint
pdf)
-
(heterogeneous
architecture, dark silicon) L. Wang and K. Skadron.
"Implications of the Power Wall: Dim Cores and Reconfigurable
Logic." IEEE Micro special issue on Dark Silicon,
33(5):40-49, Sept.-Oct. 2013. DOI
10.1109/MM.2013.74.
(preprint
pdf | Lumos
software)
-
(gpgpu,
accelerators, heterogeneous architecture, memory,
performance portability) S.
Che, J. Meng, and K. Skadron. "Dymaxion++: A
Directive-based API to Optimize Data Layout and Memory
Mapping for Heterogeneous Systems." In Proceedings
of the Fourth International Workshop on Accelerators and
Hybrid Exascale Systems, in conjunction with IPDPS,
May 2014. (pdf)
-
(performance
prediction, gpgpu, accelerators, manycore, heterogeneous
architecture, benchmarks) S.
Che and K. Skadron. "BenchFriend: Correlating the
Performance of GPU Benchmarks." Sage International
Journal of High Performance Computing Applications
(IJHPCA), 28(2):236-248, May 2014. (preprint
pdf)
-
(gpgpu,
accelerators, manycore, heterogeneous architecture,
benchmarks) S.
Che, B. M. Beckmann, S. K. Reinhardt, and K. Skadron.
"Pannotia: A Characterization of GPGPU Graph Applications."
In Proceedings of the IEEE International Symposium on
Workload Characterization (IISWC), Sept. 2013. (pdf)
-
(gpgpu,
accelerators, manycore, heterogeneous architecture,
programming models and frameworks) L. Szafaryn, T.
Gamblin, B. R. de Supinski, and K. Skadron. "Trellis:
Portability Across Architectures with a High-level Framework."
Elsevier Journal of Parallel and Distributed Computing,
73(10):1400-13, Oct. 2013. (First published online July,
2013.) DOI 10.1016/j.jpdc.2013.07.001.
(preprint
pdf)
-
(reliability)
L. Szafaryn, B. H.
Meyer, and K. Skadron. "Evaluating the Overheads of Soft
Error Protection Mechanisms in the Context of Multi-bit
Errors at the Scope of a Processor Core." IEEE Micro special
issue on Reliability Aware Design, 33(4):56-65, July-Aug.
2013. DOI 10.1109/MM.2013.68.
(preprint
pdf)
-
(CPU-GPU
load balancing) M.
Boyer, K. Skadron, S. Che, N. Jayasena. "Load Balancing in a
Changing World: Dealing with Heterogeneity and Performance
Variability." In Proceedings of the ACM Conference on
Computing Frontiers, May 2013. (pdf)
-
(thermal)
K. Sankaranarayanan, B. H. Meyer, W. Huang, R. J.
Ribando, H. Haj-Hariri, M. R. Stan, and K. Skadron.
"Architectural Implications of Spatial Thermal
Filtering." Elsevier Integration, the VLSI Journal,
46(1):44-56, Jan. 2013. DOI 10.1016/j.vlsi.2011.12.002
(preprint
pdf)
-
(floorplanning,
thermal, power delivery) G.
Faust, R. Zhang, K. Skadron, M.R. Stan, and B. Meyer.
"ArchFP: Rapid Prototyping of pre-RTL Floorplans." In Proceedings
of the IFIP/IEEE International Conference on Very Large
Scale Integration (VLSI-SoC), Oct. 2012. (pdf
| ArchFP
software)
-
(SIMD,
cache, branch divergence) J.
Meng, J. W. Sheaffer, and K. Skadron. "Robust SIMD:
Dynamically Adapted SIMD Width and Multi-Threading
Depth." In Proceedings of the IEEE International
Parallel & Distributed Processing Symposium (IPDPS),
May 2012. (pdf)
-
(datacenter
co-scheduling, cache) J.
Mars, L. Tang, R. Hundt, K. Skadron, and M. L. Soffa.
"BubbleUp: Increasing Sensible Co-locations for Improved
Utilization in Modern Warehouse Scale Computers." In
Proceedings of the ACM/IEEE International Symposium on
Microarchitecture (MICRO), Dec. 2011. (pdf)
Also in IEEE Micro "Top Picks from 2011 Architecture
Conferences."
-
(gpgpu,
accelerators, heterogeneous architecture, memory,
performance portability) S.
Che, J. W. Sheaffer, and K. Skadron. "Dymaxion:
Optimizing Memory Access Patterns for Heterogeneous
Systems." In Proceedings of the ACM/IEEE
International Conference for High Performance Computing,
Networking, Storage and Analysis (SC), Nov. 2011. (pdf)
-
(reliability,
fault tolerance, redundant execution) B.
Meyer, B. Calhoun, J. Lach, and K. Skadron.
"Cost-effective Safety and Fault Localization using
Distributed Temporal Redundancy." In Proceedings
of the ACM/IEEE International Conference on Compilers,
Architectures, and Synthesis for Embedded Systems (CASES),
Oct. 2011. (pdf)
-
(thermal,
power, scaling, dark silicon) W.
Huang, K. Rajamani, M. R. Stan, and K. Skadron.
"Scaling with Design Constraints -- Predicting the Future of
Big Chips." IEEE Micro special issue on Big
Chips, 31(4):16-29, July/Aug. 2011, DOI 10.1109/MM.2011.42.
(preprint
pdf)
-
(manycore,
GPU architecture, power) M.
Gebhart, D. R. Johnson, D. Tarjan, S. W. Keckler, W. J.
Dally, E. Lindholm, and K. Skadron. "Energy-efficient
Mechanisms for Managing Thread Context in Throughput
Processors." In Proceedings of the ACM/IEEE
International Symposium on Computer Architecture (ISCA), June
2011. (pdf)
-
(gpgpu,
accelerators, manycore, heterogeneous architecture,
benchmarks) S. Che,
J. W. Sheaffer, M. Boyer, L. G. Szafaryn, L. Wang, and K.
Skadron. "A Characterization of the Rodinia Benchmark
Suite with Comparison to Contemporary CMP Workloads."
In Proceedings of the IEEE International Symposium on
Workload Characterization (IISWC), Dec. 2010. (pdf)
-
(manycore,
dynamic cores) M.
Boyer, D. Tarjan, K. Skadron. "Federation: Boosting
Per-Thread Performance of Throughput-Oriented Manycore
Architectures." ACM Transactions on Architecture
and Code Optimization (TACO), 7(4):1-38, Dec. 2010,
DOI 10.1145/1880043.1880046.
(preprint
pdf)
-
(manycore,
cache, coherence) D.
Tarjan and K. Skadron. "The Sharing Tracker: Using
Ideas from Cache Coherence Hardware to Reduce Off-Chip
Memory Traffic with Non-Coherent Caches." In Proceedings
of the ACM/IEEE
International Conference for High Performance Computing,
Networking, Storage and Analysis (SC),
Nov. 2010. (pdf)
-
(SIMD,
cache, branch divergence) J.
Meng, D. Tarjan, and K. Skadron. "Dynamic Warp
Subdivision for Integrated Branch and Memory Divergence
Tolerance." In Proceedings of the 37th ACM/IEEE
International Symposium on Computer Architecture, June
2010. (pdf)
-
(manycore,
cache, coherence) J.
Meng and K. Skadron. "Avoiding Cache Thrashing due to Private
Data Placement in Last-Level Cache for Manycore
Scaling." In Proceedings of the IEEE International
Conference on Computer Design (ICCD), pp. 282-88, Oct.
2009. (pdf)
-
(power,
real-time, control theory) T. Horvath and K.
Skadron. "Multi-mode Energy Management for Multi-tier
Server Clusters." In Proceedings of the
ACM/IEEE/IFIP International Conference on Parallel
Architectures and Compilation Techniques (PACT), pp.
270-79, Oct. 2008. (preprint
pdf)
-
(thermal)
W. Huang, K. Sankaranarayanan, K. Skadron, R. J.
Ribando, and M. R. Stan. "Accurate, Pre-RTL
Temperature-Aware Processor Design Using a Parameterized,
Geometric Thermal Model." IEEE Transactions on
Computers, 57(9):1277-88, Sept. 2008, DOI
10.1109/TC.2008.64. (pdf)
-
(gpgpu,
fpga, accelerators, heterogeneous architecture) S.
Che, J. Li, J. W. Sheaffer, K. Skadron, and J. Lach.
"Accelerating Compute Intensive Applications with GPUs and
FPGAs." In Proceedings of the IEEE Symposium on
Application Specific Processors (SASP), pp.
101-07, June 2008. (pdf)
-
(simulation
methodology) H. Cook and K. Skadron. "Predictive
Design Space Exploration Using Genetically Programmed Response
Surfaces." In Proceedings of the ACM/IEEE Conference on
Design Automation (DAC), June 2008. (pdf)
-
(gpgpu)
J.
Nickolls, I. Buck, M. Garland, K. Skadron. "Scalable
Parallel Programming with CUDA." ACM Queue,
6(2):40-53, Mar.-Apr. 2008. DOI
10.1145/1365490.1365500 (pdf)
-
(parameter
variations, multicore, thermal, power, leakage) E.
Humenay, D. Tarjan, and K. Skadron. "Impact of Process
Variations on Multicore Performance Symmetry." In Proceedings
of the 2007 Conference on Design, Automation and Test in
Europe (DATE), pp. 1653-58, Apr. 2007. (pdf)
-
(reliability,
thermal) Z. Lu, W. Huang, M. Stan, K. Skadron, and
J. Lach. "Interconnect Lifetime Prediction for
Reliability-Aware Systems." IEEE Transactions on
VLSI Systems, 15(2):159-72, Feb. 2007. (pdf)
-
(branch
prediction, trace cache, power) M. Co, D. A.B.
Weikle, and K. Skadron. "Evaluating Trace Cache Energy
Efficiency." ACM Transactions on Architecture and Code
Optimization (TACO), 3(4):450-76,
Dec. 2006. (Abstract
| pdf)
-
(graphics
architecture, reliability) J. W. Sheaffer, D. P.
Luebke, and K. Skadron. "The Visual Vulnerability Spectrum:
Characterizing Architectural Vulnerability for Graphics
Hardware." In Proceedings of Eurographics/ACM Graphics
Hardware 2006 (GH), pp. 9-16, Sept. 2006. (pdf)
-
(thermal)
S. W. Chung and K. Skadron. "Using on-Chip Event
Counters for High-Resolution, Real-Time Temperature
Measurements." In Proceedings of the IEEE/ASME Tenth
Intersociety Conference on Thermal and Thermomechanical
Phenomena in Electronic Systems (ITHERM), June 2006. (pdf)
- (thermal) W.
Huang, S. Ghosh, S. Velusamy, K. Sankaranarayanan, K. Skadron,
and M. R. Stan. "HotSpot: A Compact Thermal Modeling
Methodology for Early-Stage VLSI Design." IEEE
Transactions on Very Large Scale Integration (VLSI) Systems,
14(5):501-513, May 2006. (pdf)
Google
Scholar Classic Paper in Computer Hardware Design for
2006 (3rd most cited paper).
-
(multi-core
architecture, power, thermal) Y. Li, B. C. Lee, D.
Brooks, Z. Hu, and K. Skadron. "CMP Design Space
Exploration Subject to Physical Constraints." In Proceedings
of the Twelfth IEEE
International Symposium on High Performance Computer
Architecture (HPCA), pp. 15-26, Feb. 2006. (pdf)
-
(branch
prediction) D. Tarjan and K. Skadron. "Merging Path
and Gshare Indexing in Perceptron Branch Prediction." ACM
Transactions on Architecture and Code Optimization,
2(3):280-300, Sept. 2005. (pdf)
-
(thermal,
security) P. Dadvar
and K. Skadron. "Potential Thermal Security
Risks." In Proceedings of the IEEE Semiconductor
Thermal Measurement, Modeling, and Management Symposium
(Semi-Therm 21), pp. 229-34, Mar. 2005. (pdf)
-
(thermal)
K.
Skadron, K. Sankaranarayanan, S. Velusamy, D. Tarjan, M.R.
Stan, and W. Huang. "Temperature-Aware
Microarchitecture: Modeling and Implementation." ACM
Transactions on Architecture and Code Optimization,
1(1):94-125, Mar. 2004.
(pdf)
-
(leakage
power) Y. Li, D.
Parikh, Y. Zhang, K. Sankaranarayanan, M. R. Stan, and K.
Skadron. "State-Preserving vs. Non-State-Preserving
Leakage Control in Caches." In Proceedings of the
2004 Design, Automation and Test in Europe (DATE)
Conference, pp. 22-27, Feb. 2004. (pdf)
[HotLeakage
software home page]
- (power, real-time)
V. Sharma, A. Thomas, T.
Abdelzaher, Z. Lu, and K. Skadron. "Power-Aware QoS
Management on Web Servers." In Proceedings of the
24th International Real-Time Systems Symposium, pp.
63-72, Dec. 2003. (pdf)
(Best student paper!)
-
(branch
prediction) Z. Lu, J.
Lach, M. Stan, and K. Skadron. "Alloyed Branch
History: Combining Global and Local Branch History for
Robust Performance." International Journal of Parallel
Programming, Kluwer, 31(2):137-77, Apr. 2003. (pdf
| Abstract)
-
(write
buffers) K. Skadron and
D.W. Clark. "Design Issues and Tradeoffs for Write Buffers."
In Proceedings of the Third International Symposium on
High-Performance Computer Architecture, pp. 144-55,
February 1997. (postscript
| pdf
| abstract)
Complete
list of Skadron's publications through 2017;
for an up-to-date list of my publications, please see my Google Scholar page.
|