Professor Kevin Skadron

	"Sequentiality is an illusion" Kevin Skadron Harry Douglas Forsyth Professor of Computer Science Department of Computer Science School of Engineering and Applied Science University of Virginia 85 Engineer's Way, Rice Hall, Box 400740 Charlottesville, VA 22904-4740 Office: Rice 421 Phone: (434) 982-2042 Fax: (434) 982-2214 skadron (ampersand) virginia.edu


Areas of Interest	Computer architecture, especially pertaining to novel heterogeneous processor organizations, accelerator architecture, processing in memory, and automata processing.

Biographical Sketch	Kevin Skadron has been on the faculty at the University of Virginia since 1999. He received his B.S. in Electrical and Computer Engineering and B.A. in Economics from Rice University in 1994, and his Ph.D. in Computer Science from Princeton University in 1999. He spent the 2007-08 academic year on sabbatical at NVIDIA Research. He served as department chair from 2012 to 2021. He also helped found and serves as director for the UVA Center for Automata Processing (CAP) and served as director of the SRC/DARPA-funded Center for Research on Intelligent Storage and Processing in Memory (CRISP), part of the JUMP 1.0 program. He is also a member of the Center for Research on Processing in Storage and Memory (PRISM), part of the JUMP 2.0 program. Skadron is the recipient of the 2023 SRC/SIA University Research Award for lifetime research contributions to the U.S. semiconductor industry and the 2011 ACM SIGARCH Maurice Wilkes Award, and is a Fellow of the IEEE and ACM. For the year 2003-04, he was named a University of Virginia Teaching Fellow. Among other professional activities, he is co-founder and editorial board member of IEEE Computer Architecture Letters, for which he served as associate editor-in-chief from 2001-2009, editor-in-chief from 2010-2012, and associate editor from 2012-2016. He served on the editorial board of IEEE Micro from 2004-2012 and as co-founder/co-editor (with Kevin Rudd) of its "Prolegomena" column, as secretary-treasurer of ACM's SIGARCH from 2007-2011, as technical program co-chair of PACT 2006, as general co-chair for PACT 2002 and MICRO-37, and on numerous technical program committees.

Note to postdoc, graduate student and summer-intern applicants	International summer-intern requests: Due to visa complexities, undergraduate summer interns from outside the US are typically not feasible. I get a very large number of these requests; please understand that I generally cannot respond personally. Inquiries from prospective graduate students: Due to the large number of these inquiries, please understand that I am not able to respond to form letters. But I am always happy to discuss mutual research interests! Potential applicants may also want to read more about my advising philosophy.

Teaching	Over the last several semesters I have taught:
Spring 2025: CS 6501: Special Topics in Computer Architecture: CPU/GPU Memory Systems and Near-Data Processing
Spring 2024, Spring 2023: CS 6501: Special Topics in Computer Architecture: Hardware Accelerators
Fall 2024: CS 3130: Computer Systems and Organization 2 (CSO2) (with Prof. Reiss, who served as lead instructor)
Fall 2023, Fall 2022: CS 4414: Operating Systems

Research	My research currently focuses on specialized and heterogeneous architectures and how to design architectures in the presence of severe physical constraints, especially thermal, power delivery, process variations, and wear-out. We are chiefly focusing on these issues in the context of asymmetric and heterogeneous designs, which provide the best balance between high single-thread performance, high throughput for parallel tasks, and high performance on critical functions. To address these challenges, we are taking a variety of approaches. Currently, our focus is in these areas:
New processing-in-memory architectures, including Fulcrum in HPCA'20, Gearbox in ISCA'22, Sieve in ISCA'21, DRAM-CAM in CAL'22, DRAM-AP (digital bit-serial PIM) in IISWC'24, and our Sunder in-memory automata architecture in MICRO'21; and our PIMeval simulation framework, PIMbench benchmark suite, and PIM programming API, in IISWC'24
New capabilities and programming models for regular-expression and automata processing, including runtime monitoring using temporal logic, as part of the UVA Center for Automata Processing (CAP). We have developed novel architectures (e.g., Sunder, in MICRO'21, and Impala, in HPCA'20), FPGA implementations (e.g., Grapefruit, in FCCM'20), new applications (e.g., frequent-itemset mining in IPDPS'15 and sequential pattern mining in CF'16), a new benchmark suite (AutomataZoo), new modeling tools (e.g., MNCaRT), a new programming language for inexact pattern matching (RAPID, in ASPLOS'16), a new automata-based method for string kernels on FPGAs (IEEE Micro, Sep.-Oct. 2022), acceleration for LTLf specifications (FMCAD'20), and new debugging support (in ASPLOS'19).
New heterogeneous architectures, including accelerators and reconfigurable units (e.g., our work on crypto processors), and new design-space exploration tools such as Lumos to help understand the right mixture of heterogeneous units
Developing new programming abstractions to simplify programming for heterogeneous systems, such as our ICS'09/IJPP'11, JPDC'13-Trellis, and ASPLOS'16-RAPID papers, and our PIM API in IISWC'24
We also continue to maintain several research tools we developed, including:
The MNCaRT automata and regular expression processing toolset
The Rodinia benchmark suite of applications with both optimized GPU and multicore-CPU implementations of a diverse set of applications (see our IISWC'10, IISWC'09 and JPDC'08 papers and ASPLOS 2010 tutorial) -- also support for the SPECaccel benchmark suite, which incorporates over half of the Rodinia suite.
HotSpot (e.g., our IEEE Trans. Computers'08 and ISPASS'09 papers), and most recently HotSpot 7.0.
In prior work, my group has:
Explored scaling implications for power delivery (e.g., our ISCA'14 paper) and fault tolerance (e.g., our IEEE Micro'13 paper), and explored new runtime reliability management techniques to balance performance and wear-out (e.g., our IEEE Micro'05 paper), cope with transient faults (e.g., our GH'06 and GH'07 papers for GPUs), and take advantage of coarse-grained reconfigurable resources (e.g., our DATE'11 and CASES'11 papers). We also evaluated technology scaling limits and implications (e.g., our 2007 presentation to the NRC CSTB study on "Sustaining Growth in Computing Performance," our paper in IEEE Micro on scaling with design constraints (preprint pdf), and our "Implications of Dim Silicon" paper (preprint pdf)).
Developed new power delivery modeling capabilities (e.g., see our VoltSpot ISCA'14 paper) and explored new optimization techniques (e.g., our ASP-DAC'14 and DAC'14 papers)
Developed new, efficient cryptography accelerators (e.g., in the VLSI Journal)
Developed new design-space exploration capabilities that reduce simulation requirements, such as genetically programmed response surfaces (e.g., our DAC'08 paper and software) and Lumos.
New cache organizations for many cores (e.g., our SC'10 and ICCD'09 papers), cache-conscious thread scheduling (e.g., our IPDPS'10 paper), and cache-conscious data layout for heterogeneous systems (SC'11)
Explored how to most effectively use texture, constant, per-block shared memory, and other features that GPUs and GPU languages such as CUDA provide (e.g., see our ACM Queue'08, JPDC'08, IPDPS'09, and ICS'09/IJPP'11 papers)
Developed new techniques to make SIMD architectures more effective in the presence of irregular data structures or irregular parallelism (see our IPDPS'12, ISCA'10 and SC'09 papers)
Developed the first publicly available architectural simulator for GPUs, Qsilver (see our GH'04 paper)
Developed improved power map derivation using thermal maps (e.g., our ICCD'10 paper)
A new, pre-RTL floorplanning algorithm (see our ArchFP software)
New temperature sensing and thermal-management capabilities (e.g., our ITHERM'06 and follow-on IEEE Trans. Computers papers; see also our ACM Computing Surveys paper on dynamic thermal management and "prolegomenon" in IEEE Micro), as well as novel temperature-aware design techniques (e.g., our HPCA'06, DAC'08, and SEMI-THERM'10 papers)
Described new techniques for coherence, including bypassing coherence for private data and a simplified form of "sharing tracker" coherence for avoiding refetch of shared, read-only data
Dynamic combination or "federation" of scalar cores to support runtime variations in ILP and DLP (e.g.,
our DAC'08 and follow-on ACM TACO papers)
New, lightweight out-of-order execution techniques with much better performance/mm² and performance/watt (see our ACM TACO paper - lightweight OO was an enabling technique for federation)
Described new power management techniques, especially in the context of real-time constraints, spanning a variety of application types from multimedia (e.g., our Asilomar'06 paper) to multi-tier e-commerce workloads (e.g., our PACT'08 paper)
Described a new form of hybrid neural branch predictor
Described power-management techniques for branch prediction that do not impede prediction accuracy or performance and shown the importance of branch prediction for energy efficiency
Evaluated the optimal energy-efficient scaling of microarchitectural structure sizes for simultaneous multithreading (SMT)
Evaluated whether trace caches are energy efficient
Explored how current power and thermal management features may expose security vulnerabilities
Developed new reliability modeling capabilities (e.g., our IEEE TVLSI'07 paper)
Shown the value of control theory in managing adaptive hardware structures, including controlling the DVS setting for a multimedia workload; setting the decay interval for leakage-power management using cache decay; and thermal regulation
New techniques for fast and provably accurate warm-up when moving between many smaller samples in cycle-accurate simulations
These research projects have stimulated several innovations in our computer architecture courses, including the development of a Microprocessor Survey Course (also described in a paper at SIGCSE), and we were one of the first departments to use CUDA to teach both concurrency and parallel architecture.
This work is currently supported by the National Science Foundation under grant nos.
PPoSS-2217071 and CCF-2312740, grants from the Laboratory for Physical Sciences, grants from the SRC Global Research Corporation CADT program, and by PRISM, one of seven centers in JUMP 2.0, a Semiconductor Research Corporation program sponsored by MARCO and DARPA. Prior support has come from the National Science Foundation under grant nos. ITR-0082671, CCR-0133634 (CAREER), CCR-0105626, EIA-0224434, DOS-0306404, CCF-0429765, CNS-0509245, CNS-0551630 (CRI), IIS-0612049, CNS-0615277, CCF-0903471 (MCDA), CNS-0916908 (ARRA), CCF-1451571, EF-1124931, CCF-1116673, CCF-1629450 (XPS) and CCF-1619127; a grant from the joint NSF-Intel FoMR program, CCF-1912608; CRISP, one of six centers in JUMP 1.0, and C-FAR, one of six centers of STARnet, both Semiconductor Research Corporation programs sponsored by MARCO and DARPA; DARPA MTO (PERFECT program) under contract HR0011-13-C-0022; DARPA IIO (VMR program) under contract FA8750-12-C-0181; the Army Research Office under grant no. W911NF-04-1-0288; grants from Brookhaven National Labs; the Semiconductor Research Corporation under task nos. 1607, 1972 and 2042; the Virginia CIT CRCF program under grants MF14S-021-IT, MF16-032-IT and MF17-027-IT; grants from AMD Research, Intel Research, NVIDIA Research, NEC Labs, IBM Research, and Xilinx; and an Excellence Award from the University of Virginia Fund for Excellence in Science and Technology (FEST). Additional support has been provided by William A. Ballard Fellowships for John W. Haskins and David Tarjan, a University of Virginia Award for Excellence in Scholarship in the Sciences & Engineering for David Tarjan and Jiayuan Meng, an ATI graduate fellowship for Jeremy Sheaffer, an NVIDIA Ph.D. fellowship for Jiayuan Meng, a GRC/AMD Ph.D. fellowship for Michael Boyer, and an ARCS Scholarship for Jack Wadden. We have also received equipment donations and extended loans from Micron, Xilinx, NVIDIA, AMD, and Hewlett Packard. The U.Va.
Center for Automata Processing (CAP) was supported in part by Micron. Please note that any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the funding agencies.

Current Graduate Students and Research Staff:
Morteza Baradaran
Zhenxing Fan
Beenish Gul
Mohammadhosein Gholemrezaei
Deyuan Guo
Khyati Kiyawat
Yasas Seneviratne, co-advised with Samira Khan
Akhil Shekar
Farzana Siddique
Tom Tracy (Research Scientist)

Undergraduate and Masters Researchers:
Hugo Abbot
William Bradford
Raul Cancho
Kyle Durrer
Ethan Ermovic
Reid Farmer
Kumaresh Nandagopal
Thomas Nguyan
Dominic Tran
Han Yan
Ruiyang Zhou

Recently-graduated PhD Students:
Alif Ahmed, defended Oct. 2023, official diploma conferral Dec. 2023 - now with IBM Systems
Oluwole Jaiyeoba, defended Aug. 2023, official diploma conferral Dec. 2023 - now with AMD Research
Lingxi Wu, defended Sep. 2023, official diploma conferral Dec. 2023 - now with Black Sesame
Marzieh Lenjani, May 2022 - now with Apple
Elaheh Sadredini, May 2019 - now faculty at UC Riverside

Complete list of LAVA alumni - postdocs, graduate students, undergraduate students

Selected Publications Please note that papers linked here represent author preprints. The official, published version must be obtained from the publisher's website or the published print copy. This material is presented here to ensure timely dissemination of scholarly and technical work. Copyright and all rights therein are retained by authors or by other copyright holders. All persons copying this information are expected to adhere to the terms and constraints invoked by each document's copyright terms. In most cases, these works may not be reposted without the explicit permission of the copyright holder. Permission is given to make digital or hard copies of all or part of this material without fee for personal or classroom use, provided that the copies are not made or distributed for profit or commercial advantage, and that copies bear the appropriate copyright notice and the full bibliographic citation on the first page. Copyrights for components of this work owned by others must also be honored. To copy otherwise, to republish, to post on servers, to redistribute to lists, etc. requires specific permission and/or a fee. In particular, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works, must be obtained from the copyright owner. Please note further that any opinions, findings, conclusions, or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the sponsoring agencies, employers, or publishers.	A complete list of my publications can be found on my Google Scholar Page (which will also provide paper links) or DBLP page Recent Highlights (graph processing, multi-FPGA) O. Jaiyeoba, A. Mughrabi, M. Baradaran, B. Gul, and K. Skadron. "Swift: A Multi-FPGA Framework for Scaling Up Accelerated Graph Analytics." 
In Proceedings of the IEEE International Conference on Field Programmable Technology (FPT), Dec. 2024. (pdf) (processing in memory/PIM, simulation, benchmarks) F. A. Siddique, D. Guo, Z. Fan, M. Gholamrezaei, M. Baradaran, A. Ahmed, H. Abbot, K. Durrer, K. Nandagopal, E. Ermovick, K. Kiyawat, B. Gul, A. Mughrabi, A. Venkat, and K. Skadron. "Architectural Modeling and Benchmarking for Digital DRAM PIM." In Proceedings of the IEEE International Symposium on Workload Characterization (IISWC), Sep. 2024. (pdf) (dynamic graph processing) A. Ahmed, F. Siddique, and K. Skadron. "GraphTango: A Hybrid Representation Format for Efficient Streaming Graph Updates and Analysis." International Journal of Parallel Programming, Springer, to appear. (dynamic graph processing, FPGA) O. Jaiyeoba and K. Skadron. "Dynamic-ACTS - A Dynamic Graph Analytics Accelerator For HBM-Enabled FPGAs." ACM Transactions on Reconfigurable Technology and Systems, 17(3), 29 pp., Sept. 2024. (processing in storage/computational storage, accelerators, bioinformatics) L. Wu, M. Zhou, W. Xu, A. Venkat, T. Rosing, and K. Skadron. "Abakus: Accelerating k-mer Counting With Storage Technology." ACM Transactions on Architecture and Code Optimization (TACO), 21(1), 26 pp., Jan. 2024. (graph processing, FPGA) O. Jaiyeoba and K. Skadron. "ACTS: A Near-Memory FPGA Graph Processing Framework." Proceedings of the IEEE International Symposium on Field-Programmable Gate Arrays (FPGA), Feb. 2023. (processing in memory/PIM, sorting) M. Lenjani, A. Ahmed, and K. Skadron. "Pulley: An Algorithm/Hardware Co-optimization for In-memory Sorting." IEEE Computer Architecture Letters, 21(2):109-112, July-Dec. 2022. (processing in memory/PIM) L. Wu, R. Sharifi, A. Venkat, and K. Skadron. "DRAM-CAM: General-Purpose Bit-Serial Exact Pattern Matching." IEEE Computer Architecture Letters, 21(2):89-92, July-Dec. 2022. (automata processing, FPGAs, heterogeneous architectures, accelerators) K. Angstadt, T. Tracy, J.-B.
Jeannin, K. Skadron, and W. Weimer. "Synthesizing Legacy String Code for FPGAs Using Bounded Automata Learning." IEEE Micro special issue on Compiling for Accelerators, 42(5):70-77, Sep.-Oct. 2022. (microarchitecture, microcode, runtime optimization) L. Moody, W. Qi, A. Sharifi, L. Berry, J. Rudek, J. Gaur, J. Parkhurst, S. Subramoney, K. Skadron, A. Venkat. "Speculative Code Compaction: Eliminating Dead Code via Speculative Microcode Transformations." In Proceedings of the ACM/IEEE International Symposium on Microarchitecture (MICRO), Oct. 2022. (processing in memory/PIM) M. Lenjani and K. Skadron. "Supporting Moderate Data Dependency, Position Dependency, and Divergence in PIM-based Accelerators." IEEE Micro special issue on Processing in Memory, published online Dec. 2021, in print Jan.-Feb. 2022, 42(1):108-115. DOI 10.1109/MM.2021.3136189 (processing in memory/PIM) M. Lenjani, A. Ahmed, M. R. Stan, and K. Skadron. "Gearbox: A Case for Supporting Accumulation Dispatching and Hybrid Partitioning in PIM-based Accelerators." In Proceedings of the IEEE/ACM International Symposium on Computer Architecture (ISCA), June 2022. (processing in memory/PIM, FPGA) S. Mosanu, M. N. Sakib, T. Tracy II, E. Cukurtas, A. Ahmed, P. Ivanov, S. Khan, K. Skadron, M. Stan. "PiMulator: A Fast and Flexible Processing-in-Memory Emulation Platform." In Proceedings of the ACM/IEEE/EDAA/EDAC Conference on Design, Automation and Test in Europe (DATE), Mar. 2022. (automata processing, heterogeneous architectures, accelerators) E. Sadredini, R. Rahimi, M. Imani, and K. Skadron. “Sunder: Enabling Low-Overhead and Scalable Near Data Pattern Matching Acceleration.” In Proceedings of the IEEE/ACM International Symposium on Microarchitecture (MICRO), Oct. 2021. (processing in memory/PIM, bioinformatics) M. Zhou, L. Wu, M. Li, N. Moshiri, K. Skadron, and T. Rosing.
“Ultra Efficient Acceleration for De Novo Genome Assembly via Near-Memory Computing.” In Proceedings of the ACM/IEEE/IFIP International Conference on Parallel Architectures and Compiler Techniques (PACT), Sept. 2021 (thermal modeling) J. Han, R. E. West, K. Skadron, and M. R. Stan. "Thermal Simulation of Processing-in-Memory Devices Using HotSpot 7.0." In Proceedings of the 27th International Workshop on Thermal Investigations of ICs (THERMINIC), Sept. 2021. (processing in memory/PIM, bioinformatics) L. Wu, R. Sharifi, M. Lenjani, K. Skadron, and A. Venkat. “Sieve: Scalable In-situ DRAM-based Accelerator Designs for Massively Parallel k-mer Matching.” In Proceedings of the ACM/IEEE International Symposium on Computer Architecture (ISCA), June 2021. (automata processing, heterogeneous architectures, accelerators, FPGAs, linear temporal logic, verification) T. Tracy II, L. Tabajara, M. Vardi, and K. Skadron. “Runtime Verification on FPGAs with LTLf Specifications.” In Proceedings of the Formal Methods in Computer-Aided Design (FMCAD), Sept. 2020. (pdf) (automata processing, FPGAs, heterogeneous architectures, accelerators) R. Rahimi, E. Sadredini, M. Stan, and K. Skadron. “Grapefruit: An Open-Source, Full-Stack, and Customizable Automata Processing on FPGAs.” In Proceedings of the IEEE International Symposium on Field Customizable Computing Machines (FCCM), May 2020. Nominated for best paper. (automata processing, heterogeneous architectures, accelerators) E. Sadredini, R. Rahimi, M. Lenjani, M. Stan, and K. Skadron. “FlexAmata: A Universal and Efficient Adaption of Applications to Spatial Automata Processing Accelerators.” In Proceedings of the ACM International Symposium on Architectural Support for Programming Languages and Operating Systems (ASPLOS), Mar. 2020. (automata processing, heterogeneous architectures, accelerators) R. Rahimi, E. Sadredini, M. Lenjani, M. Stan, and K. Skadron. 
“Impala: Algorithm/Architecture Co-Design for In-Memory Multi-Stride Pattern Matching.” In Proceedings of the IEEE International Symposium on High Performance Computer Architecture (HPCA), Feb. 2020. Nominated for best paper. (processing in memory/PIM) M. Lenjani, P. Gonzalez, E. Sadredini, S. Li, Y. Xie, A. Akel, S. Eilert, M. R. Stan, and K. Skadron. “Fulcrum: a Simplified Control and Access Mechanism toward Flexible and Practical In-Situ Accelerators.” In Proceedings of the IEEE International Symposium on High Performance Computer Architecture (HPCA), Feb. 2020. (automata processing, heterogeneous architectures, accelerators, on-chip interconnect) E. Sadredini, R. Rahimi, V. Verma, M. Stan, and K. Skadron. “eAP: A Scalable and Efficient In-Memory Accelerator for Automata Processing.” In Proceedings of the ACM/IEEE International Symposium on Microarchitecture (MICRO), Oct. 2019. (memory performance benchmarking) A. Ahmed and K. Skadron. “Hopscotch: A Micro-benchmark Suite for Memory Performance Evaluation.” In Proceedings of the International Symposium on Memory Systems (MEMSYS), Sep.-Oct. 2019. (dynamic graph processing) O. Jaiyeoba and K. Skadron. “GraphTinker: A High Performance Data Structure for Dynamic Graph Processing.” In Proceedings of the IEEE International Parallel and Distributed Processing Symposium (IPDPS), May 2019. (automata processing, FPGAs, heterogeneous architectures, accelerators) C. Bo, V. Dang, T. Xie, J. Wadden, M. Stan, and K. Skadron. "Automata Processing in Reconfigurable Architectures: In-the-cloud Deployment, Cross-platform Evaluation, and Fast Symbol-only Reconfiguration." ACM Transactions on Reconfigurable Technology and Systems (TRETS), 12(2), May 2019, DOI 10.1145/3314576. (automata processing, heterogeneous architectures, accelerators, debugging) M. Casias, K. Angstadt, T. Tracy II, K. Skadron, and W. Weimer.
“Debugging Support for Pattern-Matching Languages and Accelerators.” In Proceedings of the ACM International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), Apr. 2019. (automata processing, heterogeneous architectures, accelerators) K. Angstadt, A. Subramaniyan, E. Sadredini, R. Rahimi, W. Weimer, K. Skadron, and R. Das. “ASPEN: A Scalable In-SRAM Architecture for Pushdown Automata.” In Proceedings of the ACM/IEEE International Symposium on Microarchitecture (MICRO), Oct. 2018. (automata processing, heterogeneous architectures, accelerators, benchmarks) J. Wadden, T. Tracy II, E. Sadredini, L. Wu, C. Bo, J. Du,** Y. Zhou, M. Wallace,* J. Udall,* M. Stan, and K. Skadron. “AutomataZoo: A Modern Automata Processing Benchmark Suite.” In Proceedings of the IEEE International Symposium on Workload Characterization (IISWC), Oct. 2018. (automata processing, heterogeneous architectures, accelerators, natural language processing) E. Sadredini, D. Guo, C. Bo, R. Rahimi, K. Skadron, and H. Wang. “A Scalable Solution for Rule-Based Part-of-Speech Tagging on Novel Hardware Accelerators.” In Proceedings of the ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD), Applied Data Science track, full paper with poster presentation, Aug. 2018. (automata processing, heterogeneous architectures, accelerators, simulation) K. Angstadt, J. Wadden, V. Dang, T. Xie, D. Kramp, W. Weimer, M. Stan, and K. Skadron. “MNCaRT: An Open-Source, Multi-Architecture Automata-Processing Research and Execution Ecosystem.” IEEE Computer Architecture Letters, published online Dec. 2017, published in print 17(1):84-87, Jan.-June 2018. DOI 10.1109/LCA.2017.2780105. (automata processing, heterogeneous architectures, accelerators) J. Wadden, K. Angstadt, and K. Skadron. 
“Characterizing and Mitigating Output Reporting Bottlenecks in Spatial-Reconfigurable Automata Processing Architectures.” In Proceedings of the IEEE International Symposium on High Performance Computer Architecture (HPCA), Feb. 2018. (automata processing, heterogeneous architectures, accelerators, bioinformatics) C. Bo, V. Dang, E. Sadredini, K. Skadron. “Searching for Potential gRNA Off-Target Sites for CRISPR/Cas9 using Automata Processing across Different Platforms.” In Proceedings of the IEEE International Symposium on High Performance Computer Architecture (HPCA), Feb. 2018. (automata processing, FPGAs, heterogeneous architectures, accelerators) T. Xie, V. Dang, J. Wadden, K. Skadron, and M. R. Stan. “REAPR: Reconfigurable Engine for Automata Processing.” In Proceedings of the International Conference on Field-Programmable Logic and Applications (FPL), Sept. 2017. (pdf) (automata processing, heterogeneous architectures, accelerators, frequent tree mining, pattern mining) E. Sadredini, K. Wang, and K. Skadron. “Frequent Subtree Mining on the Automata Processor: Challenges and Opportunities.” In Proceedings of the ACM International Conference on Supercomputing (ICS), June 2017. (pdf) (automata processing, heterogeneous architectures, spatial architectures, accelerators, routing) J. Wadden, Samira Khan, and K. Skadron. “Automata-to-Routing: An Open-Source Toolchain for Design-Space Exploration of Spatial Automata Processing Architectures.” In Proceedings of the IEEE International Symposium on Field-Programmable Custom Computing Machines (FCCM), Apr. 2017. (pdf) (automata processing, heterogeneous architecture, accelerators, association rule mining, frequent itemset mining) K. Wang, E. Sadredini, and K. Skadron. "Hierarchical Pattern Mining with the Automata Processor." International Journal of Parallel Programming, Springer, published online Jan. 2017. 
DOI 10.1007/s10766-017-0489-y (preprint pdf) (automata processing, heterogeneous architectures, accelerators, entity resolution) C. Bo, K. Wang, J. Fox, and K. Skadron. “Entity Resolution Acceleration using Micron’s Automata Processor.” In Proceedings of the 2016 IEEE International Conference on Big Data (BigData), Dec. 2016. (pdf) (automata processing, heterogeneous architectures, accelerators, programming languages, compilers) K. Angstadt, W. Weimer, and K. Skadron. "RAPID Programming of Pattern-Recognition Processors." In Proceedings of the ACM International Symposium on Architectural Support for Programming Languages and Operating Systems (ASPLOS), Apr. 2016. (pdf \| software) Highlights from Prior Work (specialized architectures, encryption) X. Guo, M. El-Hadedy, S. Mosanu, X. Wei, K. Skadron, and M. R. Stan. "Implementation of Configurable AES Primitive with Agile Design Approach." Integration, the VLSI Journal, Elsevier, 85(C):87-96, July 2022. (fuzzing, security) A. Ahmed, J. Hiser, A. Nguyen-Tuong, J. W. Davidson, and K. Skadron. “BigMap: Future-proofing Fuzzers with Efficient Large Maps.” In Proceedings of the IEEE/IFIP International Conference on Dependable Systems and Networks (DSN), June 2021. (power delivery, IR drop, thermal) S. Rahimipour, R. Zhang, K. Wang, K. Skadron, F. Z. Rohkani, and M. Stan. "MTTF Enhancement Power-C4 Bump Placement Optimization." IEEE Transactions on Very Large Scale Integration Systems (TVLSI), 27(7):1633-39, July 2019, DOI 10.1109/TVLSI.2019.2904048. (specialized architectures, encryption) M. El-Hadedy, H. Mihajloska, D. Gligoroski, A. Kulkarni, D. Stroobandt and K. Skadron. "A 16-bit Reconfigurable Encryption Processor for Pi-Cipher." In Proceedings of the 23rd Reconfigurable Architectures Workshop (RAW), in conjunction with IPDPS, May 2016. Best paper award! (pdf) (specialized architectures, transpose memory, signal processing) M. El-Hadedy, X. Guo, M. Margala, M. R. Stan, and K. Skadron.
“Dual-Data Rate Transpose-Memory Architecture Improves the Performance, Power and Area of Signal-Processing Systems.” Journal of Signal Processing Systems, Springer, published online Nov. 2016. DOI 10.1007/s11265-016-1199-1. (automata processing, heterogeneous architecture, accelerators, association rule mining) K. Wang, E. Sadredini, and K. Skadron. "Sequential Pattern Mining with the Micron Automata Processor." In Proceedings of the ACM International Conference on Computing Frontiers, May 2016. Best paper award! (pdf) (benchmark suites, accelerators) G. Juckeland, W. Brantley, S. Chandrasekaran, B. Chapman, S. Che, M. Colgrove, H. Feng, A. Grund, R. Henschel, W-M. W. Hwu, H. Li, M. S. Mueller, M. Perimov, P. Shelepugin, K. Skadron, J. Stratton, A. Titov, K. Wang, M. van Waveren, B. Whitney, S. Wienke, R. Xu, and K. Kumaran. "SPEC ACCEL: A Standard Application Suite for Measuring Hardware Accelerator Performance." In Proceedings of the Fifth International Workshop on Performance Modeling, Benchmarking and Simulation of High Performance Computer Systems (PMBS14), in conjunction with SC, Nov. 2014. (heterogeneous architectures, reconfigurable logic, design space exploration) L. Wang and K. Skadron. "Lumos+: Rapid, Pre-RTL Design Space Exploration on Accelerator-Rich Heterogeneous Architectures with Reconfigurable Logic." In Proceedings of the IEEE International Conference on Computer Design (ICCD), Oct. 2016. (pdf) (power delivery, thermal) K. Wang, K. Skadron, and M. R. Stan. “Closing the Power Delivery/Heat Removal Cycle for Heterogeneous Multi-Scale Systems.” In Proceedings of the 22nd International Workshop on Thermal Investigations of ICs (THERMINIC), Sept. 2016. (pdf) (power delivery, transient voltage noise) R. Zhang, K. Mazumder, B. H. Meyer, K. Wang, K. Skadron, and M. R. Stan. "Transient Voltage Noise in Charge-Recycled Power Delivery Networks for Many-Layer 3D-IC."
In Proceedings of the ACM/IEEE International Symposium on Low Power Electronics and Design (ISLPED), July 2015. (pdf) (GPGPU, reliability, redundant multithreading) J. Wadden, A. Lyashevsky, S. Gurumurthi, V. Sridharan, and K. Skadron. "Real-World Design and Evaluation of Compiler-Managed GPU Redundant Multithreading." In Proceedings of the ACM/IEEE International Symposium on Computer Architecture (ISCA), June 2014. (pdf) (power delivery, transient voltage noise) K. Wang, R. Zhang, B. H. Meyer, M. R. Stan, and K. Skadron. "Managing C4 Placement for Transient Voltage Noise Minimization." In Proceedings of the ACM/IEEE Conference on Design Automation (DAC), June 2014 (please download from this listing). (power delivery, IR drop) K. Wang, R. Zhang, B. H. Meyer, K. Skadron, and M. R. Stan. "Walking Pads: Fast Power-Supply Pad-Placement Optimization." In Proceedings of the ACM/IEEE Asia and South Pacific Design Automation Conference (ASP-DAC), Jan. 2014. Best paper candidate. (preprint pdf) (heterogeneous architecture, dark silicon) L. Wang and K. Skadron. "Implications of the Power Wall: Dim Cores and Reconfigurable Logic." IEEE Micro special issue on Dark Silicon, 33(5):40-49, Sept.-Oct. 2013. DOI 10.1109/MM.2013.74. (preprint pdf \| Lumos software) (gpgpu, accelerators, heterogeneous architecture, memory, performance portability) S. Che, J. Meng, and K. Skadron. "Dymaxion++: A Directive-based API to Optimize Data Layout and Memory Mapping for Heterogeneous Systems." In Proceedings of the Fourth International Workshop on Accelerators and Hybrid Exascale Systems, in conjunction with IPDPS, May 2014. (pdf) (performance prediction, gpgpu, accelerators, manycore, heterogeneous architecture, benchmarks) S. Che and K. Skadron. "BenchFriend: Correlating the Performance of GPU Benchmarks." Sage International Journal of High Performance Computing Applications (IJHPCA), 28(2):236-248, May 2014. (preprint pdf) (gpgpu, accelerators, manycore, heterogeneous architecture, benchmarks) S.
Che. B. M. Beckmann, S. K. Reinhardt, and K. Skadron, "Pannotia: A Characterization of GPGPU Graph Applications, In Proceedings of the IEEE International Symposium on Workload Characterization (IISWC), Sept. 2013. (pdf) (gpgpu, accelerators, manycore, heterogeneous architecture, programming models and frameworks) L. Szafaryn, T. Gamblin, B. R. de Supinski, and K. Skadron. "Trellis: Portability Across Architectures with a High-level Framework." Elsevier Journal of Parallel and Distributed Computing, 73(10):1400-13, Oct. 2013. (First published online July, 2013.) DOI 10.1016/j.jpdc.2013.07.001. (preprint pdf) (reliability) L. Szafaryn, B. H. Meyer, and K. Skadron. "Evaluating the Overheads of Soft Error Protection Mechanisms in the Context of Multi-bit Errors at the Scope of a Processor Core." IEEE Micro special issue on Reliability Aware Design, 33(4):56-65, July-Aug. 2013. DOI 10.1109/MM.2013.68. (preprint pdf) (CPU-GPU load balancing) M. Boyer, K. Skadron, S. Che, N. Jayasena. "Load Balancing in a Changing World: Dealing with Heterogeneity and Performance Variability. In Proceedings of the ACM Conference on Computing Frontiers, May 2013. (pdf) (thermal) K. Sankaranarayanan, B. H. Meyer, W. Huang, R. J. Ribando, H. Haj-Hajiri, M. R. Stan, and K. Skadron. "Architectural Implications of Spatial Thermal Filtering." Elsevier Integration, the VLSI Journal, 46(1):44-56, Jan. 2013.DOI 10.1016/j.vlsi.2011.12.002 (preprint pdf) (floorplanning, thermal, power delivery) G. Faust, R. Zhang, K. Skadron, M.R. Stan, and B. Meyer. "ArchFP: Rapid Prototyping of pre-RTL Floorplans." In Proceedings of the IFIP/IEEE International Conference on Very Large Scale Integration (VLSI-SoC), Oct. 2012. (pdf \| ArchFP software) (SIMD, cache, branch divergence) J. Meng, J. W. Sheaffer, and K. Skadron. "Robust SIMD: Dynamically Adapted SIMD Width and Multi-Threading Depth." In Proceedings of the IEEE International Parallel & Distributed Processing Symposium (IPDPS), May 2012. 
(pdf) (datacenter co-scheduling, cache) J. Mars, L. Tang, R. Hundt, K. Skadron, and M. L. Soffa. "BubbleUp: Increasing Sensible Co-locations for Improved Utilization in Modern Warehouse Scale Computers." In Proceedings of the ACM/IEEE International Symposium on Microarchitecture (MICRO), Dec. 2011. (pdf) Also in IEEE Micro "Top Picks from 2011 Architecture Conferences." (gpgpu, accelerators, heterogeneous architecture, memory, performance portability) S. Che, J. W. Sheaffer, and K. Skadron. "Dymaxion: Optimizing Memory Access Patterns for Heterogeneous Systems." In Proceedings of the ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis (SC), Nov. 2011. (pdf) (reliability, fault tolerance, redundant execution) B. Meyer, B. Calhoun, J. Lach, and K. Skadron. "Cost-effective Safety and Fault Localization using Distributed Temporal Redundancy." In Proceedings of the ACM/IEEE International Conference on Compilers, Architectures, and Synthesis for Embedded Systems (CASES), Oct. 2011. (pdf) (thermal, power, scaling, dark silicon) W. Huang, K. Rajamani, M. R. Stan, and K. Skadron. "Scaling with Design Constraints -- Predicting the Future of Big Chips." IEEE Micro special issue on Big Chips, 31(4):16-29, July/Aug. 2011, DOI 10.1109/MM.2011.42. (preprint pdf) (manycore, GPU architecture, power) M. Gebhart, D. R. Johnson, D. Tarjan, S. W. Keckler, W. J. Dally, E. Lindholm, and K. Skadron. "Energy-efficient Mechanisms for Managing Thread Context in Throughput Processors." In Proceedings of the ACM/IEEE International Symposium on Computer Architecture (ISCA), June 2011. (pdf) (gpgpu, accelerators, manycore, heterogeneous architecture, benchmarks) S. Che, J. W. Sheaffer, M. Boyer, L. G. Szafaryn, L. Wang, and K. Skadron. "A Characterization of the Rodinia Benchmark Suite with Comparison to Contemporary CMP Workloads." In Proceedings of the IEEE International Symposium on Workload Characterization (IISWC), Dec. 2010. 
(pdf) (manycore, dynamic cores) M. Boyer, D. Tarjan, K. Skadron. "Federation: Boosting Per-Thread Performance of Throughput-Oriented Manycore Architectures." ACM Transactions on Architecture and Code Optimization (TACO), 7(4):1-38, Dec. 2010, DOI 10.1145/1880043.1880046. (preprint pdf) (manycore, cache, coherence) D. Tarjan and K. Skadron. "The Sharing Tracker: Using Ideas from Cache Coherence Hardware to Reduce Off-Chip Memory Traffic with Non-Coherent Caches." In Proceedings of theACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis (SC), Nov. 2010. (pdf) (SIMD, cache, branch divergence) J. Meng, D. Tarjan, and K. Skadron. "Dynamic Warp Subdivision for Integrated Branch and Memory Divergence Tolerance." In Proceedings of the 37th ACM/IEEE International Symposium on Computer Architecture, June 2010. (pdf) (manycore, cache, coherence) J. Meng and K. Skadron "Avoiding Cache Thrashing due to Private Data Placement in Last-Level Cache for Manycore Scaling." In Proceedings of the IEEE International Conference on Computer Design (ICCD), pp. 282-88, Oct. 2009. (pdf) (power, real-time, control theory) T. Horvath and K. Skadron. "Multi-mode Energy Management for Multi-tier Server Clusters." In Proceedings of the ACM/IEEE/IFIP International Conference on Parallel Architectures and Compilation Techniques (PACT), pp. 270-79, Oct. 2008. (preprint pdf) (thermal) W. Huang, K. Sankaranarayanan, K. Skadron, R. J. Ribando, and M. R. Stan. "Accurate, Pre-RTL Temperature-Aware Processor Design Using a Parameterized, Geometric Thermal Model." IEEE Transactions on Computers, 57(9):1277-88, Sept. 2008, DOI 10.1109/TC.2008.64. (pdf) (gpgpu, fpga, accelerators, heterogeneous architecture) S. Che, J. Li, J. W. Sheaffer, K. Skadron, and J. Lach. "Accelerating Compute Intensive Applications with GPUs and FPGAs." In Proceedings of the IEEE Symposium on Application Specific Processors (SASP), pp. 101-07, June 2008. (pdf) (simulation methodology) H. 
Cook and K. Skadron. "Predictive Design Space Exploration Using Genetically Programmed Response Surfaces." In Proceedings of the ACM/IEEE Conference on Design Automation (DAC), June 2008. (pdf) (gpgpu) J. Nickolls, I. Buck, M. Garland, K. Skadron. "Scalable Parallel Programming with CUDA." ACM Queue, 6(2):40-53, Mar.-Apr. 2008. DOI 10.1145/1365490.1365500 (pdf) (parameter variations, multicore, thermal, power, leakage) E. Humenay, D. Tarjan, and K. Skadron. "Impact of Process Variations on Multicore Performance Symmetry." In Proceedings of the 2007 Conference on Design, Automation and Test in Europe (DATE), pp. 1653-58, Apr. 2007. (pdf) (reliability, thermal) Z. Lu, W. Huang, M. Stan, K. Skadron, and J. Lach. "Interconnect Lifetime Prediction for Reliability-Aware Systems." IEEE Transactions on VLSI Systems, 15(2):159-72, Feb. 2007. (pdf) (branch prediction, trace cache, power) M. Co, D. A.B. Weikle, and K. Skadron. "Evaluating Trace Cache Energy Efficiency." ACM Transactions on Architecture and Code Optimization (TACO), 3(4):450-76, Dec. 2006. (Abstract \| pdf) (graphics architecture, reliability) J. W. Sheaffer, D. P. Luebke, and K. Skadron. "The Visual Vulnerability Spectrum: Characterizing Architectural Vulnerability for Graphics Hardware." In Proceedings of Eurographics/ACM Graphics Hardware 2006 (GH), pp. 9-16, Sept. 2006. (pdf) (thermal) S. W. Chung and K. Skadron. "Using on-Chip Event Counters for High-Resolution, Real-Time Temperature Measurements." In Proceedings of the IEEE/ASME Tenth Intersociety Conference on Thermal and Thermomechanical Phenomena in Electronic Systems (ITHERM), June 2006. (pdf) (thermal) W. Huang, S. Ghosh, S. Velusamy, K. Sankaranarayanan, K. Skadron, and M. R. Stan. “HotSpot: A Compact Thermal Modeling Methodology for Early-Stage VLSI Design.” IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 14(5):501-513, May 2006. (pdf) Google Scholar Classic Paper in in Computer Hardware Design for 2006 (3rd most cited paper). 
(multi-core architecture, power, thermal) Y. Li, B. C. Lee, D. Brooks, Z. Hu, and K. Skadron. "CMP Design Space Exploration Subject to Physical Constraints." In Proceedings of the Twelfth IEEE International Symposium on High Performance Computer Architecture (HPCA), pp. 15-26, Feb. 2006. (pdf) (branch prediction) D. Tarjan and K. Skadron. "Merging Path and Gshare Indexing in Perceptron Branch Prediction." ACM Transactions on Architecture and Code Optimization, Sept. 2005, 2(3):280-300. (pdf) (thermal, security) P. Dadvar and K. Skadron. "Potential Thermal Security Risks." In Proceedings of the IEEE Semiconductor Thermal Measurement, Modeling, and Management Symposium (Semi-Therm 21), pp. 229-34, Mar. 2005. (pdf) (thermal) K. Skadron, K. Sankaranarayanan, S. Velusamy, D. Tarjan, M.R. Stan, and W. Huang. "Temperature-Aware Microarchitecture: Modeling and Implementation." ACM Transactions on Architecture and Code Optimization, 1(1):94-125, Mar. 2004. (pdf) (leakage power) Y. Li, D. Parikh, Y. Zhang, K. Sankaranarayanan, M. R. Stan, and K. Skadron. "State-Preserving vs. Non-State-Preserving Leakage Control in Caches." In Proceedings of the 2004 Design, Automation and Test in Europe (DATE) Conference, pp. 22-27, Feb. 2004. (pdf) [HotLeakage software home page] (power, real-time) V. Sharma, A. Thomas, T. Abdelzaher, Z. Lu, and K. Skadron. "Power-Aware QoS Management on Web Servers." In Proceedings of the 24th International Real-Time Systems Symposium, pp. 63-72, Dec. 2003. (pdf) (Best student paper!) (branch prediction) Z. Lu, J. Lach, M. Stan, and K. Skadron. "Alloyed Branch History: Combining Global and Local Branch History for Robust Performance." International Journal of Parallel Programming, Kluwer, 31(2):137-77, Apr. 2003. (pdf \| Abstract) (write buffers) K. Skadron and D.W. Clark. "Design Issues and Tradeoffs for Write Buffers." In Proceedings of the Third International Symposium on High-Performance Computer Architecture, pp. 144-55, February 1997. 
(postscript \| pdf \| abstract) Complete list of Skadron's publications through 2017; for a complete list of my publications, please see my Google Scholar page.

Software Releases
PIMeval, PIMbench, and PIM API - a new simulator, benchmark suite, and programming API for DRAM processing in memory (PIM)
AutomataZoo - a new benchmark suite for automata processing (replaces ANMLZoo)
MNRL - a specification language for finite state machines
MNCaRT - an end-to-end research toolkit for automata processing across multiple processing platforms
ATR - Automata-to-routing - an adaptation of VPR for automata processing on spatial architectures
VASim - an engine for cross-platform automata processing, supporting CPU, FPGA, Micron AP, and coming soon, GPU
iNFAnt2 - a GPU NFA execution engine
ANMLZoo - a benchmark suite for automata processing
RAPID - a C-like programming language for inexact pattern matching
VoltSpot software - now version 2!
HotSpot software - now version 6!
HotLeakage software
ArchFP software
Rodinia Benchmarks Wiki - now version 3!
Lumos
MV5 simulator - a modification of M5 supporting SIMD and multithreaded architectures
DelayDVS and ClusterControlWare - software for power management in multi-tier servers (based on this PACT'08 paper)
Genetically Programmed Response Surfaces Toolkit (based on this DAC'08 paper)
Qsilver
AmberBBV, for generating basic block vectors (e.g. for use with SimPoint) from Amber traces (part of the CHUD tools)
MRRL tools
HydraScalar

Archival links to miscellaneous materials:
PIM2: Combined Tutorial and Workshop on Processing in Memory, in conjunction with MICRO-55
Birds-of-a-feather session at SC'09 on "Benchmark Suite Construction for Multicore and Accelerator Architectures" - slides
Birds-of-a-feather session at SC'09 on "The Art of Performance Tuning for CUDA and Manycore Architectures" with Paulius Micikevicius (NVIDIA) and David Tarjan (NVIDIA Research) - slides
Tutorial on NVIDIA GPU programming (CUDA), with David Luebke (NVIDIA Research), Michael Garland (NVIDIA Research), and John Owens (UC Davis), at ASPLOS-XIII, Seattle, WA, Mar. 2008.
Sept. 2007 presentation to National Research Council CSTB study on "Sustaining Growth in Computing Performance" (slides)
Workshop on Temperature Aware Computer Systems
Tutorial on Thermal Issues for Temperature-Aware Computer Systems, with David Brooks (Harvard), Antonio Gonzalez (UPC Barcelona and Intel Barcelona), Lev Finkelstein (Intel Haifa), and Mircea R. Stan (Univ. of Virginia), at ISCA-31, Munich, Germany, June 2004.
Tutorial on Power-Aware Design for High-Performance Processors, with José González (Intel Barcelona), at HPCA-10, Madrid, Spain, Feb. 2004.
Notes for how to be a successful conference publications chair; also a template (courtesy of Martin Schulz) for a receipt to send for extra-page charges
A brief summary of my advising philosophy (also see Good advice on how to succeed as a graduate student from Christos Kozyrakis's website)
Position papers & final report, 2001 NSF Workshop on Computer Performance Evaluation, co-organized with Margaret Martonosi. The panel's recommendations also appear in an article in the Aug. 2003 issue of IEEE Computer.
Undergraduate work on checkpointing and urban decentralization.

Last updated: Mar. 2025
©1999-2025, Kevin Skadron
Thanks to Joseph Calandrino for help with the design of this website.
