RESEARCH INTERESTS
Keywords: Data-driven decision making, Machine learning, Performance Analytics, High Performance Computing (HPC), Distributed Systems, Cycle Sharing Systems
Others: Scientific workflow applications, Fault-tolerance, Power-Aware Resilience.
Given that most applications pay for allocations on the HPC systems (aka supercomputers), their efficient utilization of the underlying systems is essential. Also, the difference in performance between a slow and a fast variant of an application could mean a more fine-grained scientific discovery attained faster. However, the complexity of many- and multi-core machines with heterogeneous architectures (e.g., dynamic branch prediction, prefetching, out-of-order scheduling, CPU and GPU units on the same node) makes it challenging for applications to scale up and for developers to pinpoint the cause of the performance bottlenecks.
Keeping that in mind, the goal of my research group--"Scalability"--is to help scientists efficiently utilize the power of HPC. My research group develops performance measurement tools, analysis methodologies, and novel visualizations for users (both application scientists and HPC management) to quickly identify the causes of the performance and scalability issues of applications running on these systems. We develop new machine learning techniques to understand how factors such as network topology, source code structure, and machine models impact the performance of applications as well as the utilization of the HPC systems under various power, performance, and resilience constraints.
Collaborators (partial list):
Khaled Ibrahim (LBNL)
Yang Liu (LBNL)
Jae-Seung Yeom (LLNL)
Jayaraman J. Thiagarajan (LLNL)
Barry Rountree (LLNL)
Aniruddha Marathe (LLNL)
Kathryn Mohror (LLNL)
Tapasya Patki (LLNL)
Kerstin Kleese Van Dam (BNL)
Line Pouchard (BNL)
Bogdan Nicolae (ANL)
Rob Ross (ANL)
Past Members:
Tarek Ramadan (M.Sc., Oracle)
Arunavo Dey (M.Sc.)
Russell Hernandez Ruiz (B.Sc., TxState)
Ethan Greene (B.Sc., TxState)
Holland Schutte (B.Sc., WWU)
Gian-Carlo DeFazio (B.Sc., currently at LLNL)
Nathan Pinnow (M.Sc., currently at LLNL)
Nicholas Majeske (M.Sc., currently Ph.D. student at Indiana University)
Anna Zivkovik (B.Sc., WWU)
Jack Stratton (B.Sc., WWU)
David Smith (B.Sc., WWU)
Tony Dinh (B.Sc., WWU)
Trevor Marcus (B.Sc., WWU)
Chloe Dawson (B.Sc., WWU)
Alexis Ayala (B.Sc., graduate student at WWU)
Quentin Jensen (B.Sc., graduate student at WWU)
Philip Wu Liang (B.Sc., WWU)
Forest Sweeney (B.Sc., WWU)
Cody Pragner (B.Sc., WWU)
Open-source Software Releases:
- libNVCD: An easy-to-use performance measurement and analysis tool for NVIDIA-based GPUs. The latest public, open-source release of libNVCD was version 1.0 in September 2022.
- Dashing: An interpretable machine learning toolkit that I developed for HPC performance analysis. The latest public, open-source release of Dashing was version 1.0 on Aug 4, 2020.
- GPTune: An online autotuning framework for suggesting optimal execution parameters to users. We integrated Dashing's importance analysis and visualization capabilities into GPTune. https://gptune.lbl.gov/
- SCR: Scalable Checkpoint/Restart for MPI. The project won an R&D 100 Award in 2019. The latest public release of SCR was version 2.0.0 on March 28, 2019.
- Gyan: Performance measurement tool for MPI implementations. The latest public, open-source release of Gyan was version 1.0 on May 7, 2014.
Open-source Data Releases:
- On-node scaling data on HPC systems. 2019. DOI: 10.5281/zenodo.4315003.
- Performance characterization data for AMReX applications developed by the DOE Exascale Computing Project (ECP). 2020. DOI: 10.5281/zenodo.3403037.
Research Projects
ECRP: PerfGen: Synthesizing Performance using GenAI
The PerfGen project addresses the critical need for extensive performance data in HPC environments, where traditional data collection is often time-consuming and resource-intensive. By developing generative AI methods, specifically using GAN-based and LLM-based approaches, PerfGen synthesizes high-fidelity performance data, with LLM-based methods showing superior performance. The framework also introduces a new dissimilarity metric to evaluate the quality of the generated data, ensuring it supports accurate and effective downstream machine learning tasks. This innovative approach enables scalable and efficient performance optimization in HPC systems.
ECRP: Performance-in-a-Graph (PinG)
The performance analytics domain in High Performance Computing (HPC) uses tabular data to solve regression problems, such as predicting execution time. Existing machine learning (ML) techniques leverage correlations among features in tabular datasets but do not directly leverage relationships between samples. Moreover, since high-quality embeddings from raw features improve the fidelity of downstream predictive models, existing methods rely on extensive feature engineering and pre-processing steps, costing time and manual effort. To fill these two gaps, we propose transforming tabular performance data into graphs to leverage advances in Graph Neural Network (GNN)-based techniques for capturing complex relationships between features and samples. In contrast to other ML application domains, such as social networks, the graph is not given; instead, we need to build it. To this end, we propose graph-building methods where nodes represent samples and edges are inferred automatically and iteratively based on the similarity between the features of the samples. We evaluate the effectiveness of the embeddings generated by GNNs based on how well they make even a simple feed-forward neural network perform on regression tasks compared to other state-of-the-art representation learning techniques. Our evaluation demonstrates that, even with up to 25% randomly missing values in each dataset, our method outperforms commonly used graph-based and Deep Neural Network (DNN)-based approaches and achieves up to 61.67% and 78.56% improvement in MSE loss over the DNN baseline for HPC and machine learning datasets, respectively.
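As a rough illustration of the graph-building idea (nodes are samples, edges follow feature similarity), the sketch below connects each sample to its k most similar samples using cosine similarity on standardized features. This is a simplified, single-pass stand-in and not the project's actual method, which infers edges iteratively; the function name `build_knn_graph` and the parameter `k` are hypothetical.

```python
import numpy as np

def build_knn_graph(X, k=3):
    """Return directed edges (i, j) linking each sample i to its k most similar samples j.

    Assumption: similarity is cosine similarity on standardized features.
    """
    X = np.asarray(X, dtype=float)
    # Standardize each feature so that no single column dominates the similarity.
    std = X.std(axis=0)
    std[std == 0] = 1.0
    Z = (X - X.mean(axis=0)) / std
    # Normalize rows to unit length; the dot product is then the cosine similarity.
    norms = np.linalg.norm(Z, axis=1, keepdims=True)
    norms[norms == 0] = 1.0
    U = Z / norms
    sim = U @ U.T
    np.fill_diagonal(sim, -np.inf)  # exclude self-loops
    # For each row, take the indices of the k largest similarities.
    neighbors = np.argsort(-sim, axis=1)[:, :k]
    return [(i, int(j)) for i in range(len(X)) for j in neighbors[i]]

# Four samples (rows) with two features each, forming two obvious groups.
samples = [[1.0, 10.0], [1.1, 10.2], [5.0, 50.0], [5.1, 49.0]]
edges = build_knn_graph(samples, k=1)
```

The resulting edge list could then be handed to a GNN library as the graph structure, with the original feature rows as node attributes.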
Comparative Performance analysis
My research developed a principled approach for comparing performance between applications. This methodology was applied to proxy application validation, which is important for DOE's co-design centers. This project resulted in many publications and open-source software tools.
Power-Aware Resilience
This project developed several algorithms for shifting power in an I/O-aware manner to other applications in an HPC system to improve performance. Since the I/O phases of applications use less power, moving the unused power to applications or processes that are crunching numbers accelerates their computation.
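The basic idea can be sketched as follows, assuming a fixed per-job power cap, a lower cap that is sufficient during I/O phases, and an even split of the freed power among compute-phase jobs. This is an illustrative toy policy rather than one of the project's algorithms; `shift_power`, `per_job_cap`, and `io_cap` are hypothetical names.

```python
def shift_power(phases, per_job_cap, io_cap):
    """Reassign power (in watts) from jobs in an I/O phase to jobs in a compute phase.

    phases: dict mapping job name -> "io" or "compute".
    Returns a dict mapping job name -> power allocation.
    The total allocation always equals per_job_cap * number of jobs.
    """
    io_jobs = [j for j, p in phases.items() if p == "io"]
    compute_jobs = [j for j, p in phases.items() if p == "compute"]
    alloc = {j: per_job_cap for j in phases}
    # Nothing to shift unless there is both a donor and a recipient.
    if not io_jobs or not compute_jobs:
        return alloc
    freed = (per_job_cap - io_cap) * len(io_jobs)
    bonus = freed / len(compute_jobs)
    for j in io_jobs:
        alloc[j] = io_cap
    for j in compute_jobs:
        alloc[j] = per_job_cap + bonus
    return alloc

allocation = shift_power({"a": "io", "b": "compute", "c": "compute"}, 100.0, 60.0)
```

A real policy would also have to respect hardware limits on per-node power and react as jobs move between phases.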
Fractal: Few-Shot Transfer Learning for Performance
Few-shot transfer learning and generative AI significantly enhance the prediction of relative performance in HPC environments. By using a few samples from the target application, few-shot learning adapts the source model to improve generalizability, while generative AI synthesizes performance samples to mitigate data scarcity, ensuring accurate and efficient knowledge transfer across different platforms.
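One simple way to picture the few-shot adaptation step is to fit a linear model on the few target samples while penalizing deviation from the source model's weights. The sketch below uses this regularization-toward-source technique purely as an illustration; it is not the project's model, and `adapt_to_target` and `lam` are hypothetical names.

```python
import numpy as np

def adapt_to_target(w_source, X_target, y_target, lam=1.0):
    """Adapt source weights using a few target samples.

    Solves: min_w ||X w - y||^2 + lam * ||w - w_source||^2
    Closed form: w = (X^T X + lam I)^{-1} (X^T y + lam w_source)
    A larger lam keeps the result closer to the source model.
    """
    X = np.asarray(X_target, dtype=float)
    y = np.asarray(y_target, dtype=float)
    w_src = np.asarray(w_source, dtype=float)
    d = X.shape[1]
    A = X.T @ X + lam * np.eye(d)
    b = X.T @ y + lam * w_src
    return np.linalg.solve(A, b)

# Two target samples and a source model with zero weights.
w_adapted = adapt_to_target([0.0, 0.0], [[1.0, 0.0], [0.0, 1.0]], [2.0, 4.0], lam=1.0)
```

With very few target samples, `lam` controls how much the adapted model trusts the source application versus the new measurements.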
Performance Characterization
Performance characterization is crucial as it identifies the key interactions between applications and hardware, captured by performance counters, which are essential for predicting execution times and guiding optimization efforts. It serves as a prerequisite to performance optimization and autotuning by providing insights into bottlenecks and resource usage patterns, allowing for informed adjustments that enhance overall system performance. Without this foundational understanding, tuning efforts are less effective and may not fully leverage the potential of HPC systems.
Performance-Aware Application Development
My group is developing machine learning models to predict the impact of a code change on application performance. This project aims to help application developers assess how their proposed code changes will impact that application's performance before the code is executed. We envision that the performance information collected over time from nightly tests provided as feedback to the model will significantly improve its accuracy.
MPI and MPI_T
In the past, I contributed to designing a fault-tolerant interface for MPI. I also developed performance measurement tools for assessing the performance of the MPI_T interface.
RECUP: Perf. Reproducibility
Studying performance reproducibility is vital in the era of heterogeneous supercomputing due to increased performance variation and reduced consistency across runs. Understanding how factors like network traffic, power limits, concurrency tuning, and job interference impact performance is essential for achieving both optimal and reproducible outcomes in HPC environments.
Data-driven decision making
This new direction of my research investigates the viability of leveraging automated data-driven analysis in decision making for various domains, including job scheduling, resource management, health care, transportation planning, and more. Several collaborative research opportunities are currently being pursued in my lab, and I am looking for students interested in optimization to join my team.
Proxy Application Development and Validation
Proxy applications are written to represent subsets of the performance behaviors of larger, more complex applications that often have distribution restrictions. In this research, we developed a systematic methodology for quantitatively comparing how well proxies match their parents.
Scalable Checkpoint/Restart
My Ph.D. thesis built scalable checkpoint/restart systems for both high-throughput (Grid using Condor) and high-performance computing environments. A significant part of my thesis contributed to the Scalable Checkpoint/Restart (SCR) framework, which won an R&D 100 Award in 2019.