Publications


VATS: Exploiting Implicit Authority in Error-Path Injection via Systematic Mutation

Spotlight

workshop

Harshil Patel, Kunal Pai

Second Workshop on Agents in the Wild: Safety, Security, and Beyond, ICML 2026

This paper introduces VATS, a framework demonstrating that autonomous agents are highly vulnerable to "error-path injections": adversarial payloads disguised as tool error messages that exploit the Model Context Protocol (MCP) to bypass safety heuristics. Across major frontier models, these injections achieve three times the success rate of standard indirect prompt injections.

Artificial Intelligence, Large Language Models (LLMs), AI Safety

View Pre-Print

Toward Reproducible and Standardized Computer Architecture Simulation with gem5

conference

Kunal Pai, Harshil Patel, Erin Le, Noah Krim, Mahyar Samani, Bobby R. Bruce, Jason Lowe-Power

IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS) 2026

To address the inconsistencies in simulation-based research, this work enhances the gem5 ecosystem by standardizing disk-image creation, introducing a flexible class-based exit event system for better guest-host communication, and implementing native tools like Suites and MultiSim to streamline and stabilize complex multi-workload workflows.

Computer Architecture, gem5, Simulation, Reproducibility

View Publication View Pre-Print View Artifact

NAAMSE: Framework for Evolutionary Security Evaluation of Agents

workshop

Kunal Pai, Parth Shah, Harshil Patel

ICLR 2026 Agents in the Wild: Safety, Security, and Beyond Workshop

NAAMSE is an evolutionary framework that automates AI agent security testing by using a feedback-driven optimization process to mutate prompts and uncover high-severity vulnerabilities that manual or static benchmarks often miss.

Artificial Intelligence, Large Language Models (LLMs), AI Safety, Adversarial Testing

View Publication View Source View Project Page

Implications of Full-System Modeling for Superconducting Architectures

workshop

Kunal Pai, Mahyar Samani, Anusheel Nand, Jason Lowe-Power

Workshops of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC Workshops '25)

As Moore's Law slows, superconducting electronics offer the potential for ultra-low-power, high-speed computation. This paper presents the first full-system superconducting modeling in gem5, including cryogenic and superconducting cores, caches, and interconnects. Our results show that superconducting cores and caches can yield up to 24× speedup for compute-intensive workloads, but memory-intensive applications remain bottlenecked by room-temperature DRAM. This makes superconducting technology better suited to domain-specific accelerators than to general-purpose computing, with performance dependent on workload memory access patterns and data widths.

Computer Architecture, Superconducting, Cryogenic Computing, gem5

View Publication Slides Talk

HASHIRU: Hierarchical Agent System for Hybrid Intelligent Resource Utilization

preprint

Kunal Pai, Parth Shah, Harshil Patel

arXiv preprint

To support resource-efficient multi-agent reasoning, we introduce HASHIRU, a hierarchical agent system that dynamically instantiates specialized agents under cost and memory constraints. HASHIRU combines hybrid LLM usage, autonomous API/tool creation, and a novel economic model for agent hiring/firing, outperforming larger models like Gemini 2.0 Flash on complex reasoning and safety tasks.

Artificial Intelligence, Large Language Models (LLMs), Multi-Agent Systems

View Pre-Print View Source

CoDocBench: A Dataset for Code-Documentation Alignment in Software Maintenance

conference

Kunal Pai, Premkumar Devanbu, Toufique Ahmed

International Conference on Mining Software Repositories (MSR) 2025: Data and Tool Showcase Track

Understanding and implementing code changes is a key aspect of software maintenance. To support this, we introduce a new dataset of coupled changes to code and documentation mined from high-quality GitHub projects, where each sample represents a single commit with simultaneous updates to code and docstrings. This dataset enables training and evaluation on realistic, change-related tasks, which remain challenging for current models like Llama 3.1 405B and Mixtral 8×22B.

Software Engineering, GitHub Mining, Large Language Models (LLMs)

View Source View Publication View Pre-Print

Calibration and Correctness of Language Models for Code

conference

Claudio Spiess, David Gros, Kunal Suresh Pai, Michael Pradel, Md Rafiqul Islam Rabin, Amin Alipour, Susmit Jha, Premkumar Devanbu, Toufique Ahmed

International Conference on Software Engineering (ICSE) 2025

Machine learning models often produce incorrect outputs, so reliable confidence measures are essential for judging whether an output can be trusted. This paper introduces a framework to evaluate and improve the calibration of code-generating models. It finds that these models are generally poorly calibrated out of the box, but that methods like Platt scaling can improve their calibration, supporting better decision-making in software engineering.

Software Engineering, Machine Learning, Naturalness of Software

View Publication View Pre-Print

Potential and Limitation of High-Frequency Cores and Caches

poster

Kunal Pai, Anusheel Nand, Jason Lowe-Power

ModSim 2024: Workshop on Modeling & Simulation of Systems and Applications

The poster presentation explores the potential and limitations of high-frequency in-order and out-of-order cores and caches in modern processors, highlighting the trade-offs between speedups and bandwidth.

Computer Architecture, Cryogenic Computing, Superconducting

View Poster View Presentation View Pre-Print

Automatic semantic augmentation of language model prompts (for code summarization)

conference

Toufique Ahmed, Kunal Suresh Pai, Premkumar Devanbu, Earl T. Barr

International Conference on Software Engineering (ICSE) 2024

Adding explicit semantic facts to the prompts given to Large Language Models improves their performance on code summarization tasks. Improvements exceed 2 BLEU and, in some cases, surpass 30 BLEU, demonstrating the effectiveness of this approach in enhancing code analysis and extraction of essential information.

Software Engineering, Machine Learning, Naturalness of Software

View Publication

Validating Hardware and SimPoints with gem5: A RISC-V Board Case Study

poster

Kunal Pai, Zhantong Qiu, Jason Lowe-Power

gem5 Workshop at International Symposium on Computer Architecture (ISCA) 2023

The poster discusses the development of a RISC-V board model (RISCVMatched) in gem5, along with a methodology for fine-tuning gem5 configurations to closely match real-life systems, resulting in more accurate hardware validation and simulation capabilities.

Computer Architecture, gem5

View Publication

gem5 Vision

poster

Parth Shah, Kunal Pai, Harshil Patel, Arslan Ali

gem5 Workshop at International Symposium on Computer Architecture (ISCA) 2023

The gem5 Vision Project seeks to improve user-friendliness and accessibility by introducing advanced search functionality, comprehensive resource categorization, and expanded database support within the gem5 ecosystem for researchers and developers.

Computer Architecture, gem5

View Publication
