Papers

2025

Probabilistic Soundness Guarantees in LLM Reasoning Chains
Weiqiu You, Anton Xue, Shreya Havaldar, Delip Rao, Helen Jin, Chris Callison-Burch, Eric Wong
Instruction Following by Boosting Attention of Large Language Models
Vitoria Guardieiro, Adam Stein, Avishree Khare, Eric Wong
Blog Post + Source Code on GitHub
Benchmarking Misuse Mitigation Against Covert Adversaries
Davis Brown, Mahdi Sabbaghi, Luze Sun, Alexander Robey, George J. Pappas, Eric Wong, Hamed Hassani
Site + Source Code on GitHub
Probabilistic Stability Guarantees for Feature Attributions
Helen Jin, Anton Xue, Weiqiu You, Surbhi Goel, Eric Wong
Blog Post + Source Code on GitHub
The FIX Benchmark: Extracting Features Interpretable to eXperts
Helen Jin, Shreya Havaldar, Chaehyeon Kim, Anton Xue, Weiqiu You, Helen Qu, Marco Gatti, Daniel A. Hashimoto, Bhuvnesh Jain, Amin Madani, Masao Sako, Lyle Ungar, Eric Wong
Journal of Data-centric Machine Learning Research (DMLR), 2025
Site + Blog Post + Source Code on GitHub
Sum-of-Parts Models: Faithful Attributions for Groups of Features
Weiqiu You, Helen Qu, Marco Gatti, Bhuvnesh Jain, Eric Wong
International Conference on Machine Learning (ICML), 2025
Blog Post + Source Code on GitHub
DOLPHIN: A Programmable Framework for Scalable Neurosymbolic Learning
Aaditya Naik, Jason Liu, Claire Wang, Amish Sethi, Saikat Dutta, Mayur Naik, Eric Wong
International Conference on Machine Learning (ICML), 2025
Adaptively profiling models with task elicitation
Davis Brown, Prithvi Balehannina, Helen Jin, Shreya Havaldar, Hamed Hassani, Eric Wong
The Road to Generalizable Neuro-Symbolic Learning Should be Paved with Foundation Models
Adam Stein, Aaditya Naik, Neelay Velingker, Eric Wong
Avoiding Copyright Infringement via Machine Unlearning
Guangyao Dou, Zheyuan Liu, Qing Lyu, Kaize Ding, Eric Wong
Findings of the Association for Computational Linguistics (NAACL), 2025
Logicbreaks: A Framework for Understanding Subversion of Rule-based Inference
Anton Xue, Avishree Khare, Rajeev Alur, Surbhi Goel, Eric Wong
International Conference on Learning Representations (ICLR), 2025
Blog Post + Source Code on GitHub
SmoothLLM: Defending Large Language Models Against Jailbreaking Attacks
Alexander Robey, Eric Wong, Hamed Hassani, George J. Pappas
Transactions on Machine Learning Research (TMLR), 2025
Blog Post + Source Code on GitHub
Jailbreaking Black Box Large Language Models in Twenty Queries
Patrick Chao, Alexander Robey, Edgar Dobriban, Hamed Hassani, George J. Pappas, Eric Wong
3rd IEEE Conference on Secure and Trustworthy Machine Learning, 2025
Blog Post + Source Code on GitHub

2024

AR-Pro: Counterfactual Explanations for Anomaly Repair with Formal Properties
Xiayan Ji, Anton Xue, Eric Wong, Oleg Sokolsky, Insup Lee
Neural Information Processing Systems (NeurIPS), 2024
Source Code on GitHub
Crowd-sourced machine learning prediction of long COVID using data from the National COVID Cohort Collaborative
Timothy Bergquist, Johanna Loomba, Emily Pfaff, Fangfang Xia, Zixuan Zhao, Yitan Zhu, Elliot Mitchell, Biplab Bhattacharya, Gaurav Shetty, Tamanna Munia, Grant Delong, Adbul Tariq, Zachary Butzin-Dozier, Yunwen Ji, Haodong Li, Jeremy Coyle, Seraphina Shi, Rachael V. Philips, Andrew Mertens, Romain Pirracchio, Mark van der Laan, John M. Colford Jr., Alan Hubbard, Jifan Gao, Guanhua Chen, Neelay Velingker, Ziyang Li, Yinjun Wu, Adam Stein, Jiani Huang, Zongyu Dai, Qi Long, Mayur Naik, John Holmes, Danielle Mowery, Eric Wong, Ravi Parekh, Emily Getzen, Jake Hightower, Jennifer Blase
eBioMedicine
Data-Efficient Learning with Neural Programs
Alaia Solko-Breslin, Seewon Choi, Ziyang Li, Neelay Velingker, Rajeev Alur, Mayur Naik, Eric Wong
Neural Information Processing Systems (NeurIPS), 2024
Blog Post + Source Code on GitHub
Towards Compositionality in Concept Learning
Adam Stein, Aaditya Naik, Yinjun Wu, Mayur Naik, Eric Wong
International Conference on Machine Learning (ICML), 2024
Blog Post + Source Code on GitHub
DISCRET: Synthesizing Faithful Explanations For Treatment Effect Estimation
Yinjun Wu, Mayank Keoliya, Kan Chen, Neelay Velingker, Ziyang Li, Emily J Getzen, Qi Long, Mayur Naik, Ravi B Parikh, Eric Wong
International Conference on Machine Learning (ICML), 2024
JailbreakBench: An Open Robustness Benchmark for Jailbreaking Large Language Models
Patrick Chao, Edoardo Debenedetti, Alexander Robey, Maksym Andriushchenko, Francesco Croce, Vikash Sehwag, Edgar Dobriban, Nicolas Flammarion, George J. Pappas, Florian Tramer, Hamed Hassani, Eric Wong
Neural Information Processing Systems (NeurIPS), 2024
Site + Source Code on GitHub
Defending Large Language Models against Jailbreak Attacks via Semantic Smoothing
Jiabao Ji, Bairu Hou, Alexander Robey, George J. Pappas, Hamed Hassani, Yang Zhang, Eric Wong, Shiyu Chang
Evaluating Groups of Features via Consistency, Contiguity, and Stability
Chaehyeon Kim, Weiqiu You, Shreya Havaldar, Eric Wong
International Conference on Learning Representations (ICLR), 2024 Tiny Papers Track
SalUn: Empowering Machine Unlearning via Gradient-based Weight Saliency in Both Image Classification and Generation
Chongyu Fan, Jiancheng Liu, Yihua Zhang, Dennis Wei, Eric Wong, Sijia Liu
International Conference on Learning Representations (ICLR), 2024
Source Code on GitHub
Initialization Matters for Adversarial Transfer Learning
Andong Hua, Jindong Gu, Zhiyu Xue, Nicholas Carlini, Eric Wong, Yao Qin
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024
TorchQL: A Programming Framework for Integrity Constraints in Machine Learning
Aaditya Naik, Adam Stein, Yinjun Wu, Eric Wong, Mayur Naik
Object-oriented Programming, Systems, Languages, and Applications (OOPSLA), 2024
Source Code on GitHub

2023

Comparing Styles across Languages
Shreya Havaldar, Matthew Pressimone, Eric Wong, Lyle Ungar
Empirical Methods in Natural Language Processing (EMNLP), 2023
Stability Guarantees for Feature Attributions with Multiplicative Smoothing
Anton Xue, Rajeev Alur, Eric Wong
Neural Information Processing Systems (NeurIPS), 2023
Blog Post + Source Code on GitHub
TopEx: Topic-based Explanations for Model Comparison
Shreya Havaldar, Adam Stein, Eric Wong, Lyle Ungar
International Conference on Learning Representations (ICLR), 2023 Tiny Papers Track
Rectifying Group Irregularities in Explanations for Distribution Shift
Adam Stein, Yinjun Wu, Eric Wong, Mayur Naik
Do Machine Learning Models Learn Statistical Rules Inferred from Data?
Aaditya Naik, Yinjun Wu, Mayur Naik, Eric Wong
International Conference on Machine Learning (ICML), 2023
Blog Post + Source Code on GitHub
In-context Example Selection with Influences
Tai Nguyen, Eric Wong
Blog Post + Source Code on GitHub
Adversarial Prompting for Black Box Foundation Models
Natalie Maus*, Patrick Chao*, Eric Wong, Jacob Gardner
Blog Post + Source Code on GitHub
Faithful Chain-of-Thought Reasoning
Qing Lyu*, Shreya Havaldar*, Adam Stein*, Li Zhang, Delip Rao, Eric Wong, Marianna Apidianaki, Chris Callison-Burch
IJCNLP-AACL, 2023
Blog Post + Source Code on GitHub
A data-based perspective on transfer learning
Saachi Jain*, Hadi Salman*, Alaa Khaddaj*, Eric Wong, Sung Min Park, Aleksander Madry
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2023
Blog Post + Source Code on GitHub

2022

When does bias transfer in transfer learning
Hadi Salman*, Saachi Jain*, Andrew Ilyas*, Logan Engstrom*, Eric Wong, Aleksander Madry
Blog Post + Source Code on GitHub
Missingness bias in model debugging
Saachi Jain*, Hadi Salman*, Eric Wong, Pengchuan Zhang, Vibhav Vineet, Sai Vemprala, Aleksander Madry
International Conference on Learning Representations (ICLR), 2022
Blog Post + Source Code on GitHub
Certified patch robustness via smoothed vision transformers
Hadi Salman*, Saachi Jain*, Eric Wong*, Aleksander Madry
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2022
Blog Post + Source Code on GitHub
DeepSplit: Scalable verification of deep neural networks via operator splitting
Shaoru Chen*, Eric Wong*, J. Zico Kolter, Mahyar Fazlyab
IEEE Open Journal of Control Systems (OJCS), 2022

2021

Leveraging Sparse Linear Layers for Debuggable Deep Networks
Eric Wong*, Shibani Santurkar*, Aleksander Madry
International Conference on Machine Learning (ICML), 2021 Long Oral
Blog Post + Source Code on GitHub
Learning perturbation sets for robust machine learning
Eric Wong, J. Zico Kolter
International Conference on Learning Representations (ICLR), 2021
Blog Post + Source Code on GitHub

2020

Overfitting in adversarially robust deep learning
Leslie Rice*, Eric Wong*, J. Zico Kolter
International Conference on Machine Learning (ICML), 2020
Source Code on GitHub
Neural network virtual sensors for fuel injection quantities with provable performance specifications
Eric Wong, Tim Schneider, Joerg Schmitt, Frank R. Schmidt, J. Zico Kolter
IEEE Intelligent Vehicles Symposium (IV), 2020
Fast is better than free: revisiting adversarial training
Eric Wong*, Leslie Rice*, J. Zico Kolter
International Conference on Learning Representations (ICLR), 2020
Adversarial robustness against the union of multiple perturbation models
Pratyush Maini, Eric Wong, J. Zico Kolter
International Conference on Machine Learning (ICML), 2020
Source Code on GitHub

2019

Wasserstein adversarial examples
Eric Wong, Frank R. Schmidt, J. Zico Kolter
International Conference on Machine Learning (ICML), 2019

2018

Scaling provable adversarial defenses
Eric Wong, Frank R. Schmidt, Jan Hendrik Metzen, J. Zico Kolter
Neural Information Processing Systems (NeurIPS), 2018
Source Code on GitHub
Provable defenses against adversarial examples via the convex outer adversarial polytope
Eric Wong, J. Zico Kolter
International Conference on Machine Learning (ICML), 2018; Best defense paper at NIPS 2017 ML & Security Workshop
Blog Post + Source Code on GitHub

2017

A Semismooth Newton Method for Fast, Generic Convex Programming
Alnur Ali*, Eric Wong*, J. Zico Kolter
International Conference on Machine Learning (ICML), 2017
Source Code on GitHub

2015

An SVD and Derivative Kernel Approach to Learning from Geometric Data
Eric Wong, J. Zico Kolter
Conference on Artificial Intelligence (AAAI), 2015

Other

My PhD thesis can be found here.