publications | James Liu

2025

NeurIPS
REAL: Benchmarking Autonomous Agents on Deterministic Simulations of Real Websites

Divyansh Garg, Shaun VanWeelden, Diego Caples, and 15 more authors

In Advances in Neural Information Processing Systems (NeurIPS), Datasets and Benchmarks, 2025

REAL comprises high-fidelity, deterministic replicas of 11 widely used websites across e-commerce, travel, communication, and professional networking domains, with 112 practical tasks that mirror complex everyday user interactions. Frontier language models achieve at most a 41% success rate, highlighting critical gaps in autonomous web navigation and task completion.
@inproceedings{liu2025real,
  title     = {REAL: Benchmarking Autonomous Agents on Deterministic Simulations of Real Websites},
  author    = {Garg, Divyansh and VanWeelden, Shaun and Caples, Diego and Draguns, Andis and Ravi, Nikil and Putta, Pranav and Garg, Naman and Abraham, Tomas and Lara, Michael and Lopez, Federico and Liu, James and Gundawar, Atharva and Hebbar, Prannay and Joo, Youngchul and Gu, Jindong and London, Charles and Schroeder de Witt, Christian and Motwani, Sumeet},
  booktitle = {Advances in Neural Information Processing Systems (NeurIPS), Datasets and Benchmarks},
  year      = {2025},
}

In Progress
Diversity-Driven Multi-Agent Reinforcement Learning for Language Models

James Liu and Yejin Choi

2025

Ongoing research at Stanford AI Lab with Professor Yejin Choi

This project trains shared-parameter language models with a multi-agent RL pipeline built on VERL, running on Slurm-managed NVIDIA H100 clusters. Dual reward functions, one for quality and one for diversity, improve the Shannon Evenness Index by over 15%. Research conducted at the Stanford Artificial Intelligence Lab.
@article{liu2025marl,
  title  = {Diversity-Driven Multi-Agent Reinforcement Learning for Language Models},
  author = {Liu, James and Choi, Yejin},
  year   = {2025},
  note   = {Ongoing research at Stanford AI Lab with Professor Yejin Choi},
}
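
For readers unfamiliar with the Shannon Evenness Index named in the abstract above, the following is a minimal sketch of the standard definition (Pielou's evenness, J = H / ln S, where H is the Shannon entropy of the category proportions and S is the number of distinct categories). The function name `shannon_evenness` and the idea of scoring a list of category labels are illustrative assumptions; the entry does not describe how the project assigns model outputs to categories or computes the metric.

```python
import math
from collections import Counter


def shannon_evenness(labels):
    """Return Pielou's evenness J = H / ln(S) for a list of category labels.

    Hypothetical helper for illustration only; how outputs are mapped to
    categories in the actual project is not stated in the source.
    """
    counts = Counter(labels)
    total = sum(counts.values())
    num_categories = len(counts)
    if num_categories < 2:
        # Evenness is undefined with fewer than two categories;
        # returning 0.0 is a convention chosen for this sketch.
        return 0.0
    # Shannon entropy of the category proportions (natural log).
    entropy = -sum((c / total) * math.log(c / total) for c in counts.values())
    # Normalize by the maximum possible entropy for this many categories.
    return entropy / math.log(num_categories)
```

A value of 1 means every category is equally represented, and values closer to 0 mean a few categories dominate, so a higher index indicates more evenly spread outputs.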