The 2024 Conference on Neural Information Processing Systems (NeurIPS) — the premier conference in the field of AI — begins today, and the Amazon papers accepted there display the breadth of the company’s AI research.
Large language models (LLMs) and other foundation models have dominated the field for the past few years, and Amazon’s papers reflect that trend, covering topics such as retrieval-augmented generation, the use of LLMs for code generation, commonsense reasoning, and multimodal models. Training methodology also emerges as an area of focus, with papers on memory-efficient training, reinforcement learning from human feedback, classification with rejection, and convergence rates in transformer models.
But Amazon’s papers also demonstrate an abiding interest in topics such as bandit problems — long a staple of Amazon’s NeurIPS submissions — and speech processing, as well as newer concerns such as the applications of machine learning to scientific computing and automated reasoning. And one paper, “B’MOJO: Hybrid state space realizations of foundation models with eidetic and fading memory”, proposes a new paradigm of machine learning, rooted in the concept of transductive learning.
Automated reasoning
Neural model checking
Mirco Giacobbe, Daniel Kroening, Abhinandan Pal, Michael Tautschnig
Bandit problems
Adaptive experimentation when you can’t experiment
Yao Zhao, Kwang-Sung Jun, Tanner Fiez, Lalit Jain
Online posterior sampling with a diffusion prior
Branislav Kveton, Boris Oreshkin, Youngsuk Park, Aniket Deshmukh, Rui Song
Code generation
Training LLMs to better self-debug and explain code
Nan Jiang, Xiaopeng Li, Shiqi Wang, Qiang Zhou, Baishakhi Ray, Varun Kumar, Xiaofei Ma, Anoop Deoras
Commonsense reasoning
Can language models learn to skip steps?
Tengxiao Liu, Qipeng Guo, Xiangkun Hu, Cheng Jiayang, Yue Zhang, Xipeng Qiu, Zheng Zhang
Computational fluid dynamics
WindsorML: High-fidelity computational fluid dynamics dataset for automotive aerodynamics
Neil Ashton, Jordan B. Angel, Aditya S. Ghate, Gaetan K. W. Kenway, Man Long Wong, Cetin Kiris, Astrid Walle, Danielle Maddix Robinson, Gary Page
LLM evaluation
SetLexSem Challenge: Using set operations to evaluate the lexical and semantic robustness of language models
Bardiya Akhbari, Manish Gawali, Nicholas Dronen
Memory management
Online weighted paging with unknown weights
Orin Levy, Aviv Rosenberg, Noam Touitou
Model architecture
B’MOJO: Hybrid state space realizations of foundation models with eidetic and fading memory
Luca Zancato, Arjun Seshadri, Yonatan Dukler, Aditya Golatkar, Yantao Shen, Ben Bowman, Matthew Trager, Alessandro Achille, Stefano Soatto
Privacy
Reconstruction attacks on machine unlearning: Simple models are vulnerable
Martin Bertran Lopez, Shuai Tang, Michael Kearns, Jamie Morgenstern, Aaron Roth, Zhiwei Steven Wu
Retrieval-augmented generation (RAG)
RAGChecker: A fine-grained framework for diagnosing retrieval-augmented generation
Dongyu Ru, Lin Qiu, Xiangkun Hu, Tianhang Zhang, Peng Shi, Shuaichen Chang, Cheng Jiayang, Cunxiang Wang, Shichao Sun, Huanyu Li, Zizhao Zhang, Binjie Wang, Jiarong Jiang, Tong He, Zhiguo Wang, Pengfei Liu, Yue Zhang, Zheng Zhang
Speech processing
CA-SSLR: Condition-aware self-supervised learning representation for generalized speech processing
Yen-Ju Lu, Jing Liu, Thomas Thebaud, Laureano Moro-Velazquez, Ariya Rastrow, Najim Dehak, Jesus Villalba
Training methods
CoMERA: Computing- and memory-efficient training via rank-adaptive tensor optimization
Zi Yang, Ziyue Liu, Samridhi Choudhary, Xinfeng Xie, Cao Gao, Siegfried Kunzmann, Zheng Zhang
Optimal design for human preference elicitation
Subhojyoti Mukherjee, Anusha Lalitha, Kousha Kalantari, Aniket Deshmukh, Ge Liu, Yifei Ma, Branislav Kveton
Rejection via learning density ratios
Alexander Soen, Hisham Husain, Philip Schulz, Vu Nguyen
Unraveling the gradient descent dynamics of transformers
Bingqing Song, Boran Han, Shuai Zhang, Jie Ding, Mingyi Hong
Video
One token to seg them all: Language instructed reasoning segmentation in videos
Zechen Bai, Tong He, Haiyang Mei, Pichao Wang, Ziteng Gao, Joya Chen, Lei Liu, Zheng Zhang, Mike Zheng Shou
Video token merging for long-form video understanding
Seon Ho Lee, Jue Wang, Zhikang Zhang, David Fan, Xinyu (Arthur) Li
Vision-language models
Unified lexical representation for interpretable visual-language alignment
Yifan Li, Yikai Wang, Yanwei Fu, Dongyu Ru, Zheng Zhang, Tong He