Amazon’s papers at this year’s Computer Vision and Pattern Recognition Conference (CVPR), sorted by research topic.
3-D perception
Implicit surface contrastive clustering for LiDAR point clouds
Zaiwei Zhang, Min Bai, Erran Li
Anomaly classification
WinCLIP: Zero-/few-shot anomaly classification and segmentation
Jongheon Jeong, Yang Zou, Taewan Kim, Dongqing Zhang, Avinash Ravichandran, Onkar Dabeer
Data annotation
CVPR highlight*:
HandsOff: Labeled dataset generation with no additional human annotations
Austin Xu, Mariya Vasileva, Achal Dave, Arjun Seshadri
Knowledge combination to learn rotated detection without rotated annotation
Tianyu Zhu, Bryce Ferenczi, Pulak Purkait, Tom Drummond, Hamid Rezatofighi, Anton van den Hengel
Image generation
FlexNeRF: Photorealistic free-viewpoint rendering of moving humans from sparse views
Vinoj Jayasundara, Amit Agrawal, Nicolas Heron, Abhinav Shrivastava, Larry Davis
LEMaRT: Label-efficient masked region transform for image harmonization
Sheng Liu, Cong Phuoc Huynh, Cong Chen, Maxim Arap, Raffay Hamid
Image segmentation
Network-free, unsupervised semantic segmentation with synthetic images
Qianli Feng, Raghudeep Gadde, Wentong Liao, Eduard Ramon Maldonado, Aleix Martinez
PolyFormer: Referring image segmentation as sequential polygon generation
Jiang Liu, Hui Ding, Zhaowei Cai, Yuting Zhang, Ravi Kumar Satzoda, Vijay Mahadevan, R. Manmatha
Spatio-Temporal Pixel-Level contrastive learning-based source-free domain adaptation for video semantic segmentation
Shao-Yuan Lo, Poojankumar Oza, Sumanth Chennupati, Alejandro Galindo, Vishal M. Patel
Machine learning
A meta-learning approach to predicting performance and data requirements
Achin Jain, Gurumurthy Swaminathan, Paolo Favaro, Hao Yang, Avinash Ravichandran, Hrayr Harutyunyan, Alessandro Achille, Onkar Dabeer, Bernt Schiele, Ashwin Swaminathan, Stefano Soatto
Leveraging inter-rater agreement for classification in the presence of noisy labels
Maria Sofia Bucarelli, Lucas Cassano, Federico Siciliano, Amin Mantrach, Fabrizio Silvestri
Train/test-time adaptation with retrieval
Luca Zancato, Alessandro Achille, Tian Yu Liu, Matthew Trager, Pramuditha Perera, Stefano Soatto
Multimodal models
Dynamic inference with grounding based vision and language models
Burak uzkent, Amanmeet Garg, Wentao Zhu, Keval Doshi, Jingru Yi, Andy Wang, Mohamed Omar
GIVL: Improving geographical inclusivity of vision-language models with pre-training methods
Da Yin, Feng Gao, Govind Thattai, Michael Johnston, Kai-Wei Chang
Grounding counterfactual explanation of image classifier to textual concept space
Siwon Kim, Jinoh Oh, Sungjin Lee, Seunghak Yu, Jae Do, Tara Taghavi
Understanding and constructing latent modality structures in multi-modal representation learning
Qian Jiang, Changyou Chen, Han Zhao, Liqun Chen, Qing Ping, Son Tran, Yi Xu, Belinda Zeng, Trishul Chilimbi
Object detection
ScaleDet: A scalable multi-dataset object detector
Yanbei Chen, Manchen Wang, Abhay Mittal, Zhenlin Xu, Paolo Favaro, Joe Tighe, Davide Modolo
Product characterization
Learning attribute and class-specific representation duet for fine-grained fashion analysis
Yang (Andrew) Jiao, Yan Gao, Jingjing Meng, Jin Shang, Yi Sun
SkiLL: Skipping Color and Label Landscape: self supervised design representations for products in e-commerce
Vinay Kumar Verma, Dween Rabius Sanny, Prateek Sircar, Shreyas Sunil Kulkarni, Deepak Gupta, Abhishek Singh
Video understanding
Movies2Scenes: Using movie metadata to learn scene representation
Shixing Chen, Chun-Hao Liu, Xiang Hao, Xiaohan Nie, Maxim Arap, Raffay Hamid
Selective structured state-spaces for long-form video understanding
Jue Wang, Wentao Zhu, Pichao Wang, Xiang Yu, Linda Liu, Mohamed Omar, Raffay Hamid
* distinction accorded to the top 10% of papers accepted to the conference