Amazon’s papers at this year’s International Conference on Computer Vision, organized by topic.
3-D
HAL3D: Hierarchical active learning for fine-grained 3D part labeling
Fenggen Yu, Yiming Qian, Francisca Gil Ureta, Brian Jackson, Eric Bennett, Richard Zhang
ImGeoNet: Image-induced geometry-aware voxel representation for multi-view 3D object detection
Tao Tu, Shun-Po Chuang, Yu-Lun Liu, Cheng Sun, Ke Zhang, Donna Roy, Cheng-Hao Kuo, Min Sun
Action recognition
SkeleTR: Towards skeleton-based action recognition in the wild
Haodong Duan, Mingze Xu, Bing Shuai, Davide Modolo, Zhuowen Tu, Joe Tighe, Alessandro Bergamo
Data representation
Linear spaces of meanings: Compositional structures in vision-language models
Matthew Trager, Pramuditha Perera, Luca Zancato, Alessandro Achille, Parminder Bhatia, Stefano Soatto
Motion-guided masking for spatiotemporal representation learning
David Fan, Jue Wang, Leo Liao, Yi Zhu, Vimal Bhat, Hector Santos, Rohith Mysore Vijaya Kumar, Xinyu (Arthur) Li
Dubbed-video generation
SIDGAN: High-resolution dubbed video generation via shift-invariant learning
Urwa Muaz, Wondong Jang, Rohun Tripathi, Santhosh Mani, Wenbin Ouyang, Ravi Teja Gadde, Baris Gecer, Sergio Elizondo, Reza Madad, Naveen Nair
Geospatial foundation models
Towards geospatial foundation models via continual pretraining
Matias Mendieta, Boran Han, Xingjian Shi, Yi Zhu, Chen Chen
Graph neural networks
Learning adaptive neighborhoods for graph neural networks
Avi Saha, Oscar Mendez, Chris Russell, Richard Bowden
Image retrieval
FashionNTM: Multi-turn fashion image retrieval via cascaded memory
Anwesan Pal, Sahil Wadhwa, Ayush Jaiswal, Xu Zhang, Yue Wu, Rakesh Chada, Pradeep Natarajan, Henrik I. Christensen
Image segmentation
Coarse-to-fine amodal segmentation with shape prior
Jianxiong Gao, Xuelin Qian, Yikai Wang, Tianjun Xiao, Tong He, Zheng Zhang, Yanwei Fu
LD-ZNet: A latent diffusion approach for text-based image segmentation
Koutilya PNVR, Bharat Singh, Pallabi Ghosh, Behjat Siddiquie, David Jacobs
Rethinking amodal video segmentation from learning supervised signals with object-centric representation
Ke Fan, Jingshi Lei, Xuelin Qian, Miaopeng Yu, Tianjun Xiao, Tong He, Zheng Zhang, Yanwei Fu
Information extraction
DocTr: Document transformer for structured information extraction in documents
Haofu Liao, Aruni RoyChowdhury, Weijian Li, Ankan Bansal, Yuting Zhang, Zhuowen Tu, Ravi Kumar Satzoda, R. Manmatha, Vijay Mahadevan
Machine unlearning
SAFE: Machine unlearning with shard graph
Yonatan Dukler, Ben Bowman, Alessandro Achille, Aditya Golatkar, Ashwin Swaminathan, Stefano Soatto
Object detection
Bidirectional alignment for domain adaptive detection with transformers
Liqiang He, Wei Wang, Albert Chen, Min Sun, Cheng-Hao Kuo, Sinisa Todorovic
Unsupervised open-vocabulary object localization in videos
Ke Fan, Zechen Bai, Tianjun Xiao, Dominik Zietlow, Max Horn, Zixu Zhao, Carl-Johann Simon-Gabriel, Mike Zheng Shou, Francesco Locatello, Bernt Schiele, Thomas Brox, Zheng Zhang, Yanwei Fu, Tong He
Object tracking
Object-centric multiple object tracking
Zixu Zhao, Jiaze Wang, Max Horn, Yizhuo Ding, Tong He, Zechen Bai, Dominik Zietlow, Carl-Johann Simon-Gabriel, Bing Shuai, Zhuowen Tu, Thomas Brox, Bernt Schiele, Yanwei Fu, Francesco Locatello, Zheng Zhang, Tianjun Xiao
Scene text recognition
CLIPTER: Looking at the bigger picture in scene text recognition
Aviad Aberdam, David Haim Bensaid, Alona Golts, Roy Ganz, Oren Nuriel, Royee Tichauer, Shai Mazor, Ron Litman
Towards models that can see and read
Roy Ganz, Oren Nuriel, Aviad Aberdam, Yair Kittenplon, Shai Mazor, Ron Litman
Transfer learning
PADCLIP: Pseudo-labeling with adaptive debiasing in CLIP for unsupervised domain adaptation
Zhengfeng Lai, Sol Vesdapunt, Ning Zhou, Jun Wu, Cong Phuoc Huynh, Xuelu Li, Kah Kuen Fu, Chen-Nee Chuah
Video retrieval
Audio-enhanced text-to-video retrieval using text-conditioned feature alignment
Sarah Ibrahimi, Xiaohang Sun, Pichao Wang, Amanmeet Garg, Ashutosh Sanan, Mohamed Omar
Video segmentation
MEGA: Multimodal alignment aggregation and distillation for cinematic video segmentation
Najmeh Sadoughi, Xinyu (Arthur) Li, Avijit Vajpayee, David Fan, Bing Shuai, Hector Santos, Vimal Bhat, Rohith Mysore Vijaya Kumar