Event | June 4, 2025

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2025

Apple is presenting new research at the annual IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), which takes place in person in Nashville, Tennessee from June 11 to June 15. We are proud to sponsor the conference, which brings together the scientific and industrial research communities in computer vision and pattern recognition. Below is an overview of Apple's participation at CVPR 2025.

Schedule

Stop by the Apple booth in the Music City Center, booth #1217, during exhibition hours. All times are listed in CDT (Nashville time):

Friday, June 13: 10:00am - 6:30pm
Saturday, June 14: 10:00am - 6:30pm
Sunday, June 15: 10:00am - 3:00pm

Wednesday, June 11

WORKSHOP
LatinX in Computer Vision (LXCV)
8:00am - 3:00pm, Room 105 A
Arnab Kumar Mondal will be representing Apple at the LXCV Mentoring Hour.

WORKSHOP
Computer Vision for Metaverse Workshop (CV4Metaverse) 2025
8:10am - 12:20pm, Room 107 A
- POSTER
- "A Stereo Image Quality Predictor for AR/VR"
- Netanel Tamir (Weizmann Institute of Science), Shir Amir, Ranel Itzhaky, Noam Atia (Tel Aviv University), Shobhita Sundaram (Massachusetts Institute of Technology), Stephanie Fu (Massachusetts Institute of Technology), Miriam Farber, Ron Sokolovsky, Richard Zhang (Independent researcher), Tali Dekel (Weizmann Institute of Science), Phillip Isola (Massachusetts Institute of Technology)

WORKSHOP
Fine-Grained Visual Categorization (FGVC12) 2025
9:00am - 5:15pm, Room 104 E
- POSTER
- "Rethinking Semi-Supervised Domain Adaptation for Semantic Segmentation with Semi-Supervised Learning in the Foundation Model Era"
- Joshua Kurien (University of Waterloo), Bavesh Balaji (University of Waterloo), Henry Lai, Pablo Guerrero Vela, C Thomas, Alex Wong, Sirisha Rambhatla

INVITED TALK
Workshop on Video Large Language Models (VidLLMs)
9:40am - 10:10am, Grand A1
Presenter: Afshin Dehghan

TUTORIAL INVITED TALK
CVPR Tutorial on Scalable Generative Models in Computer Vision
10:50am - 11:40am, Room 202 B
Presenter: Jiatao Gu

INVITED TALK
Workshop on Generative Models for Computer Vision
2:40pm - 3:10pm, Grand A2
Presenter: Jiatao Gu

INVITED TALK
Workshop on Uncertainty Quantification for Computer Vision
3:50pm - 4:30pm, Room 102 B
Presenter: Michael Kirchhof

Thursday, June 12

WORKSHOP
Women in Computer Vision (WiCV)
8:30am - 1:00pm (Workshop), Room 105 B
6:00pm - 8:00pm (Mentorship Dinner), Room 202 C
Fazilet Gokbudak, Jess Knowles, and Michael Kirchhof will be representing Apple at the WiCV Mentorship Dinner.

INVITED TALK
Visual Generative Modeling: What's After Diffusion?
2:30pm - 3:00pm, Room 103 A
Presenter: Jiatao Gu

WORKSHOP KEYNOTE
Workshop on Open-World 3D Scene Understanding with Foundation Models (OpenSUN3D)
3:45pm - 4:15pm, Room 105 A
Presenter: Afshin Dehghan

Friday, June 13

HIGHLIGHT POSTER
Multimodal Autoregressive Pre-training of Large Vision Encoders
4:00pm - 6:00pm, #407, Poster Session 2, Exhibit Hall D
Enrico Fini, Mustafa Shukor (Sorbonne University), Xiujun Li, Philipp Dufter, Michal Klein, David Haldimann, Sai Aitharaju, Louis Béthune, Zhe Gan, Victor Turrisi, Alexander Toshev, Marcin Eichner, Yinfei Yang, Moin Nabi, Josh Susskind, Alaaeldin El-Nouby

Saturday, June 14

ORAL PRESENTATION
From Multimodal LLMs to Generalist Embodied Agents: Methods and Lessons
9:00am - 10:15am, Presentation #5, Oral Session 3, Davidson Ballroom
Andrew Szot (Georgia Institute of Technology), Bogdan Mazoure, Omar Attia, Aleksei Timofeev, Harsh Agrawal, Devon Hjelm, Zhe Gan, Zsolt Kira (Georgia Institute of Technology), Alexander Toshev

POSTER
From Multimodal LLMs to Generalist Embodied Agents: Methods and Lessons
10:30am - 12:30pm, #329, Poster Session 3, Exhibit Hall D
Andrew Szot (Georgia Institute of Technology), Bogdan Mazoure, Omar Attia, Aleksei Timofeev, Harsh Agrawal, Devon Hjelm, Zhe Gan, Zsolt Kira (Georgia Institute of Technology), Alexander Toshev

HIGHLIGHT POSTER
Matrix3D: Large Photogrammetry Model All-in-One
10:30am - 12:30pm, #57, Poster Session 3, Exhibit Hall D
Yuanxun Lu (Nanjing University), Jingyang Zhang, Tian Fang, Danny Nahmias, Yanghai Tsin, Long Quan (Hong Kong University of Science and Technology), Xun Cao (Nanjing University), Yao Yao (Nanjing University), Shiwei Li

POSTER
FastVLM: Efficient Vision Encoding for Vision Language Models
5:00pm - 7:00pm, #378, Poster Session 4, Exhibit Hall D
Pavan Kumar Anasosalu Vasu, Fartash Faghri, Chun-Liang Li, Cem Koc, Nate True, Albert Antony, Gokul Santhanam, James Gabriel, Peter Grasch, Oncel Tuzel, Hadi Pour Ansari

Sunday, June 15

HIGHLIGHT POSTER
Cubify Anything: Scaling Indoor 3D Object Detection
10:30am - 12:30pm, #112, Poster Session 5, Exhibit Hall D
Justin Lazarow, David Griffiths, Gefen Kohavi, Francisco Crespo, Afshin Dehghan

HIGHLIGHT POSTER
World-Consistent Video Diffusion with Explicit 3D Modeling
10:30am - 12:30pm, #60, Poster Session 5, Exhibit Hall D
Qihang Zhang (The Chinese University of Hong Kong), Kevin Miao, Shuangfei Zhai, Miguel Angel Bautista Martin, Alexander Toshev, Josh Susskind, Jiatao Gu

POSTER
Novel View Synthesis with Pixel-Space Diffusion Models
4:00pm - 6:00pm, #59, Poster Session 6, Exhibit Hall D
Noam Elata (Technion), Bahjat Kawar, Yaron Ostrovsky-Berman, Miriam Farber, Ron Sokolovsky

Booth Programming & Demos

Visit Apple's booth at Music City Center, Booth #1217, during exhibition hours.

Featured Research Sessions

IN-BOOTH POSTER SESSION
FastVLM: Efficient Vision Encoding for Vision Language Models
Friday, June 13, 10:00am - 12:30pm
Presenter: Pavan Kumar Anasosalu Vasu

IN-BOOTH POSTER SESSION
Matrix3D: Large Photogrammetry Model All-in-One
Friday, June 13, 10:00am - 12:30pm
Presenter: Yuanxun Lu

IN-BOOTH POSTER SESSION
World-Consistent Video Diffusion with Explicit 3D Modeling
Saturday, June 14, 10:00am - 12:30pm
Presenter: Jiatao Gu

Technical Demos

DEMO
FastVLM
FastVLM is a family of mobile-friendly vision language models. These models use a mix of CNN and Transformer architectures for vision encoding, designed specifically for processing high-resolution images. Together, they deliver the best balance between accuracy and speed.

Friday, June 13: 10:00am - 12:30pm, 2:30pm - 4:30pm
Saturday, June 14: 10:00am - 12:30pm, 2:30pm - 4:30pm
Sunday, June 15: 10:00am - 12:30pm

Accepted Papers

Cubify Anything: Scaling Indoor 3D Object Detection
Justin Lazarow, David Griffiths, Gefen Kohavi, Francisco Crespo, Afshin Dehghan

FastVLM: Efficient Vision Encoding for Vision Language Models
Pavan Kumar Anasosalu Vasu, Fartash Faghri, Chun-Liang Li, Cem Koc, Nate True, Albert Antony, Gokul Santhanam, James Gabriel, Peter Grasch, Oncel Tuzel, Hadi Pour Ansari

From Multimodal LLMs to Generalist Embodied Agents: Methods and Lessons
Andrew Szot (Georgia Institute of Technology), Bogdan Mazoure, Omar Attia, Aleksei Timofeev, Harsh Agrawal, Devon Hjelm, Zhe Gan, Zsolt Kira (Georgia Institute of Technology), Alexander Toshev

Matrix3D: Large Photogrammetry Model All-in-One
Yuanxun Lu (Nanjing University), Jingyang Zhang, Tian Fang, Danny Nahmias, Yanghai Tsin, Long Quan (Hong Kong University of Science and Technology), Xun Cao (Nanjing University), Yao Yao (Nanjing University), Shiwei Li

Multimodal Autoregressive Pre-training of Large Vision Encoders
Enrico Fini, Mustafa Shukor (Sorbonne University), Xiujun Li, Philipp Dufter, Michal Klein, David Haldimann, Sai Aitharaju, Louis Béthune, Zhe Gan, Victor Turrisi, Alexander Toshev, Marcin Eichner, Yinfei Yang, Moin Nabi, Josh Susskind, Alaaeldin El-Nouby

Novel View Synthesis with Pixel-Space Diffusion Models
Noam Elata (Technion), Bahjat Kawar, Yaron Ostrovsky-Berman, Miriam Farber, Ron Sokolovsky

World-Consistent Video Diffusion with Explicit 3D Modeling
Qihang Zhang (The Chinese University of Hong Kong), Kevin Miao, Shuangfei Zhai, Miguel Angel Bautista Martin, Alexander Toshev, Josh Susskind, Jiatao Gu

Acknowledgements

Jack Langerman is Workshop Co-Organizer for the Workshop on Urban Scene Modeling: Where Vision Meets Photogrammetry and Graphics at CVPR.

Jeff Bigham is Workshop Co-Organizer for the VizWiz Grand Challenge Workshop at CVPR.

Qi Shan is Session Chair for CVPR.

Alex Colburn, Fartash Faghri, Hadi Pour Ansari, Mingze Xu, and Oncel Tuzel are Area Chairs for CVPR.

Amin Karimi Monsefi, Andrew Szot, Guandao Yang, Harsh Agrawal, Helisa Dhamo, Huangjie Zheng, Jack Langerman, Jiatao Gu, Liangchen Song, Michael Kirchhof, Marcin Eichner, Noam Elata, Pavan Kumar Anasosalu Vasu, Peter Fu, Raviteja Vemulapalli, Shaobo Fang, Rick Chang, Xiaoming Zhao, and Xudong Liu are Reviewers for CVPR.