Apple is presenting new research at the annual IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), which takes place in person in Nashville, Tennessee, from June 11 to June 15. We are proud to sponsor the conference, which brings together the scientific and industrial research communities in computer vision and pattern recognition. Below is an overview of Apple's participation at CVPR 2025.
Schedule
Stop by the Apple booth in the Music City Center, booth #1217, during exhibition hours. All times are listed in CDT (Nashville time):
- Friday, June 13: 10:00am - 6:30pm
- Saturday, June 14: 10:00am - 6:30pm
- Sunday, June 15: 10:00am - 3:00pm
Wednesday, June 11
- WORKSHOP
- LatinX in Computer Vision (LXCV)
- 8:00am - 3:00pm, Room 105 A
- Arnab Kumar Mondal will be representing Apple at the LXCV Mentoring hour
- WORKSHOP
- Computer Vision for Metaverse Workshop (CV4Metaverse) 2025
- 8:10am - 12:20pm, Room 107 A
- POSTER
- "A Stereo Image Quality Predictor for AR/VR"
- Netanel Tamir (Weizmann Institute of Science), Shir Amir, Ranel Itzhaky, Noam Atia (Tel Aviv University), Shobhita Sundaram (Massachusetts Institute of Technology), Stephanie Fu (Massachusetts Institute of Technology), Miriam Farber, Ron Sokolovsky, Richard Zhang (Independent researcher), Tali Dekel (Weizmann Institute of Science), Phillip Isola (Massachusetts Institute of Technology)
- WORKSHOP
- Fine-Grained Visual Categorization (FGVC12) 2025
- 9:00am - 5:15pm, Room 104 E
- POSTER
- "Rethinking Semi-Supervised Domain Adaptation for Semantic Segmentation with Semi-Supervised Learning in the Foundation Model Era"
- Joshua Kurien (University of Waterloo), Bavesh Balaji (University of Waterloo), Henry Lai, Pablo Guerrero Vela, C Thomas, Alex Wong, Sirisha Rambhatla
- INVITED TALK
- Workshop on Video Large Language Models (VidLLMs)
- 9:40am - 10:10am, Grand A1
- Presenter: Afshin Dehghan
- TUTORIAL INVITED TALK
- CVPR Tutorial on Scalable Generative Models in Computer Vision
- 10:50am - 11:40am, Room 202 B
- Presenter: Jiatao Gu
- INVITED TALK
- Workshop on Generative Models for Computer Vision
- 2:40pm - 3:10pm, Grand A2
- Presenter: Jiatao Gu
- INVITED TALK
- Workshop on Uncertainty Quantification for Computer Vision
- 3:50pm - 4:30pm, Room 102 B
- Presenter: Michael Kirchhof
Thursday, June 12
- WORKSHOP
- Women in Computer Vision (WiCV)
- 8:30am - 1:00pm (Workshop), Room 105 B
- 6:00pm - 8:00pm (Mentorship Dinner), Room 202 C
- Fazilet Gokbudak, Jess Knowles, and Michael Kirchhof will be representing Apple at the WiCV Mentorship Dinner
- INVITED TALK
- Visual Generative Modeling: What's After Diffusion?
- 2:30pm - 3:00pm, Room 103 A
- Presenter: Jiatao Gu
- WORKSHOP KEYNOTE
- Workshop on Open-World 3D Scene Understanding with Foundation Models (OpenSUN3D)
- 3:45pm - 4:15pm, Room 105 A
- Presenter: Afshin Dehghan
Friday, June 13
- HIGHLIGHT POSTER
- Multimodal Autoregressive Pre-training of Large Vision Encoders
- 4:00pm - 6:00pm, #407, Poster Session 2, Exhibit Hall D
- Enrico Fini, Mustafa Shukor (Sorbonne University), Xiujun Li, Philipp Dufter, Michal Klein, David Haldimann, Sai Aitharaju, Louis Béthune, Zhe Gan, Victor Turrisi, Alexander Toshev, Marcin Eichner, Yinfei Yang, Moin Nabi, Josh Susskind, Alaaeldin El-Nouby
Saturday, June 14
- ORAL PRESENTATION
- From Multimodal LLMs to Generalist Embodied Agents: Methods and Lessons
- 9:00am - 10:15am, Presentation #5, Oral Session 3, Davidson Ballroom
- Andrew Szot (Georgia Institute of Technology), Bogdan Mazoure, Omar Attia, Aleksei Timofeev, Harsh Agrawal, Devon Hjelm, Zhe Gan, Zsolt Kira (Georgia Institute of Technology), Alexander Toshev
- POSTER
- From Multimodal LLMs to Generalist Embodied Agents: Methods and Lessons
- 10:30am - 12:30pm, #329, Poster Session 3, Exhibit Hall D
- Andrew Szot (Georgia Institute of Technology), Bogdan Mazoure, Omar Attia, Aleksei Timofeev, Harsh Agrawal, Devon Hjelm, Zhe Gan, Zsolt Kira (Georgia Institute of Technology), Alexander Toshev
- HIGHLIGHT POSTER
- Matrix3D: Large Photogrammetry Model All-in-One
- 10:30am - 12:30pm, #57, Poster Session 3, Exhibit Hall D
- Yuanxun Lu (Nanjing University), Jingyang Zhang, Tian Fang, Danny Nahmias, Yanghai Tsin, Long Quan (Hong Kong University of Science and Technology), Xun Cao (Nanjing University), Yao Yao (Nanjing University), Shiwei Li
- POSTER
- FastVLM: Efficient Vision Encoding for Vision Language Models
- 5:00pm - 7:00pm, #378, Poster Session 4, Exhibit Hall D
- Pavan Kumar Anasosalu Vasu, Fartash Faghri, Chun-Liang Li, Cem Koc, Nate True, Albert Antony, Gokul Santhanam, James Gabriel, Peter Grasch, Oncel Tuzel, Hadi Pour Ansari
Sunday, June 15
- HIGHLIGHT POSTER
- Cubify Anything: Scaling Indoor 3D Object Detection
- 10:30am - 12:30pm, #112, Poster Session 5, Exhibit Hall D
- Justin Lazarow, David Griffiths, Gefen Kohavi, Francisco Crespo, Afshin Dehghan
- HIGHLIGHT POSTER
- World-Consistent Video Diffusion with Explicit 3D Modeling
- 10:30am - 12:30pm, #60, Poster Session 5, Exhibit Hall D
- Qihang Zhang (The Chinese University of Hong Kong), Kevin Miao, Shuangfei Zhai, Miguel Angel Bautista Martin, Alexander Toshev, Josh Susskind, Jiatao Gu
- POSTER
- Novel View Synthesis with Pixel-Space Diffusion Models
- 4:00pm - 6:00pm, #59, Poster Session 6, Exhibit Hall D
- Noam Elata (Technion), Bahjat Kawar, Yaron Ostrovsky-Berman, Miriam Farber, Ron Sokolovsky
Booth Programming & Demos
Visit Apple's booth at Music City Center, Booth #1217, during exhibition hours.
Featured Research Sessions
- IN-BOOTH POSTER SESSION
- FastVLM: Efficient Vision Encoding for Vision Language Models
- Friday, June 13, 10:00am - 12:30pm
- Presenter: Pavan Kumar Anasosalu Vasu
- IN-BOOTH POSTER SESSION
- Matrix3D: Large Photogrammetry Model All-in-One
- Friday, June 13, 10:00am - 12:30pm
- Presenter: Yuanxun Lu
- IN-BOOTH POSTER SESSION
- World-Consistent Video Diffusion with Explicit 3D Modeling
- Saturday, June 14, 10:00am - 12:30pm
- Presenter: Jiatao Gu
Technical Demos
- DEMO
- FastVLM
- FastVLM is a family of mobile-friendly vision language models. These models use a mix of CNN and Transformer architectures for vision encoding designed specifically for processing high-resolution images. Together, they deliver the best balance between accuracy and speed.
- Friday, June 13: 10:00am - 12:30pm, 2:30pm - 4:30pm
- Saturday, June 14: 10:00am - 12:30pm, 2:30pm - 4:30pm
- Sunday, June 15: 10:00am - 12:30pm
Accepted Papers
- Cubify Anything: Scaling Indoor 3D Object Detection
- Justin Lazarow, David Griffiths, Gefen Kohavi, Francisco Crespo, Afshin Dehghan
- FastVLM: Efficient Vision Encoding for Vision Language Models
- Pavan Kumar Anasosalu Vasu, Fartash Faghri, Chun-Liang Li, Cem Koc, Nate True, Albert Antony, Gokul Santhanam, James Gabriel, Peter Grasch, Oncel Tuzel, Hadi Pour Ansari
- From Multimodal LLMs to Generalist Embodied Agents: Methods and Lessons
- Andrew Szot (Georgia Institute of Technology), Bogdan Mazoure, Omar Attia, Aleksei Timofeev, Harsh Agrawal, Devon Hjelm, Zhe Gan, Zsolt Kira (Georgia Institute of Technology), Alexander Toshev
- Matrix3D: Large Photogrammetry Model All-in-One
- Yuanxun Lu (Nanjing University), Jingyang Zhang, Tian Fang, Danny Nahmias, Yanghai Tsin, Long Quan (Hong Kong University of Science and Technology), Xun Cao (Nanjing University), Yao Yao (Nanjing University), Shiwei Li
- Multimodal Autoregressive Pre-training of Large Vision Encoders
- Enrico Fini, Mustafa Shukor (Sorbonne University), Xiujun Li, Philipp Dufter, Michal Klein, David Haldimann, Sai Aitharaju, Louis Béthune, Zhe Gan, Victor Turrisi, Alexander Toshev, Marcin Eichner, Yinfei Yang, Moin Nabi, Josh Susskind, Alaaeldin El-Nouby
- Novel View Synthesis with Pixel-Space Diffusion Models
- Noam Elata (Technion), Bahjat Kawar, Yaron Ostrovsky-Berman, Miriam Farber, Ron Sokolovsky
- World-Consistent Video Diffusion with Explicit 3D Modeling
- Qihang Zhang (The Chinese University of Hong Kong), Kevin Miao, Shuangfei Zhai, Miguel Angel Bautista Martin, Alexander Toshev, Josh Susskind, Jiatao Gu
Acknowledgements
Jack Langerman is Workshop Co-Organizer for the Workshop on Urban Scene Modeling: Where Vision Meets Photogrammetry and Graphics at CVPR.
Jeff Bigham is Workshop Co-Organizer for the VizWiz Grand Challenge Workshop at CVPR.
Qi Shan is Session Chair for CVPR.
Alex Colburn, Fartash Faghri, Hadi Pour Ansari, Mingze Xu, and Oncel Tuzel are Area Chairs for CVPR.
Amin Karimi Monsefi, Andrew Szot, Guandao Yang, Harsh Agrawal, Helisa Dhamo, Huangjie Zheng, Jack Langerman, Jiatao Gu, Liangchen Song, Michael Kirchhof, Marcin Eichner, Noam Elata, Pavan Kumar Anasosalu Vasu, Peter Fu, Raviteja Vemulapalli, Shaobo Fang, Rick Chang, Xiaoming Zhao, and Xudong Liu are Reviewers for CVPR.