Apple's Depth Pro model 3D maps 2D images in a fraction of a second

By New Atlas | Posted on: October 14, 2024

Apple's Machine Learning Research wing has developed a foundation AI model "for zero-shot metric monocular depth estimation." Called Depth Pro, it generates detailed 3D depth maps from a single two-dimensional image at high speed.

Our brains process visual information from two image sources – our eyes. Each has a slightly different view of the world, and these are combined into a single stereo image, with the differences also helping us to gauge how close or far objects are.

Many cameras and smartphones look at life through a single lens, but three-dimensional depth maps can still be created using information hidden in the metadata of 2D photos (such as focal length and sensor details) or estimated from multiple images.
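As a concrete illustration of the multi-image route, a stereo pair recovers depth by triangulation: the same point shifts horizontally between the two views, and that shift – the disparity – shrinks with distance. Here's a minimal sketch in Python, with illustrative numbers rather than figures from the article:

```python
def stereo_depth_m(focal_px: float, baseline_m: float, disparity_px: float) -> float:
    """Classic two-view triangulation: Z = f * B / d.

    focal_px     -- focal length in pixels (e.g. from image metadata)
    baseline_m   -- distance between the two lenses, in meters
    disparity_px -- horizontal pixel shift of the same point between views
    """
    if disparity_px <= 0:
        raise ValueError("point must shift between views to triangulate")
    return focal_px * baseline_m / disparity_px

# Illustrative values: a 1,000 px focal length, 10 cm baseline and 20 px
# shift place the point 5 m away -- per-point geometry Depth Pro sidesteps.
print(stereo_depth_m(1000.0, 0.10, 20.0))  # 5.0
```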

The Depth Pro system doesn't bother with any of that, yet it can generate a detailed 2.25-megapixel depth map from a single image in 0.3 seconds on a standard graphics processing unit.

The AI model's architecture includes something called a multi-scale vision transformer, which simultaneously processes the overall context of an image and finer details like "hair, fur, and other fine structures." It can also estimate both relative and absolute depth, meaning the model can furnish real-world measurements – allowing augmented reality apps, for example, to precisely position virtual objects in a physical space.
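To see what absolute depth buys in practice: a depth map in meters plus the camera's focal length is enough to lift every pixel into real-world 3D coordinates via the standard pinhole camera model. A minimal sketch – not Apple's code, and assuming the principal point sits at the image center:

```python
import numpy as np

def depth_to_points(depth_m: np.ndarray, focal_px: float) -> np.ndarray:
    """Lift a metric depth map of shape (H, W) into camera-space 3D points.

    Pinhole model: X = (u - cx) * Z / f and Y = (v - cy) * Z / f, with the
    principal point (cx, cy) assumed to be at the image center.
    """
    h, w = depth_m.shape
    v, u = np.mgrid[0:h, 0:w].astype(np.float64)
    x = (u - w / 2.0) * depth_m / focal_px
    y = (v - h / 2.0) * depth_m / focal_px
    return np.stack([x, y, depth_m], axis=-1)  # (H, W, 3), in meters

# An AR app could anchor a virtual object at points[v, u] -- an actual
# position in meters, not just a relative ordering of near and far.
```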

The AI is able to do all this without needing resource-intensive training on very specific datasets, employing something called zero-shot learning – which IBM describes as "a machine learning scenario in which an AI model can recognize and categorize unseen classes without labeled examples." This makes for quite a versatile beast.

As for applications, beyond the AR scenario mentioned above, Depth Pro could make photo editing much more efficient, enable real-time 3D imagery from a single-lens camera, and help machines like autonomous vehicles and robots better perceive the world around them in real time.

The project is still at the research stage but, perhaps unusually for Apple, the code and supporting documentation have been made available as open source on GitHub, allowing developers, scientists and coders to take the technology to the next level.
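For those who want to experiment, the repository (apple/ml-depth-pro) ships as a Python package. The sketch below follows the usage pattern shown in the project's README at release, so exact function names and outputs may have changed since:

```python
import depth_pro  # installed from the apple/ml-depth-pro repository

# Load the pretrained model and its preprocessing transform.
model, transform = depth_pro.create_model_and_transforms()
model.eval()

# Load an image; f_px is the focal length recovered from EXIF, if present.
image, _, f_px = depth_pro.load_rgb("example.jpg")
image = transform(image)

# One forward pass yields metric depth plus an estimated focal length.
prediction = model.infer(image, f_px=f_px)
depth_m = prediction["depth"]            # dense depth map, in meters
focal_px = prediction["focallength_px"]  # estimated focal length, in pixels
```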

A paper on the project has been published on the arXiv preprint server, and there's a live demo available for anyone who wants to experience the current version for themselves.

Source: Apple
