October 11, 2024

New Monocular Depth Estimation Models Released on Hugging Face


Advances in Depth Perception: New Models on Hugging Face

Depth perception, the ability of a computer to estimate the distance of objects in an image or scene, is a rapidly growing field within Artificial Intelligence (AI). Two new models on the Hugging Face platform are currently attracting attention, each taking a different approach to this complex task.

Depth Perception: A Key to Artificial Intelligence

Depth perception is crucial for many AI applications, from self-driving cars that need to detect obstacles to robots that need to navigate complex environments. But it also plays an important role in augmented reality, where virtual objects are seamlessly integrated into the real world.

Traditionally, depth perception has relied on specialized sensors like LiDAR, which emit laser beams and measure the time it takes for the light to return from an object. While accurate, this method is also expensive and cumbersome.
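The time-of-flight principle behind LiDAR can be illustrated with a small calculation (a minimal sketch; the timing value is purely illustrative):

```python
# Time-of-flight: a LiDAR pulse travels to the object and back,
# so the one-way distance is (speed of light * round-trip time) / 2.
SPEED_OF_LIGHT = 299_792_458.0  # meters per second

def tof_distance(round_trip_seconds: float) -> float:
    """Distance to an object from the measured round-trip time of a laser pulse."""
    return SPEED_OF_LIGHT * round_trip_seconds / 2.0

# A pulse that returns after about 66.7 nanoseconds corresponds to roughly 10 m.
print(tof_distance(66.7e-9))
```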

Therefore, monocular depth estimation, where depth information is extracted from a single image, has attracted growing interest. This task is considerably harder, however, since the algorithm must infer a three-dimensional understanding of the scene from purely two-dimensional information.

New Models on Hugging Face

Hugging Face, a platform for machine learning models and datasets, recently released two new models for monocular depth perception:

DepthPro by Apple

Developed by Apple, the DepthPro model is based on a Transformer network and is characterized by its speed and accuracy. It can generate high-resolution depth maps in real time, accurately capturing even fine details and object boundaries.

Notably, DepthPro outputs depth information in metric units, i.e., the actual distance of objects in meters. This is a decisive advantage over models that can only provide relative depth information.
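Why metric output matters can be sketched with a toy example (the depth values below are invented for illustration; a model like DepthPro would return a full-resolution map):

```python
import numpy as np

# Toy metric depth map (meters): each value is the actual distance
# of that pixel from the camera, as a metric model would report it.
depth_m = np.array([
    [1.2, 1.5, 8.0],
    [0.9, 2.5, 7.5],
    [1.1, 3.0, 9.0],
])

# With metric depth, physical questions need no extra calibration step:
# e.g. flag everything closer than 2 meters as a potential obstacle.
obstacle_mask = depth_m < 2.0
print(int(obstacle_mask.sum()))  # number of "near" pixels
```

With only relative depth, the same threshold would be meaningless until the map was rescaled against known distances.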

Lotus from the Hong Kong University of Science and Technology

The second model, Lotus, was developed by researchers at the Hong Kong University of Science and Technology and takes a diffusion-based approach. Diffusion models are a relatively new class of generative models that have recently gained attention for their ability to generate high-quality images.

Lotus leverages the power of diffusion models to create realistic and detailed depth maps. Unlike DepthPro, Lotus focuses on generating relative depth information, which, while not providing absolute distances, is still valuable for many applications.
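Relative depth is typically defined only up to an unknown scale and shift. A common way to make such output usable in metric settings is to align it to a few known distances via least squares. The sketch below uses synthetic data and is not part of the Lotus pipeline itself:

```python
import numpy as np

# Synthetic "ground truth" metric depth, and a relative prediction that
# differs from it by an unknown scale and shift (ordering is correct).
true_depth = np.array([2.0, 4.0, 6.0, 8.0])
relative = 0.5 * true_depth + 1.0

# Least-squares fit of scale s and shift t so that s * relative + t ≈ true_depth.
A = np.stack([relative, np.ones_like(relative)], axis=1)
(s, t), *_ = np.linalg.lstsq(A, true_depth, rcond=None)

aligned = s * relative + t
print(np.allclose(aligned, true_depth))  # the metric depth is recovered
```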

The Future of Depth Perception

The release of DepthPro and Lotus on Hugging Face is another step in the rapid development of depth perception. Both models demonstrate the power of different approaches and open up new possibilities for AI applications.

Research in this area is likely to continue at a rapid pace, with further advances to be expected soon. The growing availability of powerful models for monocular depth estimation will accelerate the development of new applications in areas such as robotics, augmented reality, and autonomous driving.
