update readme
README.md
CHANGED
@@ -1,87 +1,77 @@

---
license: apple-ascl
pipeline_tag: depth-estimation
library_name: depth-pro
---

# Depth Pro: Sharp Monocular Metric Depth in Less Than a Second
We present a foundation model for zero-shot metric monocular depth estimation. Our model, Depth Pro, synthesizes high-resolution depth maps with unparalleled sharpness and high-frequency details. The predictions are metric, with absolute scale, without relying on the availability of metadata such as camera intrinsics. And the model is fast, producing a 2.25-megapixel depth map in 0.3 seconds on a standard GPU. These characteristics are enabled by a number of technical contributions, including an efficient multi-scale vision transformer for dense prediction, a training protocol that combines real and synthetic datasets to achieve high metric accuracy alongside fine boundary tracing, dedicated evaluation metrics for boundary accuracy in estimated depth maps, and state-of-the-art focal length estimation from a single image.

Depth Pro was introduced in **[Depth Pro: Sharp Monocular Metric Depth in Less Than a Second](https://arxiv.org/abs/2410.02073)**, by *Aleksei Bochkovskii, Amaël Delaunoy, Hugo Germain, Marcel Santos, Yichao Zhou, Stephan R. Richter, and Vladlen Koltun*.

The checkpoint in this repository is a reference implementation, which has been re-trained. Its performance is close to the model reported in the paper but does not match it exactly.

## How to Use

Please follow the steps in the [code repository](https://github.com/apple/ml-depth-pro) to set up your environment. Then you can download the checkpoint from the _Files and versions_ tab above, or use the `huggingface-hub` CLI:

```bash
pip install huggingface-hub
huggingface-cli download --local-dir checkpoints apple/DepthPro
```
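
The same download can also be scripted from Python. A minimal sketch using `huggingface_hub`; the repository id and checkpoint filename below are assumptions for illustration, so check the _Files and versions_ tab for the actual names:

```python
from huggingface_hub import hf_hub_download

# Repository id and filename are assumptions for illustration; adjust them to this repository.
checkpoint_path = hf_hub_download(
    repo_id="apple/DepthPro",
    filename="depth_pro.pt",
    local_dir="checkpoints",
)
print(checkpoint_path)
```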

### Running from Commandline

```bash
# Run prediction on a single image:
depth-pro-run -i ./data/example.jpg
# Run `depth-pro-run -h` for available options.
```

### Running from Python

```python
from PIL import Image
import depth_pro

# Load model and preprocessing transform.
model, transform = depth_pro.create_model_and_transforms()
model.eval()

# Load and preprocess an image.
image_path = "./data/example.jpg"
image, _, f_px = depth_pro.load_rgb(image_path)
image = transform(image)

# Run inference.
prediction = model.infer(image, f_px=f_px)
depth = prediction["depth"]  # Depth in [m].
focallength_px = prediction["focallength_px"]  # Focal length in pixels.
```
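
Because the depth is metric and the focal length is returned in pixels, the prediction can be unprojected into a 3D point cloud with the standard pinhole model. This is not part of the original instructions, just a minimal sketch that assumes `depth` and `focallength_px` are the tensors returned above and that the principal point sits at the image center:

```python
import numpy as np

# Pinhole unprojection: Z = depth, X = (u - cx) * Z / f, Y = (v - cy) * Z / f.
depth_np = depth.detach().cpu().numpy().astype(np.float32)   # (H, W) depth in meters
f = float(focallength_px)                                     # focal length in pixels

H, W = depth_np.shape
u, v = np.meshgrid(np.arange(W), np.arange(H))
cx, cy = (W - 1) / 2.0, (H - 1) / 2.0                         # assumed principal point

x = (u - cx) * depth_np / f
y = (v - cy) * depth_np / f
points = np.stack([x, y, depth_np], axis=-1).reshape(-1, 3)   # (H*W, 3) point cloud in meters
```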

### Evaluating boundary metrics

Our boundary metrics are implemented in `eval/boundary_metrics.py` and can be used as follows:

```python
# for a depth-based dataset
boundary_f1 = SI_boundary_F1(predicted_depth, target_depth)

# for a mask-based dataset (image matting / segmentation)
boundary_recall = SI_boundary_Recall(predicted_depth, target_mask)
```

## Citation

If you find our work useful, please cite the following paper:

```bibtex
@article{Bochkovskii2024:arxiv,
  author  = {Aleksei Bochkovskii and Ama\"{e}l Delaunoy and Hugo Germain and Marcel Santos and
             Yichao Zhou and Stephan R. Richter and Vladlen Koltun},
  title   = {Depth Pro: Sharp Monocular Metric Depth in Less Than a Second},
  journal = {arXiv},
  year    = {2024},
  url     = {https://arxiv.org/abs/2410.02073},
}
```

## Acknowledgements

Our codebase is built using multiple open-source contributions; please see [Acknowledgements](https://github.com/apple/ml-depth-pro/blob/main/ACKNOWLEDGEMENTS.md) for more details.

Please check the paper for a complete list of references and datasets used in this work.

---
license: apple-ascl
pipeline_tag: depth-estimation
---

# DepthPro: Monocular Depth Estimation

Install the required libraries:
```bash
pip install -q numpy pillow torch torchvision
pip install -q git+https://github.com/geetu040/transformers.git@depth-pro-projects#egg=transformers
```

Import the required libraries:
```py
import requests
from PIL import Image
import torch
import torch.nn as nn
import torch.nn.functional as F
from huggingface_hub import hf_hub_download
import matplotlib.pyplot as plt

# custom installation from this PR: https://github.com/huggingface/transformers/pull/34583
# !pip install git+https://github.com/geetu040/transformers.git@depth-pro-projects#egg=transformers
from transformers import DepthProConfig, DepthProImageProcessorFast, DepthProForDepthEstimation
```

Load the model and image processor:
```py
checkpoint = "geetu040/DepthPro"
revision = "project"
image_processor = DepthProImageProcessorFast.from_pretrained(checkpoint, revision=revision)
model = DepthProForDepthEstimation.from_pretrained(checkpoint, revision=revision)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = model.to(device)
```
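
Optionally, the weights can be loaded in half precision to reduce GPU memory. This is a sketch relying on the standard `torch_dtype` argument of `from_pretrained`, not something specific to this checkpoint; the pixel values produced by the image processor must then be cast to the same dtype before the forward pass:

```py
# Optional: half-precision weights (assumes a CUDA device with fp16 support).
model = DepthProForDepthEstimation.from_pretrained(
    checkpoint, revision=revision, torch_dtype=torch.float16
).to(device)
# Remember to also cast the inputs, e.g.
# inputs = {k: v.to(device, torch.float16) for k, v in inputs.items()}
```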

Inference:
```py
# inference

url = "https://huggingface.co/spaces/geetu040/DepthPro_Segmentation_Human/resolve/main/assets/examples/man_with_arms_open.jpg"

image = Image.open(requests.get(url, stream=True).raw)
image = image.convert("RGB")

# prepare image for the model
inputs = image_processor(images=image, return_tensors="pt")
inputs = {k: v.to(device) for k, v in inputs.items()}

with torch.no_grad():
    outputs = model(**inputs)

# interpolate to original size
post_processed_output = image_processor.post_process_depth_estimation(
    outputs, target_sizes=[(image.height, image.width)],
)

# normalize the predicted depth to [0, 255] for visualization
depth = post_processed_output[0]["predicted_depth"]
depth = (depth - depth.min()) / (depth.max() - depth.min())
depth = depth * 255.
depth = depth.detach().cpu().numpy()
depth = Image.fromarray(depth.astype("uint8"))

# visualize the prediction
fig, axes = plt.subplots(1, 2, figsize=(20, 20))
axes[0].imshow(image)
axes[0].set_title(f'Image {image.size}')
axes[0].axis('off')
axes[1].imshow(depth)
axes[1].set_title(f'Depth {depth.size}')
axes[1].axis('off')
plt.subplots_adjust(wspace=0, hspace=0)
plt.show()
```
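
The `depth` image produced above is only an 8-bit visualization; if the raw prediction is needed downstream, it can be saved before normalization. A small sketch (the file names are arbitrary):

```py
import numpy as np

# Raw predicted depth (before the normalization above), stored as float32.
raw_depth = post_processed_output[0]["predicted_depth"].detach().cpu().numpy().astype("float32")
np.save("depth_raw.npy", raw_depth)

# The 8-bit visualization image created above.
depth.save("depth_visualization.png")
```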