Instructions to use microsoft/cvt-13-384 with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use microsoft/cvt-13-384 with Transformers:
# Use a pipeline as a high-level helper from transformers import pipeline pipe = pipeline("image-classification", model="microsoft/cvt-13-384") pipe("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/hub/parrots.png")# Load model directly from transformers import AutoImageProcessor, AutoModelForImageClassification processor = AutoImageProcessor.from_pretrained("microsoft/cvt-13-384") model = AutoModelForImageClassification.from_pretrained("microsoft/cvt-13-384") - Notebooks
- Google Colab
- Kaggle
| license: apache-2.0 | |
| tags: | |
| - vision | |
| - image-classification | |
| datasets: | |
| - imagenet-1k | |
| widget: | |
| - src: https://huggingface.co/datasets/mishig/sample_images/resolve/main/tiger.jpg | |
| example_title: Tiger | |
| - src: https://huggingface.co/datasets/mishig/sample_images/resolve/main/teapot.jpg | |
| example_title: Teapot | |
| - src: https://huggingface.co/datasets/mishig/sample_images/resolve/main/palace.jpg | |
| example_title: Palace | |
| # Convolutional Vision Transformer (CvT) | |
| CvT-13 model pre-trained on ImageNet-1k at resolution 384x384. It was introduced in the paper [CvT: Introducing Convolutions to Vision Transformers](https://arxiv.org/abs/2103.15808) by Wu et al. and first released in [this repository](https://github.com/microsoft/CvT). | |
| Disclaimer: The team releasing CvT did not write a model card for this model so this model card has been written by the Hugging Face team. | |
| ## Usage | |
| Here is how to use this model to classify an image of the COCO 2017 dataset into one of the 1,000 ImageNet classes: | |
| ```python | |
| from transformers import AutoFeatureExtractor, CvtForImageClassification | |
| from PIL import Image | |
| import requests | |
| url = 'http://images.cocodataset.org/val2017/000000039769.jpg' | |
| image = Image.open(requests.get(url, stream=True).raw) | |
| feature_extractor = AutoFeatureExtractor.from_pretrained('microsoft/cvt-13-384') | |
| model = CvtForImageClassification.from_pretrained('microsoft/cvt-13-384') | |
| inputs = feature_extractor(images=image, return_tensors="pt") | |
| outputs = model(**inputs) | |
| logits = outputs.logits | |
| # model predicts one of the 1000 ImageNet classes | |
| predicted_class_idx = logits.argmax(-1).item() | |
| print("Predicted class:", model.config.id2label[predicted_class_idx]) | |
| ``` | |
| ``` |