# AI vs Real Art Detection (ViT)
This model is a fine-tuned Vision Transformer (ViT) designed to distinguish human-created artwork from AI-generated images.
It is based on `google/vit-base-patch16-224-in21k` and has been fine-tuned on a dataset of scraped web images containing digital art, paintings, and AI generations.
## Model Details
- Model Architecture: Vision Transformer (ViT)
- Task: Binary Image Classification
- Classes: `AI_GENERATED`, `REAL`
- Base Model: Google ViT (Patch 16, 224x224)
## Performance Metrics
The model achieves an overall accuracy of 80.09%.
### Detailed Classification Report
| Class | Precision | Recall | F1-Score | Support |
|---|---|---|---|---|
| AI_GENERATED | 0.7338 | 0.9444 | 0.8259 | 108 |
| REAL | 0.9221 | 0.6574 | 0.7676 | 108 |
| Accuracy | | | 0.8009 | 216 |
| Macro Avg | 0.8279 | 0.8009 | 0.7967 | 216 |
| Weighted Avg | 0.8279 | 0.8009 | 0.7967 | 216 |
## Confusion Matrix
### Analysis
The model demonstrates high recall for `AI_GENERATED` images (94.4%), meaning it is very effective at catching AI artwork. However, it has lower recall for `REAL` images (65.7%): the model leans toward the AI class and occasionally misclassifies real artwork as AI-generated.
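The per-class counts behind these figures can be reconstructed from the table above (108 samples per class; the counts below are derived, not taken from the original confusion-matrix figure). A quick sanity check that the reported precision, recall, and accuracy are mutually consistent:

```python
# Confusion-matrix counts inferred from the per-class recall values
# (108 samples per class, as reported in the classification report).
tp_ai = round(0.9444 * 108)    # AI images correctly flagged
fn_ai = 108 - tp_ai            # AI images missed
tp_real = round(0.6574 * 108)  # real images correctly kept
fn_real = 108 - tp_real        # real images flagged as AI

precision_ai = tp_ai / (tp_ai + fn_real)  # AI predictions that were truly AI
recall_ai = tp_ai / 108
accuracy = (tp_ai + tp_real) / 216

print(f"precision(AI) = {precision_ai:.4f}")
print(f"recall(AI)    = {recall_ai:.4f}")
print(f"accuracy      = {accuracy:.4f}")
```

The derived counts (102 true AI, 6 missed AI, 71 true real, 37 real-flagged-as-AI) reproduce the 0.7338 AI precision and 80.09% overall accuracy reported above.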
## How to Use
You can use this model with the Hugging Face `pipeline` API:

```python
from transformers import pipeline

# Load the image-classification pipeline with this model
pipe = pipeline("image-classification", model="Kowshik24/ai_vs_real_image_detection_art")

# Predict on a local image file
image_path = "path_to_your_image.jpg"
result = pipe(image_path)
print(result)
```
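Given the recall asymmetry noted above (real artwork is sometimes flagged as AI), you may want to require a minimum confidence before accepting an `AI_GENERATED` verdict. A minimal post-processing sketch, assuming the pipeline's standard output format of `{"label", "score"}` dicts; the 0.85 threshold and the sample scores are illustrative, not tuned values:

```python
def classify_with_threshold(result, threshold=0.85):
    """Return the top label, falling back to REAL when the
    AI_GENERATED score is below the confidence threshold.

    `result` is the list of {"label", "score"} dicts returned
    by a Hugging Face image-classification pipeline.
    """
    top = max(result, key=lambda r: r["score"])
    if top["label"] == "AI_GENERATED" and top["score"] < threshold:
        return "REAL"
    return top["label"]

# Illustrative pipeline output (scores are made up for this example)
sample = [
    {"label": "AI_GENERATED", "score": 0.78},
    {"label": "REAL", "score": 0.22},
]
print(classify_with_threshold(sample))  # below threshold, so falls back to REAL
```

Raising the threshold trades AI-detection recall for fewer false accusations against real artwork; the right value depends on which error is more costly in your application.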