AI vs Real Art Detection (ViT)

This model is a fine-tuned Vision Transformer (ViT) designed to distinguish human-created artwork from AI-generated images.

It is based on google/vit-base-patch16-224-in21k and has been fine-tuned on a dataset of scraped web images containing digital art, paintings, and AI generations.

Model Details

  • Model Architecture: Vision Transformer (ViT)
  • Task: Binary Image Classification
  • Classes: AI_GENERATED vs REAL
  • Base Model: google/vit-base-patch16-224-in21k (16×16 patches, 224×224 input)

Performance Metrics

The model achieves an overall accuracy of 80.09% on a balanced held-out test set of 216 images (108 per class).

Detailed Classification Report

| Class        | Precision | Recall | F1-Score | Support |
|--------------|-----------|--------|----------|---------|
| AI_GENERATED | 0.7338    | 0.9444 | 0.8259   | 108     |
| REAL         | 0.9221    | 0.6574 | 0.7676   | 108     |
| Accuracy     |           |        | 0.8009   | 216     |
| Macro Avg    | 0.8279    | 0.8009 | 0.7967   | 216     |
| Weighted Avg | 0.8279    | 0.8009 | 0.7967   | 216     |
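The per-class figures above can be cross-checked by reconstructing the confusion matrix they imply. The raw counts below are inferred from recall × support (0.9444 × 108 ≈ 102, 0.6574 × 108 ≈ 71); they are a consistency check, not numbers taken from the original evaluation run:

```python
# Confusion-matrix counts inferred from the classification report
# (recall x support per class); used to verify precision and accuracy.
tp_ai, fn_ai = 102, 6      # AI_GENERATED: correctly caught / missed
tp_real, fn_real = 71, 37  # REAL: correctly kept / wrongly flagged as AI

precision_ai = tp_ai / (tp_ai + fn_real)      # AI predictions that were right
precision_real = tp_real / (tp_real + fn_ai)  # REAL predictions that were right
accuracy = (tp_ai + tp_real) / (108 + 108)

print(round(precision_ai, 4), round(precision_real, 4), round(accuracy, 4))
# → 0.7338 0.9221 0.8009, matching the report
```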

Confusion Matrix

(Confusion matrix figure not reproduced in this card.)

Analysis

The model demonstrates a high recall for AI_GENERATED images (94.4%), meaning it is very effective at catching AI artwork. However, its recall for REAL images is lower (65.7%): the model errs on the side of flagging images as AI, so roughly a third of real artwork is misclassified as AI-generated.
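If false accusations against real artwork matter more than catching every AI image, one option is to replace the default argmax decision with an explicit threshold on the AI_GENERATED score. A minimal sketch, assuming the standard image-classification pipeline output format (a list of label/score dicts); the 0.7 threshold is illustrative and should be tuned on a validation set:

```python
def classify(scores, ai_threshold=0.7):
    """Label an image REAL unless the AI_GENERATED score clears a threshold.

    `scores` is a pipeline-style list of {"label", "score"} dicts.
    Raising `ai_threshold` trades AI recall for fewer real images
    being wrongly flagged.
    """
    ai_score = next(s["score"] for s in scores if s["label"] == "AI_GENERATED")
    return "AI_GENERATED" if ai_score >= ai_threshold else "REAL"

# A borderline image that plain argmax would call AI_GENERATED:
borderline = [{"label": "AI_GENERATED", "score": 0.55},
              {"label": "REAL", "score": 0.45}]
print(classify(borderline))  # → REAL under the stricter threshold
```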

How to Use

You can use this model with the Hugging Face pipeline.

```python
from transformers import pipeline

# Load the image-classification pipeline with this model
pipe = pipeline("image-classification", model="Kowshik24/ai_vs_real_image_detection_art")

# Predict on a local image file
image_path = "path_to_your_image.jpg"
result = pipe(image_path)

print(result)
```
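The pipeline returns a list of label/score dicts, one per class. A small helper to pull out the top prediction (the output shape is the standard image-classification pipeline format; the scores below are made up for illustration):

```python
def top_prediction(result):
    """Return (label, score) for the highest-scoring class.

    `result` is the list of {"label", "score"} dicts the
    image-classification pipeline returns for one image.
    """
    best = max(result, key=lambda r: r["score"])
    return best["label"], best["score"]

# Illustrative output shape; scores here are not real model outputs.
example = [{"label": "AI_GENERATED", "score": 0.91},
           {"label": "REAL", "score": 0.09}]
label, score = top_prediction(example)
print(f"{label}: {score:.2f}")
```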
Model Size

  • Parameters: 85.8M (F32, stored in Safetensors format)