AI vs Real Art Detection (ViT)

This model is a fine-tuned Vision Transformer (ViT) designed to distinguish human-created artwork from AI-generated images.

It is based on google/vit-base-patch16-224-in21k and has been fine-tuned on a dataset of scraped web images containing digital art, paintings, and AI generations.

Model Details

  • Model Architecture: Vision Transformer (ViT)
  • Task: Binary Image Classification
  • Classes: AI_GENERATED vs REAL
  • Base Model: google/vit-base-patch16-224-in21k (16×16 patches, 224×224 input)

Performance Metrics

The model achieves an overall accuracy of 80.09% on a balanced held-out test set of 216 images (108 per class).

Detailed Classification Report

| Class        | Precision | Recall | F1-Score | Support |
|--------------|-----------|--------|----------|---------|
| AI_GENERATED | 0.7338    | 0.9444 | 0.8259   | 108     |
| REAL         | 0.9221    | 0.6574 | 0.7676   | 108     |
| Accuracy     |           |        | 0.8009   | 216     |
| Macro Avg    | 0.8279    | 0.8009 | 0.7967   | 216     |
| Weighted Avg | 0.8279    | 0.8009 | 0.7967   | 216     |
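The per-class figures above can be cross-checked by reconstructing the confusion matrix they imply. The raw counts below are inferred from recall × support (0.9444 × 108 ≈ 102, 0.6574 × 108 ≈ 71); they are a consistency check, not numbers taken from the original evaluation run:

```python
# Confusion-matrix counts inferred from the classification report
# (recall x support per class); used to verify precision and accuracy.
tp_ai, fn_ai = 102, 6      # AI_GENERATED: correctly caught / missed
tp_real, fn_real = 71, 37  # REAL: correctly kept / wrongly flagged as AI

precision_ai = tp_ai / (tp_ai + fn_real)      # AI predictions that were right
precision_real = tp_real / (tp_real + fn_ai)  # REAL predictions that were right
accuracy = (tp_ai + tp_real) / (108 + 108)

print(round(precision_ai, 4), round(precision_real, 4), round(accuracy, 4))
# → 0.7338 0.9221 0.8009, matching the report
```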

Confusion Matrix

(Confusion matrix figure not reproduced in this card.)

Analysis

The model demonstrates a high recall for AI_GENERATED images (94.4%), meaning it is very effective at catching AI artwork. However, its recall for REAL images is lower (65.7%): the model errs on the side of flagging images as AI, so roughly a third of real artwork is misclassified as AI-generated.
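If false accusations against real artwork matter more than catching every AI image, one option is to replace the default argmax decision with an explicit threshold on the AI_GENERATED score. A minimal sketch, assuming the standard image-classification pipeline output format (a list of label/score dicts); the 0.7 threshold is illustrative and should be tuned on a validation set:

```python
def classify(scores, ai_threshold=0.7):
    """Label an image REAL unless the AI_GENERATED score clears a threshold.

    `scores` is a pipeline-style list of {"label", "score"} dicts.
    Raising `ai_threshold` trades AI recall for fewer real images
    being wrongly flagged.
    """
    ai_score = next(s["score"] for s in scores if s["label"] == "AI_GENERATED")
    return "AI_GENERATED" if ai_score >= ai_threshold else "REAL"

# A borderline image that plain argmax would call AI_GENERATED:
borderline = [{"label": "AI_GENERATED", "score": 0.55},
              {"label": "REAL", "score": 0.45}]
print(classify(borderline))  # → REAL under the stricter threshold
```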

How to Use

You can use this model with the Hugging Face pipeline.

```python
from transformers import pipeline

# Load the image-classification pipeline with this model
pipe = pipeline("image-classification", model="Kowshik24/ai_vs_real_image_detection_art")

# Predict on a local image file
image_path = "path_to_your_image.jpg"
result = pipe(image_path)

print(result)
```
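The pipeline returns a list of label/score dicts, one per class. A small helper to pull out the top prediction (the output shape is the standard image-classification pipeline format; the scores below are made up for illustration):

```python
def top_prediction(result):
    """Return (label, score) for the highest-scoring class.

    `result` is the list of {"label", "score"} dicts the
    image-classification pipeline returns for one image.
    """
    best = max(result, key=lambda r: r["score"])
    return best["label"], best["score"]

# Illustrative output shape; scores here are not real model outputs.
example = [{"label": "AI_GENERATED", "score": 0.91},
           {"label": "REAL", "score": 0.09}]
label, score = top_prediction(example)
print(f"{label}: {score:.2f}")
```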
Model Size

  • Parameters: 85.8M (F32, stored in Safetensors format)