Spaces: handecarkci/Gemstone-Price-Prediction


hç committed on Jun 1, 2025

Commit 6a537b5 (verified) · 1 Parent(s): 6189726

Upload 7 files

Files changed (7)

Proje Özeti.txt +21 -0
README.md +31 -84
app.py +28 -35
model_columns.pkl +2 -2
project_description.txt +60 -0
requirements.txt +1 -1
rf_model.pkl +2 -2

Proje Özeti.txt ADDED

	@@ -0,0 +1,21 @@

+Project Summary
+Goal: Predict gemstone prices using a synthetically generated dataset.
+Problem Type: Regression (continuous value prediction)
+Data Characteristics: A clean, complete dataset consisting entirely of numerical columns.
+Target Variable: price
+Use Cases: Comparing machine learning algorithms, modeling practice, feature engineering, and hyperparameter optimization.
+Education-Focused: Kaggle's Playground series is designed for learning and practice.
+Clean Data: A fully numerical dataset with no missing values.
+Modeling Opportunities: Offers the chance to try various machine learning algorithms on a regression problem.
+Visualization and Analysis: A rich dataset for EDA (Exploratory Data Analysis) and data visualization.
+Gemstone Price Prediction

README.md CHANGED

@@ -1,8 +1,8 @@
 ---
-title: "🍇 Blueberry Yield Regression"
-emoji: 🌾
 colorFrom: indigo
-colorTo: green
 sdk: streamlit
 app_file: app.py
 pinned: true
@@ -11,106 +11,53 @@ tags:
   - regression
   - machine-learning
   - streamlit
-  - kaggle
-  - agriculture
 ---
-# 🍇 Blueberry Yield Prediction with Machine Learning
-This project is a complete machine learning pipeline that predicts the **yield of wild blueberries** using various environmental and biological features such as pollinator counts, rainfall, and fruit measurements.
-## 📌 Project Type
-- Supervised Learning
-- Regression Problem
 ---
-## 🔍 Problem Description
-Predicting agricultural yield is a crucial component in planning, sustainability, and food economics. The dataset used in this project comes from the **Kaggle Playground Series S3E14** competition and contains information on:
-- Different species of pollinators (honeybee, bumblebee, osmia...)
-- Environmental conditions (rainfall days, temperature ranges...)
-- Fruit attributes (fruit mass, fruit set, seed count...)
-🎯 **Goal**: Predict the `yield` (kg/ha) of blueberries based on input features.
----
-## 📊 Dataset Info
-- `train.csv`: 15,289 samples with 18 features
-- `test.csv`: same structure, no target
-- No missing values, clean numerical data
 ---
-## 📈 What We Did (Pipeline Summary)
-1. **EDA (Exploratory Data Analysis)**
-   - Checked for missing values ✅
-   - Analyzed feature distributions & target (`yield`)
-   - Built correlation heatmaps — strongest positive correlations:
-     - `fruitmass`, `fruitset`, `seeds`
-2. **Data Preprocessing**
-   - Removed `id` column
-   - Standard feature selection based on correlation
-   - No categorical encoding needed (all numerical)
-3. **Model Training**
-   - Model: `RandomForestRegressor`
-   - Train-Test Split: 80/20
-   - **Results**:
-     - RMSE ≈ **573.8**
-     - R² Score ≈ **0.81** ✅
-4. **Test Prediction & Submission**
-   - Predictions made on `test.csv`
-   - `submission.csv` generated for Kaggle submission
-5. **Streamlit App**
-   - Users input bee counts, rain days, and fruit measurements
-   - Predicts blueberry yield in kg/ha
-   - Uses trained model (`rf_model.pkl`) behind the scenes
----
-## 🚀 Try it Online
-🌐 You can try this app live here:
-[Hugging Face Space Link](https://huggingface.co/spaces/yazodi/blueberry-yield-regression-app)
----
-## 🔮 What Could Be Improved?
-| Area | Suggestion |
-|------|------------|
-| Feature Engineering | Create interaction terms, try log/ratio features |
-| Model | Try LightGBM, XGBoost, or stacking |
-| Tuning | GridSearchCV or Optuna for hyperparameter optimization |
-| Visualization | Add interactive charts in Streamlit app |
-| Real-World Data | Add satellite weather data, soil types, historical trends |
----
-## 📁 Project Structure
-📦 blueberry-yield-regression
-├── app.py
-├── rf_model.pkl
-├── model_columns.pkl
-├── requirements.txt
-├── submission.csv
-└── README.md
----
-## 📜 License
-MIT License – Free to use, modify and distribute.
----

 ---
+title: "💎 Gemstone Price Regression"
+emoji: 💰
 colorFrom: indigo
+colorTo: blue
 sdk: streamlit
 app_file: app.py
 pinned: true
   - regression
   - machine-learning
   - streamlit
+  - diamonds
+  - synthetic-data
 ---
+# 💎 Gemstone Price Prediction App
+This Streamlit app predicts the price of a gemstone using its physical and quality-related features.
+## 🧠 Project Overview
+- This project simulates a **gemstone pricing system** using synthetic tabular data.
+- Features include: `carat`, `depth`, `table`, `x`, `y`, `z`, `clarity_score`, `color_score`, and `cut_score`.
+- The target variable is **price** (USD).
+- Model: **RandomForestRegressor**
+- Trained on 1000 synthetic samples.
 ---
+## 📊 Performance
+- RMSE: **605.16**
+- R² Score: **0.9549**
 ---
+## 🚀 How to Run Locally
+```bash
+pip install -r requirements.txt
+streamlit run app.py
+```
+## 🔮 Future Work
+| Area | Improvement |
+|------|-------------|
+| Model | Try XGBoost, LightGBM |
+| Feature Engineering | Interaction terms, log/carat scaling |
+| Deployment | Add API endpoint with FastAPI |
+| Real-world Data | Integrate real gemstone datasets |
+## 📁 Files
+- `app.py`: Streamlit interface
+- `rf_model.pkl`: Trained model
+- `model_columns.pkl`: List of input features
+- `requirements.txt`: Required libraries
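
The "Feature Engineering" row of the README's Future Work list mentions interaction terms and log/carat scaling. As a rough sketch of what that could look like, the helper below derives three extra columns from one input row. The function name `add_engineered_features` and the derived column names are hypothetical and do not appear in the commit. A model using them would have to be retrained, and `model_columns.pkl` regenerated to match.

```python
import math


def add_engineered_features(row: dict) -> dict:
    """Return a copy of `row` with a few derived columns added (hypothetical names)."""
    out = dict(row)
    # Log transform of carat; assumes carat > 0, which the app's slider range guarantees.
    out["log_carat"] = math.log(row["carat"])
    # Interaction term: bounding-box volume from the three dimensions (mm^3).
    out["volume"] = row["x"] * row["y"] * row["z"]
    # Ratio feature: carat per unit volume, guarded against a zero volume.
    out["carat_per_volume"] = row["carat"] / out["volume"] if out["volume"] else 0.0
    return out


# Example row using the app's default slider values for these fields.
sample = {"carat": 1.0, "x": 6.0, "y": 6.0, "z": 4.0}
engineered = add_engineered_features(sample)
```

The input dictionary is copied rather than mutated, so the original row can still be passed to the existing model unchanged.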

app.py CHANGED

@@ -3,48 +3,41 @@ import pandas as pd
 import numpy as np
 import joblib
-# Title
-st.title("🍇 Blueberry Yield Prediction App")
-st.write("This app predicts wild blueberry yield from environmental and biological factors.")
 # Input fields
-clonesize = st.slider("Clone Size", 0.0, 10.0, 1.0)
-honeybee = st.slider("Honeybee Count", 0.0, 10.0, 1.0)
-bumbles = st.slider("Bumblebee Count", 0.0, 10.0, 1.0)
-andrena = st.slider("Andrena Count", 0.0, 10.0, 1.0)
-osmia = st.slider("Osmia Count", 0.0, 10.0, 1.0)
-RainingDays = st.slider("Raining Days", 0.0, 100.0, 20.0)
-AverageRainingDays = st.slider("Average Raining Days", 0.0, 100.0, 30.0)
-fruitset = st.slider("Fruit Set", 0.0, 1.0, 0.5)
-fruitmass = st.slider("Fruit Mass", 0.0, 10.0, 5.0)
-seeds = st.slider("Seed Count", 0.0, 100.0, 50.0)
-# Convert to a DataFrame
 user_input = pd.DataFrame([{
-    "clonesize": clonesize,
-    "honeybee": honeybee,
-    "bumbles": bumbles,
-    "andrena": andrena,
-    "osmia": osmia,
-    "RainingDays": RainingDays,
-    "AverageRainingDays": AverageRainingDays,
-    "fruitset": fruitset,
-    "fruitmass": fruitmass,
-    "seeds": seeds
 }])
-# Load the model and columns
 model = joblib.load("rf_model.pkl")
-model_columns = joblib.load("model_columns.pkl")
-# Add missing columns
-for col in model_columns:
-    if col not in user_input.columns:
-        user_input[col] = 0
-user_input = user_input[model_columns]
 # Prediction
-if st.button("Show Prediction"):
-    pred = model.predict(user_input)[0]
-    st.success(f"🌱 Predicted Wild Blueberry Yield: {pred:.2f} kg/ha")

 import numpy as np
 import joblib
+st.title("💎 Gemstone Price Estimator")
+st.write("This app predicts the price of gemstones.")
 # Input fields
+carat = st.slider("Carat", 0.2, 5.0, 1.0)
+depth = st.slider("Depth", 50.0, 70.0, 60.0)
+table = st.slider("Table", 50.0, 70.0, 58.0)
+x = st.slider("x (mm)", 3.0, 10.0, 6.0)
+y = st.slider("y (mm)", 3.0, 10.0, 6.0)
+z = st.slider("z (mm)", 2.0, 6.0, 4.0)
+clarity_score = st.slider("Clarity Score", 1, 10, 5)
+color_score = st.slider("Color Score", 1, 7, 3)
+cut_score = st.slider("Cut Score", 1, 5, 3)
+# Build a DataFrame from the inputs
 user_input = pd.DataFrame([{
+    "carat": carat,
+    "depth": depth,
+    "table": table,
+    "x": x,
+    "y": y,
+    "z": z,
+    "clarity_score": clarity_score,
+    "color_score": color_score,
+    "cut_score": cut_score
 }])
+# Load the model and columns
 model = joblib.load("rf_model.pkl")
+columns = joblib.load("model_columns.pkl")
+# Match the training column order
+user_input = user_input[columns]
 # Prediction
+if st.button("Show Predicted Price"):
+    prediction = model.predict(user_input)[0]
+    st.success(f"💰 Predicted Price: ${prediction:,.2f}")
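
The previous version of `app.py` filled any column missing from the input with 0 before reordering. The new version indexes directly with `user_input[columns]`, which in pandas raises a `KeyError` if a listed column is absent. The sketch below shows the same alignment step in plain Python with an explicit error message. The helper name `align_row` is hypothetical, and the column order shown is an assumption based on the README's feature list, since the actual contents of `model_columns.pkl` are not visible in this diff.

```python
def align_row(user_input: dict, columns: list) -> list:
    """Order one row of inputs to match the training column order."""
    missing = [c for c in columns if c not in user_input]
    if missing:
        raise ValueError(f"Missing input columns: {missing}")
    # Extra keys in user_input are ignored; only the trained columns are kept.
    return [user_input[c] for c in columns]


# Assumed column order (taken from the README feature list, not from the pickle).
columns = ["carat", "depth", "table", "x", "y", "z",
           "clarity_score", "color_score", "cut_score"]

row = align_row(
    {"cut_score": 3, "carat": 1.0, "depth": 60.0, "table": 58.0,
     "x": 6.0, "y": 6.0, "z": 4.0, "clarity_score": 5, "color_score": 3},
    columns,
)
```

Failing loudly on a missing column is a deliberate choice here: silently substituting 0 for a feature such as `carat` would still produce a prediction, just not a meaningful one.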

model_columns.pkl CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:2f8f2353d1c8c3d79295e022ad6bd9a36aa8bc6bb2ce3f6b597b67cc2fea59ac
-size 255

 version https://git-lfs.github.com/spec/v1
+oid sha256:5ba62c0315c390a17bf49a71aa039e95ff212a5951b61580313c9e309f56c9c8
+size 94
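
The `.pkl` files appear in this diff as Git LFS pointer files rather than binary content: three `key value` lines giving the spec version, the object's SHA-256 `oid`, and its size in bytes. A minimal sketch of reading such a pointer follows; `parse_lfs_pointer` is a hypothetical helper name, and the example input is the new `model_columns.pkl` pointer shown above.

```python
def parse_lfs_pointer(text: str) -> dict:
    """Parse the key/value lines of a Git LFS pointer file."""
    fields = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        # Each line is "<key> <value>", split on the first space.
        key, _, value = line.partition(" ")
        fields[key] = value
    # The oid has the form "<hash-algorithm>:<hex-digest>".
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


pointer = parse_lfs_pointer(
    "version https://git-lfs.github.com/spec/v1\n"
    "oid sha256:5ba62c0315c390a17bf49a71aa039e95ff212a5951b61580313c9e309f56c9c8\n"
    "size 94\n"
)
```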

project_description.txt ADDED

	@@ -0,0 +1,60 @@

+# Project Name:
+Gemstone Price Prediction – Gemstone Price Estimation with Machine Learning
+# Project Type:
+Regression – Continuous Value Prediction
+# Dataset:
+A synthetically generated tabular dataset of 1000 rows.
+The dataset consists of physical measurements of gemstones (carat, depth, x, y, z...) and quality scores (clarity_score, color_score, cut_score).
+# Goal:
+Predict the price (USD) of a gemstone from the given physical and quality features.
+# Libraries Used:
+- pandas
+- numpy
+- scikit-learn
+- streamlit
+- joblib
+- matplotlib / seaborn (for EDA)
+# Stages:
+1. 📊 Data Exploration (EDA):
+   - Examined data types and distributions
+   - Used correlation analysis to identify the variables with the strongest effect on price: carat, clarity_score, cut_score
+2. 🧹 Data Preprocessing:
+   - No missing values were found, so modeling proceeded directly
+   - Features were not normalized (Random Forest does not require feature scaling)
+3. 🧠 Modeling:
+   - Algorithm used: RandomForestRegressor
+   - Train-test split: 80% / 20%
+   - Model performance:
+     - RMSE: 605.16
+     - R²: 0.9549 → high accuracy
+4. 🧪 Test Scenario:
+   - Built a structure that makes predictions from user-supplied physical and quality values
+5. 🌐 Streamlit Interface:
+   - Collects `carat`, `depth`, `x`, `y`, `z`, `clarity_score`, `cut_score`, etc. from the user
+   - The trained model makes a prediction and displays the price on screen
+# Outputs:
+- `rf_model.pkl` → Trained model file
+- `model_columns.pkl` → List of features the model expects
+- `app.py` → Streamlit application interface
+- `requirements.txt` → List of required libraries
+# Future Improvements:
+- Algorithms such as XGBoost and LightGBM could be tried
+- Real data could be integrated (for example, real diamond datasets)
+- Feature engineering: log transforms, ratios, interaction variables
+- A RESTful endpoint could be exposed through an API service
+# Purpose of the Project:
+This project demonstrates the full end-to-end machine learning process (data generation, modeling, evaluation, deployment) on a regression-based price prediction problem. Working with synthetic data that resembles real data offers a useful simulation for learning how to approach real-world problems.
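
The description reports RMSE and R² for the held-out 20% split. For readers unfamiliar with the two metrics, the sketch below computes both from first principles in plain Python. The sample prices are invented for illustration and are not taken from the project's data. The training script is not part of this commit, so this is not necessarily how the reported figures were produced.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error: square root of the average squared residual."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - (residual sum of squares / total sum of squares)."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Invented example values (USD), for illustration only.
y_true = [1000.0, 2000.0, 3000.0, 4000.0]
y_pred = [1100.0, 1900.0, 3200.0, 3800.0]

example_rmse = rmse(y_true, y_pred)
example_r2 = r2_score(y_true, y_pred)
```

RMSE is expressed in the same unit as the target (USD here), so it should be read against the price range of the dataset. R² is unitless and compares the model against always predicting the mean price.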

requirements.txt CHANGED

@@ -2,4 +2,4 @@ streamlit
 pandas
 numpy
 scikit-learn
-joblib

 pandas
 numpy
 scikit-learn
+joblib
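
Because `app.py` imports every package listed in `requirements.txt`, a misspelled or truncated entry only surfaces when the Space tries to build. One way to catch this earlier is to check locally that each listed name resolves to an importable module. The sketch below assumes bare package names with no version specifiers, as in this file. `unresolved_requirements` is a hypothetical helper name, and the name mapping covers only the one known mismatch in this project (`scikit-learn` is imported as `sklearn`).

```python
import importlib.util

# Distribution names whose import name differs from the name in requirements.txt.
IMPORT_NAMES = {"scikit-learn": "sklearn"}


def unresolved_requirements(requirement_lines):
    """Return requirement names whose import module cannot be found locally."""
    unresolved = []
    for line in requirement_lines:
        name = line.strip()
        # Skip blank lines and comments.
        if not name or name.startswith("#"):
            continue
        module = IMPORT_NAMES.get(name, name)
        if importlib.util.find_spec(module) is None:
            unresolved.append(name)
    return unresolved


# Usage with a real file would look like:
#     with open("requirements.txt") as fh:
#         print(unresolved_requirements(fh))
```

This only confirms that a module of that name is installed in the current environment. It does not confirm that the name exists on the package index.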

rf_model.pkl CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:68b74682bb46d81c2aa0e680cea3abae0a97da6f372a366babe5a3bebd77e300
-size 108065345

 version https://git-lfs.github.com/spec/v1
+oid sha256:7e41679192d03f13b8bbfc23b104659d12bc2b9966bc5126a54e30cbdcf36328
+size 7300001
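
The `rf_model.pkl` pointer changes from 108065345 bytes to 7300001 bytes in this commit. After downloading the real file, the pointer's `oid` and `size` can be used to confirm that the local copy is the object the pointer refers to, since the `oid` is the SHA-256 digest of the file's content. A minimal sketch follows; `matches_lfs_pointer` is a hypothetical helper name.

```python
import hashlib
import os


def matches_lfs_pointer(path: str, digest: str, size: int) -> bool:
    """Check a local file against the sha256 digest and byte size from an LFS pointer."""
    # Cheap check first: a size mismatch rules the file out without hashing it.
    if os.path.getsize(path) != size:
        return False
    sha = hashlib.sha256()
    with open(path, "rb") as fh:
        # Read in 1 MiB chunks so large model files are not loaded into memory at once.
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            sha.update(chunk)
    return sha.hexdigest() == digest


# Usage with the values from the new pointer would look like:
#     matches_lfs_pointer(
#         "rf_model.pkl",
#         "7e41679192d03f13b8bbfc23b104659d12bc2b9966bc5126a54e30cbdcf36328",
#         7300001,
#     )
```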