Upload CLAUDE.md with huggingface_hub
CLAUDE.md
ADDED
@@ -0,0 +1,108 @@
# Development Guidelines

## Build & Test Commands
```
# Install dependencies
pip install -r requirements.txt
pip install -r requirements-test.txt

# Run linting
python -m ruff check .

# Run formatting
python -m ruff format .

# Type checking
python -m mypy .

# Run a specific test file
python -m pytest test_e2e.py -v

# Run a specific test function
python -m pytest test_e2e.py::test_end_to_end -v

# Deploy to Cloud Run
./deploy_rag.sh --project=YOUR_PROJECT_ID --region=YOUR_REGION

# Local development
python app.py
```
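
The lint, format, type, and test commands above can be chained so that a single call reports the first failure. The following is a minimal sketch, not part of the repository: `run_checks` and `DEFAULT_CHECKS` are hypothetical names, and passing `--check` to the formatter is an assumption made so the script reports formatting problems instead of rewriting files.

```python
"""Run the project's lint, format, type, and test checks in order."""

import subprocess
import sys
from collections.abc import Sequence

# The commands mirror the list above. "--check" on the formatter is an
# assumption so that the script only reports problems.
DEFAULT_CHECKS: list[list[str]] = [
    [sys.executable, "-m", "ruff", "check", "."],
    [sys.executable, "-m", "ruff", "format", "--check", "."],
    [sys.executable, "-m", "mypy", "."],
    [sys.executable, "-m", "pytest", "test_e2e.py", "-v"],
]


def run_checks(checks: Sequence[Sequence[str]]) -> int:
    """Run each command in turn and stop at the first failure.

    Args:
        checks: Commands to run, each given as a list of arguments.

    Returns:
        0 if every command succeeded, otherwise the exit code of the
        first command that failed.
    """
    for command in checks:
        print("Running:", " ".join(command))
        result = subprocess.run(command, check=False)
        if result.returncode != 0:
            return result.returncode
    return 0
```

A wrapper script could end with `sys.exit(run_checks(DEFAULT_CHECKS))` so that a CI job fails when any check fails.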

## Code Style
- **Line Length**: 100 characters max (defined in pyproject.toml)
- **Docstrings**: Google style docstrings required (follow existing patterns)
- **Type Hints**: Required for all function parameters and return values
- **Imports**: Group standard lib, third-party, then local imports with blank lines between
- **Error Handling**: Use specific exception types with logging
- **Linters**: Ruff for linting (F, E, W, D, N, C, B, Q, A rules)
- **Naming**: snake_case for variables/functions, CamelCase for classes
- **Environment Variables**: Use os.environ.get() with defaults when appropriate
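
As an illustration of the conventions above, here is a small sketch of a module that follows them: grouped imports, type hints, a Google style docstring, specific exceptions with logging, and `os.environ.get()` with a default. The function `read_text_file` and the `MAX_FILE_BYTES` setting are hypothetical and are not taken from the repository.

```python
"""Example module following the conventions listed above."""

import logging
import os
from pathlib import Path

# Third-party imports would go here, separated by a blank line.

# Local imports would go here, separated by a blank line.

logger = logging.getLogger(__name__)

# MAX_FILE_BYTES is a hypothetical setting, used only for illustration.
MAX_FILE_BYTES: int = int(os.environ.get("MAX_FILE_BYTES", "1048576"))


def read_text_file(path: Path, max_bytes: int = MAX_FILE_BYTES) -> str:
    """Read a UTF-8 text file, refusing files that are too large.

    Args:
        path: Location of the file to read.
        max_bytes: Largest file size, in bytes, that will be read.

    Returns:
        The decoded file contents.

    Raises:
        FileNotFoundError: If the file does not exist.
        ValueError: If the file is larger than ``max_bytes``.
    """
    try:
        size = path.stat().st_size
    except FileNotFoundError:
        logger.error("Input file not found: %s", path)
        raise
    if size > max_bytes:
        logger.error("Input file too large (%d bytes): %s", size, path)
        raise ValueError(f"{path} exceeds {max_bytes} bytes")
    return path.read_text(encoding="utf-8")
```

Catching `FileNotFoundError` rather than a bare `Exception` keeps the log message specific, and re-raising leaves the decision about recovery to the caller.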

## Architecture
- Flask web application for serving RAG queries
- Google Cloud services: BigQuery, Vertex AI, DocumentAI, Cloud Storage
- Cloud Functions triggered by GCS events
- Cloud Run for serving the web application

## Hugging Face Implementation Plan

### Repository Link
- GitHub: https://github.com/YOUR_USERNAME/cloud-rag-webhook

### Migration Steps
1. Create a new Hugging Face Space with Docker SDK
2. Enable Dev Mode for VS Code access
3. Clone the GitHub repository
4. Set up environment variables for secrets
5. Configure persistent storage (20GB purchased)

### Running on Hugging Face
1. Configure Space to always stay running (persistent execution)
2. Use "Secrets" in Space settings for API keys and credentials
3. Set up scheduled tasks with GitHub Actions for:
   - Processing files (daily)
   - Backing up code (every 6 hours)

### Implementation Details
1. **File Storage**:
   - Store input files in Hugging Face's persistent storage
   - Use Hugging Face Datasets for managing processed data

2. **Process Automation**:
   - For "under the hood" processing:
     - Configure Space to run continuously
     - Set up GitHub Actions for scheduled tasks
     - Use Docker health checks to ensure service stays alive

3. **Deployment Architecture**:
   - Hugging Face Space = Cloud Run equivalent
   - Space will run the server continuously
   - Configure scaling outside the Dockerfile (a Dockerfile has no autoscaling directives); for a Space this means the hardware and sleep-time settings
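
The Docker health check mentioned under Process Automation needs a command that exits with a failure when the server stops answering. The following is a minimal sketch under stated assumptions: `check_health` is a hypothetical helper, the `/health` route would have to be added to `app.py`, and port 7860 is used because it is the default app port for Docker Spaces. Whether the hosting platform restarts a container on a failed health check should be verified separately.

```python
"""Health probe that a Docker HEALTHCHECK instruction could invoke."""

import logging
import urllib.request

logger = logging.getLogger(__name__)

# Hypothetical endpoint: the "/health" route is an assumption and must
# exist in app.py for this probe to succeed.
DEFAULT_HEALTH_URL: str = "http://127.0.0.1:7860/health"


def check_health(url: str = DEFAULT_HEALTH_URL, timeout: float = 3.0) -> bool:
    """Report whether the service answers with HTTP 200.

    Args:
        url: Address of the health endpoint to probe.
        timeout: Seconds to wait before treating the service as down.

    Returns:
        True if the endpoint responded with status 200, otherwise False.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except OSError as exc:
        # URLError, HTTPError, and socket timeouts are all OSError subclasses.
        logger.warning("Health check failed for %s: %s", url, exc)
        return False
```

A probe script would call `sys.exit(0 if check_health() else 1)`, since Docker treats a non-zero exit status from the health command as unhealthy.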

### Key Files
- `auto_process_bucket.py`: Batch file processor
- `process_text.py`: Individual file processor
- `rag_query.py`: Query interface
- `app.py`: Web application
- `auto_backup.sh`: GitHub backup script
- `setup_all.sh`: Complete setup script

### Required Environment Variables
- `GOOGLE_APPLICATION_CREDENTIALS`: Path to the Google Cloud credentials file
- `PROJECT_ID`: Google Cloud project ID
- `BUCKET_NAME`: GCS bucket name
- `GITHUB_TOKEN`: For GitHub access
- `HF_TOKEN`: For Hugging Face API access
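
Because a missing secret tends to surface late, as an opaque error from a cloud client, it can help to check the variables listed above once at startup. This is a sketch rather than existing project code: `find_missing_env_vars` and `require_env_vars` are hypothetical helpers, and treating an empty or whitespace-only value as missing is an assumption.

```python
"""Startup check for the environment variables listed above."""

import os
from collections.abc import Mapping

REQUIRED_ENV_VARS: tuple[str, ...] = (
    "GOOGLE_APPLICATION_CREDENTIALS",
    "PROJECT_ID",
    "BUCKET_NAME",
    "GITHUB_TOKEN",
    "HF_TOKEN",
)


def find_missing_env_vars(
    env: Mapping[str, str] = os.environ,
    required: tuple[str, ...] = REQUIRED_ENV_VARS,
) -> list[str]:
    """Return the required variables that are unset or empty.

    Args:
        env: Mapping to inspect, by default the process environment.
        required: Names of the variables that must be present.

    Returns:
        The missing variable names, in the order they were required.
    """
    return [name for name in required if not env.get(name, "").strip()]


def require_env_vars(env: Mapping[str, str] = os.environ) -> None:
    """Raise an error that names every missing required variable.

    Args:
        env: Mapping to inspect, by default the process environment.

    Raises:
        RuntimeError: If one or more required variables are missing.
    """
    missing = find_missing_env_vars(env)
    if missing:
        raise RuntimeError(
            "Missing required environment variables: " + ", ".join(missing)
        )
```

Calling `require_env_vars()` near the top of `app.py` would report all missing names in one message instead of failing on the first lookup.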

### Hugging Face Specific Updates
- Update Dockerfile for Hugging Face compatibility
- Create Space UI in `app.py` using Gradio or Streamlit
- Use Hugging Face Datasets API in addition to BigQuery

## Project Goal
Create an automated RAG system that:
1. Automatically processes text/PDF files
2. Runs continuously "under the hood"
3. Provides a simple query interface
4. Backs up all code and data
5. Requires minimal maintenance