| | --- |
| | title: PDF Analysis & Orchestrator |
| | emoji: π |
| | colorFrom: blue |
| | colorTo: purple |
| | sdk: gradio |
| | sdk_version: 4.44.0 |
| | app_file: app.py |
| | pinned: false |
| | license: mit |
| | short_description: AI-powered PDF analysis with advanced features |
| | --- |
| | |
| | # π PDF Analysis & Orchestrator |
| |
|
| | A powerful, intelligent PDF analysis tool that provides comprehensive document processing through AI-powered agents. This application offers advanced features including document chunking, caching, streaming responses, batch processing, and custom prompt management. |
| |
|
| | ## π Features |
| |
|
| | ### Core Analysis |
| | - **AI-Powered Analysis**: GPT-4 powered document analysis with context-aware responses |
| | - **Audience Adaptation**: Automatically adapts explanations for different audiences |
| | - **Document Segmentation**: Identifies and segments documents by themes and topics |
| | - **Multi-Agent Orchestration**: Specialized AI agents for different analysis aspects |
| |
|
| | ### Performance Optimizations |
| | - **Document Chunking**: Smart processing of large documents (>15k chars) with sentence boundary detection |
| | - **Caching System**: PDF text extraction caching for improved performance |
| | - **Streaming Responses**: Real-time progress updates and status indicators |
| | - **Configurable Parameters**: Adjustable chunk sizes and processing options |
| |
|
| | ### Enhanced Features |
| | - **Batch Processing**: Handle multiple PDFs simultaneously with comprehensive reporting |
| | - **Result Export**: Export analysis results in TXT, JSON, and PDF formats |
| | - **Custom Prompts**: Save, manage, and reuse custom analysis prompts |
| | - **Progress Indicators**: Real-time feedback during long-running analyses |
| | - **Session Management**: Per-user session isolation with persistent storage |
| |
|
| | ## π― Use Cases |
| |
|
| | - **Document Summarization**: Create concise summaries of complex documents |
| | - **Technical Explanation**: Explain technical content for general audiences |
| | - **Executive Summaries**: Generate high-level overviews for decision makers |
| | - **Content Analysis**: Extract key findings and insights from documents |
| | - **Batch Processing**: Analyze multiple documents with consistent instructions |
| | - **Research Assistance**: Process and analyze research papers and reports |
| |
|
| | ## π οΈ Setup |
| |
|
| | ### Prerequisites |
| | - Python 3.10+ |
| | - OpenAI API key |
| |
|
| | ### Installation |
| |
|
| | 1. **Clone the repository:** |
| | ```bash |
| | git clone https://huggingface.co/spaces/your-username/pdf-analysis-orchestrator |
| | cd pdf-analysis-orchestrator |
| | ``` |
| |
|
| | 2. **Install dependencies:** |
| | ```bash |
| | pip install -r requirements.txt |
| | ``` |
| |
|
| | 3. **Set up environment variables:** |
| | ```bash |
| | export OPENAI_API_KEY="sk-your-api-key-here" |
| | ``` |
| |
|
| | 4. **Run the application:** |
| | ```bash |
| | python app.py |
| | ``` |
| |
|
| | ## π Usage |
| |
|
| | ### Single Document Analysis |
| | 1. Upload a PDF document |
| | 2. Enter your analysis instructions |
| | 3. Choose analysis options (streaming, chunk size) |
| | 4. Click "Analyze & Orchestrate" |
| | 5. View results and export if needed |
| |
|
| | ### Batch Processing |
| | 1. Upload multiple PDF files |
| | 2. Enter batch analysis instructions |
| | 3. Click "Process Batch" |
| | 4. Review comprehensive batch results |
| |
|
| | ### Custom Prompts |
| | 1. Go to "Manage Prompts" tab |
| | 2. Create custom prompt templates |
| | 3. Organize by categories |
| | 4. Reuse prompts across analyses |
| |
|
| | ## ποΈ Architecture |
| |
|
| | ### Core Components |
| | - **AnalysisAgent**: Primary analysis engine using GPT-4 |
| | - **CollaborationAgent**: Provides reviewer-style feedback |
| | - **ConversationAgent**: Handles user interaction |
| | - **MasterOrchestrator**: Coordinates agent interactions |
| |
|
| | ### Key Files |
| | - `app.py`: Main application with Gradio interface |
| | - `agents.py`: AI agent implementations with streaming support |
| | - `config.py`: Centralized configuration management |
| | - `utils/`: Utility functions for PDF processing, caching, and export |
| |
|
| | ## π§ Configuration |
| |
|
| | ### Environment Variables |
| | - `OPENAI_API_KEY`: Required OpenAI API key |
| | - `OPENAI_MODEL`: Model to use (default: gpt-4) |
| | - `CHUNK_SIZE`: Document chunk size (default: 15000) |
| | - `CACHE_ENABLED`: Enable caching (default: true) |
| | - `ANALYSIS_MAX_UPLOAD_MB`: Max upload size in MB (default: 50) |
| |
|
| | ### Model Configuration |
| | - **Temperature**: 0.2 (consistent, focused responses) |
| | - **Max tokens**: 1000 (concise but comprehensive) |
| | - **System prompts**: Designed for high-quality output |
| |
|
| | ## π Performance |
| |
|
| | - **Response Time**: Typically 2-5 seconds for analysis |
| | - **File Size Limit**: 50MB (configurable) |
| | - **Concurrent Users**: Supports multiple simultaneous sessions |
| | - **Memory Usage**: Optimized for efficient processing |
| | - **Caching**: Reduces processing time for repeated documents |
| |
|
| | ## π Security |
| |
|
| | - File size validation |
| | - Session isolation |
| | - Secure file handling |
| | - No persistent storage of sensitive data |
| | - Environment-based configuration |
| |
|
| | ## π€ Contributing |
| |
|
| | 1. Fork the repository |
| | 2. Create a feature branch |
| | 3. Make your changes |
| | 4. Add tests if applicable |
| | 5. Submit a pull request |
| |
|
| | ## π License |
| |
|
| | This project is licensed under the MIT License - see the LICENSE file for details. |
| |
|
| | ## π Acknowledgments |
| |
|
| | - Built on the successful Analysis & Orchestrate feature from Sharmaji ka PDF Blaster V1 |
| | - Powered by OpenAI's GPT-4 model |
| | - UI framework: Gradio |
| | - PDF processing: pdfplumber |
| |
|
| | ## π Support |
| |
|
| | For issues and questions: |
| | 1. Check the documentation |
| | 2. Review existing issues |
| | 3. Create a new issue with detailed information |
| |
|
| | --- |
| |
|
| | **Note**: This project focuses exclusively on the Analysis & Orchestrate functionality, providing the same high-quality results in a streamlined, focused package with enhanced performance and user experience. |