# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Project Overview

PaperCast is an AI agent application that transforms research papers into engaging podcast-style audio conversations. It takes arXiv URLs or PDF uploads as input, analyzes the paper, generates a natural dialogue between a host and expert, and produces downloadable audio with distinct voices.

**Target Platform:** HuggingFace Spaces (Gradio 6 application)
**Hackathon:** MCP 1st Birthday - Track 2 (MCP in Action - Consumer)
**Required Tag:** `mcp-in-action-track-consumer`

## Development Commands

### Environment Setup

```bash
pip install -r requirements.txt
```

### Running Locally

```bash
python app.py
# Or: gradio app.py
```

### Testing on HuggingFace Spaces

The application must be deployed to HuggingFace Spaces under the `MCP-1st-Birthday` organization.

## Architecture Overview

### Core Pipeline Flow

1. **Input Processing**: Accept arXiv URL or PDF upload
2. **Paper Extraction**: Extract text content from PDF
3. **Agent Analysis**: Identify paper structure (abstract, methodology, findings, conclusions)
4. **Script Generation**: Create natural dialogue between Host and Guest characters
5. **Audio Synthesis**: Generate audio with distinct voices for each speaker
6. **Output Delivery**: Provide transcript and audio file for download

### Agent Behaviors (Critical for Track 2)

The application MUST demonstrate autonomous agent capabilities:

- **Planning**: Analyze paper structure and determine conversation flow strategy
- **Reasoning**: Identify which concepts need simplification, determine appropriate depth
- **Execution**: Orchestrate multi-step pipeline (fetch → extract → analyze → generate → synthesize)
- **Context Management**: Maintain coherence across the dialogue, referencing earlier points

### MCP Integration Requirements

Must use MCP (Model Context Protocol) servers as tools.
Potential use cases:

- Web fetching for URL-based paper retrieval
- PDF processing and text extraction
- Document parsing and structured analysis
- Vector database operations if implementing RAG

### Character Design

- **Host**: Enthusiastic, asks clarifying questions, explains for general audience, keeps conversation flowing
- **Guest**: Technical expert/researcher persona, provides depth, answers questions with appropriate detail

## Key Technical Considerations

### PDF Processing

Academic PDFs have inconsistent formatting. Robust error handling is essential:

- Handle multi-column layouts
- Extract references and citations appropriately
- Deal with equations, figures, and tables
- Support various paper formats (arXiv, PubMed, conference papers)

### LLM Dialogue Generation

- Use system prompts to establish distinct character personalities
- Maintain conversation continuity (reference previous points)
- Balance technical accuracy with accessibility
- Target appropriate script length (aim for 5-15 minute podcasts)

### Text-to-Speech

Critical for user experience:

- Must have clearly distinct voices for Host vs Guest
- Audio quality must be intelligible
- Processing time should be reasonable (target: under 5 minutes total)
- Consider voice emotion/intonation for natural conversation

### Performance & UX

- Processing can take 2-5 minutes - show clear progress indicators
- Consider async operations for long-running tasks
- Implement graceful error handling (invalid URLs, corrupted PDFs, API failures)
- Optional: Allow script preview before audio generation
- Cache generated podcasts to avoid reprocessing

### Free/Open Source Priority

Budget is limited - prioritize freely available solutions:

- HuggingFace hosted models where possible
- Open source libraries (PyMuPDF, pdfplumber, etc.)
- Free tier APIs within rate limits
- Self-hosted components on HF Spaces infrastructure

## Gradio 6 Interface Requirements

The UI should be simple and intuitive:

- Input section: URL input field + PDF upload (mutually exclusive or combined)
- Processing section: Clear status messages and progress indicators
- Output section:
  - Audio player for immediate listening
  - Download buttons for audio file and transcript
  - Display transcript with speaker labels
- Error messages should be user-friendly

## Submission Requirements Checklist

Required for valid submission:

- [ ] Working Gradio app deployed to HuggingFace Space
- [ ] Published under `MCP-1st-Birthday` organization (not personal profile)
- [ ] README.md includes `mcp-in-action-track-consumer` tag
- [ ] Demo video (1-5 minutes) showing project in action
- [ ] Social media post link (X/LinkedIn) in README
- [ ] Clear documentation of purpose, usage, and technical approach
- [ ] All dependencies in requirements.txt
- [ ] Team member HuggingFace usernames in README

## Judging Criteria Priority

When making design decisions, optimize for:

1. **Completeness**: All deliverables submitted
2. **Design/UI-UX**: Intuitive, polished interface
3. **Functionality**: Effective use of Gradio 6, MCPs, and agent capabilities
4. **Creativity**: Innovative approach to the problem
5. **Documentation**: Clear README and demo video
6. **Real-world impact**: Practical usefulness

## Critical Implementation Notes

### Agent vs API Chaining

This must demonstrate true agent behavior, not just API chaining:

- Show decision-making (e.g., determining which sections to emphasize)
- Demonstrate adaptive behavior (e.g., different strategies for different paper types)
- Use MCP servers as tools the agent reasons about, not just sequential calls

### Natural Dialogue Generation

Avoid robotic Q&A format:

- Use conversational connectors ("That's fascinating...", "Building on that point...")
- Include natural reactions and acknowledgments
- Vary sentence structure and length
- Use analogies and examples appropriate for general audience
- Host should ask genuine questions that guide the conversation

### Testing Strategy

Test with diverse paper types:

- Different fields (CS, biology, physics, social sciences)
- Various lengths (short letters vs full papers)
- Different repositories (arXiv, bioRxiv, PubMed)
- Papers with heavy math vs conceptual papers

## File Organization (Recommended)

```
papercast/
├── app.py              # Main Gradio application
├── requirements.txt    # Python dependencies
├── README.md           # Project documentation (must include track tag)
├── agents/             # Agent logic and orchestration
├── mcp_servers/        # MCP server integrations
├── processing/         # PDF extraction and text processing
├── generation/         # Script and dialogue generation
├── synthesis/          # Text-to-speech audio generation
└── utils/              # Helper functions
```

## Known Constraints

- Deadline: November 30, 2025, 11:59 PM UTC
- Must be original work created November 14-30, 2025
- HuggingFace Spaces free tier (GPU available)
- Processing time target: under 5 minutes per paper
- All work must demonstrate MCP integration

## Reference Materials

- Project brief: `PAPERCAST_PROJECT_BRIEF.md`
- Gradio 6 docs: https://www.gradio.app/
- MCP documentation: https://huggingface.co/blog/gradio-mcp
- Hackathon page: https://huggingface.co/MCP-1st-Birthday
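
## Pipeline Sketch (Illustrative)

The multi-step flow named under Agent Behaviors (fetch → extract → analyze → generate → synthesize), together with the progress-indicator and graceful-error-handling points under Performance & UX, could be wired up roughly as below. This is a minimal sketch, not the project's implementation: `run_pipeline`, `PipelineError`, `STAGE_NAMES`, and the placeholder stages are all hypothetical names introduced for illustration, and real stages would call the MCP tools, LLM, and TTS components.

```python
from typing import Any, Callable, Optional, Sequence, Tuple

# Hypothetical stage names, taken from the pipeline arrow in this file.
STAGE_NAMES = ("fetch", "extract", "analyze", "generate", "synthesize")

Stage = Tuple[str, Callable[[Any], Any]]
ProgressCallback = Callable[[float, str], None]


class PipelineError(RuntimeError):
    """Raised when a stage fails; keeps the stage name for user-facing messages."""

    def __init__(self, stage: str, cause: Exception):
        super().__init__(f"Stage '{stage}' failed: {cause}")
        self.stage = stage


def run_pipeline(
    stages: Sequence[Stage],
    payload: Any,
    on_progress: Optional[ProgressCallback] = None,
) -> Any:
    """Run stages in order, feeding each stage's output to the next one."""
    total = len(stages)
    for index, (name, step) in enumerate(stages):
        if on_progress is not None:
            # Report the fraction of stages completed before this one starts.
            on_progress(index / total, f"Running {name}...")
        try:
            payload = step(payload)
        except Exception as exc:
            # Wrap the failure so the UI can say which stage went wrong.
            raise PipelineError(name, exc) from exc
    if on_progress is not None:
        on_progress(1.0, "Done")
    return payload


# Placeholder stages (assumption): each one only appends its own name,
# which makes the execution order visible without any real processing.
demo_stages = [(name, lambda data, n=name: data + [n]) for name in STAGE_NAMES]

progress_log = []
result = run_pipeline(
    demo_stages,
    [],
    on_progress=lambda fraction, message: progress_log.append((fraction, message)),
)
```

In a Gradio handler, a thin wrapper around the `gr.Progress` tracker could serve as the `on_progress` callback, and a caught `PipelineError` could be turned into a user-friendly message that names the failing stage. A fixed stage list like this is plain chaining on its own; the agent behavior described above would come from choosing or reordering stages based on the paper.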