# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Project Overview

PaperCast is an AI agent application that transforms research papers into engaging podcast-style audio conversations. It takes an arXiv URL or an uploaded PDF as input, analyzes the paper, generates a natural dialogue between a host and an expert guest, and produces downloadable audio with a distinct voice for each speaker.

**Target Platform:** HuggingFace Spaces (Gradio 6 application)  
**Hackathon:** MCP 1st Birthday - Track 2 (MCP in Action - Consumer)  
**Required Tag:** `mcp-in-action-track-consumer`

## Development Commands

### Environment Setup
```bash
pip install -r requirements.txt
```

### Running Locally
```bash
python app.py
# Or, for auto-reload during development: gradio app.py
```

### Testing on HuggingFace Spaces
The application must be deployed to HuggingFace Spaces under the `MCP-1st-Birthday` organization.

## Architecture Overview

### Core Pipeline Flow
1. **Input Processing**: Accept arXiv URL or PDF upload
2. **Paper Extraction**: Extract text content from PDF
3. **Agent Analysis**: Identify paper structure (abstract, methodology, findings, conclusions)
4. **Script Generation**: Create natural dialogue between Host and Guest characters
5. **Audio Synthesis**: Generate audio with distinct voices for each speaker
6. **Output Delivery**: Provide transcript and audio file for download
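
The flow above can be sketched as a list of named stages run in order, with a progress message before each one. This is a minimal sketch, not the project's implementation: `PipelineState`, `run_pipeline`, and the four placeholder stage functions are hypothetical names, and each stage only fills in dummy data where real extraction, LLM, or TTS work would go.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple


@dataclass
class PipelineState:
    """Data handed from one stage to the next (hypothetical structure)."""
    source: str  # arXiv URL or path to an uploaded PDF
    paper_text: str = ""
    analysis: dict = field(default_factory=dict)
    script: str = ""
    audio_path: str = ""
    completed: List[str] = field(default_factory=list)


Stage = Callable[[PipelineState], PipelineState]


def run_pipeline(
    state: PipelineState,
    stages: List[Tuple[str, Stage]],
    on_progress: Optional[Callable[[str], None]] = None,
) -> PipelineState:
    """Run the stages in order, reporting progress before each one."""
    for index, (name, stage) in enumerate(stages, start=1):
        if on_progress is not None:
            on_progress(f"[{index}/{len(stages)}] {name}")
        state = stage(state)
        state.completed.append(name)
    return state


# Placeholder stages: each stands in for real extraction, LLM, or TTS work.
def extract(state: PipelineState) -> PipelineState:
    state.paper_text = f"(text extracted from {state.source})"
    return state


def analyze(state: PipelineState) -> PipelineState:
    state.analysis = {"sections": ["abstract", "methodology", "findings", "conclusions"]}
    return state


def generate(state: PipelineState) -> PipelineState:
    state.script = "Host: Welcome!\nGuest: Glad to be here."
    return state


def synthesize(state: PipelineState) -> PipelineState:
    state.audio_path = "podcast.mp3"
    return state


STAGES = [
    ("Paper Extraction", extract),
    ("Agent Analysis", analyze),
    ("Script Generation", generate),
    ("Audio Synthesis", synthesize),
]

if __name__ == "__main__":
    final = run_pipeline(PipelineState(source="paper.pdf"), STAGES, on_progress=print)
    print(final.completed)
```

Passing `on_progress` as a callback keeps the pipeline independent of the UI, so the same function could feed a console log or a status component.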

### Agent Behaviors (Critical for Track 2)
The application MUST demonstrate autonomous agent capabilities:
- **Planning**: Analyze paper structure and determine conversation flow strategy
- **Reasoning**: Identify which concepts need simplification, determine appropriate depth
- **Execution**: Orchestrate multi-step pipeline (fetch → extract → analyze → generate → synthesize)
- **Context Management**: Maintain coherence across the dialogue, referencing earlier points
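
As one illustration of the planning behavior, the conversation outline could be derived from whichever sections the analysis step actually finds. The sketch below is an assumption about how that might look: `SEGMENT_FOR_SECTION` and `plan_outline` are hypothetical names, and the segment descriptions are placeholders.

```python
from typing import Iterable, List

# Hypothetical mapping from a detected paper section to a podcast segment.
# Dict order defines the order of the conversation.
SEGMENT_FOR_SECTION = {
    "abstract": "Intro: what the paper is about",
    "methodology": "Deep dive: how the authors did it",
    "findings": "Results: what they found",
    "conclusions": "Wrap-up: why it matters",
}


def plan_outline(detected_sections: Iterable[str]) -> List[str]:
    """Return the podcast segments to cover, skipping sections that were not found."""
    detected = {name.strip().lower() for name in detected_sections}
    outline = [
        segment
        for section, segment in SEGMENT_FOR_SECTION.items()
        if section in detected
    ]
    # Fall back to a single generic segment when no known section was detected.
    return outline or ["General overview of the paper"]


if __name__ == "__main__":
    for segment in plan_outline(["Findings", "Abstract"]):
        print(segment)
```

A paper with no detectable structure still gets a usable plan through the fallback segment, rather than failing the run.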

### MCP Integration Requirements
The application must use MCP (Model Context Protocol) servers as tools. Potential use cases:
- Web fetching for URL-based paper retrieval
- PDF processing and text extraction
- Document parsing and structured analysis
- Vector database operations if implementing RAG

### Character Design
- **Host**: Enthusiastic, asks clarifying questions, explains for general audience, keeps conversation flowing
- **Guest**: Technical expert/researcher persona, provides depth, answers questions with appropriate detail

## Key Technical Considerations

### PDF Processing
Academic PDFs have inconsistent formatting, so extraction must be robust and fail gracefully:
- Handle multi-column layouts
- Extract references and citations appropriately
- Deal with equations, figures, and tables
- Support various paper formats (arXiv, PubMed, conference papers)
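
Whatever extractor is used (for example PyMuPDF or pdfplumber), the raw text usually needs cleanup before it reaches the LLM. The sketch below shows one possible post-processing step using only the standard library; `clean_extracted_text` is a hypothetical helper, and it assumes the extractor returns plain text with blank lines between paragraphs.

```python
import re


def clean_extracted_text(raw: str) -> str:
    """Normalize raw PDF text: rejoin hyphenated words and unwrap hard line breaks."""
    # Rejoin words split across a line break: "implemen-\ntation" -> "implementation".
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", raw)
    # A single newline inside a paragraph becomes a space;
    # blank lines (paragraph breaks) are left alone.
    text = re.sub(r"(?<!\n)\n(?!\n)", " ", text)
    # Collapse runs of spaces and tabs into one space.
    text = re.sub(r"[ \t]+", " ", text)
    return text.strip()


if __name__ == "__main__":
    sample = "An implemen-\ntation of  the\nmodel\n\nNext paragraph"
    print(clean_extracted_text(sample))
```

One known limitation: a genuine hyphenated compound that happens to break at a line end (such as "state-of-the-" followed by "art") is also merged, so this step trades a few such errors for cleaner running text.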

### LLM Dialogue Generation
- Use system prompts to establish distinct character personalities
- Maintain conversation continuity (reference previous points)
- Balance technical accuracy with accessibility
- Target a script length that yields a 5-15 minute podcast
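
The 5-15 minute target can be turned into a word budget for the script prompt. The sketch below assumes a speaking rate of 150 words per minute, which is a rough figure for conversational speech and not something this project has measured; the actual rate depends on the TTS voice. Both function names are hypothetical.

```python
def target_word_count(minutes: float, words_per_minute: int = 150) -> int:
    """Approximate number of script words needed for a podcast of the given length."""
    if minutes <= 0:
        raise ValueError("minutes must be positive")
    return round(minutes * words_per_minute)


def estimated_minutes(script: str, words_per_minute: int = 150) -> float:
    """Approximate spoken duration of a script, counting whitespace-separated words."""
    return len(script.split()) / words_per_minute


if __name__ == "__main__":
    print(target_word_count(5))   # 750 words at the assumed rate
    print(target_word_count(15))  # 2250 words at the assumed rate
```

Under that assumption the script should land between roughly 750 and 2250 words, and `estimated_minutes` can check a generated script before audio synthesis starts.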

### Text-to-Speech
Critical for user experience:
- Must have clearly distinct voices for Host vs Guest
- Audio quality must be intelligible
- Processing time should be reasonable (target: under 5 minutes total)
- Consider voice emotion/intonation for natural conversation
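
To give each speaker a distinct voice, the script has to be split into per-speaker segments before synthesis. The sketch below assumes the generated script uses one `Speaker: text` line per turn; that format, the `split_by_speaker` helper, and the voice identifiers are all hypothetical and would be replaced by whatever the chosen TTS engine expects.

```python
from typing import Dict, List

# Hypothetical voice identifiers; real values depend on the TTS engine chosen.
VOICE_FOR_SPEAKER = {"Host": "voice_a", "Guest": "voice_b"}


def split_by_speaker(script: str) -> List[Dict[str, str]]:
    """Split a 'Speaker: text' script into segments tagged with a voice."""
    segments: List[Dict[str, str]] = []
    for raw_line in script.splitlines():
        line = raw_line.strip()
        if not line:
            continue
        speaker, separator, text = line.partition(":")
        speaker = speaker.strip()
        if separator and speaker in VOICE_FOR_SPEAKER:
            segments.append(
                {"speaker": speaker, "voice": VOICE_FOR_SPEAKER[speaker], "text": text.strip()}
            )
        elif segments:
            # No known speaker label: treat the line as a continuation of the previous turn.
            segments[-1]["text"] += " " + line
        else:
            raise ValueError("Script must start with a 'Host:' or 'Guest:' line")
    return segments


if __name__ == "__main__":
    demo = "Host: So what is new here?\nGuest: Two things.\nFirst: the dataset."
    for segment in split_by_speaker(demo):
        print(segment)
```

Each segment can then be synthesized with its own voice and the clips concatenated in order.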

### Performance & UX
- Processing can take 2-5 minutes, so show clear progress indicators
- Consider async operations for long-running tasks
- Implement graceful error handling (invalid URLs, corrupted PDFs, API failures)
- Optional: Allow script preview before audio generation
- Cache generated podcasts to avoid reprocessing
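
One way to avoid reprocessing is to key the cache on the paper's content instead of its URL or filename. The sketch below is an assumption about how that could work: `cache_key`, `cached_audio_path`, the `settings_tag` parameter, and the `.mp3` naming are hypothetical. Whether files written this way survive a Space restart depends on the Space's storage configuration.

```python
import hashlib
from pathlib import Path
from typing import Optional


def cache_key(pdf_bytes: bytes, settings_tag: str = "default") -> str:
    """Hash the PDF content plus a settings label into a stable cache key."""
    digest = hashlib.sha256()
    digest.update(settings_tag.encode("utf-8"))
    digest.update(b"\0")  # separator so the tag and the content cannot run together
    digest.update(pdf_bytes)
    return digest.hexdigest()


def cached_audio_path(
    cache_dir: str, pdf_bytes: bytes, settings_tag: str = "default"
) -> Optional[Path]:
    """Return the cached audio file for this paper, or None if it was never generated."""
    path = Path(cache_dir) / f"{cache_key(pdf_bytes, settings_tag)}.mp3"
    return path if path.exists() else None


if __name__ == "__main__":
    key = cache_key(b"%PDF-1.7 example bytes")
    print(key[:16], "...")
```

Including a settings label in the key means a change of voices or target length produces a new entry instead of serving stale audio.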

### Free/Open Source Priority
Budget is limited, so prioritize freely available solutions:
- HuggingFace hosted models where possible
- Open source libraries (PyMuPDF, pdfplumber, etc.)
- Free tier APIs within rate limits
- Self-hosted components on HF Spaces infrastructure

## Gradio 6 Interface Requirements

The UI should be simple and intuitive:
- Input section: URL input field + PDF upload (decide whether the two are mutually exclusive or can be combined)
- Processing section: Clear status messages and progress indicators
- Output section:
  - Audio player for immediate listening
  - Download buttons for audio file and transcript
  - Display transcript with speaker labels
- Error messages should be user-friendly
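
Input handling and friendly errors can be kept in a plain function that the UI calls before starting the pipeline. The sketch below takes the "mutually exclusive" option as an assumption and accepts only arXiv links; `resolve_input`, the regular expression, and the message wording are hypothetical, and the paper ID in the usage line is a made-up placeholder.

```python
import re
from typing import Optional, Tuple

# Matches arxiv.org/abs/<id> and arxiv.org/pdf/<id>[.pdf], including versioned
# and old-style IDs. This pattern is an illustrative assumption, not an official one.
ARXIV_URL = re.compile(
    r"^https?://(?:www\.)?arxiv\.org/(?:abs|pdf)/(?P<id>[\w.\-/]+?)(?:\.pdf)?$"
)


def resolve_input(url: Optional[str], pdf_path: Optional[str]) -> Tuple[str, str]:
    """Validate the two inputs and return (kind, location) for the pipeline."""
    url = (url or "").strip()
    if url and pdf_path:
        raise ValueError("Please provide either an arXiv URL or a PDF file, not both.")
    if not url and not pdf_path:
        raise ValueError("Please paste an arXiv URL or upload a PDF to get started.")
    if pdf_path:
        return ("pdf", pdf_path)
    match = ARXIV_URL.match(url)
    if match is None:
        raise ValueError(
            "That does not look like an arXiv link. "
            "Expected something like https://arxiv.org/abs/<paper-id>."
        )
    return ("arxiv", f"https://arxiv.org/pdf/{match.group('id')}")


if __name__ == "__main__":
    # The paper ID below is a placeholder used only to show the URL shapes.
    print(resolve_input("https://arxiv.org/abs/2301.01234v2", None))
```

Raising `ValueError` with a readable message lets the UI layer show the text directly to the user instead of a stack trace.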

## Submission Requirements Checklist

Required for valid submission:
- [ ] Working Gradio app deployed to HuggingFace Space
- [ ] Published under `MCP-1st-Birthday` organization (not personal profile)
- [ ] README.md includes `mcp-in-action-track-consumer` tag
- [ ] Demo video (1-5 minutes) showing project in action
- [ ] Social media post link (X/LinkedIn) in README
- [ ] Clear documentation of purpose, usage, and technical approach
- [ ] All dependencies in requirements.txt
- [ ] Team member HuggingFace usernames in README

## Judging Criteria Priority

When making design decisions, optimize for:
1. **Completeness**: All deliverables submitted
2. **Design/UI-UX**: Intuitive, polished interface
3. **Functionality**: Effective use of Gradio 6, MCPs, and agent capabilities
4. **Creativity**: Innovative approach to the problem
5. **Documentation**: Clear README and demo video
6. **Real-world impact**: Practical usefulness

## Critical Implementation Notes

### Agent vs API Chaining
The application must demonstrate true agent behavior, not just API chaining:
- Show decision-making (e.g., determining which sections to emphasize)
- Demonstrate adaptive behavior (e.g., different strategies for different paper types)
- Use MCP servers as tools the agent reasons about, not just sequential calls
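
The difference from a fixed chain can be illustrated with a loop that picks the next tool from the current state instead of following a hard-coded order. In the sketch below, `choose_next_action` is a rule-based stand-in for a decision an LLM would make, and the tool names, state keys, and stub tools are all hypothetical; in the real application each tool would wrap an MCP server call.

```python
from typing import Callable, Dict, List


def choose_next_action(state: dict) -> str:
    """Pick the next tool based on what the state is still missing."""
    if state.get("audio_path"):
        return "done"
    if not state.get("pdf_path"):
        return "fetch_paper"  # skipped when the user uploaded a PDF
    if not state.get("paper_text"):
        return "extract_text"
    if not state.get("analysis"):
        return "analyze_paper"
    if not state.get("script"):
        # Adaptive step: math-heavy papers get a simplification pass first.
        if state["analysis"].get("math_heavy") and not state.get("simplified"):
            return "simplify_math"
        return "generate_script"
    return "synthesize_audio"


def run_agent(state: dict, tools: Dict[str, Callable[[dict], dict]], max_steps: int = 10) -> List[str]:
    """Loop until the agent reports 'done'; return the sequence of tools it used."""
    trace: List[str] = []
    for _ in range(max_steps):
        action = choose_next_action(state)
        if action == "done":
            return trace
        state.update(tools[action](state))
        trace.append(action)
    raise RuntimeError("Agent did not finish within max_steps")


# Stub tools returning dummy data; real ones would call MCP servers.
STUB_TOOLS: Dict[str, Callable[[dict], dict]] = {
    "fetch_paper": lambda s: {"pdf_path": "paper.pdf"},
    "extract_text": lambda s: {"paper_text": s.get("raw_text", "plain prose")},
    "analyze_paper": lambda s: {"analysis": {"math_heavy": "theorem" in s["paper_text"]}},
    "simplify_math": lambda s: {"simplified": True},
    "generate_script": lambda s: {"script": "Host: Welcome!"},
    "synthesize_audio": lambda s: {"audio_path": "podcast.mp3"},
}

if __name__ == "__main__":
    print(run_agent({}, STUB_TOOLS))
    print(run_agent({"pdf_path": "upload.pdf", "raw_text": "a theorem and its proof"}, STUB_TOOLS))
```

The two runs take different paths: the uploaded PDF skips fetching, and the math-heavy text triggers the extra simplification step. The returned trace is also a convenient record of the agent's decisions to surface in the UI or demo video.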

### Natural Dialogue Generation
Avoid robotic Q&A format:
- Use conversational connectors ("That's fascinating...", "Building on that point...")
- Include natural reactions and acknowledgments
- Vary sentence structure and length
- Use analogies and examples appropriate for general audience
- Host should ask genuine questions that guide the conversation

### Testing Strategy
Test with diverse paper types:
- Different fields (CS, biology, physics, social sciences)
- Various lengths (short letters vs full papers)
- Different repositories (arXiv, bioRxiv, PubMed)
- Papers with heavy math vs conceptual papers

## File Organization (Recommended)

```
papercast/
├── app.py                 # Main Gradio application
├── requirements.txt       # Python dependencies
├── README.md             # Project documentation (must include track tag)
├── agents/               # Agent logic and orchestration
├── mcp_servers/          # MCP server integrations
├── processing/           # PDF extraction and text processing
├── generation/           # Script and dialogue generation
├── synthesis/            # Text-to-speech audio generation
└── utils/                # Helper functions
```

## Known Constraints

- Deadline: November 30, 2025, 11:59 PM UTC
- Must be original work created November 14-30, 2025
- HuggingFace Spaces free tier (GPU available)
- Processing time target: under 5 minutes per paper
- All work must demonstrate MCP integration

## Reference Materials

- Project brief: `PAPERCAST_PROJECT_BRIEF.md`
- Gradio 6 docs: https://www.gradio.app/
- MCP documentation (Gradio MCP blog post): https://huggingface.co/blog/gradio-mcp
- Hackathon page: https://huggingface.co/MCP-1st-Birthday