

PaperCast Implementation Plan

This plan outlines the steps to build PaperCast, an AI agent that converts research papers into podcast-style conversations using MCP, Gradio, and LLMs.

1. Infrastructure & Dependencies

  • Update requirements.txt
    • Add transformers, accelerate, bitsandbytes (for 4-bit LLM loading).
    • Add scipy (for audio processing).
    • Add beautifulsoup4 (for web parsing).
    • Add python-multipart (for API handling).
    • Ensure mcp and gradio versions are pinned.
  • Project Structure Setup
    • Create app.py (entry point).
  • Ensure __init__.py exists in all subdirectories.
    • Create config.py in utils/ for global settings (LLM model names, paths).
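The config.py mentioned above could start as the minimal sketch below; every constant name and default value here is an assumption to be refined, except the two model identifiers, which come from this plan:

```python
# utils/config.py -- central settings (names and defaults are illustrative).
from pathlib import Path

# Model identifiers from the plan.
SCRIPT_LLM = "unsloth/Phi-4-mini-instruct-unsloth-bnb-4bit"
TTS_MODEL = "maya-research/maya1"

# Filesystem paths (hypothetical layout).
TEMP_DIR = Path("/tmp/papercast")
OUTPUT_DIR = Path("outputs")

# Generation knobs.
MAX_SCRIPT_TOKENS = 2048
SPEAKERS = ("Host", "Guest")
```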

2. Core Processing Modules

2.1. PDF Processing (processing/)

  • Implement pdf_reader.py
    • Function extract_text_from_pdf(pdf_path) -> str.
    • Use PyMuPDF (fitz) for fast extraction.
    • Implement basic cleaning (remove headers/footers/references if possible).
  • Implement url_fetcher.py
    • Function fetch_paper_from_url(url) -> str.
    • Handle arXiv URLs (convert /abs/ to /pdf/ or scrape abstract).
    • Download PDF to temporary storage.
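A sketch of the two helpers above. PyMuPDF is imported lazily so the URL logic works without it; the header/footer cleaning step is left as a marked gap rather than implemented:

```python
import os
import re
import tempfile
import urllib.request


def arxiv_pdf_url(url: str) -> str:
    """Rewrite an arXiv /abs/ link to its /pdf/ counterpart (other URLs pass through)."""
    return re.sub(r"arxiv\.org/abs/", "arxiv.org/pdf/", url)


def extract_text_from_pdf(pdf_path: str) -> str:
    """Concatenate per-page text; header/footer/reference stripping would go here."""
    import fitz  # PyMuPDF; deferred so the URL helper works without it installed

    with fitz.open(pdf_path) as doc:
        return "\n".join(page.get_text() for page in doc)


def fetch_paper_from_url(url: str) -> str:
    """Download the PDF to temporary storage and return its extracted text."""
    fd, path = tempfile.mkstemp(suffix=".pdf")
    os.close(fd)
    urllib.request.urlretrieve(arxiv_pdf_url(url), path)
    return extract_text_from_pdf(path)
```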

2.2. Generation Logic (generation/)

  • Implement script_generator.py
    • Model: unsloth/Phi-4-mini-instruct-unsloth-bnb-4bit.
    • Define System Prompts for "Host" and "Guest" personas.
    • Function generate_podcast_script(paper_text) -> List[Dict].
    • Output format: [{"speaker": "Host", "text": "...", "emotion": "excited"}, {"speaker": "Guest", "text": "...", "emotion": "neutral"}].
    • Key Logic: Prompt the model to include emotion tags (e.g., [laugh], [sigh]) that Maya1 supports.
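One way to get the structured output above is to ask the model for a JSON array and parse the completion defensively. The prompt wording and the fallback regex below are assumptions, not fixed API:

```python
import json
import re

# Hypothetical system prompt; the Host/Guest persona prompts would be separate.
SYSTEM_PROMPT = (
    "You are writing a two-person podcast about a research paper. "
    "Return a JSON array of turns: "
    '[{"speaker": "Host"|"Guest", "text": "...", "emotion": "..."}]. '
    "Inline emotion tags like [laugh] or [sigh] may appear in the text."
)


def parse_script(raw: str) -> list[dict]:
    """Extract the first JSON array from the model's raw completion."""
    match = re.search(r"\[.*\]", raw, re.DOTALL)
    if not match:
        raise ValueError("no JSON array found in model output")
    turns = json.loads(match.group(0))
    # Keep only well-formed turns.
    return [t for t in turns if {"speaker", "text"} <= t.keys()]
```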

2.3. Audio Synthesis (synthesis/)

  • Implement tts_engine.py
    • Model: maya-research/maya1.
    • Function synthesize_dialogue(script_json) -> audio_path.
    • Parse the script for emotion tags and pass them to Maya1.
    • Combine audio segments into a single file using pydub or scipy.
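Segment combination can be done with pydub, but a dependency-free sketch using the stdlib wave module also works, assuming Maya1 emits WAV segments with matching channel/rate/width parameters:

```python
import wave


def concat_wavs(segment_paths: list[str], out_path: str) -> str:
    """Append same-format WAV segments into one file and return its path."""
    with wave.open(out_path, "wb") as out:
        params_set = False
        for path in segment_paths:
            with wave.open(path, "rb") as seg:
                if not params_set:
                    # Copy channel count, sample width, and rate from the first clip.
                    out.setparams(seg.getparams())
                    params_set = True
                out.writeframes(seg.readframes(seg.getnframes()))
    return out_path
```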

3. MCP Server Integration (mcp_servers/)

To satisfy the "MCP in Action" requirement, we will expose our core tools as MCP resources/tools.

  • Create paper_tools_server.py
    • Implement an MCP server that provides:
      • Tool: read_pdf(path)
      • Tool: fetch_arxiv(url)
      • Tool: synthesize_podcast(script)
    • This allows the "Agent" to call these tools via the MCP protocol.
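The server could be sketched with FastMCP from the official MCP Python SDK. A stand-in class with the same decorator shape is defined when the package is absent so the sketch stays self-contained; two of the tool bodies are placeholders that would delegate to the modules from Step 2:

```python
try:
    from mcp.server.fastmcp import FastMCP
except ImportError:
    # Stand-in so the sketch runs without the mcp package installed.
    class FastMCP:
        def __init__(self, name: str):
            self.name = name

        def tool(self):
            def register(fn):
                return fn
            return register

        def run(self):
            raise RuntimeError("install the mcp package to actually serve")


mcp = FastMCP("papercast-tools")


@mcp.tool()
def read_pdf(path: str) -> str:
    """Return the extracted text of a local PDF (placeholder body)."""
    raise NotImplementedError  # delegate to processing/pdf_reader in the real app


@mcp.tool()
def fetch_arxiv(url: str) -> str:
    """Rewrite an arXiv abstract URL to its PDF URL (simplified placeholder)."""
    return url.replace("arxiv.org/abs/", "arxiv.org/pdf/")


@mcp.tool()
def synthesize_podcast(script: str) -> str:
    """Return the path of the synthesized audio file (placeholder body)."""
    raise NotImplementedError  # delegate to synthesis/tts_engine in the real app


if __name__ == "__main__":
    mcp.run()  # serves over stdio by default
```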

4. Agent Orchestration (agents/)

  • Implement podcast_agent.py
    • Create a PodcastAgent class.
    • Planning Loop:
      1. Receive User Input.
      2. Plan: Decide to fetch/read paper.
      3. Analyze: Extract key topics.
      4. Draft: Generate script using Phi-4-mini.
      5. Synthesize: Create audio using Maya1.
    • Use a simulated sequential_thinking pattern to surface the "Agentic" behavior in the logs/UI.
    • Crucial: The Agent should use the MCP Client to call the tools defined in Step 3, demonstrating "Autonomous reasoning using MCP tools".
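The planning loop above can be sketched as an explicit sequence that records its "thoughts" for the UI. The call_tool callable standing in for the MCP client session, and the tool names it receives, are assumptions:

```python
from typing import Callable


class PodcastAgent:
    """Plans and executes the paper-to-podcast pipeline, logging each step."""

    def __init__(self, call_tool: Callable[..., object]):
        # call_tool(name, **kwargs) stands in for an MCP client session.
        self.call_tool = call_tool
        self.thoughts: list[str] = []

    def think(self, message: str) -> None:
        """Record a planning step for display in the logs/UI."""
        self.thoughts.append(message)

    def run(self, source: str) -> object:
        self.think(f"Plan: fetch and read paper from {source}")
        text = self.call_tool("fetch_arxiv", url=source)
        self.think("Analyze: extracting key topics")
        self.think("Draft: generating script with the LLM")
        script = self.call_tool("generate_script", paper_text=text)
        self.think("Synthesize: rendering audio")
        return self.call_tool("synthesize_podcast", script=script)
```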

5. User Interface (app.py)

  • Build Gradio UI
    • Input: Textbox (URL) or File Upload (PDF).
    • Output: Audio Player, Transcript Textbox, Status/Logs Markdown.
    • Agent Visualization: Show the "Thoughts" of the agent as it plans and executes (e.g., "Fetching paper...", "Analyzing structure...", "Generating script...").
  • Deployment Config
    • Create a Dockerfile (if custom dependencies require it) or rely on the HF Spaces default build.
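A UI skeleton matching the inputs/outputs listed above. gradio is imported lazily inside the builder so the module loads without it; the component layout and the run_pipeline signature are assumptions:

```python
def build_ui(run_pipeline):
    """run_pipeline(url, pdf_file) -> (audio_path, transcript, log_markdown)."""
    import gradio as gr  # deferred so the rest of the app imports without gradio

    with gr.Blocks(title="PaperCast") as demo:
        gr.Markdown("# PaperCast\nTurn a research paper into a podcast.")
        with gr.Row():
            url_in = gr.Textbox(label="arXiv URL")
            pdf_in = gr.File(label="PDF upload", file_types=[".pdf"])
        go = gr.Button("Generate podcast")
        audio_out = gr.Audio(label="Podcast")
        transcript_out = gr.Textbox(label="Transcript", lines=12)
        log_out = gr.Markdown("Agent thoughts will appear here.")
        go.click(
            run_pipeline,
            inputs=[url_in, pdf_in],
            outputs=[audio_out, transcript_out, log_out],
        )
    return demo
```

In app.py this would be wired as `build_ui(agent_pipeline).launch()`, where agent_pipeline streams the agent's thoughts into the Markdown output.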

6. Verification & Polish

  • Test Run
    • Run with a real arXiv paper.
    • Verify audio quality and script coherence.
  • Documentation
    • Update README.md with usage instructions and "MCP in Action" details.
    • Record Demo Video.

7. Bonus Features (Time Permitting)

  • RAG Integration: Use a vector store to answer questions about the paper after the podcast.
  • Background Music: Mix in intro/outro music.