-
DSG: An End-to-End Document Structure Generator
Paper • 2310.09118 • Published • 2 -
OCR-free Document Understanding Transformer
Paper • 2111.15664 • Published • 6 -
DocParser: End-to-end OCR-free Information Extraction from Visually Rich Documents
Paper • 2304.12484 • Published • 1 -
Attention Where It Matters: Rethinking Visual Document Understanding with Selective Region Concentration
Paper • 2309.01131 • Published • 2
Collections
Collections including paper arxiv:2401.02823
-
openai/whisper-large-v3-turbo
Automatic Speech Recognition • 0.8B • Updated • 7.01M • 2.97k -
jasperai/Flux.1-dev-Controlnet-Upscaler
Image-to-Image • Updated • 2.8k • 866 -
DocGraphLM: Documental Graph Language Model for Information Extraction
Paper • 2401.02823 • Published • 36 -
Contextual Document Embeddings
Paper • 2410.02525 • Published • 24
-
DocGraphLM: Documental Graph Language Model for Information Extraction
Paper • 2401.02823 • Published • 36 -
Finetuned Multimodal Language Models Are High-Quality Image-Text Data Filters
Paper • 2403.02677 • Published • 18 -
FlashSpeech: Efficient Zero-Shot Speech Synthesis
Paper • 2404.14700 • Published • 32 -
TextGrad: Automatic "Differentiation" via Text
Paper • 2406.07496 • Published • 31
-
Language models are weak learners
Paper • 2306.14101 • Published • 11 -
Large Language Models as Tax Attorneys: A Case Study in Legal Capabilities Emergence
Paper • 2306.07075 • Published • 11 -
TableGPT: Towards Unifying Tables, Nature Language and Commands into One GPT
Paper • 2307.08674 • Published • 49 -
Nougat: Neural Optical Understanding for Academic Documents
Paper • 2308.13418 • Published • 42
-
DocGraphLM: Documental Graph Language Model for Information Extraction
Paper • 2401.02823 • Published • 36 -
Understanding LLMs: A Comprehensive Overview from Training to Inference
Paper • 2401.02038 • Published • 65 -
DocLLM: A layout-aware generative language model for multimodal document understanding
Paper • 2401.00908 • Published • 191 -
Attention Where It Matters: Rethinking Visual Document Understanding with Selective Region Concentration
Paper • 2309.01131 • Published • 2