Papers - Decoders
Lossless Acceleration for Seq2seq Generation with Aggressive Decoding (arXiv:2205.10350)
Blockwise Parallel Decoding for Deep Autoregressive Models (arXiv:1811.03115)
Fast Transformer Decoding: One Write-Head is All You Need (arXiv:1911.02150)
Sequence-Level Knowledge Distillation (arXiv:1606.07947)
ELLA: Equip Diffusion Models with LLM for Enhanced Semantic Alignment (arXiv:2403.05135)
SPBERT: An Efficient Pre-training BERT on SPARQL Queries for Question Answering over Knowledge Graphs (arXiv:2106.09997)
Chain-of-Thought Reasoning Without Prompting (arXiv:2402.10200)
GQA: Training Generalized Multi-Query Transformer Models from Multi-Head Checkpoints (arXiv:2305.13245)
Chain of Thought Empowers Transformers to Solve Inherently Serial Problems (arXiv:2402.12875)
Unveiling Encoder-Free Vision-Language Models (arXiv:2406.11832)
A Thorough Examination of Decoding Methods in the Era of LLMs (arXiv:2402.06925)