Peter Szemraj PRO

pszemraj

https://pszemraj.carrd.co/

AI & ML interests

metallic intuition

Recent Activity

liked a model about 17 hours ago

mradermacher/granite-embedding-small-english-r2-GGUF

updated a dataset 1 day ago

pszemraj/LocalLLaMA-posts

liked a model 2 days ago

onnx-community/granite-embedding-small-english-r2-ONNX

upvoted a paper 3 days ago

dLLM: Simple Diffusion Language Modeling

Paper • 2602.22661 • Published 7 days ago • 110

upvoted a paper 7 days ago

On Data Engineering for Scaling LLM Terminal Capabilities

Paper • 2602.21193 • Published 9 days ago • 90

upvoted a paper 8 days ago

On the "Induction Bias" in Sequence Models

Paper • 2602.18333 • Published 13 days ago • 4

upvoted a collection 8 days ago

Nemotron-Terminal

We are releasing Nemotron-Terminal models and training datasets. • 5 items • Updated 1 day ago • 25

upvoted a paper 8 days ago

Agents of Chaos

Paper • 2602.20021 • Published 10 days ago • 29

upvoted 3 papers 9 days ago

Revisiting the Platonic Representation Hypothesis: An Aristotelian View

Paper • 2602.14486 • Published 17 days ago • 11

Does Socialization Emerge in AI Agent Society? A Case Study of Moltbook

Paper • 2602.14299 • Published 18 days ago • 26

Reinforced Fast Weights with Next-Sequence Prediction

Paper • 2602.16704 • Published 15 days ago • 13

upvoted a collection 21 days ago

Health AI Developer Foundations (HAI-DEF)

Groups models released for use in health AI by Google. Read more about HAI-DEF at http://goo.gle/hai-def • 22 items • Updated Jan 12 • 200

upvoted a paper 23 days ago

Judging What We Cannot Solve: A Consequence-Based Approach for Oracle-Free Evaluation of Research-Level Math

Paper • 2602.06291 • Published 27 days ago • 23

upvoted a paper 24 days ago

Revisiting the Shape Convention of Transformer Language Models

Paper • 2602.06471 • Published 27 days ago • 4

upvoted 2 papers 27 days ago

Length-Unbiased Sequence Policy Optimization: Revealing and Controlling Response Length Variation in RLVR

Paper • 2602.05261 • Published 28 days ago • 49

Horizon-LM: A RAM-Centric Architecture for LLM Training

Paper • 2602.04816 • Published 29 days ago • 17

upvoted a paper 29 days ago

Quartet II: Accurate LLM Pre-Training in NVFP4 by Improved Unbiased Gradient Estimation

Paper • 2601.22813 • Published Jan 30 • 57

upvoted 6 papers about 1 month ago

Linear representations in language models can change dramatically over a conversation

Paper • 2601.20834 • Published Jan 28 • 21

Do Reasoning Models Enhance Embedding Models?

Paper • 2601.21192 • Published Jan 29 • 25

Scaling Embeddings Outperforms Scaling Experts in Language Models

Paper • 2601.21204 • Published Jan 29 • 100

CGPT: Cluster-Guided Partial Tables with LLM-Generated Supervision for Table Retrieval

Paper • 2601.15849 • Published Jan 22 • 14

Teaching Models to Teach Themselves: Reasoning at the Edge of Learnability

Paper • 2601.18778 • Published Jan 26 • 41

Post-LayerNorm Is Back: Stable, ExpressivE, and Deep

Paper • 2601.19895 • Published Jan 27 • 24