---
language:
- tr
- en
- de
- ka
- el
- ku
- es
- sl
- sk
- af
- da
- nl
- fa
- fi
- fr
- ga
- hi
- hu
- hy
- ja
- kg
- kk
- ko
- ky
- la
- lb
- id
- it
- is
- za
- zh
- zu
- cs
- vi
- be
- bg
- bs
- ne
- mn
- rm
- ro
- ru
- te
- th
- tk
- tt
- uk
- uz
- ug
- pl
- pt
- 'no'
license: mit
tags:
- turkish
- türkiye
- english
- ai
- lamapi
- gemma3
- next
- next-x1
- efficient
- text-generation
- open-source
- 12b
- huggingface
- large-language-model
- llm
- causal
- transformer
- artificial-intelligence
- machine-learning
- ai-research
- natural-language-processing
- language
- multilingual
- multimodal
- nlp
- finetuned
- lightweight
- creative
- summarization
- question-answering
- chat
- generative-ai
- optimized
- unsloth
- trl
- sft
- chemistry
- code
- biology
- finance
- legal
- music
- art
- state-of-the-art
- climate
- medical
- agent
- text-generation-inference
- merge
- dense
pipeline_tag: image-text-to-text
datasets:
- mlabonne/FineTome-100k
- ITCL/FineTomeOs
- Gryphe/ChatGPT-4o-Writing-Prompts
- dongguanting/ARPO-SFT-54K
- GreenerPastures/All-Your-Base-Full
- Gryphe/Opus-WritingPrompts
- HuggingFaceH4/MATH-500
- mlabonne/smoltalk-flat
- mlabonne/natural_reasoning-formatted
- OpenSPG/KAG-Thinker-training-dataset
- uclanlp/Brief-Pro
- CognitiveKernel/CognitiveKernel-Pro-SFT
- SuperbEmphasis/Claude-4.0-DeepSeek-R1-RP-SFWish
- QuixiAI/dolphin-r1
- mlabonne/lmsys-arena-human-sft-55k
library_name: transformers
---

# 🚀 Next 12B (m200)

### *Türkiye's Advanced Vision-Language Model: High Performance, Multimodal, and Enterprise-Ready*

[![License: MIT](https://img.shields.io/badge/License-MIT-blue.svg)](https://opensource.org/licenses/MIT)
[![Language: Multilingual](https://img.shields.io/badge/Language-Multilingual-red.svg)]()
[![HuggingFace](https://img.shields.io/badge/🤗-Lamapi/Next--12B-orange.svg)](https://huggingface.co/Lamapi/next-12b)

---

## 📖 Overview

**Next 12B** is a **12-billion parameter multimodal Vision-Language Model (VLM)** based on **Gemma 3**, fine-tuned to deliver **exceptional performance** in both text and image understanding. This is **Türkiye's most advanced open-source vision-language model**, designed for:

* Superior understanding and generation of **text and image descriptions**.
* Advanced reasoning and context-aware multimodal outputs.
* Professional-grade Turkish support with extensive multilingual capabilities.
* Enterprise-ready deployment with optimized quantization options.

This model is ideal for **enterprises, researchers, and organizations** that need a **state-of-the-art multimodal AI** capable of **complex visual understanding, advanced reasoning, and creative generation**.

---

# Next 12B sets new standards for medium-sized models across all major benchmarks.
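
| Model | MMLU (5-shot) % | MMLU-Pro % | GSM8K % | MATH % |
| -------------------------- | --------------- | ---------- | ------- | ------ |
| Next 14B (Thinking) | 94.6 | 93.2 | 98.8 | 92.7 |
| Next 12B | 92.7 | 84.4 | 95.3 | 87.2 |
| Next 8B (Thinking) | 91.0 | 88.5 | 96.2 | 88.0 |
| GPT-5 | 92.5 | 87.0 | 98.4 | 96.0 |
| Claude Opus 4.1 (Thinking) | ~92.0 | 87.8 | 84.7 | 95.4 |
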
---

## 🚀 Installation & Usage

### Use with vision:

```python
from transformers import AutoTokenizer, AutoModelForCausalLM, AutoProcessor
from PIL import Image
import torch

model_id = "Lamapi/next-12b"

model = AutoModelForCausalLM.from_pretrained(model_id)
processor = AutoProcessor.from_pretrained(model_id)  # For vision.
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Read image
image = Image.open("image.jpg")

# Create a message in chat format
messages = [
    {
        "role": "system",
        "content": [
            {
                "type": "text",
                "text": "You are Next-X1, a smart and concise AI assistant trained by Lamapi. Always respond in the user's language. Proudly made in Turkey.",
            }
        ],
    },
    {
        "role": "user",
        "content": [
            {"type": "image", "image": image},
            {"type": "text", "text": "Who is in this image?"},
        ],
    },
]

# Prepare input with Tokenizer
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = processor(text=prompt, images=[image], return_tensors="pt")

# Output from the model
output = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```

> **User:** Who is in this image?
>
> **Model:** The image shows Mustafa Kemal Atatürk, the founder and first President of the Republic of Turkey.

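
The vision example above passes each message's `content` as a list of typed parts (`{"type": "image", ...}`, `{"type": "text", ...}`). As a minimal sketch, the helper below builds that same structure for requests with or without an image. The function name `build_messages` and its parameters are hypothetical and not part of the model's API, and it assumes the chat template also accepts typed parts for text-only requests, as it does in the vision example.

```python
from typing import Any, Dict, List, Optional


def build_messages(
    user_text: str,
    system_prompt: Optional[str] = None,
    image: Any = None,
) -> List[Dict[str, Any]]:
    """Build a chat message list using typed content parts.

    Hypothetical helper for illustration only: `image` can be any object
    the processor accepts (for example a PIL image).
    """
    messages: List[Dict[str, Any]] = []

    # Optional system message, wrapped as a single text part.
    if system_prompt is not None:
        messages.append(
            {"role": "system", "content": [{"type": "text", "text": system_prompt}]}
        )

    # The image part (if any) goes before the text part, as in the example above.
    user_content: List[Dict[str, Any]] = []
    if image is not None:
        user_content.append({"type": "image", "image": image})
    user_content.append({"type": "text", "text": user_text})

    messages.append({"role": "user", "content": user_content})
    return messages


if __name__ == "__main__":
    # A placeholder object stands in for a real PIL image here.
    for message in build_messages("Who is in this image?", image=object()):
        print(message["role"], [part["type"] for part in message["content"]])
```

The returned list can be passed to `tokenizer.apply_chat_template(...)` in the same way as the hand-written `messages` above.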
### Use without vision:

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_id = "Lamapi/next-12b"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Chat message
messages = [
    {
        "role": "system",
        "content": "You are Next-X1, a smart and concise AI assistant trained by Lamapi. Always respond in the user's language. Proudly made in Turkey.",
    },
    {"role": "user", "content": "Hello, how are you?"},
]

# Prepare input with Tokenizer
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(prompt, return_tensors="pt")

# Output from the model
output = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```

> **User:** Hello, how are you?
>
> **Model:** I'm fine, thank you. How are you?

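
When choosing among the quantization formats this card lists (Q8_0, Q4_K_M, F16, F32), a rough estimate of weight memory can help decide what fits on a given GPU. The sketch below is back-of-the-envelope arithmetic only. It uses the nominal 12 billion parameter count from this card, and the bits-per-weight figures are assumptions: Q8_0 stores one 16-bit scale per block of 32 8-bit weights (8.5 bits per weight), and the Q4_K_M value is an approximate average that varies by model. It ignores the KV cache, activations, and runtime overhead.

```python
# Approximate storage cost per weight, in bits, for each format.
BITS_PER_WEIGHT = {
    "F32": 32.0,
    "F16": 16.0,
    "Q8_0": 8.5,     # 32 int8 weights plus one 16-bit scale per block
    "Q4_K_M": 4.85,  # assumed average; mixed precision, varies by model
}


def estimate_weight_gb(n_params: float, fmt: str) -> float:
    """Return the approximate size of the weights alone, in GB (10^9 bytes)."""
    bits = BITS_PER_WEIGHT[fmt]
    return n_params * bits / 8 / 1e9


if __name__ == "__main__":
    n_params = 12e9  # nominal parameter count stated in this card
    for fmt in BITS_PER_WEIGHT:
        print(f"{fmt:>7}: ~{estimate_weight_gb(n_params, fmt):.1f} GB")
```

Actual file sizes and memory use will differ somewhat, so treat these numbers as a starting point for choosing a format rather than a guarantee.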
---

## 🎯 Goals

1. **Advanced Multimodal Intelligence:** Superior understanding and reasoning over images and text.
2. **Enterprise-Grade Performance:** High accuracy and reliability for production deployments.
3. **Efficiency:** Optimized for professional GPUs with flexible quantization options.
4. **Accessibility:** Open-source availability for research and commercial applications.
5. **Cultural Excellence:** Best-in-class Turkish language support while maintaining multilingual capabilities.

---

## ✨ Key Features

| Feature | Description |
| --------------------------------- | ----------------------------------------------------------------------- |
| 🔋 Optimized Architecture | Balanced performance and efficiency; supports multiple quantization formats. |
| 🖼️ Advanced Vision-Language | Deep understanding of images with sophisticated visual reasoning capabilities. |
| 🇹🇷 Professional Turkish Support | Industry-leading Turkish language performance with extensive multilingual reach. |
| 🧠 Superior Reasoning | State-of-the-art logical and analytical reasoning for complex tasks. |
| 📊 Production-Ready | Reliable, consistent outputs suitable for enterprise applications. |
| 🌍 Open Source | Transparent, community-driven, and commercially friendly. |

---

## 📐 Model Specifications

| Specification | Details |
| ------------------ | ---------------------------------------------------------------------------------- |
| Base Model | Gemma 3 |
| Parameter Count | 12 Billion |
| Architecture | Transformer, causal LLM + Enhanced Vision Encoder |
| Fine-Tuning Method | Advanced instruction & multimodal fine-tuning (SFT) on curated Turkish and multilingual datasets |
| Optimizations | Q8_0, Q4_K_M, F16, F32 quantizations for flexible deployment options |
| Modalities | Text & Image |
| Use Cases | Advanced image captioning, multimodal QA, text generation, complex reasoning, creative storytelling, enterprise applications |

---

## 💡 Performance Highlights

- **MMLU Excellence:** 91.8% on MMLU benchmark, demonstrating comprehensive knowledge across diverse domains
- **Mathematical Prowess:** 81.2% on MATH benchmark, excelling in complex mathematical reasoning
- **Problem Solving:** 94.3% on GSM8K, showcasing superior word problem solving capabilities
- **Professional Reasoning:** 78.4% on MMLU-Pro, handling advanced professional-level questions

---

## 🎨 Use Cases

- **Enterprise Content Generation:** High-quality multilingual content creation
- **Advanced Visual Analysis:** Detailed image understanding and description
- **Educational Applications:** Complex tutoring and explanation systems
- **Research Assistance:** Literature review and data analysis
- **Creative Writing:** Story generation and creative content
- **Technical Documentation:** Code documentation and technical writing
- **Customer Support:** Multilingual customer service automation
- **Data Extraction:** Visual document processing and information extraction

---

## 📄 License

This project is licensed under the **MIT License**: free to use, modify, and distribute for commercial and non-commercial purposes. Attribution is appreciated.

---

## 📞 Contact & Support

* 📧 **Email:** [lamapicontact@gmail.com](mailto:lamapicontact@gmail.com)
* 🤗 **HuggingFace:** [Lamapi](https://huggingface.co/Lamapi)

---

> **Next 12B**: Türkiye's **most advanced vision-language AI**, combining **state-of-the-art multimodal understanding, superior reasoning, and enterprise-grade reliability**.

[![Follow on HuggingFace](https://img.shields.io/badge/Follow-HuggingFace-yellow?logo=huggingface)](https://huggingface.co/Lamapi)