Thursday, January 15, 2026

MANZANO: A Simple and Scalable Unified Multimodal Model with a Hybrid Vision Tokenizer


Unified multimodal Large Language Models (LLMs) that can both understand and generate visual content hold immense potential. However, existing open-source models often suffer from a performance trade-off between these capabilities. We present Manzano, a simple and scalable unified framework that substantially reduces this tension by coupling a hybrid image tokenizer with a well-curated training recipe. A single shared vision encoder feeds two lightweight adapters that produce continuous embeddings for image-to-text understanding and discrete tokens for text-to-image generation within a common semantic space. A unified autoregressive LLM predicts high-level semantics in the form of text and image tokens, and an auxiliary diffusion decoder subsequently translates the image tokens into pixels. The architecture, together with a unified training recipe over understanding and generation data, enables scalable joint learning of both capabilities. Manzano achieves state-of-the-art results among unified models and is competitive with specialist models, particularly on text-rich evaluation. Our studies show minimal task conflicts and consistent gains from scaling model size, validating our design choice of a hybrid tokenizer.
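To make the hybrid-tokenizer idea concrete, here is a minimal sketch of how a shared vision encoder could feed two lightweight adapters: one producing continuous embeddings for understanding, the other producing discrete token ids (via a codebook lookup) as targets for generation. This is an illustrative sketch, not the authors' implementation; the module names, dimensions, codebook size, and nearest-neighbor quantization step are all assumptions.

```python
import torch
import torch.nn as nn


class HybridImageTokenizer(nn.Module):
    """Illustrative hybrid tokenizer: one shared encoder, two adapters."""

    def __init__(self, vision_encoder: nn.Module, dim: int = 1024,
                 llm_dim: int = 2048, codebook_size: int = 16384):
        super().__init__()
        self.encoder = vision_encoder                     # shared vision encoder
        self.cont_adapter = nn.Linear(dim, llm_dim)       # continuous branch (understanding)
        self.disc_adapter = nn.Linear(dim, dim)           # discrete branch (generation)
        self.codebook = nn.Embedding(codebook_size, dim)  # discrete token vocabulary

    def understand(self, image: torch.Tensor) -> torch.Tensor:
        """Continuous embeddings fed to the LLM for image-to-text understanding."""
        feats = self.encoder(image)                       # (B, N, dim) patch features
        return self.cont_adapter(feats)                   # (B, N, llm_dim)

    def generate_targets(self, image: torch.Tensor) -> torch.Tensor:
        """Discrete token ids used as LLM prediction targets for text-to-image generation."""
        feats = self.disc_adapter(self.encoder(image))    # (B, N, dim)
        flat = feats.reshape(-1, feats.size(-1))          # (B*N, dim)
        # nearest-codebook-entry quantization (assumed; details differ in practice)
        dists = torch.cdist(flat, self.codebook.weight)   # (B*N, codebook_size)
        return dists.argmin(dim=-1).view(feats.shape[:-1])  # (B, N) token ids
```

Because both branches start from the same encoder features, the continuous embeddings and the discrete tokens live in one shared semantic space, which is the property the paper credits for the small understanding/generation task conflict.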
