Overview
InternVL 2.5 is a new-generation Multimodal Large Language Model (MLLM) series released by the OpenGVLab team. As an upgrade of InternVL 2.0, it keeps the original architecture while achieving significant performance gains through improved training strategies and data processing. These open-source models perform strongly across a wide range of benchmarks and can even compete with commercial models such as GPT-4o and Claude-3.5-Sonnet.

Core Highlights
- Breakthrough performance: the first open-source MLLM to score over 70% on the MMMU benchmark
- Flexible architecture: a choice of model scales from 1B to 78B parameters
- Innovative training strategies: a progressive scaling approach that sharply reduces training cost
- Real-world optimization: better handling of web images through targeted techniques
Key Features
The InternVL 2.5 series offers strong multimodal understanding and generation capabilities:
- Image understanding: accurately parses image content and reasons about it
- Cross-modal alignment: effectively connects visual and linguistic information
- Complex reasoning: excels at tasks that require multi-step reasoning
- Multiple sizes: versions suited to everything from small applications to enterprise-level requirements
Technical Breakthroughs
1. Progressive scaling strategy
The development team observed an interesting phenomenon: even when the vision encoder is trained with a smaller language model (e.g., 20B), the resulting visual features can be directly understood by a larger language model (e.g., 72B). Based on this finding, they designed a staged training approach:
- Train the vision encoder with a small language model first to reduce computational cost
- Then transfer it seamlessly to the larger model without retraining
- The end result is high performance with significant resource savings
2. Innovative training techniques
- Random JPEG compression: simulates the varying quality of web images to improve model robustness
- Loss reweighting: balances the gradient bias between long and short answers to improve training effectiveness
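The random JPEG compression step above can be sketched in a few lines with Pillow; the probability and quality range below are illustrative placeholders, not the values used in the actual training recipe:

```python
import io
import random

from PIL import Image

def random_jpeg_compress(img, qmin=30, qmax=95, p=0.5):
    """With probability p, round-trip the image through JPEG at a random
    quality level, mimicking the mixed compression of web images.
    (qmin, qmax, and p are assumed values for illustration.)"""
    if random.random() >= p:
        return img
    quality = random.randint(qmin, qmax)
    buf = io.BytesIO()
    img.convert("RGB").save(buf, format="JPEG", quality=quality)
    buf.seek(0)
    return Image.open(buf).convert("RGB")

random.seed(0)
src = Image.new("RGB", (64, 64), (120, 180, 60))
out = random_jpeg_compress(src, p=1.0)  # p=1.0 forces compression here
print(out.size)
```

Applying this on the fly during data loading means each epoch sees a slightly different degradation of the same image, which tends to improve robustness to real web content.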
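The loss-reweighting idea can be illustrated with a toy calculation. Averaging the loss per sample under-weights long answers, while summing per token lets long answers dominate; one middle ground is to scale each sample's summed token loss by n^-0.5 (a square-averaging scheme — the exponent below is an illustrative knob, not necessarily the released configuration):

```python
def reweighted_loss(token_losses, alpha=0.5):
    """Weight a sample's summed token loss by n**-alpha (n = token count).
    alpha=1 is per-sample averaging, alpha=0 is per-token summing;
    alpha=0.5 sits in between, damping the gradient bias toward either
    long or short answers. alpha is an assumed value for illustration."""
    n = len(token_losses)
    return sum(token_losses) * n ** -alpha

# Per-token NLL losses for two answers: one short, one long.
short_losses = [2.0, 1.0]   # 2 tokens
long_losses = [1.0] * 20    # 20 tokens

print(reweighted_loss(short_losses), reweighted_loss(long_losses))
```

With alpha=0 the long answer would contribute 20.0 against the short answer's 3.0; with alpha=0.5 the gap narrows to roughly 4.47 versus 2.12, so neither answer length dominates the batch gradient.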
3. Data optimization scheme
- Intelligent filtering: LLM-based scoring combined with rule-based filters to remove anomalous samples
- Data packing: improves GPU utilization and accelerates training
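The intelligent-filtering step above can be sketched as a simple keep/drop predicate; the scoring scale, threshold, and rules here are hypothetical placeholders rather than the team's actual pipeline:

```python
def keep_sample(sample, llm_score):
    """Combine an LLM quality score with rule-based checks.
    Threshold and rules are assumed values for illustration."""
    if llm_score < 7.0:          # assumed 0-10 LLM scoring scale
        return False
    answer = sample.get("answer", "")
    if not answer.strip():       # rule: drop empty answers
        return False
    if any(ch * 6 in answer for ch in "!?."):  # rule: degenerate repetition
        return False
    return True

data = [
    ({"answer": "The cat sits on the mat."}, 8.5),
    ({"answer": ""}, 9.0),          # dropped: empty
    ({"answer": "??????"}, 8.0),    # dropped: repetition
    ({"answer": "ok"}, 3.0),        # dropped: low LLM score
]
kept = [s for s, score in data if keep_sample(s, score)]
print(len(kept))
```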
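Data packing concatenates several short training samples into one sequence so less compute is wasted on padding tokens. A minimal first-fit sketch (max_len and the sample lengths below are toy values):

```python
def pack_samples(lengths, max_len=8):
    """Greedy first-fit packing: place each sample into the first bin
    (shared sequence) with room, otherwise open a new bin. Fewer bins
    means less padding and higher GPU utilization."""
    bins = []
    for idx, n in enumerate(lengths):
        for b in bins:
            if b["len"] + n <= max_len:
                b["len"] += n
                b["items"].append(idx)
                break
        else:
            bins.append({"len": n, "items": [idx]})
    return bins

packed = pack_samples([5, 3, 6, 2, 4], max_len=8)
print(packed)  # 3 bins instead of 5 padded sequences
```

Without packing, five samples would each occupy a padded sequence of length 8 (40 slots); packed, the same data fits in three sequences, and attention masks keep the packed samples from attending to each other.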
Who It's For
The InternVL 2.5 series is a good fit for:
- AI researchers: exploring the frontiers of multimodal modeling
- Developers: building vision-language interaction applications
- Enterprise users: seeking commercially usable open-source large-model solutions
- Technology enthusiasts: learners interested in the latest advances in AI
MPO-Optimized Version
The InternVL2.5-MPO series delivers an average improvement of about 2 percentage points over the base models through mixed preference optimization. Its core innovations include:
- Multimodal preference dataset (MMPR): roughly 3 million high-quality samples
- Mixed preference optimization algorithm (MPO): learns from relative preferences and absolute quality at the same time
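The mixed-preference idea can be sketched as two loss terms: a relative term that prefers the chosen answer over the rejected one (DPO-style) and an absolute term that judges each answer on its own. The weights and beta below are illustrative, and the sketch omits the additional generation (SFT) loss that the full recipe also keeps:

```python
import math

def sigmoid(x):
    return 1 / (1 + math.exp(-x))

def mpo_loss(logp_chosen, logp_rejected,
             logp_ref_chosen, logp_ref_rejected,
             beta=0.1, w_pref=0.8, w_qual=0.2):
    """Toy mix of a relative-preference term and an absolute-quality term.
    Inputs are (reference) log-probabilities of the chosen/rejected answers;
    beta and the weights are assumed values for illustration."""
    # Relative preference: the chosen answer should beat the rejected one.
    margin = beta * ((logp_chosen - logp_ref_chosen)
                     - (logp_rejected - logp_ref_rejected))
    pref = -math.log(sigmoid(margin))
    # Absolute quality: push chosen up and rejected down independently.
    qual = (-math.log(sigmoid(beta * (logp_chosen - logp_ref_chosen)))
            - math.log(sigmoid(-beta * (logp_rejected - logp_ref_rejected))))
    return w_pref * pref + w_qual * qual

loss = mpo_loss(-10.0, -14.0, -11.0, -12.0)
print(loss)
```

The loss shrinks as the policy separates chosen from rejected answers more decisively, which is the behavior the preference data is meant to teach.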
Model Selection
InternVL 2.5 offers a wide range of model sizes, from lightweight to very large:
| Model size | Vision component | Language component | Typical scenarios |
|---|---|---|---|
| 1B-8B | InternViT-300M | Small-scale LLM | Mobile/edge computing |
| 26B-78B | InternViT-6B | Large-scale LLM | Enterprise Applications |
Each model provides download links on Hugging Face and ModelScope for easy access.
Summary
The InternVL 2.5 series represents the latest advances in open-source multimodal large models, striking a strong balance between performance and efficiency through innovative training strategies and systematic optimization. It offers highly competitive options for both research and commercial applications. Most importantly, as an open-source project, it makes a significant contribution to the democratization of AI.
Official Resources:
Keywords
Open Source Multimodal Large Models, InternVL 2.5, Multimodal AI, Visual Language Models, MLLM, Artificial Intelligence, Model Training Strategies, Open Source AI Tools