---
license: mit
pipeline_tag: time-series-forecasting
---

# VisionTS++: Cross-Modal Time Series Foundation Model with Continual Pre-trained Vision Backbones

This repository hosts **VisionTS++**, a state-of-the-art time series foundation model built by continually pre-training a visual Masked AutoEncoder (MAE) on large-scale time series data. It excels at multivariate and probabilistic time series forecasting by bridging the modality gap between vision and time series data.

The model was introduced in the paper: [**VisionTS++: Cross-Modal Time Series Foundation Model with Continual Pre-trained Vision Backbones**](https://arxiv.org/abs/2508.04379)

Official GitHub repository: [https://github.com/HALF111/VisionTSpp](https://github.com/HALF111/VisionTSpp)

Try **VisionTS++** directly in your browser on the [Hugging Face Space](https://huggingface.co/spaces/Lefei/VisionTSpp): you can upload your own time series CSV file for zero-shot forecasting.

## About

VisionTS++ continually pre-trains a vision model on large-scale time series data, addressing key discrepancies that arise in cross-modal transfer from vision to time series. It introduces three key innovations (illustrative sketches follow below):

1. **Vision-model-based filtering**: identifies high-quality time series to stabilize pre-training and mitigate the data-modality gap.
2. **Colorized multivariate conversion**: encodes multivariate series as multi-subfigure RGB images to strengthen cross-variate modeling.
3. **Multi-quantile forecasting**: uses parallel reconstruction heads to produce quantile forecasts for probabilistic prediction without parametric assumptions.

Together, these innovations enable VisionTS++ to achieve state-of-the-art performance in both in-distribution and out-of-distribution forecasting, demonstrating that vision models can generalize effectively to time series forecasting with appropriate adaptation.
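To make the first idea concrete, below is a heavily hedged sketch of quality-based sequence filtering. The choice of quality signal is an assumption for illustration (here, any user-supplied `score_fn` where lower is better, such as a frozen vision model's reconstruction error on the rendered image); it is not the paper's exact criterion.

```python
import numpy as np

def filter_sequences(sequences, score_fn, keep_ratio: float = 0.8):
    """Keep the fraction of sequences the quality score rates best.

    `score_fn` is assumed to return an error-like score (lower = better),
    e.g. a frozen MAE's reconstruction error on the imaged sequence.
    The real VisionTS++ filtering criterion may differ.
    """
    errors = np.array([score_fn(seq) for seq in sequences])
    # Keep everything at or below the keep_ratio quantile of the scores.
    threshold = np.quantile(errors, keep_ratio)
    return [seq for seq, err in zip(sequences, errors) if err <= threshold]

# Toy example with a stand-in score (sequence variance), purely illustrative.
seqs = [np.random.randn(96) for _ in range(100)]
kept = filter_sequences(seqs, score_fn=lambda s: float(np.var(s)))
print(len(kept))  # roughly 80 of 100 sequences retained
```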
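For the second idea, here is a minimal sketch of rendering a multivariate series as a stacked multi-subfigure RGB image, one colored band per variate. The band height, color palette, and per-variate min-max normalization are illustrative assumptions; the paper's exact rendering may differ.

```python
import numpy as np

def series_to_rgb_image(series: np.ndarray, height_per_var: int = 32) -> np.ndarray:
    """Render a multivariate series of shape (num_vars, num_steps) as a
    stacked multi-subfigure RGB image with one colored band per variate."""
    num_vars, num_steps = series.shape
    # A fixed palette; distinct colors keep variates separable in the image.
    palette = np.array([[255, 0, 0], [0, 255, 0], [0, 0, 255],
                        [255, 255, 0], [255, 0, 255], [0, 255, 255]])
    image = np.zeros((num_vars * height_per_var, num_steps, 3), dtype=np.uint8)
    for v in range(num_vars):
        x = series[v]
        # Min-max normalize each variate into [0, 1] independently.
        x = (x - x.min()) / (x.max() - x.min() + 1e-8)
        # Map each value to a vertical pixel position within this variate's band.
        rows = ((1.0 - x) * (height_per_var - 1)).astype(int) + v * height_per_var
        image[rows, np.arange(num_steps)] = palette[v % len(palette)]
    return image

# Example: 3 variates over 96 time steps -> a 96x96 RGB image.
img = series_to_rgb_image(np.random.randn(3, 96))
print(img.shape)  # (96, 96, 3)
```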
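For the third idea, the sketch below shows parallel quantile heads trained with the pinball (quantile) loss, which yields probabilistic forecasts without assuming a parametric output distribution. The hidden size, quantile levels, and the `MultiQuantileHeads` module are illustrative assumptions about how such heads could sit on top of the MAE decoder, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class MultiQuantileHeads(nn.Module):
    """Parallel reconstruction heads, one per target quantile (a sketch)."""
    def __init__(self, hidden_dim: int, horizon: int,
                 quantiles=(0.1, 0.25, 0.5, 0.75, 0.9)):
        super().__init__()
        self.quantiles = quantiles
        # One independent linear head per quantile level.
        self.heads = nn.ModuleList(
            [nn.Linear(hidden_dim, horizon) for _ in quantiles]
        )

    def forward(self, decoder_features: torch.Tensor) -> torch.Tensor:
        # Returns predictions of shape (batch, num_quantiles, horizon).
        return torch.stack([head(decoder_features) for head in self.heads], dim=1)

def pinball_loss(pred: torch.Tensor, target: torch.Tensor, quantiles) -> torch.Tensor:
    """Quantile (pinball) loss averaged over quantile levels; it penalizes
    under- and over-prediction asymmetrically, so no parametric
    distribution needs to be assumed."""
    losses = []
    for i, q in enumerate(quantiles):
        err = target - pred[:, i]  # (batch, horizon)
        losses.append(torch.maximum(q * err, (q - 1) * err).mean())
    return torch.stack(losses).mean()

# Usage example with hypothetical sizes: batch 8, hidden 768, horizon 96.
heads = MultiQuantileHeads(hidden_dim=768, horizon=96)
preds = heads(torch.randn(8, 768))                 # (8, 5, 96)
loss = pinball_loss(preds, torch.randn(8, 96), heads.quantiles)
print(loss.item())
```

Training all heads jointly against the same target with the pinball loss drives each head toward its own quantile level, which is one common way to obtain distribution-free prediction intervals.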