Download The Matrix model weights from 🤗 Hugging Face or 🤖 ModelScope
📚 View the Paper, Website, and Documentation
👋 Say Hi to our team and members at Matrix-Team
📍 (Coming Soon) Explore The Matrix playground online at Journee to experience a real-time, AI-generated world.
The Matrix is a world model designed to generate high-quality, infinite-length interactive videos in real time, setting a new benchmark in the field of neural interactive simulations. It combines several innovations:
- A cutting-edge world model that generates continuous, interactive video content with unparalleled realism and length.
- A real-time system that supports infinite content generation, overcoming previous limitations seen in simpler 2D game models like DOOM or Minecraft.
- A model architecture built on the Swin-DPM model, designed to produce dynamic, ever-expanding content.
- A novel training strategy that integrates both real and simulated data, improving the system's ability to generalize.
At its core, The Matrix combines these elements to push the boundaries of interactive video generation, making real-time, high-quality, infinite-length content a reality.
Comprehensive documentation is available in English. This includes detailed installation steps, tutorials, and training instructions. The paper and Project Page offer more details about the method.
Model checkpoints can be found on Hugging Face and ModelScope. Please refer to the Documentation for how to load them for inference.
The previous version of The Matrix was built on an internal version of Video DiT and, at the request of Alibaba Tongyi Lab, cannot be openly released. Therefore, we have re-implemented The Matrix code on top of the openly released video generation model CogVideoX. We sincerely thank the CogVideo team for their contributions.
As a result, the open release of our model has been delayed, and some components are still under development. These components will be released as soon as they are finished, including:
- Inference scripts for 8-GPU parallel inference of the DiT backbone, which will speed up inference by around 6-8 times.
- Training of the Stream Consistency Models, which will speed up inference by around 7-10 times.
- Training on fused realistic and simulated data to acquire stronger generalization ability.
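To put the 8-GPU figure in context, the short sketch below converts a speedup into parallel efficiency (speedup divided by GPU count). It assumes the 6-8 times figure is measured against single-GPU inference of the same backbone, which this README does not state explicitly. The function name `parallel_efficiency` is illustrative and is not part of The Matrix codebase.

```python
def parallel_efficiency(speedup: float, num_gpus: int) -> float:
    """Return the fraction of ideal linear scaling: speedup / num_gpus."""
    if num_gpus <= 0:
        raise ValueError("num_gpus must be positive")
    return speedup / num_gpus


# Assumption: the speedup is relative to one GPU running the same DiT backbone.
NUM_GPUS = 8
for speedup in (6.0, 8.0):
    eff = parallel_efficiency(speedup, NUM_GPUS)
    print(f"{speedup:.0f}x on {NUM_GPUS} GPUs -> {eff:.0%} parallel efficiency")
```

Under that assumption, the stated range corresponds to between 75% and 100% of ideal linear scaling across the 8 GPUs.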
The successful release of The Matrix Project is built upon the collective efforts of our incredibly talented team members. We extend our heartfelt gratitude for their dedication, hard work, and invaluable contributions. Those members are:
Longxiang Tang, Zhicai Wang, Ruili Feng, Ruihang Chu, Han Zhang, and Zhantao Yang
Special thanks to Longxiang and Zhicai for their excellent contributions.
Because of the re-implementation, some hyperparameter settings and parts of the training strategy differ from those reported in the paper. Please keep this in mind when reviewing the code.
Despite these changes, we are pleased to announce that the overall generation quality has improved considerably over the previous version, thanks to more careful design of methods and parameters.
If you find our work useful, please consider citing:
@article{feng2024matrix,
title={The matrix: Infinite-horizon world generation with real-time moving control},
author={Feng, Ruili and Zhang, Han and Yang, Zhantao and Xiao, Jie and Shu, Zhilei and Liu, Zhiheng and Zheng, Andy and Huang, Yukun and Liu, Yu and Zhang, Hongyang},
journal={arXiv preprint arXiv:2412.03568},
year={2024}
}
The code in this repository is released under the Apache 2.0 License.
The Matrix model (including its corresponding Transformers module and VAE module) is released under the Apache 2.0 License.