TripoSG is a foundation model for high-fidelity, high-quality, and highly generalizable image-to-3D generation. Built on large-scale rectified flow transformers and hybrid supervised training, it achieves remarkable performance in 3D shape synthesis.

✨ Key Features

  • High-Fidelity Generation: TripoSG produces intricate 3D meshes with sharp geometric features, fine surface details, and complex structures, making it suitable for precision-sensitive applications such as video games and virtual reality environments.
  • Semantic Consistency: Generated shapes closely mirror the semantics and visual characteristics of the input images, a fidelity that is crucial for tasks such as model reconstruction and design.
  • Strong Generalization: TripoSG is adept at handling a wide variety of input styles, ranging from photorealistic images to cartoons and sketches. This versatility is a significant advantage for artists and designers who work across different visual styles.
  • Robust Performance: The model demonstrates the ability to create coherent and accurate shapes even when presented with challenging inputs that feature complex topologies.

🔬 Technical Highlights

  • Large-Scale Rectified Flow Transformer: TripoSG combines rectified flow's linear trajectory modeling with a transformer backbone, which yields stable and efficient training (see the first sketch after this list).
  • Advanced VAE Architecture: The VAE represents geometry as Signed Distance Functions (SDFs) and is trained with hybrid supervision combining SDF loss, surface normal guidance, and eikonal regularization, which improves geometric accuracy (see the second sketch after this list).
  • High-Quality Dataset: TripoSG was trained on 2 million meticulously curated Image-SDF pairs, which underpins the quality of its output shapes.
  • Efficient Scaling: The model features architectural optimizations that maintain high performance even at smaller scales, which is beneficial for users with limited computational resources.
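
For intuition, here is a minimal sketch of a rectified flow training step, assuming standard flow matching between Gaussian noise and data latents; the function names, tensor shapes, and conditioning interface are illustrative, not TripoSG's actual code:

# Minimal rectified flow training step (illustrative, not TripoSG's code).
# The model learns the constant velocity field along the straight line
# connecting a noise sample x0 to a data sample x1.
import torch
import torch.nn.functional as F

def rectified_flow_loss(model, x1, cond):
    # x1: clean latents (B, N, D); cond: image conditioning features
    x0 = torch.randn_like(x1)                      # noise endpoint
    t = torch.rand(x1.shape[0], device=x1.device)  # timesteps uniform in [0, 1]
    t_ = t.view(-1, 1, 1)
    xt = (1.0 - t_) * x0 + t_ * x1                 # point on the linear path
    v_target = x1 - x0                             # constant target velocity
    v_pred = model(xt, t, cond)                    # transformer predicts velocity
    return F.mse_loss(v_pred, v_target)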
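
Similarly, a sketch of the hybrid SDF supervision described above: an SDF reconstruction term, a surface normal term, and an eikonal regularizer that pushes the SDF gradient toward unit norm. The decoder interface and loss weights are assumptions:

# Hybrid SDF supervision (illustrative). `decoder` maps query points to
# predicted SDF values; autograd gradients of the SDF serve as predicted normals.
import torch
import torch.nn.functional as F

def hybrid_sdf_loss(decoder, latents, pts, sdf_gt, normals_gt, w_n=0.1, w_eik=0.01):
    pts = pts.requires_grad_(True)
    sdf_pred = decoder(latents, pts).squeeze(-1)   # (B, N)
    grad = torch.autograd.grad(sdf_pred.sum(), pts, create_graph=True)[0]

    loss_sdf = F.l1_loss(sdf_pred, sdf_gt)
    # normal guidance: SDF gradients at on-surface points should align with GT normals
    loss_normal = (1.0 - F.cosine_similarity(grad, normals_gt, dim=-1)).mean()
    # eikonal loss: a valid SDF has a unit-norm gradient everywhere
    loss_eik = ((grad.norm(dim=-1) - 1.0) ** 2).mean()
    return loss_sdf + w_n * loss_normal + w_eik * loss_eik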

🔥 Updates

March 2025: Released TripoSG's 1.5B-parameter rectified flow model and its VAE (trained with 2048 latent tokens), together with the inference code and an interactive demo, marking a significant milestone in the project's development.

🔨 Installation Instructions

To get started with TripoSG, users can clone the repository using the following command:

git clone https://github.com/VAST-AI-Research/TripoSG.git
cd TripoSG

Creating a conda environment is optional, but recommended for managing dependencies:

conda create -n tripoSG python=3.10
conda activate tripoSG

Next, users need to install the required dependencies, including PyTorch (make sure to select the correct CUDA version) and other necessary libraries:

# pytorch (select correct CUDA version)
pip install torch torchvision --index-url https://download.pytorch.org/whl/{your-cuda-version}
# other dependencies
pip install -r requirements.txt
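
For example, on a machine with CUDA 12.1 the index URL would be:

pip install torch torchvision --index-url https://download.pytorch.org/whl/cu121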

💡 Quick Start

To generate a 3D mesh from an image, users simply need to run the following command:

python -m scripts.inference_triposg --image-input assets/example_data/hjswed.png

Upon execution, the following model weights will be downloaded automatically:

  • TripoSG model from VAST-AI/TripoSG → pretrained_weights/TripoSG
  • RMBG model from briaai/RMBG-1.4 → pretrained_weights/RMBG-1.4
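
For scripted use, here is a minimal sketch of driving the model from Python. The TripoSGPipeline name, its import path, and the call signature are assumptions modeled on diffusers conventions; consult scripts/inference_triposg.py for the actual interface:

# Hypothetical Python usage -- the pipeline class, import path, and output
# field below are assumptions; check scripts/inference_triposg.py for the
# repository's real interface.
import torch
from PIL import Image
from triposg.pipelines.pipeline_triposg import TripoSGPipeline  # assumed path

pipe = TripoSGPipeline.from_pretrained("pretrained_weights/TripoSG").to("cuda")
image = Image.open("assets/example_data/hjswed.png")
mesh = pipe(image=image, num_inference_steps=50).samples[0]  # assumed output field
mesh.export("output.glb")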

💻 System Requirements

TripoSG requires a CUDA-enabled GPU with at least 8 GB of VRAM for inference.
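
A quick way to confirm the GPU and its memory before running inference:

python -c "import torch; p = torch.cuda.get_device_properties(0); print(p.name, round(p.total_memory / 1e9, 1), 'GB')"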

📝 Tips for Users

To use the full VAE module, including the encoder, uncomment line 15 in triposg/models/autoencoders/autoencoder_kl_triposg.py and install torch-cluster. Then run:

python -m scripts.inference_vae --surface-input assets/example_data_point/surface_point_demo.npy
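
To sanity-check an input point cloud before encoding, you can inspect the demo file; the expected layout (surface samples with xyz coordinates, possibly followed by normals) is an assumption, so verify it against the repository:

import numpy as np

pts = np.load("assets/example_data_point/surface_point_demo.npy")
# Expected: a float array of surface samples, e.g. (N, 6) for xyz + normals;
# the exact layout is an assumption -- check the repository's documentation.
print(pts.shape, pts.dtype)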

🤝 Community & Support

An interactive demo is available on Hugging Face Spaces. Please use GitHub Issues to report bugs or request features; the TripoSG team also welcomes contributions from users keen on enhancing the project.

📚 Citation

Those looking to reference this work can use the following citation:

@article{li2025triposg,
  title={TripoSG: High-Fidelity 3D Shape Synthesis using Large-Scale Rectified Flow Models},
  author={Li, Yangguang and Zou, Zi-Xin and Liu, Zexiang and Wang, Dehu and Liang, Yuan and Yu, Zhipeng and Liu, Xingchao and Guo, Yuan-Chen and Liang, Ding and Ouyang, Wanli and others},
  journal={arXiv preprint arXiv:2502.06608},
  year={2025}
}

⭐ Acknowledgements

TripoSG builds on the work of numerous open-source projects and research initiatives. Special thanks are extended to:

  • DINOv2 for providing powerful visual features.
  • RMBG-1.4 for developing background removal technology.
  • 🤗 Diffusers for their exceptional diffusion model framework.
  • HunyuanDiT for DiT technology.
  • 3DShape2VecSet for their contributions to 3D shape representation.

The wider research community's open exploration and contributions to the field of 3D generation are also deeply appreciated.