Z Image Turbo
NEW
text to image
Fast text-to-image with sub-second inference and accurate in-image text rendering.
Models
Featured models
Take a look at these crowd-favorite models.
text-to-image
Z-Image-Turbo
Fast text-to-image with sub-second inference and accurate in-image text rendering.
text-to-video
Motif-Video-T2V-2B
Generate videos from text prompts with Motif-Video-2B.
image-to-video
Motif-Video-I2V-2B
Generate videos from an input image and a text prompt with Motif-Video-2B.
image-to-3d
SAM-3D-Object
Reconstruct 3D object models from images using SAM 3D Object.
video-to-video
Cosmos-Transfer2.5 General
Multi-control video-to-video generation for Sim2Real transformation. Transforms simulation videos into photorealistic footage using depth, edge, segmentation, and visibility controls.
text-to-audio
Supertonic2 Multilingual TTS
Multilingual text-to-speech synthesis supporting Korean, English, Spanish, Portuguese, and French with multiple voice styles.
video-to-video
VOID-Inpaint-V1
Remove objects from video using VOID's quadmask flow or auto SAM3 + Gemini masking.
Model list
image-to-image
Qwen-Image-Edit-LoRA
LoRA-enhanced image editing with precise control and flexible style adaptation.
text-to-image
Qwen-Image
Text-centric text-to-image with sharp glyph rendering and stable layout.
text-to-image
Qwen-Image-LoRA
LoRA-tuned text-to-image with enhanced style control and fine-grained customization.
text-to-image
SDXL
Excels at photorealistic image synthesis with detailed text interpretation and rich visual fidelity.
image-to-image
Qwen-Image-Edit
Delivers precise image edits guided by text prompts, supporting nuanced visual modifications. The output image may differ slightly in size from the input image.
text-to-image
FLUX.1-schnell
Speed-tuned text-to-image with tight prompt alignment and legible text.
audio-to-text
Whisper Large V3 Turbo (Korean ASR)
Automatic speech recognition model fine-tuned for Korean. Transcribes audio files to text with high accuracy.
image-to-3d
SAM-3D-Body
Reconstruct 3D human body models from images using SAM 3D Body.
image-to-mask
SAM3-Auto-Image
Automatic segmentation for images using SAM3.
image-to-mask
SAM3-PVS-Image
Promptable Visual Segmentation for images using SAM3.
video-to-mask
SAM3-PVS-Video
Promptable Visual Segmentation for videos using SAM3.
video-to-video
Wan2.2-Animate-Replace
Replace characters in videos while preserving the original background and environment.
video-to-video
Wan2.2-Animate-Move
Transfer a reference character into a motion video, replacing both character and scene.
image-to-video
Wan2.2-I2V-A14B
High-quality 14B MoE image-to-video with dual-expert denoising architecture.
text-to-video
Wan2.2-T2V-A14B
High-quality 14B MoE text-to-video with dual-expert denoising architecture.
image-to-video
Wan2.2-FLF2V-A14B
Interpolate smooth video between first and last keyframe images with text guidance.
image-to-video
Wan2.2-I2V-5B
Lightweight 5B image-to-video optimized for fast inference on consumer GPUs.
text-to-text
Prompt-Enhancer
Expand short text prompts into detailed, optimized descriptions for image generation.
text-to-image
Sana-Sprint-1.6B
Distilled 1-4 step text-to-image for sub-second generation via consistency distillation.
text-to-image
Sana-v1.5-1.6B
Lightweight text-to-image with inference-time scaling for quality beyond its model size.
text-to-image
Sana-v1.5-4.8B
High-fidelity 4.8B text-to-image with 60-layer deep architecture for maximum quality.
image-to-mask
SAM3-PCS-Image
Promptable Concept Segmentation for images using SAM3.
video-to-mask
SAM3-PCS-Video
Promptable Concept Segmentation for videos using SAM3.
text-to-image
ERNIE-Image-Turbo
Fast text-to-image generation by Baidu. A DMD- and RL-optimized 8B DiT model that achieves high-quality results in only 8 inference steps. Excels at complex instruction following, text rendering, and structured image generation.
text-to-image
Nucleus-Image
Sparse MoE text-to-image with 17B total parameters (2B active per pass).
text-to-text
deepseek/deepseek-v4-flash
Fast DeepSeek V4 chat model with a 1M-token context window for coding, reasoning, and agentic text generation.
text-to-video
Cosmos3-Nano t2v
Generate videos from text prompts with Cosmos Nano 16B. A World Foundation Model for Physical AI that generates physically accurate motion and realistic dynamics.
image-to-video
Cosmos3-Nano i2v
Generate videos from an image and text prompt with Cosmos Nano 16B. A World Foundation Model for Physical AI that generates physically accurate motion and realistic dynamics.
Categories