AI Video Maker

Sample Videos

The same woman from the reference image looks directly into the camera, takes a breath, then smiles brightly and speaks with enthusiasm: “Have you heard? Alibaba Wan 2.5 API is now available on Ai Generator Hub!” Ambient audio: quiet indoor atmosphere, soft natural room tone. Camera: medium close-up, steady framing, natural daylight mood, accurate lip-sync with dialogue.

Wan 2.5: Native Audio-Video Sync Solution

In both text-to-video and image-to-video modes, Wan 2.5 produces cinematic visuals with natively synchronized audio, in a range of output formats, at a fraction of traditional production costs.

Alibaba Wan 2.5: A New Frontier in AI Video

Wan 2.5 is a cutting-edge AI video generation model that transforms text prompts and reference images into cinematic videos. Originally released via Alibaba Cloud DashScope, it demonstrates strong capabilities in visual realism, motion performance, and native audio-video synchronization. To facilitate integration, Alibaba introduced Wan 2.5 with preview interfaces for both Text-to-Video (T2V) and Image-to-Video (I2V), supporting lip-sync and audio-synchronized short videos. It serves as a powerful alternative to Google Veo 3, offering creators and developers a flexible, high-performance way to integrate Alibaba's frontier video technology.

Wan 2.5 Generation Modes

Text-to-Video (T2V)

Generate videos directly from text prompts. Describe scenes, actions, and environments to produce cinematic clips with native lip-sync and audio—perfect for storyboarding, marketing, and social media.

Image-to-Video (I2V)

Transform static images into dynamic short videos. Add realistic animation and perspective changes while preserving original style and character features, ideal for portraits and product displays.

Core Advantages of Wan 2.5

Native Audio & Seamless A/V Sync

Generate video and audio together in a single request. Dialogue, environmental sounds, and background music (BGM) are synchronized automatically for an immersive result.

Precise Instruction Execution

Handles complex prompts with high fidelity. Camera angles, lighting, and scene dynamics are accurately rendered for stable creative output.

Flexible Style Adaptation

Supports diverse visual styles—from cinematic realism to anime and illustration—while maintaining character and scene consistency.

Multi-Modal Options

Supports multiple resolutions (720p, 1080p) and aspect ratios (16:9, 9:16, 1:1), providing flexible generation options for any platform.
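
As a rough illustration, the options listed above can be checked on the client side before a request is submitted. The sketch below is not the Wan 2.5 or DashScope API: the function name, the dictionary keys, and the default values are hypothetical, and only the allowed resolutions and aspect ratios come from this page.

```python
# Values taken from the list above; everything else here is a hypothetical sketch.
SUPPORTED_RESOLUTIONS = ("720p", "1080p")
SUPPORTED_ASPECT_RATIOS = ("16:9", "9:16", "1:1")


def make_output_options(resolution: str = "720p", aspect_ratio: str = "16:9") -> dict:
    """Validate the requested output format and return it as a plain dict.

    The key names are illustrative only; a real service may expect
    different field names or a combined size value.
    """
    if resolution not in SUPPORTED_RESOLUTIONS:
        raise ValueError(
            f"Unsupported resolution {resolution!r}; expected one of {SUPPORTED_RESOLUTIONS}"
        )
    if aspect_ratio not in SUPPORTED_ASPECT_RATIOS:
        raise ValueError(
            f"Unsupported aspect ratio {aspect_ratio!r}; expected one of {SUPPORTED_ASPECT_RATIOS}"
        )
    return {"resolution": resolution, "aspect_ratio": aspect_ratio}
```

For example, make_output_options("1080p", "9:16") would describe a vertical clip for short-form platforms, while an unsupported value such as "4k" raises a ValueError before any request is sent.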

Wan 2.5 vs. Veo 3: How to Choose?

Both Wan 2.5 and Google Veo 3 represent the latest in AI video tech, but they emphasize different strengths: Veo 3 leans toward cinematic realism, while Wan 2.5 focuses on native A/V sync and flexible output options.

| Feature | Wan 2.5 | Veo 3 |
| --- | --- | --- |
| Generation Modes | Text-to-Video & Image-to-Video | Text-to-Video & Image-to-Video |
| Audio & A/V Sync | Native A/V generation with dialogue and ambient sync | Audio available but less integrated |
| Prompt Adherence | High fidelity to complex camera and motion logic | Excellent realism; may struggle with abstract prompts |
| Style Adaptation | Cinematic, Anime, Illustration; strong stylization | Focus on cinematic realism; less flexible stylization |
| Multilingual Support | Strong English & Chinese support | Primarily English-focused |
| Video Duration | Up to 10 seconds | Up to ~8 seconds |
| Aspect Ratio Options | 16:9, 9:16, 1:1 | Primarily cinematic formats |

Wan 2.5 Best Practices

To get the best results from Wan 2.5, clear and structured prompts are key. Here are some tips:

Precise Dialogue Scripting

Don’t just request "dialogue." Provide the exact words and specify speaker order (e.g., Character A: "Hello", Character B: "Hi").

Controlling Silence

If you don't want voices, explicitly state "no dialogue" or "no actors speaking" so the model does not add speech on its own.

Soundscapes & Atmosphere

Describe ambient sounds like "soft rain tapping" or "dramatic action music" to set the emotional tone.

Detailed Scene Descriptions

Include settings, lighting, and camera perspectives (e.g., "wide shot at sunset, golden light") for visually coherent results.
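
The four tips above can be combined into a single structured prompt. The following is a minimal sketch of one way to assemble such a prompt; PromptSpec and build_prompt are hypothetical helpers invented for illustration, and the exact phrasing they emit is an assumption rather than a format required by Wan 2.5.

```python
from dataclasses import dataclass, field


@dataclass
class PromptSpec:
    """Hypothetical container for the prompt ingredients described above."""
    scene: str  # setting, lighting, and camera perspective
    dialogue: list[tuple[str, str]] = field(default_factory=list)  # (speaker, line), in speaking order
    ambient: str = ""  # soundscape or music description


def build_prompt(spec: PromptSpec) -> str:
    """Join scene, dialogue, and ambient audio into one prompt string."""
    parts = [spec.scene.strip().rstrip(".") + "."]
    if spec.dialogue:
        # Exact words and speaker order, as recommended under "Precise Dialogue Scripting".
        for speaker, line in spec.dialogue:
            parts.append(f'{speaker} says: "{line}"')
    else:
        # State silence explicitly, as recommended under "Controlling Silence".
        parts.append("No dialogue, no actors speaking.")
    if spec.ambient:
        parts.append(f"Ambient audio: {spec.ambient.strip().rstrip('.')}.")
    return " ".join(parts)
```

Keeping the ingredients in separate fields makes it harder to forget one of them: a spec with an empty dialogue list always yields an explicit "no dialogue" instruction instead of leaving the choice to the model.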