Happy Horse 1.0 AI Video Generator with Lip-Sync
Alibaba's newest AI video model — ranked #1 on Artificial Analysis Video Arena (Text-to-Video Elo 1333, Image-to-Video Elo 1392), above Sora 2, Veo 3.1, and Kling.
Joint audio-video generation in a single pass. 1080p output. Multilingual lip-sync across 7 languages: English, Mandarin, Cantonese, Japanese, Korean, German, French.
Audio
Happy Horse 1.0 generates audio + video together with lip-sync — no separate audio toggle. Output always includes native synced audio.
Happy Horse 1.0 Model
Alibaba's #1 AI video model with joint audio-video and 7-language lip-sync.
Happy Horse 1.0
Joint audio-video generation with multilingual lip-sync
- Joint audio-video output (single pass)
- 3-15 second range
- Multilingual lip-sync (7 languages)
- Up to 1080p resolution
Key Capabilities
Why Happy Horse 1.0 ranks #1 on Artificial Analysis Video Arena across both Text-to-Video and Image-to-Video benchmarks.
Native Audio + Lip-Sync
Joint diffusion of audio and video in a single forward pass — no post-production merge. Multilingual lip-sync across 7 languages for character dialogue.
Text to Video
Transform text descriptions into 3-15 second cinematic videos with native synced sound and lip-sync alignment for spoken dialogue in the supported languages.
Image to Video
Animate still images with natural motion and synced audio. Upload a reference image and describe the motion + dialogue you want.
5 Aspect Ratios
Support for 16:9 (YouTube), 9:16 (TikTok / Reels), 1:1 (Instagram), 4:3 (legacy), and 3:4 (portrait). Pick at generation time.
Feature Deep Dive
How Happy Horse 1.0 delivers joint audio-video generation in a single forward pass.
Text-to-Video Generation
Create scenes with character dialogue, ambient sound, and expressive motion from text alone. Specify the spoken language in the prompt and Happy Horse aligns lip motion to it (7 languages supported).
Prompt example
A barista in Tokyo welcomes a customer in Japanese ("いらっしゃいませ"), warm cafe ambience, soft jazz, slow dolly forward.
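
The prompt pattern above (subject, quoted line, named language, ambience, camera move) can be assembled programmatically. The following is a minimal sketch, not a Plykit or Happy Horse API: `build_prompt` and its parameters are hypothetical names chosen for illustration, and the only facts taken from this page are the 7 listed lip-sync languages and the example wording.

```python
# Languages this page lists as supported for lip-sync.
LIP_SYNC_LANGUAGES = {
    "English", "Mandarin", "Cantonese", "Japanese", "Korean", "German", "French",
}


def build_prompt(subject, line, language, ambience=(), camera=None):
    """Assemble one prompt string (hypothetical helper, not an official API)."""
    if language not in LIP_SYNC_LANGUAGES:
        raise ValueError(f"{language!r} is not one of the 7 listed lip-sync languages")
    # Name the language explicitly so the spoken line is unambiguous.
    parts = [f'{subject} says "{line}" in {language}']
    parts.extend(ambience)
    if camera:
        parts.append(camera)
    return ", ".join(parts) + "."


prompt = build_prompt(
    "A barista in Tokyo",
    "いらっしゃいませ",
    "Japanese",
    ambience=["warm cafe ambience", "soft jazz"],
    camera="slow dolly forward",
)
print(prompt)
```

Rejecting an unlisted language up front is a design choice for this sketch; how the model itself behaves with other languages is not stated on this page.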

Image-to-Video Animation
Bring a still photo to life with natural motion and synced audio. Drop in any reference image and describe the action + dialogue.
Prompt example
The street vendor smiles and says "Hello, my friend!" in English, neon signs flicker, drizzle catches the light.

Joint Audio-Video Generation
Audio is co-generated, not bolted on. Lip-sync is supported in 7 languages: English, Mandarin, Cantonese, Japanese, Korean, German, French.
Prompt example
A French chef explains a recipe in French ("On commence par le beurre…"), pan sizzles, knife taps cutting board, hand-held camera.

Five Aspect Ratios
Pick a ratio at generation time. Optimized for the major social platforms — no cropping, no letterbox.
Prompt example
9:16 vertical clip — a skateboarder lands a kickflip, ambient street, cheering crowd.
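
To make the ratio choice concrete, here is a small sketch that maps a target platform to one of the five ratios and estimates frame dimensions. The platform pairs come from this page. The pixel sizes are an assumption: the sketch treats "720p" and "1080p" as the length of the shorter side, which this page does not confirm. `frame_size` and `PLATFORM_RATIOS` are illustrative names only.

```python
# Platform -> ratio pairs as listed on this page.
PLATFORM_RATIOS = {
    "youtube": "16:9",
    "tiktok": "9:16",
    "reels": "9:16",
    "instagram": "1:1",
}
SUPPORTED_RATIOS = {"16:9", "9:16", "1:1", "4:3", "3:4"}


def frame_size(ratio, short_side):
    """Return (width, height) in pixels.

    ASSUMPTION: the 720/1080 figure is the shorter side of the frame.
    """
    if ratio not in SUPPORTED_RATIOS:
        raise ValueError(f"unsupported aspect ratio: {ratio!r}")
    w, h = (int(n) for n in ratio.split(":"))
    scale = short_side / min(w, h)
    return round(w * scale), round(h * scale)


ratio = PLATFORM_RATIOS["tiktok"]
print(ratio, frame_size(ratio, 1080))
```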

Credits Pricing
12 credits per second @ 720p · 24 credits per second @ 1080p (about $0.06/credit on Plykit Pro).
| Duration | 720p | 1080p |
|---|---|---|
| 3s | 36 credits | 72 credits |
| 5s | 60 credits | 120 credits |
| 8s | 96 credits | 192 credits |
| 10s | 120 credits | 240 credits |
| 12s | 144 credits | 288 credits |
| 15s | 180 credits | 360 credits |
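
The per-second rates can be turned into a quick estimator. This is a sketch under stated assumptions: it applies the flat rates quoted above (12 and 24 credits per second) to every duration with no minimum charge or rounding, and it uses the approximate $0.06 per credit figure, so dollar amounts are estimates only. `credits_for` is a hypothetical helper, not part of any Plykit API.

```python
CREDITS_PER_SECOND = {"720p": 12, "1080p": 24}  # rates quoted on this page
USD_PER_CREDIT = 0.06  # "about $0.06/credit on Plykit Pro" (approximate)


def credits_for(seconds, resolution):
    """Credits for one clip, ASSUMING a flat per-second rate with no minimum."""
    if resolution not in CREDITS_PER_SECOND:
        raise ValueError(f"unknown resolution: {resolution!r}")
    if not 3 <= seconds <= 15:
        raise ValueError("duration must be between 3 and 15 seconds")
    return seconds * CREDITS_PER_SECOND[resolution]


for seconds in (5, 10, 15):
    for resolution in ("720p", "1080p"):
        credits = credits_for(seconds, resolution)
        usd = credits * USD_PER_CREDIT
        print(f"{seconds:>2}s {resolution:>5}: {credits:>3} credits (~${usd:.2f})")
```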
How to Use Happy Horse 1.0
Generate your first Happy Horse video in three steps.
Pick a mode
Text-to-Video starts from scratch. Image-to-Video animates a reference image you upload. For both, write the action + dialogue (specify language for lip-sync).
Configure size + duration
Choose a duration from 3 to 15 seconds and a resolution of 720p or 1080p, then pick the aspect ratio for your target platform. We suggest a 5s 720p test first, then a re-run at 1080p for the final version.
Generate and download
Click Generate Video. A 1080p render takes about 38 seconds, and you get back the video with native synced audio and aligned lip-sync.
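
The three steps can be sketched as a settings object that is checked before generating, then reused for the suggested "720p test, 1080p final" workflow. Everything here is hypothetical: this page documents a web form, not an API, so the `Settings` class and its field names are illustrative assumptions. Only the allowed values (modes, 3-15 seconds, 720p/1080p, five ratios) come from the page.

```python
from dataclasses import dataclass, replace
from typing import List, Optional

VALID_MODES = {"text-to-video", "image-to-video"}
VALID_RESOLUTIONS = {"720p", "1080p"}
VALID_RATIOS = {"16:9", "9:16", "1:1", "4:3", "3:4"}


@dataclass(frozen=True)
class Settings:
    """Hypothetical container for the choices made in the three steps."""
    mode: str
    prompt: str
    seconds: int = 5
    resolution: str = "720p"
    aspect_ratio: str = "16:9"
    image_path: Optional[str] = None  # only used for image-to-video

    def problems(self) -> List[str]:
        """Return human-readable issues; an empty list means the settings look valid."""
        found = []
        if self.mode not in VALID_MODES:
            found.append(f"unknown mode: {self.mode}")
        if self.mode == "image-to-video" and not self.image_path:
            found.append("image-to-video needs a reference image")
        if not self.prompt.strip():
            found.append("prompt is empty")
        if not 3 <= self.seconds <= 15:
            found.append("duration must be 3-15 seconds")
        if self.resolution not in VALID_RESOLUTIONS:
            found.append(f"unknown resolution: {self.resolution}")
        if self.aspect_ratio not in VALID_RATIOS:
            found.append(f"unknown aspect ratio: {self.aspect_ratio}")
        return found


# Step 1-2: a cheap 5s 720p draft; then the same settings at 1080p for the final.
draft = Settings(
    mode="text-to-video",
    prompt='A street vendor says "Hello, my friend!" in English, neon signs flicker.',
    aspect_ratio="9:16",
)
final = replace(draft, resolution="1080p")
print(draft.problems(), final.resolution)
```

Keeping the draft and final settings identical apart from resolution makes the 720p test a fair preview of the 1080p render.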
Gallery
A sample of videos created with Happy Horse 1.0.
Tokyo Barista — Japanese Dialogue
A barista in Tokyo welcomes a customer in Japanese, warm cafe ambience, soft jazz, slow dolly forward.
Multilingual lip-sync: Japanese dialogue lands cleanly with native ambience.
Hong Kong Skateboarder
9:16 vertical clip — a skateboarder lands a kickflip on a Hong Kong rooftop, ambient street, cheering crowd.
Human motion: skating physics + crowd reaction in one pass.
French Chef Tutorial
A French chef explains a recipe in French, pan sizzles, knife taps cutting board, hand-held camera.
Audio sync: the pan sizzle and knife taps match the on-screen motion frame for frame.
New York Street Vendor
A street vendor smiles and says "Hello, my friend!" in English, neon signs flicker, drizzle catches the light.
I2V: still photograph animated with motion, weather, and synced English greeting.
Creators Love Happy Horse 1.0
Early feedback from creators using Happy Horse on Plykit.
The lip-sync in Mandarin is shockingly clean — better than anything I've used. No post-production matching needed.
Joint audio-video is a game-changer. I can prototype tutorial videos with native French dialogue in under a minute.
1080p in 38 seconds with synced audio at this price beats every API I've tested.
Ready to Create Videos with Synced Audio?
Generate AI videos with native audio + multilingual lip-sync using Happy Horse 1.0 — Alibaba's #1 video model.