Is Happy Horse 1.0 free?

Happy Horse 1.0 uses a credit system. New users receive 5 free credits on signup. A 5s 720p clip costs about 60 credits; a 5s 1080p clip about 120 credits. Plykit Pro at $9.9/mo includes 1000 credits.
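To get a rough sense of what the Pro allowance covers, the sketch below divides 1000 credits by the per-clip costs quoted above: 60 credits for a 5s 720p clip and 120 for a 5s 1080p clip. The constant and function names are illustrative only and are not part of any Plykit API. The sketch also assumes that a clip cannot be generated on a partial balance, which is why it uses integer division.

```python
# Per-clip costs for a 5-second clip, as quoted in this FAQ.
CLIP_COST_5S = {"720p": 60, "1080p": 120}
PRO_MONTHLY_CREDITS = 1000  # Plykit Pro monthly allowance quoted above


def clips_per_budget(credits: int, resolution: str) -> int:
    """Hypothetical helper: whole 5s clips that fit in a credit budget."""
    return credits // CLIP_COST_5S[resolution]


for res in ("720p", "1080p"):
    print(res, clips_per_budget(PRO_MONTHLY_CREDITS, res))  # 720p 16, then 1080p 8
```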

What makes Happy Horse different from Sora 2 / Veo 3 / Kling?

Happy Horse 1.0 ranks #1 on the Artificial Analysis Video Arena in both Text-to-Video (Elo 1333) and Image-to-Video (Elo 1392), above Sora 2, Veo 3.1, and Kling. Its main differentiator is joint audio-video generation in a single pass, with multilingual lip-sync across 7 languages. Character dialogue therefore does not need a separate lip-sync pass in post-production.

How much does it cost?

12 credits per second @ 720p ($0.14/sec API cost), 24 credits per second @ 1080p ($0.28/sec). For example: 5s @ 720p = 60 credits; 10s @ 1080p = 240 credits. 1 credit ≈ $0.06 on Plykit Pro.

Can I use generated videos commercially?

Yes. You own the content you create and can use it commercially, subject to Plykit's standard terms of service.

🏆 #1 on Video Arena · Launched 2026-04-26 · Happy Horse 1.0 by Alibaba

Happy Horse 1.0 AI Video Generator with Lip-Sync

Alibaba's newest AI video model — ranked #1 on Artificial Analysis Video Arena (Text-to-Video Elo 1333, Image-to-Video Elo 1392), above Sora 2, Veo 3.1, and Kling.

Joint audio-video generation in a single pass. 1080p output. Multilingual lip-sync across 7 languages: English, Mandarin, Cantonese, Japanese, Korean, German, French.

3-15s Range

Native Audio + Lip-Sync

5 Aspect Ratios

Audio

Happy Horse 1.0 generates audio + video together with lip-sync — no separate audio toggle. Output always includes native synced audio.

Happy Horse 1.0 Model

Alibaba's #1 AI video model with joint audio-video and 7-language lip-sync.

#1 VIDEO ARENA

Happy Horse 1.0

Joint audio-video generation with multilingual lip-sync

Joint audio-video output (single pass)
3-15 second range
Multilingual lip-sync (7 languages)
Up to 1080p resolution

From 36 credits / 3s @ 720p

Key Capabilities

Why Happy Horse 1.0 ranks #1 on Artificial Analysis Video Arena across both Text-to-Video and Image-to-Video benchmarks.

Native Audio + Lip-Sync

Joint diffusion of audio and video in a single forward pass — no post-production merge. Multilingual lip-sync across 7 languages for character dialogue.

Text to Video

Transform text descriptions into 3-15 second cinematic videos with native synced sound and lip-sync for spoken dialogue in any of the 7 supported languages.

Image to Video

Animate still images with natural motion and synced audio. Upload a reference image and describe the motion + dialogue you want.

5 Aspect Ratios

Support for 16:9 (YouTube), 9:16 (TikTok / Reels), 1:1 (Instagram), 4:3 (legacy), and 3:4 (portrait). Pick at generation time.
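This page does not list exact pixel dimensions for each aspect ratio. The sketch below assumes that "720p" and "1080p" refer to the short side of the frame. That is a common convention, but this page does not confirm it. Under that assumption, each of the five ratios maps to a frame size as shown. `frame_size` and `SHORT_SIDE` are hypothetical names.

```python
# The five aspect ratios listed on this page.
ASPECT_RATIOS = ["16:9", "9:16", "1:1", "4:3", "3:4"]

# ASSUMPTION: the resolution tier names the frame's short side.
SHORT_SIDE = {"720p": 720, "1080p": 1080}


def frame_size(ratio: str, resolution: str) -> tuple[int, int]:
    """Hypothetical (width, height) for an aspect ratio and resolution tier."""
    w, h = (int(part) for part in ratio.split(":"))
    short = SHORT_SIDE[resolution]
    if w >= h:  # landscape or square: height is the short side
        return (short * w // h, short)
    return (short, short * h // w)  # portrait: width is the short side


for ratio in ASPECT_RATIOS:
    print(ratio, frame_size(ratio, "1080p"))
```

For these five ratios the division is exact (for example, 1080 * 16 / 9 = 1920), so no rounding is needed.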

Feature Deep Dive

How Happy Horse 1.0 delivers joint audio-video generation in a single forward pass.

Text to Video

Text-to-Video Generation

Create scenes with character dialogue, ambient sound, and expressive motion from text alone. Specify the spoken language and Happy Horse aligns lip motion in 7 languages.

Prompt example

A barista in Tokyo welcomes a customer in Japanese ("いらっしゃいませ"), warm cafe ambience, soft jazz, slow dolly forward.
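The prompt examples on this page follow a consistent pattern: the action, the spoken line and its language, the ambience, and then the camera move. A hypothetical helper like `build_prompt` can assemble that pattern. It can also reject any language outside the 7 lip-sync languages before credits are spent. This is an illustration, not a Plykit feature. The result is just the plain prompt text you would enter in the generator.

```python
# The seven lip-sync languages listed on this page.
LIP_SYNC_LANGUAGES = {
    "English", "Mandarin", "Cantonese", "Japanese", "Korean", "German", "French",
}


def build_prompt(action: str, language: str, line: str,
                 ambience: str = "", camera: str = "") -> str:
    """Assemble a text-to-video prompt (hypothetical helper, not a Plykit API)."""
    if language not in LIP_SYNC_LANGUAGES:
        raise ValueError(f"{language!r} is not one of the 7 lip-sync languages")
    # Action plus the quoted dialogue line and its language, then optional extras.
    parts = [f'{action} in {language} ("{line}")']
    parts += [p for p in (ambience, camera) if p]
    return ", ".join(parts) + "."


prompt = build_prompt(
    action="A barista in Tokyo welcomes a customer",
    language="Japanese",
    line="いらっしゃいませ",
    ambience="warm cafe ambience, soft jazz",
    camera="slow dolly forward",
)
print(prompt)
```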

Image to Video

Image-to-Video Animation

Bring a still photo to life with natural motion and synced audio. Drop in any reference image and describe the action + dialogue.

Prompt example

The street vendor smiles and says "Hello, my friend!" in English, neon signs flicker, drizzle catches the light.

Audio + Lip-Sync

Joint Audio-Video Generation

Audio is co-generated, not bolted on. Lip-sync alignment is supported in 7 languages: English, Mandarin, Cantonese, Japanese, Korean, German, French.

Prompt example

A French chef explains a recipe in French ("On commence par le beurre…"), pan sizzles, knife taps cutting board, hand-held camera.

Aspect Ratios

Five Aspect Ratios

Pick a ratio at generation time. Optimized for the major social platforms — no cropping, no letterbox.

Prompt example

9:16 vertical clip — a skateboarder lands a kickflip, ambient street, cheering crowd.

Credits Pricing

12 credits per second @ 720p · 24 credits per second @ 1080p (1 credit ≈ $0.06 on Plykit Pro).

Audio is always included: Happy Horse generates audio + video in one pass.
Duration	720p	1080p
3s	36 credits	72 credits
5s	60 credits	120 credits
8s	96 credits	192 credits
10s	120 credits	240 credits
12s	144 credits	288 credits
15s	180 credits	360 credits
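Credit cost is the per-second rate for the chosen resolution multiplied by the clip length. Below is a minimal sketch of that calculation. It assumes whole-second durations within the 3-15s range, since the page does not say how fractional seconds would be billed. It also adds no separate audio charge, because audio is always included. `credit_cost` is a hypothetical name.

```python
# Per-second credit rates for Happy Horse 1.0 (audio is always included).
RATE_PER_SECOND = {"720p": 12, "1080p": 24}
MIN_SECONDS, MAX_SECONDS = 3, 15  # supported clip length


def credit_cost(seconds: int, resolution: str) -> int:
    """Credits for one clip: per-second rate times duration (sketch)."""
    if not MIN_SECONDS <= seconds <= MAX_SECONDS:
        raise ValueError(f"duration must be {MIN_SECONDS}-{MAX_SECONDS}s, got {seconds}")
    if resolution not in RATE_PER_SECOND:
        raise ValueError(f"unsupported resolution: {resolution!r}")
    return RATE_PER_SECOND[resolution] * seconds


print(credit_cost(5, "720p"))    # 60
print(credit_cost(10, "1080p"))  # 240
```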

How to Use Happy Horse 1.0

Generate your first Happy Horse video in three steps.

Step 1

Pick a mode

Text-to-Video starts from scratch. Image-to-Video animates a reference image you upload. For both, write the action + dialogue (specify language for lip-sync).

Step 2

Configure size + duration

Choose a duration of 3-15 seconds, a resolution of 720p or 1080p, and the aspect ratio for your target platform. We suggest a 5s 720p test first, then a re-run at 1080p for the keeper.

Step 3

Generate and download

Click Generate Video. A 1080p clip takes about 38 seconds, and you get back the video with native synced audio and aligned lip-sync.
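The choices from the steps above can be gathered into one settings object that rejects unsupported values up front. This is a sketch under stated assumptions. `GenerationSettings` and its field names are hypothetical, because this page does not document a Plykit API. The allowed values come from the modes, durations, resolutions, and aspect ratios described on this page. The defaults mirror the suggested 5s 720p test run.

```python
from dataclasses import dataclass, replace
from typing import Optional

# Allowed values, taken from the options described on this page.
MODES = {"text-to-video", "image-to-video"}
RESOLUTIONS = {"720p", "1080p"}
ASPECT_RATIOS = {"16:9", "9:16", "1:1", "4:3", "3:4"}


@dataclass(frozen=True)
class GenerationSettings:
    """Hypothetical settings object for Steps 1-2 (not a Plykit API)."""
    mode: str
    prompt: str
    duration_s: int = 5          # default mirrors the suggested 5s test run
    resolution: str = "720p"     # draft at 720p, re-run at 1080p
    aspect_ratio: str = "16:9"
    reference_image: Optional[str] = None  # assumed path to the uploaded image

    def __post_init__(self) -> None:
        if self.mode not in MODES:
            raise ValueError(f"unknown mode: {self.mode!r}")
        if not 3 <= self.duration_s <= 15:
            raise ValueError("duration must be 3-15 seconds")
        if self.resolution not in RESOLUTIONS:
            raise ValueError(f"unsupported resolution: {self.resolution!r}")
        if self.aspect_ratio not in ASPECT_RATIOS:
            raise ValueError(f"unsupported aspect ratio: {self.aspect_ratio!r}")
        if self.mode == "image-to-video" and not self.reference_image:
            raise ValueError("image-to-video needs a reference image")


# A 5s 720p draft, then the same settings re-run at 1080p for the keeper.
draft = GenerationSettings(
    mode="text-to-video",
    prompt="a skateboarder lands a kickflip, ambient street, cheering crowd",
    aspect_ratio="9:16",
)
final = replace(draft, resolution="1080p")
print(final)
```

`dataclasses.replace` builds a new instance through `__init__`, so `__post_init__` runs again. The 1080p keeper is therefore validated the same way as the draft.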

Gallery

A sample of videos created with Happy Horse 1.0.

Tokyo Barista — Japanese Dialogue

A barista in Tokyo welcomes a customer in Japanese, warm cafe ambience, soft jazz, slow dolly forward.

Multilingual lip-sync: Japanese dialogue lands cleanly with native ambience.

Hong Kong Skateboarder

9:16 vertical clip — a skateboarder lands a kickflip on a Hong Kong rooftop, ambient street, cheering crowd.

Human motion: skating physics + crowd reaction in one pass.

French Chef Tutorial

A French chef explains a recipe in French, pan sizzles, knife taps cutting board, hand-held camera.

Audio sync: the pan sizzle and knife taps line up frame by frame with the visual motion.

New York Street Vendor

A street vendor smiles and says "Hello, my friend!" in English, neon signs flicker, drizzle catches the light.

I2V: still photograph animated with motion, weather, and synced English greeting.

Creators Love Happy Horse 1.0

Early feedback from creators using Happy Horse on Plykit.

The lip-sync in Mandarin is shockingly clean — better than anything I've used. No post-production matching needed.

Lin — Travel Vlogger

Joint audio-video is a game-changer. I can prototype tutorial videos with native French dialogue in under a minute.

Marc — French Cooking Channel

1080p in 38 seconds with synced audio at this price beats every API I've tested.

Asha — Indie Filmmaker

Explore More Video Models

Compare Happy Horse 1.0 with other AI video and image models on Plykit.

Kling

Video

Cost-effective AI video with native audio by Kuaishou.

Sora 2

Video

OpenAI's advanced video model with cinematic quality.

Veo 3.1

Video

Google DeepMind's video model with best-in-class audio.

Flux 2

Image

Top open-source image model by Black Forest Labs with high fidelity.

Nano Banana

Image

Our flagship image model powered by Gemini for creative magic.

Explore More AI Tools

Explore other AI video generation models

FAQ

Common questions about Happy Horse 1.0 on Plykit.

Ready to Create Videos with Synced Audio?

Generate AI videos with native audio + multilingual lip-sync using Happy Horse 1.0 — Alibaba's #1 video model.
