HappyHorse - The latest AI video model from Alibaba

HappyHorse is the latest AI video model from Alibaba's ATH AI Innovation Unit. HappyHorse-1.0 ranks #1 on Artificial Analysis Video Arena. It supports all four video generation modalities: Text to Video and Image to Video, each with and without native audio. API access is planned for launch on April 30.

HappyHorse-1.0 Rankings on Artificial Analysis Video Arena

HappyHorse has landed in #1 or #2 across all of the leaderboards in the Artificial Analysis Video Arena. It comfortably takes first place in the “without audio” categories, while its Elo score in the “with audio” leaderboards is nearly identical to ByteDance's Dreamina Seedance 2.0. (Updated April 2026)

HappyHorse ranked first in the text-to-video (without audio) track with 1389 Elo points, leading the second-place Dreamina Seedance 2.0 by nearly 115 points.

Even in the text-to-video (with audio) category, Alibaba's latest AI video model ranked first in the Elo rankings, leading Dreamina Seedance 2.0 720p by 11 points.

In the image-to-video (without audio) category, it scored 1416 Elo points, a new high for an Alibaba video model on this leaderboard.

Even in the with-audio track, which places heavy demands on audiovisual coordination, HappyHorse's Elo score is on par with that of Seedance 2.0.
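To put these Elo gaps in perspective, the short sketch below converts a rating difference into an expected head-to-head win rate. It assumes the conventional Elo formula with a scale of 400; the exact rating method used by Artificial Analysis is not described here, so treat the numbers as a rough guide rather than the arena's own calculation. The function name is illustrative.

```python
def elo_expected_win_rate(rating_gap: float, scale: float = 400.0) -> float:
    """Expected share of pairwise comparisons won by the higher-rated model.

    Assumes the conventional Elo logistic curve (base 10, scale 400).
    """
    return 1.0 / (1.0 + 10.0 ** (-rating_gap / scale))


# Rating gaps quoted in the text above.
gaps = {
    "text-to-video, without audio (about 115 points)": 115,
    "text-to-video, with audio (11 points)": 11,
}

for label, gap in gaps.items():
    print(f"{label}: {elo_expected_win_rate(gap):.1%}")
```

Under this assumption, a gap of about 115 points corresponds to winning roughly 66% of blind comparisons, while an 11-point gap is close to a coin flip at roughly 52%.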

HappyHorse-1.0 AI Video Examples

The following are comparison examples of Text to Video with Audio generated by HappyHorse-1.0 versus Dreamina Seedance 2.0, Kling 3.0 Pro, grok-video-imagine, and PixVerse V6. (Tested by Artificial Analysis)

Prompt: A Pixar-style short about a nervous little traffic cone who dreams of being a finish line pylon at a major race. Other cones mock its ambitions. A construction worker accidentally places it at a marathon finish line. The cone's painted face shifts from terror to joy as runners pass. Confetti falls on its cone head. Other cones watch on TV, inspired. Audio: Traffic sounds becoming crowd cheers, inspirational swelling music.

Prompt: A basketball bouncing on an empty indoor court, creating a loud, rhythmic echo with every slap against the polished hardwood floor, punctuated by the sharp squeak of rubber sneakers.

Prompt: A flashlight beam exploring a cave system, illuminating wet limestone formations. The light catches crystalline calcite deposits that glitter and flash. Where the beam passes through shallow standing water, it creates bright caustic patterns on the submerged floor. Stalactites cast long, swinging shadows as the flashlight moves. Audio: Dripping water echoing, footsteps on wet rock, breathing in enclosed space.

About the HappyHorse Team

HappyHorse was developed by Alibaba's Taotian Future Life Lab, led by Zhang Di, former Vice President of Kuaishou and Head of Kling Technology. The team joined Alibaba at the end of 2025 and focuses on AI video generation.

The Future Life Lab is Taotian Group's AI R&D hub and Alibaba's core e-commerce algorithm team. E-commerce is one of China's largest application scenarios for visual AI, and the lab brings together top technical talent and core computing resources. It works on frontier areas such as large models and multimodal computing, aiming to build foundational algorithmic capabilities and incubate AI-native applications. In just over a year since its establishment, the team has published more than 10 papers at top international conferences.

HappyHorse-1.0 Model Architecture

HappyHorse-1.0 uses the Transfusion (unified multimodal) architecture. The approach integrates discrete text modeling (autoregressive prediction) and continuous visual signal modeling (diffusion) within a single framework. Although the architecture can in principle support both understanding and generation, HappyHorse-1.0 places its primary emphasis on generative performance. This hybrid design is becoming a key area of competition among AI labs worldwide, because more efficient alignment between text and visuals is expected to improve continuity and visual fidelity in generated video while preserving linguistic coherence.
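As a rough illustration of what "a single framework" means for training, the sketch below combines a next-token loss on text with a noise-prediction loss on visual latents, in the spirit of the objective described in the published Transfusion work (a language-modeling loss plus a weighted diffusion loss). This is a toy example with made-up numbers, not HappyHorse's actual training code; the function names and the `balance` weight are assumptions for illustration only.

```python
import math


def next_token_loss(token_probs):
    """Mean negative log-likelihood of the correct text tokens (autoregressive part)."""
    return -sum(math.log(p) for p in token_probs) / len(token_probs)


def diffusion_loss(predicted_noise, true_noise):
    """Mean squared error between predicted and actual noise (diffusion part)."""
    squared = [(p - t) ** 2 for p, t in zip(predicted_noise, true_noise)]
    return sum(squared) / len(squared)


def combined_loss(token_probs, predicted_noise, true_noise, balance=1.0):
    """Single training objective: text loss plus weighted visual loss.

    `balance` is a hypothetical weighting coefficient between the two terms.
    """
    return next_token_loss(token_probs) + balance * diffusion_loss(
        predicted_noise, true_noise
    )


# Toy inputs: probabilities the model assigned to the correct text tokens,
# and noise values for two visual latent elements.
loss = combined_loss(
    token_probs=[0.5, 0.25],
    predicted_noise=[0.0, 1.0],
    true_noise=[0.5, 0.5],
)
print(f"combined loss: {loss:.4f}")
```

Because both terms feed one objective, a single set of model weights is updated by text and visual signals together, which is the sense in which the two modalities share one framework.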

What Users Say About HappyHorse

Frequently Asked Questions about HappyHorse