Why Avataar Is Bullish About Cracking AI Video And Outdoing Global Giants

12 Jun 2026, 14:07 by Lokesh Choudhary · Inc42

SUMMARY

Avataar today launched Varya, an India-built AI video model that claims to generate videos at just ₹0.50 per second
The startup is betting on efficiency over scale, arguing that lower inference costs can solve the economic challenges that have plagued AI video models like Sora
Backed by Peak XV, Avataar aims to make AI video affordable for businesses, creators and educators, positioning Varya as an India-first alternative to global models


The race in AI video generation is currently dominated by global giants like OpenAI, Google and Chinese startups flush with compute budgets running into the billions of dollars. While these players compete on ever-bigger models and more GPUs, a Bengaluru-based startup is trying to buck the trend.

Much as Chinese AI lab DeepSeek upended the LLM market, the 12-year-old, Peak XV-backed Avataar is attempting a similar disruption in AI video generation.

With the launch of Varya, an AI video generation model, Avataar claims that videos can now be generated for as little as ₹0.50 per second, which it says is at least 10X cheaper than the cheapest AI video generation model currently available.
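To put the claimed price in concrete terms, the short sketch below works through the arithmetic. It assumes billing is strictly linear in seconds of output (the article does not describe Avataar's actual pricing tiers), and the constant and function names are hypothetical.

```python
# Claimed Varya price, as reported: ₹0.50 per second of generated video.
VARYA_INR_PER_SECOND = 0.50

# Avataar says this is at least 10X cheaper than the cheapest alternative.
CLAIMED_MIN_ADVANTAGE = 10


def clip_cost_inr(seconds: float, rate_inr_per_second: float = VARYA_INR_PER_SECOND) -> float:
    """Cost of a clip, assuming (hypothetically) linear per-second billing."""
    return seconds * rate_inr_per_second


# If the 10X claim holds, the cheapest rival must charge at least this much per second.
implied_competitor_floor_inr = VARYA_INR_PER_SECOND * CLAIMED_MIN_ADVANTAGE

print(clip_cost_inr(5))               # 2.5  (a five-second clip)
print(clip_cost_inr(60))              # 30.0 (a one-minute clip)
print(implied_competitor_floor_inr)   # 5.0  (₹ per second, lower bound for rivals)
```

In other words, under these assumptions a five-second clip would cost ₹2.50, and the 10X claim implies no rival currently charges less than about ₹5 per second.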

According to the startup, this is India’s first distilled AI video generation model developed under the government’s IndiaAI Mission. Generally, AI video generation is a token-heavy task and requires large computational power.

With its distilled AI video generation model, the startup says it has been able to significantly reduce compute costs, a bet that the winners in this race will not be those that use the most compute but those that need the least of it.

However, the cost advantage for Avataar does not come from building a smaller model. “Most distillation projects work by shrinking the parameter count. A 70-Bn-parameter model gets compressed to 7 Bn,” said Sravanth Aluru, Avataar’s cofounder and CEO.

This approach saves cost and time, but quality often goes for a toss.

Avataar has already taken cognisance of this. Varya is built on Alibaba’s open-source Wan 2.2 architecture and retains a 14-Bn-parameter footprint, the same size as its teacher model.

What Avataar changed instead is the generation process itself. Standard diffusion-based video models generate output through a long iterative denoising process, typically around 50 sequential steps, in which the model progressively refines a noisy signal into a coherent video.

Varya collapses this to four steps, but does so through a redesigned inference framework in which each step carries a distinct function rather than repeating the same operation at a finer resolution.

Aluru told Inc42 that the first two steps focus on trajectory shaping, establishing the broad structure, motion path and compositional logic of the video, while the final two steps generate the actual output frames.
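Because each denoising step is one full forward pass of the network, the number of steps is a major driver of inference cost. The toy sketch below is not Avataar's code: it shows a generic sampling loop that counts model calls, with a hypothetical `ROLES` schedule that mirrors Aluru's description (two trajectory-shaping steps, then two frame-generating steps). The `toy_denoise` function is a stand-in that merely shrinks values toward zero; a real model would be a 14-Bn-parameter network conditioned on the step and its role.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical per-step roles, following Aluru's description of Varya's four steps.
ROLES = ["trajectory", "trajectory", "frames", "frames"]

Denoiser = Callable[[List[float], int, str], List[float]]


def sample(x: List[float], denoise: Denoiser, num_steps: int,
           roles: Optional[List[str]] = None) -> Tuple[List[float], List[str]]:
    """Generic iterative denoising loop that records one entry per model call."""
    if roles is not None and len(roles) != num_steps:
        raise ValueError("need exactly one role per step")
    trace: List[str] = []
    for step in range(num_steps):
        role = roles[step] if roles is not None else "refine"
        x = denoise(x, step, role)  # one full forward pass of the network
        trace.append(role)
    return x, trace


def toy_denoise(x: List[float], step: int, role: str) -> List[float]:
    # Placeholder for the real network: ignores step/role and halves every value.
    return [v * 0.5 for v in x]


noise = [1.0, -2.0, 0.5]
_, teacher_trace = sample(noise, toy_denoise, num_steps=50)             # standard schedule
_, student_trace = sample(noise, toy_denoise, num_steps=4, roles=ROLES)  # few-step schedule

print(len(teacher_trace), len(student_trace))  # 50 4
print(student_trace)  # ['trajectory', 'trajectory', 'frames', 'frames']
```

The step count alone explains only part of any speed-up; how much quality survives with four steps depends on the training techniques the company describes, which this sketch does not model.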

The system integrates several techniques under the hood, including role-aware supervision, distribution matching and classifier-free guidance augmentation.
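For readers unfamiliar with classifier-free guidance (CFG), the sketch below shows the standard, textbook form of it, not Avataar's "CFG augmentation", which the company has not detailed. In standard CFG, the model is run twice per step (once with the prompt, once without) and the two predictions are extrapolated by a guidance scale, so guidance roughly doubles the network calls per step. The `CountingToyModel` class is a hypothetical stand-in that adds a fixed bias when a prompt is present.

```python
from typing import Callable, List

Model = Callable[[List[float], str], List[float]]


def cfg_prediction(model: Model, x: List[float], prompt: str,
                   guidance_scale: float) -> List[float]:
    """Standard classifier-free guidance: uncond + w * (cond - uncond)."""
    cond = model(x, prompt)   # pass 1: conditioned on the text prompt
    uncond = model(x, "")     # pass 2: empty prompt (unconditional)
    return [u + guidance_scale * (c - u) for c, u in zip(cond, uncond)]


class CountingToyModel:
    """Hypothetical stand-in network that counts how often it is called."""

    def __init__(self) -> None:
        self.calls = 0

    def __call__(self, x: List[float], prompt: str) -> List[float]:
        self.calls += 1
        bias = 1.0 if prompt else 0.0  # toy effect of conditioning
        return [v + bias for v in x]


model = CountingToyModel()
out = cfg_prediction(model, [0.0, 2.0], "a cat surfing", guidance_scale=5.0)
print(out)          # [5.0, 7.0]
print(model.calls)  # 2 forward passes for a single guided step
```

This two-pass cost is one reason distillation work often tries to fold guidance into the student model; whether and how Varya does so is not specified in the company's description.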

Notably, on an NVIDIA H200 GPU, Varya generates a five-second 720p video in approximately 45 seconds.

The same task on Wan 2.2, its underlying base model, takes roughly 1,230 seconds. The company claims a 27X improvement in speed and cost versus its teacher model.
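The reported timings can be checked directly. The sketch below uses only the figures quoted above; treating cost as proportional to GPU time is an assumption, since the article does not give an hourly GPU price.

```python
# Reported timings for a five-second 720p clip on one NVIDIA H200.
TEACHER_SECONDS = 1230  # Wan 2.2 (teacher model)
STUDENT_SECONDS = 45    # Varya (distilled model)
CLIP_SECONDS = 5

speedup = TEACHER_SECONDS / STUDENT_SECONDS
print(f"{speedup:.1f}X")  # 27.3X, consistent with the claimed 27X

# GPU-seconds spent per second of finished video.
print(STUDENT_SECONDS / CLIP_SECONDS)  # 9.0
print(TEACHER_SECONDS / CLIP_SECONDS)  # 246.0
```

If GPU cost scales linearly with time, as assumed here, the cost ratio equals the speed ratio, which is why the company can quote a single 27X figure for both.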

Can Avataar Succeed Where Sora Failed?

To understand why Varya matters, consider the global AI video market, which has spent staggering sums of money while failing to find a mass audience.

The cautionary tale here is OpenAI’s Sora. Launched in late 2024 to extraordinary hype, the model produced photorealistic video clips that seemed to herald a new era in content creation.

The economics behind the scenes were far less glamorous. By March 2026, each 10-second clip cost OpenAI approximately $1.30 to produce, translating into roughly $15 Mn per day in inference costs, against lifetime revenue of just $2.1 Mn.
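The scale of that mismatch is clearer when the reported figures are combined. The sketch below uses only the numbers quoted above; because the daily figure is itself approximate ("roughly"), the derived values are rough too.

```python
# Reported Sora economics (as of March 2026).
COST_PER_CLIP_USD = 1.30          # per 10-second clip
DAILY_INFERENCE_USD = 15_000_000  # approximate daily inference spend
LIFETIME_REVENUE_USD = 2_100_000

# Clip volume implied by the two cost figures.
implied_clips_per_day = DAILY_INFERENCE_USD / COST_PER_CLIP_USD

# How long daily inference spending takes to match all revenue ever earned.
hours_to_burn_revenue = LIFETIME_REVENUE_USD / DAILY_INFERENCE_USD * 24

print(f"{implied_clips_per_day / 1e6:.1f} Mn clips/day")  # 11.5 Mn clips/day
print(f"{hours_to_burn_revenue:.1f} hours")                # 3.4 hours
```

On these figures, a single day of inference cost roughly seven times Sora's lifetime revenue.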
