19-year-old from Bihar uses money saved for laptop to build multimodal AI model

Abhinav Anand, 19, says he built ArcleIntelligence while studying in Bihar and training it with his own savings. His account has drawn attention to independent AI development emerging beyond formal labs in India.

09 May 2026, 12:09 by India Today Education Desk · India Today

In Short

Anand began coding because he could not afford the YouTube analytics tool VidIQ
Several early projects collapsed, pushing him deeper into model design
He says ArcleIntelligence handles text, images, audio, video and documents

In a room in Bihar, while most Class 12 students were preparing for board exams, 19-year-old Abhinav Anand was thinking about something else entirely.

His mind was on an AI model: token windows, multimodal reasoning systems and GPU compute costs. Now he is asking for support to take the project further.

At one point during his half-yearly examination, he says, he stopped writing answers and began thinking about architecture decisions for an AI model he was building. He failed the exam.

“I do not regret it,” he wrote in a long public post on Reddit explaining the journey behind his project, ArcleIntelligence.

The model, still under training, is a 5.82-billion-parameter multimodal AI system designed to process text, images, documents, audio and video together.

It can generate text, images and speech.

Anand says it is not a wrapper built on top of another chatbot, but a system trained through separate specialist models connected into one reasoning backbone.

At a time when many young developers are showing interest in artificial intelligence and building their own AI systems, it has become important to understand what these models actually are.

A multimodal AI model is a system that can understand and process different kinds of data together, including text, images, audio, video and documents.

Unlike traditional AI systems that work with only one format at a time, multimodal models combine multiple inputs to produce more connected and human-like responses.

IT STARTED WITH A YOUTUBE PROBLEM

The story begins with YouTube.

Two and a half years ago, Anand was creating gaming content online.

He wanted better analytics tools for his channel but could not afford VidIQ, a subscription-based YouTube analytics platform. So he attempted to build his own version.

He had no background in artificial intelligence.

“I just knew ChatGPT existed,” he wrote.

The first projects failed.

A YouTube analytics app did not work. An offline voice assistant failed. A privacy-focused AI system also collapsed during development.

Before beginning ArcleIntelligence, Anand trained a text-to-video model from scratch on a regular laptop.

He documented the process publicly online. According to him, the work drew the attention of Lightning AI, which later invited him to publish the project as an official template on its platform.

That moment, he says, convinced him that the work was moving beyond experimentation.

BUILDING A MODEL WITHOUT A TEAM

The technical ambition behind ArcleIntelligence is large.

Anand claims the system has a context window of more than two million tokens and uses a hybrid reasoning architecture combining state space models and attention mechanisms.
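Why pair state space models with attention at that context length? A rough way to see it is to count operations. The sketch below is illustrative arithmetic only, not a description of how ArcleIntelligence is built: it assumes full self-attention scores every token against every other token, while a state space layer updates a fixed-size state once per token. The function names are hypothetical, and constant factors, hidden sizes and the actual mix of layers are ignored.

```python
def attention_pair_count(num_tokens):
    """Full self-attention scores each token against every token: n * n pairs."""
    return num_tokens * num_tokens


def ssm_step_count(num_tokens):
    """A state space layer updates a fixed-size state once per token: n steps."""
    return num_tokens


# Compare the two growth rates at increasing context lengths,
# ending at the two-million-token figure quoted in the article.
for n in (2_000, 200_000, 2_000_000):
    ratio = attention_pair_count(n) // ssm_step_count(n)
    print(f"{n:>9,} tokens: attention does {ratio:,}x the work of a linear scan")
```

At two million tokens the pairwise count reaches four trillion, against two million sequential updates for the linear scan. That gap is the usual argument for letting state space layers carry most of a long sequence and reserving attention for where precise token-to-token lookup matters.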

The model can process text, images, documents, audio and video together. It can also generate speech and images. Anand says the architecture relies on specialist AI systems connected through trained layers that allow them to communicate with a shared reasoning backbone.
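The idea of specialists feeding a shared backbone can be sketched in a few lines. This is a minimal illustration under stated assumptions, not Anand's code: the feature sizes, the backbone width and the `Adapter` class are all hypothetical, and the weights here are random where a real system would train them. Each adapter is a plain linear projection that maps one specialist's output into a common vector size, so the backbone sees a single sequence of same-sized vectors.

```python
import random

BACKBONE_DIM = 8  # hypothetical width of the shared backbone's input vectors


class Adapter:
    """Linear projection from a specialist's feature size to the backbone width."""

    def __init__(self, in_dim, out_dim, seed):
        rng = random.Random(seed)
        # Randomly initialised for illustration; a real system would train these.
        self.weights = [
            [rng.uniform(-0.1, 0.1) for _ in range(in_dim)] for _ in range(out_dim)
        ]

    def __call__(self, features):
        # One dot product per output dimension (matrix-vector multiply).
        return [sum(w * x for w, x in zip(row, features)) for row in self.weights]


# Hypothetical output sizes for three specialist models.
SPECIALIST_DIMS = {"text": 6, "image": 12, "audio": 4}
adapters = {
    name: Adapter(dim, BACKBONE_DIM, seed=index)
    for index, (name, dim) in enumerate(SPECIALIST_DIMS.items())
}


def to_backbone_sequence(specialist_outputs):
    """Project each (modality, features) pair into the shared vector size."""
    return [adapters[name](features) for name, features in specialist_outputs]


inputs = [("text", [0.5] * 6), ("image", [0.1] * 12), ("audio", [1.0] * 4)]
sequence = to_backbone_sequence(inputs)
print(len(sequence), "vectors, each of width", len(sequence[0]))
```

Whatever the size of each specialist's output, every vector reaching the backbone has the same width, which is what lets one model reason over mixed inputs.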

The document engine, he says, scored above several well-known AI systems on the OmniDocBench benchmark.

The GitHub repository currently contains training scripts and architecture details, while the model weights are expected to be released publicly after training is completed.

THE RS 1.2 LAKH DECISION

The project has been built without institutional funding.

Anand says he used startup compute grants, cloud credits and his own savings. The biggest personal cost was Rs 1.2 lakh he had saved to buy a gaming laptop. Instead of buying the machine, he spent the money on GPU compute for training runs.

“I spent every rupee on compute,” he wrote.

He describes himself as a solo developer with no investors, no team and no formal computer science degree. His father works as a government officer and his mother is a homemaker.

He also says the project came at a personal cost. Sleep routines disappeared. School examinations suffered. Much of the past two years, according to him, was spent learning through trial and error.

WHY THE PROJECT IS DRAWING ATTENTION

Now Anand says he needs around $35,000 to pay for the remaining training, benchmark testing and hosting infrastructure. In return, he plans to release the model weights and source code publicly for the open-source community.

Beyond the technical details, the project reflects a larger shift taking place quietly across India. Advanced AI research, once limited to major labs and universities, is increasingly being attempted by independent developers working from small towns, bedrooms and college hostels.

India has one of the world’s largest developer populations, yet very few globally recognised foundation models built independently within the country.

“The west has its AI labs. The east has its AI labs,” he wrote. “India still has very little representation in foundation models built openly.”

Whether ArcleIntelligence succeeds commercially will take time to determine.

But the journey itself, a teenager in Bihar trading a gaming laptop dream for compute credits and model training, already says something about where India’s next generation of AI builders may come from.
