Microsoft announces 7 new AI models of their own, trained on clean data

05 Jun 2026, 01:56

In what seems like a step in the right direction, Microsoft recently announced 7 new in-house AI models that have been trained on clean data. These are MAI Image-2.5, MAI Image-2.5-Flash, MAI Transcribe-1.5, MAI Thinking-1, MAI Voice-2, MAI Voice-2-Flash and MAI Code-1-Flash. By contrast, most LLMs (Large Language Models) are trained on public data that is usually not paid for.

Some are clean, some are still not

Announced at the recent Microsoft Build conference for developers, these 7 new models cover image generation, audio-to-text transcription, prompting, voice generation and coding. However, this does not mean that Microsoft’s Copilot+ is completely clean, as Microsoft still uses OpenAI’s ChatGPT and Anthropic’s Claude as part of its multi-model AI approach.

Personally, we’d like it if all LLMs were retrained from scratch on clean data that has been paid for, not just because it’s the right thing to do but also because it could help reduce hallucinations. But what do you think? Do you feel that it’s OK for AI companies to keep training their LLMs on public data? Share your thoughts in the comments below and stay tuned to TechNave.com.