Google DeepMind is making its AI text watermark open source

MIT Technology Review

Google DeepMind has developed a tool for identifying AI-generated text and is releasing it as open source.

The tool, called SynthID, is part of a larger family of watermarking tools for generative AI outputs. The company unveiled a watermark for images last year, and it has since rolled out one for AI-generated video. In May, Google announced it was applying SynthID in its Gemini app and online chatbots and made it freely available on Hugging Face, an open repository of AI data sets and models. Watermarks have emerged as an important tool to help people determine when something is AI generated, which could help counter harms such as misinformation. 

“Now, other [generative] AI developers will be able to use this technology to help them detect whether text outputs have come from their own [large language models], making it easier for more developers to build AI responsibly,” says Pushmeet Kohli, the vice president of research at Google DeepMind. 

SynthID works by adding an invisible watermark directly into the text when it is generated by an AI model. 

Large language models work by breaking language down into “tokens” and then predicting which token is most likely to come next. Tokens can be a single character, word, or part of a phrase, and each one gets a percentage score for how likely it is to be the appropriate next word in a sentence. The higher the percentage, the more likely the model is to use it.
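
To make those percentage scores concrete, here is a toy illustration (the tokens and numbers below are invented, not drawn from any real model) of how a model's raw scores, known as logits, become next-token probabilities:

```python
import math

# Invented logits for candidate next tokens after "The capital of France is".
logits = {"Paris": 6.0, "London": 3.5, "Berlin": 2.0, "banana": -4.0}

# Softmax turns raw scores into probabilities that sum to 1.
total = sum(math.exp(v) for v in logits.values())
probs = {tok: math.exp(v) / total for tok, v in logits.items()}

for tok, p in sorted(probs.items(), key=lambda kv: -kv[1]):
    print(f"{tok:>7}: {p:.1%}")  # "Paris" comes out around 91%
```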

SynthID introduces additional information at the point of generation by changing the probability that tokens will be generated, explains Kohli. 
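
SynthID's production algorithm, detailed in the Nature paper discussed below, is more elaborate than anything that fits in a few lines, but the core idea can be sketched with a simpler, generic keyed scheme: a secret key plus the recent context deterministically nudges the probabilities of some tokens upward. This is a minimal illustration of that idea, not DeepMind's exact method; the key and bias values are arbitrary.

```python
import hashlib

SECRET_KEY = b"example-key"  # hypothetical; known only to the model owner

def g_value(context: tuple, token: str) -> int:
    """Keyed pseudorandom bit for a (recent context, candidate token) pair."""
    digest = hashlib.sha256(SECRET_KEY + repr((context, token)).encode()).digest()
    return digest[0] & 1  # looks like a coin flip to anyone without the key

def watermarked_probs(probs: dict, context: tuple, bias: float = 1.5) -> dict:
    """Boost tokens whose keyed bit is 1, then renormalize to sum to 1."""
    scaled = {t: p * (bias if g_value(context, t) else 1.0) for t, p in probs.items()}
    total = sum(scaled.values())
    return {t: p / total for t, p in scaled.items()}

# The shift is subtle for any one token but accumulates across a passage.
print(watermarked_probs({"cat": 0.4, "dog": 0.35, "fish": 0.25}, ("the", "quick")))
```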

To detect the watermark and determine whether text has been generated by an AI tool, SynthID compares the expected probability scores for words in watermarked and unwatermarked text. 
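
Under the simplified scheme sketched above, detection amounts to a statistical test: tokens in watermarked text carry the boosted keyed bit more often than the roughly 50% of the time expected by chance. A toy detector, reusing g_value from the previous sketch:

```python
import math

def watermark_z_score(tokens: list, ngram_len: int = 2) -> float:
    """z-score of the share of boosted tokens versus the ~50% baseline
    expected in unwatermarked text; large positive values suggest a watermark."""
    hits, n = 0, 0
    for i in range(ngram_len, len(tokens)):
        context = tuple(tokens[i - ngram_len:i])  # same context window as generation
        hits += g_value(context, tokens[i])
        n += 1
    if n == 0:
        return 0.0  # text too short to test
    rate = hits / n
    return (rate - 0.5) / math.sqrt(0.25 / n)
```

Note that this kind of test gets weaker as texts get shorter or as edits replace tokens, which foreshadows the limitations discussed below.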

Google DeepMind found that using the SynthID watermark did not compromise the quality, accuracy, creativity, or speed of generated text. That conclusion was drawn from a massive live experiment evaluating SynthID’s performance after the watermark was deployed in its Gemini products and used by millions of people. Gemini allows users to rank the quality of the AI model’s responses with a thumbs-up or a thumbs-down.

Kohli and his team analyzed the scores for around 20 million watermarked and unwatermarked chatbot responses. They found that users did not notice a difference in quality and usefulness between the two. The results of this experiment are detailed in a paper published in Nature today. Currently SynthID for text only works on content generated by Google’s models, but the hope is that open-sourcing it will expand the range of tools it’s compatible with. 
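
For developers who want to try it, the release is integrated with Hugging Face’s transformers library. The sketch below assumes a recent transformers version and uses placeholder key values and an example model ID; exact class and parameter names may differ by version, so check the library’s documentation:

```python
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    SynthIDTextWatermarkingConfig,
)

model_id = "google/gemma-2-2b-it"  # example; any causal LM on the Hub
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Secret integer keys parameterize the watermark; keep them private.
# ngram_len sets how much recent context seeds the keyed randomness.
watermarking_config = SynthIDTextWatermarkingConfig(
    keys=[654, 400, 836, 123, 340, 443, 597, 160],
    ngram_len=5,
)

inputs = tokenizer("Write a short note about rivers.", return_tensors="pt")
outputs = model.generate(
    **inputs,
    watermarking_config=watermarking_config,
    do_sample=True,  # the watermark biases sampling, so greedy decoding won't do
    max_new_tokens=100,
)
print(tokenizer.batch_decode(outputs, skip_special_tokens=True)[0])
```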

SynthID does have other limitations. The watermark was resistant to some tampering, such as cropping text and light editing, but it was less reliable when AI-generated text had been heavily rewritten or translated from one language into another. It is also less reliable in responses to prompts asking for factual information, such as the capital city of France. This is because there are fewer opportunities to adjust the likelihood of the next word in a sentence without changing the facts.

“Achieving reliable and imperceptible watermarking of AI-generated text is fundamentally challenging, especially in scenarios where LLM outputs are near deterministic, such as factual questions or code generation tasks,” says Soheil Feizi, an associate professor at the University of Maryland, who has studied the vulnerabilities of AI watermarking.  

Feizi says Google DeepMind’s decision to open-source its watermarking method is a positive step for the AI community. “It allows the community to test these detectors and evaluate their robustness in different settings, helping to better understand the limitations of these techniques,” he adds. 

There is another benefit too, says João Gante, a machine-learning engineer at Hugging Face. Open-sourcing the tool means anyone can grab the code and incorporate watermarking into their model with no strings attached, Gante says. This will improve the watermark’s privacy, as only the owner will know its cryptographic secrets. 

“With better accessibility and the ability to confirm its capabilities, I want to believe that watermarking will become the standard, which should help us detect malicious use of language models,” Gante says. 

But watermarks are not an all-purpose solution, says Irene Solaiman, Hugging Face’s head of global policy. 

“Watermarking is one aspect of safer models in an ecosystem that needs many complementing safeguards. As a parallel, even for human-generated content, fact-checking has varying effectiveness,” she says.