Overview of LLM text generation and generative watermarking. Credit: Nature (2024). DOI: 10.1038/s41586-024-08025-4

Sampling algorithm can 'watermark' AI-generated text to show where it came from

by Tech Xplore

A tool that can watermark text generated by large language models, improving the ability to identify and trace synthetic content, is described in Nature this week.

Large language models (LLMs) are widely used artificial intelligence (AI) tools that can generate text for chatbots, writing support and other purposes. However, it can be difficult to identify AI-generated text and attribute it to a specific source, calling the reliability of the information into question. Watermarks have been proposed as a solution to this problem, but they have not been deployed at scale because production systems impose stringent requirements on output quality and computational efficiency.

Sumanth Dathathri, Pushmeet Kohli and colleagues have developed SynthID-Text, a scheme that uses a novel sampling algorithm to apply watermarks to AI-generated text. The algorithm subtly biases the word choice of the LLM, inserting a statistical signature that can be recognized by the associated detection software. Watermarking can be applied via a "distortionary" pathway, which improves watermark detectability at a slight cost to output quality, or a "non-distortionary" pathway, which preserves text quality.
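The paper calls its sampling algorithm Tournament sampling. The sketch below is a simplified illustration of the idea, not the published implementation: the helper names (g_value, tournament_sample), the hash construction and the toy next-token distribution are all illustrative assumptions. Candidate tokens are drawn from the model's own distribution and compete in elimination rounds, each match won by the candidate with the higher keyed pseudorandom score, so the emitted token carries a statistical bias that a detector holding the same key can later measure.

```python
import hashlib
import random

def g_value(token, context, layer, key):
    """Pseudorandom score in [0, 1) derived from the token, the recent
    context window, the tournament round and a secret watermarking key.
    (Illustrative hash construction; the paper's exact function differs.)"""
    payload = f"{key}|{layer}|{context}|{token}".encode()
    digest = hashlib.sha256(payload).digest()
    return int.from_bytes(digest[:8], "big") / 2**64

def tournament_sample(probs, context, key, layers=3):
    """Sample one watermarked token: draw 2**layers candidates from the
    model's next-token distribution, then run a single-elimination
    tournament in which the higher g-value wins each match."""
    tokens, weights = zip(*probs.items())
    candidates = random.choices(tokens, weights=weights, k=2**layers)
    for layer in range(layers):
        candidates = [
            a if g_value(a, context, layer, key) >= g_value(b, context, layer, key) else b
            for a, b in zip(candidates[::2], candidates[1::2])
        ]
    return candidates[0]

# Toy usage: a hypothetical next-token distribution and context window.
probs = {"cat": 0.5, "dog": 0.3, "fox": 0.2}
context = ("the", "quick")
print(tournament_sample(probs, context, key=42))
```

Because every candidate is drawn from the model's own distribution, the tournament only reorders probability mass rather than introducing tokens the model would not produce, which is what makes a low-distortion configuration possible.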

The detectability of the watermarks was evaluated across several publicly available models, with SynthID-Text showing improved detectability compared with existing approaches. Text quality was also assessed using nearly 20 million responses from live chat interactions with the Gemini LLM, with results suggesting that the non-distortionary mode of watermarking did not decrease text quality. Finally, SynthID-Text has a negligible impact on the computational power needed to run the LLM, lowering the barrier to implementation.
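On the detection side, the core statistic is how strongly the keyed pseudorandom scores of a text's tokens are skewed upward. The following is a minimal sketch reusing the g_value helper from the sketch above, assuming generation used the same sliding context window and key; the window size and decision threshold here are illustrative, and the published detector is more sophisticated than a plain mean.

```python
def watermark_score(tokens, key, window=2, layers=3):
    """Mean keyed g-value over all tokens and tournament rounds.
    Unwatermarked text averages about 0.5; watermarked text scores higher."""
    scores = []
    for i, tok in enumerate(tokens):
        context = tuple(tokens[max(0, i - window):i])
        for layer in range(layers):
            scores.append(g_value(tok, context, layer, key))
    return sum(scores) / len(scores) if scores else 0.5

# Hypothetical usage: flag text whose score clears a threshold
# calibrated on unwatermarked text.
tokens = ["the", "quick", "fox", "ran"]
is_watermarked = watermark_score(tokens, key=42) > 0.55
```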

The authors caution that text watermarks can be circumvented by editing or paraphrasing the output. However, this work demonstrates the viability of generative watermarks for AI-generated content, a further step toward improving the accountability and transparency of LLM use.

More information: Sumanth Dathathri et al, Scalable watermarking for identifying large language model outputs, Nature (2024). DOI: 10.1038/s41586-024-08025-4

Provided by Nature Publishing Group