Do you find it difficult to tell whether text was written by a human or generated by an AI? Being able to identify AI-generated content fosters trust in information and helps address issues such as misinformation and misattribution. Today, Google DeepMind and Hugging Face are excited to release SynthID Text in Transformers v4.46.0, releasing later today. This technology lets you apply watermarks to AI-generated text using a logits processor for your generation tasks, and detect those watermarks with a classifier.
See Nature’s SynthID Text paper for the full technical details of this algorithm, and Google’s Responsible GenAI Toolkit for details on how to apply SynthID Text in your products.
How it works
The primary goal of SynthID Text is to encode a watermark into AI-generated text so that you can later determine whether text was generated by your LLM, without affecting the behavior of the underlying LLM or negatively impacting generation quality. Google DeepMind has developed a watermarking technique that uses a pseudo-random function, called a g-function, to augment the generation process of any LLM: candidate tokens that score well under the g-function are favored during sampling, in a way that is imperceptible to humans but visible to a trained model. The technique is implemented as a generation utility that is compatible with any LLM without modification via the model.generate() API, and an end-to-end example of how to train a detector to recognize watermarked text is included. Check out the research paper for the full details of the SynthID Text algorithm.
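To build intuition, here is a toy, non-authoritative sketch of the idea: a keyed pseudo-random scoring function plus a tournament step that prefers high-scoring candidates. The hash choice, tournament structure, and scoring below are illustrative inventions and do not match the actual SynthID Text implementation:

```python
import hashlib

def g_value(context: str, candidate: str, key: int) -> float:
    # Toy keyed pseudo-random score in [0, 1): without the key the scores look
    # like noise; with the key they are reproducible at detection time.
    digest = hashlib.sha256(f"{key}|{context}|{candidate}".encode()).digest()
    return int.from_bytes(digest[:8], "big") / 2**64

def tournament_pick(context: str, candidates: list[str], keys: list[int]) -> str:
    # Toy tournament step: among already-sampled candidates, keep the one with
    # the highest total g-value, nudging output toward tokens a detector that
    # knows the keys can later recognize.
    return max(candidates, key=lambda c: sum(g_value(context, c, k) for k in keys))

keys = [654, 400, 836]  # illustrative only; real keys must stay secret
print(tournament_pick("The cat sat on the", ["mat", "rug", "sofa"], keys))
```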
Watermark settings
Watermarks are configured using a dataclass that parameterizes the g-function and how it is applied in the tournament sampling process. Each model you use should have its own watermarking configuration, and that configuration should be stored securely and privately; otherwise, your watermark may be replicated by others.
Every watermarking configuration must define two parameters.
The keys parameter is a list of integers used to compute g-function scores across the model’s vocabulary. We recommend 20 to 30 randomly generated unique integers to balance detectability against generation quality.
The ngram_len parameter balances robustness and detectability: the larger the value, the more easily the watermark is detected, but the more brittle it is to edits. A good default is 5, and the value should be at least 2.
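As a concrete illustration, here is one hedged way to build such a configuration. The key count and range below are arbitrary choices for the sketch, not values prescribed by the library:

```python
import random

from transformers import SynthIDTextWatermarkingConfig

# Draw 25 unique random integers for the keys, per the 20-30 recommendation
# above; the sampling range is an arbitrary illustrative choice. Store the
# keys securely: anyone who has them can reproduce your watermark.
keys = random.SystemRandom().sample(range(1, 2**31), k=25)

watermarking_config = SynthIDTextWatermarkingConfig(
    keys=keys,
    ngram_len=5,  # recommended default; must be at least 2
)
```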
You can further configure the watermark based on your performance needs. For more information, see the SynthIDTextWatermarkingConfig class.
The research paper includes additional analysis of how specific configuration values affect watermark performance.
Applying a watermark
Applying a watermark is a simple modification to your existing generation calls. Once you have defined a configuration, pass the SynthIDTextWatermarkingConfig object as the watermarking_config= parameter to model.generate(), and all generated text will carry the watermark. Try the interactive demo in the SynthID Text Space and see if you can tell the watermarked text apart.
```python
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    SynthIDTextWatermarkingConfig,
)

# Standard model and tokenizer initialization
tokenizer = AutoTokenizer.from_pretrained("Repository/ID")
model = AutoModelForCausalLM.from_pretrained("Repository/ID")

# SynthID Text configuration
watermarking_config = SynthIDTextWatermarkingConfig(
    keys=[654, 400, 836, 123, 340, 443, 597, 160, 57],
    ngram_len=5,
)

# Generation with watermarking
tokenized_prompts = tokenizer(["Here is the prompt"], return_tensors="pt")
output_sequences = model.generate(
    **tokenized_prompts,
    watermarking_config=watermarking_config,
    do_sample=True,
)
watermarked_text = tokenizer.batch_decode(output_sequences)
```
Watermark detection
Watermarks are designed to be detectable by a trained classifier but imperceptible to humans. Every watermarking configuration you use with a model needs a detector trained to recognize that mark.
The basic detector training process is as follows:

1. Decide on a watermarking configuration.
2. Collect a detector training set, split between watermarked and non-watermarked text and between training and test sets; we recommend at least 10,000 examples.
3. Use the model to produce non-watermarked output.
4. Use the model to produce watermarked output (steps 3 and 4 are sketched after this list).
5. Train the watermark detection classifier.
6. Launch the model into production with the watermarking configuration and its associated detector.
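As a minimal sketch of steps 3 and 4, both corpora can be produced with the same model.generate() call, toggling only watermarking_config. This reuses the model, tokenizer, and watermarking_config from the example above; prompts is a hypothetical list of training prompts, and max_new_tokens is an arbitrary choice:

```python
# Assumes the tokenizer has a pad token so the prompts can be batched.
prompts = ["Write a short story about...", "Summarize the following..."]  # hypothetical
inputs = tokenizer(prompts, return_tensors="pt", padding=True)

# Step 3: non-watermarked outputs (negative examples for the detector).
plain_ids = model.generate(**inputs, do_sample=True, max_new_tokens=256)

# Step 4: watermarked outputs (positive examples for the detector).
marked_ids = model.generate(
    **inputs,
    watermarking_config=watermarking_config,
    do_sample=True,
    max_new_tokens=256,
)

plain_text = tokenizer.batch_decode(plain_ids, skip_special_tokens=True)
marked_text = tokenizer.batch_decode(marked_ids, skip_special_tokens=True)
```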
A Bayesian detector class is provided in Transformers, along with an end-to-end example of how to train a detector to recognize watermarked text with a given watermarking configuration. Models that share a tokenizer can also share a watermarking configuration and detector, and thus a common watermark, as long as the detector’s training set includes examples from all of the models that share the watermark.
You can upload this trained detector to a private HF Hub repository and make it accessible across your organization. Google’s Responsible GenAI Toolkit provides details on productionizing SynthID Text in your products.
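As a hedged sketch of that workflow: the detector is a standard Transformers model, so it can be shared with push_to_hub() and loaded with from_pretrained(). The wiring below follows the watermarking utilities in the Transformers docs, but the repo id is a placeholder, detector_model is assumed to come from the training example above, and the exact constructor arguments should be checked against the end-to-end example linked earlier:

```python
from transformers import (
    AutoTokenizer,
    BayesianDetectorModel,
    SynthIDTextWatermarkDetector,
    SynthIDTextWatermarkLogitsProcessor,
)

# Share the trained detector privately ("your-org/synthid-detector" is a
# placeholder repo id).
detector_model.push_to_hub("your-org/synthid-detector", private=True)

# Anyone in your organization with access can then load and use it.
detector_model = BayesianDetectorModel.from_pretrained("your-org/synthid-detector")
tokenizer = AutoTokenizer.from_pretrained("Repository/ID")
logits_processor = SynthIDTextWatermarkLogitsProcessor(
    keys=watermarking_config.keys,  # must match the generation-time configuration
    ngram_len=watermarking_config.ngram_len,
    device="cpu",
)
detector = SynthIDTextWatermarkDetector(detector_model, logits_processor, tokenizer)

inputs = tokenizer(["Text to check for the watermark"], return_tensors="pt")
probability_watermarked = detector(inputs.input_ids)
```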
Limitations
SynthID Text watermarking is robust to some transformations, such as cropping pieces of text, modifying a few words, or mild paraphrasing, but the method does have limitations.
Watermark application is less effective on factual responses, as there is less opportunity to augment generation without decreasing accuracy. The detector’s confidence score can drop significantly when AI-generated text is thoroughly rewritten or translated into another language.
SynthID Text is not built to directly stop motivated adversaries from causing harm. However, it can make it harder to use AI-generated content for malicious purposes, and it can be combined with other approaches to provide better coverage across content types and platforms.
Acknowledgements
The authors would like to thank Robert Stanforth and Tatiana Matejovicova for their contributions to this study.