
CAMIA privacy attacks reveal what AI models remember

By versatileai · September 27, 2025 · 4 mins read

Researchers have developed a new attack that uncovers privacy vulnerabilities by determining whether data was used to train AI models.

The method, named CAMIA (Context-Aware Membership Inference Attack), was developed by researchers at Brave and the National University of Singapore, and is far more effective than previous attempts to probe the “memory” of AI models.

There is growing concern about data memorisation in AI, where models inadvertently store and can leak sensitive information from their training sets. In healthcare, a model trained on clinical notes could accidentally reveal patient information. In business, if internal emails are used for training, an attacker might be able to trick an LLM into reproducing private corporate communications.

Such privacy concerns have been amplified by recent announcements, such as LinkedIn’s plan to use user data to improve its generative AI models, which raises questions about whether private content could surface in generated text.

To test for this kind of leakage, security experts use membership inference attacks, or MIAs. Simply put, an MIA asks the model a critical question: “Did you see this example during training?” If an attacker can reliably work out the answer, it proves the model is leaking information about its training data, which poses a direct privacy risk.
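In its simplest form, a membership inference attack reduces to a threshold test on the model's per-example loss: examples the model saw during training tend to incur lower loss. The sketch below is purely illustrative; the losses and threshold are invented toy values, not anything from the CAMIA paper:

```python
# Minimal loss-threshold membership inference sketch.
# Assumption (illustrative): the model assigns lower loss to training members.

def mia_predict(loss, threshold):
    """Guess 'member' when the model's loss on an example is below threshold."""
    return loss < threshold

# Toy per-example losses (invented values for illustration only).
member_losses = [0.4, 0.6, 0.5]       # examples the model trained on
non_member_losses = [1.8, 2.1, 1.5]   # held-out examples

threshold = 1.0
guesses = [mia_predict(l, threshold) for l in member_losses + non_member_losses]
print(guesses)  # members flagged True, non-members False
```

Real attacks are far more sophisticated, but the core question — does the model treat this example differently from fresh data? — is the same.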

The core idea is that a model often behaves differently when processing data it was trained on compared to new, unseen data. MIAs are designed to systematically exploit these behavioural gaps.

Until now, most MIAs have been largely ineffective against modern generative AI. This is because they were originally designed for simpler classification models that produce a single output per input. LLMs, however, generate text token by token, with each new word influenced by the words that came before it. Methods that only look at overall confidence across a block of text miss this sequential process, and with it the moment-to-moment dynamics in which leakage actually occurs.
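To see why sequence-level confidence can hide the moment of leakage, consider per-token negative log-likelihoods: averaging over the whole text smooths away the one position where the model was genuinely uncertain. A toy illustration with invented probabilities:

```python
import math

# Probabilities the model assigned to the true next token at each position
# (invented values). Only the first token was genuinely hard to predict.
token_probs = [0.05, 0.95, 0.9, 0.92, 0.9]

per_token_nll = [-math.log(p) for p in token_probs]
avg_nll = sum(per_token_nll) / len(per_token_nll)

# The sequence-level average looks unremarkable...
print(round(avg_nll, 3))
# ...while the token-level view exposes exactly where uncertainty spiked.
print(round(max(per_token_nll), 3))
```

The single averaged number masks the large spike at the first token — precisely the kind of token-level dynamic a sequence-level MIA cannot see.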

The key insight behind the CAMIA privacy attack is that an AI model’s memorisation is context-dependent. A model relies most heavily on memorisation when it is uncertain about what to say next.

For example, in an example from Brave, given a prefix like “Harry Potter is… written by… Harry’s world…”, the model can infer that the next token is “Potter” through generalisation, because the context already gives the answer away:

In such cases, a confident prediction does not indicate memorisation. If, however, the prefix is simply “Harry”, predicting “Potter” becomes far harder without having memorised a specific training sequence. A high-confidence prediction in this ambiguous scenario is a much stronger indicator of memorisation.

CAMIA is the first privacy attack tailored specifically to exploit this generative nature of modern AI models. It tracks how the model’s uncertainty evolves during text generation, measuring how quickly the AI shifts from “guessing” to “confident recall”. By operating at the token level, it can adjust for situations where low uncertainty is caused by simple repetition, identifying the subtle patterns of true memorisation that other methods miss.
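One way to picture that token-level signal is as the sharpness of the drop from early uncertainty to later confidence. The helper below is a hypothetical proxy, not CAMIA's actual scoring formula: it compares per-token loss over the opening tokens against the rest of the sequence, so memorised text (which snaps into confident recall) scores higher than merely plausible text:

```python
def recall_transition_score(nlls):
    """Hypothetical proxy for the 'guessing -> confident recall' transition:
    early per-token loss minus later per-token loss. A large positive value
    means uncertainty collapsed sharply after the opening tokens."""
    head = sum(nlls[:2]) / 2                  # early uncertainty
    tail = sum(nlls[2:]) / len(nlls[2:])      # later confidence
    return head - tail

# Invented per-token losses: memorised text becomes near-certain once enough
# of the sequence is revealed; genuinely unseen text stays uncertain throughout.
memorised = [3.0, 2.5, 0.10, 0.05, 0.05]
unseen = [3.0, 2.8, 2.5, 2.6, 2.4]

print(recall_transition_score(memorised) > recall_transition_score(unseen))  # True
```

Both sequences start equally uncertain, so a sequence-level average would struggle to separate them; the shape of the transition is what tells them apart.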

The researchers tested CAMIA on the MIMIR benchmark across several Pythia and GPT-Neo models. When attacking the 2.8B-parameter Pythia model on the ArXiv dataset, CAMIA nearly doubled the detection accuracy of previous methods, raising the true positive rate from 20.11% to 32.00% while keeping the false positive rate at just 1%.
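The “true positive rate at 1% false positive rate” metric used here can be computed from attack scores as follows. This is a generic sketch with invented scores, deliberately chosen so the result lands at 0.32; it does not reproduce the paper's experiments:

```python
def tpr_at_fpr(member_scores, non_member_scores, max_fpr=0.01):
    """True positive rate when the decision threshold is set so that at most
    max_fpr of non-members are wrongly flagged. Higher score = 'more likely
    a training member'. Assumes enough non-member scores to set the cutoff."""
    sorted_non = sorted(non_member_scores, reverse=True)
    k = int(len(sorted_non) * max_fpr)   # number of false positives allowed
    threshold = sorted_non[k]            # scores strictly above this are flagged
    flagged = sum(1 for s in member_scores if s > threshold)
    return flagged / len(member_scores)

# Invented scores: 100 non-members spread over [0, 0.99], and 100 members of
# which 32 receive a score above the resulting 1%-FPR threshold.
non_members = [i / 100 for i in range(100)]
members = [0.99] * 32 + [0.50] * 68

print(tpr_at_fpr(members, non_members))  # 0.32
```

Fixing a very low FPR matters: an attack that flags everything as a member would have a perfect TPR but be useless, so attacks are compared at the same small false-positive budget.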

The attack framework is also computationally efficient: on a single A100 GPU, CAMIA can process 1,000 samples in about 38 minutes, making it a practical tool for auditing models.

The work is a reminder to the AI industry of the privacy risks of training models on vast, unfiltered datasets. The researchers hope their work will spur the development of more privacy-preserving techniques and contribute to ongoing efforts to balance AI’s usefulness with fundamental user privacy.

See also: Samsung benchmarks the true productivity of enterprise AI models


Want to learn more about AI and big data from industry leaders? Check out the AI & Big Data Expo taking place in Amsterdam, California, and London. The comprehensive event is part of TechEx and is co-located with other leading technology events. Click here for more information.

AI News is powered by TechForge Media. Explore upcoming enterprise technology events and webinars here.
