Apple is taking a new approach to training its AI models, one that avoids collecting or copying user content from iPhones or Macs.
According to a recent blog post, the company plans to continue relying on synthetic data (constructed data used to mimic the behavior of users) and differential privacy to improve features such as email summaries without accessing personal emails or messages.
For users who opt in to Apple’s Device Analytics program, the company’s AI models compare synthetic, email-like messages against a small sample of the real user’s content stored locally on the device. The device then identifies which of the synthetic messages most closely matches the user sample and sends information about the selected match back to Apple. Actual user data never leaves the device, and Apple says it receives only aggregated information.
This technique allows Apple to improve its models for longer-form text generation tasks without collecting actual user content. It extends the company’s long-standing use of differential privacy, which introduces randomized data into a broader dataset to help protect individual identities. Apple has used this method since 2016 to understand usage patterns in line with its data protection policies.
Improvements to Genmoji and other Apple Intelligence features
The company already uses differential privacy to improve features like Genmoji, collecting general trends about which prompts are most popular without linking any prompt to a specific user or device. In upcoming releases, Apple plans to apply similar methods to other Apple Intelligence features, including Image Playground, Image Wand, Memories Creation, and Writing Tools.
For Genmoji, the company anonymously polls participating devices to determine whether specific prompt fragments have been seen. Each device responds with a noisy signal: some responses reflect actual use, while others are randomized. The approach ensures that only widely used terms become visible to Apple and that no individual response can be traced back to a user or device, the company says.
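The noisy polling described above resembles classic randomized response, a textbook building block of differential privacy. The following is a minimal sketch of that general idea, not Apple's actual mechanism: the function names, the `p_truth` parameter, and the simulated population are all assumptions made for illustration. Each simulated device answers truthfully only some of the time, so a single answer reveals little, yet the aggregate can still be corrected to estimate how common a prompt fragment is.

```python
import random


def noisy_response(seen: bool, p_truth: float, rng: random.Random) -> bool:
    """Report the true bit with probability p_truth, otherwise a fair coin flip."""
    if rng.random() < p_truth:
        return seen
    return rng.random() < 0.5


def estimate_fraction(responses: list[bool], p_truth: float) -> float:
    """Debias the observed 'yes' rate to estimate the true share of devices."""
    observed = sum(responses) / len(responses)
    # Expected observed rate = p_truth * true_rate + (1 - p_truth) * 0.5,
    # so solve that equation for true_rate.
    return (observed - (1 - p_truth) * 0.5) / p_truth


def simulate(true_fraction: float, n_devices: int = 100_000,
             p_truth: float = 0.5, seed: int = 0) -> float:
    """Hypothetical population: each device has seen the fragment or not."""
    rng = random.Random(seed)
    responses = [
        noisy_response(rng.random() < true_fraction, p_truth, rng)
        for _ in range(n_devices)
    ]
    return estimate_fraction(responses, p_truth)


if __name__ == "__main__":
    # With many devices the estimate lands near the true share (0.3 here),
    # even though any single response may have been random.
    print(round(simulate(0.3), 3))
```

A lower `p_truth` gives each device more deniability but needs more participants for the same accuracy, which is consistent with the article's point that only widely used terms become visible.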
Curating synthetic data for better email summaries
The above method works well for short prompts, but Apple needed a new approach for more complex tasks such as summarizing emails. For these, Apple generates thousands of sample messages and converts these synthetic messages into numerical representations, or “embeddings,” based on language, tone, and topic. Participating user devices then compare the embeddings with locally stored samples. Again, only the selected match is shared, not the content itself.
Apple collects the most frequently selected synthetic embeddings from participating devices and uses them to refine its training data. Over time, this process allows the system to generate more relevant and realistic synthetic emails, helping Apple improve its AI output for summarization and text generation without apparent compromise to user privacy.
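The matching and aggregation steps can be sketched roughly as follows. This is an illustrative sketch under stated assumptions, not Apple's implementation: the function names are hypothetical, cosine similarity is an assumed choice of distance measure, the tiny two-dimensional vectors stand in for real embeddings, and the privacy noise applied to each device's report is omitted for brevity.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def closest_synthetic(local_embeddings, synthetic_embeddings):
    """Runs on the device: return the index of the synthetic embedding that
    is most similar to any local sample. Assumes both lists are non-empty.
    Only this index would be reported, never the local content."""
    best_index, best_score = 0, -math.inf
    for i, synth in enumerate(synthetic_embeddings):
        score = max(cosine(synth, local) for local in local_embeddings)
        if score > best_score:
            best_index, best_score = i, score
    return best_index


def most_selected(reported_indices, top_k):
    """Runs on the server: tally reported indices and keep the most frequent."""
    return [idx for idx, _ in Counter(reported_indices).most_common(top_k)]


if __name__ == "__main__":
    synthetic = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # toy embeddings
    devices = [[[0.9, 0.1]], [[0.1, 0.9]], [[0.8, 0.2]]]  # local samples
    reports = [closest_synthetic(local, synthetic) for local in devices]
    print(reports, most_selected(reports, top_k=1))
```

In this toy run, two of the three devices select the first synthetic message, so it would be the one fed back into the next round of synthetic data generation.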
Available in beta
Apple is rolling out the system in beta versions of iOS 18.5, iPadOS 18.5, and macOS 15.5. According to Bloomberg’s Mark Gurman, this is Apple’s attempt to address challenges with its AI development, which have included delayed feature rollouts and fallout from leadership changes in the Siri team.
It remains to be seen whether the approach will yield more useful AI output in practice, but it signals a clear effort to balance user privacy with model performance.