Apple researchers are advancing the field of ML through fundamental research that deepens the world’s understanding of this technology and helps redefine what is possible with it. This research can lead to advancements in Apple’s products and services, and its benefits extend beyond the Apple ecosystem: we share them with the broader research community through publications, open source resources, and participation in industry and research community events.
Next week, the 38th annual Conference on Neural Information Processing Systems (NeurIPS) will be held in Vancouver, Canada. NeurIPS is the largest annual ML and AI research conference, and Apple is proud to once again participate in and sponsor this important event for the community.
At the main conference and associated workshops, Apple researchers will present many papers across a variety of ML topics. As highlighted below, these include advances in privacy-preserving ML, improving the capabilities of multimodal models, improving LLM pre-training, exploring the reasoning capabilities of LLMs, and understanding self-supervised learning.
NeurIPS attendees will be able to experience demonstrations of Apple’s ML research at our booth (#323, West Hall A) during exhibit hours. Apple is also sponsoring and participating in numerous events hosted by affinity groups that support underrepresented groups in the ML community. A comprehensive overview of Apple’s participation in and contributions to NeurIPS 2024 can be found here, and some highlights follow below:
Advancing privacy-preserving ML
At Apple, we believe privacy is a fundamental human right, and advancing ML technology in ways that protect user privacy is a key area of ongoing research. The papers Apple researchers will present at NeurIPS this year include two related to federated learning (FL).
Researchers working on FL often experiment with simulations to quickly iterate on new ideas. Apple researchers will present pfl-research: Simulation Framework for Accelerating Research in Private Federated Learning, a fast, modular, and easy-to-use Python framework for simulating FL that enables the research community to make further advances in this area.
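To make the setting concrete, below is a minimal sketch of the kind of experiment such a simulator supports: federated averaging over many simulated users, each holding a small private dataset. It is a toy in plain NumPy that assumes nothing about pfl-research’s actual API, and it omits the privacy mechanisms a real private FL system adds.

```python
# A toy federated averaging (FedAvg) simulation; illustrative only,
# not pfl-research's API, and with no differential privacy.
import numpy as np

def local_sgd(weights, X, y, lr=0.1, epochs=1):
    """One user's local training: linear regression via gradient descent."""
    w = weights.copy()
    for _ in range(epochs):
        grad = 2 * X.T @ (X @ w - y) / len(y)
        w -= lr * grad
    return w

def federated_averaging(users, rounds=10, cohort_size=5, dim=4):
    """Each round, average locally trained models into the global model."""
    rng = np.random.default_rng(0)
    global_w = np.zeros(dim)
    for _ in range(rounds):
        cohort = rng.choice(len(users), size=cohort_size, replace=False)
        local_models = [local_sgd(global_w, *users[i]) for i in cohort]
        global_w = np.mean(local_models, axis=0)  # FedAvg aggregation
    return global_w

# Simulate 100 users, each with a small private dataset.
rng = np.random.default_rng(1)
true_w = rng.normal(size=4)
users = []
for _ in range(100):
    X = rng.normal(size=(20, 4))
    users.append((X, X @ true_w + 0.1 * rng.normal(size=20)))
print(federated_averaging(users))
```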
Apple researchers will also present “Private and Personalized Frequency Estimation in Federated Settings,” which describes a new approach to privately computing personalized frequency histograms with private federated learning. Personalized frequencies of words (or tokens) help improve next-word prediction for keyboard input on a user’s device. This is a difficult problem because most users have little usage data, and because users differ in vocabulary, topics, and style, the data distribution is highly heterogeneous. The paper introduces a new technique for discovering and exploiting similar subpopulations of users, and this approach is shown to outperform existing clustering-based algorithms.
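The core intuition can be sketched in a few lines: a user’s sparse local counts are smoothed toward the histogram of a similar subpopulation, so users with little data borrow statistical strength from their discovered cluster. The mixing scheme below is a hypothetical simplification for illustration, not the paper’s algorithm, which also adds differential-privacy protections.

```python
# Simplified sketch: personalize a frequency histogram by mixing a user's
# sparse local counts with a subpopulation estimate. Hypothetical illustration.
from collections import Counter

def personalized_histogram(user_counts, cluster_hist, alpha=0.3):
    """Blend the user's empirical frequencies with their cluster's.

    alpha weights the sparse local estimate; users with little data get
    most of their signal from the discovered subpopulation.
    """
    total = sum(user_counts.values()) or 1
    vocab = set(user_counts) | set(cluster_hist)
    return {w: alpha * user_counts.get(w, 0) / total
               + (1 - alpha) * cluster_hist.get(w, 0.0)
            for w in vocab}

user = Counter(["match", "goal", "match"])             # sparse local data
cluster = {"match": 0.02, "goal": 0.01, "team": 0.01}  # subpopulation estimate
print(personalized_histogram(user, cluster))
```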
Multimodal model improvements
Multimodal and multitask models are becoming increasingly powerful, but their effectiveness can be constrained by limitations in their training data. At NeurIPS, Apple ML researchers will present new methods for overcoming these limitations and improving the performance of these models.
Pre-trained large-scale vision-language models like CLIP have been shown to generalize well, but fine-grained classification tasks (e.g., identifying car models) whose visual concepts are not well represented in the pre-training data can still be challenging. At NeurIPS, Apple ML researchers will present Aggregate-and-Adapt Natural Language Prompts for Downstream Generalization of CLIP, which describes a new method that facilitates fine-tuning CLIP when available annotated data is limited. The Aggregate-and-Adapted Prompt Embedding (AAPE) distills textual knowledge from natural language prompts (generated by humans or LLMs) to enrich concepts that are underrepresented in the model’s training data. This approach improves the downstream generalization of CLIP and delivers strong performance across a variety of vision-language tasks, including image-to-text retrieval, few-shot classification, image captioning, and VQA.
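Schematically, the aggregation step can be pictured as pooling the embeddings of many natural-language descriptions of a class into a single prompt embedding. In the sketch below, the relevance weighting (a softmax over cosine similarities to the image embedding) is an assumed stand-in for illustration, not the paper’s exact aggregator.

```python
# Schematic prompt aggregation in the spirit of AAPE: pool several prompt
# embeddings into one, weighted by relevance to the image. Illustrative only.
import numpy as np

def aggregate_prompts(prompt_embs, image_emb, temperature=0.07):
    """Pool prompt embeddings into a single embedding, attending to the image."""
    p = prompt_embs / np.linalg.norm(prompt_embs, axis=1, keepdims=True)
    v = image_emb / np.linalg.norm(image_emb)
    sims = p @ v                          # cosine similarity per prompt
    w = np.exp(sims / temperature)
    w /= w.sum()                          # softmax attention weights
    return (w[:, None] * p).sum(axis=0)   # weighted prompt embedding

# Hypothetical embeddings for prompts like "a photo of a 1967 Mustang", etc.
rng = np.random.default_rng(0)
prompts = rng.normal(size=(5, 512))
image = rng.normal(size=512)
print(aggregate_prompts(prompts, image).shape)  # (512,)
```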
Although multimodal and multitask foundation models like 4M show promising results, their ability to accept diverse inputs and perform diverse tasks is limited by the modalities and tasks they were trained on. At NeurIPS, Apple ML researchers and collaborators at EPFL will present 4M-21: An Any-to-Any Vision Model for Tens of Tasks and Modalities, which shows how to greatly expand the capabilities of 4M by training it on tens of highly diverse modalities. By co-training on large-scale multimodal datasets and text corpora, the resulting model can exploit a wide variety of modalities (see Figure 1). It scales up to 3 billion parameters and exhibits strong out-of-the-box vision performance, controllable and conditional generation, cross-modal retrieval, and multi-sensory fusion capabilities.
Improved LLM pre-training
LLMs are used in a variety of production applications, including some Apple services, so fundamental improvements to these models can have a significant impact for developers and their users across the industry. The research Apple ML researchers will present at NeurIPS includes new techniques for more efficient LLM pre-training.
LLMs are typically trained on datasets of fixed-length token sequences, because LLM training infrastructure often supports only a limited set of sequence lengths. To create these fixed-length sequences, documents of varying lengths are concatenated and then split into chunks of the target length. Because this approach combines documents at random, the model may end up predicting the next token from the context of an unrelated document rather than a related one, which both provides a poor training signal and wastes computation. Apple researchers will present Dataset Decomposition: Faster LLM Pretraining with Variable Sequence Length Curriculum, which addresses this problem with a novel method that decomposes a dataset of variable-length documents into a union of “buckets,” each a subset of sequences of the same length, with every sequence drawn from a single document. During training, a variable sequence length and batch size are used, sampling simultaneously from all buckets (see Figure 2). This enables efficient pre-training on long sequences, scales effectively with dataset size, and is shown to significantly improve model performance on standard evaluations.
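A minimal sketch of the bucketing idea follows: each tokenized document is split into power-of-two-length chunks, so every training sequence comes from a single document, and each bucket can later be sampled with a batch size matched to its sequence length. The specific chunk sizes and the absence of a sampling curriculum here are simplifying assumptions.

```python
# Sketch of dataset decomposition: split each document into power-of-two
# chunks and group chunks into same-length buckets. Details simplified.
from collections import defaultdict

def decompose(doc_tokens, max_len=8192):
    """Greedily split one document into power-of-two-length chunks."""
    chunks, i = [], 0
    while i < len(doc_tokens):
        # Largest power of two that fits in the remaining tokens.
        size = min(max_len, 1 << ((len(doc_tokens) - i).bit_length() - 1))
        chunks.append(doc_tokens[i:i + size])
        i += size
    return chunks

def build_buckets(corpus):
    buckets = defaultdict(list)           # sequence length -> sequences
    for doc in corpus:
        for chunk in decompose(doc):
            buckets[len(chunk)].append(chunk)
    return buckets

corpus = [list(range(n)) for n in (5, 300, 1200)]  # toy "tokenized" docs
for length, seqs in sorted(build_buckets(corpus).items()):
    print(f"bucket {length}: {len(seqs)} sequence(s)")
```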

Exploring LLM reasoning capabilities
Although LLMs have proven capable across many tasks, the extent to which today’s models can truly reason remains an important open research question. Understanding the current capabilities and limitations of these models not only enables the research community to keep improving them, but also helps developers use LLMs more intelligently in production applications.
At NeurIPS, Apple researchers will present How Far Can Transformers Reason? The Globality Barrier and Inductive Scratchpad, a paper investigating why Transformer-based models struggle with tasks that require “global reasoning,” in which learned concepts must be composed and extrapolated. This work shows that these models cannot efficiently learn distributions with high “globality,” and therefore cannot, for example, construct long chains of syllogisms (e.g., inferring a⇒c from a⇒b and b⇒c). The paper then introduces the notion of an “inductive scratchpad” that allows Transformers to break through this barrier.
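A toy example helps illustrate why a scratchpad changes the picture: deriving a⇒z from a long chain of implications is “global” when attempted in one shot, but becomes a sequence of purely local steps when intermediate conclusions are written out one at a time, which is what a scratchpad encourages a model to do. The sketch below is plain Python, not the paper’s construction.

```python
# Toy illustration: composing a chain of implications one local step at a
# time, recording each step the way a scratchpad would. Assumes acyclic rules.
def derive(start, goal, rules):
    """Follow one implication at a time, recording each step."""
    current, steps = start, []
    while current != goal:
        if current not in rules:
            return None                      # chain breaks; goal unreachable
        nxt = rules[current]
        steps.append(f"{current} => {nxt}")  # one 'scratchpad' line per step
        current = nxt
    return steps

rules = {"a": "b", "b": "c", "c": "d"}
print(derive("a", "d", rules))  # ['a => b', 'b => c', 'c => d']
```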
Understanding self-supervised learning (SSL)
Learning representations effectively and efficiently is a fundamental goal of deep learning, as these representations can be used for many downstream tasks. By advancing the field’s understanding of how different approaches learn representations, research in this area may ultimately lead to improved performance across downstream tasks.
At NeurIPS, Apple researchers will present How JEPA Avoids Noisy Features: The Implicit Bias of Deep Linear Self-Distillation Networks, which explores the differences in how representations are learned by two major SSL paradigms: masked autoencoders (MAE) and joint-embedding predictive architectures (JEPA). The study shows that in a simplified linear setting, where both approaches learn similar representations, JEPA favors learning “high-influence” features (i.e., features characterized by high regression coefficients). This provides a formal explanation of an empirically observed phenomenon: JEPA appears to prioritize abstract features over fine-grained pixel information.
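The contrast between the two objectives can be seen in a small numeric toy, sketched below under simplifying assumptions: an MAE-style model must reconstruct raw inputs, so it pays an irreducible cost for unpredictable pixel detail, whereas a JEPA-style model predicts the encoder’s own representation, so a linear encoder can downweight the noisy, low-influence feature. (The paper analyzes how self-distillation dynamics produce this bias without collapsing; this toy only compares the loss landscapes.)

```python
# Toy linear comparison of MAE-style (pixel target) vs JEPA-style (latent
# target) objectives. Simplified illustration, not the paper's analysis.
import numpy as np

rng = np.random.default_rng(0)
N = 10_000
signal = rng.normal(size=(N, 1))          # "high-influence" feature
noise = 0.5 * rng.normal(size=(N, 1))     # unpredictable "pixel" detail
X = np.hstack([signal, signal + noise])   # two correlated input features
mask = np.array([1.0, 0.0])               # only feature 0 is visible

def jepa_loss(W_enc):
    """Predict the encoder's representation of the hidden part (latent target)."""
    return np.mean(((X - X * mask) @ W_enc) ** 2)

def mae_loss(W_enc, W_dec):
    """Reconstruct the raw input from the masked view (pixel target)."""
    return np.mean((X - (X * mask) @ W_enc @ W_dec) ** 2)

keep_both = np.eye(2)
drop_noisy = np.diag([1.0, 0.0])            # encoder ignores the noisy feature
W_dec = np.array([[1.0, 1.0], [0.0, 0.0]])  # best guess: copy visible signal

print("JEPA, encoder keeps noise :", jepa_loss(keep_both))
print("JEPA, encoder drops noise :", jepa_loss(drop_noisy))  # -> 0.0
print("MAE, best linear decoder  :", mae_loss(keep_both, W_dec))  # irreducible
```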
ML research demonstration at Apple booth
During exhibit hours, NeurIPS attendees can experience live demos of Apple ML research at Booth #323 in West Hall A, including:
MLX – An open source array framework designed for Apple silicon that enables fast and flexible ML and scientific computing on Apple hardware. The framework is optimized for Apple silicon’s unified memory architecture and leverages both the CPU and GPU. At NeurIPS, the MLX demo shows inference and training of large models on device: fine-tuning a 7B-parameter LLM on iPhone, image generation with a large diffusion model on iPad, and text generation with several large language models on a Mac with Apple silicon (see the short MLX sketch after this list).
MobileCLIP – A family of mobile-friendly image-text models with a hybrid CNN/Transformer architecture, which achieves a state-of-the-art tradeoff between accuracy and latency. MobileCLIP-B obtains state-of-the-art results for zero-shot classification and retrieval, as well as for understanding relational, attribute, and ordinal information. At NeurIPS, attendees can experience MobileCLIP performing zero-shot scene classification in real time on an iPhone (illustrated schematically in the second sketch below).
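For attendees curious what MLX code looks like, here is a minimal, generic example, unrelated to the demos themselves: arrays live in unified memory, computation is lazy until evaluated, and gradients come from function transformations.

```python
# Minimal MLX example: fit a linear model with mx.grad and gradient descent.
import mlx.core as mx

def loss_fn(w, x, y):
    return mx.mean((x @ w - y) ** 2)

x = mx.random.normal((256, 4))
w_true = mx.array([1.0, -2.0, 0.5, 3.0])
y = x @ w_true

w = mx.zeros(4)
grad_fn = mx.grad(loss_fn)          # transform: returns dloss/dw
for _ in range(200):
    w = w - 0.1 * grad_fn(w, x, y)  # plain gradient descent
mx.eval(w)                          # force the lazy computation to run
print(w)                            # should approach w_true
```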
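And here is how CLIP-style zero-shot classification works, schematically: class names become text prompts, and the image is assigned to whichever prompt embedding it is most similar to. The encoder below is a random stand-in so the sketch runs; it does not use MobileCLIP’s actual API.

```python
# Schematic CLIP-style zero-shot classification. Placeholder encoders only.
import numpy as np

def zero_shot_classify(image_emb, class_names, text_encoder, temperature=0.01):
    """Score an image against text prompts; no task-specific training needed."""
    prompts = [f"a photo of a {name}" for name in class_names]
    T = np.stack([text_encoder(p) for p in prompts])
    T /= np.linalg.norm(T, axis=1, keepdims=True)
    v = image_emb / np.linalg.norm(image_emb)
    logits = T @ v / temperature
    probs = np.exp(logits - logits.max())
    return dict(zip(class_names, probs / probs.sum()))

# Stand-in encoder so the sketch runs; a real model supplies both embeddings.
rng = np.random.default_rng(0)
fake_text_encoder = lambda p: rng.normal(size=512)
image_embedding = rng.normal(size=512)
print(zero_shot_classify(image_embedding, ["beach", "forest", "kitchen"],
                         fake_text_encoder))
```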
Support for the ML research community
Apple is committed to supporting underrepresented groups in the ML community and is proud to sponsor several affinity groups hosting events at the NeurIPS 2024 venue, including Black in AI (workshop on December 10), Women in Machine Learning (WiML) (workshop on December 10), LatinX in AI (workshop on December 10), and Queer in AI (workshop on December 11, social on December 12). In addition to supporting these workshops through sponsorship, Apple employees will also participate in each of them, among other events.
Learn more about Apple ML Research at NeurIPS 2024
NeurIPS is one of the largest and most important annual ML research conferences, and Apple is proud to once again share innovative new research and connect with the community at the event. This post highlights just a portion of the research Apple ML researchers will present at NeurIPS 2024; a comprehensive overview and schedule of our participation can be found here.