Apple has denied using an unethically collected dataset from EleutherAI to train its flagship artificial intelligence (AI) product, Apple Intelligence. However, the company says it did use the dataset for another AI model.
After it was revealed this week that EleutherAI had used hundreds of thousands of YouTube video captions to build a dataset for AI training, Apple spoke to Apple Insider, denying that EleutherAI’s ‘the Pile’ was used to train Apple Intelligence.
However, they confirmed that ‘the Pile’ was used when developing the open-source OpenELM models released earlier this year.
What is EleutherAI’s ‘the Pile’?
EleutherAI is a non-profit organization that wants to make AI research and development more accessible to organizations outside the huge tech firms, such as OpenAI, that do most of the work on large AI models.
One of the ways they do this is by providing training datasets for large language models and other AI applications. However, instead of paying licensing fees to access data, or entering into partnerships to use data from sources, EleutherAI scrapes the web to obtain its data. This includes the captions from over 170,000 YouTube videos.
‘The Pile’ is the result of this: a huge corpus of unethically sourced training data intended to lower the barrier to entry for smaller firms looking to enter the AI market. However, larger companies have also made use of the dataset.
What is Apple’s OpenELM?
Apple says it did not use ‘the Pile’ to train Apple Intelligence, claiming that the Apple Intelligence models were trained “on licensed data, including data selected to enhance specific features, as well as publicly available data collected by our web crawler.” However, the company has admitted to using the dataset to develop its OpenELM models.
Apple released OpenELM in April. It was created for research purposes and is not used to power any of Apple Intelligence’s functions or features. Apple has told 9to5Mac that they have no plans to expand on OpenELM or release any further versions of the tool.
Featured image credit: Apple