AI Disclosures Project: New Working Paper Released
The AI Disclosures Project today released findings from a new working paper, “Beyond Public Access in LLM Pre-Training Data: Non-public book content in OpenAI’s Models,” which investigates the use of non-public, copyrighted content in LLM training. The research team applied the DE-COP membership inference attack to 34 copyrighted O’Reilly Media books to assess whether OpenAI’s models were trained on content that required payment or authorization to access.