

Google Faces Lawsuit Alleging Misuse of Data to Train Its LLMs
AI and Data Science News, posted by ODSC Team, July 18, 2023

In a new lawsuit, Google faces accusations of misusing personal data to train the large language models (LLMs) that power its AI products. The lawsuit claims that the tech giant scraped the data of millions of users without their consent, violating copyright law in the process of training those products.
The lawsuit also targets Google's parent company, Alphabet, and DeepMind. It was filed in federal court in California by Clarkson Law Firm, which filed a similar lawsuit against ChatGPT maker OpenAI last month. It claims that Google “has been secretly stealing everything ever created and shared on the internet by hundreds of millions of Americans.”
It also claims that products such as Bard were trained with this data. But the lawsuit goes further, claiming that Google has taken “virtually the entirety of our digital footprint,” including “creative and copywritten works,” to build its AI products.
In a statement to CNN, Google’s general counsel, Halimah DeLaine Prado, called the claims laid out in the lawsuit “baseless.” She continued, “We’ve been clear for years that we use data from public sources, like information published to the open web and public datasets, to train the AI models behind services like Google Translate, responsibly and in line with our AI Principles.”
DeLaine Prado went on to point to existing legal precedent: “American law supports using public information to create new beneficial uses, and we look forward to refuting these baseless claims.”
The question of whether information scraped from the web can be used to train AI models has been simmering for months. It has escalated as AI-powered tools and products have exploded in the marketplace. Questions related to copyright, privacy, and more have become growing concerns in circles discussing responsible AI.
Tim Giordano, one of the attorneys at Clarkson bringing the suit against Google, told CNN, “Google needs to understand that ‘publicly available’ has never meant free to use for any purpose. … Our personal information and our data is our property, and it’s valuable, and nobody has the right to just take it and use it for any purpose.”
Giordano went on to distinguish between Google’s search indexing and the way it takes data to train its models. Of indexing, he said in part that Google can “serve up an attributed link to your work that can actually drive somebody to purchase it or engage with it.” Scraping data to train AI, on the other hand, creates “an alternative version of the work that radically alters the incentives for anybody to need to purchase the work.”
The lawsuit seeks an injunction that would temporarily freeze commercial access to, and commercial development of, Google’s generative AI tools. If granted, the injunction could put Google’s plans to expand its AI offerings in 2023 on hold.
The lawsuit also requests unspecified damages and payments as financial compensation for people whose data the law firm claims Google misused.