In late June, Microsoft released a new type of artificial intelligence technology that can generate its own computer code.
The tool, called Copilot, is designed to speed up the work of professional programmers. As they type on their laptops, it suggests ready-made blocks of computer code that they can instantly add to their own programs.
Many programmers liked the new tool, or were at least fascinated by it. But Matthew Butterick, a Los Angeles-based programmer, designer, writer, and attorney, wasn’t one of them. This month, he and a team of other attorneys filed a lawsuit seeking class-action status against Microsoft and the other prominent companies that designed and deployed Copilot.
Like many cutting-edge AI technologies, Copilot developed its skills by analyzing vast amounts of data. In this case, it relied on billions of lines of computer code posted on the Internet. Butterick, 52, equates this process to piracy, because the system does not acknowledge its debt to existing work. His lawsuit alleges that Microsoft and its collaborators violated the legal rights of millions of programmers who spent years writing original code.
The suit is believed to be the first legal attack on a technique called “AI training,” a method of building artificial intelligence that is poised to reshape the tech industry. In recent years, many artists, writers, pundits, and privacy activists have complained that companies are training their AI systems using data that doesn’t belong to them.
The lawsuit echoes the tech industry’s last few decades. In the 1990s and into the 2000s, Microsoft battled the rise of open source software, seeing it as an existential threat to the future of the company’s business. As open source became increasingly important, Microsoft embraced it and eventually acquired GitHub, the home of open source programmers and the place where they build and store their code.
Almost every new generation of technology — even online search engines — has faced similar legal challenges. Oftentimes, “there is no statute or case law that covers it,” said Bradley Hulbert, an intellectual property attorney who specializes in this increasingly important area of law.
The lawsuit is part of a wave of concern about artificial intelligence. Artists, writers, composers and other creators increasingly worry that companies and researchers are using their work to create new technology without their consent and without offering compensation. Companies are training a variety of systems in this way, including art generators, speech recognition systems like Siri and Alexa, and even driverless cars.
Copilot is based on technology created by OpenAI, an artificial intelligence lab in San Francisco backed by $1 billion in funding from Microsoft. OpenAI is at the forefront of an increasingly widespread effort to train AI technologies using digital data.
After Microsoft and GitHub released Copilot, GitHub CEO Nat Friedman tweeted that using existing code to train the system is “fair use” of the material under copyright law, an argument often used by the companies and researchers who built these systems. But no court case has yet tested this argument.
“The ambitions of Microsoft and OpenAI go beyond GitHub and Copilot,” Butterick said in an interview. “They want to train on any data anywhere, for free, without consent, forever.”
In 2020, OpenAI unveiled a system called GPT-3. The researchers trained the system using vast amounts of digital text, including thousands of books, Wikipedia articles, chat logs and other data posted online.
By identifying patterns across all of that text, the system learned to predict the next word in a sequence. When someone feeds a few words into this “large language model,” it can complete the thought with entire paragraphs of text. In this way, the system can write its own Twitter posts, speeches, poems, and news articles.
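The core idea of next-word prediction can be illustrated with a deliberately simplified sketch. The toy bigram model below is a hypothetical illustration only — vastly simpler than GPT-3’s neural network — but it shows the same basic mechanic: count which word most often follows the current one in training text, then use those counts to predict what comes next.

```python
from collections import Counter, defaultdict

def train_bigram(text):
    """Count, for each word, which words follow it in the training text."""
    words = text.split()
    follows = defaultdict(Counter)
    for current, nxt in zip(words, words[1:]):
        follows[current][nxt] += 1
    return follows

def predict_next(model, word):
    """Return the most frequent follower seen in training, or None."""
    if word not in model:
        return None
    return model[word].most_common(1)[0][0]

# Tiny made-up "training corpus" for demonstration.
corpus = "the cat sat on the mat and the cat slept"
model = train_bigram(corpus)
print(predict_next(model, "the"))  # "cat" follows "the" twice, "mat" only once
```

A real large language model replaces these raw counts with billions of learned parameters and conditions on long stretches of preceding text rather than a single word, but the objective — predict the next token — is the same.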
To the surprise of the researchers who built the system, it could also write computer programs, having apparently learned from countless programs posted on the Internet.
So OpenAI went a step further, training a new system, Codex, on a new collection of data stocked with computer code. At least some of this code came from GitHub, a popular programming service owned and operated by Microsoft, the lab later said in a research paper detailing the technology.
This new system became the core technology for Copilot, which Microsoft distributed to programmers through GitHub. After testing it with a relatively small number of programmers for about a year, Microsoft rolled Copilot out to all programmers on GitHub in July.
Many programmers who have used the technology said the code Copilot produces is simple and may be useful for larger projects, but it must be massaged, augmented, and vetted. Some programmers find it useful only when they are learning to code or trying to master a new language.
However, Butterick worried that Copilot would end up destroying the global community of programmers who built the code at the heart of most modern technology. Days after the system was released, he published a blog post titled “This Copilot Is Stupid and Wants to Kill Me.”
An open source programmer himself, Butterick is part of a community of programmers who openly share their code with the world. Over the past 30 years, open source software has helped drive the rise of most technologies that consumers use every day, including web browsers, smartphones, and mobile apps.
Although open source software is designed to be shared freely between programmers and companies, that sharing is governed by licenses designed to ensure that it is used in ways that benefit the broader community of programmers. Butterick believes that Copilot has violated these licenses and, as it continues to improve, will make open source programmers obsolete.
After publicly complaining about the issue for months, he filed his lawsuit with a handful of other attorneys. The lawsuit is still in its early stages and the court has not yet granted class action status.
To the surprise of many legal experts, Butterick’s lawsuit does not accuse Microsoft, GitHub, and OpenAI of copyright infringement. It takes a different tack, arguing that the companies violated GitHub’s terms of service and privacy policies while also breaking a federal law that requires companies to display copyright information when they make use of material.
Butterick and another attorney behind the lawsuit, Joe Saveri, said the suit could eventually address the copyright issue.
When asked whether it could discuss the lawsuit, a GitHub spokesperson declined, saying in an emailed statement that the company has been “committed to innovating responsibly with Copilot since the beginning, and will continue to evolve the product to best serve developers around the world.” Microsoft and OpenAI declined to comment on the lawsuit.
Under current laws, most experts believe that training an AI system on copyrighted material is not necessarily illegal. But it could be if the system ends up generating material that closely resembles the data it was trained on.
Some Copilot users have said that it generates code that looks identical — or nearly identical — to existing software, an observation that could become central to Butterick’s case and others.
Legal thinkers and regulators briefly explored these legal issues in the 1980s, before the technology existed, said Pam Samuelson, a professor at the University of California, Berkeley, who specializes in intellectual property and its role in modern technology. Now, she said, a legal reckoning is needed.
“It’s not a toy problem anymore,” Samuelson said.