Contracts involving artificial intelligence (AI) often share similarities with other software agreements but also introduce unique considerations, especially when dealing with generative AI like ChatGPT. This article serves as a guide for lawyers, contract managers, and other contract drafters to navigate these complexities.
1. Ownership and Control of Customer Training Data and Prompts
General Problems with “Ownership” of Data
Contrary to popular belief, data or other information cannot be owned in a meaningful sense. Information cannot be patented, and copyright only protects expression, not the information itself. So, what does it mean to “own” data?
Not much, but not nothing. Your data may include trade secrets, and ownership terms might help protect them (though not as much as confidentiality terms). You could also claim a weak form of copyright in the compilation of your dataset. This would give you limited rights to prevent others from copying the compilation, but it would not prevent them from copying individual pieces of information within the dataset. In other words, while you can claim data ownership, don’t rely on it for real protection. Instead, focus on contract terms that restrict the use of data.
For ownership itself, consider using language that protects the owner while maintaining a relatively balanced agreement between the parties. For example: “Party A claims ownership of the Data, and this Agreement does not transfer to Party B any title or other ownership rights in or to Data. Party B recognizes and agrees that: (a) the Data is Party A’s valuable property; (b) the Data includes Party A’s trade secrets; (c) the Data is an original compilation pursuant to the copyright law of the United States and other jurisdictions; and (d) Party A has dedicated substantial resources to collecting, managing, and compiling the Data.” Alternatively, a less balanced version could replace the first sentence with: “The Parties recognize and agree that Party A owns the Data.”
Prompts and Customer Training Data & Input Data – Customer Ownership & Control
If possible, the customer should claim ownership of its employee and contractor prompts, as well as any training data and input data it provides.
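Sample language – adapting the ownership terms above, and assuming “Prompts,” “Customer Training Data,” and “Customer Input Data” are defined elsewhere in the agreement – might read: “Customer owns the Prompts, Customer Training Data, and Customer Input Data, and this Agreement does not transfer to Provider any title or other ownership rights in or to any of them.”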
More importantly, the customer should add contract terms that restrict the use of prompts and training data. For example, for data not used to train the AI: “Provider shall not access or use any Prompt or Customer Input Data other than as necessary to provide the System to Customer.”
For both prompts and training data used to train the customer’s copy or instance of the AI: “Provider shall not access or use Customer Training Data (including without limitation Prompts) for any purpose other than to train and maintain Customer’s model or copy of the System.”
Prompts & Customer Training Data – Provider’s Rights
For the provider, customer ownership of prompts and customer data does not necessarily create a problem, nor does accepting limits on control.
However, prompts and customer data could include information that the provider also receives from a third party and needs to use. They could also include the provider’s own information, such as trade secrets, copyrighted text, and even patents or patent applications, assuming the customer’s staff has access to those materials.
To address this, the provider should take two precautions.
First, it should distinguish between assigning ownership and merely accepting that the deal doesn’t give it ownership rights – and avoid the former.
Second, the provider should clarify that any customer ownership does not extend to prompts or other data it independently receives or develops.
For example: “Customer’s rights in Section __ (Ownership & Restricted Use of Prompts and Customer Data) do not restrict Provider’s ownership of or other rights to information Provider independently (a) develops or (b) receives from a third party. Provider does not assign or license to Customer any right, title, or interest in or to Prompts or Customer Data.”
Customer Training Data and Prompts Used to Train Provider’s Products – Provider’s License
If the AI uses customer training data, prompts, or both to train the provider’s separate products and services, the provider should not necessarily object to customer ownership or control.
However, it needs protections like those listed above. Additionally, the provider needs clear rights to that training data.
Provider-friendly language might read: “Customer hereby grants Provider a perpetual, irrevocable, worldwide, royalty-free license to reproduce, modify, and otherwise access and use Customer Training Data (including without limitation Prompts) to train and otherwise modify the System and other Provider products or services, with the right to sublicense any such right to Provider’s contractors supporting such training or modification. Provider has no obligation to report on the use of Customer Training Data, except as set forth in Section __ (Nondisclosure).”
Provider Training Data – Provider Control w/o Customer Rights
The customer generally gets no access to training data supplied by the provider, nor does it need rights to use that data beyond its right to use the AI itself.
So the provider can draft simple, protective language: “This Agreement does not transfer to Customer any ownership of Provider Training Data or any right to access or use Provider Training Data.”
2. Ownership and Control of Outputs
General Problems with “Ownership” of Outputs
AI systems produce data as outputs, and ownership of data is problematic. The same goes for ownership of the more “creative” outputs from generative AI, but for different reasons.
The law will not grant a copyright or patent for works created by an AI or any other non-human. The law does offer copyright and patent protection to humans who use AI to supplement their own creativity. But as of today, no one knows how much the AI can contribute before those protections disappear.
Ownership of outputs, then, could have some value for copyrights and patents, but it’s hard to say how much. It could also contribute to trade secret and trademark rights, which don’t need human creators. So contract terms about ownership might be worthwhile. But the would-be owner might get more protection by restricting the other party’s use of outputs.
Customer Ownership of Outputs
Usually, it’s the customer who wants ownership. “Provider recognizes and agrees that Customer owns and will own Outputs, including without limitation any intellectual property rights therein, except as set forth ___.” And the customer might want more: an assignment of any provider IP in outputs.
But what’s to assign?
The customer or its staff might have patent rights or copyrights in outputs, assuming an adequate human contribution. But the provider’s contribution almost certainly won’t involve human work.
So the provider could not own patents or copyrights related to outputs. And unless the AI processes the provider’s confidential data, the provider wouldn’t own trade secret rights. Nor would it own trademarks it hasn’t registered or used in commerce.
Any assignment terms, then, would result from a serious abundance of caution on the customer’s part.
Still, you never know. “Provider hereby assigns to Customer all its right, title, and interest in and to Outputs, including without limitation intellectual property rights, except as set forth ___.”
Provider Concerns – Refusing Output Ownership
In most cases, the AI provider should resist customer ownership of outputs.
In many systems (not just generative AI), outputs could include data or content owned or claimed by third parties or by the AI provider itself. The safest route for the provider, then, may be to say nothing about IP, leaving ownership to the underlying laws. Those laws might give the customer ownership, but not of anything that already has an owner.
The provider should also consider disclaimers related to IP. “PROVIDER OFFERS NO REPRESENTATION OR WARRANTY, EXPRESS OR IMPLIED, RELATED TO INTELLECTUAL PROPERTY OR OTHER RIGHTS IN OUTPUTS, AND CUSTOMER USES OUTPUTS AT ITS OWN RISK WITH REGARD TO ALL SUCH RIGHTS.”
Provider Concerns – Limiting Output Ownership and Retaining Rights
In some deal-types, the customer can and does insist on owning outputs. But the provider should still try to limit ownership terms to an acknowledgement with no assignment.
And with or without an assignment, the provider should add carve-outs. “The Customer ownership rights and the assignment in Section __ (Ownership and Assignment of Outputs) do not apply to Independent Assets.” With some systems, the provider can offer a clear list of the data or content included in these “Independent Assets.”
The provider knows the input data or training data the AI processes, so it knows what could appear in outputs. But the provider can’t offer much certainty about Independent Assets if the AI draws on very broad or hard-to-define data, like the Internet-wide training data used by some generative AI.
“Independent Asset” could then be defined as: “any data or content created before or independent of the prompt that led the System to generate the Output in question.” Unfortunately, the customer almost certainly won’t know what parts of outputs were created “before or independent of” its prompt. So the example above doesn’t tell it what components of the outputs it owns. The customer might have to accept that as the price of complex AI. Or maybe creative attorneys can come up with better IP terms – nuanced terms tuned to the system in question.
License to Outputs
The customer probably doesn’t need a license to outputs, even if it doesn’t own them. The provider probably has no IP to license. And if the provider did have IP, a license would probably be implied.
But the abundance of caution that demands an assignment could demand a license too, particularly if the provider refuses ownership terms. “Provider hereby grants Customer a nonexclusive, perpetual, worldwide license to reproduce, distribute, modify, publicly perform, publicly display, and use Outputs, with the right to sublicense each and every such right. Provider grants the license in the preceding sentence under such copyrights as it may have, if any.”
Finally, the provider/licensor should consider disclaiming IP warranties related to outputs.
Use of Outputs – Restrictions and Rights for Each Party
As we’ve seen, IP ownership might not give the customer much protection related to outputs, even if the ownership claim holds up.
So if the provider has access to outputs, the customer should consider restricting provider use of outputs. “Provider shall not: (a) publicize or distribute Outputs; or (b) access or use Outputs other than to maintain or improve the System or to support Customer’s use of the System. The preceding sentence does not restrict Provider’s rights to information from independent sources that also appears in Outputs.”
Of course, if the provider uses outputs to train the system for other customers’ benefit (which isn’t likely), it probably can’t accept these or other use restrictions.
Finally, the customer should consider protecting outputs through nondisclosure terms.
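Illustrative language – assuming the agreement already includes standard nondisclosure terms – might read: “Outputs constitute Customer’s Confidential Information pursuant to Section __ (Nondisclosure), whether or not marked or identified as confidential.”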
3. Trade Secrets and other Confidential Information Related to AI
Typical Prompts and Customer Training Data & Input Data – Nondisclosure Protecting Customer
If prompts or customer data could include sensitive information, the customer should consider nondisclosure provisions.
The “Confidential Information” definition in standard NDA terms would include prompts, customer input data, and customer training data. The provider can probably accept those terms if its AI makes no future use of prompts or customer training data (like most AI), or if it uses them only to train the customer’s copy (like many machine learning systems).
But the provider needs the typical exceptions to nondisclosure, which protect the recipient’s rights to information received from a third party or generated independently.
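The wording varies from deal to deal, but standard exception language runs along these lines: “Confidential Information does not include information that Recipient: (a) develops independently, without use of Discloser’s Confidential Information; or (b) rightfully receives from a third party without a duty of confidentiality.”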
Prompts and Customer Training Data Used to Train Provider Products – No Confidentiality, No Trade Secrets
If the system uses prompts or customer training data to train the AI for all customers’ use, the provider should not accept confidentiality terms protecting that information.
In fact, it should consider disclaimers and warnings. “CUSTOMER RECOGNIZES AND AGREES THAT: (a) PROMPTS AND CUSTOMER TRAINING DATA ARE NOT CONFIDENTIAL AND MAY BE USED TO TRAIN THE SYSTEM FOR THE BENEFIT OF PROVIDER’S OTHER USERS AND CUSTOMERS; AND (b) INFORMATION GIVEN TO PROVIDER’S OTHER USERS AND CUSTOMERS MAY INCLUDE INFORMATION FROM CUSTOMER’S PROMPTS AND CUSTOMER TRAINING DATA UPLOADED PURSUANT TO THIS AGREEMENT.”
In that case, the customer should keep sensitive information out of prompts and training data.
Typical Outputs – Nondisclosure Protecting Customer + Trade Secret Precautions
If the provider can accept nondisclosure terms protecting prompts and customer training data, it can probably accept them for outputs.
The usual caveats about information developed independently should protect the provider. Among other things, those caveats should protect provider rights to its own information that winds up in outputs, like information from its training data.
For the customer, nondisclosure terms offer the chance to establish trade secret rights in new information generated by the AI. And of course, those terms should protect existing customer secrets that find their way into outputs through prompts and training data.
But the customer should take an additional precaution, outside the contract, if outputs could include sensitive information. The customer should treat outputs like trade secrets. It should make sure its staff keeps outputs confidential, at least until it can establish which outputs include potential trade secrets.
Outputs from Systems that Reuse Prompts and Customer Training Data – Options for Provider
The fact that AI reuses prompts or customer training data doesn’t necessarily rule out nondisclosure terms protecting outputs.
The provider still might not use or even see outputs. So nondisclosure could work, again with the usual independent development caveats.
But some providers can’t promise nondisclosure. Their systems use outputs as training data. Or they run open-door AI systems (maybe B2C): software that shares incoming and outgoing information.
Provider’s Training Data and Other Inputs – Nondisclosure Protecting Provider
Some software processes sensitive information supplied by its vendor/provider, and AI is no exception.
If the system involves machine learning, outputs could include sensitive information from provider-supplied training data or input data. In that case, the provider needs nondisclosure promises from the customer. The usual caveats about independent development should protect the customer’s rights to preexisting and third party information, including information it includes in prompts.
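Sample language – assuming “Provider’s Confidential Information” is defined to include provider-supplied training data and input data – might read: “To the extent Outputs include Provider’s Confidential Information, Customer shall hold those Outputs in confidence pursuant to Section __ (Nondisclosure).”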
But nondisclosure still greatly limits the customer’s use of outputs. For some deals, the customer might need detailed terms on permitted use of confidential information.
The following, for instance, could work for AI that helps set prices. “Customer may: (a) use Confidential Information in Outputs to set prices for its own purchase or sale of Priced Assets (“Internal Use Transactions”) but not for any transaction that does not directly involve Customer as a buyer or seller; and (b) disclose Confidential Information in Outputs to its contractors assisting Customer with Internal Use Transactions, subject to Section __ (Contractor NDAs).”
Of course, customer nondisclosure obligations would complicate customer ownership or exercise of IP rights in outputs.
Provider’s Sensitive Information in Models and Other Tech – Nondisclosure Protecting Provider
Use of software and computer systems can give the customer access to the provider’s sensitive information.
Again, AI is no exception. If the customer gets that kind of access, the provider needs nondisclosure terms protecting its technology.
The provider’s “Confidential Information” might then include AI software, models, documentation, user interfaces, and just about any information available when the customer logs in.
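An illustrative definition – one that should be tuned to the system in question – might read: “‘Provider’s Confidential Information’ includes, without limitation, the System’s software, models, algorithms, documentation, and user interfaces, as well as any other nonpublic information made available to Customer through access to or use of the System.”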
4. Errors in Outputs
Errors in Outputs – Provider Disclaimers
Software vendors often warrant or otherwise promise that their systems will work according to a set of specifications.
In most cases, that promise addresses the operation of the system, not its outputs or other results. The system could work but produce bad results because of customer error or bad data.
AI providers draw the same line, and they have an extra incentive to do so. AI can “work” but still produce bad outputs because of problems with prompts and input data from the customer. And generative AI and other machine learning systems can produce bad outputs because of issues with training data, which could come from the customer, the provider, or both.
So an AI provider should take two precautions.
First, consider a broad disclaimer: “EXCEPT AS SPECIFICALLY SET FORTH IN SECTION __ (COMPLIANCE WITH SPECIFICATIONS), CUSTOMER ACCEPTS ALL OUTPUTS ‘AS IS,’ WITH NO REPRESENTATION OR WARRANTY WHATSOEVER, EXPRESS OR IMPLIED.”
Second, review the AI’s specifications to make sure they don’t offer promises about outputs – or at least that they don’t offer inappropriate ones.
Errors in Outputs – Customer Response/Concerns
The customer should consider pushing back against total disclaimers related to outputs.
Ask questions that test what the provider can promise. What DO you promise about outputs, regardless of underlying weaknesses in the data?
Based on the provider’s answer, consider creating or citing specifications that specifically address outputs, and use them for performance promises. “Provider warrants that Outputs will conform to the requirements of Section __ (Output Specifications).”
Also, explore whether the provider can stand behind the quality or accuracy of its training data or input data and, as a result, behind the quality of outputs.
Obviously, the provider can’t promise accurate outputs for a generative AI system trained on massive data drawn from the Internet, with all its errors and issues – or on data provided by the customer. But most AI draws on smaller datasets. In some cases, promises about output quality could work.
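Where they can, illustrative language – with the metric and threshold left for negotiation – might read: “Provider warrants that Outputs will achieve an accuracy rate of at least __%, measured pursuant to Attachment __ (Accuracy Benchmarks).”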
5. Liability Related to Third Parties: Indemnities and Warranties
Third Party IP and Privacy in Outputs – Provider Disclaimers
Customers typically want IP warranties and indemnities covering their use of software. When those terms address the customer’s right to reproduce or use AI software, they don’t raise unusual concerns, though that doesn’t mean the provider will grant them.
However, IP warranties and indemnities related to outputs, as opposed to the use of software, often do raise unusual concerns. The same goes for indemnities against privacy suits related to outputs, as well as warranties that outputs won’t include personal information (PI). Outputs could rely on prompts and input data from the customer. And machine learning outputs could rely on training data from the Internet, third parties, or again the customer.
Any of those sources could import content subject to third party IP or privacy rights. So, in those cases, the provider should avoid IP and privacy warranties and indemnities related to outputs.
In fact, it should consider all-caps general disclaimers and consider adding more specifics: “PROVIDER DOES NOT REPRESENT OR WARRANT THAT OUTPUTS WILL BE FREE OF CONTENT THAT INFRINGES THIRD PARTY RIGHTS, INCLUDING WITHOUT LIMITATION PRIVACY AND INTELLECTUAL PROPERTY RIGHTS.”
The provider’s argument: This risk is inherent in use of our type of AI, and the parties should share it. Some providers, in fact, go further. They grant no IP or privacy indemnities whatsoever, even related to the use of their software.
Third Party IP and Privacy in Outputs – Customer Response re Sources
Before accepting the provider’s arguments above, the customer should ask a question. If outputs could reproduce data from third parties, are those third parties the provider’s suppliers? If so, can’t the provider take responsibility for that data?
That gives the customer an argument for IP and privacy indemnities and warranties covering outputs. However, the provider still might not offer those terms if the system also relies on customer training data or other input data.
In generative AI and some other systems, the parties would never know which side’s data led to the output. So, they’d never know whether the warranty or indemnity applies.
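If the provider nonetheless offers those terms, language limited to provider-sourced data might read (an illustrative sketch): “Provider shall defend and indemnify Customer against any third party claim alleging that an Output infringes intellectual property rights or violates privacy rights, to the extent the claim arises from data supplied by Provider or its licensors.” Of course, that “to the extent” qualifier runs into the same attribution problem: the parties may never know which side’s data produced the offending output.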
Third Party IP and Privacy in Outputs – Customer Response re Use of Data
Again, before accepting the provider’s argument that the risk is inherent in its type of AI and that the parties should share it, the customer should ask: Do the system’s outputs actually reproduce training data or other information from the customer, third parties, or the Internet? Or does that information just guide the creation of outputs, without actually appearing within them?
If the information only guides outputs without appearing in them, the customer again has an argument for IP/privacy warranties and indemnities related to outputs.
But think this through before making that argument. The provider might request the same warranties and indemnities from you, the customer – in this case about IP and privacy related to customer-provided prompts and data.
Defamation, Discrimination, and Similar Torts by Outputs
Outputs could harm third parties in other ways.
Generative AI outputs sometimes defame third parties. And much AI can produce outputs that encourage ethnic, gender, religious, or other discrimination. (AI-guided hiring systems, for instance, have discriminated on the basis of race and gender.) So, the customer should seek warranties against those harms, as well as indemnities against resulting third party lawsuits.
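Customer-friendly language might read (an illustrative sketch): “Provider warrants that Outputs will not defame any person or encourage or facilitate unlawful discrimination, and Provider shall defend and indemnify Customer against any third party claim arising out of any such content in Outputs.”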
However, those terms raise the same set of issues as third-party IP and privacy warranties and indemnities, discussed above. So, the provider should resist those requests.
Third Party Rights in Customer-Provided Data – Provider Concerns
A cloud service customer could upload infringing, private, defamatory, or otherwise harmful content to the provider’s computers. If so, the (innocent) provider might face liability for hosting or publicizing customer content. AI providers face those risks too, of course, when they provide their software via the cloud.
So, the provider should seek warranties and indemnities related to prompts, customer input data, and customer training data. “Customer warrants that: (a) it has and will collect Customer Data in compliance with all applicable laws, including without limitation laws on intellectual property, privacy, and disclosure of personal information; and (b) it has and will obtain such intellectual property licenses and other consents as are required by applicable law for Provider to access and use Customer Data as authorized by this Agreement.”
This request, of course, reverses the roles in the issues discussed above, with the customer now arguing that it can’t take responsibility for certain data.
6. Security, Privacy, and Responsible Use
Security and Privacy Terms in General
AI raises the same security concerns as other software, along with a few unique issues.
For instance, some AI systems access unusually large datasets. So, they risk high-impact data breaches. Also, AI processes are often invisible and impossible to reconstruct. So, it’s hard to know whether the system has been tampered with or whether it has misused personal information (PI).
The parties should start by identifying system vulnerabilities. From there, the customer should ask for security-related specifications, as well as warranties and indemnities related to data breach.
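For example – an illustrative sketch, with the specifications themselves left to an attachment: “Provider warrants that it will maintain the administrative, physical, and technical safeguards set forth in Attachment __ (Security Specifications), and Provider shall indemnify Customer against any third party claim arising out of a data breach caused by Provider’s failure to maintain those safeguards.”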
In some cases, the provider should ask for those same terms from the customer – particularly where the provider hosts the AI and makes it available through the cloud. The provider would request terms requiring the customer to protect its own computers, assuming those computers can access the AI. The provider might also request promises that the customer has protected the security of input data and training data.
Of course, even as it requests those provisions from the other party, each party should consider also protecting itself through the opposite set of terms: security-related disclaimers. “PROVIDER DOES NOT REPRESENT OR WARRANT THAT THE SYSTEM WILL BE FREE FROM THIRD PARTY INTERFERENCE OR OTHERWISE SECURE.”
Personal Information in Prompts and Other Customer Data – DPAs, SCCs, BAAs, etc.
If the customer gives personal information (PI) to the provider, privacy law may require that the parties execute a data protection addendum (DPA) or some other set of data terms.
For instance, if customer prompts, training data, or input data include “protected health information,” then HIPAA requires that the parties execute a business associate agreement (BAA). If the customer’s data includes European PI, then GDPR may require the execution of “standard contractual clauses” (SCCs) before the data moves to the U.S. or certain other jurisdictions.
Various U.S. state laws may require special terms too. In most cases, the law imposes these obligations on the customer. It’s the data controller (probably), and it’s required to get the necessary contract promises from the data processor: the provider. But some privacy laws impose contracting rules directly on the processor.
In that case, the provider violates the law if it doesn’t sign onto the required contract terms.
Check the law(s) in question.
Personal Information in Outputs – DPAs, SCCs, BAAs, etc.
Personal information could flow in the other direction, from provider to customer. Could outputs include PI from the provider’s training data or input data?
If so, privacy law may consider the provider the data controller and may require that it secure a DPA or other contract terms from the customer.
Requirements for Responsible Use – AUPs and Other Conduct Terms
Because AI can do so much harm, providers should consider codes of conduct for their customers. Often, a typical acceptable use policy (AUP) will do the trick – one that forbids, for instance, harassment, defamation, violation of privacy and IP rights, hacking, and fraud.
But the provider should also consider terms specific to its form of AI. A machine learning provider, for instance, might add the following to its AUP: “Do not use the System: (a) to reverse engineer AI outputs in order to generate underlying information, including without limitation training data (model inversion attacks); or (b) to generate, transmit, or otherwise manage fake or intentionally misleading training data for any AI system.” And the provider might go further: “If outputs generated by the System include material that would violate the AUP, do not distribute or publicize those outputs, and do not use them in any way that could cause harm to a third party.”
For its part, the customer should consider demanding similar codes of conduct governing future use of prompts and customer training data, assuming the system uses that information to serve other customers. “Provider warrants that it will not authorize use of a Further-Trained Model (as defined below) by a third party that does not first agree in writing to conduct restrictions consistent with those of Attachment __ (AUP). (‘Further-Trained Model’ means any artificial intelligence software trained on Prompts or Training Data provided by Customer pursuant to this Agreement.)”
Summing It Up
These points provide an overview of the key issues in AI contracts: ownership and control of training data, prompts, and outputs; confidentiality; errors in outputs; liability related to third parties; and security, privacy, and responsible use. Both parties should carefully consider these issues and negotiate terms that protect their interests.