It’s a high-flying era for AI. However, with great automated power comes great potential for disastrous machine-generated mistakes. When applying AI to the high-stakes domain of executed agreement management, data scientists have landed on a strategy for maximizing the benefits while minimizing these risks.
The strategy is called grounding AI in human-annotated contract data. It means using AI “to analyze and extract meaningful insights from contract documents that have been previously labeled and categorized by human experts through a process called ‘annotation,’ allowing the AI to learn and identify key details within contracts with greater accuracy and efficiency than simply scanning raw text alone.”1
Realities of AI in high-stakes domains
Recent revolutionary advances in AI have intensified the hype surrounding its ability to replace human capabilities with automated ones. In spheres where the consequences of AI-generated mistakes remain relatively low – such as drafting emails or producing images from text descriptions – this vision of our near-term future might be feasible.
But the same futuristic picture gets murkier when we start rethinking how AI might be applied within complex, high-stakes arenas like medicine,2 tax,3 or law.4 In any of these fields, a single inaccurate automatically generated summary, hallucinated fact, or machine-generated answer based on incomplete information can quickly lead to damaging, disastrous, or even fatal consequences.
Unfortunately, it’s sobering to realize that the potential for these costly mistakes cannot be eliminated by advances in AI models’ size and power alone. In fact, recent research has identified a worrying pattern: increasingly sophisticated AI models tend to sound more convincing when generating incorrect information about difficult and complex subjects.5,6
For high-stakes, complicated domains, this means that relying solely on the power of an AI model to support decision-making may unintentionally make it harder to detect consequential, machine-generated errors when they do occur. The upshot of this unavoidable dilemma? Whenever a strong priority demands that AI-supported insights and decisions be accurate every time – not just most of the time – the final burden of verifying machine-generated outputs will continue to fall on the judgment of knowledgeable humans.
So, rather than deny or resist this reality, we at Knowable are letting human decision-making be the guiding light in the LLM’s understanding of executed agreements, and have developed three core principles for grounding AI capabilities within executed agreement management.
Three principles for grounding AI in annotated data
- Metadata filtering – Having the right set of documents is critical to the trustworthiness of any review or interpretation, whether AI- or human-generated. Although great strides have been made in automatic document retrieval using Retrieval-Augmented Generation (RAG),7,8 filtering on key points of human-annotated document metadata (such as a named counterparty, a document type, or an expiration date) retrieves complete and correct sets of target documents with a verifiability that purely automated approaches relying on raw document text alone cannot achieve. That’s why incorporating metadata filtering as a first step for any application of AI has proven extremely useful in generating trustworthy and meaningful results (see the sketch following this list).
- Reducing task complexity – AI is more likely to make major mistakes when asked to navigate complex tasks that push the limits of its reasoning and information capacities.9,10 Directly providing the AI with verified, annotated data relieves it of making more complicated (and potentially dubious) inferences and frees its capacity for tasks to which it is better suited and more usefully applied, such as automatic summarization, translation of natural language queries, and question answering.
- Lowering verification frictions – Even when AI-generated results are consistently trustworthy, high-stakes domains still require end users to verify machine-generated outputs for accuracy.11 You can substantially reduce the frictions involved in this verification process by presenting structured, annotated data alongside AI-generated outputs. When key points of pre-vetted ground truth and other structured data appear in an easily digested format, users can more efficiently evaluate the accuracy of AI results without having to sift through raw document text.
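To make the first principle concrete, here is a minimal Python sketch of metadata filtering as a gating step before any AI processing. The contract records, field names, and `filter_by_metadata` helper are hypothetical illustrations, not Knowable’s actual implementation:

```python
from datetime import date

# Hypothetical human-annotated contract records; field names are illustrative only.
contracts = [
    {"id": "c-001", "counterparty": "Acme Corp", "doc_type": "MSA",
     "expiration_date": date(2026, 3, 31), "text": "..."},
    {"id": "c-002", "counterparty": "Acme Corp", "doc_type": "Amendment",
     "expiration_date": date(2025, 1, 15), "text": "..."},
    {"id": "c-003", "counterparty": "Globex LLC", "doc_type": "NDA",
     "expiration_date": date(2027, 6, 30), "text": "..."},
]

def filter_by_metadata(docs, counterparty=None, doc_type=None, expires_after=None):
    """Select documents using human-annotated metadata before any AI step."""
    results = []
    for doc in docs:
        if counterparty and doc["counterparty"] != counterparty:
            continue
        if doc_type and doc["doc_type"] != doc_type:
            continue
        if expires_after and doc["expiration_date"] <= expires_after:
            continue
        results.append(doc)
    return results

# Only the filtered, verified set is handed to downstream AI tasks
# (summarization, Q&A), rather than letting the model search raw text.
target_docs = filter_by_metadata(contracts, counterparty="Acme Corp",
                                 expires_after=date(2025, 12, 31))
print([d["id"] for d in target_docs])  # -> ['c-001']
```

Only the documents that survive this annotated-metadata filter would be passed to any downstream AI step, which is what gives the retrieved set its verifiability.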
A foundation for grounded executed agreement management
These principles point the way toward a broader question: How can AI be reliably leveraged to support organizations in understanding their executed agreements? Some of the most immediate applications of this grounded AI approach are contract search, document summarization/Q&A, and understanding contract families.
Contract search
Executed agreement populations can be sizeable, often running to tens or even hundreds of thousands of documents. This has historically made searching for specific contracts within such populations a time-consuming task.
Grounded AI can greatly enhance users' ability to conduct contract searches.
Through a chat interface in which users describe, in their own words, the documents they need to find, AI can translate those requests into a set of predefined metadata filters. Those filters are then passed to a contract database to retrieve every document in the population whose annotated data points match.
This process significantly reduces the complexity of automated document search by transforming natural language into a set of predefined, well-specified metadata fields. User confidence in the retrieved results can be further supported by displaying the AI-generated metadata filters nearby, so users can quickly confirm they are looking at the documents they intended to find (i.e., lowering verification frictions).
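As a rough illustration of this translation step, the sketch below constrains an LLM to emit only a predefined set of filter keys. The prompt, the `call_llm` stand-in, and the filter schema are assumptions made for the example, not a description of Knowable’s product:

```python
import json

# Predefined, well-specified metadata fields the search is allowed to use (illustrative).
ALLOWED_FILTERS = {"counterparty", "doc_type", "expiration_before", "expiration_after"}

FILTER_PROMPT = """Translate the user's request into JSON using only these keys:
counterparty, doc_type, expiration_before, expiration_after (ISO dates).
Omit any key the request does not mention.

Request: {request}
JSON:"""

def call_llm(prompt: str) -> str:
    # Stand-in for whatever LLM client is actually used; assumed, not a real API.
    raise NotImplementedError

def request_to_filters(user_request: str) -> dict:
    """Turn a free-text search request into a constrained set of metadata filters."""
    raw = call_llm(FILTER_PROMPT.format(request=user_request))
    filters = json.loads(raw)
    # Discard anything outside the predefined schema so the query stays well specified.
    return {k: v for k, v in filters.items() if k in ALLOWED_FILTERS}

# e.g. "all Acme Corp amendments expiring before 2026" might become
# {"counterparty": "Acme Corp", "doc_type": "Amendment", "expiration_before": "2026-01-01"},
# which is then run against the contract database's annotated fields.
```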
Document summarization/question answering (Q&A)
Once AI-assisted metadata filtering has reliably retrieved the complete and correct set of target documents a user needs, AI is well positioned to go further: automatically summarizing, and supporting chat-based question answering (Q&A) over, the individual documents that have been retrieved.
This process surfaces key points of annotated document information alongside the AI outputs so that users can efficiently compare the two. Doing so lowers the friction of verifying the accuracy of the generated responses and reduces task complexity by integrating annotated data directly into summary and answer generation.
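One way this integration might look in code is sketched below: the prompt is pinned to human-verified facts, and those same facts are returned with the summary so the interface can place them side by side. The `grounded_summary` helper, its fields, and the injected `call_llm` function are hypothetical:

```python
def grounded_summary(doc: dict, call_llm) -> dict:
    """Summarize a retrieved contract while pinning the prompt to annotated facts."""
    annotated_facts = {
        "counterparty": doc["counterparty"],
        "doc_type": doc["doc_type"],
        "expiration_date": str(doc["expiration_date"]),
    }
    prompt = (
        "Summarize the contract below. Treat these human-verified facts as correct "
        f"and do not contradict them: {annotated_facts}\n\n{doc['text']}"
    )
    summary = call_llm(prompt)
    # Return the annotated facts with the summary so the UI can show them side by side,
    # letting users check the AI output against ground truth at a glance.
    return {"summary": summary, "annotated_facts": annotated_facts}
```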
Understanding contract families
One of the most important but notoriously challenging aspects of understanding executed agreement populations arises from the highly complex ways documents can depend upon one another. Think, for example, of how radically a series of amendments and addendums can alter the original terms of a Master Service Agreement (MSA).
One of the most difficult forms of annotated data to obtain, even with specialists in the loop, is human-verified information on the true relationships between documents in a contract family. Once created, however, access to these connections eliminates the need to rely on AI to guess the interdependencies between such agreements.
Annotated family data, combined with metadata filtering, therefore ensures that all known members of a contract family are retrieved in the first place. From there, AI can focus on accurately summarizing the text across the family in a way that tightly adheres to the ground-truth relationship information it has been given.
Alongside these generated family summaries, annotated information can be surfaced for each document in the family. This provides an additional layer of trust by showing users the same structure of relationships provided to the AI, and it supports a far lower-friction verification process than if users had to manually sort through and structure individual agreements.
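A minimal sketch of how annotated family relationships could be used is shown below, assuming hypothetical parent/child link records (the `build_family` helper and the document identifiers are illustrative). Walking the human-verified links yields the full, ordered family, which can then be handed to the summarizer together with the relationship information itself:

```python
from collections import defaultdict

# Hypothetical human-verified relationship records: each amendment/addendum
# points to the agreement it modifies.
family_links = [
    {"child": "amendment-2", "parent": "amendment-1"},
    {"child": "amendment-1", "parent": "msa-1"},
    {"child": "addendum-1", "parent": "msa-1"},
]

def build_family(root_id: str, links) -> list:
    """Walk annotated parent/child links so the AI never has to guess the structure."""
    children = defaultdict(list)
    for link in links:
        children[link["parent"]].append(link["child"])
    ordered, stack = [], [root_id]
    while stack:
        doc_id = stack.pop()
        ordered.append(doc_id)
        stack.extend(sorted(children[doc_id], reverse=True))
    return ordered

print(build_family("msa-1", family_links))
# -> ['msa-1', 'addendum-1', 'amendment-1', 'amendment-2']
```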
Staying grounded in the AI era
The overwhelming hype concerning the imminent replacement of humans with AI has been inescapable in recent years. Many people with practical knowledge and expertise in complex, high-stakes domains, however, have wisely remained skeptical about this view of our near future. In explorations and applications of AI within the consequential arena of executed agreement management, data scientists have arrived at a more grounded vision for the future of AI – one that prioritizes continued innovation in how human and machine capabilities can be combined to produce more powerful, efficient, and trustworthy outcomes than would otherwise be possible.
END NOTES
1. Source: AI Overview, online.
2. Exploring the Boundaries of Reality: Investigating the Phenomenon of Artificial Intelligence Hallucination in Scientific Writing Through ChatGPT References. Athaluri et al. Cureus. April 11, 2023.
3. Warning: AI for Tax Advice May Not Be a Fit. Gervais, Christine. Tax Practice News. June 27, 2024.
4. AI on Trial: Legal Models Hallucinate in 1 out of 6 (or More) Benchmarking Queries. Stanford University Human-Centered Artificial Intelligence. May 23, 2024.
5. The more sophisticated AI models get, the more likely they are to lie. Krywko, Jacek. Ars Technica. October 4, 2024.
6. Larger and more instructable language models become less reliable. Zhou et al. Nature, 634. September 25, 2024.
7. What is RAG (Retrieval-Augmented Generation)? AWS.
8. Retrieval-Augmented Generation for Large Language Models: A Survey. Gao et al. arXiv preprint arXiv:2312.10997. 2023.
9. Large Legal Fictions: Profiling Legal Hallucinations in Large Language Models. Dahl, Magesh, et al. Journal of Legal Analysis, Volume 16, Issue 1. June 26, 2024.
10. Lost in the Middle: How Language Models Use Long Contexts. Liu, Lin, et al. Transactions of the Association for Computational Linguistics, Volume 12. February 2024.
11. Human-Centered Artificial Intelligence: Reliable, Safe & Trustworthy. Shneiderman, Ben. International Journal of Human-Computer Interaction, 36(6). March 23, 2020.
ABOUT THE AUTHOR
Lynette Shaw is Lead Data Scientist at Knowable, Inc. with specializations in statistics, NLP, machine learning, and the application of GAI/LLMs to build business solutions. She was also a Founding Board Member of The Computational Democracy Project, a non-profit that stewards the open source platform, Polis, and supports its usage in democratic deliberative processes across the world. She holds a PhD from the University of Washington in Sociology with a concentration in Social Statistics, and as an Assistant Professor of Complex Systems at the University of Michigan, researched the origins of Bitcoin's economic value in online communities and used computational modeling to investigate the emergence of cultural dynamics from individual cognitive processing.