Legal AI strategy
LLMs in legal: Pitfalls, oilfields, and best practices
Author: Damien Riehl (VP at vLex) – Source: Legal AI Strategies report by Henchman
In November 2022, OpenAI fired a shot heard ‘round the world: LLMs overtook the legal zeitgeist. With that arrival, millions of lawyers cried out in unison: “What does this mean for us?” The good news: This means good things for our industry. If we do it right.
LLMs are improving legal work, offering advanced capabilities that align with lawyers’ core competencies: reading, analyzing, and writing. LLMs can do all three with superhuman speed and post-graduate proficiency. But with great power comes great responsibility: We must ride the rocket ship without steering it into embarrassment, sanctions, or worse.
We’re in the opening minutes of a very long game. But even at this early stage, certain pitfalls, risk-mitigation methods, and best practices have emerged. Beware hallucinations (confabulations)! Reduce potential inaccuracy through “trust but verify”! To realize LLMs’ potential to transform the legal industry, we must harness the models’ superhuman speed and abilities, providing what we humans do best: judgment.
Characteristics of the Best LLM-Backed Tools
In this second year of LLM madness in the legal sphere, as the newest generations of foundational models (e.g., GPT, Claude, Gemini, LLaMA) leapfrog each other’s benchmark scores, the smartest minds in Legal Tech are coalescing around these themes:
Trust but verify. Let’s all repeat this mantra together: “To avoid hallucinations, ‘trust but verify’!” We’ve shamed myriad lawyers for failing to validate LLM-generated sources. But the best LLM-backed tools can help users mitigate such risks. Mature LLM-backed systems should make it simple for users to validate LLM outputs against established legal sources. Companies building these tools should leverage LLMs’ benefits, while also simplifying users’ verification process.
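To make “trust but verify” concrete, here is a minimal sketch in Python. The citation regex, the in-memory KNOWN_CITATIONS set, and the fictitious second citation are illustrative assumptions; a production tool would check extracted citations against an authoritative citator or case-law database rather than a hard-coded list.

```python
import re

# Stand-in for an authoritative source of record; a real tool would query a
# citator or case-law database instead of an in-memory set.
KNOWN_CITATIONS = {
    "347 U.S. 483",  # Brown v. Board of Education
    "410 U.S. 113",  # Roe v. Wade
}

# Simple U.S. Reports citation pattern: volume, "U.S.", page (e.g., "347 U.S. 483").
CITATION_PATTERN = re.compile(r"\b\d{1,4}\s+U\.S\.\s+\d{1,5}\b")

def verify_citations(llm_output: str) -> list[tuple[str, bool]]:
    """Return each citation found in the LLM output with a verified/unverified flag."""
    return [(c, c in KNOWN_CITATIONS) for c in CITATION_PATTERN.findall(llm_output)]

# The second citation is deliberately fictitious, mimicking a hallucinated source.
draft = "Segregation was struck down in 347 U.S. 483; see also 999 U.S. 999."
for citation, verified in verify_citations(draft):
    print(f"{citation}: {'verified' if verified else 'NOT FOUND - check manually'}")
```

The point is not the regex; it is that the tool surfaces every claimed source with a verification status, so the lawyer's review effort goes where it is actually needed.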
User control. Successful LLM systems should also put users in control, giving them more say over research resources and offering contextual summarization. Users shouldn’t be limited to “take it or leave it” outputs.
Model agnosticism. The best legal LLM-backed tools should be model-agnostic, using the right model for the task. Various LLMs excel at different tasks, so tools that can leverage the strengths of multiple models provide superior performance and adaptability. As the competitive landscape of foundational models rapidly evolves, this agile approach ensures legal tools consistently deliver the most effective capabilities.
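As a rough illustration of model agnosticism, the sketch below routes each task type to a different model behind one common interface. The task names, routing table, and stub model functions are hypothetical; a real tool would wrap each vendor’s SDK behind this interface and tune the routing from its own evaluations.

```python
# A minimal model-agnostic routing sketch (hypothetical names throughout).
from typing import Callable

ModelClient = Callable[[str], str]  # prompt in, completion out

def summarizer_model(prompt: str) -> str:
    return f"[summary model] {prompt[:40]}..."

def drafting_model(prompt: str) -> str:
    return f"[drafting model] {prompt[:40]}..."

ROUTING_TABLE: dict[str, ModelClient] = {
    "summarize": summarizer_model,  # e.g., a fast, inexpensive model
    "draft": drafting_model,        # e.g., a stronger model for generation
}

def run_task(task: str, prompt: str) -> str:
    """Dispatch to whichever model currently performs best on this task type."""
    client = ROUTING_TABLE.get(task, drafting_model)  # sensible default
    return client(prompt)

print(run_task("summarize", "Summarize this 40-page master services agreement."))
```

Because the routing table is data, swapping in next quarter’s best model is a configuration change, not a rebuild.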
Lawyers aren’t the best prompt engineers. When asked about the emergence of prompt engineering, Sam Altman predicted that it won’t be a thing: once software builders do their jobs, users can simply ask a natural-language question and the system will give them the right answer. No prompting necessary. The smartest Legal Tech companies should follow suit. If Legal Tech is doing its job, lawyers and allied professionals won’t need to be prompt engineers. Simply uploading a document will yield helpful insights, without prompting. Of course, if lawyers want to engineer prompts (much like some lawyers love crafting Boolean queries), they should be able to. But prompt engineering should be an option, not a mandate.
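A minimal sketch of the “no prompting necessary” idea: the tool owns a tested prompt template, so the user only uploads a document. The template text and the build_insight_request helper are hypothetical, not any particular product’s implementation.

```python
# The tool maintains its own prompt; the user never writes one.
INTERNAL_TEMPLATE = (
    "You are assisting a lawyer. Identify the parties, governing law, "
    "key obligations, and unusual clauses in the document below.\n\n{document}"
)

def build_insight_request(document_text: str) -> str:
    """Wrap an uploaded document in the tool's own prompt template."""
    return INTERNAL_TEMPLATE.format(document=document_text)

uploaded = "MASTER SERVICES AGREEMENT between Acme Corp and Beta LLC ..."
print(build_insight_request(uploaded)[:120])
```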
Pitfalls to Avoid
Legal LLM-backed tools should avoid these problems:
- Providing no insight into the output generation process
- Obfuscating sources and hindering verification
- Requiring users to (1) discover and (2) craft obscure incantations (prompts)
Best Practices
In contrast, the best LLM-backed legal tools should have:
- Easy source verification with frictionless “trust but verify”
- Individualized source summaries and confidence scores
- User control over the process
- Direct question-answering, not just chatbots
- Model agnosticism, using the best LLM for each task
- No prompting necessary: Tools provide the insights. Legal professionals bring their judgment and subject-matter expertise.
Legal Data is Oil; LLMs are Refineries
Law firms and law departments have vast Private Oil reserves. And the best legal-tech companies have Public Oil fields encompassing millions of legal documents: regulations, statutes, and judicial opinions from dozens of countries. But crude oil requires refineries.
Refining the Oil. After extracting these legal documents, whether Private Oil or Public Oil, all of that oil should be refined (categorized, tagged, summarized) for better LLM processing. LLMs can do that refining (summarizing and tagging) at superhuman speed and quality.
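Here is a small sketch of that refining step, assuming a hypothetical llm_complete() stand-in for whatever LLM client a tool actually uses; the tag names are illustrative.

```python
import json

def llm_complete(prompt: str) -> str:
    # Placeholder: a real implementation would call an LLM API here.
    return json.dumps({"tags": ["Contract", "Indemnification"], "summary": "..."})

def refine_document(doc_id: str, text: str) -> dict:
    """Produce a structured, tagged record from a raw legal document."""
    prompt = (
        "Return JSON with 'tags' (from a controlled vocabulary) and 'summary' "
        f"for this document:\n\n{text}"
    )
    refined = json.loads(llm_complete(prompt))
    return {"doc_id": doc_id, **refined}

record = refine_document("private-oil-0001", "This Indemnification Agreement ...")
print(record)
```

Run at scale across a repository, this turns undifferentiated documents into structured, searchable records.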
Combining Oil. The smartest organizations are combining their Private Oil (e.g., contracts, settlements, know-how) with Public Oil (e.g., regulations, statutes, judicial opinions, motions, briefs, pleadings, public agreements). In our interconnected world, lawyers must be able to leverage global oilfields, providing legal insights that span their clients’ multi-jurisdictional needs.
Supercharging Oil. Silicon Valley’s most advanced companies are combining Symbolic AI (e.g., tagging, knowledge graphs) with Connectionist AI (e.g., LLMs, other neural nets), since Symbolic AI can provide benefits that LLMs alone cannot confer:
- Interoperability (e.g., pushing and pulling data from System 1 to System 2 through System 10)
- Analytics (e.g., “What percentage of our matters are ___?”)
- Higher precision and recall (e.g., tagging to avoid LLMs’ false positives and false negatives)
Each of the tasks above is impossible for LLMs alone, but with LLM-tagged data sources, all of them become possible and yield rich insights, as the sketch below illustrates.
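A brief sketch of why the symbolic layer matters: once matters carry structured tags (for example, from a standard such as SALI), a question like “What percentage of our matters are M&A?” becomes a deterministic query rather than an LLM call. The matters list and tag names below are illustrative.

```python
from collections import Counter

# Matters tagged during the "refining" step (illustrative data).
matters = [
    {"id": "M-001", "tags": ["Mergers and Acquisitions", "Delaware"]},
    {"id": "M-002", "tags": ["Employment", "California"]},
    {"id": "M-003", "tags": ["Mergers and Acquisitions", "New York"]},
]

def percentage_with_tag(tag: str) -> float:
    """Exact analytics over tagged matters: no false positives or negatives."""
    hits = sum(1 for m in matters if tag in m["tags"])
    return 100.0 * hits / len(matters)

print(f"M&A matters: {percentage_with_tag('Mergers and Acquisitions'):.0f}%")
print(Counter(t for m in matters for t in m["tags"]).most_common(3))
```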
SALI. Where can you find good Symbolic AI tags: 17,000+ well-curated tags that are free, open source, and constitute the world’s most comprehensive legal data standard? Perhaps check out SALI, the nonprofit standard. Did I mention that it’s free? And open source? And really good.
Check out the XML file on the GitHub page, and browse WebProtege:
- USERNAME: saliuser
- PASSWORD: salilmss
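For the programmatically inclined, here is a minimal sketch of loading those tags with rdflib, a Python RDF library. The local filename LMSS.owl is an assumption: point it at wherever you saved the XML/OWL file from SALI’s GitHub page.

```python
from rdflib import Graph
from rdflib.namespace import OWL, RDF, RDFS

g = Graph()
g.parse("LMSS.owl", format="xml")  # RDF/XML serialization of the ontology

classes = set(g.subjects(RDF.type, OWL.Class))
print(f"Loaded {len(classes)} classes from the ontology")

# Print a handful of human-readable labels (the "tags").
for i, cls in enumerate(classes):
    for label in g.objects(cls, RDFS.label):
        print(label)
    if i >= 9:
        break
```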
Conclusion
Modern LLM-backed legal tools are transforming how legal professionals interact with data: the same data that has historically been (1) neglected in firm repositories and (2) underappreciated in legal research databases. But now that we know data is oil, the LLM-backed refineries are running at full speed. By enabling easy “trust but verify,” providing wide jurisdictional reach, utilizing Symbolic AI, and employing LLM ensembles, legal tools can offer insights that are both accurate and reliable. And the best of those tools can provide those insights programmatically, with no prompting needed. Rich datasets lead to rich tools. And our profession’s most advanced tools are ushering in a new era of legal technology intelligence. We’re in a golden age.
We captured not just one vision of legal AI strategies, but nine. Discover more insights straight from the experts and download the report below.