Getting the NDA out of the Way with Machine Learning

Today’s edition of our blog post series, written by the speakers at the upcoming 2ML event, covers how the Dutch company JuriBlox B.V. applies ML to make the lives of legal professionals easier. This is a great example of how automating tedious and repetitive tasks with Machine Learning saves time for legal professionals who often need to process Non-Disclosure Agreements (NDAs). It’s a well-defined task among many other productivity-boosting applications transforming the legal industry.

This guest post is written by Arnoud Engelfriet, Founder at JuriBlox B.V. and author of the paper Creating an Artificial Intelligence for NDA Evaluation. To hear the full story, we invite you to join his session at 2ML on May 8-9.

Everyone in business has seen dozens of NDAs, also called Non-Disclosure Agreements, or confidentiality agreements. After ordering coffee, signing an NDA prior to negotiations or discussions is the most common act in business. For most businesspeople, an NDA is very much a standard document. However, from a legal perspective, that couldn’t be more wrong. Carefully reading NDAs for pitfalls is key. But who has time to do that for each NDA? Well, our Machine Learning Lawyer: NDA Lynn.

Actually, both businesspeople and lawyers are right. NDAs are routinely used to cover confidential exchange of any business information, from prototype designs to customer lists or proposals for new business ventures. Most NDAs are not reviewed as carefully as attorneys would recommend: it takes a significant amount of time and legal expertise to get an NDA just right and negotiated down to the last issue. So NDAs are routine documents that are perceived as standard, but in fact, they are custom documents that are unique.

Until the legal world comes up with a standard NDA for the whole world, the best course of action is to review each NDA carefully prior to signing. However, due to the high costs associated with a legal review, many businesspeople are somewhat hesitant to go this route. And for lawyers, the problem with reviewing an NDA is that it is mostly scanning for deviations from boilerplate text, which is extremely boring even for people whose job it is to review boring prose.

So, here we have a legal document that needs careful reviewing. The review consists of looking for standard patterns and deviations from the standard, and each review should produce the right output. Does that sound like a job for Machine Learning to you? Well, this is the premise on which NDA Lynn was built.

A challenge for Machine Learning on text documents is that text is unstructured. Legal documents do have clauses, headings and so on, but these are hard for a computer to recognize. Therefore, a two-step approach was used. In the first step, a Machine Learning model was developed to assign each individual sentence in a document to one of some twenty-plus legal categories (e.g. purpose of NDA, duration of confidentiality, security obligations). Sentences in the same category can then be treated as a clause on that topic.
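To make the first step concrete, here is a minimal Python sketch of grouping sentences into clauses by predicted category. It is only an illustration: the `classify_sentence` function is a hypothetical keyword matcher standing in for the trained model, and the three categories and their keywords are assumptions, not NDA Lynn's actual category list.

```python
from collections import defaultdict

# Hypothetical keyword rules standing in for the trained sentence classifier.
# The real system uses a Machine Learning model with twenty-plus categories.
KEYWORDS = {
    "purpose": ("purpose",),
    "duration": ("years", "term"),
    "security": ("safeguard", "secure", "protect"),
}


def classify_sentence(sentence):
    """Return the first category whose keyword occurs in the sentence, else 'other'."""
    lowered = sentence.lower()
    for category, words in KEYWORDS.items():
        if any(word in lowered for word in words):
            return category
    return "other"


def group_into_clauses(sentences):
    """Group sentences by predicted category; each group is treated as one clause."""
    clauses = defaultdict(list)
    for sentence in sentences:
        clauses[classify_sentence(sentence)].append(sentence)
    return dict(clauses)


sentences = [
    "The parties wish to exchange information for the purpose of evaluating a partnership.",
    "Recipient shall safeguard the information with reasonable care.",
    "These obligations last for five years after disclosure.",
    "Recipient shall protect all copies against unauthorized access.",
]
clauses = group_into_clauses(sentences)
```

Note that the two security sentences end up in the same clause even though they are not adjacent in the document, which is the point of grouping by category rather than by position.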

In the second step, a Machine Learning model specific to each category is deployed to determine the “flavor” or type of the clause in the category. For example, is this security clause strict, relaxed or intermediate? And with those flavors, it becomes possible to judge the NDA: if you are providing information, it’s bad to have a relaxed security clause, as that creates a risk that the information ends up in the wrong hands without the clause having been violated.

With these models, the remaining functionality of NDA Lynn follows quickly. A document is received on the NDALynn.com website, its text is extracted and each sentence is fed to the first model. Sentences in the same category are grouped, and for each group the appropriate “flavor” ML model is employed. Finally, a simple lookup table is used to determine the outcome: if the customer is giving information and the security clause is relaxed, that is no good. As a result, NDA Lynn can judge any NDA (well, in English) with only one question to be answered in advance: are you providing or getting information, or both?
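The final lookup step might look roughly like the Python sketch below. Only the first table entry (giving information plus a relaxed security clause is bad) comes from the description above; every other entry, the verdict labels, the `judge_nda` name and the "review" fallback are assumptions made for illustration. The "both" role is left out for brevity.

```python
# Hypothetical lookup table: (role, category, flavor) -> verdict.
VERDICTS = {
    ("giving", "security", "relaxed"): "bad",       # the example from the text
    ("giving", "security", "strict"): "good",       # assumed
    ("receiving", "security", "relaxed"): "good",   # assumed
    ("receiving", "security", "strict"): "bad",     # assumed
}


def judge_nda(role, clause_flavors, table=VERDICTS):
    """Look up a verdict per clause; combinations missing from the table get 'review'."""
    return {
        category: table.get((role, category, flavor), "review")
        for category, flavor in clause_flavors.items()
    }


# Flavors as they might come out of the per-category flavor models.
flavors = {"security": "relaxed", "duration": "intermediate"}
report = judge_nda("giving", flavors)
```

Because the table is plain data passed in as an argument, swapping in a different table changes the verdicts without touching the models.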

Surprisingly, the actual Machine Learning part is not that hard. The ensemble and neural network models of BigML proved very flexible and effective, and the easy-to-use interface made it a short point-and-click exercise to turn a training dataset into a complete model, ready to go at NDALynn.com. We could even use the models offline, giving lightning-fast responses for each document.

Oh, did I mention those datasets? Any ML model is only as good as the data you put into it. That meant having to manually tag a lot of NDAs: what type of sentence is this, security or duration? And for each group of sentences (each clause), what flavor does it have? And yes, that is a lot of tagging. The current dataset comprises over 1,200 Non-Disclosure Agreements, each between 30 and 60 sentences in length. But the effort was worth it: NDA Lynn now performs sentence classification with 94% accuracy, and its flavor models average well over 90% accuracy too.

We just released our Business Edition, so it’s time to leverage the power of NDA Lynn to create your own NDA-reading lawyerbot for your organization. The Business Edition allows you to tune the lookup table: what is good, what is bad and where do you draw the line for the not-really-ok-but-not-dealbreaker-type clauses? Moreover, you get NDA document management, you can have the reviews sent to your company lawyer for a manual check, and there’s even an API to connect Lynn to your e-mail or document management system. How’s that for the future of legal work?

Want to know more about how ML is used in the Legal Profession?

Join the #2ML18 event on May 8-9, 2018, in Madrid, Spain. Get your ticket today so you can meet all the speakers as well as the BigML and Barrabés.biz teams, the co-organizers of 2ML. We hope to see you there!