George Socha chats with Jon Lavinder of Epiq
Each week on ACEDSBlogLive, I chat with an eDiscovery leader. For the Oct. 2 session, I sat down with Jon Lavinder. Jon is Director, Technology-Assisted Review at Epiq, where he leads that firm’s analytics, predictive coding and TAR practice. Our discussion ranged from lessons learned the hard way during the early days of technology assisted review, to how he has been able to make effective use of and then operationalize TAR, to a new service offering Epiq will be rolling out.
Recorded live on October 2, 2020 | Transcription below
GEORGE SOCHA: Good morning, afternoon and evening. I am George Socha, Senior Vice President of Brand Awareness at Reveal. Each Friday morning at 11 a.m. Eastern, we bring to you a new episode of ACEDS Blog Live, where I get a chance to chat with luminaries in eDiscovery and related areas. This week our guest is John Lavinder. Jon leads Epiq’s practice around analytics, predictive coding and TAR. He has a long and storied tenure in this industry. I guess you could say all with one organization…
JON LAVINDER: That’s right.
GEORGE: …or with at least three separate organizations. Today, Epiq, before that, DTI, going all the way back to EED. Which, by the way, was, as far as I know, the very first eDiscovery provider anywhere. So, you can’t have a longer lineage than that. Welcome, Jon.
JON: Thank you, George. Working with John Jessen a little bit. Yeah, that was a long time ago, but yeah, through acquisition and merger, etc., it’s been a lot of different companies, but really also kind of the same company. And George, I have to say, we were talking in the green room a little bit about the current situation with conferences all going virtual. And the separation between people, and I just have to say, thank you for putting this on. Any of these types of things where we’re able to connect a little bit better with our colleagues, I think is much needed and very much appreciated.
GEORGE: I agree. And thanks to ACEDS for making this possible because, as we were discussing in the green room before, the days are gone, and who knows when they will be back, when we got a chance to actually sit down in person the way we did. And try as hard as we might with online conferences, none of us, I think, has yet cracked the code on making it feel anything close to the way the in-person did. But these types of one-on-ones help a lot. I want to make this more than a one-on-one, though. We have, for our audience members, the ability to type in comments. You put in the comments, we can see them. I will try to monitor them, but Jon is going to be watching them as well. And his goal, I think, is to make sure I never get a chance to bring up or respond to a comment before he gets to it first.
What we’d like to talk about today is, of course, the things, Jon, that you focus on. The use of analytics and predictive coding, TAR, model libraries, artificial intelligence in litigation and investigations. Your experience directly on those matters and some of the experience you’ve had working with the technology throughout your tenure.
Early Days & Hard Lessons
JON: Yeah, we’ve been working with these technologies for a long time. I think it was back in 2011 that we identified the need for a more formal response, a team and a service that specifically addressed predictive coding, technology assisted review. Before that, it was all kind of boutique consulting offerings, kind of one-offs. But we put together a team, identified technology, workflow, operationalized it back in 2011. And kind of an interesting story, I initially was just helping identify the technology and stand the service up and handing it off to a different leader. And we had a couple of people take the lead of this service and it just kind of didn’t work out, and they asked me to take over just temporarily. And I’m still here, so it’s actually been a really rewarding experience, one that I’m really grateful for. And really it’s, I’ll say 100% about the people I’ve been able to work with, the really great team we’ve assembled.
But in all honesty, George, we have made over the years many, many mistakes. And we’ve learned from those mistakes, hopefully we’ve only made those mistakes once or twice. But, you know, we definitely have developed a lot of best practices that we’ve put in place to ensure that these projects go as well as possible.
GEORGE: So not to put you on the spot, but what have some of the more dramatic mistakes been? The ones from which you’ve developed the greatest scar tissue, learned the most and ultimately benefited the most?
JON: Yeah. I mean, honestly, the biggest mistake I think that we made early in the process was really just fundamental project stuff about communicating and documenting and making sure that everybody was on the same page, and roles and responsibilities. Really just standard, mundane project stuff. The technology works if you apply it correctly. If you have the right kinds of things you’re looking for, etc., there’s no question about that. But the real mistakes we made, the big stumbles that we made early, we would have a great outcome from a technical perspective. But the client was ready to fire us. Or the client was okay, the direct client we were working with was okay, but their boss, the partner above them, didn’t know what was going on and thought it was a failure. So we had a lot of technical successes but project, I’ll say, failures.
And so we’ve done a lot to tighten up communication, documentation, setting expectations, making sure everybody knows what their role in the whole process is. Things of that nature. Really kind of mundane project stuff, you know, that is essential for making any project successful. Or, you know, if you skip those steps, risking failure. But yeah, a lot of other little things just specific to analytics and TAR and really the predictive coding in particular that we’ve learned, I’ll say, is just up front, the very first step is just doing a little random sampling. And in the very, very early days we would do that in the form of, we were using kind of a vanilla predictive coding workflow. We would do that just en passant because we were doing a control set right upfront. And a control set is a random sample. It gives you the same kind of information, the richness, the overall percentage of relevant documents in the population, etc. And it gives you this ability to measure your progress as you go.
Finding the Story – Nimbly
But, one thing that we find quite common is that a lot of times people don’t really know what they’re looking for on day one. And the idea of what they’re looking for, that kind of how they’re judging documents, shifts quite dramatically from the very beginning. From day one to like day seven or day 15 or whatever the duration of that predictive coding exercise is. And so, we found that there was this topic shift. And doing a control set up front is just not the best idea. We always do it at the end now, if we do it at all, because some of the workflows don’t require it. But a small random sample gives you just an enormous amount of information. It enables you to estimate, for example, the number of reviewers you’ll need if you’re gonna review all of the relevant documents. It gives you an idea of the rough budget, the timeline, which workflow is best. If it’s a 1% richness, maybe a continuous active learning workflow is going to knock this out in a day or two. If it’s 20% richness, maybe a standard active learning workflow with just a screen for privilege is the right idea. So, I just can’t overstate the value of that small random sample. Just like a few hours of work right upfront, and you just have this wealth of information.
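As an illustration of the point above, here is a minimal sketch of how a small random sample could be turned into a richness estimate and a rough staffing figure. It is not Epiq's method: the function names, the use of a Wilson score interval, and the review-rate and deadline inputs are all assumptions made for the example.

```python
import math


def estimate_richness(sample_size, relevant_in_sample, z=1.96):
    """Return (point estimate, lower bound, upper bound) for richness.

    Uses a Wilson score interval (an assumed choice for this sketch);
    z=1.96 corresponds to roughly 95% confidence.
    """
    if sample_size <= 0:
        raise ValueError("sample_size must be positive")
    if not 0 <= relevant_in_sample <= sample_size:
        raise ValueError("relevant_in_sample must be between 0 and sample_size")
    p = relevant_in_sample / sample_size
    denom = 1 + z ** 2 / sample_size
    center = (p + z ** 2 / (2 * sample_size)) / denom
    half_width = (
        z
        * math.sqrt(p * (1 - p) / sample_size + z ** 2 / (4 * sample_size ** 2))
        / denom
    )
    return p, max(0.0, center - half_width), min(1.0, center + half_width)


def reviewers_needed(population, richness, docs_per_reviewer_day, days):
    """Rough headcount to review every relevant document by a deadline.

    docs_per_reviewer_day and days are hypothetical planning inputs.
    """
    relevant_docs = population * richness
    return math.ceil(relevant_docs / (docs_per_reviewer_day * days))


if __name__ == "__main__":
    # Hypothetical sample: 400 documents drawn at random, 40 coded relevant.
    point, low, high = estimate_richness(400, 40)
    print(f"richness ~ {point:.1%} (about {low:.1%} to {high:.1%})")
    # Hypothetical planning inputs: 100,000 documents, 500 docs/reviewer/day, 3 days.
    print("reviewers:", reviewers_needed(100_000, point, 500, 3))
```

The interval matters as much as the point estimate: a sample of a few hundred documents gives a usable range for budgeting, but the low and high ends can still imply noticeably different review volumes.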
GEORGE: So one of the workflows that I see, or I have seen people use so much when it comes to predictive coding or the use of technology in the review process if you’re going to do anything other than the linear review, is that they jump right into the predictive coding stage. They try to find a few examples of what they’re looking for, and then they use the technology to try to line up more like that. Sounds like you’re talking about a different model, one that at least I think of, and tell me what you think about this, an overly simplistic explanation of it is to break this process into two parts. The first part, I’m gonna steal from public radio, tell me something I don’t know. Tell me something I don’t know. Use the technology to help find the things I don’t even know I need to know about there. And then as you get a better understanding, because you don’t know at the beginning of a matter what’s really going on, you reach a point where now you’ve found things that you do care about and you want to dig in more deeply and that’s, find more like this. So you start out, tell me something I don’t know. And then eventually near the end, you move to find more like this. That sounds akin to the methodology that you’ve developed over time.
JON: Yes and no. The idea of just jumping in and starting review and having the system work in the background to prioritize documents, that is absolutely valid. And I don’t have a problem with that, George, that is something that we do all the time. One thing we’ve actually done is, early days we had the TAR team set up as kind of a separate silo. And today we’ve kind of operationalized it and we’re embedded in, actually, the managed review group. And we’re trying to press this TAR concept into the very DNA of review. And so our review managers can actually do this on their own. If they’re reviewing, they can use continuous active learning to make sure that review is as efficient as possible. But this random sample is just a way of addressing the problem with continuous active learning that I found early days, which is that there is a lot of ambiguity. And not every case team loves ambiguity, you know, when you’re defining the process. For example, you know this expression, I think Maura Grossman described, continuing until the point of diminishing returns. You know, not every attorney is comfortable with that as a stopping point. Sometimes they like to define it very precisely. They like to create a rule that defines when they’re going to stop beforehand.
So, the same is true generally with continuous active learning. Let’s say you have 100,000 documents. We’re gonna do a continuous active learning review. The team may say, you know, what is this going to cost? How long is it going to take? How many reviewers should we get? If you’re just going in with a vanilla continuous active learning workflow, just let’s start reviewing, maybe we’ve got 10 seed documents. Just start reviewing and we’ll go to the end. That might be 30,000 documents in, it might be 10,000 documents in, it might be 70,000 documents in. That’s a lot of ambiguity that can be easily cleared up with just like a small random sample. If the prevalence is 15% then you have a pretty good idea. And I’ll say that there’s a prevalence multiplier. I think John Tredennick talked about this in the book he published. And that is the number of documents you review, as compared to the number of relevant documents you estimated at the start. So let’s say prevalence is 10%. How many documents do you expect to review? And I would guess, you know, a multiplier of 2.5 would be a really good result. So, if you have 10,000 documents that are relevant in the population, when you expand that to families and some false positives, you may end up reviewing about 20,000-25,000 documents.
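The arithmetic in that example can be written out as a short sketch. The function name and parameters are hypothetical, and the 2.0 to 2.5 multiplier range is simply read off the 20,000-25,000 figure mentioned above; it is not a fixed rule.

```python
def estimated_review_volume(population, prevalence,
                            multiplier_low=2.0, multiplier_high=2.5):
    """Return (relevant docs, low review estimate, high review estimate).

    The multipliers are illustrative assumptions taken from the example in
    the conversation. Estimates are capped at the population size because
    you cannot review more documents than exist.
    """
    if population < 0 or not 0 <= prevalence <= 1:
        raise ValueError("population must be >= 0 and prevalence in [0, 1]")
    relevant = round(population * prevalence)
    low = min(population, round(relevant * multiplier_low))
    high = min(population, round(relevant * multiplier_high))
    return relevant, low, high


if __name__ == "__main__":
    # The scenario from the conversation: 100,000 documents at 10% prevalence.
    relevant, low, high = estimated_review_volume(100_000, 0.10)
    print(f"{relevant:,} relevant; expect to review {low:,} to {high:,}")
```

Feeding the prevalence from a small random sample into a calculation like this is what replaces "somewhere between 10,000 and 70,000 documents" with a range a case team can budget against.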
Finding Best of Breed
GEORGE: A good rule of thumb to keep in mind. One of the things we chatted about earlier when we were talking about this call was, sort of your experience with the selection and use of tools for these capabilities over the years, going back at least to 2016. Care to tell us what that life cycle has looked like for you?
JON: Yeah, so in the early days, we relied heavily on Equivio, which was a very adequate system. A little bit simplistic, compared to the tools we have today, but quite effective. In early 2015, Microsoft acquired Equivio, and they really immediately began moving out of the market. And so we started looking around for, what is that next platform? And we’ve talked a little bit this morning already about continuous active learning, that was just something new in terms of workflow. And that’s not something we could support very easily with some of these older tools. So we were looking for two things. One was a TAR platform to be our kind of flagship platform, and the second was a platform that could support, more effectively, the variety of workflows that our clients were asking for, including continuous active learning.
So we did a pretty exhaustive survey of available technologies and platforms, etc. And this took, I want to say, about three or four months. We involved a lot of different teams within Epiq. Operations and plan services and etc., even our development team, our architects were involved, kind of kicking the tires. And, as you know, you know the end of this story, we actually selected NexLP as our flagship platform. We do have a lot of other tools in our tool belt. We have clients that have preferences for various things, but when it comes to our kind of flagship technology, it’s NexLP for predictive coding. And it also has really great investigative capabilities, so really two different tool sets. And those investigative capabilities we found to be quite different in many ways, from the other tools in our tool kits. It was complementary and additive. Rather than just being, they’ve implemented clustering in a different way, it’s they have added ways to kind of decipher sarcasm from sincerity. I mean, really quite interesting capabilities.
A New Service Offering in the Offing
GEORGE: How successful have you been with the use of those capabilities?
JON: Investigative capabilities?
JON: So when we have clients that use them and really dive in, roll up their sleeves, and learn how to use the tools, we’ve been very successful. So internal compliance officers for corporations find these to be extremely effective. There are also some, probably well known to you, attorneys who are quite on the cutting edge of learning new tools, learning new capabilities, who have been very successful. And so in very small pockets, we have identified great successes using these investigative tools where we’re just providing the platform, we’re helping train the case team. But there is, I’ll say there’s quite a gap because there are quite a few times where the case team just doesn’t have the time or the right people, or just the inclination to learn a new technology. There is kind of a little bit of a learning curve at the start. And to address that, actually, we are starting to branch out and provide consulting services around diving into the data itself, doing some of the early case assessment. Not just here are the file types and counts, and here’s some ways to streamline this. But actually looking into the substantive issues of a matter and providing hot documents, things of that nature. So really bridging that gap when a case team comes in, yeah.
GEORGE: So this is the new service offering you’re piloting, that you were mentioning to me before?
JON: Yeah, we have a couple of new service offerings on the horizon related to new technologies, new AI services. And one of these is actually using some of these investigative capabilities that are in these platforms, NexLP in particular. And providing a much richer ECA kind of picture early in the process. And you know this is something that we are piloting right now, and I’m sure will be announced very shortly for general availability.
GEORGE: So, if someone wants to be part of your pilot? Are you interested?
JON: Yeah. Absolutely.
GEORGE: What do they do?
JON: Yeah, they can contact me directly and I will get them set up with the right people. That’s it. And it’s really, we have really smart people, bright people who are already working with these technologies. And so, there is a need in the market to go beyond just consulting and guidance and workflow, and actually some hands-on help with a particular matter. How do we find the key documents related to this, using all of these advanced tools? And you’ve probably seen these tools, they’re quite shockingly effective. And they’re really advancing at a quick pace. So, unlike some of the other technologies that we’ve seen, predictive coding hasn’t changed a lot in the last 10 years. But, some of these other features, like the natural language capabilities and just some of these other capabilities are really emerging, I’ll say they’re just evolving quite quickly.
GEORGE: So, like natural language processing, identifying content and images and taking that content, building libraries that you can rinse and repeat, use over and over again. Are those the types of capabilities?
JON: Yeah, exactly. Yeah, so that’s an interesting point you bring up George, so the idea of a pre-built predictive coding model. This is something that is starting to emerge. And I know NexLP has a whole library of these available. And these really just capture simple things that are quite common. Like, if you’re doing a search for harassment, you can download and apply a pre-built model that will identify things related to harassment. And you could do a little bit of training to customize that for your particular actors or custodians, and data, to kind of really custom fit it. But right out of the gate, it’s gonna find a lot of things that are interesting in that area. Another kind of pre-built portable module, say, is custom entities. So, entities are the nouns of a data set, the people, places, things. And you might have a certain type of noun that you’re looking for in your data set that’s important, like maybe a medical condition, for example. You want to find the medical conditions within the data set. That’s not something that most tools are gonna have out of the gate. But, you can build that custom model, then re-use it across many different cases, data sets, etc. if that’s important to your investigations or the matters that you’re working with.
GEORGE: So, it sounds like you are building a whole host of capabilities here, right, Jon?
JON: Yeah. Yeah, these tools are becoming more interesting, more capable at a quite quick pace, so yeah.
GEORGE: Well, I’d like to thank you, Jon. And a couple of points: in case you missed at the beginning, I’ve been talking with Jon Lavinder, who leads Epiq’s practice around analytics, predictive coding and TAR. And if you were listening in the middle, he’s doing a pilot to make more effective use of some of the AI and other capabilities. And I’m going to go ahead and, I think, re-extend the offer. If you are interested in being one of Jon’s guinea pigs, and you should be interested, please feel free to reach out to him. Do we have, let’s see, Mirabel, can we put up his email address? If that’s okay?
JON: Please do. And George, I’ll also say, you know, I don’t think we mentioned this before, but I owe you a huge debt of gratitude. I think I used the EDRM diagram in every conversation with clients for years, and I think most of my colleagues did as well. So, the light you’ve shone on this industry, the kind of, this ability to really explain the big picture, you know, it was very, very much appreciated.
GEORGE: Thank you. And Jon’s, I guess I’m just going to read it out because it’s not showing up on the screen. So Jon can be reached at Lavinder, you see it spelled out there right away up on the screen, email@example.com. Jon, thank you very much.
JON: Yeah, of course.
GEORGE: Here we go. Okay, until next time, thanks all for joining us for ACEDS Blog Live and we will see you again next Friday morning at 11 a.m. Eastern.