By Jeff Fehrman
Welcome to the world of rules-based data classification
Imagine walking into a grocery store where none of the products were organized and none carried labels, where peanut butter sat next to tuna fish and lemons next to loaves of bread, and there was no way to tell just how old that hamburger might be. In many ways, a company’s electronically stored information (ESI) is much like a disorganized grocery store. If it is poorly organized, documents containing information pertinent to a legal case could fall by the wayside while valuable time is spent sifting through irrelevant and junk information.
There’s a more natural way to find relevant data fast: classification. By classifying electronic documents early and often, organizations can meet challenging deadlines because they have quick access to pertinent, high-value data. This is valuable in risk management, legal discovery, and regulatory compliance. It can also reduce costs and establish a focused process for organizing and storing information.
Organizing Relevant Data in a Crowded Information Landscape
Classification essentially assigns a class or label to a document. It is used in computer science and in the information and library sciences. Even email spam filters, which sort incoming messages into legitimate mail and junk mail, are a type of data classification.
Classification is the structure applied to data, and while it may not command the spotlight in legal discovery, it is widely used. Responsiveness review has traditionally been the manual classification of documents, and techniques like machine learning automate those classification decisions.
Classification can be accelerated with unsupervised learning techniques such as document clustering, topic modeling, and many other approaches where similarities are identified and documents are grouped based on the characteristics they share with other documents.
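To make the grouping idea concrete, here is a minimal sketch that clusters a handful of made-up documents by word overlap (Jaccard similarity) in a single greedy pass. The sample documents, the `threshold` value, and the function names are assumptions chosen for illustration. Real clustering and topic-modeling tools use much richer features and algorithms.

```python
import re

def tokens(text):
    """Return the set of lowercase words in a document."""
    return set(re.findall(r"[a-z]+", text.lower()))

def jaccard(a, b):
    """Share of words two token sets have in common (0.0 to 1.0)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

def cluster(docs, threshold=0.3):
    """Greedy single-pass clustering: a document joins the first cluster
    whose seed document is similar enough, otherwise it starts a new one."""
    clusters = []  # list of (seed_tokens, [document indices])
    for i, doc in enumerate(docs):
        t = tokens(doc)
        for seed, members in clusters:
            if jaccard(t, seed) >= threshold:
                members.append(i)
                break
        else:
            clusters.append((t, [i]))
    return [members for _, members in clusters]

# Hypothetical sample documents.
docs = [
    "Quarterly invoice for drug trial supplies",
    "Invoice for drug trial supplies, revised",
    "Lunch menu for the office party",
    "Office party lunch menu update",
]
groups = cluster(docs)
print(groups)  # [[0, 1], [2, 3]]
```

No labels were supplied here. The two invoice documents and the two party documents end up together purely because of the words they share.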
Another way to accelerate classification is a supervised approach, in which machine learning models are given example documents labeled (tagged) as positive or negative through what’s called active learning. As the model learns from the examples it’s provided, it can project that learning across all data sources in a collection to suggest labeling decisions.
Going back to the grocery store analogy: instead of searching every aisle in the store for milk, you go to the aisle labeled “dairy” to explore the options the store carries. Unsupervised techniques get you to the aisle, and supervised learning helps the store determine what to place in that aisle.
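The supervised idea can be sketched with a tiny Naive Bayes text classifier written from scratch. It learns from a few tagged examples, suggests labels for untagged documents, and then flags the document it is least sure about for human review, which is one common active-learning heuristic known as uncertainty sampling. The example documents, labels, and function names below are assumptions for illustration, not a description of any particular product.

```python
import math
import re
from collections import Counter

def tokens(text):
    """Split a document into lowercase words."""
    return re.findall(r"[a-z]+", text.lower())

def train(labeled):
    """Count words per label from (text, label) pairs."""
    word_counts = {}        # label -> Counter of words
    doc_counts = Counter()  # label -> number of example documents
    for text, label in labeled:
        doc_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(tokens(text))
    vocab = {w for counts in word_counts.values() for w in counts}
    return word_counts, doc_counts, vocab

def predict_proba(model, text):
    """Naive Bayes with add-one smoothing; returns label -> probability."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    log_scores = {}
    for label in doc_counts:
        counts = word_counts[label]
        denom = sum(counts.values()) + len(vocab)
        score = math.log(doc_counts[label] / total_docs)
        for w in tokens(text):
            if w in vocab:  # words never seen in training are ignored
                score += math.log((counts[w] + 1) / denom)
        log_scores[label] = score
    top = max(log_scores.values())
    exp = {label: math.exp(s - top) for label, s in log_scores.items()}
    norm = sum(exp.values())
    return {label: v / norm for label, v in exp.items()}

def suggest(model, text):
    """Return the most likely label and the model's confidence in it."""
    probs = predict_proba(model, text)
    label = max(probs, key=probs.get)
    return label, probs[label]

# Hypothetical tagged examples and untagged documents.
labeled = [
    ("merger pricing agreement with distributor", "relevant"),
    ("pricing strategy for the merger", "relevant"),
    ("office holiday party schedule", "not relevant"),
    ("cafeteria lunch schedule", "not relevant"),
]
unlabeled = [
    "draft agreement on merger pricing",
    "holiday lunch schedule",
    "schedule for pricing review",
]

model = train(labeled)
suggestions = {doc: suggest(model, doc) for doc in unlabeled}
# Uncertainty sampling: send the least confident suggestion to a human reviewer.
most_uncertain = min(unlabeled, key=lambda doc: suggestions[doc][1])
```

In this toy run the model suggests "relevant" for the merger draft and "not relevant" for the lunch schedule, and it singles out "schedule for pricing review" as the document a reviewer should tag next, since it mixes vocabulary from both classes.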
Classification Doesn’t Replace Legal Review, It Improves It
Despite its importance, classification isn’t always considered a vital standalone process in eDiscovery. More often than not, classification is applied only in legal review. While helpful to a case at hand, performing classification solely in review creates a one-off process that has no long-term value.
Imagine a team of lawyers painstakingly categorizing piles of documents and placing them into folders and boxes. But at the end of the case, an attorney kicks the boxes open and all the paperwork scatters all over the ground. For the next case, the attorneys start all over.
We’re not saying classification shouldn’t take place during legal review. It has a purpose at that point: identifying the proper documents that are needed for trial, and ensuring sensitive data is handled consistently. However, review-only classification fails to generate long-term value as it tends to solve for a singular event with a singular focus.
eDiscovery technology isn’t being used effectively when classification is centered only on review. Using only de-duplication and a keyword search, an eDiscovery solution can remove 80 percent of data from review. However, because review-only classification is dedicated to a single event, it isn’t performed consistently, and it can miss relevant data or fail to classify documents that are business “sensitive.”
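For readers who want to see what de-duplication plus a keyword search looks like in practice, here is a minimal sketch. It hashes whitespace-normalized, lowercased text to catch exact copies, then keeps only documents that contain a search term. The sample documents, the keyword list, and the function names are hypothetical, and the reduction this toy data produces says nothing about the 80 percent figure above.

```python
import hashlib

def fingerprint(text):
    """Hash of lowercased, whitespace-normalized text, so exact copies collide."""
    normalized = " ".join(text.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

def cull(docs, keywords):
    """Drop exact duplicates, then keep documents containing any keyword."""
    seen = set()
    kept = []
    for doc in docs:
        fp = fingerprint(doc)
        if fp in seen:
            continue  # duplicate of a document already processed
        seen.add(fp)
        lowered = doc.lower()
        if any(k.lower() in lowered for k in keywords):
            kept.append(doc)
    return kept

# Hypothetical collection.
docs = [
    "Merger term sheet v1",
    "merger   term sheet V1",        # duplicate once normalized
    "Parking garage closed Friday",
    "Notes on merger timeline",
    "Weekly cafeteria menu",
]
for_review = cull(docs, keywords=["merger"])
reduction = 1 - len(for_review) / len(docs)
print(for_review)          # ['Merger term sheet v1', 'Notes on merger timeline']
print(f"{reduction:.0%}")  # 60%
```

The sketch also shows the weakness the paragraph describes: a plain substring match would miss a relevant document that never uses the word "merger," and nothing here marks a document as sensitive.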
Classifying the Good and Bad, Early and Often
Data should be classified early and often. A reliable classification process improves with each discovery request, and the small gains made daily with classification can pay big dividends years down the road when a specific document is needed in a pinch. Classification can produce small, but relevant, sets of documents at a time when many judicial and regulatory processes expect information quickly. Regulatory agencies and certain investigations work on accelerated deadlines, and time is not a luxury most organizations have.
Effective classification calls for categorizing information into data types beyond just “relevant” and “not relevant.” To be truly efficient, classification also requires labeling data as “sensitive” and “junk.” “Relevant” and “not relevant” solve for the specific needs of each case. “Sensitive” and “junk” are classifications that provide ongoing value particular to each organization. That’s the ultimate goal of classification: create a more intelligent process for organizing data across the enterprise.
Classification can prove particularly useful to companies facing constant litigation. Pharmaceutical companies, for instance, must save all data on the development of a drug for many years. Leveraging solutions with artificial intelligence can go a long way toward classifying that information. But by taking classification a step further, applying models for junk and sensitive documents alongside those for relevant and not relevant documents, a pharmaceutical company can also be prepared for litigation outside of drug research.
Aside from helping organizations adhere to the many personal and financial data regulations, classification ensures that only information that needs to be sent is sent, a step that protects the privacy of employees.
Making classification an everyday process — an iterative function of storing documents — will make data collection and processing more effective and the identification and retrieval of documents easier. It’s a win-win for the current case, and for your whole process.
Intelligence Delivers Cost and Time Savings
Classification as an iterative function will not only make life easier, it will also reduce costs. If only three employees are needed to manage classification daily, an organization may not need to hire an army of attorneys when a legal matter arises.
By applying a programmatic approach to classification, a corporation can have experienced attorneys handle the decision-making involved with sensitive information, while the task of reviewing potentially junk and non-relevant documents can fall to less-experienced attorneys. This will end the common practice of assigning data to any attorney with no rhyme or reason, making discovery more efficient and less costly.
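A programmatic approach like the one described above can be as simple as a lookup table from classification label to review queue. The sketch below follows the routing in the preceding paragraph. The queue names, document IDs, and the `unassigned` fallback for labels the paragraph does not mention are assumptions added for illustration.

```python
from collections import defaultdict

# Label -> review queue, following the paragraph above.
# Queue names are hypothetical.
ROUTING = {
    "sensitive": "experienced_attorneys",
    "junk": "less_experienced_attorneys",
    "not relevant": "less_experienced_attorneys",
}

def route(classified_docs, default_queue="unassigned"):
    """Group (doc_id, label) pairs into review queues by label."""
    queues = defaultdict(list)
    for doc_id, label in classified_docs:
        queues[ROUTING.get(label, default_queue)].append(doc_id)
    return dict(queues)

# Hypothetical batch of already-classified documents.
batch = [
    ("DOC-1", "sensitive"),
    ("DOC-2", "junk"),
    ("DOC-3", "not relevant"),
    ("DOC-4", "relevant"),
]
queues = route(batch)
```

Keeping the rules in one table means a change in staffing policy is a one-line edit rather than a new round of ad hoc assignments.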
Instead of reserving classification as a manual task during legal review, an undertaking that requires heavy staffing and burns time and money, organizations will see powerful results with automated classification. With Reveal’s eDiscovery technology and the proper approach, you can use data to create more efficient business processes, allowing searches you never thought possible. Imagine finding a paragraph on the third page of a 15-page document and using that text to train the system to find similar passages. That’s not just classification; that’s intelligent classification.
Case in Point: Classification as a Valuable Tool in the Cost-Savings Arsenal
When a contracted attorney has to sift through thousands of electronic documents, the cost of legal review skyrockets. That’s why review typically accounts for 73 cents of every dollar spent on eDiscovery. But predictive coding, a computer-categorized review application that classifies documents based on matching concepts and terms, can reduce the number of hours attorneys need to review materials by up to 80 percent.
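Taken at face value, those two figures can be combined into a rough upper bound on total savings. The calculation below assumes that review cost scales linearly with attorney hours and that every other eDiscovery cost stays the same. Both are simplifying assumptions made for this illustration, not claims from the figures themselves.

```python
def max_total_savings(review_share, review_hours_cut):
    """Upper-bound share of total eDiscovery spend saved, assuming review
    cost scales linearly with hours and other costs are unchanged."""
    return review_share * review_hours_cut

savings = max_total_savings(review_share=0.73, review_hours_cut=0.80)
print(f"Up to {savings:.1%} of total spend")  # Up to 58.4% of total spend
```

In other words, cutting 80 percent of the hours from the 73 cents spent on review would save at most about 58 cents of every eDiscovery dollar under these assumptions.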
Predictive coding found the relevant documents needed to secure government approval of the merger of Anheuser-Busch InBev and Grupo Modelo, and it cut eDiscovery costs in half. But when organizations classify documents early and often, in addition to using predictive coding, they have the potential to achieve an even more significant reduction in overall eDiscovery costs.
Classification Sharpens Focus and Improves Legal Review
Effective classification simplifies eDiscovery. With classification, decision-making is faster and challenging deadlines can be more easily met. Classification prioritizes and accelerates data for review tasks, allowing organizations to focus on key documents with sharper insight. It also helps organizations follow stringent regulations governing the security of personal and financial information. And, at a time when costs matter, classification reduces time and money spent on information that is inconsequential to a case.
If your organization is interested in leveraging artificial intelligence to optimize the future of your legal review, request a demo to learn how Reveal’s AI-powered eDiscovery software can do just that.