Digital Forensic Preservation vs. Collection: A Practical Guide

Forensic preservation and collection are crucial steps in any investigation or litigation. The choices legal teams make in these early stages determine not only what kind of legal strategy they can put together, but also how burdensome it is to build that strategy and what kinds of audibles they can call downstream if necessary.

Collection Does Not Equal Preservation

So what exactly do “data collection” and “data preservation” mean, and why is it important to keep the ideas separate?
Well…
Forensic preservation = making sure data remains intact in case it’s needed.
 
Forensic collection = taking data into custody for actual investigation.  

Think of your data population as a grocery store. It doesn’t make sense to buy the entire grocery store to make one dish just because you might need those ingredients. It makes far more sense to read your recipe, make a shopping list of the ingredients you’ll need, and buy only those.

When you collect every potentially relevant piece of data, you’re effectively buying the whole grocery store. This is usually going to end up being needlessly expensive, and make other processes downstream needlessly burdensome. By preserving data, you keep the grocery store intact and can always go shopping again if your first collection doesn’t pan out the way you want. In some cases, data preservation could be as simple as checking a few boxes on the backend of Microsoft 365. This distinction allows legal teams to benefit from more economical, tailored forensics techniques without leaving important evidence on the table. Preserving data means that you’ll have the option to make a Plan B (or C or D or E!) without the added, costly burden of collecting everything.

So Why Is Forensic Data Preservation Necessary? Isn’t Data Stored Anyway?

For a long time, conventional wisdom was that anything you write on the internet is there forever, but that’s not as true as it once was. Now, most of the technology we use in our day-to-day lives is cloud based, and people are generating so much new data that old data has to go somewhere. Automated deletion has become par for the course in most organizations.

Luckily, data preservation is not always that complicated. It could be as simple as disabling some of those automated deletion functions, and even simple preservation practices can still have a profound impact. More and more organizations are realizing the importance of such strategic data retention. By automating the deletion of redundant or obsolete data, organizations save on data storage, while still ensuring that important data remains available if further investigation is necessary. Forensics technicians can advise you on such policies, and how to organize the data you want to keep.  

When investigations do happen, these data preservation measures can help things run more smoothly, and can set you up to make contingency plans if need be. If it comes out later that crucial data could’ve been preserved but wasn’t, you could face adverse consequences such as sanctions. In a recent antitrust litigation, Google was sanctioned after a court determined that it had failed to preserve important information.

The important thing here is not to neglect data preservation because you’re assuming data is preserved by default. A digital forensics expert can work with you and your IT team to determine what is and isn’t being preserved already, and what changes to make if any are warranted.

So How Do I Determine What Data To Collect?

Once we let go of the idea that every potentially relevant piece of data needs to be collected, and accept that preserving it is enough, that raises another question: what DO we collect?

Well… it depends.

What are you actually hoping to find during your investigation? Knowing that can guide your forensics team in the right direction. While it’s understandable to want all the information before you’re too committed to a strategy, James Whitehead, Associate Director of Digital Forensics at Contact Discovery, says that’s not always the best order of operations.

“As digital forensics experts, we often bring the most value when legal teams have some idea of what information they’d like to find, and where that information is most likely to live,” James says. “We can control costs and save time if we’re able to narrowly tailor our forensics approach to what’s most likely to prove helpful.”  

Oftentimes, that reluctance to have a more targeted approach stems from a fear of leaving important data behind. By making sure you’re preserving, you’re able to be a little more calculated with collection while still hedging your bets.  

By approaching these two related yet different forensic processes with the appropriate strategies, legal teams can have their cake and eat it too: effective, efficient investigations without too much clutter in their dataset, while still protecting themselves against spoliation and a lack of contingency options.

Disclaimer: This content is for general information purposes only, was not written by an attorney, and does not constitute legal advice.

Managed Review vs. Unmanaged Review: Which One’s Right For You?

Complex litigation cannot happen without some form of document review, whether that’s managed review or not. Document review cannot happen without reviewers. This raises a lot of questions that most lawyers never learned about in law school: Which reviewers do you hire? What parts of review can be trusted to technology, and what parts absolutely have to be done by humans? What work requires actual attorneys, and what work is better left to other litigation support professionals? 

There’s a myriad of considerations that go into these decisions depending on the case at hand and the capabilities of that particular legal team. One of those decisions is “who should make all the other decisions?” When legal teams are either spread too thin or simply want someone with a different realm of expertise, managed review services can be great for attorneys and their clients.

What eDiscovery Review Teams Do vs. What Lead Attorneys Do

Lawyers are good at a lot of things. They tend to be good researchers, and good at drawing connections between seemingly unconnected pieces of evidence. Unfortunately, those relevant pieces of evidence don’t just show up at a lawyer’s doorstep all wrapped up with a bow. Usually, they’re hidden somewhere in a massive pile of data.

If your legal case is a jigsaw puzzle, most lawyers could probably put that puzzle together just fine if there are relatively few pieces and they all come in the same box. Now, imagine the pieces of the puzzle that you want to make are in a box with thousands of other pieces that make up less relevant pictures. What happens when you’re not even 100% sure you have all the right pieces to make the picture you want to make? Can you find other pieces along the way that might make other helpful pictures?

It would be impossible for any one person to make their chosen puzzle in such a scenario. Instead, you need a coordinated team of people going through documents and coding them for relevance. These reviewers aren’t usually the ones making calls about big picture legal strategy, but they know what kinds of puzzle pieces to look for so that those other attorneys can actually put a picture together.

Finding the right pieces in that unruly pile is a separate skillset from actually putting the pieces together once you have them. Plenty of lawyers have great experience in both these areas, but it’s not a given, and there’s no shame in asking for outside help when you need it.

How Remote Review Services Can Help

Remote review and managed review services allow legal teams to scale as needed in order to take on larger, review-heavy matters. Smaller teams that don’t have enough manpower internally can staff up for one matter, and then scale back down afterwards.

Document review services aren’t just for when you need more reviewers. Sometimes you simply need different reviewers. Maybe you have documents in a foreign language that no one on your staff speaks; maybe you have a matter outside your normal realm of expertise, and need reviewers with different legal specialties.

Either way, remote review lets you take better care of your clients without having internal hires for every possible scenario that your clients might throw at you (which isn’t realistic for most law firms).

Remote review is a great option for most law firms, but it comes with challenges of its own. Someone has to decide which reviewers are best suited for your matter. Someone has to decide how many reviewers you need to meet a deadline; those reviewers have to report to someone on a day-to-day basis, and that person needs to have an intimate understanding of the whole case, your overarching legal strategy, and whatever technology you’re using to assist human reviewers. Even after you agree on answers to all those questions, someone should be reevaluating things on a daily basis as new information comes to light. Of course, all these challenges multiply if you’re leveraging remote review for multiple cases.

So that raises a follow-up question… Who should be that someone?

How Managed Review Can Help

In an unmanaged remote review situation, a client is still on the hook to answer questions from reviewers as they come up; the client needs to clearly communicate to reviewers what they’re looking for, and make sure all these different reviewers are taking a consistent approach to coding documents.

Unmanaged review does cost less money, but there can also be dire consequences if this point person is already spread thin and not able to give review management enough attention. What happens if an attorney gets caught up in court and can’t answer reviewers’ questions? What if they don’t notice that more reviewers are needed until it’s too late? What if they aren’t assessing progress often enough, and don’t realize a pivot in legal strategy is necessary until review is 99% done and there’s still no “smoking gun”?

This is why the biggest law firms that are dealing with complex litigation on a regular basis typically hire dedicated discovery project managers apart from their regular attorneys. They know that discovery can require near constant attention, and oftentimes it’s just not possible to give it that attention while dealing with all the other demands of being a lawyer: writing briefs, going to client meetings, court dates, etc. Such firms also know mismanaging discovery can have dramatic impacts downstream. Therefore, investing in constant vigilance at every step of the process will pay dividends later.

Managed review means you are not only temporarily hiring reviewers, but someone to manage them. Clients get to meet with one point person, describe their legal strategy and what they’re hoping to find in review. That point person turns around and handles all the other emails and meetings with the review team. Project managers can catch issues early on and bring them to the client’s attention.

How Do I Decide Which One Is Right For Me?

There are different reasons someone might go the managed review route over the unmanaged route. One might be that they simply don’t have attorneys with that discovery project management skillset. With managed review, a firm that cannot justify hiring discovery PMs internally can still have the same level of vigilance as a firm that could.

Another reason is that even if attorneys are perfectly capable of managing review themselves, their current caseload just doesn’t allow them to give it the attention it needs. By letting someone else answer more of the emails and sit in on more of the meetings, that attorney can focus on other things that a discovery PM couldn’t do, such as deposing witnesses or writing briefs.

Of course, it’s going to depend on many factors specific to your case, which can’t be addressed here, but generally the key factors that should shape your decision are:

  1. Do you have discovery project managers internally? These could be either dedicated personnel or attorneys who have experience managing review.

  2. Do those people actually have enough room on their plate now to take on the responsibilities of managing this review?

If an unmanaged review situation is likely to result in either a) an attorney not having enough time to give all their matters the attention they deserve or b) review being managed by someone who’s never done it before and may not understand the intricacies of it, the managed review route is often best.

4 Rookie Mistakes of eDiscovery Processing

Within eDiscovery, Processing comes after collection but before review. This step is all about taking the data and extracting metadata such as who created that document, when they created it, the file format and size, etc. Such metadata helps legal teams organize a seemingly endless sea of data into the right buckets so they can make informed decisions about what to do next.
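To make the idea of metadata extraction a little more concrete, here is a minimal Python sketch that pulls a few basic properties (name, extension, size, last-modified time) from the file system. It is only an illustration, not how any particular processing tool works: the function name and the field names are made up for the example, and embedded metadata such as a document’s author would require format-specific parsing that isn’t shown here.

```python
from datetime import datetime, timezone
from pathlib import Path


def basic_file_metadata(path):
    """Return a few file-system properties for one file as a plain dict.

    The field names are hypothetical, chosen only for this example.
    """
    p = Path(path)
    info = p.stat()
    return {
        "file_name": p.name,
        "doc_extension": p.suffix.lower(),
        "file_size_bytes": info.st_size,
        # Last-modified time reported by the file system, normalized to UTC.
        "modified_utc": datetime.fromtimestamp(
            info.st_mtime, tz=timezone.utc
        ).isoformat(),
    }
```

Running a function like this over every file in a collection gives you rows you can sort and filter, which is the “buckets” idea described above in miniature.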

Unfortunately, there are many things that can go wrong within the processing stage that an untrained eye wouldn’t notice. Some teams load data into a program, click a few buttons, and tada! They get their coveted metadata. Without a forensics data engineer dotting the I’s and crossing the T’s, you could be missing out on mission-critical information and not even realize it until further along in the discovery process.

Data Engineers have the ability, knowledge, and understanding of how to identify, isolate, and apply various remedies to a vast variety of processing errors. Handling these errors appropriately will ensure the maximum amount of text and metadata is extracted from the source data. Keeping an eye out for these common rookie mistakes can help you mitigate them early, saving time and money.

Mistake 1: “If there was an error, I would’ve gotten an error message!”

Errors can exist in an imaging set even when the processing tool does not throw any error message. A data engineer can identify these types of documents based on fielded metadata and remedy the imaging issues using third-party applications. Proceeding without isolating these errors can result in incorrect OCR text or blank OCR text for documents.

Potential Fixes:

To avoid some of these mistakes, familiarize yourself with all available system fields and error messages and when to look at each. This will be a huge help in isolating incorrect imaging/OCR (optical character recognition) results.

Additionally, an experienced Data Engineer will recognize when they see the same issues on the same file types over and over again. To avoid having to search manually every time data is processed, set up saved searches keyed on metadata fields (File Type, File Description, Doc Extension, etc.). That will display documents that likely have errors without needing to sift through the system fields and error messages.
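As a rough illustration, a saved search of this kind boils down to a filter over fielded metadata. The Python sketch below is not tied to any particular processing tool: the field names (`doc_extension`, `extracted_text`) and the list of suspect extensions are assumptions made up for the example, and in practice you would build the equivalent search inside your own platform.

```python
# Hypothetical file types that have caused imaging/OCR trouble on past matters.
SUSPECT_EXTENSIONS = {".dwg", ".msg", ".xlsb"}


def likely_imaging_errors(documents, suspect_extensions=SUSPECT_EXTENSIONS):
    """Return documents worth a second look even though no error was logged.

    A document is flagged if its extension is on the suspect list or if
    its extracted/OCR text is missing or blank.
    """
    flagged = []
    for doc in documents:
        ext = (doc.get("doc_extension") or "").lower()
        text = (doc.get("extracted_text") or "").strip()
        if ext in suspect_extensions or not text:
            flagged.append(doc)
    return flagged


# Example: document 2 is a suspect file type, document 3 has blank OCR text.
sample = [
    {"doc_id": 1, "doc_extension": ".pdf", "extracted_text": "Quarterly report"},
    {"doc_id": 2, "doc_extension": ".DWG", "extracted_text": "title block"},
    {"doc_id": 3, "doc_extension": ".pdf", "extracted_text": "   "},
]
print([doc["doc_id"] for doc in likely_imaging_errors(sample)])
```

The point of saving the search is that the list of suspect file types grows with experience, so the same filter gets more useful on every matter.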

Mistake 2: “Maybe I did get an error message, but the data’s still all here. I should be able to move on to review now.”

Well… not exactly. ZIP files are a common format within forensics, and so extracting ZIP files in a forensically sound manner is a big part of processing. Oftentimes, there’s an exorbitant number of ZIP files in a case, so it’s easy for a few corrupt files to fall through the cracks.

Much the same way that a large haystack with a needle in it looks nearly identical to a haystack without one, the output of 10,000 perfectly converted files can look very similar to the output of 9,999 perfectly converted files plus one corrupt file. Data Engineers will know how to confirm whether all of the data was actually extracted or whether some files are missing. They can also check whether the extracted content is intact or corrupt. For a few files, this may be very obvious even to the inexperienced, but when dealing with thousands of files it is easier for issues to slip by unless you know exactly what to look for.

Potential Fixes:

A good place to start is by running a “Sanity Check” that compares the properties (file count, folder count, and file size) from within the zip file prior to extracting against the same properties of the extracted data. This comparison can either help confirm that you’ve done everything right, or shed light on corrupt files and inconsistencies before they make it any further in the discovery process.
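The comparison described above can be sketched in a few lines of Python using the standard `zipfile` and `os` modules. This is a minimal example under some assumptions, not a forensic tool: the function names are made up for illustration, it assumes the archive was extracted on its own into an otherwise empty folder, and it counts implicit folders (those that only appear as part of a file’s path) because many ZIP archives do not store folders as separate entries.

```python
import os
import posixpath
import zipfile


def zip_properties(zip_path):
    """File count, folder count, and total uncompressed size per the archive."""
    with zipfile.ZipFile(zip_path) as zf:
        entries = zf.infolist()
    folders = set()
    file_count = 0
    total_bytes = 0
    for entry in entries:
        name = entry.filename.strip("/")
        if entry.is_dir():
            if name:
                folders.add(name)
        else:
            file_count += 1
            total_bytes += entry.file_size
        # Folders are often implicit, so count every parent of each entry.
        parent = posixpath.dirname(name)
        while parent:
            folders.add(parent)
            parent = posixpath.dirname(parent)
    return {
        "file_count": file_count,
        "folder_count": len(folders),
        "total_bytes": total_bytes,
    }


def extracted_properties(root):
    """The same three properties, measured on the extracted folder."""
    file_count = folder_count = total_bytes = 0
    for dirpath, dirnames, filenames in os.walk(root):
        folder_count += len(dirnames)
        for filename in filenames:
            file_count += 1
            total_bytes += os.path.getsize(os.path.join(dirpath, filename))
    return {
        "file_count": file_count,
        "folder_count": folder_count,
        "total_bytes": total_bytes,
    }


def sanity_check(zip_path, extracted_root):
    """Return {property: (expected, actual)} for every mismatch found."""
    expected = zip_properties(zip_path)
    actual = extracted_properties(extracted_root)
    return {
        key: (expected[key], actual[key])
        for key in expected
        if expected[key] != actual[key]
    }
```

An empty result from `sanity_check` means the counts and sizes line up; anything else tells you which property is off and by how much. Matching totals do not prove that every file’s content is intact, so a check like this is a first pass rather than a replacement for hash verification or `ZipFile.testzip()`.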

Mistake 3: “We’ve removed all the duplicates thanks to our metadata. Now we can throw the dupes out and move on to review.”

All eDiscovery professionals are familiar with deduping (or at least the good ones are). Figuring out which documents are duplicates allows teams to better understand the scope of their review needs. How many attorneys are needed to review the necessary documents before a deadline? How costly will that be? In some cases, it may help determine if a client is better off litigating or settling out of court, so having accurate ideas of how many documents are duplicates is crucial.

However, discarding duplicates too early in the process can sometimes come back to haunt you. Some clients ask us which documents exist in the workspace that other vendors or internal team members have labeled as duplicates. Understandably so, since incorrect deduplication can lead to drastically different decisions than what a team would make if they had the correct information.

Potential Fix:

The solution is to run custom SQL scripts that are able to scan an eDiscovery environment and find these documents that a rookie might have thrown out. We can double check this metadata to confirm whether or not these documents are in fact duplicates.

Ensuring your team is familiar with the backend SQL tables of your processing tool is a major benefit. The more comfortable and familiar a data engineer is with the backend, the more flexible and time-efficient custom solutions will be.
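To give a feel for what such a script might look like, here is a small sketch that runs a SQL query from Python against an in-memory SQLite database. Everything about the schema is an assumption made up for this example: real processing tools have their own backend tables, and the `documents` table with its `doc_id`, `file_hash`, and `is_duplicate` columns is purely hypothetical. The sketch also assumes duplicates are identified by matching file hashes, and it flags any document labeled as a duplicate whose hash matches no other document in the workspace.

```python
import sqlite3

# Flag documents labeled as duplicates whose hash appears only once,
# meaning there is no surviving "original" that they duplicate.
SUSPECT_DUPLICATES_SQL = """
SELECT d.doc_id
FROM documents AS d
WHERE d.is_duplicate = 1
  AND (
      SELECT COUNT(*)
      FROM documents AS o
      WHERE o.file_hash = d.file_hash
  ) < 2
ORDER BY d.doc_id
"""


def suspect_duplicates(conn):
    """Return doc_ids labeled as duplicates that have no matching hash."""
    return [row[0] for row in conn.execute(SUSPECT_DUPLICATES_SQL)]


# Example with made-up data: document 3 is labeled a duplicate,
# but nothing else in the table shares its hash.
demo = sqlite3.connect(":memory:")
demo.execute(
    "CREATE TABLE documents ("
    "doc_id INTEGER PRIMARY KEY, file_hash TEXT, is_duplicate INTEGER)"
)
demo.executemany(
    "INSERT INTO documents VALUES (?, ?, ?)",
    [(1, "aaa", 0), (2, "aaa", 1), (3, "bbb", 1), (4, "ccc", 0)],
)
print(suspect_duplicates(demo))
```

A result like this doesn’t prove anyone made a mistake, but it gives you a short, specific list of documents to double check before anything is thrown out.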

Mistake 4: “My role is eDiscovery processing. When it comes time for production, that’s someone else’s problem.”

An all-too-common issue for both legal professionals and eDiscovery professionals is not taking a holistic approach towards their discovery. They focus solely on the piece of the puzzle they’re responsible for without a strong sense of how that piece fits into a bigger picture. Data processing pros understand that it will eventually come time to produce this data, and those productions have to adhere to specific, previously-agreed-upon requirements. That could mean customized slipsheets, metadata formatting, production field creation, custom file-naming procedures and much more.

Potential Fix:

Make sure you’re communicating with the people on your team who will be handling the rest of discovery after you’re done with processing. Ask about what kind of file formats they’ll need, and learn as much as you can about the “big picture” goals of the case. Constantly learning and “getting into the weeds” on both the front end and back end of your processing tools will expand the number of tricks up your sleeve. With those tricks, data engineers are able to get the job done in a timely fashion where less experienced processing specialists may find limitations to what they’re able to achieve and spend more time than they have.