Tuesday, December 7, 2010

A Strategy to Sample All the ESI You Need

By Nick Brestoff, M.S., J.D.

Reprinted with permission from the December 6, 2010 issue of Law Technology News © 2010 ALM Media Properties, LLC. Further duplication without permission is prohibited. All rights reserved.

I was re-reading the EDRM section on “validation of results” when it hit me. Most of us have been so busy mining the mountain of data we just received that we have been missing the other mountain of data available to us: the mountain we didn’t ask for. You know the adage: if you don’t ask, you don’t get. So I’m talking about the ESI we didn’t ask for and didn’t get.

I had been reading the last paragraph of the EDRM Search Guide, Section 9.5. You know the one: “Sampling and Quality Control Methodology for Searches.” (See http://edrm.net/resources/guides/edrm-search-guide/validation-of-results.)

“Sampling.” There’s a word that most attorneys don’t grasp, unless they had a statistics class (and remember some of it) or pay close attention to the results of political polls, where the sample size is usually about 900 to 1,200 randomly selected individuals. Amazingly enough, poll results seem to be pretty good estimates for whole counties, states, or the entire nation. The size of the sample matters, but once the population is large, its size hardly matters at all. (I’ll skip the math.)
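For readers who would rather not skip the math, here is a minimal sketch of the usual textbook calculation. It assumes a simple random sample, the normal approximation, and a worst-case proportion of 50%. The function name `sample_size` and its parameters are illustrative only and do not come from the EDRM guide.

```python
import math
from statistics import NormalDist


def sample_size(confidence, margin, population=None, p=0.5):
    """Sample size needed to estimate a proportion.

    confidence: e.g. 0.95; margin: e.g. 0.03 (plus or minus 3 points).
    p=0.5 is the worst-case (most conservative) assumed proportion.
    population=None treats the population as effectively unlimited.
    """
    # Two-sided critical value from the standard normal distribution.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    n0 = (z ** 2) * p * (1 - p) / margin ** 2
    if population is not None:
        # Finite population correction: matters only for small populations.
        n0 = n0 / (1 + (n0 - 1) / population)
    return math.ceil(n0)


if __name__ == "__main__":
    # Same confidence and margin, very different population sizes.
    for population in (10_000, 1_000_000, 300_000_000, None):
        print(population, sample_size(0.95, 0.03, population))
```

At 95% confidence and a 3-point margin, the answer stays a little under 1,070 whether the population is a million or 300 million, which is why pollsters can get away with samples of about a thousand.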

The word “sample” is there in the rules. It was added when the Federal Rules of Civil Procedure were amended to provide for the discovery of electronically stored information (ESI). It shows up in the rules governing requests to produce documents, Rule 34(a)(1): “A party may … request … to inspect, copy, test, or sample … (A) … electronically stored information ….” In the case law preceding this amendment to Rule 34, sampling was used in the context of statistically sampling backup tapes to see if they contained potentially relevant information. See Zubulake v. UBS Warburg LLC, 217 F.R.D. 309, 324 (S.D.N.Y. 2003).

Of course, such sampling must be within the scope of Rule 26(b), and that means that the ESI can be “any nonprivileged matter that is relevant to any party’s claim or defense …,” and “need not be admissible at trial if the discovery appears reasonably calculated to lead to the discovery of admissible evidence.” (Italics added.)

So, the rules allow us to use sampling on any ESI that “appears reasonably calculated to lead to the discovery of admissible evidence.” So what? You can’t use sampling on the data you didn’t receive. What light bulb went on?

First, back to the clue. It was the third and last paragraph of Section 9.5 of the Search Guide. It reads, in part: “In general, a sampling effort takes into consideration broad knowledge of the population, and [devises] an unbiased selection [of the sample]. In most cases, the party performing the sample has some knowledge of the population and there is one party with that knowledge. In contrast, most litigations where there is an adversarial relationship between a Requesting Party and a Producing Party, and since only one party has access to the underlying population of documents, agreeing on a sampling strategy is hard. An effective methodology is one that would require no knowledge of the data, but is still able to apply random selection process central to the effectiveness of sampling.” (Italics added.)

Ah ha. “Adversarial relationship.” “Sampling strategy.” Several points hit me at almost the same time:

· the frank recognition of the adversarial relationship;

· when you’re on the side of the Producing Party, you’re the only one with access to the ESI; and

· a sampling strategy is in play, notwithstanding the Sedona Cooperation Proclamation (http://www.thesedonaconference.org/content/tsc_cooperation_proclamation/proclamation.pdf).

When you’re on the side of the Requesting and (eventually) Receiving Party, of course, you’re very busy. You’re likely to be immediately swimming in the ESI you just received. This data has been produced, sans privileged documents, and the task ahead is to search it for documents that support either a claim or a defense. The act of swimming in that ocean of data takes concentration. But that focus may also lead to tunnel vision.

I asked myself to remember what goes on when you’re on the side of the Producing Party. What have you been through when you’re wearing that hat? The answer is that you’ve been through a culling process that stripped out, among other things, exact duplicates (de-duping), system files (de-NISTing), and documents covered by the attorney-client and work product privileges.

But you and others on the e-discovery team may have also created folders with data that was “probably” irrelevant or “not responsive,” such as spam e-mails with Viagra ads. For quality control purposes, sampling may have been done, so that an expert could show that both the process and the sampling protocols were reasonable.

In the end, some judgment had to be exercised to produce the nonprivileged and relevant matter. But that also means that the “probably irrelevant or nonresponsive” data was not produced. I wondered about that “probably.”

And in whose eyes? Does a Requesting Party ever seek to learn the sampling strategy used by the Producing Party? What about the sampling parameters? What if the sampling protocol is loosey-goosey? What if the criterion for sampling by the Producing Party is a confidence level of only 90%, with an error factor (that is, a margin of error) of 10%? What if documents were misclassified as not relevant or not responsive when in fact they were relevant, or were documents that might lead to the discovery of admissible evidence? Wouldn’t you want to know?

Was the Producing Party’s sampling process transparent in any way? If this issue had been raised during the Rule 26(f) “meetings and conferences,” yes; but thinking back on that last paragraph from the EDRM Search Guide, I realized that Requesting Parties almost never ask the Producing Parties to disclose their processes, including the software they’ve used or their sampling protocols.

These considerations led me to think of propounding a second wave of requests, immediately after receiving documents from the initial request. The second wave would ask the Producing Party to exclude the exact duplicates, the system files, and the documents covered by the attorney-client or work product privileges, but then to produce all of the other ESI (in native format) that was collected from the appropriate custodians, during the appropriate timeframes, and regarding the stated issues in the case, but which was not previously produced.

This additional step might involve a second mountain of data, but then you have control of it, and you can search it using your own statistical protocols. In other words, you might treat this data as if it consisted of backup tapes. Most of the data will prove to be not relevant. You could search all of it. But if you sample it first, using a confidence level of 99%, with a 1% error factor, you may find nothing; if so, then perhaps there is nothing to find.
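How much comfort does “nothing found” really buy? Here is a minimal sketch, assuming a simple random sample drawn from a collection much larger than the sample, and assuming reviewers catch every relevant document they see. The function name `zero_hit_upper_bound`, the sample of 16,588 documents (roughly what the usual worst-case formula gives for 99% confidence and a 1% margin), and the one-million-document collection are all illustrative assumptions.

```python
def zero_hit_upper_bound(sample_size, confidence):
    """Upper bound on the relevant fraction when a random sample has zero hits."""
    if sample_size <= 0 or not 0 < confidence < 1:
        raise ValueError("need sample_size > 0 and 0 < confidence < 1")
    # If a fraction p of documents is relevant, the chance of drawing
    # no relevant document in n independent draws is (1 - p) ** n.
    # Setting that equal to (1 - confidence) and solving for p gives the bound.
    return 1 - (1 - confidence) ** (1 / sample_size)


if __name__ == "__main__":
    n = 16_588              # assumed sample size (99% confidence, 1% margin)
    collection = 1_000_000  # hypothetical size of the "second mountain"
    bound = zero_hit_upper_bound(n, 0.99)
    print(f"relevant fraction is at most about {bound:.4%}")
    print(f"which could still be about {round(bound * collection)} documents")
```

Under these assumptions, a clean sample still leaves room for a few hundred relevant documents in a million. That is a small fraction, but whether it amounts to “nothing to find” depends on what those few documents might say.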

But then again, your sampling may turn something up, and then you’ll want to search the “second mountain” more thoroughly. Perhaps in the data that you didn’t receive in the first place you will find the gold that you seek.

Thus, it may be vital to realize that somebody on the other side of the case decided that some amount of ESI was not relevant or not responsive, and so did not produce it. Here are the three easy steps: (1) during the Rule 26(f) process, ask the other side to disclose its processes and statistical sampling protocols; (2) after receiving data from the Producing Party, ask for the ESI that was not produced (not all of it; exclude the duplicates, the system files and the privileged data); and then (3) use your own sampling protocols on that data when you get it.
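Step (3) can be as simple as a documented, repeatable random draw. The sketch below is one way to do that, not a prescribed protocol. It assumes each document in the second production has a unique identifier (a control or Bates-style number, say); the function name `draw_sample`, the ID format, and the seed value are hypothetical.

```python
import random


def draw_sample(doc_ids, k, seed):
    """Return k document IDs chosen by simple random sampling.

    The same IDs, k, and seed always give the same sample, so the
    draw can be written down in a protocol and repeated later.
    """
    # Sort and de-duplicate so the result does not depend on load order.
    ordered = sorted(set(doc_ids))
    if k > len(ordered):
        raise ValueError("sample size exceeds number of documents")
    # A private generator avoids disturbing any other random state.
    return random.Random(seed).sample(ordered, k)


if __name__ == "__main__":
    # Hypothetical identifiers for the second-wave production.
    ids = [f"DOC-{i:07d}" for i in range(1, 50_001)]
    sample = draw_sample(ids, k=500, seed=20101206)
    print(len(sample), sample[:3])
```

Recording the seed and the sample size alongside the list of identifiers lets anyone with the same list reproduce the draw, which makes the protocol easier to explain and defend.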

It takes curiosity and persistence to operate effectively in this new world of e-discovery. And that includes remembering to ask for the ESI you didn’t get.

# # #

Nick Brestoff, M.S., J.D. is the Western Regional Director for Discovery Strategy & Management at International Litigation Services (www.ilsTeam.com), based in Los Angeles. E-mail: nbrestoff@ilsTeam.com. He gratefully acknowledges comments on the draft by e-discovery attorney Helen Marsh.