The Litigator's Guide to eDiscovery for Law Firms
This guide covers how the eDiscovery process runs, where the cost concentrates, how AI is changing review, what keeps that review defensible in court, and how to decide what your firm should run.
eDiscovery is the process of identifying, preserving, collecting, reviewing, and producing electronically stored information when a matter heads into litigation or investigation. It's the part of a case where the discovery obligation meets digital reality. For most firms, it's where the largest share of litigation cost and the sharpest risk of sanctions both sit.
The job has grown harder for a plain reason. The volume of electronic data has outpaced the methods firms built to handle it, and a single routine matter now spans email, shared drives, chat tools, and phones. Reading all of it manually has become slow and expensive, which is why AI is reshaping the costliest stage of the work: review.
If you've sat through a big document review, you already know eDiscovery can swallow a budget in a hurry. The firms that handle it well treat it as a process they can plan for and get ahead of.
The eDiscovery Process Stage by Stage
eDiscovery runs on a sequence the industry standardized into a model called the Electronic Discovery Reference Model, or EDRM. The stages move in a familiar order, from identification through preservation, collection, processing, review, analysis, and finally production. The full model brackets these with two more stages, information governance at the front and presentation at the end, but the seven in between are where the discovery work actually happens. Each one carries its own risk, and your firm's exposure shifts as a matter moves along the chain. Matters also keep getting bigger, since email, chat tools, collaboration platforms, and mobile data have multiplied the sources a single case touches.
Identification
Your team works out what data might be relevant and where it lives: which custodians hold it, which systems store it, and how far back the records reach. The goal is to map what exists and where before anyone touches it. Miss something here and every stage after it inherits the gap.
Preservation
The legal hold goes out at this stage. A legal hold is the instruction that tells custodians to stop deleting anything that might be relevant, including the automatic purges email systems run on a schedule. The duty to preserve attaches the moment litigation becomes reasonably anticipated, and the courts take it seriously. Federal Rule of Civil Procedure 37(e) lets a judge order curative measures or sanctions when a party fails to take reasonable steps to preserve electronically stored information, or ESI, and the lost information can't be restored or replaced.
Collection
Here your team gathers the preserved data in a forensically sound way, with the metadata intact. Metadata, the record of when a file was created, sent, or changed, often matters as much as the content, so sound collection preserves it exactly as found. Done carelessly, collection can alter that record and hand the other side a reason to question the integrity of your evidence.
Processing
Next comes the work of turning the raw collection into something a team can review. The system removes duplicates, filters by date or keyword, and converts files into one consistent format. A matter that starts with millions of files can shrink considerably here, before anyone opens a single document.
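To make the processing step concrete, here is a minimal sketch of two of the filters described above: exact-duplicate removal and a date-range cut. It assumes each document is a simple record with a hypothetical `id`, `sent` date, and `body`, and it treats two files as duplicates only when their bytes match exactly. Commercial platforms apply more nuanced rules, so read this as an illustration of the idea and not as any product's method.

```python
import hashlib
from datetime import date

def content_hash(data: bytes) -> str:
    """Return a SHA-256 fingerprint of a file's bytes."""
    return hashlib.sha256(data).hexdigest()

def process(documents, start, end):
    """Keep one copy of each distinct file dated inside [start, end]."""
    seen = set()
    kept = []
    for doc in documents:
        if not (start <= doc["sent"] <= end):
            continue  # outside the agreed date range
        digest = content_hash(doc["body"])
        if digest in seen:
            continue  # exact duplicate of a file already kept
        seen.add(digest)
        kept.append(doc)
    return kept

# Hypothetical four-file collection, for illustration only
collection = [
    {"id": "A1", "sent": date(2021, 3, 1), "body": b"Q1 pricing memo"},
    {"id": "A2", "sent": date(2021, 3, 2), "body": b"Q1 pricing memo"},  # same content as A1
    {"id": "A3", "sent": date(2019, 1, 5), "body": b"Old newsletter"},   # out of range
    {"id": "A4", "sent": date(2021, 6, 9), "body": b"Board minutes"},
]

kept = process(collection, date(2021, 1, 1), date(2021, 12, 31))
print([d["id"] for d in kept])  # prints ['A1', 'A4']
```

Half of this toy collection drops out before review starts, which is the same effect, at a much larger scale, that shrinks a real matter at this stage.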
Review
This is the heart of the work, and the most expensive part of it. Lawyers read the documents to judge relevance, flag privilege, and code what matters to the case. It's also where AI now does the most, which the next sections take up in detail.
Analysis
Running alongside review, your team looks for patterns and builds the timeline of events. The documents get connected to the legal theory of the case, and the gaps that need more discovery start to show. Strong analysis is where a pile of files starts to tell you what actually happened.
Production
At the end, your firm hands the relevant, nonprivileged documents to the other side in an agreed format. This is where disputes over file formats surface and where a single privilege error can expose material you meant to protect. Getting the format and the privilege review right helps keep a production clean and avoids a costly mistake.
Where eDiscovery Costs Come From
One fact shapes the economics of eDiscovery more than any other. The biggest cost in most matters comes from a place that surprises people. Collection and storage cost real money, and so does the software, but the line that dominates the budget is paying lawyers to read the documents.
Document review runs up the bill because it scales with volume. More custodians and more data sources mean more documents, and more documents mean more reviewer hours at an hourly rate. The other stages matter, but none of them grows the bill the way review does. A 2012 study from the RAND Institute for Civil Justice put review at roughly 73% of production costs, well ahead of collection or processing.
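The scaling described above is simple arithmetic, and a short sketch shows how quickly it compounds. The reading pace and hourly rate below are placeholder assumptions chosen for round numbers, not benchmarks from RAND or anyone else.

```python
def review_cost(documents: int, docs_per_hour: float, hourly_rate: float) -> float:
    """Linear first-pass review cost: reviewer hours times the hourly rate."""
    hours = documents / docs_per_hour
    return hours * hourly_rate

# Illustrative assumptions only: 50 documents per hour at $60 per hour
for volume in (100_000, 200_000, 400_000):
    print(volume, review_cost(volume, docs_per_hour=50, hourly_rate=60))
```

Under these assumptions, 100,000 documents cost $120,000 to read once, and every doubling of volume doubles the bill. That linear relationship is what makes review dominate the budget.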
For your firm, this cuts two ways. Review has long been billable work, a steady source of associate hours and a cost clients paid without much argument. Those days are fading, since clients now scrutinize eDiscovery spend closely and many won't absorb a large review bill without a clear justification for the hours.
The courts add their own pressure through the principle of proportionality. Federal Rule of Civil Procedure 26(b)(1) ties the scope of discovery to what's proportional to the needs of the case, weighing the stakes, the resources of each side, and whether the burden of the discovery outweighs its likely benefit. Your firm can't review everything simply because everything exists, so the effort has to fit the case.
All of which explains why AI reached document review before any other stage. When the most expensive part of a process is also the most repetitive, it becomes the first place a firm looks to work faster and for less. The move from manual review to AI-assisted review is an economic event as much as a technical one, and the rest of this guide treats it that way.
From Keyword Search to Generative AI Review
Document review has moved through three methods over the past two decades, and each one has pushed legal tech forward by removing a limit the one before it couldn't. The progression runs from manual review, to technology-assisted review, to the large language models arriving now. Understanding how each works is the best way to see why generative AI matters so much for review, and where its limits still lie.
Manual review came first, lawyer by lawyer, document by document. For smaller matters it still works, and for the documents that decide a case, careful human reading remains the standard. At scale, though, it runs into simple arithmetic, since a team can only read so many documents in a day and a large matter can hold millions.
Technology-assisted review came next, and it became the workhorse of large-scale review. Lawyers call it TAR, or predictive coding. The idea is straightforward: a senior lawyer reviews a sample set and codes each document as relevant or not, then a machine-learning model studies those decisions and applies the same judgment across the full set, ranking documents by likely relevance. TAR cut review volumes sharply, and the courts accepted it. It takes real expertise to run well, though, because a defensible process needs a carefully built training set, several rounds of coaching the model, and statistical validation to show the results hold up.
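For readers who want to see the mechanics, the sketch below shows the core TAR loop in miniature: learn from a lawyer-coded seed set, then rank unreviewed documents by likely relevance. It uses a toy naive-Bayes-style word score written for this guide, and the seed documents and IDs are invented. It is not the algorithm of any particular TAR product, and it leaves out the repeated training rounds and statistical validation a real workflow requires.

```python
import math
from collections import Counter

def tokenize(text):
    return text.lower().split()

def train(seed_set):
    """Count word frequencies in relevant vs. not-relevant seed documents."""
    counts = {True: Counter(), False: Counter()}
    for text, relevant in seed_set:
        counts[relevant].update(tokenize(text))
    return counts

def score(counts, text):
    """Sum smoothed log-odds per known word; higher means more likely relevant."""
    vocab = set(counts[True]) | set(counts[False])
    total_rel = sum(counts[True].values())
    total_not = sum(counts[False].values())
    total = 0.0
    for word in tokenize(text):
        if word not in vocab:
            continue  # words never seen in the seed set carry no signal
        p_rel = (counts[True][word] + 1) / (total_rel + len(vocab))
        p_not = (counts[False][word] + 1) / (total_not + len(vocab))
        total += math.log(p_rel / p_not)
    return total

# Hypothetical seed set coded by a senior lawyer (True = relevant)
seed_set = [
    ("pricing agreement with competitor", True),
    ("discuss competitor pricing call", True),
    ("lunch menu for friday", False),
    ("office parking reminder friday", False),
]

# Hypothetical unreviewed documents
unreviewed = {
    "D1": "friday parking update",
    "D2": "notes from competitor pricing call",
    "D3": "lunch menu friday",
}

model = train(seed_set)
ranked = sorted(unreviewed, key=lambda d: score(model, unreviewed[d]), reverse=True)
print(ranked)  # prints ['D2', 'D1', 'D3']
```

The document about a competitor pricing call rises to the top because it shares vocabulary with what the lawyer marked relevant. Reviewers then work down the ranked list, and each round of new coding can be fed back in as additional training.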
Generative AI review is the newest method, built on the large language models that have reached legal work in the past few years. The difference is practical. A lawyer writes review instructions in plain language, much the way they'd brief a junior associate, and the model applies them across the documents while showing its reasoning for each call. That plain-language control is most of what using AI as a lawyer comes down to: giving clear instructions and checking the work. The seed-set training cycle that TAR depends on falls away, and the review criteria become as flexible as the questions a lawyer can think to ask.
This is where software built specifically for legal work earns its place. Harvey, for one, applies large language models to reviewing and analyzing large sets of documents, and it grounds every answer in the source material, so the reviewing attorney can check each conclusion against the underlying text. Tools made for legal work are built around that kind of verification, where every answer needs to trace back to a document and hold up under challenge. The same large language models now reach other document-heavy work, from contract analysis to contract drafting, though AI for legal discovery is where the sheer volume makes the payoff clearest.
A caution belongs here, and it isn't small. AI-assisted review doesn't take the lawyer out of the loop. The model proposes, and the attorney disposes, through sampling, spot-checks, and validation. When a production goes out the door, your firm certifies it, and a court holds your firm accountable for what it contains, so the technology changes how the work gets done while the question of who answers for it stays exactly where it always was.
Defensibility and Proportionality When AI Reviews Documents
The first question most litigators ask about AI review is whether it'll survive a challenge from the other side or a skeptical judge. The reassuring answer is that this isn't new ground for the courts, and the principles for defending technology-assisted methods are well settled. A defensible process is a documented and validated one, and that standard hasn't changed just because the tool has.
Start with the precedent. In Da Silva Moore v. Publicis Groupe (2012), a federal magistrate judge in New York became the first to approve the use of predictive coding in discovery. The court approved the method on the ground that the process was reasonable, and reasonableness has been the standard ever since, with later courts continuing to accept technology-assisted methods wherever the party using them can show a sound, transparent approach.
Proportionality is the second pillar, and it works in your firm's favor. Federal Rule of Civil Procedure 26(b)(1) limits discovery to what's proportional to the needs of the case, which means the discovery effort should match the stakes. A firm that uses AI to review faster and more consistently is doing what the rule asks. Used well, AI review strengthens the proportionality argument, because the firm can show how its effort was scaled to the case.
Certification closes the circle. Under Rule 26(g), an attorney who signs a discovery response certifies, after a reasonable inquiry, that it's consistent with the rules, not made for an improper purpose, and not unreasonable or unduly burdensome given the needs of the case. The software can't make that certification, so a lawyer makes it and stands behind the production, which is why attorney oversight runs through every step of AI-assisted review.
For your firm, that translates into a short list. Document the methodology you used, validate the results with sampling, keep a record you can show opposing counsel and the court, and keep an attorney accountable for the production.
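As one way to picture the sampling step in that list, the sketch below draws a reproducible random sample and compares the model's calls with an attorney's calls on the same documents. The function names, the sample size, and the coded results are all hypothetical. A real validation protocol would also set target thresholds and report confidence intervals, which this sketch leaves out.

```python
import random

def draw_sample(doc_ids, size, seed):
    """Draw a simple random sample; a recorded seed lets the draw be reproduced."""
    return random.Random(seed).sample(doc_ids, size)

def precision_recall(coded_sample):
    """coded_sample holds (model_says_relevant, attorney_says_relevant) pairs."""
    tp = sum(1 for model, attorney in coded_sample if model and attorney)
    fp = sum(1 for model, attorney in coded_sample if model and not attorney)
    fn = sum(1 for model, attorney in coded_sample if not model and attorney)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Step 1: pick 200 documents at random from a hypothetical 10,000-document set
sample_ids = draw_sample(list(range(10_000)), 200, seed=42)

# Step 2: an attorney codes the sample; these results are invented for illustration
coded = (
    [(True, True)] * 40      # model and attorney both say relevant
    + [(True, False)] * 10   # model over-called
    + [(False, True)] * 10   # model missed a relevant document
    + [(False, False)] * 140 # both say not relevant
)

precision, recall = precision_recall(coded)
print(f"precision={precision:.2f} recall={recall:.2f}")  # prints precision=0.80 recall=0.80
```

Recording the seed, the sample size, and the resulting figures gives your firm the kind of documented, repeatable record it can show opposing counsel and the court.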
AI Review and Associate Development
A quieter consequence of all this has nothing to do with cost or court rules. For a generation of associates, document review was the apprenticeship, the place they learned the facts of a matter and the texture of the evidence. Living inside the documents taught them how a case fits together, what a privileged communication looks like, and how a paper trail becomes a story a court will believe.
When AI handles the first pass, that classroom changes. The hours a young lawyer once spent reading routine documents shrink, and the real question for firms becomes how associates still build the case knowledge and judgment that review used to teach. The old path narrows, so a new one has to be built on purpose.
The encouraging part is that the work left for people is the work worth learning. Someone has to frame the review criteria the model follows, which takes a real grasp of the legal theory. The close calls the model flags as uncertain still need a human to resolve them, and the case narrative still has to be built from what review surfaces, which was always the part that mattered most. The shift changes what associates learn, and it puts the responsibility on the firm to make the new pathway intentional.
Choosing How Your Firm Runs eDiscovery
Before your firm picks a tool, it faces a more basic decision, which is who actually runs the eDiscovery work. Three models are common, and most firms use some blend of them depending on the matter and the client.
Some firms run eDiscovery in-house, with a dedicated litigation support team handling preservation, processing, and review under the firm's roof. This gives the most control and tends to suit firms with steady, heavy litigation volume. Other firms hand the work to an outside eDiscovery provider, paying specialists to manage the data and the platform while the firm's lawyers focus on the review and the case. A third model has grown more common as corporate clients build their own capabilities, where the client's in-house team handles preservation and collection and the firm takes over for review and analysis.
Once that's settled, the choice of eDiscovery software comes into focus. Like most types of legal software, it rewards a careful look, and a handful of criteria separate the platforms worth your time from the rest.
Security comes first, because eDiscovery data is some of the most sensitive material your firm handles. Look for independent security certifications, clear answers on where the data is stored, sound hosting and data handling, and a firm commitment that your data won't be used to train anyone's models. Integration matters nearly as much. The platform has to fit the legal document management software, legal knowledge management, and litigation management software your firm already runs. A tool that can't take in the sources a matter touches adds work at exactly the wrong moment.
The strength of AI-assisted review has become a real differentiator, and firms weighing it for discovery often weigh it for the contract review process too. Ask how the platform handles large-scale review, whether it grounds its conclusions in the source documents, and how easily a lawyer can validate and report on what it produces, since the defensibility you'll need in court starts with the software you choose. Finally, look hard at total cost, because eDiscovery pricing often combines hosting fees, per-gigabyte processing charges, and user licenses, and the headline number rarely tells the whole story.
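To see why the headline number can mislead, it helps to break a quote into the three components named above. The sketch below does that with invented figures. The rates, volumes, and the `quote_total` helper are assumptions for illustration, and real vendor pricing structures vary.

```python
def quote_total(hosted_gb, hosting_rate, processed_gb, processing_rate, users, license_fee):
    """Split a quote into hosting, processing, and license costs, then sum them."""
    parts = {
        "hosting": hosted_gb * hosting_rate,
        "processing": processed_gb * processing_rate,
        "licenses": users * license_fee,
    }
    parts["total"] = sum(parts.values())
    return parts

# Hypothetical quote: 500 GB hosted at $10/GB, 800 GB processed at $25/GB,
# and 12 user licenses at $100 each
quote = quote_total(500, 10, 800, 25, 12, 100)
print(quote)
# prints {'hosting': 5000, 'processing': 20000, 'licenses': 1200, 'total': 26200}
```

In this made-up example a low advertised hosting rate accounts for less than a fifth of the total, which is the kind of gap worth asking a vendor about before signing.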
The Edge Hiding in Discovery
As AI compresses the cost of reading documents, the value of legal work moves to the places AI can't reach: the judgment to frame the right questions, the skill to defend a method under pressure, and the insight to turn what review surfaces into strategy. The firms that come out ahead will be the ones that let AI absorb the repetitive work and pour the freed-up time back into the parts of lawyering that demand a human mind.
eDiscovery turns out to be an early test of that shift, since it's where the volume is highest and the payoff for getting it right is most concrete. Handled well, it turns from a drain on law firm productivity into one of the clearest places for your firm to build a real edge.
Harvey was built for this kind of work, applying domain-specific legal AI to the review, analysis, and drafting that fill a litigator's day, with every output grounded in sources a lawyer can verify. If you'd like to see what that looks like inside your own matters, request a demo.