By Jay White, Vice President of eDiscovery Services

In part one of this series, we compared two document review workflows: reviewing native files and their metadata in a review platform, and reviewing a collection of PDF files. In this update, we will dive deeper into what a native file is, what metadata is, and how a native review can save considerable time and money.

A native file, in the context of eDiscovery, is a document in the original format in which it was created. For example, a document created in Microsoft Word is a native file as long as it is still a “.doc” (or “.docx”) file. Once it has been converted to PDF, it is no longer a native file: it has been altered and is now an imaged representation of the original. Many layers of its makeup have been stripped away or altered, the most important being the metadata. Accurate metadata is crucial to a successful and defensible document review. Most user-created data collected for discovery as electronically stored information (ESI) is collected in its native format, and native files are rich with metadata that you can use to find, and defensibly prove, the evidence needed to win your case.

So what is metadata? Metadata is data that provides context and additional information about other data. The easiest way to explain it is with email. Metadata tells you who authored an email, who received it, when it was sent and when it was received, whether it had attachments, what the subject was, and much more. Metadata can also tell you the geographic location where a picture was taken, when someone last opened and changed a document, or whether someone ever opened a particular email. This kind of information can be the key piece of evidence needed to win your case.
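To make the email example concrete, here is a minimal sketch that uses Python's standard-library `email` package to read a few of these metadata fields from a raw message. The message is made up for illustration, and `email_metadata` is a hypothetical helper; review platforms extract far more fields than this, using their own tooling.

```python
from email import message_from_string
from email.policy import default

# A made-up email with one attachment, used purely for illustration.
RAW_EMAIL = """\
From: Alice Example <alice@example.com>
To: Bob Example <bob@example.com>
Date: Tue, 05 Mar 2019 14:22:10 -0500
Subject: Q1 contract draft
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="XYZ"

--XYZ
Content-Type: text/plain; charset="utf-8"

Bob, the draft is attached.
--XYZ
Content-Type: application/pdf
Content-Disposition: attachment; filename="contract_draft.pdf"
Content-Transfer-Encoding: base64

JVBERi0xLjQK
--XYZ--
"""


def email_metadata(raw: str) -> dict:
    """Hypothetical helper: pull common metadata fields from a raw message."""
    # policy=default returns an EmailMessage with parsed, typed headers.
    msg = message_from_string(raw, policy=default)
    # iter_attachments() skips the body part and yields the attachments.
    attachments = [part.get_filename() for part in msg.iter_attachments()]
    return {
        "from": str(msg["From"]),
        "to": str(msg["To"]),
        "sent": msg["Date"].datetime.isoformat(),  # timezone-aware timestamp
        "subject": str(msg["Subject"]),
        "attachments": attachments,
    }


if __name__ == "__main__":
    for field, value in email_metadata(RAW_EMAIL).items():
        print(f"{field}: {value}")
```

Each value comes from a structured header field that can be searched and filtered; a PDF rendering of the same email would show this information only as printed text on a page.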

Converting your files to PDF strips away all of the original metadata. The new PDF has its own, very limited set of metadata, and what is there will not be of use to you as a reviewer. Your review options become very limited, and the usual workflow becomes opening and reading each file individually to locate what you need. This is a very inefficient way to review even a few hundred documents, and today’s cases involving ESI rarely stop at a few hundred: most run well into the thousands. You need the metadata from the native files to find the information you need efficiently.

Many think that working with ESI in a review tool is expensive, but it is truly a cost-saving tool. The first place you will feel the savings is during processing. Data processing takes raw data and converts it into the format you will use for your document review. There are generally two forms of processing: full processing and native processing. Native processing extracts a copy of all the metadata and extractable text from each file. Any file without extractable text is pushed through an optical character recognition (OCR) engine to create searchable text. A copy of the native file, along with its metadata and text, is then exported for use in a review tool. Full processing follows the same steps, except that every file that can be imaged is imaged before the data is exported. Imaging is very time-consuming and adds cost to a processing job, usually doubling it. The export can look very similar, with a set of natives, images, text, and metadata prepared for import into a review tool. Often, though, what is requested instead is a set of searchable PDF files.
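The two processing modes described above can be sketched as one routing function. This is a hypothetical Python illustration, not any vendor's actual pipeline: `SourceFile`, `ExportRecord`, `run_ocr`, and `process` are invented names, and `run_ocr` is a placeholder rather than a real OCR engine.

```python
from dataclasses import dataclass


@dataclass
class SourceFile:
    """Hypothetical stand-in for one collected native file."""
    name: str
    metadata: dict
    text: str              # extractable text; empty if none (e.g. a scan)
    imageable: bool = True


@dataclass
class ExportRecord:
    """Hypothetical record handed to a review tool on export."""
    name: str
    metadata: dict
    text: str
    text_source: str       # "extracted" or "ocr"
    image: bool = False    # True when an image of the file was rendered


def run_ocr(f: SourceFile) -> str:
    # Placeholder for an OCR engine; returns dummy text for illustration.
    return f"<ocr text for {f.name}>"


def process(files, full: bool = False):
    """Native processing by default; full=True adds the imaging step."""
    records = []
    for f in files:
        # Use extractable text when present, otherwise fall back to OCR.
        if f.text:
            text, source = f.text, "extracted"
        else:
            text, source = run_ocr(f), "ocr"
        records.append(ExportRecord(
            name=f.name,
            metadata=dict(f.metadata),   # copy of the native metadata
            text=text,
            text_source=source,
            image=full and f.imageable,  # only full processing images files
        ))
    return records


if __name__ == "__main__":
    files = [
        SourceFile("memo.docx", {"author": "A. Example"}, "Quarterly memo"),
        SourceFile("scan.tif", {}, ""),
    ]
    for mode in (False, True):
        imaged = sum(r.image for r in process(files, full=mode))
        print(f"full={mode}: {imaged} file(s) imaged")
```

The only difference between the two modes is the `full` flag, which adds an image for every imageable file; that extra imaging step is where the additional processing time and cost come from.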

I want to address some of the questions I hear from clients about native review and, more generally, about not having images. I often hear: images are needed for redaction and production, so why not create them upfront? It is true that you need images for redaction and production, but you don’t need to image everything, only what will be redacted and produced. Every modern review platform has a method to image documents “on the fly”. If you are reviewing a document and find it needs redaction, you can instantly image that one document and apply your redaction. There are also workflows for tagging documents that need redaction and imaging them in bulk. When it comes to production, imaging can be done in bulk for only the documents being produced, and the production set is generally a very small portion of the original review set.

Another concern is whether reviewers have all of the applications needed to view the natives, whereas PDF files can be viewed on any machine. If you are conducting a native review, the review platform will render the document for you in its native form. For example, you may not have a CAD viewer installed on your local machine, but within the review platform you will be able to view a CAD drawing as if you had that application. There really is no downside to reviewing documents in their native form.

Consider a scenario where a client has 20 GB of data. If full processing were used instead of native processing, both the cost and the time to process would double. Even with the cost of hosting added, native processing and hosting would still deliver considerable savings. Once you factor in the cost of review, there is no comparison: the time it takes to review the same data set within a review platform would be reduced by two-thirds or more.

Native processing:

Native processing of 20 GB at $125/GB = $2,500

Review License x 3 for 60 days = $510

Data Hosting x 20 GB at $20/GB per month for 60 days = $800

Total = $3,810

Full processing:

Full processing of 20 GB at $250/GB = $5,000

Total = $5,000
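The arithmetic behind the two totals can be checked in a few lines of Python. The hosting line assumes the $20/GB rate is charged per month, which is what the $800 figure for 60 days implies; the $510 license figure is used as quoted.

```python
DATA_GB = 20
MONTHS = 2  # 60 days of hosting and review

# Native processing workflow (rates from the estimate above)
native_costs = {
    "native processing": DATA_GB * 125,      # $125/GB
    "review licenses (x3)": 510,             # quoted total for 3 licenses, 60 days
    "data hosting": DATA_GB * 20 * MONTHS,   # assumed $20/GB per month
}

# Full processing workflow
full_costs = {"full processing": DATA_GB * 250}  # $250/GB

native_total = sum(native_costs.values())
full_total = sum(full_costs.values())

print(f"Native: ${native_total:,}")                # Native: $3,810
print(f"Full:   ${full_total:,}")                  # Full:   $5,000
print(f"Saved:  ${full_total - native_total:,}")   # Saved:  $1,190
```

Note that the native total already includes hosting and review licenses, while the full processing total covers processing alone; the $510 license line works out to $85 per license per month.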

If native processing were employed and the data imported into a modern review tool, a new world of opportunities would open. Consider that 20 GB of data would likely produce roughly 100,000 documents. If 3 reviewers each averaged 40 documents per hour, thoroughly reviewing and memorializing pertinent information on 100,000 documents in PDF format would take 2,500 hours in total, or in excess of 800 billed hours per reviewer. Using the full power of a modern review platform with analytics, reviewing the same 100,000 documents would on average take between 200 and 250 billable hours per reviewer. If the review takes advantage of technology-assisted review (TAR), that number could drop even lower. That is a massive saving for the law firm and its client.
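The review-time comparison can be worked through the same way. This sketch reads the 800-hour figure as a per-reviewer number (100,000 documents ÷ 40 per hour ÷ 3 reviewers) and assumes the 200-250 hour platform estimate is measured on the same per-reviewer basis.

```python
DOCS = 100_000
REVIEWERS = 3
PDF_DOCS_PER_HOUR = 40  # per reviewer, reading PDFs one by one

total_pdf_hours = DOCS / PDF_DOCS_PER_HOUR      # 2,500 hours across the team
pdf_hours_each = total_pdf_hours / REVIEWERS    # about 833 hours per reviewer

# Assumption: the 200-250 hour platform estimate is also per reviewer.
reductions = {h: 1 - h / pdf_hours_each for h in (200, 250)}

for hours, reduction in reductions.items():
    print(f"{hours} h vs {pdf_hours_each:.0f} h per reviewer: "
          f"{reduction:.0%} fewer hours")
```

Under that assumption, the reductions come to roughly 76% and 70%, both above the two-thirds reduction cited for the 20 GB scenario.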

In the following installments, we will walk through a workflow and highlight the functions of a modern review platform that simplify document review and increase its efficiency.

**Prices and review times above are general estimates and may vary depending on the data set, tool used, and vendor employed.**

A link to part one of this series is below.

If you have any specific questions on how to streamline your review, contact me at

Are you performing document review with PDF files? There is a better way, part 1.