In Part 1 and Part 2 we discussed native review, specifically why native review is a better option than reviewing a collection of PDF files. In Part 3, we will discuss setting up a database for document review within a modern document review platform. The real power of a review platform is that it allows you to organize, memorialize, collaborate on, recall, and produce documents very efficiently from anywhere you have an internet connection. To fully capitalize on those functions, the database has to be prepared so that it takes advantage of the data and supports the workflow that works best for you.
Building the actual database is usually handled by your chosen vendor or internal administrative team. In general, there should be a conversation to discuss the makeup of the data, the important key terms, privilege terms, legal issues, and general workflow. If the reviewers are new to native review using a review platform, I typically start with our standard template, which covers about 98% of the needs of most document reviews. We will go over the template to explain what constitutes a good setup for a database. We use Ipro Eclipse at my company, but the information will translate to just about any other modern review platform.
The first thing I start with is a list of fields. You want to make sure that you have fields in the database to hold all of the available metadata that is relevant to the matter. Each field also needs to be formatted in the way that best represents the type of data stored within it. For example, if the data was processed so that dates display as MM/DD/YYYY, then the field holding a date needs to be formatted in that manner. Most importantly, only the fields that contain data relevant to searching should be indexed, to avoid false hits. Searching across the database to quickly locate information is the most used function, and the results need to be reliable. Indexing the wrong fields can make search results unreliable, and the best time to avoid that is when you are setting up the fields during database creation. Below is an example of a standard list of fields. There are literally hundreds of additional metadata fields that can be added depending on the processed data and the nature of the document review. There are also special analytics fields that can be added for use with features such as email threading and near-duplicate detection.
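To make the idea of a field list concrete, here is a minimal sketch in Python. It is not how Ipro Eclipse or any other platform stores its fields; the class `FieldDef`, the field names, and the helper functions are all hypothetical and only illustrate the three decisions described above: the field's type, its date format, and whether it is indexed.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class FieldDef:
    name: str                        # hypothetical field name
    kind: str                        # "text" or "date"
    indexed: bool = False            # include in the full-text search index?
    date_format: str = "%m/%d/%Y"    # MM/DD/YYYY, used only when kind == "date"


# Illustrative template only; a real matter may carry many more fields.
FIELDS = [
    FieldDef("DocText", "text", indexed=True),
    FieldDef("Subject", "text", indexed=True),
    FieldDef("DateSent", "date"),
    FieldDef("Custodian", "text"),   # left unindexed; filter on it instead
]


def parse_value(field: FieldDef, raw: str):
    """Convert a raw load-file value to the type the field expects."""
    if field.kind == "date":
        # Raises ValueError if the value does not match the declared format.
        return datetime.strptime(raw, field.date_format).date()
    return raw


def indexed_field_names(fields):
    """Names of the fields that will feed the search index."""
    return [f.name for f in fields if f.indexed]
```

In this sketch a date that arrives in the wrong format fails loudly at import rather than being stored as unsortable text, which is the practical reason to match the field format to the way the data was processed.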
The next feature to set up is the initial tags. You tag documents to organize, memorialize, and quickly recall groups of documents. Once documents have been tagged, the tags can be used within advanced searches, to isolate a set of documents for printing, to create smart folders, and for many other functions. Tags can be created for simple workflows, or multi-layered tags with rules can be created for a more complex workflow. In the below screenshot from my training database, there are tags to denote responsiveness, privilege, confidentiality, issues, and sets of terms. Notice that Responsiveness is in red and Privilege is highlighted in yellow. This indicates rules, and rules are used to build and enforce a workflow. The red for Responsiveness denotes a mandatory tag, meaning that during review one of the tags must be selected to move to the next document. If eyes have been laid on a document, it should be tagged. If a decision cannot be made, the tag Needs Further Review should be selected. The yellow highlight signifies that the entire family for that record will be tagged with the same tag. Also note that some are radio buttons and some are check boxes. Radio buttons are exclusive tags: a document cannot be both responsive and not responsive. The check boxes are not exclusive: a document can be relevant to multiple issues.
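The three tag rules described above (mandatory, exclusive, and family-wide) can be sketched in a few lines of Python. This is only a model of the behavior, not the platform's implementation; the `Doc` class, the tag names other than Needs Further Review, and the functions `apply_tag` and `can_advance` are assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    doc_id: str
    family_id: str                       # an email and its attachments share one
    tags: set = field(default_factory=set)


# Radio-button group: exactly one may be set, and one is required (assumed names).
RESPONSIVENESS = {"Responsive", "Not Responsive", "Needs Further Review"}
# Tags that are pushed to every member of the document's family (assumed name).
FAMILY_TAGS = {"Privileged"}


def apply_tag(docs, doc_id, tag):
    """Apply a tag, enforcing the exclusive and family rules."""
    target = next(d for d in docs if d.doc_id == doc_id)
    if tag in FAMILY_TAGS:
        targets = [d for d in docs if d.family_id == target.family_id]
    else:
        targets = [target]
    for d in targets:
        if tag in RESPONSIVENESS:
            d.tags -= RESPONSIVENESS     # radio button: replace any earlier choice
        d.tags.add(tag)


def can_advance(doc):
    """Mandatory rule: a responsiveness decision is required to move on."""
    return bool(doc.tags & RESPONSIVENESS)
```

With an email and its attachment in one family, tagging the email as Privileged in this model also tags the attachment, while the reviewer still cannot move on until one of the responsiveness tags is chosen.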
Multi-level tags like the one in the picture to the right make it simple to organize complex or multi-faceted issues. The tag below tracks various types of spills over multiple locations. Once the documents are tagged, retrieving documents for each type of spill, location, or any combination is only a few clicks away.
The Non-Toxic Spills tag is just two levels deep: the fact that it is a Non-Toxic Spill and the location of the spill. The Radio-Active Spills tag goes three levels deep, adding the year of the spill for each location. Note that the actual check box is at the lowest level; if you tag a Radio-Active Spill in the year 2012 under Location A, the information will be captured at all of the levels above it.
This will allow you to retrieve all Location A documents no matter the year, or pick and choose the year, location, or combination. Tags can be used in advanced searches or mirrored in smart folders.
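One way to picture how a multi-level tag captures information at every level is to treat each tag as a path and record the document under every prefix of that path. The sketch below does this in Python; the functions `tag_leaf` and `docs_at` and the dictionary layout are my own assumptions for illustration, not the platform's internals.

```python
def ancestors(path):
    """All prefixes of a tag path, from the top level down to the leaf."""
    return [tuple(path[:i]) for i in range(1, len(path) + 1)]


def tag_leaf(index, doc_id, path):
    """Tag a document at the lowest level and record it at every level above."""
    for prefix in ancestors(path):
        index.setdefault(prefix, set()).add(doc_id)


def docs_at(index, path):
    """Documents captured at a given level of the tag tree."""
    return index.get(tuple(path), set())


# Illustrative use with made-up document IDs.
index = {}
tag_leaf(index, "DOC-1", ("Radio-Active Spills", "Location A", "2012"))
tag_leaf(index, "DOC-2", ("Non-Toxic Spills", "Location A"))

# Every Radio-Active Spill at Location A, regardless of year.
location_a = docs_at(index, ("Radio-Active Spills", "Location A"))
```

Because the document is recorded at each level, asking for the broader level never requires listing every year underneath it.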
Smart folders are next up on the list of features to set up when creating a case. I want to capture work product starting with the first document reviewed. Smart folders can really be set up at any time, but there are a few that are always helpful to have available from the beginning. Smart folders are live folders that can be based on fielded data, tags, or a combination using advanced search. In the picture below, we have an example of all three methods. Under All Documents, the first smart folder we see is the Custodian folder. This was created using fielded data, specifically the data custodian field that was captured at the data collection or processing phase. We can easily click on any custodian to navigate to their subset of data. The next three are based on tags and will mirror the information recorded by the tagging process. This is most helpful when you want to go to a direct subset of data with one click. The final smart folder visible, under Document Subset, is a combination of custodian Ariana Akers and the tagged hot documents. This could also be accomplished with multi-level tags, but since the custodian data was populated at import, I would choose to take advantage of it and tag only one level.
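A smart folder behaves like a saved search that is re-run every time you open it. The Python sketch below models the three kinds described above: one based on fielded data, one on a tag, and one combining both. The folder names, the "Hot" tag, the second custodian, and the function `open_folder` are hypothetical; the real feature is configured through the platform's interface, not code.

```python
# Made-up records; "Custodian B" and the document IDs are placeholders.
DOCS = [
    {"id": "DOC-1", "custodian": "Ariana Akers", "tags": {"Hot"}},
    {"id": "DOC-2", "custodian": "Ariana Akers", "tags": set()},
    {"id": "DOC-3", "custodian": "Custodian B", "tags": {"Hot"}},
]

# Each smart folder is a saved condition, not a stored list of documents.
SMART_FOLDERS = {
    "Custodian: Ariana Akers": lambda d: d["custodian"] == "Ariana Akers",
    "Hot Documents": lambda d: "Hot" in d["tags"],
    "Ariana Akers: Hot": lambda d: (
        d["custodian"] == "Ariana Akers" and "Hot" in d["tags"]
    ),
}


def open_folder(name, docs):
    """Evaluate the folder's condition against the current state of the data."""
    condition = SMART_FOLDERS[name]
    return [d["id"] for d in docs if condition(d)]
```

Because the condition is evaluated on open, tagging DOC-2 as hot later would make it appear in the combined folder with no further setup, which is what "live" means here.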
At this point the case is ready for importing the data. The processed native data will be matched up with the corresponding fields and imported for review. The data will then need to be indexed to make it searchable. The fields that were designated for indexing will be available for searching once indexing is complete. The main fields are the document text or email body and select metadata fields such as from, to, cc, bcc, subject, file name, email subject, etc. Fields that may or may not be indexed are fields like origfolder path, custodian, domain, etc. The reasons vary and, depending on the data set, may not always apply. For origfolder path, we often see companies label portions of the directory with a company name. If that field is indexed and you search for the company name, all of those documents would be returned, many of them only because they were located within that directory. They may not have the term within the document itself. In the example of custodian, if you are searching for a person by name, every single document collected from their control would be returned. For domain, all documents from a specific company could be returned. If you are going through your client's email data during review, it's likely that a search for their company name would bring back every email collected. You may be better off using the filter-by-field function to search or cull down information within those specific fields. Once your index is complete, you will have a set of data that is searchable and highly accurate to your search terms.
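The false-hit problem is easy to reproduce with a toy index. The Python sketch below builds a simple word index over whichever fields are designated for indexing and shows what happens when the folder path is included. The company name "Acme", the field names, and the functions are all made up for illustration; a real platform's indexing engine is far more sophisticated.

```python
import re

# Made-up records: only DOC-1 actually mentions the company in its text.
DOCS = [
    {"id": "DOC-1", "body": "Acme agreed to the revised contract.",
     "origfolder_path": "Shares/Legal/contract.docx"},
    {"id": "DOC-2", "body": "Lunch is at noon on Friday.",
     "origfolder_path": "Shares/Acme/HR/lunch.msg"},
]


def build_index(docs, indexed_fields):
    """Map each lowercase word to the IDs of documents containing it."""
    index = {}
    for doc in docs:
        for name in indexed_fields:
            text = str(doc.get(name, "")).lower()
            for word in re.findall(r"[a-z0-9]+", text):
                index.setdefault(word, set()).add(doc["id"])
    return index


def search(index, term):
    """Look up a single word in the index."""
    return sorted(index.get(term.lower(), set()))


def filter_by_field(docs, name, value):
    """Cull on one field directly instead of going through the index."""
    return [d["id"] for d in docs if value.lower() in str(d.get(name, "")).lower()]


body_only = build_index(DOCS, ["body"])
with_path = build_index(DOCS, ["body", "origfolder_path"])
```

Searching `body_only` for the company name returns only the document that mentions it, while the same search against `with_path` also pulls in the lunch email because of where it was stored. The path information is not lost by leaving it out of the index; `filter_by_field` still reaches it when you deliberately want documents from that directory.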
Key term management is incredibly useful throughout the entire life cycle of a case, but especially at the beginning. More often than not, a set of terms has been approved by all parties for locating responsive or relevant documents. Knowing how many documents contain each term, and being able to sort the documents by term, makes the review less daunting. Documents can be grouped into subsets and batched out to the reviewers who have the most knowledge of a particular issue. Priority can be given to certain documents, which allows issues and concepts to be developed earlier in the review. Another use is privilege terms, which allow priority review of privileged documents or allow the entire privilege review to be batched out to a specific team.
Multiple sets of terms can be set up and the results can be reviewed on the fly to assist in adjusting the terms. If additional data is added, the set can be updated to see the new total.
During the process, a data field and a tag group with a tag for each term are created. This allows you to see each term that a record contains in the grid view, as shown below, or to search for the term with the tag or smart folder utilities.
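For readers who like to see the mechanics, here is a rough Python sketch of what a key term report produces: the list of terms each document hits, and a count of documents per term. The sample terms and documents are invented, and the functions `term_hits`, `term_counts`, and `batch_by_term` are hypothetical; the platform's own key term feature handles this for you and supports richer search syntax than the whole-word matching assumed here.

```python
import re
from collections import Counter


def term_hits(docs, terms):
    """For each document, list the terms found (whole word, case-insensitive)."""
    hits = {}
    for doc_id, text in docs.items():
        hits[doc_id] = [
            t for t in terms
            if re.search(rf"\b{re.escape(t)}\b", text, re.IGNORECASE)
        ]
    return hits


def term_counts(hits):
    """Number of documents that contain each term."""
    return Counter(t for found in hits.values() for t in found)


def batch_by_term(hits):
    """Group document IDs by term so each group can go to a specific reviewer."""
    batches = {}
    for doc_id, found in hits.items():
        for t in found:
            batches.setdefault(t, []).append(doc_id)
    return batches


# Invented sample data.
DOCS = {
    "DOC-1": "Report on the spill at Location A",
    "DOC-2": "Attorney-client memo about the spill",
    "DOC-3": "Lunch menu",
}
TERMS = ["spill", "attorney-client"]
HITS = term_hits(DOCS, TERMS)
```

The per-document list in `HITS` plays the role of the term field shown in the grid view, and the groups from `batch_by_term` correspond to the subsets you would batch out, for example sending the privilege-term hits to a dedicated team.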
Once you have your database set up, your data imported, and your tags, smart folders, and key terms in place, you are ready to start reviewing documents. There are more steps and other features that may need to be set up by the administrator; however, these are the key things that the lead reviewer should understand before the review starts. Next time, we will go over simple searches, advanced searches, data filtering, actually tagging the data, updating smart folders, and coding or mass editing records.
At Express Network we use Ipro Eclipse for hosted review. Setting up the features described above in a different review tool will be very similar. We have extensive knowledge of most review platforms and can assist you in preparing and managing data in almost any platform. If you have any questions, please feel free to contact me at email@example.com