NEAR DUPLICATE DOCUMENT IDENTIFICATION

How It Works

Near Duplicate Document Identification (near-dup) builds on CAAT’s ability to dynamically cluster conceptually similar documents. It uses the CAAT engine to measure precisely how closely these documents duplicate or repeat one another.

CAAT’s near-dup capability is virtually automatic. It organizes documents according to whether they are exact duplicates of each other or very nearly so, and a mathematical score shows the degree to which near-dup documents vary.
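
To make the idea of a similarity score concrete, here is a minimal sketch in Python. It is not CAAT’s algorithm, which this page does not describe; it assumes a common textbook approach (Jaccard similarity over overlapping word "shingles"), and the function names and the shingle size k are hypothetical choices made for illustration.

```python
import re


def shingles(text, k=3):
    """Return the set of overlapping k-word sequences in text, lowercased."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < k:
        # Texts shorter than k words become a single shingle (or none).
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def similarity(a, b, k=3):
    """Jaccard similarity of two texts: 1.0 means identical shingle sets,
    0.0 means no shared shingles. Illustrative only, not CAAT's score."""
    sa, sb = shingles(a, k), shingles(b, k)
    if not sa and not sb:
        return 1.0  # two empty documents are treated as identical
    return len(sa & sb) / len(sa | sb)


score = similarity("the quick brown fox jumps", "the quick brown fox leaps")
print(score)
```

Under this assumed measure, exact duplicates score 1.0, and the score falls as more of the wording differs, which is the kind of gauge the description above refers to.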

How It Is Used in eDiscovery

Review costs are the single largest eDiscovery expense. Eliminating exact duplicates addresses only part of the review challenge. Near-dups pose a different problem: although their text is largely repetitive, each document must still be reviewed separately, at full cost, unless the review tool presents them to the reviewer as a group.

CAAT’s near-dup document identification solves this challenge: highly similar documents are grouped together for review, and the user is given a mathematical “gauge” of just how similar or repetitive they are. The reviewer can make document determinations en masse, with potential savings of hundreds or thousands of hours compared with a traditional linear review.
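
The grouping step can be sketched as follows. This is an illustration under stated assumptions, not CAAT’s implementation: it scores each pair of documents with Python’s standard difflib.SequenceMatcher, links any pair whose score meets a hypothetical threshold, and returns the resulting groups so that each could be reviewed together. The function name group_near_dups and the default threshold are invented for the example.

```python
from difflib import SequenceMatcher


def group_near_dups(docs, threshold=0.9):
    """Group document indices whose pairwise similarity meets the threshold.

    Uses a simple union-find so that chains of similar documents end up
    in the same group. Compares every pair, so it suits small sets only.
    """
    parent = list(range(len(docs)))

    def find(i):
        # Follow parent links to the group's root, shortening the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(docs)):
        for j in range(i + 1, len(docs)):
            score = SequenceMatcher(None, docs[i], docs[j]).ratio()
            if score >= threshold:
                parent[find(j)] = find(i)  # merge the two groups

    groups = {}
    for i in range(len(docs)):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())


docs = [
    "Please review the attached contract by Friday.",
    "Please review the attached contract by Monday.",
    "Lunch menu for the cafeteria",
]
print(group_near_dups(docs, threshold=0.8))
```

In this sketch the two contract emails fall into one group and the unrelated document stands alone, so a reviewer could make one determination for the pair instead of two.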

 
