Content Analyst's CAAT transforms large volumes of unstructured data into organized, relevant information, and exposes insights hidden in the data. The CAAT platform is a dynamic suite of technologies known as Text Analytics. It provides organization tools for classification and email analysis; concept search; and other text analytics capabilities that automate most of the human activity traditionally associated with using unstructured data.
Typically, Text Analytics offerings are collections of disparate software components from different vendors. For companies developing solutions using Text Analytics, these platforms are inefficient and difficult to incorporate. They aren't optimized to be used together, they don't leverage a common index structure, and they require dealing with different vendors.
In contrast, CAAT was designed from the ground up for integration and efficiency. It performs a variety of functions on the same basic index structure, and makes it easy to add new capabilities to your offering as the need arises.

Dynamic Clustering—CAAT takes an entire collection of information and automatically sorts it into folders and sub-folders by conceptual topics, even creating titles for each folder. This quickly organizes information in a logical fashion based on what it's about, not the words in it.
Benefit—Researchers and reviewers can narrow in on only the information that is relevant to them, and extraneous information can be discarded before it consumes valuable time, space, and resources.

Concept-based Categorization—CAAT groups documents based on content, whether or not the same words are used to describe the same topics or concepts.
Benefit—Users can quickly locate information that is related and relevant, avoiding the typical flood of "keyword-responsive" documents that aren't on-topic. This accelerates and lowers the cost of legal review, streamlines enterprise content management, and helps sort through social media content.

Conceptual Search—CAAT search mimics the way people think: by topics or concepts rather than keywords. CAAT can use a phrase, an entire sentence, or even a document to find other information that is conceptually similar.
Benefit—Whereas two-thirds of keyword searches fail because they are overly inclusive or don't find the right information, concept searches find the information most relevant to the query. Because queries are written in natural language, or even cut and pasted from actual documents, searching is faster and accuracy increases severalfold.

Summarization—CAAT uses its own notion of concepts to evaluate an entire document, sentence by sentence, and find the sentences that are most relevant to the overall gist of the document. It then organizes these sentences into a summary. CAAT identifies what a document is about from its content rather than from titles and author-provided summaries, which are often misleading.
Benefit—Researchers and reviewers can quickly and accurately determine if a document is relevant to their queries, and if so, which parts are most relevant.
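To make the sentence-by-sentence idea concrete, here is a minimal Python sketch of extractive summarization. It is not CAAT's implementation: CAAT scores sentences against its own concept representation, whereas this sketch substitutes simple word-frequency scoring as a stand-in. The function names (summarize, split_sentences, tokenize) and the stop-word list are assumptions made purely for illustration.

```python
import re
from collections import Counter

# Hypothetical stop-word list, kept tiny for illustration only.
STOP_WORDS = {"the", "a", "an", "of", "to", "and", "is", "in", "it",
              "for", "on", "that", "this", "with", "was", "are"}


def split_sentences(text):
    """Split on '.', '!' or '?' followed by whitespace (a rough heuristic)."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def tokenize(sentence):
    """Return lowercase word tokens with stop words removed."""
    words = re.findall(r"[a-z']+", sentence.lower())
    return [w for w in words if w not in STOP_WORDS]


def summarize(text, max_sentences=2):
    """Pick the sentences whose words are most frequent in the whole document.

    Word frequency stands in for the 'overall gist'; a concept-based system
    would score sentences in a semantic space instead.
    """
    sentences = split_sentences(text)
    doc_freq = Counter(w for s in sentences for w in tokenize(s))

    def score(sentence):
        words = tokenize(sentence)
        if not words:
            return 0.0
        # Average document frequency of the sentence's words.
        return sum(doc_freq[w] for w in words) / len(words)

    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(sentences[i]), reverse=True)
    keep = sorted(ranked[:max_sentences])  # restore original document order
    return [sentences[i] for i in keep]
```

Calling summarize(text, max_sentences=2) returns the two highest-scoring sentences in their original order, so an off-topic sentence in an otherwise focused document is the first to be dropped.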

Near Duplicate Detection—CAAT's analytics include statistical capabilities to derive a number of duplicate and near-duplicate conditions for documents and text. These include exact duplicates, duplicates that vary only in composition (the traditional "near-duplicate") and, most significantly, documents that are conceptual near-duplicates. Identifying information that is nearly duplicate can be more significant than finding exact matches; nearly duplicate information can clog up information sources, distort search results, and waste reviewers' valuable time.
Benefit—Grouping documents that are very closely matched, even when they are not identical, enables users to identify all closely related information earlier in their workflows.
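The first two conditions described above, exact duplicates and duplicates that vary only in composition, can be illustrated with a short Python sketch based on word shingles and Jaccard similarity. This is a common textbook technique, not CAAT's own statistical method, and it does not attempt conceptual near-duplicates, which would require a concept-level representation. The function names and the 0.5 threshold are assumptions chosen for illustration.

```python
import re


def shingles(text, k=3):
    """Return the set of overlapping k-word sequences in the text."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < k:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity: size of the intersection over size of the union."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def classify_pair(text_a, text_b, near_threshold=0.5):
    """Label a pair of texts as 'exact', 'near', or 'distinct'.

    'exact' here means identical shingle sets after lowercasing and
    punctuation removal, not byte-for-byte equality.
    The threshold is a hypothetical value; tune it for real data.
    """
    sim = jaccard(shingles(text_a), shingles(text_b))
    if sim == 1.0:
        return "exact", sim
    if sim >= near_threshold:
        return "near", sim
    return "distinct", sim
```

Two versions of a sentence that differ by a single word score well above the threshold and are labeled "near", while unrelated text scores zero and is labeled "distinct".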

Language Analytics—CAAT is language-agnostic: it can perform analytics on almost all languages that can be represented in Unicode. CAAT can determine the actual languages in documents, and can operate in a cross-lingual manner, allowing users to query or organize information in one language and locate relevant information in different languages without requiring prior translation.
Benefit—Users quickly develop a view of information that's relevant, regardless of language, so they can make informed decisions on multi-lingual information. They don't have to accept the uncertainty of inexact machine translations, nor incur the time and expense of human translation only to find that the information wasn't worthwhile.

Email Analytics—CAAT's analytics provide a number of email features, including thread identification, metadata tracking, segment analysis, and tracking statistics. It can even identify gaps where emails should be but are missing from a string or collection. CAAT can identify not only who is communicating, but also what they are communicating about, and whether there are other similar email strings within the communications.
Benefit—Reviewers can quickly narrow in on only the most relevant conversations among only the most appropriate recipients, and by grouping similar strings and topics, can find relevant information in a fraction of the time they might spend going through chronological email trails.
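Thread identification and gap detection can be sketched in a few lines of Python. The sketch assumes each message carries a unique ID and, for replies, the ID of the message it answers (as the standard Message-ID and In-Reply-To email headers do). It is an illustration of the general idea only, not CAAT's method; the dictionary keys "id" and "in_reply_to" and the function names are hypothetical.

```python
from collections import defaultdict


def find_root(msg_id, parent_of):
    """Follow reply links upward until reaching a message with no known parent.

    If a parent is missing from the collection, the missing ID itself is
    returned, so replies to the same absent message still group together.
    """
    seen = set()
    while (msg_id in parent_of
           and parent_of[msg_id] is not None
           and msg_id not in seen):  # 'seen' guards against reply cycles
        seen.add(msg_id)
        msg_id = parent_of[msg_id]
    return msg_id


def analyze_threads(messages):
    """Group messages into threads and list referenced-but-absent message IDs."""
    parent_of = {m["id"]: m.get("in_reply_to") for m in messages}
    known = set(parent_of)

    threads = defaultdict(list)
    for m in messages:
        threads[find_root(m["id"], parent_of)].append(m["id"])

    # A gap: some message replies to an ID that is not in the collection.
    missing = sorted({p for p in parent_of.values()
                      if p is not None and p not in known})
    return dict(threads), missing
```

Given a collection in which message 4 replies to a message 3 that was never collected, analyze_threads reports message 3 as missing and groups messages 4 and 5 under it, which is the kind of gap a reviewer would want flagged.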

Content Analyst Company is the original patent holder for Latent Semantic Indexing (LSI), and today holds numerous patents around this technology and its applications. CAAT, which is based on patented LSI technology, provides a number of advanced text analytics capabilities in a highly scalable, proven platform designed to cope with massive amounts of unstructured data.
CAAT is delivered to partners as a robust set of APIs along with a sample User Interface to speed integration.
Our Software Developers Toolkit (SDK) includes extensive documentation, and our ContentCare support and delivery programs are designed to help our partners quickly become proficient with CAAT and its broad text analytics capabilities.

© 2013 Content Analyst Company, LLC. All rights reserved.