Intro to Predictive Coding: Overview & Interpretation of Terminology June 2014 | Page 28

Glossary category assigned to the most similar labeled example (its nearest neighbor) or examples. Knearest neighbor classification uses the k most similar classified objects to determine the classification of an unknown object. Population – the universe of things about which we are trying to infer with our samples. For example, the population may be the set of documents that we want to classify as putatively responsive or putatively non-responsive. The group from which we pull our samples. Also called the sampling frame. Precision – the proportion of retrieved documents that are responsive. See also recall. Predictive coding – a group of machine learning technologies that predict which documents are and are not responsive based on the decisions applied by a subject matter expert to a small sample of documents.