Intro to Predictive Coding: Overview & Interpretation of Terminology June 2014 | Page 22

Glossary Collection – A group of documents. These can be documents gathered for a particular matter or purpose. Information retrieval scientists tend to use several well-known document collections (e.g., RCV1) for testing and comparison purposes. Confidence interval – the expected range of results. If you drew repeated samples from the same population, you would expect the result to be within the confidence interval about the proportion of times given by the confidence level. For example, in an election poll, the difference in the proportion of people favoring each candidate is described as being within a range of, say, plus or minus 5%. All other things being equal, the smaller the confidence interval, the larger the sample size needs to be. Said another way, the larger the sample size, the smaller the confidence interval.