Information Retrieval

Jacques Savoy and Eric Gaussier

This chapter presents the fundamental concepts of Information Retrieval (IR) and shows how this domain is related to various aspects of NLP. After explaining some of the underlying and often hidden assumptions and problems of IR, we present the notion of indexing. Indexing is the cornerstone of various classical IR paradigms (Boolean, vector-space, and probabilistic), which we introduce together with some insights into advanced search strategies used on the Web, such as PageRank. The IR community relies on a strong empirical tradition, and we present the basic notions of IR evaluation methodology and show, with concrete examples, why some topic formulations can be hard even with the most advanced search strategies. Various NLP techniques can be used to improve, at least partially, the retrieval performance of IR models. We devote a section of this chapter to an overview of these techniques.

BibTeX Citation

    author = {Jacques Savoy and Eric Gaussier},
    title = {Information Retrieval},
    booktitle = {Handbook of Natural Language Processing, Second Edition},
    editor = {Nitin Indurkhya and Fred J. Damerau},
    publisher = {CRC Press, Taylor and Francis Group},
    address = {Boca Raton, FL},
    year = {2010},
    note = {ISBN 978-1420085921}

List of pointers to organizations, conferences and resources

This page complements the chapter on Information Retrieval in the Handbook of Natural Language Processing and provides a list of important journals, conferences, and organizations in the domain. It also contains pointers to different resources (stopword lists, stemmers, collections, and open-source search engines).

Scientific journals, conferences and organizations

  • ACM-SIGIR, annual conference
  • ECIR (European Conference on Information Retrieval), annual conference
  • ACM-CIKM (Conference on Information and Knowledge Management), annual conference

Evaluation campaigns and test-collections

Open-source search engines

Other useful pointers