Terrier is billed as a highly flexible, efficient, and effective open source search engine, readily deployable on large-scale collections of documents. Terrier implements state-of-the-art indexing and retrieval functionalities, and provides an ideal platform for the rapid development and evaluation of large-scale retrieval applications.
Terrier follows a plugin architecture and is easy to extend, whether to develop new retrieval techniques, add new ranking features or experiment with low-level functionality such as index compression.
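Index compression is one of the low-level areas this kind of extensibility opens up. As a rough illustration of the technique involved, the sketch below compresses a sorted list of document ids by storing the gaps between consecutive ids with variable-byte coding. It is a generic, self-contained example: the class and method names are hypothetical, and it is not Terrier's own codec or API.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;

/**
 * Illustrative sketch (not Terrier code): variable-byte (VByte) compression
 * of a sorted postings list of document ids, stored as gaps.
 */
public class PostingsCompression {

    /** Encodes strictly increasing, non-negative doc ids as VByte-coded gaps. */
    public static byte[] encode(int[] docIds) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int previous = 0;
        for (int id : docIds) {
            int gap = id - previous; // store the difference from the previous id
            previous = id;
            // Emit 7 bits at a time, low-order bits first; the high bit means "more bytes follow".
            while (gap >= 0x80) {
                out.write((gap & 0x7F) | 0x80);
                gap >>>= 7;
            }
            out.write(gap); // the final byte of each gap has the high bit clear
        }
        return out.toByteArray();
    }

    /** Decodes {@code count} doc ids from a VByte-coded gap stream. */
    public static int[] decode(byte[] data, int count) {
        int[] docIds = new int[count];
        int pos = 0;
        int previous = 0;
        for (int i = 0; i < count; i++) {
            int gap = 0;
            int shift = 0;
            int b;
            do {
                b = data[pos++] & 0xFF;          // read one unsigned byte
                gap |= (b & 0x7F) << shift;      // add its 7 payload bits
                shift += 7;
            } while ((b & 0x80) != 0);           // continue while the high bit is set
            previous += gap;                     // turn the gap back into an absolute id
            docIds[i] = previous;
        }
        return docIds;
    }

    public static void main(String[] args) {
        int[] docIds = {3, 7, 21, 150, 1000, 70000};
        byte[] encoded = encode(docIds);
        System.out.println(encoded.length + " bytes instead of " + docIds.length * 4);
        System.out.println(Arrays.toString(decode(encoded, docIds.length)));
    }
}
```

Running `main` prints `10 bytes instead of 24` followed by the original ids: small gaps fit in a single byte, which is why storing differences rather than raw ids pays off on long postings lists.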
It’s written in the Java programming language, and therefore runs on all major operating systems.
Features include:
- Indexing support for common desktop file formats, and for commonly used TREC research collections (e.g. TREC CDs 1-5, WT2G, WT10G, GOV, GOV2, Blogs06, Blogs08, ClueWeb09, ClueWeb12).
- Many document weighting models, including numerous parameter-free Divergence from Randomness weighting models, Okapi BM25 and language modelling.
- Supervised (machine learned) ranking models are supported via learning to rank.
- Support for a conventional query language, including phrases and terms occurring in tags.
- Full-text indexing of large-scale document collections, scaling to at least 50 million documents in a centralised architecture, and to even larger collections using Hadoop MapReduce distributed indexing.
- Incremental indexing and retrieval capabilities to support real-time search.
- Modular and open indexing and querying APIs, to allow easy extension for your own applications and research.
- Active information retrieval research is fed into the open source platform.

Indexing:
- Out-of-the-box indexing of tagged document collections, such as the TREC test collections.
- Out-of-the-box indexing for documents of various formats, such as HTML, PDF, or Microsoft Word, Excel and PowerPoint files.
- Out-of-the-box support for distributed indexing in a Hadoop MapReduce setting.
- Indexing of field information, such as the frequency of a term in a TITLE or H1 HTML tag.
- Indexing of position information at the word level or the block level (e.g. a window of terms within a given distance).
- Support for various document encodings (such as UTF), to facilitate multilingual retrieval.
- Support for changing the tokeniser used.
- Updatable indices to support real-time search.
- Indexing support for query-biased summarisation.
- Support for fetching files to index by HTTP, allowing intranets to be easily searched.
- Highly compressed on-disk index data structures, with built-in, pluggable compression algorithms.
- Highly compressed direct file for efficient query expansion.
- Alternative, faster single-pass and MapReduce-based indexing.
- Various stemming techniques supported, including the Snowball stemmer for European languages.

Retrieval:
- Provides desktop, command-line and Web based querying interfaces.
- Provides standard querying facilities, as well as Query Expansion (pseudo-relevance feedback).
- Can be applied in interactive applications, such as the included Desktop Search, or in a batch setting for research and experimentation.
- Provides many standard document weighting models, including up to 126 Divergence From Randomness (DFR) document ranking models, and other models such as Okapi BM25, language modelling and TF-IDF. Two new second-generation DFR weighting models, JsKLs and XSqrA_M, are also included; they provide robust performance on a range of test collections without the need for any parameter tuning or training.
- Advanced query language that supports synonyms, +/- operators, phrase and proximity search, and fields.
- Learning-to-rank support enables out-of-the-box supervised ranking models.
- Provides a number of parameter-free DFR term weighting models for automatic query expansion, in addition to Rocchio’s query expansion.
- Flexible processing of terms through a pipeline of components, such as stopword removers and stemmers.
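
To make the weighting-model features more concrete, here is a minimal sketch of Okapi BM25 placed behind a small pluggable interface, in the spirit of swapping one document weighting model for another. The `WeightingModel` interface, the class names and the defaults k1 = 1.2 and b = 0.75 are assumptions for illustration only; this is not Terrier's own implementation or API, and real implementations differ in details such as the IDF formula.

```java
import java.util.List;
import java.util.Map;

/** Illustrative sketch (not Terrier code): Okapi BM25 behind a hypothetical plug-in interface. */
public class Bm25Example {

    /** Hypothetical plug-in point: weights one query term occurring in one document. */
    public interface WeightingModel {
        double score(int tf, int docLength, int df);
    }

    /** Okapi BM25 with the commonly used defaults k1 = 1.2 and b = 0.75. */
    public static final class BM25 implements WeightingModel {
        private final double k1 = 1.2;
        private final double b = 0.75;
        private final long numDocs;         // N: documents in the collection
        private final double avgDocLength;  // average document length in tokens

        public BM25(long numDocs, double avgDocLength) {
            this.numDocs = numDocs;
            this.avgDocLength = avgDocLength;
        }

        @Override
        public double score(int tf, int docLength, int df) {
            // Robertson-Sparck Jones IDF; negative when a term occurs in more than half of the documents.
            double idf = Math.log((numDocs - df + 0.5) / (df + 0.5));
            // Length-normalised saturation constant: longer documents need more occurrences.
            double norm = k1 * (1 - b + b * docLength / avgDocLength);
            return idf * tf * (k1 + 1) / (tf + norm);
        }
    }

    /** Sums per-term weights; termFreqs maps each term to its frequency in this document. */
    public static double scoreDocument(WeightingModel model, List<String> query,
                                       Map<String, Integer> termFreqs, int docLength,
                                       Map<String, Integer> docFreqs) {
        double total = 0.0;
        for (String term : query) {
            int tf = termFreqs.getOrDefault(term, 0);
            if (tf == 0) {
                continue; // query terms absent from the document contribute nothing
            }
            total += model.score(tf, docLength, docFreqs.get(term));
        }
        return total;
    }

    public static void main(String[] args) {
        WeightingModel bm25 = new BM25(1000, 100.0);
        Map<String, Integer> docFreqs = Map.of("terrier", 10, "search", 200);
        Map<String, Integer> doc = Map.of("terrier", 3, "search", 1);
        double s = scoreDocument(bm25, List.of("terrier", "search"), doc, 120, docFreqs);
        System.out.printf("%.4f%n", s);
    }
}
```

Because `scoreDocument` depends only on the interface, a different model (a TF-IDF or DFR variant, say) could be dropped in without touching the scoring loop. Note that this IDF goes negative for terms appearing in more than half the collection, which some implementations avoid by adding 1 inside the logarithm.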
Website: terrier.org
Support: Documentation, GitHub Code Repository
Developer: School of Computing Science, University of Glasgow
License: Mozilla Public Licence