Natural Language Processing

Apache OpenNLP – machine learning based toolkit

The Apache OpenNLP library is an open source machine learning based toolkit for the processing of natural language text.

It includes a sentence detector, a tokenizer, a name finder, a part-of-speech (POS) tagger, a chunker, and a parser. Its APIs are designed to be integrated easily into a Java program.

The goal of the OpenNLP project is to create a mature toolkit. An additional goal is to provide a large number of pre-built models for a variety of languages, as well as the annotated text resources that those models are derived from.

Key Features

  • Tokenization. OpenNLP offers multiple tokenizer implementations:
    • Whitespace Tokenizer – a whitespace tokenizer; non-whitespace sequences are identified as tokens.
    • Simple Tokenizer – a character class tokenizer; sequences of the same character class are tokens.
    • Learnable Tokenizer – a maximum entropy tokenizer that detects token boundaries based on a probability model.
  • Sentence segmentation.
  • Part-of-speech tagging – marks tokens with their corresponding word type based on the token itself and the context of the token.
  • Named entity extraction – the Name Finder can detect named entities and numbers in text.
  • Chunking – divides a text into syntactically correlated groups of words, such as noun groups and verb groups, but does not specify their internal structure or their role in the main sentence.
  • Parsing – offers two parser implementations, the chunking parser and the treeinsert parser. OpenNLP provides a command line tool that is used to train, on various corpora, the models available from the model download page.
  • Coreference resolution – links multiple mentions of an entity in a document together. The OpenNLP implementation is currently limited to noun phrase mentions; other mention types cannot be resolved.
  • Maximum entropy.
  • Perceptron based machine learning.
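
The first two tokenizer strategies listed above can be illustrated with a minimal sketch in plain Java. This is not OpenNLP's own code: the class TokenizerSketch, its method names, and the four coarse character classes are assumptions made for illustration, and the library's real tokenizers may classify characters differently. The learnable tokenizer is left out because it needs a trained model.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: a hand-written sketch of the two rule-based strategies,
// not the OpenNLP implementation.
class TokenizerSketch {

    // Whitespace strategy: every run of non-whitespace characters is one token.
    static List<String> whitespaceTokens(String text) {
        List<String> tokens = new ArrayList<>();
        for (String piece : text.trim().split("\\s+")) {
            if (!piece.isEmpty()) {
                tokens.add(piece);
            }
        }
        return tokens;
    }

    // Coarse character classes (an assumption made for this sketch).
    enum CharClass { LETTER, DIGIT, WHITESPACE, OTHER }

    static CharClass classify(char c) {
        if (Character.isWhitespace(c)) return CharClass.WHITESPACE;
        if (Character.isLetter(c)) return CharClass.LETTER;
        if (Character.isDigit(c)) return CharClass.DIGIT;
        return CharClass.OTHER;
    }

    // Character class strategy: a token is a run of characters that share
    // the same class; whitespace runs separate tokens and are dropped.
    static List<String> characterClassTokens(String text) {
        List<String> tokens = new ArrayList<>();
        int start = 0;
        for (int i = 1; i <= text.length(); i++) {
            boolean boundary = i == text.length()
                    || classify(text.charAt(i)) != classify(text.charAt(start));
            if (boundary) {
                if (classify(text.charAt(start)) != CharClass.WHITESPACE) {
                    tokens.add(text.substring(start, i));
                }
                start = i;
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        String sentence = "Hello, world!";
        System.out.println(whitespaceTokens(sentence));
        System.out.println(characterClassTokens(sentence));
    }
}
```

On the sample sentence the whitespace strategy keeps punctuation attached to the neighbouring word, while the character class strategy splits it off into separate tokens. That difference is the main reason to choose one over the other.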

Website: opennlp.apache.org
Support: Documentation, GitHub
Developer: The Apache Software Foundation
License: Apache License Version 2.0

Apache OpenNLP is written in Java. Learn Java with our recommended free books and free tutorials.


Related Software

Natural Language Processing
PyTorch-Transformers – Library of state-of-the-art pre-trained models
Natural Language Toolkit – Suite of open source Python modules, data sets and tutorials
Stanford CoreNLP – Extensible annotation-based NLP pipeline
spaCy – Industrial strength natural language processing
scikit-learn – Machine learning library for Python
Gensim – Python-based vector space modeling and topic modeling toolkit
flair – Simple framework for state-of-the-art NLP
Apache OpenNLP – Machine learning based toolkit
DL4J – Deploy and train deep learning models
Apache Lucene – Full-featured information retrieval software library
UIMA – Implementation of the UIMA specification
tidytext – Text mining using dplyr, ggplot2, and other tidy tools
text2vec – Framework with API for text analysis and NLP
quanteda – R package for Quantitative Analysis of Textual Data
Moses – Statistical machine translation system

Read our verdict in the software roundup.

Java Natural Language Processing Tools
CoreNLP – Annotation-based NLP pipeline that provides core natural language analysis
OpenNLP – Machine learning based toolkit
DL4J – Deploy and train deep learning models
Lucene – High-performance, full-featured information retrieval software library
UIMA – Open source implementation of the UIMA specification
Tika – Content analysis toolkit
MALLET – Statistical natural language processing, document classification and more
CogComp-NLP – State-of-the-art Natural Language Processing (NLP) tools
ReVerb – Automatically identifies and extracts binary relationships from sentences
NLP4J – NLP framework for JVM languages
GATE – Full-lifecycle solution for a broad range of NLP tasks

Read our verdict in the software roundup.

