Getting Started with Stanford CoreNLP: Tokenization, POS, NER, and Parsing

This guide introduces Stanford CoreNLP, a Java toolkit for fundamental NLP tasks: tokenization, part‑of‑speech tagging, named‑entity recognition, and constituency and dependency parsing. It uses stanfordcorenlp, a Python wrapper around CoreNLP, and covers installation, model download links, and example outputs.

Lisa Notes

Natural Language Processing (NLP) aims to enable computers to understand, process, and generate human language. Stanford CoreNLP offers a comprehensive set of language processing tools, and the stanfordcorenlp package provides a Python interface to these tools.

Installation

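Install the Python wrapper with:

pip install stanfordcorenlp

The wrapper starts and communicates with the Java-based CoreNLP server, so a Java runtime is also required. Download the CoreNLP distribution, plus any additional language models you need (for example, the Chinese models), from the official site: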

Model download URL: https://nlp.stanford.edu/software/corenlp-backup-download.html
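The wrapper launches a separate Java server, and that server should always be shut down when you are done. The sketch below is not part of the stanfordcorenlp package. corenlp_session is a hypothetical helper name. It assumes that the StanfordCoreNLP constructor accepts lang and memory keyword arguments and that close() stops the server. It imports the wrapper lazily, so the helper can be defined before the package is installed.

```python
import contextlib
from typing import Iterator


@contextlib.contextmanager
def corenlp_session(corenlp_dir: str, lang: str = "en", memory: str = "4g") -> Iterator[object]:
    """Start CoreNLP through the stanfordcorenlp wrapper and always shut it down.

    corenlp_dir is the unzipped CoreNLP distribution folder. lang="zh"
    selects the Chinese models, which must be downloaded separately.
    """
    # Lazy import: the helper can be defined even if the wrapper is missing.
    from stanfordcorenlp import StanfordCoreNLP

    nlp = StanfordCoreNLP(corenlp_dir, lang=lang, memory=memory)
    try:
        yield nlp
    finally:
        # Stop the Java server, even if an annotation call raised.
        nlp.close()
```

Usage, with a hypothetical install path: with corenlp_session("/opt/stanford-corenlp", lang="zh") as nlp: followed by calls such as nlp.word_tokenize(sentence). A context manager is used here instead of a bare close() call so the server is stopped even when an exception occurs.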

Core Functions

Stanford CoreNLP supports several fundamental NLP operations:

Tokenization (word segmentation, in the case of Chinese)

Part‑of‑Speech tagging

Named Entity Recognition

Constituency parsing

Dependency parsing
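In the Python wrapper, each of these operations maps to one method on a StanfordCoreNLP instance: word_tokenize, pos_tag, ner, parse, and dependency_parse. The hypothetical analyze helper below calls all five on one sentence and notes the shape each method returns. It assumes nlp is an already-constructed StanfordCoreNLP object.

```python
from typing import Any, Dict


def analyze(nlp: Any, sentence: str) -> Dict[str, Any]:
    """Run the five core CoreNLP operations on one sentence via the wrapper."""
    return {
        # list of token strings
        "tokens": nlp.word_tokenize(sentence),
        # list of (token, POS tag) pairs
        "pos": nlp.pos_tag(sentence),
        # list of (token, entity label) pairs; "O" means outside any entity
        "ner": nlp.ner(sentence),
        # bracketed constituency tree as a single string
        "constituency": nlp.parse(sentence),
        # list of (relation, governor index, dependent index) triples
        "dependency": nlp.dependency_parse(sentence),
    }
```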

1. Tokenization
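Tokenization is done with nlp.word_tokenize(sentence), which returns a list of token strings. With an instance created with lang="zh", tokens come from Chinese word segmentation, as the NER output in this guide shows. The two helpers below are illustrative and are not part of the wrapper. index_tokens numbers tokens starting from 1, which is how CoreNLP refers to them in dependency output. covers_text checks that a segmentation accounts for every non-space character of the input.

```python
from typing import List, Tuple


def index_tokens(tokens: List[str]) -> List[Tuple[int, str]]:
    """Pair each token with its 1-based position (CoreNLP's dependency numbering)."""
    return list(enumerate(tokens, start=1))


def covers_text(text: str, tokens: List[str]) -> bool:
    """Check that the tokens, concatenated, equal the text with whitespace removed.

    Useful as a sanity check on Chinese word segmentation. If the tokenizer
    rewrites any characters, this returns False, which is a prompt to inspect
    the output rather than proof of an error.
    """
    return "".join(tokens) == "".join(text.split())
```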

2. Part‑of‑Speech Tagging
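Part-of-speech tagging is done with nlp.pos_tag(sentence), which returns (word, tag) pairs. The English models use Penn Treebank tags, and the Chinese models use Chinese Treebank tags. The sketch below shows how the pairs might be post-processed; the function names are hypothetical. The noun filter relies on the fact that noun tags in both tagsets begin with "N" (NN, NNS, NNP, NNPS in English; NN, NR, NT in Chinese).

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Tagged = List[Tuple[str, str]]


def group_by_tag(tagged: Tagged) -> Dict[str, List[str]]:
    """Group the words from pos_tag() output by tag, keeping sentence order."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for word, tag in tagged:
        groups[tag].append(word)
    return dict(groups)


def nouns(tagged: Tagged) -> List[str]:
    """Return the words whose tag starts with "N".

    Penn Treebank noun tags: NN, NNS, NNP, NNPS.
    Chinese Treebank noun tags: NN, NR, NT.
    """
    return [word for word, tag in tagged if tag.startswith("N")]
```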

3. Named Entity Recognition

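Example output for the Chinese sentence 我爱自然语言处理技术! ("I love natural language processing technology!") and its English equivalent. Neither sentence contains a named entity, so every token is labeled O (outside any entity):

Named Entities: [('我爱', 'O'), ('自然', 'O'), ('语言', 'O'), ('处理', 'O'), ('技术', 'O'), ('!', 'O')]
Named Entities: [('I', 'O'), ('love', 'O'), ('natural', 'O'), ('language', 'O'), ('processing', 'O'), ('technology', 'O'), ('!', 'O')]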
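The ner() output gives one plain label per token, with no begin/inside markers. As a result, a multi-token name such as a first and last name appears as consecutive tokens that share the same label. The sketch below shows a common post-processing step, using a hypothetical helper name: it merges such runs into entity spans. Under this approach, two different entities of the same type that happen to be adjacent would also be merged. For the two example sentences, every label is O, so the result is an empty list. Pass joiner="" for Chinese, which is written without spaces.

```python
from typing import List, Tuple


def entity_spans(tagged: List[Tuple[str, str]], joiner: str = " ") -> List[Tuple[str, str]]:
    """Merge runs of consecutive tokens sharing a non-"O" label into (text, label) spans."""
    spans: List[Tuple[str, str]] = []
    words: List[str] = []
    current = "O"
    for word, label in tagged:
        # A label change closes the span collected so far, if any.
        if label != current and words:
            spans.append((joiner.join(words), current))
            words = []
        current = label
        if label != "O":
            words.append(word)
    # Close a span that runs to the end of the sentence.
    if words:
        spans.append((joiner.join(words), current))
    return spans
```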

4. Constituency Parsing
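Constituency parsing is done with nlp.parse(sentence), which returns the tree as a single bracketed string in Penn Treebank style, beginning with (ROOT. You can work with the tree without extra dependencies such as NLTK, because a small recursive-descent reader is enough. The minimal sketch below uses hypothetical helper names. It assumes that literal brackets in the input text appear in the tree as -LRB-/-RRB-, which is how CoreNLP writes them, so every "(" and ")" in the string is structural.

```python
import re
from typing import List, Tuple

# A node is (label, children); each child is either a subtree or a leaf word (str).
Node = Tuple[str, list]


def parse_bracketed(text: str) -> Node:
    """Read a Penn Treebank-style bracketed tree, such as the string nlp.parse() returns."""
    tokens = re.findall(r"\(|\)|[^\s()]+", text)
    if not tokens or tokens[0] != "(":
        raise ValueError("tree must start with '('")
    pos = 0

    def read_node() -> Node:
        nonlocal pos
        pos += 1  # skip "("
        label = tokens[pos]
        pos += 1
        children: list = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(read_node())
            else:
                children.append(tokens[pos])  # leaf word
                pos += 1
        pos += 1  # skip ")"
        return (label, children)

    return read_node()


def leaves(tree: Node) -> List[str]:
    """Return the words of a (sub)tree in order."""
    out: List[str] = []
    for child in tree[1]:
        out.extend([child] if isinstance(child, str) else leaves(child))
    return out


def phrases(tree: Node, label: str) -> List[List[str]]:
    """Return the words of every subtree with the given label (e.g. "NP"), in preorder."""
    found = [leaves(tree)] if tree[0] == label else []
    for child in tree[1]:
        if not isinstance(child, str):
            found.extend(phrases(child, label))
    return found
```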

5. Dependency Parsing

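Each triple is (relation, governor index, dependent index). Indices are 1-based token positions, and governor 0 denotes the artificial ROOT node. Output for the Chinese and English example sentences:

Dependency: [('ROOT', 0, 4), ('nsubj', 4, 1), ('advmod', 4, 2), ('nsubj', 4, 3), ('dobj', 4, 5), ('punct', 4, 6)]
Dependency: [('ROOT', 0, 2), ('nsubj', 2, 1), ('amod', 6, 3), ('compound', 6, 4), ('compound', 6, 5), ('dobj', 2, 6), ('punct', 2, 7)]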
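The triples refer to tokens by position, so they are easier to read once the words are substituted back in. The sketch below uses a hypothetical helper that is not part of the wrapper. It assumes single-sentence input, because CoreNLP numbers tokens within each sentence.

```python
from typing import List, Tuple


def readable_dependencies(tokens: List[str], triples: List[Tuple[str, int, int]]) -> List[str]:
    """Render (relation, governor, dependent) triples as "relation(governor, dependent)".

    Indices are 1-based positions in tokens; governor 0 is the artificial ROOT.
    """
    def word(i: int) -> str:
        return "ROOT" if i == 0 else tokens[i - 1]

    return [f"{rel}({word(gov)}, {word(dep)})" for rel, gov, dep in triples]
```

Applied to the English tokens and triples above, the helper yields ROOT(ROOT, love), nsubj(love, I), amod(technology, natural), compound(technology, language), compound(technology, processing), dobj(love, technology), and punct(love, !).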

For further reference, see the official website (https://stanfordnlp.github.io/CoreNLP/) and the GitHub repository (https://github.com/stanfordnlp/CoreNLP).

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.
