Lemmatization In Python

Introduction

The main purpose of lemmatisation is to remove inflectional endings only and return the base or dictionary form of a word. This base or dictionary form is called the lemma. In other words, lemmatisation groups the different inflected forms of a word together under a single word with the same meaning.

Example:

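The lemma of running, runs and ran is run. To find it, lemmatisation performs a morphological analysis of each word using a vocabulary (a dictionary) rather than fixed suffix rules. This is what separates it from stemming: a stemmer chops off word endings by rule and can produce non-words, while lemmatisation returns a real dictionary word, which is why it is generally preferred when accurate base forms matter.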

Lemmatisation is useful in search-related tasks such as search engine optimisation and compact indexing, where different forms of the same word should be treated as one term.

Unlike a stemmer, NLTK's WordNetLemmatizer also takes the word's part of speech as a parameter named "pos". If it is not given, the default value of "pos" is noun ("n"), so verbs and adjectives have to be marked explicitly to be reduced to their lemma.

The following code shows lemmatisation with NLTK's WordNetLemmatizer:

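# import the modules required
import nltk
from nltk.stem import WordNetLemmatizer

# download the WordNet data used by the lemmatizer (only needed once)
nltk.download("wordnet", quiet=True)
nltk.download("omw-1.4", quiet=True)

# initialise the object of WordNetLemmatizer
lemmatizer = WordNetLemmatizer()

# "v" denotes verb; with the default noun pos, "running" is returned unchanged
print("running :", lemmatizer.lemmatize("running", pos="v"))
# the default pos is noun, so "runs" is read as the plural of the noun "run"
print("runs :", lemmatizer.lemmatize("runs"))
# irregular verb forms are resolved through WordNet's exception lists
print("ran :", lemmatizer.lemmatize("ran", pos="v"))

# here "a" denotes adjective in "pos"
print("worse :", lemmatizer.lemmatize("worse", pos="a"))

Output:

running : run
runs : run
ran : run
worse : bad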

Conclusion

Lemmatisation is the process of grouping together the different forms of a word under one dictionary form with the same meaning. This base or dictionary form is called the lemma. Lemmatisation is especially useful in search engine optimisation and compact indexing.