I would like to analyze large amounts of text for the vocabulary they contain. For that, I'd like a tool that recognizes all sorts of word forms and maps them back to their base word, so that each word is counted only once. For example, "counting", "count", "counted" and "counts" would all be recognized as "count" and counted only once. Is there a framework, preferably an easy-to-use one, that comes with the appropriate databases to do this sort of thing?
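The counting part can be sketched independently of any library: split the text into words, normalize each word to a base form, and keep each base form once. In this minimal sketch, `tokenize`, `vocabulary` and `toy_stem` are hypothetical helpers; `toy_stem` is a deliberately crude suffix stripper that only makes the example runnable, and a real stemmer or lemmatizer would be passed in its place.

```python
import re
from typing import Callable, Iterable


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into runs of Unicode letters."""
    # [^\W\d_] matches letters only, so umlauts and other non-ASCII letters survive.
    return re.findall(r"[^\W\d_]+", text.lower())


def toy_stem(word: str) -> str:
    """Hypothetical, very crude suffix stripper for illustration only."""
    for suffix in ("ing", "ed", "s"):
        # Strip the suffix only if a stem of at least 3 letters remains.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def vocabulary(tokens: Iterable[str], normalize: Callable[[str], str]) -> set[str]:
    """Map every token to its base form and keep each base form once."""
    return {normalize(token) for token in tokens}


text = "Counting words: count, counted, counts."
vocab = vocabulary(tokenize(text), toy_stem)
print(sorted(vocab))  # ['count', 'word']
```

Keeping the normalizer as a parameter means the same counting code works whether the base forms come from a stemmer, a lemmatizer, or a language-specific tool.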
12/25/2021 7:31:57 PM
HonFu
So you have a text and want to extract the word stems from its sentences? Have you tried NLTK (Python)? It should let you do something like that, at least for English...
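A minimal sketch of the NLTK route, assuming NLTK is installed (`pip install nltk`). Its `PorterStemmer` reduces English inflections by rule and does not need any extra data download.

```python
from nltk.stem import PorterStemmer

stemmer = PorterStemmer()
words = ["counting", "count", "counted", "counts"]

# Map each surface form to its stem; all four forms collapse to one stem.
stems = {word: stemmer.stem(word) for word in words}
unique_stems = set(stems.values())
print(unique_stems)
```

Stems are not always dictionary words (Porter turns "studies" into "studi", for instance), which is fine for counting but not for displaying a vocabulary list.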
I've never used it myself, but this looks like it might do what you're looking for: https://machinelearningknowledge.ai/learn-lemmatization-in-ntlk-with-examples/
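Lemmatization, which the linked article covers, maps each word to its dictionary form instead of chopping off suffixes, so it can also handle irregular forms such as "ran". At its core this is a lookup. In this sketch the tiny `LEMMAS` table and the `lemmatize` helper are hypothetical stand-ins for the large lexical database (WordNet, in NLTK's case) that a real lemmatizer consults.

```python
# Hypothetical miniature lemma table; a real lemmatizer uses a full
# lexical database such as WordNet instead.
LEMMAS = {
    "counting": "count",
    "counted": "count",
    "counts": "count",
    "ran": "run",
    "running": "run",
    "mice": "mouse",
}


def lemmatize(word: str) -> str:
    """Return the dictionary form of a word, or the word itself if unknown."""
    word = word.lower()
    return LEMMAS.get(word, word)


words = ["Counting", "count", "counted", "counts", "ran", "run", "mice"]
lemmas = {lemmatize(w) for w in words}
print(sorted(lemmas))  # ['count', 'mouse', 'run']
```

With NLTK installed and the WordNet data downloaded (`nltk.download("wordnet")`), `WordNetLemmatizer().lemmatize("counting", pos="v")` from `nltk.stem` performs the real lookup; the `pos` argument matters, because the default treats every word as a noun.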
Hm, cool, that does look like the general kind of thing I need... It would be great if it also worked for other languages, above all German and Japanese. Thanks, I'll check that out!
HonFu sir, why don't you check out Medium? It really has good answers on this. 😇🤓😅
Sololearn Inc.