Task #1521
Check singular and plural forms of keyphrases against the 20K common word list (20K CW) in the extract_single_uncommon_words method
0%
Description
In extract_single_uncommon_words from BR3_IR3_tagger.py, every single-word keyphrase passes through the 20K common word filter: only words that are not present in the 20K list are allowed to be added as a header variant.
However, there are cases where one form of a word is present in the 20K common word list but its singular or plural form is not, so the missing form slips through the filter.
For example, "datatype" is present in the 20K common word list but "datatypes" is not, so "datatypes" is currently added as a header variant. Using the "*singularize*" method from common_tagging_functions.py and the "*pluralize*" method from the "*pattern.text.en*" library, add an additional condition in the extract_single_uncommon_words method so that a word is added only if the word itself, its singular form, and its plural form are all absent from the 20K list.
Import statement: from pattern.text.en import pluralize
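A minimal sketch of the additional condition, for illustration only. The function names is_uncommon and filter_uncommon, the sample word set, and the two naive_* helpers are hypothetical and are not taken from BR3_IR3_tagger.py; the naive helpers only stand in for the project's singularize and pattern's pluralize so that the snippet runs without either dependency.

```python
from typing import Callable, Iterable, List, Set


def is_uncommon(word: str,
                common_words: Set[str],
                singularize: Callable[[str], str],
                pluralize: Callable[[str], str]) -> bool:
    """Return True only if the word, its singular form and its plural form
    are all absent from the common-word set."""
    candidate = word.lower()
    forms = {candidate, singularize(candidate), pluralize(candidate)}
    return forms.isdisjoint(common_words)


def filter_uncommon(words: Iterable[str],
                    common_words: Set[str],
                    singularize: Callable[[str], str],
                    pluralize: Callable[[str], str]) -> List[str]:
    """Keep only the words that pass the extended common-word check."""
    return [w for w in words
            if is_uncommon(w, common_words, singularize, pluralize)]


# Hypothetical stand-ins: in the real method, pass singularize from
# common_tagging_functions.py and pluralize from pattern.text.en instead.
def naive_singularize(word: str) -> str:
    if word.endswith("s") and not word.endswith("ss"):
        return word[:-1]
    return word


def naive_pluralize(word: str) -> str:
    return word if word.endswith("s") else word + "s"


# Assumed sample of the 20K list, for demonstration only.
common = {"datatype", "iterators"}
candidates = ["datatypes", "docstring", "Datatype", "iterator"]

result = filter_uncommon(candidates, common, naive_singularize, naive_pluralize)
print(result)
```

With this sample set, "datatypes" and "iterator" are rejected because their singular and plural forms, respectively, are in the common set, while "docstring" is kept. Passing the two inflection functions as parameters is only a convenience for the sketch; the actual method can call the imported functions directly.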
Test the changes with the following datasets:
1. Python Whirlwind Tour.txt
2. Python Tutorial.txt
3. Python 3 - Library Reference.txt