Task #1522

Check singular and plural forms of key phrases against the 20K common word list in the extract_single_uncommon_words method

Added by Nandini Bansal over 3 years ago. Updated about 3 years ago.

Status: Closed
Priority: Normal
Assignee: -
Target version: -
Start date: 08/20/2021
Due date:
% Done: 100%
Estimated time: 2.50 h

Description

In extract_single_uncommon_words in BR3_IR3_tagger.py, every single-word keyphrase passes through the 20K common word filter: only words that are not present in the 20K list may be added as a header variant.
However, in some cases a word is present in the 20K common word list while its singular or plural form is not, so that form slips through the filter.

For example, "datatype" is present in the 20K common word list but "datatypes" is not. Add an additional condition to the extract_single_uncommon_words method that also checks the singular and plural forms of each word, using the "singularize" method from common_tagging_functions.py and the "pluralize" method from the "pattern.text.en" library.

Import statement: from pattern.text.en import pluralize
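
A minimal sketch of the additional condition, under stated assumptions: the helper name filter_uncommon_words is hypothetical, the real body of extract_single_uncommon_words is not shown in this task, and the two naive_* functions are simplified stand-ins so the example runs without project code. In the project they would be replaced by singularize from common_tagging_functions.py and pluralize from pattern.text.en.

```python
from typing import Callable, Iterable, List, Set


def naive_singularize(word: str) -> str:
    """Simplified stand-in (assumption) for singularize in common_tagging_functions.py."""
    if len(word) > 3 and word.endswith("s") and not word.endswith("ss"):
        return word[:-1]
    return word


def naive_pluralize(word: str) -> str:
    """Simplified stand-in (assumption) for pattern.text.en.pluralize."""
    return word if word.endswith("s") else word + "s"


def filter_uncommon_words(
    words: Iterable[str],
    common_words: Set[str],
    singularize: Callable[[str], str] = naive_singularize,
    pluralize: Callable[[str], str] = naive_pluralize,
) -> List[str]:
    """Keep a word only if none of its forms is in the common word list.

    Hypothetical helper: it illustrates the extra condition only and is
    not the actual extract_single_uncommon_words implementation.
    """
    uncommon = []
    for word in words:
        key = word.lower()
        # Check the word itself plus its singular and plural forms.
        forms = {key, singularize(key), pluralize(key)}
        if forms.isdisjoint(common_words):
            uncommon.append(word)
    return uncommon


# Tiny illustrative list; the real 20K common word list is loaded elsewhere.
common_20k = {"datatype", "function"}
candidates = ["datatypes", "datatype", "functions", "docstring"]
result = filter_uncommon_words(candidates, common_20k)
print(result)
```

With this check, "datatypes" is rejected because its singular form "datatype" is in the list, while a word with no form in the list is still accepted as a header variant.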

Test the changes with the following datasets:
1. Python Whirlwind Tour.txt
2. Python Tutorial.txt
3. Python 3 - Library Reference.txt
