Bug #1755
Updated by Nandini Bansal about 3 years ago
There are cases where the word count of the KP is greater than the word count of the header variant, and the uncommon word in the KP (the word not shared with the header variant) is a VERB. The POS tags of the KP are to be obtained using spaCy. We need to increase the penalty for such cases to 0.5, as these KPs read very badly and should not be tagged in the final annotated file.

E.g.

KP | Header Variant

1) modules provide interfaces | module interface
   [('modules', 'NOUN'), ('provide', 'VERB'), ('interfaces', 'NOUN')]
   [('module', 'NOUN'), ('interface', 'NOUN')]

2) module contains functions | module functions
   [('module', 'NOUN'), ('contains', 'VERB'), ('functions', 'NOUN')]
   [('module', 'NOUN'), ('functions', 'NOUN')]

3) python validates bytecode | python bytecode
   [('python', 'PROPN'), ('validates', 'VERB'), ('bytecode', 'VERB')]
   [('python', 'PROPN'), ('bytecode', 'PROPN')]

The KPs are from the Library Reference book. Creating a dummy dataset is recommended.