Bug #1861
Find the reason for low scores on hyphenated KPs
Start date:
12/08/2021
Due date:
% Done:
0%
Estimated time:
3.00 h
Description
In MRs https://gitlab.com/kordale/rk-ai-3/-/merge_requests/839 and https://gitlab.com/kordale/rk-ai-3/-/merge_requests/843, we worked on matching KPs that contain a hyphen ("-") against header variants that do not. We have now come across a related issue in the "Numpy User Guide" book, where a KP is matched perfectly but its scores are below 0.5.
KP = "c-order"
Header Variant = "c order"
We need to investigate the reason for the low scores and fix it if possible.
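As a quick sanity check before digging into the scoring code, the sketch below confirms that the KP and header variant above become identical once hyphens are treated as spaces. This is consistent with the match itself being fine and the score being reduced somewhere else. `normalize_hyphens` is a hypothetical helper written for this ticket, not a function from the rk-ai-3 codebase.

```python
import re

def normalize_hyphens(text: str) -> str:
    """Hypothetical helper: lowercase, turn hyphens into spaces, collapse whitespace."""
    return re.sub(r"\s+", " ", text.lower().replace("-", " ")).strip()

kp = "c-order"
header_variant = "c order"

# If this is True, the two strings differ only by the hyphen.
same = normalize_hyphens(kp) == normalize_hyphens(header_variant)
print(same)
```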
Steps of investigation:
1) Identify where the score is reduced (BR3_IR3_tagger.py or tagging_utils.py)
2) Identify the reason for the reduction (CW filter/POS tags/context matching)
3) Brainstorm a solution.
This ticket is primarily for steps (1) & (2).
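One possible way to approach steps (1) and (2) is to record the score before and after each scoring stage and report the first stage that lowers it. The sketch below is only an illustration: `trace_score`, `first_drop`, and the stage callables are hypothetical and do not reflect the actual functions in BR3_IR3_tagger.py or tagging_utils.py, and the multipliers are arbitrary placeholders rather than measured values.

```python
from typing import Callable, List, Optional, Tuple

Stage = Tuple[str, Callable[[float], float]]
TraceEntry = Tuple[str, float, float]

def trace_score(initial: float, stages: List[Stage]) -> List[TraceEntry]:
    """Pass the score through each stage, recording (stage name, before, after)."""
    trace = []
    score = initial
    for name, stage_fn in stages:
        new_score = stage_fn(score)
        trace.append((name, score, new_score))
        score = new_score
    return trace

def first_drop(trace: List[TraceEntry]) -> Optional[str]:
    """Return the name of the first stage that lowered the score, or None."""
    for name, before, after in trace:
        if after < before:
            return name
    return None

# Placeholder stages: the real ones would wrap the actual scoring steps.
# Which stage drops the score here is arbitrary, chosen only for the demo.
stages: List[Stage] = [
    ("cw_filter", lambda s: s),
    ("pos_tags", lambda s: s * 0.4),
    ("context_matching", lambda s: s),
]

trace = trace_score(1.0, stages)
for name, before, after in trace:
    print(f"{name}: {before:.2f} -> {after:.2f}")
print("first drop:", first_drop(trace))
```

Running the real stages for KP "c-order" through such a trace would show at a glance which stage pushes the score below 0.5.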
Files
Updated by Nandini Bansal almost 3 years ago
- Assignee set to Anonymous
- Start date changed from 11/09/2021 to 12/08/2021