Feature #1544

Match a hyphenated token/word with a header variant that is identical except for the hyphen

Added by Nandini Bansal over 3 years ago. Updated about 3 years ago.

Status: Closed
Priority: Normal
Assignee: -
Target version: -
Start date: 08/24/2021
Due date:
% Done: 100%
Estimated time: 4.00 h

Description

This is a new feature we would like to add to our pipeline. We have observed a case where a token in the Whirlwind text file is "control-flow" and there is a header "control flow" that is otherwise identical, but the two are not matched because one contains a hyphen and the other does not.
We need to make changes so that such pairs are matched with high confidence.

The changes are to be made in the "get_candidates_for_variant" function, inside the nested loops. We will add a condition that checks whether the current token contains a hyphen between two words, i.e. follows the format "word1-word2", and whether "word1" is the same as the first word of the header variant.

E.g.
The token is "word1-word2" and the header variant is "word3 word4 word5".
We check whether "word1 == word3". If it is, we process the KP so that it becomes "word1 word2" and call the "candidate_variant_distance_calculation" function for it.
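
The check described above could look roughly like the sketch below. The helper name "dehyphenated_form" and the regular expression are hypothetical and not part of the existing pipeline; the sketch assumes a token made of exactly two words joined by one hyphen and an exact, case-sensitive comparison of "word1" with the first word of the header variant.

```python
import re
from typing import Optional

# Hypothetical pattern: exactly two words joined by a single hyphen,
# e.g. "control-flow". Tokens with more hyphens are not handled here.
_HYPHENATED = re.compile(r"^(\w+)-(\w+)$")


def dehyphenated_form(token: str, header_variant: str) -> Optional[str]:
    """Return "word1 word2" if the token is "word1-word2" and word1 equals
    the first word of the header variant; otherwise return None."""
    match = _HYPHENATED.match(token)
    variant_words = header_variant.split()
    if match is None or not variant_words:
        return None
    word1, word2 = match.groups()
    # Exact comparison, as in the description ("word1 == word3").
    if word1 != variant_words[0]:
        return None
    return f"{word1} {word2}"
```

Inside the nested loops of "get_candidates_for_variant", a non-None result would be the text passed on to "candidate_variant_distance_calculation".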

Please note that we will also have to change the candidate_variant_distance_calculation function so that the distance calculation uses the "word1 word2" form, while the saved candidates keep the original "word1-word2" form.
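
A minimal sketch of that separation is shown below. It is not the real candidate_variant_distance_calculation: the function name, the "comparison_text" parameter, and the returned dictionary are assumptions made for illustration, and difflib's SequenceMatcher stands in for whatever distance measure the pipeline actually uses.

```python
from difflib import SequenceMatcher
from typing import Dict, Optional


def candidate_variant_distance_sketch(
    token: str,
    header_variant: str,
    comparison_text: Optional[str] = None,
) -> Dict[str, object]:
    """Compute a distance on comparison_text (if given) but save the
    candidate under its original token text."""
    # Use the de-hyphenated form for the distance when one is supplied.
    text_for_distance = comparison_text if comparison_text is not None else token
    # Stand-in distance: 1 - similarity ratio (0.0 means identical strings).
    similarity = SequenceMatcher(None, text_for_distance, header_variant).ratio()
    return {
        "candidate": token,  # saved in the original "word1-word2" format
        "variant": header_variant,
        "distance": 1.0 - similarity,
    }


result = candidate_variant_distance_sketch(
    "control-flow", "control flow", comparison_text="control flow"
)
```

Keeping the original token as a separate argument means the saved candidate still matches the text in the source file, while the distance reflects the exact match with the header.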

