Feature #1615

Add the str_between from variation_middle_parenthesis to the processed_full_headers list

Added by Nandini Bansal about 3 years ago. Updated about 3 years ago.

Status: Closed
Priority: Normal
Target version:
Start date: 09/06/2021
Due date:
% Done: 0%
Estimated time: 4.00 h

Description

In the C-API book, I have seen cases where the string extracted by variation_middle_parenthesis is tagged with the correct context, but the assigned score is too low because of the length of the header variant.
For example:
tls -> thread local storage (tls) api -> 0.81
tss -> thread specific storage (tss) api -> 0.81

We need to make sure that these KPs are assigned a higher similarity score. This is possible if the str_between is added to the processed_full_headers list with a fullness_ratio of 1.0. The string should be a single word and must not appear in the 12K unstemmed CW list (feel free to tweak this CW threshold to include or exclude desirable cases; we'll discuss the cases for which you wish to tweak it).
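
A minimal sketch of the filtering step described above. It assumes processed_full_headers is a plain list of (header_variant, fullness_ratio) tuples and that the CW list is available as a list of words ranked by frequency; both structures, the helper name add_str_between and the cw_threshold parameter are hypothetical stand-ins, not the project's actual code.

```python
FULL_RATIO = 1.0  # fullness_ratio assigned to an accepted str_between


def add_str_between(processed_full_headers, str_between, ranked_common_words, cw_threshold=12_000):
    """Append str_between with fullness_ratio 1.0 if it is a single, uncommon word.

    Hypothetical helper: processed_full_headers is assumed to be a list of
    (header_variant, fullness_ratio) tuples, and ranked_common_words a list of
    words ordered from most to least frequent.
    """
    candidate = str_between.strip()
    # Only single words qualify.
    if not candidate or len(candidate.split()) != 1:
        return False
    # The CW threshold decides how many of the top-ranked words count as common.
    common_words = set(ranked_common_words[:cw_threshold])
    if candidate.lower() in common_words:
        return False
    processed_full_headers.append((candidate, FULL_RATIO))
    return True
```

Lowering cw_threshold shrinks the excluded set, so borderline words such as "api" can be let through when reviewing the cases discussed above.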

Cases like "(de)compression of files" should be left untouched: apply the change only where the parenthesized string is surrounded by spaces.
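
One way to enforce the surrounding-spaces rule is a regular expression that matches a parenthesized run only when it is preceded and followed by whitespace (or by the start or end of the header). This is a hypothetical sketch; PAREN_PATTERN and extract_str_between are illustrative names, not the project's code.

```python
import re

# A parenthesized run counts only if it stands as its own token:
# preceded by whitespace or the start of the header, and followed by
# whitespace or the end of the header.
PAREN_PATTERN = re.compile(r"(?:^|(?<=\s))\(([^()]+)\)(?=\s|$)")


def extract_str_between(header: str) -> list[str]:
    """Return the parenthesized strings that are surrounded by spaces in the header."""
    return PAREN_PATTERN.findall(header)
```

With this pattern, "thread local storage (tls) api" yields ["tls"], while "(de)compression of files" yields nothing because "(de)" is immediately followed by a letter.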

Find the str_between values to be added to the processed_full_headers list for Whirlwind, Tutorial, C-API and Library Reference.
Based on the strings extracted from the headers, test the final changes on C-API and Library Reference with ADR.

#1

Updated by Rohit Choudhary about 3 years ago

  • Status changed from New to Resolved

#2

Updated by Nandini Bansal about 3 years ago

  • Status changed from Resolved to Closed
