Project

General

Profile

Task #2371

Potential KPs ('immutable' within 'immutable sequence') that are part BRIR need to be given higher score

Added by Ram Kordale over 2 years ago. Updated over 2 years ago.

Status:
New
Priority:
Low
Target version:
Start date:
05/24/2022
Due date:
% Done:

0%

Estimated time:

Description

In Python 3 Doc Library Reference > Built-in Types > Text Sequence Type — str, 'immutable' should have score >0.9 since it is actually part of the string "immutable sequence". However, it looks like we are not including "sequence" because it is BRIR. We have a variant immutable sequence but sequence is BR tag that's why only immutable is tagged. Given that it's fullness ratio is 0.333, it also has a very low sim score.

ideally, we should consider match with 'immutable sequence', give the corresponding high score (because of better fullness ratio) and then make 'immutable' as the KP.

Another example is 'common' in 'common sequence operations' in Python 3 Doc Library Reference > Built-in Types > Text Sequence Type — str > String Methods.

#1

Updated by Ram Kordale over 2 years ago

  • Subject changed from Potential KPs that are part BRIR need to be given higher score to Potential KPs ('immutable' within 'immutable sequence') that are part BRIR need to be given higher score
#2

Updated by Ram Kordale over 2 years ago

  • Description updated (diff)
#3

Updated by Ram Kordale over 2 years ago

  • Assignee set to Rohit Choudhary
#4

Updated by Rohit Choudhary over 2 years ago

Current Investigation:
For doc_id 37 we have words in first few lines in cumulative_noun_adj_sect1_keyword.csv
Preprocessed text : ". textual data in python is handled with %# objects, or strings . strings are immutable %# of unicode code points. string literals are written in a variety of ways: single quotes: "

original text : "text sequence type — str . textual data in python is handled with str objects, or strings . strings are immutable sequences of unicode code points. string literals are written in a variety of ways: single quotes:

The BR tags are converted into "%#" and Preprocessed text are used to search for candidate, doc_id 37 has only word "immutable" because current code do not track "%#" words original form.

#5

Updated by Rohit Choudhary over 2 years ago

Rohit Choudhary wrote in #note-4:

Current Investigation :
For doc_id 37 we have words in first few lines in cumulative_noun_adj_sect1_keyword.csv
Preprocessed text : ". textual data in python is handled with %# objects, or strings . strings are immutable %# of unicode code points. string literals are written in a variety of ways: single quotes: "

original text : "text sequence type — str . textual data in python is handled with str objects, or strings . strings are immutable sequences of unicode code points. string literals are written in a variety of ways: single quotes:

The BR tags are converted into "%#" and Preprocessed text are used to search for candidate, doc_id 37 has only word "immutable" because current code do not track "%#" words original form.

#6

Updated by Ram Kordale over 2 years ago

  • Priority changed from Normal to Low

Also available in: Atom PDF