Task #3052

Sophisticated deduplication for getPurpleLinksInfoExtFull API results

Added by Ram Kordale over 1 year ago. Updated over 1 year ago.

Status: Closed
Priority: High
Assignee:
Target version:
Start date: 05/30/2023
Due date:
% Done: 100%
Estimated time:

Description

Currently, we do not send exact duplicates. Going forward, also mark the following pairs as duplicates:

- 'defining a function', 'define function'
- 'threads', 'threading'
- 'migrate', 'migration'
- 'fetch the data', 'fetch data'
- 'socket program', 'socket programming'
- 'installing anaconda', 'install anaconda'

An example URL containing all of the above: https://www.youtube.com/watch?v=8O5kX73OkIY&list=PLsyeobzWxl7poL9JTVyndKe62ieoN-MZ3&index=54
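A minimal sketch of the dedup pass, assuming the API results can be treated as a list of phrase strings (the actual result type of getPurpleLinksInfoExtFull is not shown in this ticket). The class name Dedupe and its method are hypothetical. The duplicate rule is a pluggable key function: phrases with equal keys count as duplicates, and the first one is kept in its original position. An exact-match key reproduces today's behavior; the pairs listed above only collapse once the key normalizes phrases further.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical helper; not the actual getPurpleLinksInfoExtFull implementation.
public class Dedupe {
    /**
     * Keeps the first phrase for each distinct key, preserving input order.
     * The key function decides what counts as a duplicate.
     */
    public static List<String> dedupe(List<String> phrases, Function<String, String> key) {
        // LinkedHashMap keeps insertion order, so survivors stay in their original order.
        Map<String, String> firstByKey = new LinkedHashMap<>();
        for (String phrase : phrases) {
            firstByKey.putIfAbsent(key.apply(phrase), phrase);
        }
        return new ArrayList<>(firstByKey.values());
    }

    public static void main(String[] args) {
        List<String> results = List.of("threads", "threading", "threads", "migrate", "migration");
        // Exact-match key (today's behavior): only the repeated "threads" is dropped.
        System.out.println(dedupe(results, Function.identity()));
        // prints [threads, threading, migrate, migration]
    }
}
```

With Function.identity() as the key, "threading" and "migration" survive next to "threads" and "migrate", which is exactly the gap this task describes; a key that normalizes phrases would close it without changing the pass itself.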

Beyond (a) checking whether two phrases match once articles such as 'a' and 'the' are added or removed between words, detecting these duplicates involves (b) comparing word stems. Stemming and (c) handling plurals could use a Java NLP library. Perhaps pick the simplest one from the list at https://xperti.io/blogs/java-natural-language-processing/.
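One way to combine (a), (b), and (c) is to reduce each phrase to a comparison key: lowercase it, drop the articles a, an, and the, and stem every remaining word. This is a hypothetical sketch (PhraseNormalizer, normalize, stem, and isDuplicate are illustrative names) that uses a tiny hand-written suffix stripper in place of a library stemmer. Its rules (strip -ing, -ion, a plural -s, or a final -e, then undo a doubled final consonant after -ing, Porter-style) are tuned to the six example pairs and will over- or under-stem other words; a stemmer from one of the Java NLP libraries would be the more robust choice.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical sketch; the crude stemmer stands in for a real NLP library stemmer.
public class PhraseNormalizer {
    // Articles ignored when comparing phrases (point (a)).
    private static final Set<String> ARTICLES = Set.of("a", "an", "the");

    // Crude suffix rules, tried in order; only the first matching rule is applied.
    private static final String[] SUFFIXES = {"ing", "ion", "s", "e"};

    /** Comparison key: lowercase, articles dropped, each remaining word crudely stemmed. */
    public static String normalize(String phrase) {
        List<String> stems = new ArrayList<>();
        for (String token : phrase.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (token.isEmpty() || ARTICLES.contains(token)) {
                continue;
            }
            stems.add(stem(token));
        }
        return String.join(" ", stems);
    }

    /** Strips one suffix (points (b) and (c)); leaves the word alone if the stem would be too short. */
    static String stem(String word) {
        for (String suffix : SUFFIXES) {
            if (!word.endsWith(suffix)) continue;
            if (suffix.equals("s") && word.endsWith("ss")) return word; // keep "class", "process"
            String base = word.substring(0, word.length() - suffix.length());
            if (base.length() < 3) return word; // e.g. keep "thing", "use"
            return suffix.equals("ing") ? undouble(base) : base;
        }
        return word;
    }

    // "programm" -> "program"; doubled l, s, z and vowels are kept ("install", "see").
    private static String undouble(String base) {
        int n = base.length();
        char last = base.charAt(n - 1);
        if (n >= 2 && last == base.charAt(n - 2)
                && "lsz".indexOf(last) < 0 && "aeiou".indexOf(last) < 0) {
            return base.substring(0, n - 1);
        }
        return base;
    }

    public static boolean isDuplicate(String a, String b) {
        return normalize(a).equals(normalize(b));
    }

    public static void main(String[] args) {
        String[][] pairs = {
            {"defining a function", "define function"},
            {"threads", "threading"},
            {"migrate", "migration"},
            {"fetch the data", "fetch data"},
            {"socket program", "socket programming"},
            {"installing anaconda", "install anaconda"},
        };
        for (String[] p : pairs) {
            // each line ends with "-> true"
            System.out.println(p[0] + " | " + p[1] + " -> " + isDuplicate(p[0], p[1]));
        }
    }
}
```

Because normalize returns a plain string, it can serve directly as the key of an order-preserving dedup pass, so results never need to be compared pairwise.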

#1

Updated by Ram Kordale over 1 year ago

  • Priority changed from Urgent to High
#2

Updated by Venmuhilan B over 1 year ago

  • Status changed from New to In Progress
#3

Updated by Venmuhilan B over 1 year ago

  • Status changed from In Progress to Resolved
  • % Done changed from 0 to 100
#4

Updated by Ram Kordale over 1 year ago

  • Status changed from Resolved to Closed
