Task #3052

Sophisticated deduplication for getPurpleLinksInfoExtFull API results

Added by Ram Kordale over 1 year ago. Updated over 1 year ago.

Status: Closed
Priority: High
Assignee:
Target version:
Start date: 05/30/2023
Due date:
% Done: 100%
Estimated time:

Description

Today, we do not send exact duplicates. Going forward, also treat the following pairs as duplicates:

- 'defining a function', 'define function'
- 'threads', 'threading'
- 'migrate', 'migration'
- 'fetch the data', 'fetch data'
- 'socket program', 'socket programming'
- 'installing anaconda', 'install anaconda'

An example URL whose results contain the above is https://www.youtube.com/watch?v=8O5kX73OkIY&list=PLsyeobzWxl7poL9JTVyndKe62ieoN-MZ3&index=54.

Detecting these involves (a) ignoring articles such as 'a' and 'the' between words before comparing, (b) comparing word stems, and (c) handling plurals. The stemming and plural handling could use a Java NLP library. Could you pick the simplest one from the list at https://xperti.io/blogs/java-natural-language-processing/ ?
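
To make the three steps concrete, here is a minimal sketch in plain Java with no NLP library. Everything in it is an assumption for illustration: the class `DedupSketch`, the methods `stem`, `key` and `isDuplicate`, and the suffix rules themselves are hypothetical, and the hand-rolled `stem` is a crude stand-in for whatever stemmer the chosen library provides. It is tuned only to the six pairs listed above and will over-merge or under-merge on other input.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper, not part of the existing API code.
final class DedupSketch {
    // (a) Articles that are ignored before comparing.
    private static final Set<String> ARTICLES = Set.of("a", "an", "the");

    // (b) + (c) Crude suffix stripping; a real stemmer should replace this.
    static String stem(String w) {
        if (w.length() > 5 && w.endsWith("ing")) {
            w = w.substring(0, w.length() - 3);          // threading -> thread
        } else if (w.length() > 5 && w.endsWith("ion")) {
            w = w.substring(0, w.length() - 3);          // migration -> migrat
        } else if (w.length() > 3 && w.endsWith("s") && !w.endsWith("ss")) {
            w = w.substring(0, w.length() - 1);          // threads -> thread
        }
        if (w.length() > 3 && w.endsWith("e")) {
            w = w.substring(0, w.length() - 1);          // migrate -> migrat
        }
        int n = w.length();
        if (n > 3 && w.charAt(n - 1) == w.charAt(n - 2)) {
            w = w.substring(0, n - 1);                   // programm -> program
        }
        return w;
    }

    // Normalized comparison key: lower-case, drop articles, stem each word.
    static String key(String phrase) {
        List<String> stems = new ArrayList<>();
        for (String token : phrase.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (token.isEmpty() || ARTICLES.contains(token)) {
                continue;
            }
            stems.add(stem(token));
        }
        return String.join(" ", stems);
    }

    // Two phrases count as duplicates when their keys are equal.
    static boolean isDuplicate(String a, String b) {
        return key(a).equals(key(b));
    }

    public static void main(String[] args) {
        System.out.println(isDuplicate("defining a function", "define function"));
        System.out.println(isDuplicate("socket program", "socket programming"));
        System.out.println(isDuplicate("fetch the data", "socket program"));
    }
}
```

Comparing on a normalized key means the existing exact-duplicate check can stay as it is and simply run on `key(phrase)` instead of the raw phrase. Swapping `stem` for a library stemmer would only change that one method.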
