Task #3052
Sophisticated deduplication for getPurpleLinksInfoExtFull API results
100%
Description
Today, we do not send exact duplicates. Going forward, also treat the following pairs as duplicates:
- 'defining a function', 'define function'
- 'threads', 'threading'
- 'migrate', 'migration'
- 'fetch the data', 'fetch data'
- 'socket program', 'socket programming'
- 'installing anaconda', 'install anaconda'
An example URL that contains the above: https://www.youtube.com/watch?v=8O5kX73OkIY&list=PLsyeobzWxl7poL9JTVyndKe62ieoN-MZ3&index=54
Besides (a) ignoring articles such as 'a' and 'the' between words when comparing, this involves (b) comparing word stems and (c) handling plurals. The stem and plural work could use a Java NLP library. You could pick the simplest one from the list at https://xperti.io/blogs/java-natural-language-processing/
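A rough sketch of the comparison described above, in case it helps the discussion. It is not the implemented solution. The class name PhraseDedup and the methods stem, key and dedupe are hypothetical, and stem is a deliberately tiny suffix stripper that only stands in for a real stemmer from whichever NLP library is picked. Each phrase is reduced to a key (lowercase, articles dropped, words stemmed), and two phrases count as duplicates when their keys match.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper, not part of the existing API code.
public class PhraseDedup {
    private static final Set<String> ARTICLES = Set.of("a", "an", "the");

    // Toy stemmer (assumption): strips a few common suffixes.
    // A library stemmer should replace this in real use.
    static String stem(String word) {
        String w = word;
        if (w.length() > 5 && w.endsWith("ing")) {
            w = w.substring(0, w.length() - 3);
            int n = w.length();
            char last = w.charAt(n - 1);
            // Undo consonant doubling ("programm" -> "program"), but keep ll/ss/zz.
            if (last == w.charAt(n - 2) && "lszaeiou".indexOf(last) < 0) {
                w = w.substring(0, n - 1);
            }
        } else if (w.length() > 5 && w.endsWith("ion")) {
            w = w.substring(0, w.length() - 3);
        } else if (w.length() > 3 && w.endsWith("s") && !w.endsWith("ss")) {
            w = w.substring(0, w.length() - 1);
        } else if (w.length() > 3 && w.endsWith("e")) {
            w = w.substring(0, w.length() - 1);
        }
        return w;
    }

    // Normalized comparison key: lowercase, no articles, stemmed words.
    static String key(String phrase) {
        List<String> stems = new ArrayList<>();
        for (String token : phrase.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (token.isEmpty() || ARTICLES.contains(token)) {
                continue;
            }
            stems.add(stem(token));
        }
        return String.join(" ", stems);
    }

    // Keeps the first phrase seen for each key and drops later near-duplicates.
    static List<String> dedupe(List<String> phrases) {
        Set<String> seen = new HashSet<>();
        List<String> kept = new ArrayList<>();
        for (String phrase : phrases) {
            if (seen.add(key(phrase))) {
                kept.add(phrase);
            }
        }
        return kept;
    }
}
```

With this sketch, PhraseDedup.dedupe applied to the twelve phrases listed in the description keeps only the first phrase of each pair. The toy stemmer is tuned to those examples only, so it will miss or over-merge other word forms. That is the part a library stemmer would be expected to handle better.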
Updated by Venmuhilan B over 1 year ago
- Status changed from In Progress to Resolved
- % Done changed from 0 to 100
Documentation of what was done for #3052: https://docs.google.com/document/d/1r_FSCpszLa0NC6M9VC25OIHC5Oryo7YGUmReTsaOlEo/edit#heading=h.7quivb43tnmp