Recommender systems are an integral part of many web applica-
tions. With increasingly larger user bases, scalability has become an
important issue. Many of the most scalable algorithms with respect
to both space and running times are based on locality-sensitive
hashing (LSH). However, a significant drawback is that these meth-
ods are only able to handle insertions to user profiles and tend to
perform poorly when items may be removed. We initiate the study
of scalable locality-sensitive hashing for dynamic input. Specifi-
cally, using the Jaccard index as similarity measure, we design (1)
a sketching algorithm for similarity estimation via a black box re-
duction to ℓ0 norm estimation and (2) a locality sensitive hashing
scheme maintainable in fully dynamic data streams that quickly
filters out low-similarity pairs. Our algorithms have little to no
overhead in terms of running time compared to previous LSH ap-
proaches for the insertion only case, and drastically outperform
previous algorithms in case of deletions
Dettaglio pubblicazione
2018, Proceedings of the Eleventh ACM International Conference on Web Search and Data Mining, WSDM 2018, Marina Del Rey, CA, USA, February 5-9, 2018, Pages 72-80
Sketch 'Em All: Fast Approximate Similarity Search for Dynamic Data Streams (04b Atto di convegno in volume)
Bury Marc, Schwiegelshohn CHRIS RENE, Sorella Mara
ISBN: 9781450355810
Gruppo di ricerca: Algorithms and Data Science
keywords