software . 2019

ekzhu/datasketch: Add Cassandra storage layer.

Zhu, Eric; Markovtsev, Vadim; Aastafiev; Ae-Foster; Łukasiewicz, Wojciech; Fpug; Zac Bentley; Letal, Vojtech; Titusz; Spandan Thakur; ...
Open Source
  • Published: 26 Nov 2019
  • Publisher: Zenodo
Abstract
Performance improvements for MinHash updates, plus optional compression for the MinHash LSH index:
  • MinHash updates are 4.5X faster when the <code>update_batch</code> method is used for bulk updates on a MinHash. See the API doc (http://ekzhu.com/datasketch/documentation.html#datasketch.MinHash.update_batch).
  • Bulk generation of MinHash objects with <code>MinHash.bulk</code> or <code>MinHash.generator</code> gives a further performance gain. See API doc and pull request.
  • The MinHash LSH index can optionally be compressed by hashing the bucket key produced by <code>MinHashLSH._H</code>, which reduces the memory and storage space used by the index. See pull request. Thank you @Sinusoidal36!
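The index compression mentioned in the abstract can be illustrated with a minimal standard-library sketch. This is not datasketch's own code: the helper names <code>band_key</code> and <code>compressed_key</code>, the band of 32 values, the 64-bit big-endian packing, and the choice of BLAKE2b with an 8-byte digest are all assumptions made for illustration. The idea shown is only that a long bucket key built from one band of MinHash values can be replaced by a short fixed-size digest.

```python
import hashlib
import struct


def band_key(hashvalues):
    """Serialize one band of 64-bit hash values into a raw bucket key.

    Hypothetical stand-in for the key an LSH index derives from a band.
    """
    return struct.pack(f">{len(hashvalues)}Q", *hashvalues)


def compressed_key(raw_key, digest_size=8):
    """Hash the raw bucket key down to a fixed-size digest (assumed scheme)."""
    return hashlib.blake2b(raw_key, digest_size=digest_size).digest()


# A made-up band of 32 hash values, 8 bytes each once serialized.
band = [(i * 2654435761) % (1 << 32) for i in range(32)]

raw = band_key(band)         # 32 values * 8 bytes = 256 bytes per bucket key
small = compressed_key(raw)  # 8 bytes per bucket key

# Buckets are keyed by the digest instead of the full serialized band.
buckets = {}
buckets.setdefault(small, set()).add("doc-1")

print(len(raw), len(small))  # 256 8
```

The trade-off in a scheme like this is that a shorter digest saves more space per bucket but raises the chance that two different bands collide on the same key, which would surface as extra false-positive candidates at query time.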
Download from
Zenodo
Software . 2019
Provider: Datacite
Zenodo
Software . 2020
Provider: Datacite