Master thesis . 2024
Data sources: NTNU Open

Replacing Elasticsearch in a Data Analytics Platform

Authors: Mørkrid, Hermann

Abstract

This thesis compares two analytical databases, Elasticsearch and ClickHouse, for the use case of building a generic data analytics platform. We examine the design of Elasticsearch and how it is used in Ignite, a platform for analyzing procurement data. The thesis presents a set of challenges that Ignite has faced in its use of Elasticsearch and explores potential alternatives, choosing ClickHouse as the database to compare further. We then set up an experiment to compare the two databases, implementing a generic HTTP service as an abstraction layer so that both are compared on equal terms. Benchmarks of this service show that ClickHouse outperforms Elasticsearch in data ingestion but performs worse on our specific query, though this finding has its limitations. In addition, we present a set of qualitative findings: the challenge of achieving correctness in results from Elasticsearch, and the "object-columnar impedance mismatch" in ClickHouse, that is, the difficulty of reconciling object-oriented models with ClickHouse's column-oriented structure. The thesis concludes that ClickHouse is a viable alternative to Elasticsearch for a generic data analytics platform, but that the mixed results and the limitations of the experiment mean it is not an obvious choice.

Country
Norway
  • BIP! impact indicators
    selected citations: 0
    popularity: Average
    influence: Average
    impulse: Average