ZENODO
Article . 2026
License: CC BY
Data sources: Datacite

ENHANCING CONTENT RETRIEVAL WITH BIG DATA AND NATURAL LANGUAGE PROCESSING FOR SCALABLE AND SEMANTIC SEARCH SYSTEMS

Authors: S SUJANTHI, Dr. MULUMUDI SUNEETHA, NARASIMHA RAO THOTA, N SRIHARI RAO, CHITNEEDI KASI VISWANADHAM, Dr. B HEMANTHA KUMAR, P S V S SRIDHAR

Abstract

The growing volume and complexity of data demand more efficient content search and retrieval systems. This paper presents an approach that combines Big Data techniques with advanced Natural Language Processing (NLP) models to improve the accuracy, relevance, and scalability of search. The system uses distributed Big Data infrastructure for data processing and NLP models for semantic query interpretation. We evaluate it on three datasets, Common Crawl (web content), Medical Text Mining, and Amazon Product Reviews, and compare it against traditional keyword-based search and against TF-IDF and Word2Vec-based approaches. The experimental results show that the system achieves higher precision, recall, F1-score, and Mean Average Precision (MAP) than the compared approaches while keeping query response time reasonable. Combining Big Data and NLP produced results that were more relevant and contextually aware. The work advances content search across application domains: it enables more accurate and efficient retrieval and supports a personalized search experience. The integration of Big Data infrastructure with advanced NLP models enables scalable, semantically rich retrieval and addresses key limitations of existing keyword-centric and shallow semantic search systems.
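
The abstract names precision, recall, F1-score, and MAP but does not define them. As an illustration only, the sketch below computes them using the standard information-retrieval definitions; it is not taken from the paper. The function names and document IDs are hypothetical, and each ranked list is assumed to contain no duplicate IDs.

```python
from typing import Iterable, Sequence, Set, Tuple


def precision_recall_f1(
    retrieved: Sequence[str], relevant: Set[str]
) -> Tuple[float, float, float]:
    """Set-based precision, recall, and F1 for a single query."""
    retrieved_set = set(retrieved)
    hits = len(retrieved_set & relevant)
    precision = hits / len(retrieved_set) if retrieved_set else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def average_precision(ranked: Sequence[str], relevant: Set[str]) -> float:
    """Sum of precision@k at each rank k holding a relevant document,
    divided by the total number of relevant documents."""
    if not relevant:
        return 0.0
    hits = 0
    total = 0.0
    for k, doc_id in enumerate(ranked, start=1):
        if doc_id in relevant:
            hits += 1
            total += hits / k
    return total / len(relevant)


def mean_average_precision(
    runs: Iterable[Tuple[Sequence[str], Set[str]]]
) -> float:
    """MAP: the mean of average precision over all queries."""
    runs = list(runs)
    if not runs:
        return 0.0
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)


# Hypothetical example: two queries with made-up document IDs.
query_runs = [
    (["d1", "d2", "d3", "d4"], {"d1", "d3"}),
    (["d5", "d6"], {"d6"}),
]
p, r, f1 = precision_recall_f1(*query_runs[0])
map_score = mean_average_precision(query_runs)
```

Precision, recall, and F1 here ignore rank order, while MAP rewards placing relevant documents near the top of the list. That is why retrieval evaluations such as the one summarized above typically report both kinds of metric.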

Keywords

Big Data, Natural Language Processing, Content Search, Semantic Search, Precision, Information Retrieval
