StentorLabs Niche Retrieval Case Study

StentorLabs is proud to announce the publication of our research paper, “A Case Study in Niche Retrieval: Evaluating LLM Deep-Research Systems on a Newly Created Low-Signal Hugging Face Profile (StentorLabs)” by Kai Izumoto. This February 2026 self-experiment systematically tested nine leading LLM-powered deep-research systems on their ability to locate and correctly evaluate an independent hobbyist creator matching strict criteria: a profile established in 2026, at least four models under 35M parameters each, exhaustive documentation, and no corporate or academic ties.

The full reproducibility dataset is now openly available on Hugging Face as StentorLabs/niche-retrieval-case-study-feb2026. It contains 2,819 rows of raw search outputs, candidate analyses, verbatim system responses, research prompts, a failure-mode taxonomy, and JSONL summaries. By releasing both the paper (in Markdown and PDF) and the dataset under open terms, StentorLabs continues its mission of radical transparency in small-model research while exposing real-world bottlenecks in agentic search, temporal filtering, and low-signal verification on platforms like Hugging Face.

Results show only a 22% success rate across systems, with clear patterns of access-limited abandonment, hallucinated affiliations, and documentation-quality misjudgments. These patterns offer actionable insights for next-generation AI research tools. We invite researchers, developers, and enthusiasts to download the resources, run their own evaluations, and help advance more reliable, hallucination-resistant deep-research agents in the open-source ecosystem.
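For readers who want to run their own evaluation, the selection criteria described above can be sketched as a small predicate. This is only an illustration under stated assumptions: the `CandidateProfile` record, its field names, and the two helper functions are hypothetical and are not taken from the paper or the dataset schema.

```python
from dataclasses import dataclass
from datetime import date
from typing import List

# Thresholds taken from the criteria stated in the announcement.
MAX_PARAMS = 35_000_000   # each counted model must be under 35M parameters
MIN_SMALL_MODELS = 4      # at least four such models
REQUIRED_YEAR = 2026      # profile established in 2026


@dataclass
class CandidateProfile:
    """Hypothetical record for one candidate creator (field names are assumptions)."""
    name: str
    created: date
    model_param_counts: List[int]
    has_exhaustive_docs: bool
    has_institutional_ties: bool  # corporate or academic affiliation


def matches_criteria(profile: CandidateProfile) -> bool:
    """Return True only if the profile satisfies every stated criterion."""
    small_models = sum(1 for n in profile.model_param_counts if n < MAX_PARAMS)
    return (
        profile.created.year == REQUIRED_YEAR
        and small_models >= MIN_SMALL_MODELS
        and profile.has_exhaustive_docs
        and not profile.has_institutional_ties
    )


def success_rate(verdicts: List[bool]) -> float:
    """Fraction of systems whose final answer was judged correct."""
    return sum(verdicts) / len(verdicts) if verdicts else 0.0
```

One way to use this is to build a `CandidateProfile` for each creator a system returns, check it with `matches_criteria`, and pass the per-system verdicts to `success_rate`. Judging whether documentation counts as exhaustive remains a manual call in this sketch.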

Keywords

small language models, niche retrieval, Hugging Face, hallucination analysis, agentic search, Machine Learning, low-signal detection, AI transparency, Artificial Intelligence, AI, deep-research agents, Information Retrieval, LLM evaluation, reproducibility, Natural Language Processing

Related to Research communities

Digital Humanities and Cultural Heritage

Knowmad Institut
