
StentorLabs announces the publication of the research paper “A Case Study in Niche Retrieval: Evaluating LLM Deep-Research Systems on a Newly Created Low-Signal Hugging Face Profile (StentorLabs)” by Kai Izumoto. This February 2026 self-experiment tested nine leading LLM-powered deep-research systems on their ability to locate and correctly evaluate an independent hobbyist creator matching strict criteria: a profile established in 2026, at least four models under 35M parameters, exhaustive documentation, and no corporate or academic ties. The full reproducibility dataset, containing 2,819 rows of raw search outputs, candidate analyses, verbatim system responses, research prompts, a failure-mode taxonomy, and JSONL summaries, is openly available on Hugging Face as StentorLabs/niche-retrieval-case-study-feb2026. By releasing both the paper (in Markdown and PDF) and the dataset under open terms, StentorLabs continues its mission of radical transparency in small-model research while exposing real-world bottlenecks in agentic search, temporal filtering, and low-signal verification on platforms like Hugging Face. Results show a success rate of only 22% across systems, with recurring patterns of access-limited abandonment, hallucinated affiliations, and misjudged documentation quality, offering actionable insights for next-generation AI research tools. We invite researchers, developers, and enthusiasts to download the resources, run their own evaluations, and help advance more reliable, hallucination-resistant deep-research agents in the open-source ecosystem.
small language models, niche retrieval, Hugging Face, hallucination analysis, agentic search, Machine Learning, low-signal detection, AI transparency, Artificial Intelligence, AI, deep-research agents, Information Retrieval, LLM evaluation, reproducibility, Natural Language Processing
