
This paper describes the KASYS team's participation in Subtask A of the NTCIR-18 SUSHI Task, which focuses on retrieving undigitized historical materials that have sparse item-level metadata. We present a multi-level metadata aggregation and retrieval approach that leverages the hierarchical organization of the data (Box, Folder, and Item levels) by aggregating metadata from lower to higher levels and applying two search strategies (``Merge'' and ``Each''). We evaluate traditional BM25 alongside dense retrieval models (E5 and ColBERT) without fine-tuning, and use Optuna for hyperparameter optimization to determine the weight assigned to each level. Although the multi-level score aggregation strategy was designed to exploit the hierarchical structure of the data, it did not yield a significant performance improvement over a simpler BM25 baseline. Future work will explore improved preprocessing of noisy metadata, hybrid retrieval methods that combine BM25 with dense re-ranking, and model fine-tuning to further improve retrieval for undigitized archival collections.
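
As an illustration of per-level score weighting, the following minimal sketch combines retrieval scores obtained separately at the Box, Folder, and Item levels into a single folder ranking. It is one plausible reading of a weighted multi-level aggregation, not the system's actual implementation: the names `LevelWeights` and `combine_level_scores`, the choice of folders as the ranked unit, the use of the best item score per folder, and the weighted-sum form are all assumptions made for illustration. The per-level scores are assumed to come from any retriever (for example BM25), and the weights are the kind of values a hyperparameter search could tune.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LevelWeights:
    """Hypothetical per-level weights (e.g. values a tuner might propose)."""
    box: float
    folder: float
    item: float


def combine_level_scores(box_scores, folder_scores, item_scores,
                         folder_to_box, item_to_folder, weights):
    """Rank folders by a weighted sum of box, folder, and best-item scores.

    Each *_scores argument maps an ID to a retrieval score for one query.
    IDs missing from a score dict contribute 0.0 at that level.
    """
    # Keep the highest-scoring item for each folder (assumed aggregation).
    best_item = {}
    for item_id, score in item_scores.items():
        folder_id = item_to_folder[item_id]
        if folder_id not in best_item or score > best_item[folder_id]:
            best_item[folder_id] = score

    # Weighted sum over the three levels for every known folder.
    combined = {}
    for folder_id, box_id in folder_to_box.items():
        combined[folder_id] = (
            weights.box * box_scores.get(box_id, 0.0)
            + weights.folder * folder_scores.get(folder_id, 0.0)
            + weights.item * best_item.get(folder_id, 0.0)
        )

    # Highest combined score first.
    return sorted(combined.items(), key=lambda pair: pair[1], reverse=True)
```

In this sketch a folder with no scored items still receives credit from its own metadata and from its parent box, which reflects the motivation for using the hierarchy when item-level metadata is sparse.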
