
Dataset Description

The ZJU-o dataset is a large-scale paired dataset aligning real-world lithium-ion battery (LIB) operational time-series data with structured, human-readable reports. It is designed to support explainable battery management and to enable large language models (LLMs) to perform battery operation and maintenance (O&M) tasks (such as condition assessment, anomaly detection, and decision support) without specialized architecture or retraining.

In the accompanying work, we propose TimeSeries2Report (TS2R), an automated signal-to-semantics-to-report pipeline that translates quantitative multivariate time-series signals into standardized textual reports. These reports can be directly integrated into off-the-shelf LLMs through prompt engineering.

The dataset is collected from a real-world LIB energy storage system, involving 28 LIB modules, with operational data recorded continuously over a half-year period from 2023-12-01 to 2024-05-31.

Uses

The dataset is intended for:
- Explainable battery management research
- Signal-to-text and time-series-to-language modeling
- LLM-based battery condition assessment
- LLM-based anomaly detection and fault diagnosis
- Decision support systems for battery O&M
- Benchmarking and reproducibility studies in battery–LLM integration

It is especially useful for testing the feasibility of using textual semantics to represent multivariate battery operational dynamics in an interpretable format.

Dataset Structure

The dataset is organized by LIB module, where each module corresponds to one JSON file.

Real-world LIB module configuration
- Operational data collected from 2023-12-01 to 2024-05-31
- Each module contains 16 cells connected in series
- Each cell includes time-series measurements of temperature, voltage, and State of Charge (SOC)
- Each module also includes a shared current signal, representing the same current through all series-connected cells

Files and data contents

Each file contains the complete dataset for one module.
Format: JSON

File naming pattern: data_for_module#1.json, data_for_module#2.json, ..., data_for_module#28.json

Data contents: each line represents one time-series interval (100 minutes) paired with multiple reports, stored as key-value pairs:
- sample id: A unique identifier for the time-series/report pair within that module file.
- time series: A list recording the complete time series of all 16 cells in the module over 100 timestamps. (Ordering: temperature of cell 1, voltage of cell 1, SOC of cell 1, ..., SOC of cell 16, shared current of cells.)
- descript c1: System-level report describing the overall condition and behavior of the LIB module.
- descript c2 ~ descript c5: Single-cell reports grouped by cell index ranges:
  - descript c2: reports for cells 1–4
  - descript c3: reports for cells 5–8
  - descript c4: reports for cells 9–12
  - descript c5: reports for cells 13–16

Sampling interval and dataset size

The dataset uses 100 timestamps (100 minutes) as the construction interval for each time-series/report pair. In each module file:
- Each line corresponds to one 100-minute time-series segment and its aligned reports.
- Total number of time-series/report pairs per module file: 2590 pairs
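The channel ordering and report grouping described above can be turned into small index helpers. The following is a minimal sketch, not the dataset authors' code. It assumes that each line of a module file is a standalone JSON object (suggested by "each line represents one time-series interval"), and that the report keys are spelled as listed here; the exact key spelling (spaces versus underscores) and the nesting of the time-series list are not confirmed by the description, so check them against an actual file. All function and constant names are hypothetical.

```python
import json

# Values taken from the dataset description.
CELLS_PER_MODULE = 16
SIGNALS = ("temperature", "voltage", "soc")  # per-cell order in the description
N_CHANNELS = CELLS_PER_MODULE * len(SIGNALS) + 1  # 48 cell channels + shared current
CURRENT_CHANNEL = N_CHANNELS - 1  # shared current is listed last


def channel_index(cell, signal):
    """Return the 0-based channel position of `signal` for a 1-based cell index."""
    if not 1 <= cell <= CELLS_PER_MODULE:
        raise ValueError(f"cell must be in 1..{CELLS_PER_MODULE}, got {cell}")
    return (cell - 1) * len(SIGNALS) + SIGNALS.index(signal)


def report_key_for_cell(cell, prefix="descript c"):
    """Return the key of the single-cell report group covering `cell`.

    Cells 1-4 map to c2, 5-8 to c3, 9-12 to c4, 13-16 to c5.
    The default prefix is an assumption; pass the spelling used in the file.
    """
    if not 1 <= cell <= CELLS_PER_MODULE:
        raise ValueError(f"cell must be in 1..{CELLS_PER_MODULE}, got {cell}")
    return f"{prefix}{(cell - 1) // 4 + 2}"


def iter_pairs(path):
    """Yield one dict per non-empty line, assuming one JSON object per line."""
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if line:
                yield json.loads(line)
```

For example, `channel_index(3, "voltage")` returns 7, and `report_key_for_cell(9)` returns the key for the group covering cells 9–12. With 28 module files of 2590 pairs each, iterating over all files would yield 28 × 2590 = 72,520 time-series/report pairs.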
large language model, lithium-ion battery
