
This dataset provides time-aligned observational snapshots of crypto market activity and social sentiment across 270+ crypto assets, designed to contextualize market structure, liquidity, and attention dynamics rather than produce forecasts or signals. The archive contains weekly Sunday samples drawn from Instrumetriq's continuous monitoring pipeline.

Data Collection

- Spot market data sourced from Binance
  - Mid prices, bid–ask spreads, liquidity percentiles
  - Aggregated per observation window
- Social sentiment data sourced from X (Twitter)
  - Posts are filtered for crypto relevance, then classified using a two-model transformer-based sentiment pipeline (BERTweet primary, DistilBERT referee)
  - Sentiment is exposed only in aggregated form (counts and averages)
- Each asset is monitored in ~2-hour observation cycles, producing one row per asset per session
- Approximately 2,500 observations per day, regardless of tier

Archive Contents

This archive contains:

- 13 weekly Sunday samples (2025-12-21 through 2026-03-15)
- Three dataset tiers per week, sharing the same observations but differing in schema depth
- Apache Parquet format with Snappy compression
- Schema documentation for all tiers
- Methodology overview

Dataset Tiers

All tiers contain the same number of observations. They differ only in column structure and depth.

Tier 1: Explorer

- 19 flat columns
- Aggregated sentiment counts and averages
- Spot prices, spreads, liquidity, and quality scores
- Designed for lightweight inspection, dashboards, and general analysis

Tier 2: Analyst

- Extends Tier 1 with nested columns
- Detailed sentiment aggregates, author statistics, and engagement metrics
- Designed for deeper behavioral and cross-sectional analysis

Tier 3: Researcher

- Extends Tier 2 with nested futures and microstructure data
- Includes 700+ spot price samples per observation window (10-second resolution)
- Multi-window sentiment, diagnostics, and futures positioning data
- Designed for research, validation, and archival analysis

Note: High-frequency (10-second) spot price samples are available only in Tier 3.

Intended Use

This dataset is intended for:

- Market structure research
- Behavioral and sentiment analysis
- Liquidity and execution context studies
- Exploratory and descriptive analytics

Limitations & Ethics

- Observational data only
- No trading advice, predictions, or signal generation
- No individual social media posts or personal data are included
- All sentiment data is aggregated and anonymized

Access

- Free weekly samples: github.com/SiCkGFX/instrumetriq-public
- Methodology: instrumetriq.com/research
- Full access via subscription: instrumetriq.com/access
- Interactive demo: Colab notebook
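As a rough consistency check on the Tier 3 figure of 700+ spot price samples per observation window, the sample count can be estimated from the window length and the 10-second resolution. The sketch below is illustrative only: the function name is hypothetical, and it assumes samples are evenly spaced with no gaps, which the description does not guarantee.

```python
SAMPLE_INTERVAL_S = 10  # Tier 3 spot sampling resolution, as stated in the tier description


def samples_per_window(window_minutes: float, interval_s: int = SAMPLE_INTERVAL_S) -> int:
    """Number of full sampling intervals that fit in one observation window.

    Assumes evenly spaced samples with no gaps (an idealisation).
    """
    return int(window_minutes * 60) // interval_s


# A 2-hour window at 10-second resolution.
print(samples_per_window(120))  # 720
```

A window of about 2 hours therefore yields roughly 720 samples, which is in line with the stated 700+.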
Market data is sourced from the Binance spot market via the public REST API. Spot prices, bid–ask spreads, and liquidity-related metrics are sampled internally at high frequency and aggregated into fixed observation windows.

Social sentiment data is sourced from publicly available X (Twitter) posts. Posts are filtered for crypto relevance using a dedicated BERTweet-based classifier, then scored by a two-model sentiment pipeline (BERTweet primary, DistilBERT referee with confidence calibration). Sentiment outputs are aggregated into per-window counts and summary statistics.

The sentiment pipeline was updated in February 2026 (V1 → V2) in two phases. Phase 1 (2026-02-16) updated the sentiment models. Phase 2 (2026-02-17) activated the crypto relevance filter. Records include a methodology_regime field ('v1' or 'v2') for programmatic version identification.

Each tracked asset is monitored in rolling observation cycles of approximately 2 hours, producing one observation per asset per cycle. All tiers share the same observation timing and coverage. High-frequency spot price samples (10-second resolution) are retained only in the highest dataset tier. Lower tiers expose aggregated spot and sentiment statistics only.

No raw social media content, user identifiers, or personally identifiable information are included. The dataset is strictly observational and descriptive in nature.
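To make the aggregation step concrete, the following is a minimal sketch of how per-post sentiment outputs could be collapsed into per-window counts and a mean score. It is not the Instrumetriq implementation: the `(label, score)` record shape, the label strings, the signed score range, and the output field names are all assumptions chosen for illustration, and the dataset's actual column names are defined in its schema documentation.

```python
from collections import Counter
from statistics import mean


def aggregate_window(scored_posts):
    """Collapse per-post sentiment into per-window aggregates.

    scored_posts: list of (label, score) tuples for one asset and one window.
    The labels 'positive' / 'neutral' / 'negative' and a signed score are
    hypothetical; only aggregates like these are exposed in the dataset.
    """
    counts = Counter(label for label, _ in scored_posts)
    scores = [score for _, score in scored_posts]
    return {
        "posts_total": len(scored_posts),
        "posts_pos": counts.get("positive", 0),
        "posts_neu": counts.get("neutral", 0),
        "posts_neg": counts.get("negative", 0),
        # No posts in the window means there is no mean to report.
        "mean_score": mean(scores) if scores else None,
    }


window = aggregate_window([("positive", 0.8), ("negative", -0.4), ("positive", 0.6)])
print(window["posts_total"], window["posts_pos"], window["posts_neg"])  # 3 2 1
```

Because only the aggregates leave this step, no individual post text or author identifier needs to be carried into the output row.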
Market data: Binance spot market REST API, sampled at 10-second intervals.

Sentiment (V2, since February 2026): X (Twitter) public posts are filtered for crypto relevance using a BERTweet-based classifier, then scored by a two-model pipeline: a BERTweet primary model for classification and a DistilBERT referee model for confidence calibration and edge-case arbitration.

Observation cycles: each asset is monitored in cycles of roughly 2 hours (about 120 to 130 minutes). Aggregation produces per-cycle summaries with sentiment counts, mean scores, and silence detection.

Cutover: Phase 1 (model swap) at 2026-02-16T05:14Z; Phase 2 (relevance filter) at 2026-02-17T06:03Z. Records include methodology_regime ('v1'/'v2') for version identification.
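The two cutover timestamps split the archive into three intervals, which matters when comparing sentiment values across the February 2026 boundary. The sketch below classifies an observation timestamp against those cutovers. The function name and the three phase labels are hypothetical; how the interval between Phase 1 and Phase 2 maps onto the 'v1'/'v2' values of methodology_regime is not stated here, so the field in the records should be treated as authoritative.

```python
from datetime import datetime, timezone

# Cutover instants as given in the technical notes (UTC).
PHASE1_MODEL_SWAP = datetime(2026, 2, 16, 5, 14, tzinfo=timezone.utc)
PHASE2_RELEVANCE_FILTER = datetime(2026, 2, 17, 6, 3, tzinfo=timezone.utc)


def pipeline_phase(ts: datetime) -> str:
    """Return an illustrative phase label for a timezone-aware timestamp.

    The labels are hypothetical and are not values of methodology_regime.
    """
    if ts.tzinfo is None:
        raise ValueError("timestamp must be timezone-aware")
    if ts < PHASE1_MODEL_SWAP:
        return "pre_cutover"          # original models, no relevance filter
    if ts < PHASE2_RELEVANCE_FILTER:
        return "models_updated"       # new models, relevance filter not yet active
    return "models_and_filter"        # new models plus relevance filter


print(pipeline_phase(datetime(2026, 2, 16, 12, 0, tzinfo=timezone.utc)))  # models_updated
```

Of the 13 weekly Sunday samples, those up to and including 2026-02-15 fall before Phase 1, and those from 2026-02-22 onward fall after Phase 2.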
market microstructure, liquidity, relevance filtering, NLP, cryptocurrency, social sentiment, DistilBERT, bid-ask spread, sentiment analysis, crypto dataset, BERTweet, Twitter sentiment, Binance, time series
Twitter Data
