Powered by OpenAIRE graph
ZENODO
Dataset . 2026
License: CC BY
Data sources: Datacite
versions View all 2 versions

Ethereum Wallet Profiling Data: Raw Uniswap Transaction Logs and NBD-Processed Behavioral Features

Authors: Sotov, Alexandre;

Abstract

Dataset Description

1.1 Source and Collection of Raw Data

All swap events emitted by the Uniswap v3 ETH/USDC 0.05% fee-tier pool (0x88e6A0c2dDD26FEEb64F039a2c41296FcB3f5640) on Ethereum mainnet were collected between blocks 24,384,021 and 24,469,780 (04–16 February 2026, 11.98 days) using a custom JSON-RPC collector that processes blocks sequentially and caches transaction receipts per block to minimise RPC overhead.

Each record corresponds to one swap event and contains fifteen fields: block number, log index, timestamp, three address fields, trade side, base and quote quantities, execution price, price impact, gas used, gas price, transaction fee, and transaction hash. The structure intentionally mirrors proprietary trader logs used in CEX microstructure research, enabling direct application of order-flow analysis methods to decentralised venue data.

Address decomposition. Three distinct roles are captured per swap: tx_sender (the account that signed the transaction), swap_sender (the msg.sender of the pool's swap() call, typically a router), and recipient (the address credited with output tokens). In direct swaps all three coincide; in router-mediated swaps they diverge. This decomposition is the basis for the forensic separation of MEV bots from retail users and routing contracts in Section 4.

Derived fields. Execution price is decoded from the sqrtPriceX96 value in each Swap event log. Price impact (impact_pct) is computed sequentially within each block: each trade's impact is measured against the execution price of the immediately preceding swap in the same block, consistent with the sandwich attack literature. The gas_price_gwei field records effectiveGasPrice from the transaction receipt; under EIP-1559 this value is interpreted here as the priority tip, not base fee plus tip. This reading is supported empirically: 61.3% of transactions record sub-1-Gwei values, inconsistent with mainnet base fees during the period (1–20 Gwei). Gas fields are therefore treated as signals of inclusion urgency rather than absolute cost.
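The two derived fields described above can be illustrated with a minimal sketch. It is not the collector's actual code. It assumes that token0 of the pool is USDC (6 decimals) and token1 is WETH (18 decimals), that impact is a simple percentage change relative to the preceding in-block swap, and that the first swap of a block has no reference price (returned as None). The function names and the dictionary keys are hypothetical.

```python
from typing import Dict, List, Optional

Q96 = 2 ** 96
# Assumption: token0 = USDC (6 decimals), token1 = WETH (18 decimals).
TOKEN0_DECIMALS = 6
TOKEN1_DECIMALS = 18


def usdc_per_eth(sqrt_price_x96: int) -> float:
    """Convert a Swap event's sqrtPriceX96 into a USDC-per-ETH price."""
    # Raw pool price: token1 base units per token0 base unit.
    raw_token1_per_token0 = (sqrt_price_x96 / Q96) ** 2
    # Rescale base units to whole tokens: WETH per USDC.
    eth_per_usdc = raw_token1_per_token0 * 10 ** (TOKEN0_DECIMALS - TOKEN1_DECIMALS)
    return 1.0 / eth_per_usdc


def impact_pct(swaps: List[Dict]) -> List[Optional[float]]:
    """Percentage price change of each swap versus the previous swap in the same block.

    Results follow (block_number, log_index) order. The first swap of a block
    has no in-block predecessor, so it gets None (an assumption of this sketch).
    """
    ordered = sorted(swaps, key=lambda s: (s["block_number"], s["log_index"]))
    impacts: List[Optional[float]] = []
    prev_block = None
    prev_price = None
    for swap in ordered:
        if swap["block_number"] == prev_block:
            impacts.append(100.0 * (swap["price"] - prev_price) / prev_price)
        else:
            impacts.append(None)
        prev_block, prev_price = swap["block_number"], swap["price"]
    return impacts
```

Measuring impact only against in-block predecessors keeps the metric local to a single block's ordering, which is the setting in which sandwich patterns are usually studied.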
1.2 Sample Overview

Thirty records with sub-wei ETH quantities and zero USDC volume were excluded as contract dust artefacts. The cleaned sample is summarised below.

Parameter                  Value
Block range                24,384,021 – 24,469,780
Block span                 85,759 blocks
Duration                   11.98 days (287.6 hours)
Unique blocks with swaps   55,309 (64.5% utilisation)
Total swaps (cleaned)      121,258
Unique recipient wallets   3,801
Buy-side transactions      60,625 (50.0%)
Sell-side transactions     60,663 (50.0%)

The 50/50 directional split and near-equal buy/sell volume ($2.49B vs $2.51B, […] 300)."

2.2.3 Volume and Price Features

value_clustering_score is the fraction of ETH trade sizes that qualify as round numbers, defined as values whose string representation has fewer than four significant decimal digits. Following Niedermayer et al. (2024) and Cong et al. (2021), round-number clustering reflects human cognitive reference points; its absence in a bot wallet is expected and its presence may indicate wash trading or manual intervention.

mean_base_qty_eth, mean_quote_qty_usdc, mean_gas_price_gwei, and mean_tx_fee_eth are arithmetic means of per-trade quantities. Gas price is interpreted as an inclusion urgency signal rather than an absolute cost measure, consistent with the EIP-1559 recording issue described in Section 3.2.

net_usdc_flow is the signed sum of USDC flows over the observation window:

$$\text{net_usdc_flow}(w) = \sum_{t} \text{flow}_t, \quad \text{where} \quad \text{flow}_t = \begin{cases} +\text{usdc}_t & \text{if SELL_ETH} \\ -\text{usdc}_t & \text{if BUY_ETH} \end{cases}$$

A positive value indicates net USDC extraction from the pool (wallet is a net seller of ETH); a negative value indicates net USDC injection (wallet is a net buyer). This is the primary profitability indicator.

2.2.4 Block-Level Features

Five features capture intra-block positioning and dominance, computed from the pool's transaction log indexed by block number and log index.
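As an illustration of the two Section 2.2.3 features with explicit definitions, the following is a minimal sketch rather than the dataset's processing code. The round-number test is one possible reading of "fewer than four significant decimal digits": it counts significant digits after stripping leading and trailing zeros. The function names, the tuple layout of a trade, and the choice to return 0.0 for an empty input are all assumptions.

```python
from decimal import Decimal
from typing import Iterable, List, Tuple


def net_usdc_flow(trades: Iterable[Tuple[str, float]]) -> float:
    """Signed USDC sum: +usdc for SELL_ETH, -usdc for BUY_ETH."""
    total = 0.0
    for side, usdc in trades:
        if side == "SELL_ETH":
            total += usdc
        elif side == "BUY_ETH":
            total -= usdc
        else:
            raise ValueError(f"unknown trade side: {side!r}")
    return total


def is_round(eth_qty: str) -> bool:
    """True if the quantity has fewer than four significant digits.

    Assumption: significance is judged on the decimal string after
    removing leading and trailing zeros (e.g. "10" has one digit).
    """
    digits = Decimal(eth_qty).normalize().as_tuple().digits
    return len(digits) < 4


def value_clustering_score(eth_quantities: List[str]) -> float:
    """Fraction of trade sizes that qualify as round numbers."""
    if not eth_quantities:
        return 0.0  # assumption: an empty wallet scores zero
    return sum(is_round(q) for q in eth_quantities) / len(eth_quantities)
```

Quantities are passed as strings so that the digit count reflects the recorded value rather than binary floating-point noise.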
avg_log_index is the mean position of the wallet's transactions within their respective blocks. Lower values indicate earlier placement, which in the context of a Uniswap pool is a direct signal of block construction access: randomly submitted transactions land at positions determined by mempool ordering, while block builders can place their own transactions first.

block_capture_rate is the fraction of blocks in which the wallet accounts for more than 50% of the pool's total swap volume. A wallet achieving majority volume share in a block has effectively dominated that block's price discovery.

avg_block_share is the mean fraction of per-block pool volume attributable to the wallet, across all blocks in which it appears.

avg_txs_per_block and multi_tx_rate measure the intensity and prevalence of multi-transaction execution within single blocks. avg_txs_per_block is the mean number of swaps placed by the wallet in a block; multi_tx_rate is the fraction of blocks containing more than one such swap. Values above 2 on avg_txs_per_block are consistent with atomic sandwich execution (frontrun + backrun bracketing a victim trade).

2.2.5 CEX-Derived Feature

alpha_reaction_rate measures directional alignment between the wallet's DEX trades and concurrent Kraken price movements. Tick-level trade data for the XETHZUSD pair is fetched from the Kraken REST API, resampled to one-second intervals, and forward-filled with a 60-second cap to avoid attaching stale prices to DEX events. Price returns are expressed in basis points. A CEX signal is defined as a one-second interval with $|\text{bps}| \geq 5$. For each DEX swap occurring within a signal window, a reaction is recorded if the trade direction matches the CEX price movement:

$$\text{reaction}_t = \mathbf{1}\!\left[(\text{bps}_t > 0 \wedge \text{side}_t = \text{BUY_ETH}) \;\vee\; (\text{bps}_t < 0 \wedge \text{side}_t = \text{SELL_ETH})\right]$$

[…] > 0, distinguishing smart contract wallets from EOAs.
In our sample, contract wallets account for the majority of identified MEV bots, as on-chain execution logic is required to implement sandwich attacks and atomic arbitrage within a single transaction.

is_active_mev_proxy: a boolean flag derived from heuristics applied to the bytecode and transaction patterns, indicating whether the contract exhibits structural signatures consistent with a MEV proxy, that is, a thin routing contract that delegates execution to a separate implementation contract. Proxy architectures are commonly used to separate upgradeable strategy logic from the capital-holding address.

These four fields are appended to the per-wallet feature dataset, producing the final analytical table of 30 variables used in all subsequent analysis. The label column enables stratified comparison between Etherscan-confirmed bots and the unlabelled population, while the on-chain profile fields provide infrastructure-level covariates that are orthogonal to the swap-log-derived behavioural features.
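The block-level features of Section 2.2.4 can be sketched as follows. This is an illustrative reading, not the dataset's own implementation. It assumes that every rate is taken over the blocks in which the wallet appears, that volume is measured in USDC, and that each swap is a dictionary with the hypothetical keys block_number, log_index, wallet, and volume_usdc.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List


def block_level_features(swaps: List[Dict], wallet: str) -> Dict[str, float]:
    """Compute the five block-level features for one wallet from pool swaps."""
    pool_volume = defaultdict(float)    # block -> total pool swap volume
    wallet_volume = defaultdict(float)  # block -> this wallet's swap volume
    wallet_count = defaultdict(int)     # block -> this wallet's swap count
    log_indices = []

    for swap in swaps:
        block = swap["block_number"]
        pool_volume[block] += swap["volume_usdc"]
        if swap["wallet"] == wallet:
            wallet_volume[block] += swap["volume_usdc"]
            wallet_count[block] += 1
            log_indices.append(swap["log_index"])

    if not log_indices:
        raise ValueError("wallet has no swaps in the supplied log")

    # Assumption: only blocks in which the wallet appears enter the averages.
    blocks = list(wallet_count)
    shares = [
        wallet_volume[b] / pool_volume[b] if pool_volume[b] > 0 else 0.0
        for b in blocks
    ]
    counts = [wallet_count[b] for b in blocks]

    return {
        "avg_log_index": mean(log_indices),
        "block_capture_rate": sum(s > 0.5 for s in shares) / len(blocks),
        "avg_block_share": mean(shares),
        "avg_txs_per_block": mean(counts),
        "multi_tx_rate": sum(c > 1 for c in counts) / len(blocks),
    }
```

If block_capture_rate is instead meant to be taken over all blocks in the observation window, only the denominator of that one entry would change.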
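The CEX signal and reaction indicator of Section 2.2.5 can likewise be sketched under stated assumptions. The code below takes already-resampled one-second prices as a dictionary keyed by Unix second, computes the return between consecutive seconds (an assumption; the description does not give the exact return definition), and does not call the Kraken API. All function names are hypothetical, and the aggregation of reactions into alpha_reaction_rate is left out because its definition is not given in the text.

```python
from typing import Dict, Optional


def forward_fill(prices: Dict[int, float], start: int, end: int,
                 cap: int = 60) -> Dict[int, float]:
    """Fill a one-second grid, holding each observed price for at most `cap` seconds."""
    filled: Dict[int, float] = {}
    last_ts: Optional[int] = None
    for ts in range(start, end + 1):
        if ts in prices:
            last_ts = ts
        # Seconds further than `cap` from the last trade stay empty (stale).
        if last_ts is not None and ts - last_ts <= cap:
            filled[ts] = prices[last_ts]
    return filled


def bps_return(filled: Dict[int, float], ts: int) -> Optional[float]:
    """Return over the second ending at `ts`, in basis points, or None if unavailable."""
    prev, cur = filled.get(ts - 1), filled.get(ts)
    if prev is None or cur is None:
        return None
    return 1e4 * (cur - prev) / prev


def reaction(bps: Optional[float], side: str,
             threshold: float = 5.0) -> Optional[int]:
    """1 if the swap direction matches the CEX move, 0 if not, None without a signal."""
    if bps is None or abs(bps) < threshold:
        return None
    matches = (bps > 0 and side == "BUY_ETH") or (bps < 0 and side == "SELL_ETH")
    return int(matches)
```

Returning None outside signal windows keeps non-signal swaps out of any later average, so they are not silently counted as failed reactions.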

Keywords

Finances, Forensic Sciences, Blockchain/statistics & numerical data
