
This dataset contains a collection of totally unimodular (TU) node–arc incidence matrices generated from random directed graphs with between 20,000 and 50,000 nodes. Each column of an incidence matrix corresponds to one arc and has exactly one +1 (arc tail) and one -1 (arc head); all other entries are zero. Because the matrices are TU, the linear programming relaxation of an integer flow problem with integer supplies, demands, and capacities has an integral optimal solution whenever an optimum exists.

The dataset includes:

- matrices.csv: sparse representation of all instances (two rows per arc, one for the +1 entry and one for the -1 entry).
- metadata.csv: summary of each instance (nodes, arcs, density).
- Conversion scripts (make_dat_all.py, make_dat_all.R) that produce .dat files for AMPL or other solvers.

Typical applications include benchmarking large-scale network flow and minimum-cost flow solvers and studying algorithmic scalability.

⚠️ Large file notice: The CSV files in this collection are very large (around 20 GB). They usually cannot be opened directly in spreadsheet software or loaded fully into memory on a typical laptop. For analysis, we recommend:

- Chunked reading (e.g. pandas.read_csv(..., chunksize=...)),
- Out-of-core frameworks such as Dask or Polars, or
- Importing into a database (e.g. PostgreSQL, SQLite).

For smaller and more manageable datasets, please see the Small and Medium A collections.
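The chunked-reading recommendation can be sketched as follows. This is a minimal illustration, not the dataset's documented schema: the column names `instance`, `node`, `arc`, and `value` are assumptions, and a tiny in-memory sample stands in for matrices.csv. Check the real header before adapting it. The sketch streams the file in fixed-size chunks, counts arcs per instance (two rows per arc), and sums the entries, which should total zero when every arc contributes one +1 and one -1.

```python
import io
from collections import Counter

import pandas as pd

# Hypothetical layout: one row per nonzero entry of the incidence matrix.
# The real matrices.csv may use different column names; inspect its header first.
SAMPLE_CSV = """instance,node,arc,value
1,1,1,1
1,2,1,-1
1,2,2,1
1,3,2,-1
2,1,1,1
2,2,1,-1
"""


def summarize_entries(source, chunksize=100_000):
    """Stream a CSV in chunks and tally arcs and entry sums per instance."""
    entry_count = Counter()  # nonzero entries seen per instance
    value_sum = Counter()    # running sum of entries per instance

    # With chunksize set, read_csv yields DataFrames of at most `chunksize`
    # rows, so only one chunk is held in memory at a time.
    for chunk in pd.read_csv(source, chunksize=chunksize):
        grouped = chunk.groupby("instance")["value"].agg(["count", "sum"])
        for instance, row in grouped.iterrows():
            entry_count[int(instance)] += int(row["count"])
            value_sum[int(instance)] += int(row["sum"])

    # Two rows per arc (+1 and -1), so arcs = entries // 2.
    return {
        inst: {"arcs": n // 2, "value_sum": value_sum[inst]}
        for inst, n in entry_count.items()
    }


# A deliberately small chunksize so that one instance spans two chunks.
summary = summarize_entries(io.StringIO(SAMPLE_CSV), chunksize=3)
print(summary)
```

For the real file, pass its path instead of the `io.StringIO` object and pick a chunk size that fits comfortably in memory. Because the tallies are accumulated across chunks, an instance whose rows straddle a chunk boundary is still counted correctly.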
