Benchmark Survey: Codebase for Graph Generative Model Benchmarking

This codebase is a benchmark framework for graph generative model experiments. It prepares graph datasets, trains and samples from multiple model wrappers, evaluates generated graphs with several metric families, and produces reporting tables for survey-style empirical comparison.

The main package lives in src/empirical_comparison. It defines dataset builders, model registries, generation utilities, evaluation helpers, graph attribute handling, metrics, and reporting utilities. The command-line workflows live in scripts.

At a high level, the workflow is:

1. Prepare datasets with scripts/prepare_data.py
2. Train models with scripts/train_model.py
3. Generate samples with scripts/generate_samples.py
4. Evaluate samples with descriptor, molecular, learned-feature, or PolyGraphScore scripts
5. Aggregate metric JSON files and generate LaTeX tables

The benchmark supports synthetic datasets such as sbm and planar, plus molecular/attributed datasets such as qm9 and zinc. Graphs are represented as NetworkX objects with a canonical schema for node labels, node features, edge labels, edge features, and optional graph-level molecular targets.

Model support is implemented through wrappers in src/empirical_comparison/models/wrappers. These adapt upstream or local models including dummy, construct, digress, disco, edp_gnn, graphguide, and grum. Vendored upstream repositories are stored under external.

Evaluation includes:

- Generic structural descriptor MMD
- Molecular descriptor metrics and RDKit validity
- Official PolyGraphScore / PGS-JS
- Benchmark-local classifier fallback metrics
- Learned-feature / WL-subtree feature MMD
- Compute-budget reporting

Configuration is YAML-based under configs, with separate files for datasets, models, metrics, and the full benchmark experiment matrix.

Outputs are written under outputs/, including prepared datasets, checkpoints, generated samples, metric payloads, figures, and LaTeX tables.
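
To make the structural descriptor MMD family listed under evaluation more concrete, here is a minimal sketch of a degree-histogram MMD between a reference set and a generated set of graphs. It is an illustration, not the benchmark's implementation: the function names (degree_histogram, gaussian_kernel, mmd_squared), the Gaussian kernel, and the bandwidth sigma=1.0 are all assumptions, and graphs are passed as plain edge lists to keep the sketch dependency-free, whereas the benchmark itself stores graphs as NetworkX objects.

```python
import math
from collections import Counter


def degree_histogram(edges, num_nodes):
    """Normalized degree histogram of an undirected graph given as an edge list.

    Hypothetical helper: nodes are assumed to be labeled 0..num_nodes-1.
    """
    if num_nodes == 0:
        return []
    degree = [0] * num_nodes
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    counts = Counter(degree)
    return [counts.get(d, 0) / num_nodes for d in range(max(degree) + 1)]


def gaussian_kernel(p, q, sigma=1.0):
    """Gaussian kernel between two histograms, zero-padded to equal length.

    The kernel choice and bandwidth are assumptions made for this sketch.
    """
    n = max(len(p), len(q))
    p = list(p) + [0.0] * (n - len(p))
    q = list(q) + [0.0] * (n - len(q))
    sq_dist = sum((a - b) ** 2 for a, b in zip(p, q))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def mmd_squared(xs, ys, kernel=gaussian_kernel):
    """Biased (V-statistic) estimate of the squared MMD between two samples."""
    k_xx = sum(kernel(a, b) for a in xs for b in xs) / (len(xs) ** 2)
    k_yy = sum(kernel(a, b) for a in ys for b in ys) / (len(ys) ** 2)
    k_xy = sum(kernel(a, b) for a in xs for b in ys) / (len(xs) * len(ys))
    return k_xx + k_yy - 2.0 * k_xy


# Toy data: a 4-node path and a 4-node cycle as "reference" graphs,
# and a 4-node star as the single "generated" graph.
reference = [
    degree_histogram([(0, 1), (1, 2), (2, 3)], 4),
    degree_histogram([(0, 1), (1, 2), (2, 3), (3, 0)], 4),
]
generated = [degree_histogram([(0, 1), (0, 2), (0, 3)], 4)]

print("MMD^2(reference, generated):", mmd_squared(reference, generated))
```

A lower value means the degree distributions of the two sets are closer under the chosen kernel. The biased estimator is used here because it stays well defined even when one set contains a single graph.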
