
This repository contains the CPgenes dataset, a large-scale trimodal corpus curated for generative virtual cell modeling. The dataset systematically links two high-throughput screening technologies, Cell Painting (morphology) and L1000 (transcriptomics), across four biological cohorts: BBBC021, CDRP, JUMP, and LINCS.

Key contents of this upload:

- Paired trimodal samples: matched sets of chemical/genetic perturbation embeddings, high-resolution (512×512) cellular morphology images, and the corresponding gene expression profiles.
- Diverse cellular contexts: the data cover the MCF7, U2OS, and A549 cell lines.

Researchers who need the complete raw datasets can download them directly from their respective official repositories.

For more details, please refer to our paper: "MultiVCDiff: Building Generative Virtual Cell by Multimodally Predicting Morphological and Transcriptomic Perturbation Responses".
