Powered by OpenAIRE graph
ZENODO
Software . 2024
Data sources: ZENODO
ZENODO
Software . 2024
Data sources: Datacite

TabeaSonnenschein/GenSynthPop: R-package for Generating Representative Spatially Explicit Synthetic Populations

Authors: Sonnenschein, Tabea; de Mooij, Jan; Pellegrino, Marco; Dastani, Mehdi; Ettema, Dick; Logan, Brian; Verstegen, Judith A.

Abstract

Instructions for R-package: GenSynthPop

This repository contains the implementation of GenSynthPop, a sample-free tool for constructing synthetic populations from mixed-aggregation contingency tables. The package contains a set of functions that help prepare stratified census datasets to generate conditional propensities, combine the conditional propensities with spatial marginal distributions to generate a representative population, and validate that the generated agents are distributed similarly to both the initial spatial marginal datasets and the stratified datasets.

The generated population is representative of a city, or of whatever spatial extent is fed into the algorithms, and can be used for simulation purposes such as an agent-based model. The smaller the spatial units of the spatial marginal distributions, the more spatially resolved the agents will be.

Updates

Changes in Version 2.0.0 of the package GenSynthPop compared to Version 1.0.0:

* implements iterative proportional fitting (IPF) to fit multi-variable joint distributions to spatial marginal distributions
* implements deterministic assignment instead of probability distribution sampling
* fuses all steps into a single function for ease of use

The work in this repository is described in:

de Mooij, J., Sonnenschein, T., Pellegrino, M. et al. GenSynthPop: generating a spatially explicit synthetic population of individuals and households from aggregated data. Auton Agent Multi-Agent Syst 38, 48 (2024). https://doi.org/10.1007/s10458-024-09680-7

A Python implementation of this library is also available.

Main Function

Conditional_attribute_adder(): Adds a target attribute to a synthetic population by fitting it to a contingency table, optionally using iterative proportional fitting (IPF) with margins.
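To make the IPF idea mentioned above more concrete, here is a minimal base-R sketch of iterative proportional fitting on a two-way table. It is an illustration under stated assumptions, not the package's internal implementation: the helper name ipf_2d, the seed table, and the target margins are all hypothetical toy values.

```r
# Minimal two-way IPF sketch (illustrative only; ipf_2d is a hypothetical
# helper and is NOT a function exported by GenSynthPop).
# Assumes every row and column of the seed table has a positive sum.
ipf_2d <- function(seed, row_targets, col_targets, tol = 1e-8, max_iter = 1000) {
  stopifnot(length(row_targets) == nrow(seed),
            length(col_targets) == ncol(seed),
            isTRUE(all.equal(sum(row_targets), sum(col_targets))))
  tab <- seed
  for (iter in seq_len(max_iter)) {
    # Scale each row so that the row sums match the row targets
    tab <- tab * (row_targets / rowSums(tab))
    # Scale each column so that the column sums match the column targets
    tab <- sweep(tab, 2, col_targets / colSums(tab), `*`)
    # Column sums are exact after the column step, so only rows need checking
    if (max(abs(rowSums(tab) - row_targets)) < tol) break
  }
  tab
}

# Toy joint distribution of age group by sex (e.g. from a stratified dataset)
seed <- matrix(c(40, 60,
                 55, 45,
                 30, 70),
               nrow = 3, byrow = TRUE,
               dimnames = list(age_group = c("0-15", "15-65", "65+"),
                               sex = c("male", "female")))

# Toy marginal counts for a single neighborhood
age_margin <- c(20, 50, 30)
sex_margin <- c(48, 52)

fit_tab <- ipf_2d(seed, age_margin, sex_margin)
print(round(fit_tab, 2))
print(rowSums(fit_tab))
print(colSums(fit_tab))
```

The fitted table matches both sets of margins while keeping the association structure (the odds ratios) of the seed table, which is the property that makes IPF useful for combining a stratified dataset with spatial marginal data.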
Installing package in R

    install.packages("devtools")
    library(devtools)
    install_github("TabeaSonnenschein/GenSynthPop")
    library(GenSynthPop)

Looking up documentation for a function

There is extensive documentation for the functions within R. Example:

    ?Conditional_attribute_adder
    help(Conditional_attribute_adder)

Should there be remaining questions, shoot me an email: t.s.sonnenschein@uu.nl

Instructions

1. Start by collecting neighborhood marginal distributions of age groups. We recommend using the smallest spatial unit available, but the right resolution depends on what you want to use the synthetic agent population for. In principle you can even use provincial or national administrative areas if that matches your project scope and goal. We use neighborhoods because we want to create an urban agent-based model (ABM).

2. Generate a population by creating a unique agent for each person living in each neighborhood.

    # Load the library
    library(GenSynthPop)

    # check.names = FALSE keeps column names such as "0-15" unchanged
    neigh_df = read.csv("Neighborhood_statistics.csv", check.names = FALSE)

    # Initialize the agent_df: repeat each neighborhood code once per resident
    agent_neighborhoods = list()
    agent_count = 0
    for (i in seq_len(nrow(neigh_df))) {
      neighb_code = neigh_df[i, "neighb_code"]
      neighb_total = neigh_df[i, "nr_residents"]
      agent_neighborhoods = c(agent_neighborhoods, rep(neighb_code, neighb_total))
      agent_count = agent_count + neighb_total
    }

    # Agent IDs run from Agent_0 to Agent_<agent_count - 1>
    agent_ids = paste0("Agent_", 0:(agent_count - 1))
    agent_df = data.frame(agent_id = unlist(agent_ids), neighb_code = unlist(agent_neighborhoods))

3. Use this new agent_df and the neighborhood marginal distribution dataframe to assign an age group to the agents within each neighborhood.
    library(dplyr)  # provides the %>% pipe
    library(tidyr)  # provides pivot_longer() and all_of()

    agecols = c("0-15", "15-25", "25-45", "45-65", "65+")

    # Reshape the age columns to long format: one row per neighborhood and
    # age group, with the number of residents in the new "count" column
    ageneigh_df = neigh_df[c("neighb_code", agecols)] %>%
      pivot_longer(cols = all_of(agecols), names_to = "age_group", values_to = "count")
    ageneigh_df = as.data.frame(ageneigh_df)

    agent_df = Conditional_attribute_adder(df = agent_df,
                                           df_contingency = ageneigh_df,
                                           target_attribute = "age_group",
                                           group_by = c("neighb_code"))
    print(head(agent_df))

4. Read the stratified dataframe that contains the conditional variable and the variable of interest (the one you want to add), for example sex by age group, since age group has already been added. Make sure that the classes of the conditional variables correspond to those in agent_df. We can now also use the additional neighborhood margins that we have.

    sex_age_df = read.csv("sex_age_statistics.csv") # columns age_group, sex, counts

    # Assumption: sexcols holds the names of the sex count columns in neigh_df
    # (placeholder values; adjust to your data), and sexneigh_df is built the
    # same way as ageneigh_df
    sexcols = c("male", "female")
    sexneigh_df <- neigh_df[c("neighb_code", sexcols)] %>%
      pivot_longer(cols = all_of(sexcols), names_to = "sex", values_to = "count")
    sexneigh_df <- as.data.frame(sexneigh_df)

    agent_df = Conditional_attribute_adder(df = agent_df,
                                           df_contingency = sex_age_df,
                                           target_attribute = "sex",
                                           group_by = c("neighb_code"),
                                           margins = list(ageneigh_df, sexneigh_df),
                                           margins_names = c("age_group", "sex"))
    print(head(agent_df))

5. Now we can add multi-variable contingency tables and repeat the function for any data and variables we would like to add. For example, let us add education level based on age and sex. We can use the neighborhood margins for age_group and sex, and for education_level as well if they are available. The function accepts contingency tables with any number of variables and any number of neighborhood marginal datasets. The only requirement is that the conditional variables of the contingency table and the marginal data (that is, all variables apart from the target attribute) are present in agent_df. The algorithm can handle cases in which no neighborhood marginal data is available for some conditional variables or target attributes.
    edu_age_sex_df = read.csv("edu_sex_age_statistics.csv") # columns age_group, sex, education_level, counts

    agent_df = Conditional_attribute_adder(df = agent_df,
                                           df_contingency = edu_age_sex_df,
                                           target_attribute = "education_level",
                                           group_by = c("neighb_code"),
                                           margins = list(ageneigh_df, sexneigh_df),
                                           margins_names = c("age_group", "sex"))
    print(head(agent_df))

You can look at the Example_Application_GenSynthPop.R script for an example application of the functions in the package.

License

This package is licensed under the MIT License.
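Returning to the workflow in the instructions: the description states that the generated agents should reproduce the spatial marginal distributions. One simple way to check this by hand is to cross-tabulate the agents and compare the counts with the neighborhood margins. The sketch below uses only base R and a small hand-made agent_df and ageneigh_df; the helper name compare_to_margins and all toy values are assumptions, and the package's own validation functions may work differently.

```r
# Hypothetical helper (not part of GenSynthPop): compare the number of
# generated agents per group with the counts in a marginal table.
compare_to_margins <- function(agent_df, margin_df, by, count_col = "count") {
  # Count generated agents for every combination of the `by` columns
  counts <- as.data.frame(table(agent_df[by]),
                          responseName = "generated",
                          stringsAsFactors = FALSE)
  # Keep every row of the marginal table, even if no agent was generated for it
  merged <- merge(margin_df, counts, by = by, all.x = TRUE)
  merged$generated[is.na(merged$generated)] <- 0
  # Positive values mean too many agents, negative values too few
  merged$difference <- merged$generated - merged[[count_col]]
  merged
}

# Toy synthetic population: 6 agents in neighborhood N1, 4 in N2
agent_df <- data.frame(
  agent_id = paste0("Agent_", 0:9),
  neighb_code = rep(c("N1", "N2"), times = c(6, 4)),
  age_group = c("0-15", "0-15", "15-65", "15-65", "15-65", "65+",
                "0-15", "15-65", "65+", "65+"),
  stringsAsFactors = FALSE
)

# Toy neighborhood margins in long format (same layout as ageneigh_df)
ageneigh_df <- data.frame(
  neighb_code = rep(c("N1", "N2"), each = 3),
  age_group = rep(c("0-15", "15-65", "65+"), times = 2),
  count = c(2, 3, 1, 1, 1, 2),
  stringsAsFactors = FALSE
)

check <- compare_to_margins(agent_df, ageneigh_df,
                            by = c("neighb_code", "age_group"))
print(check)
```

A difference column of all zeros means the agents reproduce the margins exactly. The same helper can be pointed at any other attribute and marginal table, for example by = c("neighb_code", "sex").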

  • BIP!
    Impact by BIP!
    selected citations
    These citations are derived from selected sources.
    This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
    1
    popularity
    This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.
    Average
    influence
    This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
    Average
    impulse
    This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.
    Average