Powered by OpenAIRE graph
Found an issue? Give us feedback
image/svg+xml art designer at PLoS, modified by Wikipedia users Nina, Beao, JakobVoss, and AnonMoos Open Access logo, converted into svg, designed by PLoS. This version with transparent background. http://commons.wikimedia.org/wiki/File:Open_Access_logo_PLoS_white.svg art designer at PLoS, modified by Wikipedia users Nina, Beao, JakobVoss, and AnonMoos http://www.plos.org/ ZENODOarrow_drop_down
image/svg+xml art designer at PLoS, modified by Wikipedia users Nina, Beao, JakobVoss, and AnonMoos Open Access logo, converted into svg, designed by PLoS. This version with transparent background. http://commons.wikimedia.org/wiki/File:Open_Access_logo_PLoS_white.svg art designer at PLoS, modified by Wikipedia users Nina, Beao, JakobVoss, and AnonMoos http://www.plos.org/
ZENODO
Software . 2024
Data sources: ZENODO
image/svg+xml art designer at PLoS, modified by Wikipedia users Nina, Beao, JakobVoss, and AnonMoos Open Access logo, converted into svg, designed by PLoS. This version with transparent background. http://commons.wikimedia.org/wiki/File:Open_Access_logo_PLoS_white.svg art designer at PLoS, modified by Wikipedia users Nina, Beao, JakobVoss, and AnonMoos http://www.plos.org/
ZENODO
Software . 2025
Data sources: ZENODO
image/svg+xml art designer at PLoS, modified by Wikipedia users Nina, Beao, JakobVoss, and AnonMoos Open Access logo, converted into svg, designed by PLoS. This version with transparent background. http://commons.wikimedia.org/wiki/File:Open_Access_logo_PLoS_white.svg art designer at PLoS, modified by Wikipedia users Nina, Beao, JakobVoss, and AnonMoos http://www.plos.org/
ZENODO
Software . 2025
Data sources: ZENODO
image/svg+xml art designer at PLoS, modified by Wikipedia users Nina, Beao, JakobVoss, and AnonMoos Open Access logo, converted into svg, designed by PLoS. This version with transparent background. http://commons.wikimedia.org/wiki/File:Open_Access_logo_PLoS_white.svg art designer at PLoS, modified by Wikipedia users Nina, Beao, JakobVoss, and AnonMoos http://www.plos.org/
ZENODO
Software . 2025
Data sources: ZENODO
image/svg+xml art designer at PLoS, modified by Wikipedia users Nina, Beao, JakobVoss, and AnonMoos Open Access logo, converted into svg, designed by PLoS. This version with transparent background. http://commons.wikimedia.org/wiki/File:Open_Access_logo_PLoS_white.svg art designer at PLoS, modified by Wikipedia users Nina, Beao, JakobVoss, and AnonMoos http://www.plos.org/
ZENODO
Software . 2024
Data sources: ZENODO
image/svg+xml art designer at PLoS, modified by Wikipedia users Nina, Beao, JakobVoss, and AnonMoos Open Access logo, converted into svg, designed by PLoS. This version with transparent background. http://commons.wikimedia.org/wiki/File:Open_Access_logo_PLoS_white.svg art designer at PLoS, modified by Wikipedia users Nina, Beao, JakobVoss, and AnonMoos http://www.plos.org/
ZENODO
Software . 2024
Data sources: ZENODO
image/svg+xml art designer at PLoS, modified by Wikipedia users Nina, Beao, JakobVoss, and AnonMoos Open Access logo, converted into svg, designed by PLoS. This version with transparent background. http://commons.wikimedia.org/wiki/File:Open_Access_logo_PLoS_white.svg art designer at PLoS, modified by Wikipedia users Nina, Beao, JakobVoss, and AnonMoos http://www.plos.org/
ZENODO
Software . 2025
Data sources: ZENODO
image/svg+xml art designer at PLoS, modified by Wikipedia users Nina, Beao, JakobVoss, and AnonMoos Open Access logo, converted into svg, designed by PLoS. This version with transparent background. http://commons.wikimedia.org/wiki/File:Open_Access_logo_PLoS_white.svg art designer at PLoS, modified by Wikipedia users Nina, Beao, JakobVoss, and AnonMoos http://www.plos.org/
ZENODO
Software . 2025
Data sources: ZENODO
image/svg+xml art designer at PLoS, modified by Wikipedia users Nina, Beao, JakobVoss, and AnonMoos Open Access logo, converted into svg, designed by PLoS. This version with transparent background. http://commons.wikimedia.org/wiki/File:Open_Access_logo_PLoS_white.svg art designer at PLoS, modified by Wikipedia users Nina, Beao, JakobVoss, and AnonMoos http://www.plos.org/
ZENODO
Software . 2025
Data sources: ZENODO
image/svg+xml art designer at PLoS, modified by Wikipedia users Nina, Beao, JakobVoss, and AnonMoos Open Access logo, converted into svg, designed by PLoS. This version with transparent background. http://commons.wikimedia.org/wiki/File:Open_Access_logo_PLoS_white.svg art designer at PLoS, modified by Wikipedia users Nina, Beao, JakobVoss, and AnonMoos http://www.plos.org/
ZENODO
Software . 2025
Data sources: ZENODO
image/svg+xml art designer at PLoS, modified by Wikipedia users Nina, Beao, JakobVoss, and AnonMoos Open Access logo, converted into svg, designed by PLoS. This version with transparent background. http://commons.wikimedia.org/wiki/File:Open_Access_logo_PLoS_white.svg art designer at PLoS, modified by Wikipedia users Nina, Beao, JakobVoss, and AnonMoos http://www.plos.org/
ZENODO
Software . 2025
Data sources: ZENODO
ZENODO
Software . 2024
Data sources: Datacite
ZENODO
Software . 2025
Data sources: Datacite
ZENODO
Software . 2025
Data sources: Datacite
ZENODO
Software . 2025
Data sources: Datacite
ZENODO
Software . 2024
Data sources: Datacite
ZENODO
Software . 2024
Data sources: Datacite
ZENODO
Software . 2025
Data sources: Datacite
ZENODO
Software . 2025
Data sources: Datacite
ZENODO
Software . 2025
Data sources: Datacite
ZENODO
Software . 2025
Data sources: Datacite
ZENODO
Software . 2025
Data sources: Datacite
versions View all 11 versions
addClaim

This Research product is the result of merged Research products in OpenAIRE.

You have already added 0 works in your ORCID record related to the merged Research product.

Propensity Score Matching Python-based code

Abstract

It has been tested and works with datasets in SPSS v25, 28 and 29 ('open script'). Python, v3.10 and 3.11. Regarding R, versions 4.3.0 and 4.4.0, and 'Reticulate' package, 1.39 and 1.40. It tolerates missing values acceptably. However, it is desirable to reduce them as much as possible. UsageRefine the code with your current research:- Rename C:\PATH_TO_YOUR_DATASET.sav- Rename COVS with your data (name, not label)- Choose the ratio (1:1, 1:2...) and the caliper - Choose bar colors and adjust the limits of the x-axis and y-axis to the desired range- Rename C:\PATH_TO_YOUR_FOLDERRun the script [RStudio, SPSS (File / Open script)...] All of them perform PS matching and store matched pairs. Features: * Code 1: Sampling without replacement. Five plots showing SMD and PS distributions. * Code 2: Sampling with replacement. Five plots. * Code 3: Sampling without replacement. In addition, a lineplot showing the SMD before and after matching. A colour assignment has been applied based on whether a covariate is included in the PS. It can be shown that PSM can also indirectly reduce the SMD of covariates not explicitly included in the PS model, due to underlying correlations or associations. * Code 4: Sampling without replacement. Focused on matching validation, it stores 3 statistics: - SMD (covariates included in PS): the objective is an absolute SMD postmatching <0.1 - VR (covariates included in PS): VR postmatching close to 1 - McFadden's pseudo-R² (postmatching close to zero, indicating that the covariates included in the PS model are no longer determinant of the variability of the DV)

This repository provides 4 versions of a free, Python-based code for performing propensity score (PS) matching. An initiative of the Camargo Cohort Study, developed with the aim of sharing the tool and spreading the use of PS matching. The code overcomes compatibility issues with R versions and R packages, and implements (i) logistic regression to compute PS, (ii) 1:N matching using the K-nearest neighbour (KNN) algorithm with a customisable caliper, (iii) sampling with or without replacement, and (iv) visualisations to assess matching quality. Outputs: Matched pairs stored as '.csv' file, allowing a Coxreg to be performed ('SET' in SPSS). Diagnostic plots stored in the specified output folder, providing a view of SMD and PS distribution. Statistics for matching validation: SMD, variance ratio (VR), and McFadden's pseudo-R². The code has been developed using information from the Matplotlib, Numpy and Seaborn libraries and with OpenAI's ChatGPT support and refinements. No funding was received for conducting this work and there are no financial or non-financial interests to disclose. 

The code was also tested by comparing the results with those of a PSM in SPSS based on R packages (Propensity Score Matching for SPSS v1.0, by Thoemmes F). We selected certain characteristics (5 covariates to be included in the PS, caliper=0.20, ratio 1/1, sampling without replacement), and applied them to both methods. We observed significant discrepancies in the PS values and in the composition of the matched sample. However, the post-matching balance met the standard thresholds using both methods. The differences are probably due to several factors -PS estimation, optimisation algorithms, caliper application...- reflecting the different performances offered by Python libraries (matplotlib, sklearn) and R-based packages (MatchIt, RItools, cem). In our opinion, in a practical approach, a method could be considered acceptable if the balance after matching meets the key criterion of absolute SMD <10% in covariates. This indicates a good PSM model, regardless of the PS values or the composition of the matched pairs.

New version (2025-03-03) combining codes 3 and 4. It provides: * Full PS matching with KNN algorithm, sampling without replacement, caliper=0.2 and ratio 1:1.* Matched pairs, SMD, variance ratios and pseudo-R^2, stored in a dedicated folder.* Lineplot with different colour for covs included in PS, tick spacing and covs names. It captures the critical processes of PSM: PS calculations, matching, validation and a diagnostic plot. As commented, it runs (i) From SPSS v25 with R 3.3.3 [(File/Open/Script (Python >3.10 is crucial)], and (ii) From RStudio with R >4.3 and Python >3.10 (SPSS just for the path to your dataset).

Codes 1 and 3 have been slightly modified. In addition to performing PS matching and saving matched pairs, this latest version (update 2025-02-16) introduces refinements to the lineplots from the matplotlib library. In particular, the visualisation of SMD values has been improved, including validation thresholds. The limits of y-axis are set to [-1,+1] and the tick frequency (tick spacing) is 0.1. Both can be changed in the code lines: plt.ylim(-1, 1) ax.yaxis.set_major_locator(ticker.MultipleLocator(0.1))

These plots were generated after running PSM with the same parameters on the same dataset, but using R-based packages in SPSS (left, dotplot) and using our Python-based code outside SPSS (right, lineplot). The results are almost identical.

Keywords

Statistics as Topic, Methods, Matching, Propensity Score, Statistics in Medicine, Python

  • BIP!
    Impact byBIP!
    citations
    This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
    0
    popularity
    This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.
    Average
    influence
    This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
    Average
    impulse
    This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.
    Average
Powered by OpenAIRE graph
Found an issue? Give us feedback
citations
This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
BIP!Citations provided by BIP!
popularity
This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.
BIP!Popularity provided by BIP!
influence
This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
BIP!Influence provided by BIP!
impulse
This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.
BIP!Impulse provided by BIP!
0
Average
Average
Average