AutoMapDB: automagically mapping of Postgres database schema into a Python abstraction

Summary AutoMapDB enables programmers to manipulate database tables without deeper SQL knowledge. By adding substantial features to the existing library sqlalchemy (Bayer, 2012), database tables can be queried through an abstraction layer for any PostgreSQL database. Therefore, table mappings of the schema are automatically generated by the tool. AutoMapDB can be used as both CLI and library within a Python project. The tool has already been successfully used to manipulate i2b2 system tables. Statement of Need Querying databases can be heavily complex. Often performance queries are not essential, but work time effort needs to be minimized. Depending on the security concept, a database is only accessible from the server which does not have a window manager. Thus, when working on a server, it is likely that no GUI client interfaces are available. Beyond that not everybody is familiar with SQL. Low-level ORM libraries require setup in order to consume existing databases. Further, they usually do not offer CLI commands that enable quick usage on the command line. AutoMapDB is a python package that fills the gap between Postgres, ORM and the command line. Further, its generic abstraction can be used in Python libraries in order to comfortably send queries to a Postgres database. In medical computer science, famous applications such as i2b2 (Murphy et al., 2010; Wagholikar et al., 2018) and the OHSDI tools (Hripcsak et al., 2015) can be run on Postgres. Within the consortium MIRACUM (Prokosch et al., 2018) Postgres forms a widely used dbms. The applications within consortium DIFUTURE (Prasser et al., 2018) also heavily rely on Postgres databases, among others the IBM FHIR Server. Thus, in that field, the tool can be used to write automatic data exports and manipulations. It is quick and easy to install and configure, works via CLI as well as library and does not require SQL knowledge. One can easily install it with the package manager pip. Therefore it is a tool which enables simple database interactions on a headless server or in a Jupyter environment of a data scientist. Underlying Principles AutoMapDB mainly relies on three libraries. Among these is fire for exposing classes and functions as CLI commands. psycopg handles the underlying connection to the Postgres database with which it communicates. Finally, the ORM library sqlalchemy (Bayer, 2012) leverages the connection to the database and generates Python objects from tables through database reflections. To construct valid queries, AutoMapDB creates its own internal representations of the tables it operates on. These representations are automatically generated and can be customized and managed in a local file. They contain information such as schema, table name and the required fields to select a single entity, which usually are primary keys. Further, for the creation or the update of an entity, one can flag non-nullable and foreign key fields in order to force the check for a value during runtime. Consequently, Python triggers an error if no value is given for such a field and does not send an erroneous query to the database. The tool also offers filter options to select specific entities from the database by passing the column name and a value as command line arguments, which are translated to conditionals in the generated statements. Additionally the SQL Expressions LIKE and ILIKE can be used to filter the results with pattern matching by passing a dictionary to the corresponding argument "–like/–ilike" (cf. Figure 1). The output returned by the CLI can be returned in structured formats such as json and yaml, which enables automated processing inside shell scripts. \(\small{\textbf{automapdb} \quad \overbrace{\textit{table list}}^{a} \quad\overbrace{\textit{data.patients}}^{b} \quad \overbrace{\textit{--age=35}}^{c} \quad\overbrace{\textit{--ilike='\{first_name:"%BERT%"\}'}}^{d} \quad\overbrace{\textit{--fmt=yaml}}^{e}}\) \(\small{\overbrace{table.list}^{a}\quad(\overbrace{"data.patients"}^{b}, \quad\overbrace{age="35"}^{c}, \quad\overbrace{ilike=\{"first\_name": "\%BERT\%"\}}^{d}, \quad\overbrace{fmt="yaml"}^{e})}\) Figure 1: Example command using the CLI (1): Invocation of the list function (a) on the table data.patients (b), a filter on the field age (c) and using an ilike expression on the field first_name (d). Finally, the resulting format is set to yaml by (e). The corresponding statement using the python API is shown in (2). Comparison to Other Tools GUI based database clients such as dbeaver feature powerful functionality, such as a graphical query editor and an entity relationship diagram generator. Though, either a direct connection to the database must be establishable or the server with the database instance requires a window manager. Often both are not the case. Further, dbeaver is not intended to run without UI, i.e. in a CLI such as a bash script. For that purpose Postgres comes with the powerful efficiently implemented CLI client psql. However, to guarantee proper work, one requires deeper database and SQL knowledge. Csvkit is a toolkit including CLI commands to directly work with the database from the command line. The foundational data structure of the tool is the CSV format. This leads to complicated workflows when manipulating single database entries or even single fields. Further, depending on a task’s complexity, SQL knowledge is required. AutoMapDB can be used without SQL knowledge. Even the database design can be automatically exported into a required config json representation. Thus, there is no necessity for further database tools in order to work with the database with intuitive commands. Finally, its usability as a library inside Python scripts enables quick data analysis with libraries such as Pandas without the requirement of any SQL knowledge. It enables statistical evaluation options using existing statistical tools or even machine learning approaches, which are often available in Python. This offers the possibility of staying in one programming language ecosystem. The tool has already been successfully used to manipulate i2b2 system tables. Acknowledgements This work was supported by the German Ministry for Education and Research (BMBF FKZ 01ZZ1801B, 01ZZ1804A).

{"references": ["Bayer, M. (2012). SQLAlchemy. In A. Brown & G. Wilson (Eds.), The architecture of open source applications volume II: Structure, scale, and a few more fearless hacks. aosa- book.org. http://aosabook.org/en/sqlalchemy.html", "Hripcsak, G., Duke, J. D., Shah, N. H., Reich, C. G., Huser, V., Schuemie, M. J., Suchard, M. A., Park, R. W., Wong, I. C. K., Rijnbeek, P. R., Lei, J. van der, Pratt, N., Nor\u00e9n, G. N., Li, Y.-C., Stang, P. E., Madigan, D., & Ryan, P. B. (2015). Observational health data sciences and informatics (OHDSI): Opportunities for observational researchers. Studies in Health Technology and Informatics, 216, 574\u2013578. https://www.ncbi.nlm.nih.gov/pmc/ articles/PMC4815923/", "Murphy, S. N., Weber, G., Mendis, M., Gainer, V., Chueh, H. C., Churchill, S., & Kohane, I. (2010). Serving the enterprise and beyond with informatics for integrating biology and the bedside (i2b2). Journal of the American Medical Informatics Association: JAMIA, 17(2), 124\u2013130. https://doi.org/10.1136/jamia.2009.000893", "Prasser, F., Kohlbacher, O., Mansmann, U., Bauer, B., & Kuhn, K. A. (2018). Data integra- tion for future medicine (DIFUTURE). Methods of Information in Medicine, 57, e57\u2013e65. https://doi.org/10.3414/ME17-02-0022", "Prokosch, H.-U., Acker, T., Bernarding, J., Binder, H., Boeker, M., Boerries, M., Daumke, P., Ganslandt, T., Hesser, J., H\u00f6ning, G., Neumaier, M., Marquardt, K., Renz, H., Rothk\u00f6t- ter, H.-J., Schade-Brittinger, C., Schm\u00fccker, P., Sch\u00fcttler, J., Sedlmayr, M., Serve, H., \u2026 Storf, H. (2018). MIRACUM: Medical informatics in research and care in university medicine. Methods of Information in Medicine, 57, e82\u2013e91. https://doi.org/10.3414/ ME17-02-0025", "Wagholikar, K. B., Dessai, P., Sanz, J., Mendis, M. E., Bell, D. S., & Murphy, S. N. (2018). Implementation of informatics for integrating biology and the bedside (i2b2) platform as docker containers. BMC Medical Informatics and Decision Making, 18(1), 66. https: //doi.org/10.1186/s12911-018-0646-2"]}

- fixed missing pyyaml dependency

Related Organizations

University of Freiburg
Germany
Technical University of Munich
Germany

Keywords

PostgreSQL, medical computer science, database, Python

Impact byBIP!

	selected citations These citations are derived from selected sources. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).	0
	popularity This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.	Average
	influence This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).	Average
	impulse This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.	Average