Routinely collected healthcare data are derived from electronic medical records, health insurance records and administrative records in healthcare organisations. These databases are increasingly being used for research. They have been used to generate ideas about the causes of illness, to evaluate health service policies, to carry out clinical audits and disease surveillance, and to look for adverse effects of medications. Beyond these benefits, routine databases are also useful for finding out whether the effects of drugs observed in Randomised Controlled Trials (RCTs) are also observed in real-world settings, especially in groups of people whose characteristics differ from those of RCT participants. Despite the benefits of routinely collected healthcare databases, there are numerous challenges in using them for research. Some of these challenges arise from the difficulty of extracting data in a way that allows complex study designs. Data extraction is demanding in terms of time, cost, effort and expertise. This is partly because the databases are very large, vary in structure and contain a wide range of data. Some of the difficulty in extraction is due to the complexity of the study designs needed to probe these databases: the data were not collected for research purposes and therefore have numerous inherent biases. Furthermore, any extraction needs clinical, epidemiological and technical expertise to interrogate these databases. These issues can lead to many human-induced errors and can result in data that are neither accurate nor reproducible. Working with computer scientists, clinicians and methodologists, we have developed an Automated Clinical Epidemiology Studies (ACES) platform that extracts accurate and reproducible data for epidemiological studies from one database of medical records from general practices (The Health Improvement Network database).
The platform completes data extraction within minutes to hours, a task that previously took weeks to months when done manually. It has already enabled numerous studies in the last 12 months. Now that we have developed such a platform, in this research programme we aim to extend it to: 1) complex epidemiological study designs and 2) databases that have different structures and coding systems. For complex study designs, we will develop and evaluate one platform for linked mother and baby databases and another for studies of the effects of drugs (pharmaco-epidemiological studies). Pharmaco-epidemiological studies help us understand the beneficial and harmful effects of medications. In the process of developing the automated platform for pharmaco-epidemiological studies, we will also review and, where necessary, develop methodologies to estimate the effects of medications more accurately. We have been in conversation with institutions in other countries about extending our ACES platform to their databases, which have different structures and coding systems, and evaluating whether this works. If we achieve this, we could investigate a single question across multiple databases in different countries simultaneously. Finally, we will also assess the risks of having such an automated data extraction system. For example, it would be possible to conduct numerous studies within a day and report only those showing positive results. We will identify such issues through discussion with relevant stakeholders and produce a set of recommendations on how best to avoid such situations.