ZENODO
Preprint . 2025
License: CC BY
Data sources: ZENODO

AI Peer Review Acceleration of LLM-Generated Glioblastoma Clinical Trial Patient Matching ML, FDA/ICH/ISO, and FastAPI

Authors: Kawchak, Kevin


Abstract

Human peer review has long been an effective technique for addressing the quality, originality, and errors of non-LLM-generated work. Researcher-AI trust has since grown due to performant large language model (LLM) benchmark improvements and author implementations across different model manufacturers for processing frequent, complex, and voluminous data. This single human-AI team has led to the emergence of readily available artificial intelligence peer review. A primary benefit of AI peer review is the ability to conduct iterative analyses and corrections throughout the entire manuscript process; this responsible method improves on recommendations from overworked and fatigued human reviewers provided only after manuscript completion. Here, a maximum-efficiency study was performed in 14 days: the author used a prior glioblastoma clinical trial matching guidance and a TrialGPT paper as inputs to produce a prompt for developing a Python deep learning pipeline. A 10-file dataset and a clinical BERT model notebook were generated by Sonnet 4.5 Extended, followed by repeated code fixes and optimizations to yield a final notebook with a low test set performance of 67.3%. A triple AI peer review conducted by Sonnet, GPT 5.1, and Grok 4.1 yielded the same primary recommendation: upgrade to a tabular model better suited to the CSV dataset. ChatGPT then identified the most appropriate selection, the open-source TabPFN-v2-clf model from Hugging Face. The Sonnet recommendations also included detailed instructions for five-fold cross-validation, SHAP explainability, and bias analysis. An AI peer review evaluation standard for ML workflows, based on criterion and efficiency metrics, was generated by Opus 4.5, with the highest efficiencies assigned to workflows using the fewest humans, prompts, and least time at the lowest LLM cost. Subsequent evaluation of the current workflow yielded an overall quality of 87.5% and an efficiency of 0.831 based on human, prompt, time, and cost metrics.
The ChatGPT-identified model and the Sonnet review instructions were implemented into the prior notebook to yield a new notebook that accelerated research progress, improving test set accuracy to 94.0% through few-shot learning and peer review corrections. A regulatory compliance prompt was created, followed by full document generations addressing FDA 21 CFR Part 820, ICH-GCP E6(R2), and ISO 14971, covering quality, clinical practice, and risk management. Several attempts by Sonnet to create an effective FastAPI code repository were initially unsuccessful; however, an Opus/Sonnet prompting strategy yielded a FastAPI interface with three successful glioblastoma patient clinical trial match predictions. AI acting as a senior software engineer and as a senior regulatory analyst was also employed as part of the AI peer review process. The purpose of this AI peer review study was to place the world's focus back on innovating active and real-world medical AI advancements in a fraction of the time. The effect of AI implementation on large workforces can be challenging to quantify. The conversational human-AI R&D relationship has become increasingly solitary; therefore this single-author project should be viewed as the standard for new workflows going forward. This next evolution of peer review is especially suitable for hard-working researchers in less favorable economic, educational, and affiliation conditions, such as the author of this paper. The process is afforded by low-cost, fast, and easy-to-use state-of-the-art LLMs from Anthropic, Google, OpenAI, and xAI, marking the pivot away from slow and irresponsible use of human peer review for LLM-generated data, and back to fast automation of patient health applications at scale.
Many authors of less influence can now improve their speed of development and build reputation simply by providing in their papers the prompts, LLMs, AI peer review recommendations, and corrections they made, without the need for paid review or journals. Interested parties can then view the author's peer review findings alongside the paper. Trust in the AI peer review process was established over a series of human-AI works reflected in LLM code and non-code benchmarking, applications in literature, and prior author works using AI in review roles coordinated across judging, external validation, cross-verification, meta-verification, and tests according to FDA/ICH guidance.

Keywords

LLM Machine Learning Code Generation, Glioblastoma, Clinical Trial Patient Matching

  • BIP! impact indicators: selected citations 0; popularity Average; influence Average; impulse Average.