Powered by OpenAIRE graph
Software, 2026
Data sources: ZENODO

robomustib/TextSimilarityGrader: TextSimilarityGrader: A Python Tool for Automated Fuzzy Evaluation of Speech-to-Text Transcripts in Research Contexts

Authors: Mustafa

Abstract

In large-scale psychological and linguistic studies, manual coding of speech-to-text transcripts is time-consuming and prone to human error. Furthermore, automatic speech recognition (ASR) services often introduce phonetic errors, typos, or misinterpretations (e.g., "Appple" instead of "Apple"), rendering exact-string-matching algorithms ineffective for automated grading. TextSimilarityGrader is an open-source Python utility designed to solve this problem. It automates the evaluation of transcript files (JSON or TXT) against a set of expected keywords or answers. By using fuzzy string matching (based on Gestalt pattern matching), the tool identifies correct answers even when the transcript contains spelling errors, dialect variations, or ASR artifacts. This allows rapid, standardized binary scoring (0/1) of thousands of audio transcripts with high reliability.

Motivation and Problem Statement

Researchers using ASR tools such as Gladia, OpenAI Whisper, or Google STT often face a "post-processing bottleneck": while the audio is transcribed quickly, verifying whether a participant said a specific target word requires reading through thousands of files. Simple "Ctrl+F" search scripts fail when the ASR makes minor mistakes (e.g., transcribing "Buß" instead of "Bus").

Methodology

The software implements a multi-stage evaluation pipeline:

1. Data Ingestion: The tool parses various transcript formats, including nested JSON structures (common in API outputs) and plain text.
2. Normalization: Input text is cleaned (lowercased, punctuation removed, special characters normalized) to ensure comparability.
3. Fuzzy Matching: The core engine uses the difflib.SequenceMatcher class, which implements the Ratcliff/Obershelp pattern-matching algorithm. The similarity ratio S is calculated as S = (2 * M) / T, where M is the number of matching characters and T is the total number of characters in both sequences (T = len(a) + len(b)). This yields a normalized score S between 0.0 and 1.0, where 1.0 indicates an identical match.
4. Threshold-Based Grading: A similarity threshold (default ≥ 0.75) determines validity. The score assignment follows a binary classification rule: Score = 1 (correct) if S ≥ 0.75, Score = 0 (incorrect) if S < 0.75. Note: a dynamic constraint is applied to short words (≤ 3 characters) to minimize false positives.
5. Reporting: Results are exported to an Excel file listing the detected word, the full context sentence, the calculated similarity score, and the final point allocation.

Key Features

- ASR-agnostic: works with Gladia JSON, generic JSON, and .txt files.
- Error tolerance: robust against ASR hallucinations, stuttering, and phonetic misspellings.
- Batch processing: capable of processing thousands of files in a single run.
- Visual validation: the output Excel sheet allows researchers to manually verify "close calls" by reviewing the similarity percentage and the extracted context.
- Reproducibility: includes a test suite (tests/) that generates mock data with intentional typos, validating the grading logic before real data are processed.

Workflow

The tool operates in three steps:

1. Template generation: the tool scans the data folder and creates an Excel template (Solutions.xlsx).
2. Definition: the researcher enters the expected target words into the Excel template.
3. Evaluation: the script evaluate.py processes the files and generates Grading_Results.xlsx.

Technical Implementation

Language: Python 3.x
Dependencies: pandas (DataFrame manipulation), openpyxl (Excel I/O)
License: MIT License

Related Works

This tool serves as the evaluation module for the Gladia Batch Transcriber workflow but can be used independently with any text-based data source.
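The normalization, matching, and threshold-grading steps described under Methodology can be sketched with Python's standard-library difflib, which implements exactly the Ratcliff/Obershelp ratio S = 2M / T mentioned above. This is a minimal illustration, not the tool's actual implementation: the function names are invented, and the short-word rule shown here (requiring an exact match for targets of ≤ 3 characters) is an assumed stand-in for the unspecified "dynamic constraint".

```python
import difflib
import re
import string

THRESHOLD = 0.75  # default similarity threshold from the description


def normalize(text: str) -> str:
    """Normalization step: lowercase, strip punctuation, collapse whitespace."""
    text = text.lower()
    text = text.translate(str.maketrans("", "", string.punctuation))
    return re.sub(r"\s+", " ", text).strip()


def similarity(a: str, b: str) -> float:
    """Ratcliff/Obershelp ratio S = 2M / T via difflib.SequenceMatcher."""
    return difflib.SequenceMatcher(None, a, b).ratio()


def grade(transcript: str, target: str, threshold: float = THRESHOLD) -> int:
    """Return 1 if any transcript token matches the target closely enough, else 0."""
    target_n = normalize(target)
    for token in normalize(transcript).split():
        if len(target_n) <= 3:
            # Hypothetical short-word rule: exact match only, to avoid false
            # positives (the tool's real dynamic constraint may differ).
            if token == target_n:
                return 1
        elif similarity(token, target_n) >= threshold:
            return 1
    return 0


# ASR typo "Appple" vs. target "Apple": S = 2*5 / (6+5) ≈ 0.91 ≥ 0.75.
print(grade("I saw an Appple on the table", "Apple"))  # 1
print(grade("I saw a pear on the table", "Apple"))     # 0
```

Note that matching runs on normalized text, so "Appple" scores 10/11 ≈ 0.91 against "apple" and passes the 0.75 threshold, while a genuinely different word like "pear" stays well below it.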
