
We use the French masked language model CamemBERT to replace every occurrence of the word schtroumpf- ("smurf-") with its contextually most probable word in a corpus of 300 pages (five albums) of the Belgian Smurfs comics. Our multimodal pipeline consists of comics-specific OCR, image captioning, and automated token prediction. By generating ten versions of every speech bubble in roughly 3,000 panels (the top-1 through top-10 most likely predictions), our experiment exposes how a language model performs the very task that lets humans naturally understand the Smurfs' playful language: inferring meaning from context.
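The token-prediction step can be sketched as follows, a minimal illustration assuming the publicly available camembert-base checkpoint and the Hugging Face fill-mask pipeline; the example sentence is hypothetical, and the OCR and captioning stages of the full pipeline are omitted.

```python
from transformers import pipeline

# Fill-mask returns the k most probable tokens for the <mask> slot;
# top_k=10 mirrors the ten versions generated per speech bubble.
fill_mask = pipeline("fill-mask", model="camembert-base", top_k=10)

# Hypothetical speech-bubble text with "schtroumpfer" masked out.
text = "Je vais <mask> à la fête ce soir !"

# Each prediction carries the candidate token and its probability.
for rank, pred in enumerate(fill_mask(text), start=1):
    print(f"top-{rank}: {pred['token_str']!r} (p={pred['score']:.3f})")
```

In the actual experiment, the masked slot would be filled once per rank, yielding ten candidate rewrites of each bubble rather than a single best guess.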
