Code2Video: A Code-centric Paradigm for Educational Video Generation

Chen, Yanzhe; Lin, Kevin Qinghong; Shou, Mike Zheng

arXiv.org e-Print Archive

Preprint . 2025

Data sources: arXiv.org e-Print Archive

https://dx.doi.org/10.48550/ar...

Article . 2025

License: arXiv Non-Exclusive Distribution

Data sources: Datacite

Publication: Article, Preprint · 01 Jan 2025 · Embargo end date: 01 Jan 2025 · Publisher: arXiv

Authors: Chen, Yanzhe; Lin, Kevin Qinghong; Shou, Mike Zheng;

doi: 10.48550/arxiv.2510.01174

arXiv: 2510.01174

Abstract

While recent generative models have advanced pixel-space video synthesis, they remain limited in producing professional educational videos, which demand disciplinary knowledge, precise visual structures, and coherent transitions. Intuitively, such requirements are better addressed by manipulating a renderable environment that can be explicitly controlled via logical commands (e.g., code). In this work, we propose Code2Video, a code-centric agent framework for generating educational videos via executable Python code. The framework comprises three collaborative agents: (i) Planner, which structures lecture content into temporally coherent flows and prepares corresponding visual assets; (ii) Coder, which converts structured instructions into executable Python code while incorporating scope-guided auto-fix to enhance efficiency; and (iii) Critic, which leverages vision-language models (VLMs) with visual anchor prompts to refine spatial layout and ensure clarity. To support systematic evaluation, we build MMMC, a benchmark of professionally produced, discipline-specific educational videos. We evaluate MMMC across diverse dimensions, including VLM-as-a-Judge aesthetic scores, code efficiency, and, in particular, TeachQuiz, a novel end-to-end metric that quantifies how well a VLM, after unlearning, can recover knowledge by watching the generated videos. Our results demonstrate the potential of Code2Video as a scalable, interpretable, and controllable approach, achieving a 40% improvement over direct code generation and producing videos comparable to human-crafted tutorials. The code and datasets are available at https://github.com/showlab/Code2Video.
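As a reading aid, the Planner, Coder, and Critic flow described in the abstract can be sketched roughly as below. This is a minimal illustration, not the authors' implementation: the names `Section`, `plan`, `write_code`, `code_with_autofix`, `critique`, and `run_pipeline` are hypothetical, the agents are plain stubs rather than LLM or VLM calls, and "scope-guided auto-fix" is assumed here to mean retrying only the section whose code failed.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Section:
    """One unit of the lecture plan (hypothetical structure)."""
    title: str
    instructions: str
    code: str = ""
    notes: List[str] = field(default_factory=list)


def plan(topic: str) -> List[Section]:
    """Planner stand-in: split a topic into an ordered list of sections."""
    return [
        Section(f"{topic}: intro", "Show the title"),
        Section(f"{topic}: example", "Animate one worked example"),
    ]


def write_code(section: Section) -> str:
    """Coder stand-in: a real system would ask a model for rendering code."""
    return f"print({section.instructions!r})"


def code_with_autofix(section: Section, max_attempts: int = 3) -> str:
    """Assumed auto-fix loop: regenerate only this section until it compiles."""
    for attempt in range(1, max_attempts + 1):
        code = write_code(section)
        try:
            compile(code, "<section>", "exec")  # syntax check only, no execution
            return code
        except SyntaxError as err:
            section.notes.append(f"attempt {attempt}: {err.msg}")
    raise RuntimeError(f"could not produce valid code for {section.title!r}")


def critique(section: Section) -> List[str]:
    """Critic stand-in: a real system would inspect rendered frames with a VLM."""
    return ["layout too dense"] if len(section.code) > 200 else []


def run_pipeline(topic: str) -> List[Section]:
    """Run Planner, then Coder and Critic per section."""
    sections = plan(topic)
    for section in sections:
        section.code = code_with_autofix(section)
        section.notes.extend(critique(section))
    return sections


if __name__ == "__main__":
    for s in run_pipeline("Pythagorean theorem"):
        print(s.title, "->", s.code, s.notes)
```

Handling each section independently keeps a failure local, which is one plausible way a per-section repair step could save work compared with regenerating an entire script.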

Project Page: https://showlab.github.io/Code2Video/

Keywords

Human-Computer Interaction, FOS: Computer and information sciences, Artificial Intelligence (cs.AI), Multimedia, Artificial Intelligence, Computer Vision and Pattern Recognition (cs.CV), Computer Vision and Pattern Recognition, Computation and Language, Computation and Language (cs.CL), Human-Computer Interaction (cs.HC), Multimedia (cs.MM)

Impact by BIP!

- Selected citations: 0. These citations are derived from selected sources. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
- Popularity: Average. This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.
- Influence: Average. This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
- Impulse: Average. This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.

Green

Related to Research communities

Knowmad Institut