MLCerts Datasets and Language Models (ICSE 2026)

Auxiliary material, up-to-date documentation, and issue tracking are available at: https://github.com/rub-softsec/MLCerts

Docker images for reproducing artifacts are available at: https://zenodo.org/records/17850372

Datasets

Raw PEM certificates used in differential testing:

    v3-chain.tar.bz2: 12 synthetic certificate datasets.
    v3-experiments-extra.tar.bz2: MLCerts 1M dataset.
    frankencerts-v1-8M.tar.bz2: Frankencerts 8M dataset.
    seeds30k.tar.bz2: Transcert 30K dataset.

The CA information is available in the customCA/ directory.

Language Models (llm-code-MLcerts-EXPORT.zip)

One of the model architectures below is used to generate synthetic ASN.1 instances (with BEGIN/END tags). asn1_to_pem.py is then used to convert them into PEM format, with CA information copied from the customCA/ directory.

RNN Models

Code for RNN models, based on Char-RNN-Python, is available in the Char-RNN-PyTorch directory. charRNN-custom.py is used for training, and generate.py for generating synthetic certificate instances.

    python3 generate.py saved_model hidden_size layers temperature original_cert_dataset extra_run_name

Saved models available are:

    2022-scanned-1024-3-0.0002lr-0.1dropout-epoch3-step300000
    2022-scanned-256-3-0.0002lr-0.1dropout-epoch3-step300000
    balanced-versions-1024-3-0.0002lr-0.1dropout-epoch3-step300000
    balanced-versions-256-3-0.0002lr-0.1dropout-epoch3-step300000
    zmap-data-256-3-0.0002lr-0.1dropout-epoch3-step300000
    zmap-data-1024-3-0.0002lr-0.1dropout-epoch3-step300000

To generate certificates with the final model used for the paper results (IPv4/RNN-Medium with Temperature = 1.5), use:

    python3 generate.py zmap-data-1024-3-0.0002lr-0.1dropout-epoch3-step300000 1024 3 1.5 zmap-data testZmap1M

GPT Models

Code for GPT models, based on GPT-Neo-125, is available in the Transformers directory. train_script.py is used for fine-tuning (train_script_scratch.py for training from scratch), and generate.py for generating synthetic certificate instances.

    python3 generate.py saved_model checkpoint_num training_type temperature

training_type can be 'finetune' or 'custom', for instance:

    python3 generate.py 2022-scanned-custom checkpoint-284400 custom 1.0

Saved models available are:

    2022-scanned
    2022-scanned-custom
    balanced-versions
    balanced-versions-custom
    zmap-data-custom
    zmap-data

The custom versions are the ones trained from scratch. conda-env.yml can be consulted for environment dependencies.

BibTeX

Please cite our paper if you rely on the datasets for your work.

    @inproceedings{icse2026-hallucinating-certificates,
      title = {{Hallucinating Certificates: Differential Testing of TLS Certificate Validation Using Generative Language Models}},
      author = {Paracha, Talha and Posluns, Kyle and Borgolte, Kevin and Lindorfer, Martina and Choffnes, David},
      booktitle = {Proceedings of the 48th IEEE/ACM International Conference on Software Engineering (ICSE)},
      date = {2026-04},
      edition = {48},
      editor = {Mezini, Mira and Zimmermann, Thomas},
      location = {Rio de Janeiro, Brazil},
      publisher = {Association for Computing Machinery (ACM)/Institute of Electrical and Electronics Engineers (IEEE)}
    }
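Reading the datasets

The dataset archives listed under Datasets hold raw PEM certificates. As a rough illustration of how one might walk such an archive without unpacking it to disk, here is a minimal Python sketch. The helper name iter_pem_certificates is hypothetical and not part of the released code, and the sketch assumes that regular files inside an archive contain one or more standard "BEGIN CERTIFICATE" / "END CERTIFICATE" blocks; the actual directory layout of the archives is not described here, so adjust as needed.

```python
import re
import tarfile
from typing import Iterator

# Matches one PEM-armored certificate, including its BEGIN/END lines.
PEM_BLOCK = re.compile(
    rb"-----BEGIN CERTIFICATE-----.*?-----END CERTIFICATE-----",
    re.DOTALL,
)


def iter_pem_certificates(archive_path: str) -> Iterator[bytes]:
    """Yield each PEM certificate block found in a .tar.bz2 archive.

    Assumption: regular files in the archive hold one or more PEM blocks.
    Files without any PEM block are silently skipped.
    """
    with tarfile.open(archive_path, mode="r:bz2") as archive:
        for member in archive:
            if not member.isfile():
                continue  # skip directories, links, and other non-files
            handle = archive.extractfile(member)
            if handle is None:
                continue
            data = handle.read()
            for match in PEM_BLOCK.finditer(data):
                yield match.group(0)
```

For example, sum(1 for _ in iter_pem_certificates("seeds30k.tar.bz2")) would count the certificate blocks in the Transcert archive, under the layout assumption stated above. Each member is read fully into memory, which is fine for many small files but may need streaming if an archive stores everything in a single large file.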
