ZENODO
Dataset . 2025
License: CC BY
Data sources: ZENODO

MLCerts Datasets and Language Models (ICSE 2026)

Authors: Paracha, Muhammad Talha; Borgolte, Kevin; Lindorfer, Martina; Choffnes, David

Abstract

Auxiliary material, up-to-date documentation, and issue tracking are available at: https://github.com/rub-softsec/MLCerts

Docker images for reproducing artifacts are available at: https://zenodo.org/records/17850372

Datasets

Raw PEM certificates used in differential testing:

  • v3-chain.tar.bz2: 12 synthetic certificate datasets.
  • v3-experiments-extra.tar.bz2: MLCerts 1M dataset.
  • frankencerts-v1-8M.tar.bz2: Frankencerts 8M dataset.
  • seeds30k.tar.bz2: Transcert 30K dataset.

The CA information is available in the customCA/ directory.

Language Models (llm-code-MLcerts-EXPORT.zip)

One of the model architectures below is used to generate synthetic ASN.1 instances (with BEGIN/END tags). asn1_to_pem.py is then used to convert them into PEM format, with CA information copied from the customCA/ directory.

RNN models

Code for the RNN models, based on Char-RNN-Python, is available in the Char-RNN-PyTorch directory. charRNN-custom.py is used for training, and generate.py for generating synthetic certificate instances:

    python3 generate.py saved_model hidden_size layers temperature original_cert_dataset extra_run_name

Saved models available are:

  • 2022-scanned-1024-3-0.0002lr-0.1dropout-epoch3-step300000
  • 2022-scanned-256-3-0.0002lr-0.1dropout-epoch3-step300000
  • balanced-versions-1024-3-0.0002lr-0.1dropout-epoch3-step300000
  • balanced-versions-256-3-0.0002lr-0.1dropout-epoch3-step300000
  • zmap-data-256-3-0.0002lr-0.1dropout-epoch3-step300000
  • zmap-data-1024-3-0.0002lr-0.1dropout-epoch3-step300000

To generate certificates with the final model used in the paper results (IPv4/RNN-Medium with Temperature = 1.5), use:

    python3 generate.py zmap-data-1024-3-0.0002lr-0.1dropout-epoch3-step300000 1024 3 1.5 zmap-data testZmap1M

GPT Models

Code for the GPT models, based on GPT-Neo-125, is available in the Transformers directory. train_script.py is used for training (train_script_scratch.py for training from scratch), and generate.py for generating synthetic certificate instances:

    python3 generate.py saved_model checkpoint_num training_type temperature

training_type can be 'finetune' or 'custom', for instance:

    python3 generate.py 2022-scanned-custom checkpoint-284400 custom 1.0

Saved models available are:

  • 2022-scanned
  • 2022-scanned-custom
  • balanced-versions
  • balanced-versions-custom
  • zmap-data-custom
  • zmap-data

The custom versions are the ones trained from scratch. conda-env.yml can be consulted for environment dependencies.

BibTeX

Please cite our paper if you rely on the datasets for your work.

    @inproceedings{icse2026-hallucinating-certificates,
      title = {{Hallucinating Certificates: Differential Testing of TLS Certificate Validation Using Generative Language Models}},
      author = {Paracha, Talha and Posluns, Kyle and Borgolte, Kevin and Lindorfer, Martina and Choffnes, David},
      booktitle = {Proceedings of the 48th IEEE/ACM International Conference on Software Engineering (ICSE)},
      date = {2026-04},
      edition = {48},
      editor = {Mezini, Mira and Zimmermann, Thomas},
      location = {Rio de Janeiro, Brazil},
      publisher = {Association for Computing Machinery (ACM)/Institute of Electrical and Electronics Engineers (IEEE)}
    }
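
The dataset archives named in the description (for example seeds30k.tar.bz2) are said to hold raw PEM certificates. As a minimal sketch of how such an archive might be read without unpacking it to disk, the helper below scans every regular file in a .tar.bz2 archive for PEM certificate blocks. The function name iter_pem_certificates is hypothetical and not part of the artifact. The internal layout of the archives and the use of the standard CERTIFICATE label are assumptions, so the regular expression may need adjusting.

```python
import os
import re
import tarfile

# Matches one PEM-armored block including its BEGIN/END tags.
# Assumption: the archives use the standard "CERTIFICATE" label.
PEM_CERT = re.compile(
    rb"-----BEGIN CERTIFICATE-----.+?-----END CERTIFICATE-----",
    re.DOTALL,
)


def iter_pem_certificates(archive):
    """Yield (member_name, pem_bytes) for each certificate in a .tar.bz2 archive.

    `archive` may be a filesystem path or a binary file object.
    Hypothetical helper: the archive layout is not documented here, so every
    regular file is scanned and may contribute zero or more certificates.
    """
    if isinstance(archive, (str, os.PathLike)):
        kwargs = {"name": archive}
    else:
        kwargs = {"fileobj": archive}
    with tarfile.open(mode="r:bz2", **kwargs) as tar:
        for member in tar:
            if not member.isfile():
                continue  # skip directories, links, and other special entries
            data = tar.extractfile(member).read()
            for match in PEM_CERT.finditer(data):
                yield member.name, match.group(0)
```

Calling iter_pem_certificates("seeds30k.tar.bz2") would then yield one (file name, PEM bytes) pair per certificate found, which can be passed on to whichever TLS library is under test.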