Powered by OpenAIRE graph
Model . 2024
Data sources: ZENODO

LDA-Mallet topic model for CPV-45 Spanish tender titles from Procurement Metadata

Authors: Carlos III University of Madrid

Abstract

This is an LDA-Mallet topic model trained on the titles of Spanish tenders with CPV 45 from the Spanish Procurement metadata dataset. The data was collected by crawling the Spanish government's "Plataforma de contratación del sector público". Before modeling, the tender titles were lemmatized and stop words were removed. The number of topics was selected by optimizing both coherence (Cv) and dispersion among the identified topics.

The model is composed of the following folders and files:

  • model_data: Stores model-related data and outputs.
  • train_data: Contains data used for training the model.

es_Mallet_all_CPV_45_15_topics
├── model_data                    # Stores model-related data and outputs
│   ├── TMmodel                   # Directory for various model-related files
│   │   ├── alphas.npy            # NumPy array storing alpha parameters (topics' size)
│   │   ├── alphas_orig.npy       # Original alpha parameters (in case modifications to the alphas file are made)
│   │   ├── betas.npy             # NumPy array storing beta parameters
│   │   ├── betas_ds.npy          # Downsampled beta parameters
│   │   ├── betas_orig.npy        # Original beta parameters
│   │   ├── edits.txt             # Text file for documenting edits
│   │   ├── ndocs_active.npy      # NumPy array storing the number of active documents
│   │   ├── pyLDAvis.html         # HTML file for visualizing topic models (PyLDAvis)
│   │   ├── thetas.npz            # NumPy file storing theta parameters (document-topic distribution)
│   │   ├── thetas_orig.npz       # Original theta parameters
│   │   ├── topic_coherence.npy   # NumPy array storing topic coherence scores
│   │   ├── tpc_coords.txt        # Text file storing topic coordinates
│   │   ├── tpc_descriptions.txt  # Text file for topic descriptions
│   │   ├── tpc_labels.txt        # Text file storing curated topic labels (no changes to the ChatGPT ones were made)
│   │   └── vocab.txt             # Text file storing vocabulary
│   ├── corpus_train.mallet       # Training corpus in Mallet format
│   ├── corpus_train.txt          # Training corpus in text format
│   ├── diagnostics.xml           # XML file for model diagnostics obtained from Mallet
│   ├── dictionary.gensim         # Gensim dictionary file
│   ├── doc-topics.txt            # Document-topic distribution after training
│   ├── inferencer.mallet         # Mallet inferencer file
│   ├── model.pickle              # Pickle file storing the trained model
│   ├── topic-keys.json           # JSON file storing topic keys
│   ├── topic-keys.txt            # Text file storing topic keys
│   ├── topic-report.xml          # XML file for topic report
│   ├── vocab_freq.txt            # Text file storing vocabulary frequency
│   ├── vocabulary.txt            # Text file storing vocabulary
│   └── word-topic-counts.txt     # Text file storing word-topic counts
├── train_data                    # Contains data used for training the model
│   ├── corpus.mallet             # Corpus in Mallet format for training
│   ├── corpus.txt                # Raw text data for training
│   ├── corpus_aux.txt            # Auxiliary text file for the corpus
│   ├── import.pipe               # Pipe file for importing inference data
│   └── train.config              # Configuration file for training
└── trainconfig.json              # JSON file containing the training configuration
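As a starting point for inspecting the topics, the following minimal sketch parses the contents of model_data/topic-keys.txt. It assumes the file uses the usual MALLET topic-keys layout (one topic per line: topic id, a weight, and the top words, separated by tabs); the record itself does not document the layout, so this should be checked against the actual file. The names TopicKeys and parse_topic_keys are hypothetical helpers introduced here, and the sample text is made up for illustration, not taken from the published model.

```python
from typing import NamedTuple


class TopicKeys(NamedTuple):
    topic_id: int
    weight: float      # second column of the file (assumed to be the topic's Dirichlet weight)
    words: list[str]   # top words in the order they appear on the line


def parse_topic_keys(text: str) -> list[TopicKeys]:
    """Parse topic-keys text, assuming tab-separated id, weight, and words."""
    topics = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        topic_id, weight, words = line.split("\t", 2)
        topics.append(TopicKeys(int(topic_id), float(weight), words.split()))
    return topics


# Made-up sample in the assumed layout; NOT data from the published model.
sample = "0\t0.25\tobra reforma edificio\n1\t0.75\tcarretera pavimentacion via\n"
topics = parse_topic_keys(sample)
for t in topics:
    print(t.topic_id, t.weight, " ".join(t.words))
```

To apply it to the deposited model, read es_Mallet_all_CPV_45_15_topics/model_data/topic-keys.txt as UTF-8 text and pass the resulting string to parse_topic_keys.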

Keywords

Topic Modeling, LDA, CPV, Public Procurement, Mallet

  • BIP!
    Impact by BIP!
    selected citations: 0
    These citations are derived from selected sources. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
    popularity: Average
    This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.
    influence: Average
    This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
    impulse: Average
    This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.