
## Model description

CREXWED (CREXdata Weather Emergency Detector) is a weather-emergency text classification model fine-tuned primarily on Twitter data. It identifies social media posts that report a wildfire or flood incident and contain actionable information to aid rescue efforts. The model was trained to predict one of three labels: `fire`, `flood`, or `none`.

## Intended usage

This model is intended to be used for text classification in English, Spanish, Catalan, and German.

## How to use

```python
from transformers import pipeline

# model_path: the local path or Hub ID of this model.
event_predictor = pipeline("text-classification", model=model_path, batch_size=512)
tokenizer_kwargs = {"padding": True, "truncation": True, "max_length": 512}

# Example posts in the four supported languages.
tweet_text_en = "It is raining heavy, the water in my apartment is up to my knees. Send help!!"
tweet_text_de = "Es regnet in Strömen, das Wasser in meiner Wohnung steht mir bis zu den Knien. Schickt Hilfe!"
tweet_text_es = "Está lloviendo muchísimo, hay agua en casa y me llega hasta los tobillos. Necesitamos ayuda!"
tweet_text_ca = "Està plovent moltíssim, tinc aigua a casa que m'arriba fins els turmells. Necessitem ajuda!"

output = event_predictor(tweet_text_en, **tokenizer_kwargs)[0]
print(output)
print(f'Predicted class: {output["label"]}')
print(f'Prediction score: {output["score"]}')
```

## Limitations and bias

No measures have been taken to estimate the bias and toxicity embedded in the model. Since the data used to fine-tune this model comes from social media, it contains biases, hate speech, and toxic content, and we have not applied any steps to reduce their impact. The base model this model was fine-tuned from, twitter-xlm-roberta-base, may also contain bias and toxicity.

## Training

### Training data

The model was trained on a mix of real and synthetic tweets. The real tweets were collected from Twitter and synthetically annotated using an LLM; the dataset can be found here. The synthetic tweets were generated using Google's Gemma 3 27B and MistralAI's Mistral Small 24B; the dataset can be found here.

### Training procedure

The training data described above was used to perform a full-parameter fine-tuning of the twitter-xlm-roberta-base model. A hedged sketch of a comparable setup is shown after the evaluation tables below.

### Training hyperparameters

The following hyperparameters were used during training:

- learning_rate: 5e-05
- train_batch_size: 16
- eval_batch_size: 16
- seed: 42
- optimizer: AdamW with betas=(0.9, 0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- num_epochs: 1.0

## Evaluation

### Evaluation data

The model was evaluated using the test dataset from here.

Data statistics (number of examples per label and language; fire = fire-related, flood = flood-related, none = no disaster label):

| Language | fire | flood | none  |
|----------|------|-------|-------|
| de       | 222  | 304   | 7416  |
| ca       | 520  | 340   | 10611 |
| es       | 592  | 239   | 5988  |
| en       | 230  | 942   | 7318  |

### Evaluation results

| Language | F1    |
|----------|-------|
| de       | 0.838 |
| ca       | 0.704 |
| es       | 0.705 |
| en       | 0.799 |
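The snippet below is a minimal sketch of how per-language F1 scores like those in the table above could be computed. It is not the project's evaluation script: the test-file layout (columns `text`, `label`, `lang`), the file name `test.csv`, and the macro averaging are all assumptions, since the card does not state how F1 was aggregated.

```python
# Hedged sketch: per-language F1 on a held-out test set.
# Assumptions: test.csv with columns "text", "label", "lang";
# macro-averaged F1 (the card does not specify the averaging).
import pandas as pd
from sklearn.metrics import f1_score
from transformers import pipeline

clf = pipeline("text-classification", model=model_path, batch_size=512)  # model_path as in "How to use"
kwargs = {"padding": True, "truncation": True, "max_length": 512}

test = pd.read_csv("test.csv")
for lang, group in test.groupby("lang"):
    preds = [o["label"] for o in clf(group["text"].tolist(), **kwargs)]
    score = f1_score(group["label"], preds, average="macro")
    print(f"{lang}: F1 = {score:.3f}")
```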
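As referenced in the training procedure above, here is a hedged sketch of a full-parameter fine-tuning run matching the listed hyperparameters. It is illustrative, not the actual training script: the base-model Hub ID (`cardiffnlp/twitter-xlm-roberta-base`), the dataset files, and the column names (`text`, `label`) are assumptions.

```python
# Hedged sketch of full-parameter fine-tuning with the hyperparameters listed
# under "Training hyperparameters". Dataset files and column names are assumed.
from datasets import load_dataset
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

base_model = "cardiffnlp/twitter-xlm-roberta-base"  # assumed Hub ID of the base model
labels = ["fire", "flood", "none"]

tokenizer = AutoTokenizer.from_pretrained(base_model)
model = AutoModelForSequenceClassification.from_pretrained(
    base_model,
    num_labels=len(labels),
    id2label=dict(enumerate(labels)),
    label2id={label: i for i, label in enumerate(labels)},
)

# Assumed: CSV files with a "text" column and an integer "label" column.
dataset = load_dataset("csv", data_files={"train": "train.csv", "eval": "eval.csv"})
dataset = dataset.map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=512),
    batched=True,
)

args = TrainingArguments(
    output_dir="crexwed",
    learning_rate=5e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    seed=42,
    lr_scheduler_type="linear",
    num_train_epochs=1.0,
    # AdamW with betas=(0.9, 0.999) and epsilon=1e-8 is the Trainer default.
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=dataset["train"],
    eval_dataset=dataset["eval"],
    tokenizer=tokenizer,  # enables dynamic padding via the default data collator
)
trainer.train()
```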
## Additional information

### Authors

Language Technologies Unit, Barcelona Supercomputing Center.

### Contact

For further information, send an email to either or .

### License

This work is distributed under the Apache License, Version 2.0.

### Terms of use

Since part of the data used to train this model was generated using Google's Gemma 3 model, usage of this model should follow Gemma's Terms of Use and Prohibited Use Policy.

### Funding

This work has been developed under the EU-funded CREXDATA Project (Grant Agreement No. 101092749).

### Citation

### Disclaimer

The model published in this repository is intended for a generalist purpose and is made available to third parties under the Apache License, Version 2.0. Please keep in mind that the model may have bias and/or other undesirable distortions.

When third parties deploy or provide systems and/or services to other parties using this model (or a system based on it), or become users of the model itself, they should note that it is their responsibility to mitigate the risks arising from its use and, in any event, to comply with applicable regulations, including those on the use of Artificial Intelligence. In no event shall the owners and creators of the model be liable for any results arising from the use made by third parties.
