Roskilde University, Denmark
This paper revolves around the development of an LSTM multiclass classifier, built with Keras as the framework and CRISP-DM as the project process, with the purpose of classifying natural language into varying degrees of toxicity. The model takes as its starting point an existing toxic comment classification challenge from Kaggle.com, and a first iteration is engineered towards the requirements of the challenge. In this first iteration, several measures are taken to avoid common pitfalls of neural networks. The model is then held up against principles of freedom of speech, including the Harm Principle and the Offence Principle of John Stuart Mill and Joel Feinberg respectively. After evaluating the model's performance in light of these principles, a second iteration is constructed with some design changes. For reasons related, among others, to the dataset, this operation is less successful. The paper concludes that it is possible to build a good multiclass classification tool for a shallow NLP problem, but that the tool becomes less effective in later iterations as we try to apply it to more concrete purposes.