GeenStijl.nl embeddings (TGTR-4)

The trained word embeddings (roughly 150 MB) are released free of charge and may be useful for further research on toxic online discourse.

We indexed over 8 million public messages from the controversial Dutch websites GeenStijl and Dumpert and used them to train a word embedding model that captures the toxic language present in the dataset.

Available upon request.
