+ 2

How can bag-of-words help with hate speech detection?

I just read about the bag-of-words method and I am not sure I understand it correctly. In short, the method creates a vector for each text, and if words in a text match words from the hateful vocabulary, the score of that text's vector goes up. In the end, a text is detected as hateful if its vector has a high score. Is that right, or have I misunderstood it?
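A minimal sketch of the idea as described above, assuming a tiny placeholder vocabulary (`HATEFUL_VOCAB`, `bag_of_words` and `lexicon_score` are hypothetical names, not from any library). Note that in practice a bag-of-words vector usually covers the whole corpus vocabulary and is fed to a trained classifier, so the plain "sum the matches" score here is only the simplest possible reading of the method.

```python
from collections import Counter

# Hypothetical placeholder vocabulary; a real system would use a curated lexicon.
HATEFUL_VOCAB = ["badword", "slur"]

def bag_of_words(text, vocab):
    """Return a count vector with one entry per vocabulary word."""
    counts = Counter(text.lower().split())
    return [counts[word] for word in vocab]

def lexicon_score(text, vocab=HATEFUL_VOCAB):
    """Sum the count vector, i.e. count how many tokens match the vocabulary."""
    return sum(bag_of_words(text, vocab))

texts = ["you are a badword and a slur", "have a nice day"]
vectors = [bag_of_words(t, HATEFUL_VOCAB) for t in texts]
scores = [lexicon_score(t) for t in texts]
```

Word order is discarded: only the counts per vocabulary word survive, which is what makes it a "bag".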

bag-of-words

19th Jan 2022, 6:31 PM

Katja

3 Answers

+ 2

I also thought about this today. If you want to use the BoW method, you should first do the preprocessing, such as removing all punctuation from the texts and lowercasing them. For words with special characters, you can use regex against the vocabulary, which will return the closest matching word from the vocabulary. But I don't really know how good or bad this method is.
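A rough sketch of those preprocessing steps, under some assumptions: the vocabulary is a placeholder, `preprocess` and `closest_vocab_word` are hypothetical helper names, and the closest-match step uses the similarity ratio from Python's `difflib` instead of a regex, which is a substitution on my part and not necessarily what was meant above.

```python
import re
import string
from difflib import get_close_matches

VOCAB = ["badword", "slur"]  # hypothetical placeholder vocabulary

def preprocess(text):
    """Lowercase the text, replace punctuation with spaces, split into tokens."""
    text = text.lower()
    text = re.sub(f"[{re.escape(string.punctuation)}]", " ", text)
    return text.split()

def closest_vocab_word(token, vocab=VOCAB, cutoff=0.8):
    """Return the most similar vocabulary word, or None if nothing is close enough."""
    matches = get_close_matches(token, vocab, n=1, cutoff=cutoff)
    return matches[0] if matches else None

tokens = preprocess("Hello, World! You BADWORD.")
guess = closest_vocab_word("badwrd")
```

The `cutoff` value is a guess. Setting it lower catches more misspellings but also flags more innocent words.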

19th Jan 2022, 9:21 PM

Katja

+ 1

Katja Not sure what happens when the derogatory term has extra letters on either side that turn it into a new word, like ui$h!tlp. Would the system still work? What if I shove such a term into a random string of characters, would it still work then?
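To illustrate why this case is hard: a plain bag-of-words lookup compares whole tokens, so a term with extra characters around it is simply a different token. One possible workaround, sketched here with a made-up substitution table and a placeholder term (`LEET_MAP`, `normalize` and `contains_term` are hypothetical names), is to undo common character swaps such as `$` and `!` and then search for the term as a substring.

```python
# Hypothetical character substitutions; real obfuscation is far more varied.
LEET_MAP = str.maketrans({"$": "s", "!": "i", "@": "a", "0": "o", "1": "i", "3": "e"})

VOCAB = ["badword"]  # hypothetical placeholder term

def normalize(token):
    """Lowercase the token and undo the substitutions listed in LEET_MAP."""
    return token.lower().translate(LEET_MAP)

def contains_term(token, vocab=VOCAB):
    """True if any vocabulary term appears anywhere inside the normalized token."""
    normalized = normalize(token)
    return any(term in normalized for term in vocab)

obfuscated = "uib@dw0rdlp"
exact_hit = obfuscated in VOCAB            # whole-token lookup misses it
substring_hit = contains_term(obfuscated)  # substring search finds it
```

The trade-off is that substring matching also fires on harmless words that happen to contain a term.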

19th Jan 2022, 8:40 PM

Œ ㅤ

Katja Yup, especially when it's like this (sorry for using the words here): fkccu, ihateyoufucyou, fzuzczk, $h!T, basshit (bass hit, not the other one; what can be done about such words?). So far, the best derogatory-term filter I've seen is the username setting bar in Sonic Forces. You can't fool it.
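For the "bass hit" kind of case, where a harmless word happens to contain a term, one common workaround is an allowlist that is checked before the substring search. This is only a sketch with stand-in data (`BLOCKED_TERMS`, `ALLOWLIST` and `flag_tokens` are hypothetical), and it does nothing for scrambled or interleaved spellings like the other examples.

```python
BLOCKED_TERMS = ["bad"]              # hypothetical stand-in for a real term
ALLOWLIST = {"badminton", "badge"}   # hypothetical benign words containing a term

def flag_tokens(text):
    """Return tokens that contain a blocked term and are not on the allowlist."""
    flagged = []
    for token in text.lower().split():
        if token in ALLOWLIST:
            continue  # known harmless word, skip the substring check
        if any(term in token for term in BLOCKED_TERMS):
            flagged.append(token)
    return flagged

result = flag_tokens("I play badminton with a bad attitude")
```

The allowlist has to be maintained by hand, so it only covers the false positives someone has already noticed.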

20th Jan 2022, 2:56 AM

Œ ㅤ