A Data Science Central Community
I am curious if anyone has a list of English swear/inappropriate words. I am working on a quick cleanup of a couple of massive SPSS files for work. I need to write some quick Python to find all inappropriate words, and variations of those words, within the files. It would be a piece of cake if I could just get a list to start from.
I need English swear words, but since I work with files in multiple languages, I could use a list in any language that uses ASCII characters.
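To make the "words and variations" part concrete, here is a minimal sketch of the kind of matching I have in mind. Everything in it is an assumption for illustration: the function names `build_pattern` and `find_flagged` are hypothetical, the two entries are mild placeholders standing in for a real list, and the suffix set is only a rough guess at common English variations.

```python
import re

def build_pattern(words):
    """Compile one case-insensitive regex for the listed words plus simple suffixes."""
    # Try longer entries first so they are preferred over shorter ones they contain.
    ordered = sorted(words, key=len, reverse=True)
    alternatives = "|".join(re.escape(w) for w in ordered)
    # \b keeps "heck" from matching inside "checkers"; the optional suffix
    # group catches simple variations such as "darned" or "hecks".
    return re.compile(rf"\b(?:{alternatives})(?:s|es|ed|ing|er)?\b", re.IGNORECASE)

def find_flagged(text, pattern):
    """Return every flagged word (or variation) found in text, lowercased."""
    return [m.group(0).lower() for m in pattern.finditer(text)]

# Placeholder entries only; a real list would be loaded from a file.
pattern = build_pattern(["darn", "heck"])
print(find_flagged("Well DARN, what the heck? Darned checkers.", pattern))
```

The word-boundary anchors are a deliberate choice: plain substring search would flag harmless words that happen to contain a listed word, which matters a lot on massive files.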
Some "swears" from Bart Simpson:
There are also many other undesirable keywords, such as "Hitler" or "Online Pharmacy". They are usually quite easy to identify: they have large volume but a low CPC, and few or no advertisers (and no top-100 advertiser) want to or are able to purchase them, at least on Google.
Regarding the adult category, most English words are easy to detect (although I've seen some exceptions), but some words are a bit more difficult to catch, because they are in a foreign language, e.g. French, and generate search results on Google.com, not just on Google.fr.
@Vincent. Thanks for pointing out that I should also be including words that are offensive but are not swear words. However, I am not working on a keyword-related project. I am working on many separate survey-generated datasets that I have to clean up for end clients. There are several open-ended questions that respondents are asked to answer. I need to omit all responses that are offensive, so a list of all offensive words in text format would make for a very quick fix.
@Gene. I agree that 'offensive' words are inherently offensive. This is why a text file of offensive words would be helpful: I can use it to make sure that certain responses are omitted. I am also curious about why you would think a list of swear words is inappropriate. If I do not have a list of flagged words, how am I supposed to write a program that can deal with them properly?
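For the survey cleanup itself, a rough sketch of the filtering step could look like the following. It assumes the open-ended responses have already been exported from SPSS as plain strings and that the word list has one entry per line; `load_word_list` and `split_responses` are hypothetical helper names, and the entries shown are mild placeholders.

```python
import re

def load_word_list(lines):
    """Build a set of flagged words from lines; skip blanks and '#' comments."""
    words = set()
    for line in lines:
        entry = line.strip().lower()
        if entry and not entry.startswith("#"):
            words.add(entry)
    return words

def split_responses(responses, flagged):
    """Return (kept, omitted); a response is omitted if any whole word is flagged."""
    kept, omitted = [], []
    for response in responses:
        # Compare whole lowercase tokens so "Checkers" is not caught by "heck".
        tokens = set(re.findall(r"[a-z']+", response.lower()))
        if tokens & flagged:
            omitted.append(response)
        else:
            kept.append(response)
    return kept, omitted

# Any iterable of lines works here, including an open text file.
flagged = load_word_list(["# placeholder entries", "darn", "", "Heck"])
kept, omitted = split_responses(
    ["Great service", "What the heck was that", "Checkers are fun"], flagged
)
print(kept)
print(omitted)
```

Keeping the omitted responses in their own list, rather than deleting them outright, leaves room for a manual check before anything goes to the client.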