
List of Swear Words (Preferably English but I could use as many languages as possible)

I am curious whether anyone has a list of English swear/inappropriate words. I am working on a quick cleanup of a couple of massive SPSS files at work. I need to write some quick Python to find all inappropriate words, and variations of those words, within the files. It would be a piece of cake if I could just get a list to start from.
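As a rough illustration of what "variations of those words" could mean in code, here is a minimal sketch that turns one listed word into a regular expression tolerant of mixed case, repeated letters, and a few look-alike character swaps. The `SUBSTITUTIONS` table and the `variant_pattern` helper are assumptions made up for this example, not part of any library, and the mild placeholder word stands in for a real list.

```python
import re

# Hypothetical look-alike table: characters people often swap in to
# disguise a word. This is an assumption; extend it to suit your data.
SUBSTITUTIONS = {
    "a": "a@4",
    "e": "e3",
    "i": "i1!",
    "o": "o0",
    "s": "s$5",
}


def variant_pattern(word):
    """Compile a regex matching `word` and simple disguised variations of it."""
    parts = []
    for ch in word.lower():
        chars = SUBSTITUTIONS.get(ch, ch)
        # One character class per letter (the letter plus its look-alikes),
        # repeated one or more times so stretched spellings still match.
        parts.append("[" + re.escape(chars) + "]+")
    # Lookarounds rather than \b, so a variant ending in a symbol such as
    # "$" is still anchored to the edges of a word.
    return re.compile(r"(?<!\w)" + "".join(parts) + r"(?!\w)", re.IGNORECASE)


pattern = variant_pattern("darn")
print(bool(pattern.search("That is D@rn annoying")))  # disguised spelling
print(bool(pattern.search("darning socks")))          # longer, unrelated word
```

The pattern only matches whole words, so a listed word embedded in a longer, innocent word is left alone. It will not catch every disguise (inserted punctuation or spaces, for example), so treat it as a starting point.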

 

I need English swear words, but since I work with files in multiple languages, I could use a list in any language written with ASCII characters.

 

Michael

Tags: Lists, Swear, Text, Words


Replies to This Discussion

There are also many other undesirable keywords, such as "Hitler" or "Online Pharmacy". They are usually quite easy to identify: they have large volume but a low CPC, and few or no advertisers (and no top-100 advertiser) want to or can purchase them, at least on Google.

Regarding the adult category, most English words are easy to detect (although I've seen some exceptions), but some words are a bit more difficult to catch, because they are in a foreign language, e.g. French, and generate search results on Google.com, not just on Google.fr.

In all seriousness, I think it's a bad idea to treat offensive words as something that can be listed. I think that kind of attitude is even a bit dangerous, and it is exactly the kind of thing that we need to be careful about as analytical professionals. What is offensive depends very much on the context and the audience.

@Vincent. Thanks for pointing out that I should also be including offensive words other than swear words. However, I am not working on a keyword-related project. I am working on many separate survey-generated datasets that I have to clean up for end clients. Respondents are asked to answer several open-ended questions, and I need to omit all responses that are offensive, so a list of all offensive words in text format would make for a very quick fix.

 

@Gene. I agree that 'offensive' words are inherently offensive. This is why a text file of offensive words will be helpful, so that I can make sure that certain responses are omitted. I am also curious why you think a list of swear words is inappropriate. If I do not have a list of flagged words, how am I supposed to write a program that can deal with them properly?
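For what it's worth, the list-based cleanup described above could be sketched roughly as follows. This assumes the open-ended responses have already been pulled out of the SPSS files into a plain Python list of strings, and that the word list is a text file with one word per line; `load_word_list` and `split_responses` are hypothetical helper names chosen for this example. Flagged responses are set aside rather than deleted, so someone can still check them in context before they are omitted.

```python
import re


def load_word_list(path):
    """Read one flagged word per line from a UTF-8 text file, skipping blanks."""
    with open(path, encoding="utf-8") as handle:
        return [line.strip().lower() for line in handle if line.strip()]


def split_responses(responses, flagged_words):
    """Return (kept, flagged); a response is flagged if it contains a listed word."""
    if not flagged_words:
        return list(responses), []
    # Longest words first, so a longer entry wins over a shorter prefix of it.
    ordered = sorted(flagged_words, key=len, reverse=True)
    alternation = "|".join(re.escape(word) for word in ordered)
    # \b keeps matches to whole words, so a listed word inside a longer,
    # harmless word does not cause the response to be flagged.
    pattern = re.compile(r"\b(?:" + alternation + r")\b", re.IGNORECASE)
    kept, flagged = [], []
    for text in responses:
        (flagged if pattern.search(text) else kept).append(text)
    return kept, flagged


# Mild placeholder words stand in for a real list here.
responses = ["Great service", "What the heck was that", "Check the heckler"]
kept, flagged = split_responses(responses, ["heck", "darn"])
print(kept)
print(flagged)
```

Whole-word matching is a deliberate choice in this sketch: plain substring matching would flag "Check" and "heckler" as well, which is one concrete form of the context problem raised earlier in the thread.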



© 2020   TechTarget, Inc.   Powered by
