Instead of blacklisting words that shouldn't be tags, why don't you build a whitelist of words that would make good tags?
Start with a handful of tags that you would like to have, like "Python", "off-topic", "football", "rickroll" or whatnot (it depends on the kind of site you are building!), and have the system suggest only from those. Then let users hand-pick the appropriate tags, and also let them type in their own.
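As a rough illustration of "only suggest from the whitelist", here is a minimal Python sketch. It assumes tags are matched as whole lowercase words in the item's text; the `KNOWN_TAGS` set and the `suggest_tags` helper are hypothetical names, not part of any existing library.

```python
import re
from collections import Counter

# Hypothetical starting whitelist; pick tags that fit your own site.
KNOWN_TAGS = {"python", "off-topic", "football", "rickroll"}


def suggest_tags(text, known_tags=KNOWN_TAGS, max_suggestions=5):
    """Suggest only whitelisted tags that occur as words in `text`, most frequent first."""
    # Lowercase and keep runs of letters, digits and hyphens, so that
    # hyphenated tags such as "off-topic" survive as a single token.
    words = re.findall(r"[a-z0-9-]+", text.lower())
    # Anything not on the whitelist is simply ignored, so no blacklist is needed.
    counts = Counter(w for w in words if w in known_tags)
    return [tag for tag, _ in counts.most_common(max_suggestions)]


print(suggest_tags("Is Python good for football stats? Python rocks."))
```

Because unknown words never reach the suggestion list, the system cannot propose junk like "the"; users can still type tags that are not in the set.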
When enough users suggest a tag, it gets into the pool of "known good" tags for auto-suggestion, maybe after some sort of moderation, so that you can still blacklist stupid tags like "the" or "lolol", or typoed tags like "objectoriented" when you already have "object-oriented".
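One way to model that promotion step is sketched below in Python. It is only a sketch under a few assumptions: a tag becomes a candidate once a (hypothetical) threshold of distinct users has typed it, and a moderator then approves it, blacklists it, or maps it onto an existing tag as a typo alias. The `TagPool` class and its methods are illustrative names.

```python
class TagPool:
    """Hypothetical tag pool: user-typed tags become "known good" once enough
    distinct users have used them and a moderator has approved them."""

    def __init__(self, known, threshold=3):
        self.known = set(known)      # tags eligible for auto-suggestion
        self.threshold = threshold   # distinct users needed before moderator review
        self.blacklist = set()       # rejected tags, e.g. "the", "lolol"
        self.aliases = {}            # typo -> canonical, e.g. "objectoriented" -> "object-oriented"
        self.pending = {}            # candidate tag -> set of user ids who typed it

    def normalize(self, tag):
        """Lowercase, trim, and resolve known typo aliases."""
        tag = tag.strip().lower()
        return self.aliases.get(tag, tag)

    def record(self, user_id, tag):
        """Record a user-typed tag; return it if it is now ready for moderator review."""
        tag = self.normalize(tag)
        if tag in self.known or tag in self.blacklist:
            return None
        users = self.pending.setdefault(tag, set())
        users.add(user_id)  # a set, so one user cannot vote the same tag in twice
        return tag if len(users) >= self.threshold else None

    def approve(self, tag):
        """Moderator accepts a candidate into the suggestion pool."""
        tag = self.normalize(tag)
        self.pending.pop(tag, None)
        self.known.add(tag)

    def reject(self, tag, canonical=None):
        """Moderator blacklists a tag, or maps it onto an existing canonical tag."""
        key = tag.strip().lower()
        self.pending.pop(key, None)
        if canonical is None:
            self.blacklist.add(key)
        else:
            self.aliases[key] = canonical
```

Mapping typos to an existing tag, rather than only blacklisting them, means later users who type "objectoriented" silently end up with "object-oriented".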
Only show a few suggestions. Offer autocompletion. Limit the number of tags per item. If this is about coding, some sort of language detection (the Linux "file" command is not too shabby at this) may help your suggestion system.
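The autocompletion and the two limits can be kept cheap. The Python sketch below assumes the known tags are held in an alphabetically sorted list, so a prefix lookup is a binary search with `bisect` followed by a short scan. The limit values and the `autocomplete` and `add_tags` helpers are hypothetical and would need tuning for a real site.

```python
import bisect

MAX_SUGGESTIONS = 5     # hypothetical: how many completions to show at once
MAX_TAGS_PER_ITEM = 5   # hypothetical: cap on tags attached to one item


def autocomplete(prefix, sorted_tags, limit=MAX_SUGGESTIONS):
    """Return up to `limit` tags from the sorted list that start with `prefix`."""
    prefix = prefix.strip().lower()
    # All tags sharing the prefix form a contiguous run starting at this index.
    i = bisect.bisect_left(sorted_tags, prefix)
    matches = []
    while i < len(sorted_tags) and len(matches) < limit and sorted_tags[i].startswith(prefix):
        matches.append(sorted_tags[i])
        i += 1
    return matches


def add_tags(item_tags, new_tags, max_tags=MAX_TAGS_PER_ITEM):
    """Merge new tags into an item's tags, skipping duplicates and enforcing the cap."""
    result = list(item_tags)
    for tag in new_tags:
        tag = tag.strip().lower()
        if tag and tag not in result:
            result.append(tag)
    if len(result) > max_tags:
        raise ValueError(f"at most {max_tags} tags per item")
    return result


tags = sorted(["python", "python-3.x", "pygame", "perl", "football", "off-topic"])
print(autocomplete("py", tags))
```

Raising an error on the cap (instead of silently dropping extra tags) lets the UI tell the user why a tag was not added.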