Forum Spam – A Practical Guide

If you’ve landed on this page hoping to find tips on enhancing your forum spam strategy, I’m afraid you’re out of luck. This guide is intended for those poor souls tasked with cleaning up the aftermath of a merciless spam campaign that has plastered your lovely domain across countless hell-holes of the internet with all the accuracy and subtlety of a sawn-off shotgun.

So, as part of your clean-up operation you’re faced with some wretched old forum within whose monstrous spammy bowels lie a few hundred links to your domain. I won’t ask how they got there. What matters is that sooner or later Google is likely to spot them, and so you’d like to get rid of them post-haste. Ignore the problem, and you risk getting slapped with a penalty.

The method that I’m going to share today has dramatically increased the response rate to my forum link removal requests. In short, it is a nifty way of distilling a long and messy list of spam into a succinct, coherent, and actionable request.

Getting Started

I’m going to assume you have two things. The first is a contact at the forum, someone to whom you can send a list of the spam you want removed (if you don’t, check out our Contact API, and for stubborn cases this splendid guide by Mr Mason). The second is a long list of linking URLs on the offending domain (from a backlink analysis, perhaps using the tools of the gadgetplex).

If you’ve culled the data from a variety of sources (Open Site Explorer, Majestic SEO, Google Webmaster Tools), your list probably looks a little like this:

Forum Spam 1

Basically, a mess. Most of these forums don’t have anything that even resembles a coherent permalink structure, and so we’re often left with a mixture of search results links, profile links, post links and thread links, usually with a jumble of extra parameters on the end. Sending the list in its current state to your forum contact is not going to get you anywhere, because it’s simply not actionable. Our job today is to make the webmaster’s life as easy as possible. That means making sense of this raw data.

We can use the spammer’s methodology to our advantage. In my experience, the spammer usually places the offending link in the signatures of their users, so as to ensure it appears in every post they make. Often they will have half a dozen or so accounts, but will make upwards of thirty posts with each account. The best solution therefore is to get a list of the offending accounts and use them as our hit list. Deleting five users and their associated posts is a lot less daunting to our webmaster than making sense of several hundred garbled URLs. So, how do we go about this?

Quick Wins

Always check for an easy way out. Sort your list of linking URLs alphabetically, and search for a pattern. If, in among all the posts and threads, you see a few resembling http://badforum.com/profile/spammer123/, you’re in luck. Pay these profiles a visit, and you’ll probably find that the profile page contains the user’s signature (and the bad link). Grab the URLs of these user profiles, and you’ve got your hit list right there.
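
If the list is too long to eyeball, the same pattern check can be scripted. What follows is a minimal Python sketch, not a definitive tool. It assumes profile URLs contain a path such as /profile/username/; the pattern and the function name find_profile_urls are my own illustrative choices, and real forums will often use different paths, so adjust the pattern to whatever you see in your own list.

```python
import re

# Hypothetical pattern: change it to match the profile path the forum really uses.
PROFILE_PATTERN = re.compile(r"/profile/([^/?#]+)", re.IGNORECASE)


def find_profile_urls(urls):
    """Return {username: profile URL} for every URL that matches the pattern.

    URLs are sorted first, so the shortest variant of each profile URL wins.
    """
    profiles = {}
    for url in sorted(urls):
        match = PROFILE_PATTERN.search(url)
        if match:
            # Keep only the first URL seen for each username.
            profiles.setdefault(match.group(1), url)
    return profiles
```

Calling find_profile_urls on your full list gives you one entry per suspected spam account, which you should still visit by hand to confirm the signature contains the bad link.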

Often though, you won’t be so lucky. On some forums, user profile pages don’t include signatures and therefore aren’t picked up on our initial crawl (as they don’t feature the link). Upon running into this stumbling block, you might resign yourself to sifting through the mess by hand, trying to collect the names of all the spamming users. This is time-consuming, prone to errors, and boring, which is why I came up with the following alternative process.

Distill & Refine

First we need to get this list as short as possible. Get rid of all the obvious duplicates by removing the prefixes: in Excel, do a Find and Replace on variations of ‘http://www.’, ‘http://’, ‘https://’, etc., substituting them with nothing. Then use Remove Duplicates (under the Data tab) to shorten the list. Do a quick check for unnecessary parameters at the end, such as ‘&ncf98dfj292m’, remove them and use Remove Duplicates again.
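
For those who would rather script this step than do it in a spreadsheet, here is a rough Python equivalent of the Find and Replace plus Remove Duplicates routine. It is a sketch under simple assumptions: the function names strip_prefix and dedupe are hypothetical, and it only handles the scheme and a leading ‘www.’, not the junk parameters mentioned above.

```python
import re


def strip_prefix(url):
    """Remove 'http://', 'https://' and a leading 'www.' so variants compare equal."""
    url = url.strip()
    url = re.sub(r"^https?://", "", url, flags=re.IGNORECASE)
    url = re.sub(r"^www\.", "", url, flags=re.IGNORECASE)
    return url


def dedupe(urls):
    """Strip prefixes and drop duplicates, keeping the first-seen order."""
    seen = set()
    result = []
    for url in urls:
        cleaned = strip_prefix(url)
        if cleaned and cleaned not in seen:
            seen.add(cleaned)
            result.append(cleaned)
    return result
```

Keeping the original order (rather than sorting) makes it easier to compare the shortened list against the export it came from.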

Ideally at this stage you’ll be able to spot the post permalinks, at which point you can delete all the thread links, page links, search results links, etc. This will leave us with a list of the offending posts (which can still easily reach into the hundreds). If, however, the forum doesn’t have post permalinks, then you’ll want to work by thread; this means deleting all parameters other than the page number, ‘&page=3’ for instance, to ensure you get the correct page of the thread. Using a combination of LEFT, RIGHT, LEN and SEARCH, Excel pros should be able to trim unnecessary parameters automatically.
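
If the Excel formulas get unwieldy, the parameter trimming can also be sketched in Python using the standard library’s URL parser. The function name keep_only_params and the example URLs in the usage note are hypothetical. One assumption to watch: on forums where the thread ID itself lives in the query string (something like ‘?t=42’), you will need to add that parameter to the keep list, otherwise every URL collapses to the same address.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def keep_only_params(url, keep=("page",)):
    """Drop every query parameter not named in `keep`, and drop any #fragment."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key in keep
    ]
    # Rebuild the URL with only the surviving parameters and no fragment.
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))
```

For example, keep_only_params("http://badforum.com/thread/42?page=3&sid=abc123") gives "http://badforum.com/thread/42?page=3". Run a dedupe again afterwards, since many URLs will now be identical.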

Crawl Time

Once you’ve run a final dedupe, your list is probably as short as it’s going to get. That means… oh yes. You know what time it is. It’s time to fire up the frog.

Copy the first URL from your list and paste it into your web browser. Locate the offending post and note down the user’s name. Then copy and paste your list of URLs into a Notepad file and save it as a .txt, making sure each URL starts with ‘http://’ (use CONCATENATE if you removed it). We want to use the Screaming Frog SEO Spider to do a custom scan of the pages’ source code. For help with this process, check out this great guide by Stephanie of Distilled. For the experienced froggers among you, stick it into list mode, feed it your txt file, set your custom filter to scan for the exact username that you just noted down, and hit Start.

Forum Spam 2

Essentially, we’re scanning to see which of the posts (or threads) contain the spammer’s username. If Matt Cutts smiles upon you, your Frog will chew through your list and return a nice sizable chunk of it, meaning our lazy spammer used the same account many times. If it returns just one URL (namely the one you initially picked), you’re going to have a bad day. We’ll assume that hasn’t happened.
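
If you don’t have Screaming Frog to hand, the idea behind the scan can be approximated with a few lines of Python. To be clear, this is not how Screaming Frog works internally; it is a bare-bones sketch of the same check (does the page source contain this username?). The function names, the User-Agent string and the assumption that the forum pages are publicly readable without logging in are all mine.

```python
from urllib.request import Request, urlopen


def fetch_html(url, timeout=10):
    """Download a page's source as text (assumes the page is publicly readable)."""
    request = Request(url, headers={"User-Agent": "Mozilla/5.0 (link clean-up check)"})
    with urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


def urls_containing(username, urls, fetch=fetch_html):
    """Return the URLs whose source contains `username` as exact, case-sensitive text."""
    hits = []
    for url in urls:
        try:
            html = fetch(url)
        except OSError:
            # Unreachable page: skip it so it stays on the main list for a manual check.
            continue
        if username in html:
            hits.append(url)
    return hits
```

Note that this is a plain substring match, so a short username such as ‘bob’ would also match ‘bobby’. Be kind to the forum’s server too, and pause between requests if your list is long.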

Cross-Reference

Hit Export and save the results of your Screaming Frog crawl as a CSV. Then copy the list back into Excel, into a separate table. We want to remove anything from our main list that we’ve linked to our first offender. To do this, add a second column to your main list, and do a VLOOKUP referencing your new table. Anything that doesn’t return an #N/A error – that is, anything Screaming Frog has told us contains our first offender’s username – can be deleted, as we have established the user responsible.
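
The VLOOKUP step boils down to removing one list from another. As a rough Python illustration (the function name remove_attributed is hypothetical, and it assumes both lists hold URLs in exactly the same form, so with or without ‘http://’ in both):

```python
def remove_attributed(main_list, crawl_hits):
    """Return the URLs in main_list that the crawl did NOT tie to the current user."""
    attributed = set(crawl_hits)
    return [url for url in main_list if url not in attributed]
```

Whatever comes back is your new, shorter main list, still waiting to be matched to a user.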

Forum Spam 3

Hopefully your list is now quite a bit shorter. Keeping a note of your first offender to one side, pick one of the remaining URLs in your main list and identify the user responsible. Repeat the Screaming Frog crawl, this time searching for the second offender’s username. Remove anything your frog finds from the main list. And so on.

With any luck, by the time you’ve scanned for your fourth or fifth user, your list will be more or less empty. In my experience, only the most determined of spammers will bother creating more than seven or eight accounts. You should be able to get a pretty good idea of how many users you’re dealing with after doing your first crawl, as the automated nature of the spam means that each user will generally have roughly the same number of posts. So keep at it, and once your list is empty, you’re done!
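
The back-of-an-envelope estimate is simple division: the size of your list divided by the number of URLs your first crawl returned. A tiny Python sketch, with a hypothetical function name and resting entirely on the assumption that every account posted roughly equally often:

```python
import math


def estimate_accounts(total_urls, first_user_hits):
    """Rough number of spam accounts, assuming each one posted about equally often."""
    if first_user_hits <= 0:
        raise ValueError("first_user_hits must be positive")
    return math.ceil(total_urls / first_user_hits)
```

So with a made-up list of 180 URLs, of which the first username accounts for 36, you would expect around five accounts in total. Treat it as a guide to how many crawls lie ahead, nothing more.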

“Great Success!”

Forum Spam 4

And voilà! What seemed just half an hour ago to be an insurmountable challenge, a great festering heap of rancid URLs pointing at some dark and disgusting corner of the interwebs, is now a short, coherent (and actionable!) hit list of bad users. Fire this off to your webmaster and, if you’re lucky, he or she will delete the users and all of their posts. At any rate, you will have a far higher response rate with this method than you would by sending the webmaster hundreds of URLs from your initial crawls.

This is by no means a watertight solution, and naturally it has its shortcomings. But I’ve found that in certain specific cases of forum spam – generally, the ones that make you sigh and brace for disavowal – this method can score big wins, which is why I’m sharing it.

I hope you’ve enjoyed reading, and that you’ve found my quick guide useful. I’d love to hear your thoughts so please do share them, either in the comments or via @tomcbennet on Twitter. Thanks for reading!

Image Credit: Varial Freefly

Comments

  1. Richard Baxter

    This is a seriously smart idea – I can imagine most forum webmasters are pleased to get rid of this stuff if they’re serious about maintaining a credible website. Great, great post Tom!

  2. Adeel

    can you please make a step by step video of this process..its a great idea..but i don’t understand completely!

  3. Tom Bennet (post author)

    Hi Adeel. I’m afraid I don’t have the time to make a step-by-step video at the moment, but I would be happy to answer any questions you have about the process. Please feel free to drop me an email at tom seogadget.co.uk and I will do my best to help out. Thanks for reading!

  4. Jaesi

    I love the personality in this haha. This info could come in handy for me in the future, thank you very much! Also, some of it can be applied to spam comments that I’m constantly having to clean up!