Once upon a time, it took weeks, if not months, to get a new page crawled and indexed by a search engine. Later, the whole indexing process got faster: I found this post from 2007 in which a very excited Patrick at Blogstorm discusses a web page getting indexed within a minute.
Image – 16bit EPROM by YellowCloud
The real time web
Thanks largely to the emergence of the real time web, search engines have had to work even harder, improving their massive infrastructure to get new content indexed fast. You don’t need to look for long to find evidence that fresher indexing is right at the top of Google’s priority list.
A week or so ago, Rand and I discussed the fascinating subject of getting your content indexed quickly using sitemap pings, Ping-O-Matic and PubSubHubbub. Today, let’s take a look at what PubSubHubbub is and how you can implement it quickly and easily on your self-hosted WordPress blog or website.
What is PubSubHubbub?
PubSubHubbub is a simple publish / subscribe protocol (hence the “PubSub“) that turns Atom and RSS feeds into real-time streams. In its simplest form, PubSubHubbub is a way to get content in front of your subscribers in real time, via a hub.
Why is that cool?
To understand why PubSubHubbub is exciting, you need to understand the difference between push and poll. Crawlers such as Googlebot or Google Reader’s feed fetcher have to revisit your site periodically to check whether there’s any new content to syndicate. This repeated revisiting is known as “polling”. It’s inefficient when you think about it, because those crawlers have no idea whether your site has changed until they arrive; only then do they discover new content, receive a 304 Not Modified response, or download the same old content they saw last time.
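To put rough numbers on the polling problem, here is a toy Python simulation. It is an illustration only, not a model of how any real crawler schedules its visits: it assumes a crawler that polls a feed at a fixed interval and a publisher who updates the feed at the listed minutes, and it counts every poll that finds nothing new as wasted.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class PollStats:
    requests: int = 0     # times the crawler hit the feed URL
    new_content: int = 0  # polls that found at least one new post
    wasted: int = 0       # polls that found nothing new (304 or same old content)


def simulate_polling(update_minutes: list[int], poll_interval: int, horizon: int) -> PollStats:
    """Poll a feed every `poll_interval` minutes for `horizon` minutes.

    `update_minutes` lists when the publisher changed the feed. A poll only
    finds new content if an update happened since the previous poll.
    """
    stats = PollStats()
    last_poll = 0
    for t in range(poll_interval, horizon + 1, poll_interval):
        stats.requests += 1
        if any(last_poll < u <= t for u in update_minutes):
            stats.new_content += 1
        else:
            stats.wasted += 1
        last_poll = t
    return stats


# Two posts in a day, crawler polling every 15 minutes.
stats = simulate_polling([95, 400], poll_interval=15, horizon=1440)
print(stats)  # PollStats(requests=96, new_content=2, wasted=94)
```

With two posts in a day and a 15-minute interval, the crawler makes 96 requests and only 2 of them find anything new, and each post still waits up to 15 minutes to be noticed. With push, the subscriber would get exactly two notifications, each as soon as the post went live.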
The alternative made available through PubSubHubbub is that a publisher can push new content to subscribers via a hub. The publisher talks to its own (or a public) hub, and the hub talks to all of the subscribers, pushing new content out to each of them in real time.
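The hub’s fan-out role can be sketched in a few lines of Python. `ToyHub` is a hypothetical in-memory stand-in, not a real hub: a production hub speaks HTTP, verifies each subscription, fetches the publisher’s feed when pinged and POSTs the new entries to each subscriber’s callback URL. Here the subscribers are plain Python callbacks, and `publish()` is simply handed the feed entries the hub would have fetched.

```python
from __future__ import annotations

from collections import defaultdict
from typing import Callable

# Hypothetical subscriber: a callback receiving (topic_url, new_entries).
Subscriber = Callable[[str, list[str]], None]


class ToyHub:
    """In-memory illustration of publish/subscribe fan-out via a hub."""

    def __init__(self) -> None:
        self._subscribers: dict[str, list[Subscriber]] = defaultdict(list)
        self._seen: dict[str, set[str]] = defaultdict(set)

    def subscribe(self, topic: str, callback: Subscriber) -> None:
        self._subscribers[topic].append(callback)

    def publish(self, topic: str, feed_entries: list[str]) -> int:
        """Handle a publisher ping: push only unseen entries to every subscriber.

        Returns the number of subscribers notified (0 if nothing was new).
        """
        new = [e for e in feed_entries if e not in self._seen[topic]]
        self._seen[topic].update(new)
        if not new:
            return 0
        for callback in self._subscribers[topic]:
            callback(topic, new)
        return len(self._subscribers[topic])
```

The design point is that the subscribers never touch the publisher: one ping leads to one fetch by the hub, and the hub does the delivering.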
Isn’t this just ping?
We’ve been able to tell feed aggregators and syndicators that a feed has new content for some time, using ping. Ping-O-Matic, for example, is a service you can use to tell services like Feedburner, Syndic8, Blo.gs and NewsGator that something has changed and that they should poll your RSS feed to get the latest content on your site. The thing is, those subscribers are still polling your feed (and visiting your feed URL) to grab the latest content: you ping, they fetch. With PubSubHubbub, the hub fetches the published feed content once and multicasts the new or changed content to all of the registered subscribers, so your site serves far fewer requests for your feed URL.
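For comparison, a classic ping is an XML-RPC `weblogUpdates.ping` call that carries just a blog name and URL, never the content itself. The sketch below uses Python’s standard `xmlrpc.client`; the Ping-O-Matic endpoint URL is an assumption worth checking against their documentation, and the blog name and URL are placeholders.

```python
import xmlrpc.client

# Assumed Ping-O-Matic XML-RPC endpoint; verify before relying on it.
PINGOMATIC_RPC = "http://rpc.pingomatic.com/"


def build_weblog_ping(blog_name: str, blog_url: str) -> str:
    """Return the XML-RPC request body for weblogUpdates.ping.

    Note what is missing: the new post itself. Each service that receives
    this ping still has to come back and fetch your feed.
    """
    return xmlrpc.client.dumps((blog_name, blog_url), methodname="weblogUpdates.ping")


def send_weblog_ping(blog_name: str, blog_url: str, endpoint: str = PINGOMATIC_RPC):
    """Send the ping over the network (not called in the example below)."""
    proxy = xmlrpc.client.ServerProxy(endpoint)
    return proxy.weblogUpdates.ping(blog_name, blog_url)


print(build_weblog_ping("My Blog", "http://example.com/"))
```

Whoever receives that ping still has to fetch your feed, which is exactly the load a hub takes off your server.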
It’s a tricky concept, best explained by this video from Brett Slatkin and Brad Fitzpatrick, the creators of the PubSubHubbub protocol.
Services that are compatible with PubSubHubbub
First off, Matt told me that (as of last week) Google organic search does not use PubSubHubbub as a direct discovery mechanism for new content. Though that news was a little disappointing to hear, the conversation took place before Google officially announced that their Caffeine index was fully live. It makes perfect sense for Google and other search engines to use this protocol as a direct discovery signal, and I really hope we see that happen; Google did mention that they may use PubSubHubbub at some point in the future, so who knows. In the meantime, PubSubHubbub subscribers include FriendFeed, Six Apart, Google Reader, Google Alerts (impact on Blog Search / News indexing? I don’t know), Google Buzz, Ping.fm, NetVibes and Status.net. PubSubHubbub publishers include the hosted version of WordPress, Posterous (here’s mine) and Tumblr.
The super cool new Google AJAX Feed API also uses PubSubHubbub to update syndicated feed widgets (or whatever else you’re using the API for) in real time.
I’m super excited about PubSubHubbub. How can I implement it in WordPress?
If you’re self-hosting WordPress, it’s pretty easy to get started. You’re going to need a hub (whose URL must be included in your RSS feed header) and a plugin to ping your hub. That’s it!
Set up a hub
To set up a hub, you have a couple of options. I’m experimenting with both, so I have (as yet) no real preference for either solution.
1) Use a public hub (no setup at all) at: http://pubsubhubbub.appspot.com/
2) Use a third-party hub service that requires validation during setup, like Superfeedr.com: http://seogadget.superfeedr.com/
Ping your hub and include the hub URL in your RSS feed
When a PubSubHubbub-compatible subscriber polls your RSS feed, you want it to see that you’re now a PubSubHubbub-enabled publisher. You do this by referencing your hub URL in your RSS feed header. Here’s mine:
Notice the rel="hub"? That’s a link-based microformat describing the location of my hub. If the subscriber is PubSubHubbub-compatible, it should subscribe to updates via the hub, using the verification flow outlined in this presentation:
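On the subscriber side, the flow looks roughly like this: find the hub link in the feed, POST a subscription request to the hub, then answer the hub’s verification request by echoing back its `hub.challenge` value. The Python sketch below follows the parameter names of the PubSubHubbub 0.3 draft (later revisions dropped `hub.verify`); the function names and example URLs are hypothetical, and nothing here is sent over the network.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET
from urllib.parse import parse_qs, urlencode
from urllib.request import Request

ATOM_NS = "http://www.w3.org/2005/Atom"


def find_hub(rss_xml: str) -> str | None:
    """Return the href of the first <atom:link rel="hub"> in the feed, if any."""
    root = ET.fromstring(rss_xml)
    for link in root.iter(f"{{{ATOM_NS}}}link"):
        if link.get("rel") == "hub":
            return link.get("href")
    return None


def build_subscribe_request(hub_url: str, topic_url: str, callback_url: str) -> Request:
    """Ask the hub to push updates for topic_url to callback_url (0.3 draft parameters)."""
    body = urlencode({
        "hub.mode": "subscribe",
        "hub.topic": topic_url,        # the feed we want updates for
        "hub.callback": callback_url,  # where the hub should POST new entries
        "hub.verify": "async",         # required in 0.3, removed in later revisions
    }).encode()
    return Request(hub_url, data=body, method="POST",
                   headers={"Content-Type": "application/x-www-form-urlencoded"})


def answer_verification(query_string: str, expected_topic: str) -> tuple[int, str]:
    """Handle the hub's GET to our callback URL.

    Echoing hub.challenge with a 2xx status confirms we really asked for this
    subscription; replying 404 refuses it.
    """
    params = parse_qs(query_string)
    mode = params.get("hub.mode", [""])[0]
    topic = params.get("hub.topic", [""])[0]
    challenge = params.get("hub.challenge", [""])[0]
    if mode in ("subscribe", "unsubscribe") and topic == expected_topic and challenge:
        return 200, challenge
    return 404, ""
```

The challenge echo is what stops a third party from subscribing someone else’s callback URL to a feed they never asked for.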
Fortunately, the PubSubHubbub plugin for WordPress makes it easy to add your hub URL to your RSS feed, and it pings your hub every time you publish new content. Here’s what it looks like:
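Under the hood, the plugin’s two jobs amount to something like the following Python sketch. It is not the plugin’s code (WordPress plugins are written in PHP), just an illustration under the PubSubHubbub 0.3 conventions: the hypothetical `add_hub_link()` adds an `<atom:link rel="hub">` element to the RSS channel, and the hypothetical `build_publish_ping()` builds the `hub.mode=publish` POST a publisher sends to its hub.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET
from urllib.parse import urlencode
from urllib.request import Request

ATOM_NS = "http://www.w3.org/2005/Atom"
ET.register_namespace("atom", ATOM_NS)  # serialize as atom:link rather than ns0:link


def add_hub_link(rss_xml: str, hub_url: str) -> str:
    """Insert <atom:link rel="hub" href="..."/> into the RSS <channel>."""
    root = ET.fromstring(rss_xml)
    channel = root.find("channel")
    if channel is None:
        raise ValueError("not an RSS 2.0 document: no <channel> element")
    ET.SubElement(channel, f"{{{ATOM_NS}}}link", {"rel": "hub", "href": hub_url})
    return ET.tostring(root, encoding="unicode")


def build_publish_ping(hub_url: str, feed_url: str) -> Request:
    """POST telling the hub that feed_url has new content (hub.mode=publish)."""
    body = urlencode({"hub.mode": "publish", "hub.url": feed_url}).encode()
    return Request(hub_url, data=body, method="POST",
                   headers={"Content-Type": "application/x-www-form-urlencoded"})
```

To actually send the ping you would pass the request to `urllib.request.urlopen()`; a hub that accepts it should reply with 204 No Content and then fetch the feed itself.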
PubSubHubbub is not just for search: it’s for real-time notifications and data distribution
PubSubHubbub is an obvious and important step into the world of faster indexing and discovery, but the protocol has many more diverse applications too. I read an inspiring series of examples and ideas for distributed, real-time updates to personal and public profile data on sites such as LinkedIn and Facebook, and heard stories about real-time mobile push notifications using tools like Urban Airship. I’ve got to say, this is an exciting new field for tech-minded digital marketers and software developers alike, and it’s definitely worth getting up to speed on the topic sooner rather than later.
Fun With PubSubHubbub, WordPress & Faster Indexation [Real Time Search & Data Distribution],