Fun With The SEOmoz API – Get Links Data Straight into Excel

components

I love a challenge, especially when it comes to learning new skills. Today I’m going to show you an incredibly simple way to get your SEOmoz data straight into Excel – hopefully, my post will inspire some of you hardcore Excel junkies to go and build some new toys. What’s not to like?

Do I need to learn to code?

Not for this post. All you’ll need to do is follow the instructions, think about what you’re reading and be willing to try something new.

What do I need to know?

Well, you need to be reasonably familiar with Microsoft Excel, not advanced, just familiar. You’ll need a basic primer on simple regex (this post by Rob will do nicely) and a reasonable sense of how an API call works – check out Ben’s introduction to the SEOmoz API here. Finally, you’ll need Niel’s awesome SEO Tools for Excel.

Basically, if you can form a URI and an Excel query, we’re good to go! Here’s how:

Form a valid API call

To get a response from the SEOmoz api, you just need to form a URL. For testing purposes, forming the following URL and pasting into your browser would get a response:

http://[memberid]:[secret-key]@lsapi.seomoz.com/linkscape/url-metrics/[domain]

Depending on the type of API access you have, you’ll get a lot of json back in your browser, like this:

json from SEOmoz API

Using bit flags, we can get to the data we actually want, say, links to the root domain if you have the full site intelligence API (like we do, yeah!) or juice passing links if you’re rocking the free api. Either way, it’s getting the data we care about, so check this out:

bit field in seomoz api

?Cols=32 is the bit flag for the external links API call. The insanely clever thing about bit flags is (are?) that, if you add the number of another bit flag to 32, let’s say, http status code (536870912) to make 536870944, you get this:

more bit fields - this time combined

Anyway, we’re here to get API data into Excel. To get the job done, we’re going to use SEO Tools for Excel, and specifically the following functions:

=BuildHttpDownloaderConfig()
=DownloadString()
=RegexpFind()

You’ll need to understand =Concatenate, and understand that “d+” in regex means “a character in the range 0-9), 1 or more times”.

Form a valid http request

If you attempt to request our API URL without authentification, you’ll get a 400 response, and obviously no data. For ages, I tried using the member ID and secret key in the request URI with no luck. Thankfully my buddy Neils taught me the =BuildHttpDownloaderConfig() function.

The function is designed to control HTTP requests made by SEO Tools functions, In our case, we need to use the function to authenticate with a member ID and secret key. So, the query looks like this:

=BuildHttpDownloaderConfig(FALSE,,,,,,[member-id],[secret-key])

BuildHttpDownloader’s output is in XML, which is used to configure the function we’re using to fetch the data, =DownloadString()

In its simplest form, the correct syntax to fetch the API output is now:

=DownloadString([API-request],[BuildHttpDownloader-config])

To make this work, create a cell for the api call URI (“A1”), and a cell for your BuildHttpDownloader output (“A2”). That would make your DownloadString function look like this:

=DownloadString(A1,A2)

And here’s what that looks like in Excel:

Excel results

How good is that?! That, my friends is API output straight into Excel!

The last bit – how to extract that number

This is where the regex comes in. If you’re an actual coder, this is where you’ll tempted to mention JSON parsers and all that stuff. A JSON parser makes it easy for developers to fetch the data they need from a JSON output. There is actually a JSON parsing function in SEO Tools, so it’s perfectly possible to head down that route, and even write the results to an array. For the purpose of learning, I really wanted to extract my precious number as quickly and simply as possible. And anyway, if I give it up in one blog post, what is there left for you to work out?

So, assuming the fruits of your labour have been written to cell A3, here’s all you need to do:

=RegexpFind(A3,"d+")

And just like that, you’ll get a nice clean result for which you can add to a table, combine with data from your other tools and generally have an awesome time dreaming up ways to cook your menu of new data skills.

Want to download the spreadsheet? Here’s a simple version: [download] – don’t forget you’ll need to install SEO Tools for Excel, too.

Image credit: DannyMCL

Update!

Danny Goodwin has created a better way to do this with the newer version of SEO Tools – check out his code here: (Very nice Dan!)

 

Fun With The SEOmoz API – Get Links Data Straight into Excel, 5.0 out of 5 based on 2 ratings

Comments

  1. Cyle Tabor

    Richard,

    This is just what I have been looking for!! I couldn’t figure out why I keep getting the 400 response when I put the Member-ID and the Secret Key in the URI. This fixed everything. Thanks for the great post.

  2. Miguel Salcido

    I MUST subscribe to this blog via email. But your RSS is “bonked.” Ping me when its back up, @miguelsalcido, and don’t forget to activate subscribe by email functionality in feedburner. :-)

  3. Miguel Salcido

    Weird, I’m getting a bunch of words and code, not an RSS subscribe. Do you use a service like Feedburner?

  4. Sebastian

    Thank you – will look closer at this. Really want to find ways to use the API to get information out into a dashboard. Will look at Dans post also.

  5. Richard Ingersoll

    I have entered the code supplied by Danny Goodwin into my SEOTools spreadsheet and it’s returning a 1. I’m wondering if perhaps I need to change the cols=68719476736 field information. I’m not at all familiar with JSON. Any thoughts from those who are?

    Thanks!

  6. Richard Baxter Post author

    Hi Richard

    That’s why the step by step (buildhttp, download string and regex) approach is worth following at first. Small steps is also a great way to debug your errors!

    Best of luck!

  7. Bibiano Wenceslao

    Hi Richard,

    Awesome post! This is definitely one timeless piece. I got the API calls working using Danny’s approach with Neils’ SeoTools add-in for Excel. My problem though is that I get a #VALUE error after every 3 consecutive calls on 3 cells and have to wait for about 30 seconds then refresh those cells to be able to call another 3, and so on. I’m using a PRO account by the way which as they’ve documented in the API pricing grants me one request for every 5 seconds. I’m only pulling DA values of 20 different URLs at a time (some link research stuff) so I guest it shouldn’t be that much load.

    Is it an Excel (2007 here) thing? API calls using Google spreadsheets seems to be working fine even when pulling in metrics for multiple URLs at a time. I’m honestly not good with app scripts or anything similar (for most of the time I build my own spreadsheet templates around other people’s ideas shared on the web), but is this probably something that can be fixed? Maybe something similar to Utilities.sleep? Or is this something that can be tweaked via the SeoTools add-in? I can live with these errors though, but a permanent fix would be wondrous.

    My super thanks for this post!

    Cheers!

  8. Daniel Foley

    Hi Richard,

    Try increasing the PAUSE time in the HTTP Downloader function. You will find that the API calls fail and return ? instead of data if there is not a sufficient pause time between requests.

    Hope this helps!

  9. Dipak Patel

    Hi All

    Getting a 407 proxy authorization required message ?

    Any ideas where i’m going wrong.

    Thanks