I recently attended the fantastic SAScon search and social conference in Manchester, which I would recommend to any SEO, social media or associated online marketing professional. Whilst much of the conference content has already been covered elsewhere, including great liveblog coverage at State of Search, I was lucky enough to get an interview with speaker Dixon Jones of Majestic SEO to discuss their new Flow Metrics. I'd recommend you read the post linked to above if you're as yet unfamiliar with the new Flow Metrics, as the interview assumes a certain level of awareness and attempts to dig a little deeper into the mechanics of these new link metrics.
[Image: Buzzin' at SAScon 2012]
[Picture courtesy of the global wanderer of online marketing that is Jackie Hole]
NS: For how long have Majestic had the data capability to perform these metric calculations?
DJ: We've of course had AC Rank for some time but it was becoming clear that our competitors were stronger on their equivalent metric, so we started working on finding out the faults with AC. The main fault was that it only looked at external links when it was clear that internal links were passing some flow. In addition, AC Rank treated all links equally, with no weighting towards position on page or context. So there was really the ability to abuse PageRank, so we wanted to flow it. We started working on this in December/January and we made some headway, but it was a massive exercise, as it had to be run over the whole data-set.
NS: Because it's iterative?
DJ: Absolutely. Citation Flow runs through (probably) trillions of calculations every day and needs to wait for the index to update. It's difficult to run on a sub-set; the result is almost meaningless, as the whole data-set is needed to complete the calculations.
NS: Can you sum up the difference between "citation" and "trust"?
DJ: The decay algorithm is the same wherever I go, in that the characteristic flows through links. The difference is the start of the data-set. In the case of Citation Flow it starts with AC Rank, whereas with Trust Flow it's informed by a human review. Citation Flow measures how influential links are, whereas Trust Flow attempts to position a URL in a neighbourhood of "trust".
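To make the "same decay, different starting set" idea concrete, here is a minimal sketch of seeded flow propagation in the style of personalised PageRank. It is not Majestic's algorithm, which is not public: the function name `propagate_flow`, the `decay` value of 0.85, the iteration count and the tiny link graph are all assumptions made purely for illustration.

```python
from collections import defaultdict


def propagate_flow(links, seed_scores, decay=0.85, iterations=50):
    """Spread seed scores through a link graph with a fixed decay.

    links       -- dict mapping each page to the list of pages it links to
    seed_scores -- dict mapping pages to their starting score (the "seed set")
    decay       -- fraction of a page's score passed along its links
                   (0.85 is an illustrative assumption, not a Majestic value)
    """
    pages = set(links) | set(seed_scores)
    for targets in links.values():
        pages.update(targets)

    scores = {page: seed_scores.get(page, 0.0) for page in pages}
    for _ in range(iterations):
        incoming = defaultdict(float)
        for source, targets in links.items():
            if not targets:
                continue
            # Each page splits its current score equally across its outlinks.
            share = scores[source] / len(targets)
            for target in targets:
                incoming[target] += share
        # New score = a fixed slice of the seed plus the decayed inflow.
        scores = {
            page: (1 - decay) * seed_scores.get(page, 0.0) + decay * incoming[page]
            for page in pages
        }
    return scores


# Hypothetical four-page graph: "d" links out but nothing links to it.
links = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}

# Citation-style seed: every page starts with some score.
citation = propagate_flow(links, {page: 1.0 for page in "abcd"})

# Trust-style seed: only the hand-reviewed page "a" starts with a score.
trust = propagate_flow(links, {"a": 1.0})
```

Under these assumptions the propagation code is identical for both metrics; only the seed changes. Page "d" keeps a small citation-style score because every page is seeded, but its trust-style score stays at zero because it is neither in the seed set nor linked to from it.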
NS: Did you use your own human quality raters?
DJ: No, there are just too many; we used several curated data sets.
NS: So trust is a human-informed machine algorithm?
DJ: Yes, it's just the initial seed set that is human-informed, so we don't need to revisit the data.
NS: Have you done any correlation studies between Citation/Trust Flow metrics and PageRank?
DJ: Yes, though I may not publish them on Majestic; I may publish them on my own blog. I have data on DomainAuthority and with PageRank and with MozTrust and MozRank (which in my test did not correlate at all). DomainAuthority correlated at .7 and Citation Flow correlated with PageRank at .814. Trust Flow less so; however, I would stress that the two are not intended to copy PageRank, so it's slightly coincidental that Citation Flow correlates with PageRank, but we think it's a much better metric. It's more granular (on a scale of 0-100), it's fresher and updates daily, and it's transparent in that it isn't knackered by penalties - it is what it is. Of course it flows as well, which we feel makes it a better, more modern metric.
Trust doesn't correlate badly but then it's not really designed to equate. It's much harder to get a good Trust score than Citation as every page starts with an AC Rank, but we couldn't start with every site in a Trust set.
The profile charts that we've done illustrate it nicely, but it's a bit like trying to represent a fingerprint. We've only had this data live since Monday [14th], so really we're looking to see how the community makes use of this data.
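For anyone wanting to run a similar comparison on their own link data, the correlation figures quoted above can be checked with a plain Pearson correlation. The sketch below assumes Pearson is the measure meant (the interview does not say which coefficient was used), and the sample scores are made-up numbers for illustration, not Majestic or Google data.

```python
from math import sqrt


def pearson(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of the same length, at least 2 items")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of co-deviations and of squared deviations from each mean.
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined when one list is constant")
    return covariance / sqrt(spread_x * spread_y)


# Hypothetical scores for the same five URLs (invented for this example).
pagerank_scores = [1, 3, 4, 6, 7]
citation_flow_scores = [12, 30, 41, 58, 73]

r = pearson(pagerank_scores, citation_flow_scores)
print(f"Pearson r = {r:.3f}")
```

A value near 1 means the two metrics rank the URLs in much the same way; a value near 0 means they carry largely independent information, which is what you would hope for from a metric that is not intended to copy PageRank.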
NS: Is there a controlled and defined rate of degradation of flow from page to page? Is there a weighted contribution to amount of flow passed?
DJ: There is a weighted algorithm on the factors that contribute but I can't really go into that in much detail.
NS: Can you tell me one?
DJ: Things like follow and nofollow: we make decisions about those and are experimenting with switching that on and off. We do want to expand on that quite a bit further, with factors such as context and location on page, but again I can't go into that in too much detail right now.
Thanks, Dixon. Great progress for the Majestic SEO product, and I'm really excited to get started on this new data set.
Note: Thanks to Jon Quinton of SEO Gadget for rapping through some questions with me pre-interview.