charting the real time web
OR
the curious tale of how TechCrunch traffic inexplicably fell off a cliff in December

For a while now I have been thinking about doing a post about some of the data we track at betaworks. Over the past few months people have written about Twitter’s traffic being up, down or sideways. The core question people are asking is: is the real time web growing or not, is this hype or substance? Great questions, and from the data set I see the answer to all of the above is yes. Adoption and growth are happening pretty much across the board, and in some areas it’s happening at an astounding pace. But tracking this is hard. It’s hard to measure something that is still emerging. The measurement tools we have only sometimes work for counting traffic to web pages, and they certainly don’t track or measure traffic in streams, let alone aggregate up the underlying ecosystems that are emerging around these new markets. At betaworks we spend a lot of time looking at and tracking this underlying data set. It’s our business and it’s fascinating.

I was inspired to finally write something by first a good experience and then a bad one.    First the good one.    Earlier this week I saw a Tweet from Marshall Kirkpatrick about Gary Hayes’s social media counter.    It’s  very nicely done — and an embed is available.     This is what it looks like (note the three buttons on top are hot, you can see the social web, mobile and gaming):

The second thing was less fun, but I’m sure it has happened to many an entrepreneur. I was emailed earlier this week by a reporter asking about some data. I didn’t spend the time to weed through the analysis, and the reporter published data that was misleading. More on this incident later.

Let’s dig into some data. First, addressing the question people have asked me repeatedly in the past few weeks: did the real time stream grow in Q4 2009? It did. Not at the pace that it grew during Q1-Q3, but our data confirms continued growth. One of the best proxies we use for directional trending in the real time web is bit.ly decodes. This is the raw number of bit.ly links that are clicked on across the web. Many of these clicks occur within the Twitter ecosystem, but a large number are outside of Twitter, by people and by machines. There is a surprising amount of diversity within the real time stream, as I posted about a while back. Two charts are displayed below. On the left there are bit.ly decodes (blue) and encodes (red) running through the second half of last year. On the right is a different but related metric. Another betaworks company is Twitterfeed. Twitterfeed is the leading platform enabling publishers to post from their sites into Twitter and Facebook. This chart graphs the total number of feeds processed (blue) and the total number of publishers using Twitterfeed, again through the second half of the year (note: if the inline charts are too small to read you can click through and see full size versions). As you can see, similar to the left-hand chart, growth at Twitterfeed was strong for the entire second half of 2009.

Both these charts illustrate the ongoing shift that is taking place in terms of how people use the real time web for navigation, search and discovery. My preference is to look at real user interactions as strong indicators of user behavior. For example, I often find Google Trends more useful than comScore, Compete or the other “page” based measurement services. As interactions online shift to streams we are going to have to figure out how measurement works. I feel like today we are back in the early days of the web, when people talked about “hits” and it was hard to parse the relevant data from the noise. The indicators we see suggest that the speed at which this shift to the real time web is taking place is astounding. Yet it is happening in a fashion that I have seen a couple of times before.

Most social networks I have worked with have grown in a step function manner. You see this clearly when you zoom into the bit.ly data set and look at weekly decodes. It is less clear but still visible when you look at daily trending data (on the right), and if you add a 3 week moving average on top of that you can once again see the steps. You often have to zoom in and out of the data set to find the steps, but they are usually there. Sometimes they run for months, either up or sideways. I saw this with ICQ, AIM, Fotolog and Summize, through to bit.ly. Someone smarter than me has surely figured out why these steps occur. My hypothesis is that as social networks grow they jump in a sporadic fashion to a new dense cluster or network of relationships. The upward trajectory is the adoption cycle of that new, dense cluster, and the flat part of the step is the period before the jump to the next cluster. Blended in here there are clearly issues of engagement vs. trial, but it’s hard to weed those out from this data set. I learnt a lot of this from Yossi Vardi and Adam Seifer, two people I had the privilege of working with over the years and whose DNA is wired right into this stuff. At Fotolog Adam could take the historical data set and illustrate how these clusters moved, in steps, from geography to geography. It’s fascinating.
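
To make the moving average point concrete, here is a minimal sketch in Python. The daily numbers are synthetic (a made-up weekday/weekend wobble layered on two plateaus), not bit.ly data, and the names moving_average and synthetic_daily are purely illustrative. The idea is that a 21 day trailing window covers three full weeks, so a weekly cycle cancels out and what is left is the underlying step.

```python
def moving_average(values, window):
    """Trailing moving average; entries before a full window are None."""
    if window <= 0:
        raise ValueError("window must be positive")
    smoothed = []
    running = 0.0
    for i, value in enumerate(values):
        running += value
        if i >= window:
            # Drop the value that just fell out of the window.
            running -= values[i - window]
        smoothed.append(running / window if i >= window - 1 else None)
    return smoothed


def synthetic_daily(days_per_plateau=42):
    """Hypothetical daily counts: two flat plateaus plus a weekly wobble."""
    series = []
    for day in range(2 * days_per_plateau):
        level = 100 if day < days_per_plateau else 200
        # Assumed pattern: five busier days, two quieter days per week.
        wobble = 10 if day % 7 < 5 else -25
        series.append(level + wobble)
    return series


daily = synthetic_daily()
smoothed = moving_average(daily, window=21)

# Raw values bounce around every week; the smoothed series sits flat on
# each plateau and ramps between them over the three weeks after the jump.
for day in (20, 41, 50, 62, 83):
    print(day, daily[day], smoothed[day])
```

With real data the plateaus would of course be noisier, and the choice of 21 days is only sensible if the dominant cycle really is weekly.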

TechCrunch falls off a cliff

Ok, I’m sure there are some people reading who are thinking: well, this is interesting, but I actually want to read about TechCrunch falling off a traffic cliff. I’m sorry, I actually don’t have any data to suggest that happened. After noting yesterday that a provocative headline is sometimes a substitute for data I thought, heck, I can do this too! This section of the post is more of a cautionary tale. If you are confused by this twist let me back up to where I started. I mentioned that there were two motivations for me sitting down and writing this post. The second one was that a TechCrunch story ran earlier this week saying that bit.ly’s market share had shifted dramatically. It hasn’t. The data was just misunderstood by the reporter. The tale (I did promise a tale) began last August when TechCrunch ran the following chart about the market share of URL shorteners.

The pie chart showed the top 5 URL shorteners and then calculated the market share each had  — what percent each was *of* the top five.     The  data looks like this:

bit.ly 79.61%
TinyURL 13.75%
is.gd 2.47%
ow.ly 2.26%
ff.im 1.92%
(79.61 + 13.75 + 2.47 + 2.26 + 1.92 = 100, allowing for rounding)
The comparable data from yesterday is:

bit.ly = 75%
TinyURL = 10%
ow.ly = 6%
is.gd = 4%
tumblr = 4%
(again this adds up to roughly 100%, allowing for rounding)

Not much news in those numbers, especially when you consider they come from the Twitter “garden hose” (a subset of all tweets) and swing by as much as +/- 5% daily. The tumblr growth into the top 5 and the ow.ly bump up are a nice shift for them, but not really a story. The hitch was that the reporter didn’t consider that there are other URLs in the Twitter stream aside from these five. Some are short URLs and some aren’t. So this metric doesn’t accurately reflect overall short URL market share; it shows the shuffling of market share amongst the top five. But media will be media. I saw a Tweet this week about how effective Twitter is at disseminating information, both true and false. Despite all the shifts that are going on, headlines in a sense carry even more weight than in the “read all about it” days.
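
To see why share of the top five and share of everything are different numbers, here is a small illustrative sketch in Python. The counts are invented for the example (they are not the garden hose figures), and the “other” bucket and the function names are assumptions made for illustration.

```python
def share_within(counts, members):
    """Percentage each member holds of the members' combined total."""
    total = sum(counts[name] for name in members)
    return {name: 100.0 * counts[name] / total for name in members}


def share_of_all(counts):
    """Percentage each entry holds of every link in the sample."""
    total = sum(counts.values())
    return {name: 100.0 * count / total for name, count in counts.items()}


# Hypothetical sampled link counts, including everything outside the top five.
counts = {
    "bit.ly": 750,
    "TinyURL": 100,
    "ow.ly": 60,
    "is.gd": 50,
    "tumblr": 40,
    "other": 1000,
}
top_five = ["bit.ly", "TinyURL", "ow.ly", "is.gd", "tumblr"]

within_top_five = share_within(counts, top_five)
overall = share_of_all(counts)

# Same raw counts, two very different "market share" numbers.
print(within_top_five["bit.ly"], overall["bit.ly"])
```

In this made-up sample the same bit.ly count reads as 75% of the top five but 37.5% of all links, so comparing a top-five figure from one date with an all-links figure from another will show a “drop” that never happened.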

The lesson here for me was the importance of helping reporters and analysts get access to the underlying data, data they can use effectively. We sent the reporter the data, but he saw a summary data set that included the other URLs and didn’t understand that back in August there were also “other” URLs. After the fact we worked to sort this out and he put a correction in his post. But the headline was off and running, irrespective of how dirty or clean the data was. Basic mistake, my mistake, and this was with a reporter who knows this stuff well. Given the paucity of data out there and the emergent state of the real time web, this stuff is bound to happen.

Ironically, yesterday, bit.ly hit an all time high in terms of decodes: over 90m. But back to the original question. There is a valid question the reporter was seeking to understand, namely: what is the market share of dem short thingies? We track this metric, using the Twitter garden hose and identifying most of the short URLs to produce a ranking (note it’s a sample, so the occurrences are a fraction of the actuals). And it’s a rolling 24 hr view, so it moves around quite a bit, but nonetheless it’s informative. This is what it looked like yesterday:

Over time this data set is going to become harder to use for this purpose. At bit.ly we kicked off our white label service before the holidays. Despite months of preparation we weren’t expecting the demand. As we provision and set up the thousands of publishers, bloggers and brands who want white label services, it’s going to result in a much more diverse stream of data in the garden hose.

Real Time Web Data

Finally, I thought it would be interesting to try to get a perspective on the emergence of the real time web in 2009: how did its growth compare and contrast with the incumbent web category leaders? Let me try to frame up some data around this. Hang in there, some of the things I’m going to do are hacks (at best). As I said, I was inspired! Let’s start with the user growth in the US among the current web leaders, Google and Amazon. This is what it looked like in 2009:

It’s basically flat. Pretty much every user in the domestic US is on Google for search and navigation and on Amazon for commerce. Impressive baseline numbers, but flat for the year (source: Quantcast). So then let’s turn to Twitter. Much ink has been spilt over Twitter.com’s growth in the second half of the year. During the first half of the year Twitter’s growth, I suspect, was driven to a great extent by the unprecedented media attention it received; media and celebrities were all over it. Yet in the second half of the year that waned and the traffic numbers to the Twitter.com web site were flat. That step issue again?

Placing steps aside, because I don’t in any way seek to represent Twitter Inc., there are two questions that haven’t been answered: (a) what about international growth? That was clearly a driver for Facebook in ’09, so where was Twitter internationally? (b) What about the ecosystem? Unsurprisingly it’s the second question that interests me the most. So what about that ecosystem?

We know that approx 50% of the interactions with the Twitter API occur outside of Twitter.com, but many of those aren’t end user interactions. We also know that as people adopt and build a following on Twitter they often move up to one of the client or vertical-specific applications to suit their “power” needs. At TweetDeck we did a survey of our users this past summer. The data we got suggested 92% of them use TweetDeck every day, and 51% use Twitter more frequently since they started using TweetDeck. So we know there is a very engaged audience on the clients. We also know that most of the clients aren’t web pages. They are Flash, AIR, Cocoa, iPhone apps etc., all things that the traditional measurement companies don’t track.

What I did to estimate the relative growth of the Twitter ecosystem is the following.   I used Google Trends and compiled data for Twitter and the key clients.    I then scaled that chart over the Twitter.com traffic.   Is it correct? — no.   Is it made up? — no.   It’s a proxy and this is what it looks like (again, you can click the chart to see a larger version):
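
For anyone who wants to try something similar, one way a scaling like this could be done is sketched below in Python. This is an assumption about the general approach, not a record of the exact steps used for the chart: it pins the search index for the site to measured traffic in one reference period, then applies the same factor to the combined index for the site plus clients. All numbers, period labels and function names are hypothetical.

```python
def scale_factor(site_trend, measured_traffic, reference):
    """Traffic units per search-index unit, fixed at one reference period."""
    return measured_traffic[reference] / site_trend[reference]


def ecosystem_proxy(site_trend, client_trends, measured_traffic, reference):
    """Rough proxy: scaled sum of the site index and each client index."""
    factor = scale_factor(site_trend, measured_traffic, reference)
    proxy = {}
    for period, site_value in site_trend.items():
        combined = site_value + sum(trend[period] for trend in client_trends)
        proxy[period] = factor * combined
    return proxy


# Hypothetical relative search-interest values per quarter (not real data).
site_trend = {"Q1": 40, "Q2": 80, "Q3": 80, "Q4": 80}
client_trends = [{"Q1": 2, "Q2": 6, "Q3": 10, "Q4": 16}]

# Hypothetical measured site traffic (millions of visitors) for the reference.
measured_traffic = {"Q2": 20.0}

proxy = ecosystem_proxy(site_trend, client_trends, measured_traffic, "Q2")
for period in ("Q1", "Q2", "Q3", "Q4"):
    print(period, proxy[period])
```

The weakness is obvious: search interest is not usage, and the result depends heavily on which period is used as the reference. It is a directional proxy and nothing more.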

Similar to the Twitter.com traffic, you see the flattening out in the summer. But similar to the data sets referenced above, you see growth in the fourth quarter. I suspect if you could zoom in and out of this the way I did above you would see those steps again. So let’s put it all together! It’s one heck of a busy chart. Add in Facebook (blue) and Meebo (green), both steaming ahead; Meebo had a very strong end of year. And then tile on top the bit.ly data and the Twitterfeed numbers (both on different scales) and you have an overall picture of growth of the real time web vs. Google and Amazon.

Ok. One last snapshot, then I’m wrapping up. Chartbeat (yep, another betaworks company) had one of its best weeks ever this past week, no small thanks to Jason Calacanis’s New Year post about his Top 10 favorite web products of 2009. To finish up, here is a video of the live traffic flow coming into Fred Wilson’s blog at AVC.com on the announcement of the Google Nexus One phone. Steve Gilmore mentioned the other week how sometimes interactions in the real time web just amaze one. Watching people swarm to a site is a pretty enthralling experience. We have much work to do in 2010. Some of it will be about figuring out how to measure the real time web. Much of it will be continuing to build out the real time web and learning about this fascinating shift taking place right under our feet.

random footnote:

A data point I was sent this morning by Iain that was interesting, yet it didn’t seem to fit in anywhere?! Asian Twitter clients were yesterday over 5% of the requests visible in the garden hose.
