charting the real time web
OR
the curious tale of how TechCrunch traffic inexplicably fell off a cliff in December

For a while now I have been thinking about doing a post about some of the data we track at betaworks. Over the past few months people have written about Twitter’s traffic being up, down or sideways. The core question people are asking is whether the real time web is growing or not: is this hype or substance? Great questions, and from the data set I see, the answer to all of the above is yes. Adoption and growth are happening pretty much across the board, and in some areas they are happening at an astounding pace. But tracking this is hard. It’s hard to measure something that is still emerging. The measurement tools we have only sometimes work for counting traffic to web pages, and they certainly don’t track or measure traffic in streams, let alone aggregate the underlying ecosystems that are emerging around these new markets. At betaworks we spend a lot of time looking at and tracking this underlying data set. It’s our business and it’s fascinating.

I was inspired to finally write something by first a good experience and then a bad one. First the good one. Earlier this week I saw a Tweet from Marshall Kirkpatrick about Gary Hayes’s social media counter. It’s very nicely done, and an embed is available. This is what it looks like (note that the three buttons on top are live: they switch between the social web, mobile and gaming):

The second thing was less fun, but I’m sure it has happened to many an entrepreneur. Earlier this week a reporter emailed me asking about some data. I didn’t spend the time to weed through the analysis, and the reporter published data that was misleading. More on this incident later.

Let’s dig into some data. First, the question people have asked me repeatedly in the past few weeks: did the real time stream grow in Q4 2009? It did. Not at the pace it grew during Q1 through Q3, but our data confirms continued growth. One of the best proxies we use for directional trending in the real time web is bit.ly decodes: the raw number of bit.ly links that are clicked on across the web. Many of these clicks occur within the Twitter ecosystem, but a large number happen outside of Twitter, by people and by machines; there is a surprising amount of diversity within the real time stream, as I posted about a while back. Two charts are displayed below. On the left are bit.ly decodes (blue) and encodes (links shortened, red) through the second half of last year. On the right is a different but related metric. Another betaworks company is Twitterfeed, the leading platform enabling publishers to post from their sites into Twitter and Facebook. This chart graphs the total number of feeds processed (blue) and the total number of publishers using Twitterfeed, again through the second half of the year (if the inline charts are too small to read, you can click through to see full-size versions). As with the left-hand chart, growth at Twitterfeed was strong for the entire second half of 2009.

Both charts illustrate the ongoing shift in how people use the real time web for navigation, search and discovery. I prefer to treat real user interactions as strong indicators of user behavior. For example, I often find Google Trends more useful than comScore, Compete or the other “page”-based measurement services. As interactions online shift to streams, we are going to have to figure out how measurement works. It feels like we are back in the early days of the web, when people talked about “hits”: it’s hard to separate the relevant data from the noise. The indicators we see suggest that this shift to the real time web is taking place at an astounding speed. Yet it is happening in a fashion I have seen a couple of times before.

Most social networks I have worked with have grown in a step-function manner. You see this clearly when you zoom into the bit.ly data set and look at weekly decodes. It is less clear, but still visible, in the daily trending data (on the right); add a 3-week moving average on top of that and you can once again see the steps. You often have to zoom in and out of the data set to find the steps, but they are usually there. Sometimes they run for months, either up or sideways. I saw this with ICQ, AIM, Fotolog and Summize, through to bit.ly. Someone smarter than me has surely figured out why these steps occur. My hypothesis is that as social networks grow, they jump sporadically to a new dense cluster or network of relationships. The upward trajectory is the adoption cycle within that new, dense cluster, and the flat part of the step is the period before the jump to the next cluster. Blended in here there are clearly issues of engagement vs. trial, but it’s hard to weed those out of this data set. I learnt a lot of this from Yossi Vardi and Adam Seifer, two people I have had the privilege of working with over the years and whose DNA is wired right into this stuff. At Fotolog, Adam could take the historical data set and illustrate how these clusters moved, in steps, from geography to geography. It’s fascinating.
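The post doesn’t say exactly how the smoothing or step-spotting was done. As a minimal sketch, assuming a plain list of daily decode counts, a trailing 3-week (21-day) moving average looks like this. The `label_steps` helper, its weekly period and its 5% threshold are hypothetical choices for illustration, not the method used on the bit.ly data.

```python
from collections import deque


def trailing_moving_average(values, window=21):
    """Trailing moving average over `window` points (21 days = 3 weeks).

    Early positions, before a full window is available, average
    over the points seen so far.
    """
    smoothed = []
    buf = deque()
    total = 0.0
    for v in values:
        buf.append(v)
        total += v
        if len(buf) > window:
            total -= buf.popleft()  # drop the point that fell out of the window
        smoothed.append(total / len(buf))
    return smoothed


def label_steps(smoothed, period=7, threshold=0.05):
    """Hypothetical step labeler: walk the series in blocks of `period`
    points, compare each block boundary with the previous one, and call
    the block "up" if it grew by more than `threshold`, else "flat"."""
    labels = []
    for i in range(period, len(smoothed), period):
        prev, cur = smoothed[i - period], smoothed[i]
        growth = (cur - prev) / prev if prev else 0.0
        labels.append("up" if growth > threshold else "flat")
    return labels


# Usage with made-up numbers: two weeks at one level, then a jump.
daily = [100] * 14 + [130] * 14
print(label_steps(trailing_moving_average(daily)))
```

Note that the moving average delays the visible jump by a week or more; that lag is the price of smoothing out daily noise.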

TechCrunch falls off a cliff

OK, I’m sure some people reading this are thinking: well, this is interesting, but I actually want to read about TechCrunch falling off a traffic cliff. I’m sorry, I don’t have any data to suggest that happened. After noting yesterday that a provocative headline is sometimes a substitute for data, I thought: heck, I can do this too! This section of the post is more of a cautionary tale. If you are confused by this twist, let me back up to where I started. I mentioned that there were two motivations for sitting down and writing this post. The second was that earlier this week a TechCrunch story ran saying that bit.ly’s market share had shifted dramatically. It hasn’t. The data was just misunderstood by the reporter. The tale (I did promise a tale) began last August, when TechCrunch ran the following chart about the market share of URL shorteners.

The pie chart showed the top 5 URL shorteners and calculated each one’s market share as a percentage *of* the top five. The data looks like this:

bit.ly 79.61%
TinyURL 13.75%
is.gd 2.47%
ow.ly 2.26%
ff.im 1.92%
(79.61 + 13.75 + 2.47 + 2.26 + 1.92 ≈ 100, allowing for rounding)
The comparable data from yesterday is:

bit.ly 75%
TinyURL 10%
ow.ly 6%
is.gd 4%
tumblr 4%
(again these sum to roughly 100%; the 99% total is rounding)

Not much news in those numbers, especially when you consider that they come from the Twitter “garden hose” (a subset of all tweets) and swing by as much as +/- 5% daily. Tumblr’s rise into the top 5 and the ow.ly bump are nice shifts for them, but not really a story. The hitch was that the reporter didn’t consider that there are other URLs in the Twitter stream aside from these five. Some are short URLs and some aren’t. So this metric doesn’t reflect overall short URL market share; it shows the shuffling of market share among the top five. But media will be media. I saw a Tweet this week about how effective Twitter is at disseminating information, true and false. Despite all the shifts going on, headlines in a sense carry even more weight than in the “read all about it” days.
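To see why the denominator matters, here is a minimal sketch with invented counts (not real garden hose data). The same numbers give bit.ly 75% when shares are computed among the top five only, and 37.5% when a hypothetical pool of “other” URLs is included in the total.

```python
def top_n_share(counts, n=5):
    """Percent share of each of the n largest keys, relative to those n only."""
    top = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)[:n]
    total = sum(c for _, c in top)
    return {name: 100.0 * c / total for name, c in top}


def overall_share(counts, other=0):
    """Percent share of each key relative to everything, including `other` URLs."""
    total = sum(counts.values()) + other
    return {name: 100.0 * c / total for name, c in counts.items()}


# Invented sample counts for illustration only.
counts = {"bit.ly": 750, "TinyURL": 100, "ow.ly": 60, "is.gd": 45, "tumblr": 45}

print(top_n_share(counts)["bit.ly"])                # share of the top five
print(overall_share(counts, other=1000)["bit.ly"])  # share of all URLs seen
```

Comparing a top-five-only figure from one period against an all-URLs figure from another mixes two different denominators, so an apparent drop can appear even when nothing has changed.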

The lesson here for me was the importance of helping reporters and analysts get access to the underlying data, in a form they can use effectively. We sent the reporter the data, but he saw a summary data set that included the other URLs and didn’t understand that back in August there were also “other” URLs. After the fact we worked to sort this out, and he put a correction in his post. But the headline was off and running, irrespective of how dirty or clean the data was. Basic mistake, and my mistake, and this was with a reporter who knows this stuff well. Given the paucity of data out there and the emergent state of the real time web, this stuff is bound to happen.

Ironically, yesterday bit.ly hit an all-time high in decodes: over 90m. But back to the original question. There is a valid question the reporter was seeking to answer, namely: what is the market share of dem short thingys? We track this metric using the Twitter garden hose, identifying most of the short URLs to produce a ranking (note it’s a sample, so the occurrences are a fraction of the actuals). And it’s a rolling 24-hour view, so it moves around quite a bit, but nonetheless it’s informative. This is what it looked like yesterday:

Over time this data set is going to become harder to use for this purpose. At bit.ly we kicked off our white label service before the holidays, and despite months of preparation we weren’t expecting the demand. As we provision and set up the thousands of publishers, bloggers and brands who want white label services, their links will carry their own short domains rather than bit.ly, which will result in a much more diverse stream of data in the garden hose.
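A rolling 24-hour ranking of this kind could be sketched as follows, assuming sampled tweets arrive as (timestamp, URL) pairs. The `SHORTENER_DOMAINS` set is a hypothetical stand-in for the list of short URL domains actually tracked, which the post doesn’t give; anything not in it, including new white label domains, lands in “other” until it is added.

```python
from collections import Counter
from datetime import datetime, timedelta
from urllib.parse import urlparse

# Hypothetical list of tracked shortener domains (illustration only).
SHORTENER_DOMAINS = {"bit.ly", "tinyurl.com", "ow.ly", "is.gd", "ff.im"}


def rolling_ranking(events, now, window=timedelta(hours=24)):
    """Rank shortener domains by how often they appear in sampled
    (timestamp, url) events inside the window ending at `now`.

    Counts come from a sample, so they are a fraction of actual volume;
    the relative ranking is the informative part.
    """
    cutoff = now - window
    counts = Counter()
    for ts, url in events:
        if not (cutoff < ts <= now):
            continue  # outside the rolling window
        host = (urlparse(url).hostname or "").lower()
        if host.startswith("www."):
            host = host[len("www."):]
        counts[host if host in SHORTENER_DOMAINS else "other"] += 1
    return counts.most_common()


now = datetime(2010, 1, 7, 12, 0)
events = [
    (now - timedelta(hours=1), "http://bit.ly/abc"),
    (now - timedelta(hours=2), "http://bit.ly/def"),
    (now - timedelta(hours=3), "http://ow.ly/xyz"),
    (now - timedelta(hours=4), "http://example.com/page"),
    (now - timedelta(hours=30), "http://bit.ly/old"),  # outside the 24h window
]
print(rolling_ranking(events, now))
```

Because the window slides with `now`, a ranking computed a few hours later can differ noticeably, which is one reason this view moves around so much from day to day.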

Real Time Web Data

Finally, I thought it would be interesting to get a perspective on the emergence of the real time web in 2009: how did its growth compare with that of the incumbent web category leaders? Let me try to frame up some data around this. Hang in there; some of the things I’m going to do are hacks (at best). As I said, I was inspired! Let’s start with user growth in the US among the current web leaders, Google and Amazon. This is what it looked like in 2009:

It’s basically flat. Pretty much every user in the domestic US is on Google for search and navigation and on Amazon for commerce: impressive baseline numbers, but flat for the year (source: Quantcast). So let’s turn to Twitter. Much ink has been spilt over Twitter.com’s growth in the second half of the year. During the first half, Twitter’s growth was, I suspect, driven to a great extent by the unprecedented media attention it received; media and celebrities were all over it. In the second half that attention waned, and traffic to the Twitter.com web site was flat. That step issue again?

Placing steps aside (because I don’t in any way seek to represent Twitter Inc.), there are two questions that haven’t been answered. (a) What about international growth? That was clearly a driver for Facebook in ’09; where was Twitter internationally? (b) What about the ecosystem? Unsurprisingly, it’s the second question that interests me the most. So what about that ecosystem?

We know that approximately 50% of the interactions with the Twitter API occur outside of Twitter.com, but many of those aren’t end-user interactions. We also know that as people adopt Twitter and build a following, they often move up to one of the clients or vertical-specific applications to suit their “power” needs. At TweetDeck we surveyed our users this past summer. The data suggested that 92% of them use TweetDeck every day, and 51% use Twitter more frequently since they started using TweetDeck. So we know there is a very engaged audience on the clients. We also know that most of the clients aren’t web pages: they are Flash, AIR, Cocoa and iPhone apps, all things that the traditional measurement companies don’t track.

Here is what I did to estimate the relative growth of the Twitter ecosystem. I used Google Trends to compile data for Twitter and the key clients, then scaled that chart over the Twitter.com traffic. Is it correct? No. Is it made up? No. It’s a proxy, and this is what it looks like (again, you can click the chart to see a larger version):

As with the Twitter.com traffic, you see the flattening out in the summer. But as with the data sets referenced above, you see growth in the fourth quarter. I suspect that if you could zoom in and out of this the way I did above, you would see those steps again. So let’s put it all together! It’s one heck of a busy chart. Add in Facebook (blue) and Meebo (green), both steaming ahead (Meebo had a very strong end of year). Then layer on top the bit.ly data and the Twitterfeed numbers (both on different scales), and you have an overall picture of the growth of the real time web vs. Google and Amazon.
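The ecosystem proxy is described only loosely: Google Trends data for Twitter and the key clients, scaled over Twitter.com traffic. One way to sketch it, assuming monthly index values and assuming the factor that maps Twitter’s own Trends index onto Twitter.com visitors at an anchor month also holds for the combined series, is below; `ecosystem_proxy` and `rebase` are hypothetical helpers, not the method behind the chart. Because the combined chart mixes scales, the sketch also rebases each series to 100 at its first point, an alternative presentation that puts growth on a common footing.

```python
def ecosystem_proxy(twitter_index, client_indices, twitter_traffic, anchor=0):
    """Hypothetical proxy: sum Trends-style indices for Twitter and its clients,
    then convert to visitor units using the ratio of Twitter.com traffic to
    Twitter's own index at the `anchor` month."""
    if twitter_index[anchor] == 0:
        raise ValueError("twitter_index must be non-zero at the anchor month")
    factor = twitter_traffic[anchor] / twitter_index[anchor]
    # zip(...) pairs month i of Twitter with month i of every client series.
    combined = [sum(month) for month in zip(twitter_index, *client_indices)]
    return [v * factor for v in combined]


def rebase(series, base=100.0):
    """Express a series relative to its first value, so that first value -> base."""
    first = series[0]
    if first == 0:
        raise ValueError("cannot rebase a series whose first value is 0")
    return [base * v / first for v in series]


# Made-up monthly numbers for illustration.
twitter_index = [10, 10, 12]
client_indices = [[5, 8, 10], [2, 3, 4]]
twitter_traffic = [20_000_000, 20_000_000, 21_000_000]

proxy = ecosystem_proxy(twitter_index, client_indices, twitter_traffic)
print(rebase(proxy))

# Dividing a series by a constant (e.g. 10k) does not change its rebased shape.
print(rebase([220_000, 780_000]), rebase([22, 78]))
```

On a rebased scale, percent change since the first month is simply the rebased value minus 100, whatever units the series started in.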

OK, one last snapshot, then I’m wrapping up. Chartbeat (yep, another betaworks company) had one of its best weeks ever this past week, thanks in no small part to Jason Calacanis’s New Year post about his top 10 favorite web products of 2009. To finish up, here is a video of the live traffic flow coming into Fred Wilson’s blog at AVC.com on the announcement of the Google Nexus One phone. Steve Gillmor mentioned the other week how interactions in the real time web can sometimes just amaze you. Watching people swarm to a site is a pretty enthralling experience. We have much work to do in 2010. Some of it will be about figuring out how to measure the real time web. Much of it will be continuing to build out the real time web and learning about this fascinating shift taking place right under our feet.

random footnote:

A data point Iain sent me this morning that was interesting, yet didn’t seem to fit in anywhere: yesterday, Asian Twitter clients accounted for over 5% of the requests visible in the garden hose.

  • http://www.andyswan.com andyswan

    Great stuff. thank you

  • Pingback: My Twitter Tipping Point | Darren Herman

  • http://500hats.typepad.com davemc500hats

    awesome stuff John… please make this a regular quarterly (or monthly?) update!

  • Pingback: links for 2010-01-08

  • Pingback: Finance Geek » Charting The Real-Time Web

  • http://twitter.com/L1AD LIAD

    really interesting and informative post.

    – you really need to make the size of your font bigger, though. I doubt many people will start reading a blog post with such a teeny font, let alone actually finish it.

    you're selling your content short.

  • http://ventureswell.com LukeG

    The “step growth” function is pretty fascinating. Your hypothesis seems on, too – adoption accelerates within a specific social network (or “component”), a weak tie/local bridge spreads it to another network component, and adoption cascades from there.

    Jon Kleinberg and David Easley have a new book out next year – Networks, Crowds, and Markets – that addresses the theory behind a lot of this in an accessible way. Ebook is free online here: http://www.cs.cornell.edu/home/kleinber/networks-book/.

    I’m with Dave – regular updates on this would be amazing.

  • Pingback: Lies, Damned Lies, And Statistics or How To Get Under John Borthwick’s Skin

  • http://thegongshow.tumblr.com andrewparker

    John, great post, thanks for all the data. And, I love the title ;)

    One question: I'm a little bit lost in your final graph where you put it all together. I see two y-axes on that graph, but you also mention that tweetmeme, meebo, etc. are on different scales, so it sounds like the graph needs a couple of additional y-axes to be fully labeled… This matters because the graph is showing growth relative to each other. Without a consistent scale, I get lost in how the growth is relative.

    One solution might be to show all the growths of the various services as %change since Jan-09. That way, all the data will be on the same scale, and it will be easier to determine visually what's going on with the growth of the realtime web.

    I really like the decode/encode data. I hope in a future post we can see a distribution of decode frequency for a random encoded link. I'm sure the number of decodes per encode is going to be a power curve (like everything on the net), but I imagine the shape of the power curve will be informative.

    Keep up the great work.

    • Johnborthwick

      AP, thanks for the comment. Let me clarify. All the data in the last chart is on the left-hand Y axis except the bit.ly data and the Twitterfeed data. The bit.ly data is on the right-hand Y axis. For the Twitterfeed data set, yes, I ran out of Y axes: what I did was use the left-hand Y axis but scale the data down by a factor of 10k to fit. So it's the same data as the blue line (number of feeds) in the Twitterfeed chart up top. In that chart the number of Twitterfeed feeds scales from 220k to approx 780k over the period; in the chart below it runs from 22 to 78.

    • georgeneuman

      LOL, varying the scale on the XY is a great way to make your lines steep or shallow

    • Johnborthwick

      AP — Running some new charts which I will push out shortly on a new post but we tried what you suggested (% change, month over month in 2009) and it is very hard to read — see here: http://bit.ly/7NctaZ

      • http://thegongshow.tumblr.com andrewparker

        Yea, that is a mess, thanks for taking a stab at it.

        However you end up deciding to visualize, perhaps you could publish
        the .xls so others could take a cut at it. I don't know if certain
        sources are proprietary or not…

  • georgeneuman

    gadzooks. Talk about touchy! Did you stay up all night to post this riposte to the TC posting? What is the REAL story then? Sounds like a knee jerk panic post. Or as my guru says, a KJPP

    • Johnborthwick

      Lol — nope. What's KJPP — Google search says: Korean Journal of Physiology and Pharmacology?!

  • http://technbiz.blogspot.com paramendra

    This is one fascinating blog post. I came here from a TechCrunch post that reported you were miffed. And I am glad I did because you give a panoramic view of stats on some hot web properties. I came for one thing, found something else entirely, and I am glad. Revealing. And the mention of Fred Wilson towards the end is an eyeful. TechCrunch seems to do the love-hate thing with some pretty accomplished people in tech. http://technbiz.blogspot.com/2010/01/anu-shukla-has-found-new-frontier-in.html I think it is business. The media people have to ignite controversies to do well in the page hits department. My first time on your blog, off you go on to my blogroll.

  • Pingback: peHUB » Why Invest in oneforty and the Real-Time Web?

  • RickBullotta

    Interesting question: if Tweets could be 512 characters, how would the URL shortener traffic patterns change?
