Nick Halstead
Nick Halstead, CEO, May 26th

I am getting sick of talk about twitter and it’s scalability problems and also frankly unqualified people slagging the service for it’s unreliability and also coming up with stupid ignorant answers to how it should be fixed.

As part of our future we do see a many-to-many problem arising so I put some thought to how we would tackle it (like any of our other problems) by doing some ‘planning’.

In a recent post Robert Scoble tries to explaining how twitter works by saying that twitter is using some form of ‘pivot table‘ – (my terminology for what he explains) and says that a model that others have put forward (i.e. a de-normalized system of inserting messages into everyones queues) was akin to microsoft exchange, now these two examples are so horribly not connected – and I won’t rant about how BAD exchange is efficiency wise, but please Robert do not get into any technical arguments please.

How it should be done

So what did I come up with for solving the many-to-many problem?

Firstly let me explain my terminology, I use the term ’shard’ because a shard could be a physical server or a daemon running alongside other daemon’s running on a server or even a grid platform.

The basic idea is to build a system in which no one particular element is unscalable. So the design takes into account that however big things get just larger numbers of shards would be required.

First a diagram (which I hope makes some sense of the complexity described below)
tweetshards.jpg

Queue Shard

The queue shard stores the message queue for a set number of users the shard would be found by a hashing mechanism that would map the user id to the particular user. The queue system would use a purely in-memory storage that would allow the for the really fast insertion speeds that would required.

The other really important point to make is that this server is only dealing with dates and the id of the tweet that it references (tweet id’s covered below)

Friend Shard

The friend shard stores the friends list for a number of users, this is likely to be a larger number of users per shard than the queue shard purely because its job is a lot easier. So lets say each shard can work with 100,000 users. When this shard gets told about a new message for a user it looks up its list of friends.

Then using the hashing algorithm that maps each user id to a specific queue shard it builds up a list of friends that live on each of those queue shards.

So in the extreme example of Robert Scoble of 25k friends we hope to split up the task so that no queue shard would have to deal with more than 1000 inserts.

Joiner Shard

Because we are only storing the id’s of each tweet in the queue when a query comes in for a stream we fetch the list of the 20 (or however many) tweets from the queue shard for that particular user then we need to go get each actual text tweet from a tweet shard. The joiner shard would very likely use a memcache (across a lot of servers) so that the tweet storage would get hit as little as possible.

The second thing to note is that as described the queue shards are in-memory only – they are storing data very inefficiently (e.g. totally de-normalized) but it would be a serious overhead for them to store to disk. The joiner shard would need to be able to cope with only getting partial results back from the queue shard – and then have to look it up from the ‘normalized’ version of the database (described below)

Tweet Shards

When new tweets come in they are stored in a tweet shard, we use a hash algorithm to spread them across a set number of servers. The algorithm should take into account that over time tweets get moved onto archive servers which are designed more for mass storage rather than quick retrieval.

Normalized Queue Storage

Because the in-memory queue shards are very fast but very inefficient it would be not sensible to store more than a weeks worth of data in them (and maybe even less) – as the many-to-friendlist-to-queue system even when stored just using id’s is still a large storage overhead compared with storing the data in a standard pivot/normalized style. So for every request that comes in it is also passed into this shard to be archived and also for when you do get the rare request for older data.

Web Shard

This is the easy bit, the other shards described so far have all done the complex database work, the web shard is responsible for dealing with the actual requests be they API / HTTP / whatever, and then pass on the request into the relevant other shards.

I will describe below the order in which the shards are called for particular actions (the two that are most common)

Sending a Tweet
The following is a rough step list of how a new tweet gets insert into the database.

  1. Web Shard – Receives the request – First it sends a request off to the Tweet shard to store the message itself which then returns an ID that will then be used for all the other requests. Then it looks up which friend shard stores the particular user that the request is coming from.
    1. Tweet Shard – Receives message to store the tweet and returns the new ID
  2. Friend Shard – Receives the message – then sends out signals to all the relevant queue shards for all the friends of that user.
  3. Queue Shard – Gets a list of friends (that it deals with) and inserts the id of the message into each of their queues.
  4. Normalized Queue Shard – We also store the data in this archive database for long term storage (and slow retrieval)

Building the Stream
This is a few short steps of how we build up a stream for a particular user.

  1. Web Shard – Receives the request – sends a request to a joiner shard asking for the stream for that user
  2. Joiner Shard – Looks up which queue shard has the data it is looking for – gets the list of id’s – then looks up those messages within the tweet shards – it then puts them together and returns them back to the web shard
  3. Tweet Shard – Returns the actual text tweets

The above could be more complex in circumstances when the data is not available within the queue shards – or within the tweet shards – at which point they both seamlessly go look up the data in the archive instead.

Conclusion

This looks complex – (well it is..) but in general terms of complexity of building large scale web applications this is pretty simple stuff (fav.or.it has far more complex problems!). I understand that the twitter people now have the extremely hard task of slowly improving each module rather than a complete redesign. Whatever the future I wish them well. btw – if you want you can follow me (and fav.or.it’s exploits) via my twitter feed

(just as an aside – the fav.or.it blog will be going through a major overhaul soon and more of our wonderful employees we start posting (they already have their own individual persona blogs) so we have had to cater for this in our new design, but fear not we will also be splitting up the feed so if you only want product updates you can still have that, if you want development/tips/programmer/design stuff you will also have the option to subscribe to an overall feed or to individual topics.)

Posted in Programming and was tagged , , .

10 Comments to “Fixing Twitter”

  • Future Technical Post : Nick Halstead, What is this Tech?
    Future Technical Post : Nick Halstead, What is this Tech?

    [...] As a starting point I wrote a post about how I think twitter should be scaled. [...]

    Posted on May 26th, 2008 at 10:19 pm
  • JC
    JC

    Thanks for posting this. I picked up your comment over on RS’s blog, and thought you were the only one with a proper example of the p[roblem/solution, jumped over here and have sent the link to your blog to print out and read when i am on the treadmill!!!
    Thanks, think you have made a higher level of explanation. RS is still on the “is Twitter storing messages things aka Xchange”
    TA

    Posted on May 26th, 2008 at 11:46 pm
  • STTPLN Online Community » Blog Archive » Future Technical Post – nick halstead
    STTPLN Online Community » Blog Archive » Future Technical Post - nick halstead

    [...] As a starting point I wrote a post about how I think twitter should be scaled. [...]

    Posted on May 27th, 2008 at 3:48 am
  • wow
    wow

    I love how you don’t even describe the problem

    You cannot propose a solution unless you’ve shown the user exactly where you think the problem is.

    Is it on writing or reading?

    How the database is structured, do we minimize lookup times, or write times?

    You have failed. (but reading into it, you grasp some basics about how to keep a fairly lively and lucid cache of information, but I’d need to really read it, I just skimmed)

    Posted on May 27th, 2008 at 5:23 am
  • John Wards
    John Wards

    @wow..

    If you want to know the problem ask twitter..or read the post on the twitter blog:
    http://dev.twitter.com/2008/05/twittering-about-architecture.html

    The main “problem” is that twitter can’t cope with heavy load. This is because its a content management system evolving into a messaging system.

    I think Nicks solution is very nice, I’d love to have a crack at building something like that!

    Having built a CMS that is under constant heavy load and then dealing with trying to optimize it while users are screaming in your ears I have every sympathy with twitter…I don’t even use twitter or understand why people use it…but the technical stuff that is flying around about it at the moment is hugely interesting.

    Posted on May 27th, 2008 at 7:24 am
  • Thomas Hansen
    Thomas Hansen

    “and also frankly unqualified people slagging the service for it’s unreliability and also coming up with stupid ignorant answers to how it should be fixed”…
    And therefor you felt complied to come with your own solution as to how it should be fixed ;)

    Sorry :)
    Couldn’t resist, it might be right for all I know, though your opening sentence kind of took some of the “hot air” out of me while reading … ;)

    Posted on May 27th, 2008 at 12:44 pm
  • Cyndy Aleo-Carreira
    Cyndy Aleo-Carreira

    Is this the stuff they are teaching in CS schools now? A database that can’t handle load should then get hit even harder to make it work? Every time I hear this theory (which seems to be getting more traction) that Twitter should be storing keys and then pulling from the database based on a key I hear DBAs snorting with laughter. Is the belief that the problem is that Twitter uses only one table in this most holy of holy databases, and that adding more tables with these miraculous keys will somehow help with the load? And I’m sure some Googlesque algorithms should come into play just so it looks cool?

    As for the whinging about how HARD it is to fix it while people are USING it, in a non-Web 2.0 world, that’s why one has a development server and a production server. You code the NEW system while people are using the old system and then port people over. Of course, at that point, you have to have admitted that what you wrote in the beginning doesn’t work, will never work, and needs to be replaced, so several steps of checking egos at the door need to be completed.

    While all this nattering on is occurring, someone is out there coding the hell out of a scalable replacement and laughing at all the commentary.

    Posted on May 27th, 2008 at 6:29 pm
  • Jimbo
    Jimbo

    D’oh! Of course I meant grammar. In my haste to retype my message after wordpress lost it because I didn’t enter an email address, my finger must have slipped.

    Posted on May 27th, 2008 at 9:52 pm
  • TechCrunch UK » Blog Archive » Startup Roundup
    TechCrunch UK » Blog Archive » Startup Roundup

    [...] Fav.or.it has a few ideas about how to fix Twitter and is off to a few [...]

    Posted on May 29th, 2008 at 8:57 am
  • How to Get Six Pack Fast
    How to Get Six Pack Fast

    My friend on Orkut shared this link with me and I’m not dissapointed that I came here.

    Posted on April 15th, 2009 at 6:42 pm

Leave a Reply