High performance at massive scale – Lessons learned at Facebook


Performance and scalability are high on the list of top requirements from the customers we deal with. The largest financial institutions use our products at a global scale with access through the web. They expect low latency and, of course, a solution that can cope with future volumes of usage.
There are today two Internet giants who face scalability challenges every single day: Google and Facebook. I had a chance to touch on some of these challenges while talking to Google engineers last week during GTAC 2009. Google is approaching the crazy number of one million servers, so you can bet they have to be creative to handle their data. And deal with the carbon tax.

I’ve just finished watching a very informative presentation from Jeff Rothschild, VP of Technology at Facebook. He gave this presentation at the University of California this month. The actual presentation (75 minutes) can be watched here.

Here are some of the key things I was interested in.

Photo storage

This is of course one of the most storage-consuming applications on Facebook. About 850 million photos are uploaded to the site each month, in various resolutions! So yes, storage is a challenge. Facebook's former system relied heavily on CDNs from Akamai and Limelight, as well as a file handle cache placed in front of NetApp filers. A memcache storage layer was introduced to reduce the load on the NetApp filers.
[Diagram: Facebook's original photo-serving architecture]

This system was still generating way too many I/Os (three per photo read, which is what you get when you rely heavily on a file system) and couldn’t handle the ever-increasing volume of data. Facebook then decided to develop their own solution, called Haystack, which can be seen as a blob store.

[Diagram: Facebook's new Haystack-based photo-serving architecture]

Haystack stores photo data inside 10 GB buckets, with 1 MB of metadata for every GB stored. Metadata is guaranteed to be memory-resident, leading to only one disk seek for each photo. Haystack servers are built from commodity servers and disks assembled by Facebook to reduce the costs associated with proprietary systems. The Haystack index stores metadata about the one needle it needs to find within the haystack. Incoming requests for a given photo asset are interpreted as before, but now contain a direct reference to the storage offset containing the appropriate data. Cachr remains the first line of defense for Haystack lookups, quickly processing requests and loading images from memcached where appropriate. Haystack provides a fast and reliable file backing for these specialized requests.
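To make the single-seek property concrete, here is a minimal sketch (my own illustration, not Facebook's actual implementation; the file layout and names are assumptions) of how an in-memory index over a large append-only file lets a photo be served with a single disk read:

```python
import os

class HaystackSketch:
    """Toy append-only blob store: all photos live in one large file and a
    small in-memory index maps photo_id -> (offset, size)."""

    def __init__(self, path):
        self.path = path
        self.index = {}                      # photo_id -> (offset, size), kept in RAM
        open(path, "ab").close()             # make sure the store file exists

    def put(self, photo_id, data):
        with open(self.path, "ab") as f:
            f.seek(0, os.SEEK_END)           # append at the current end of the file
            offset = f.tell()
            f.write(data)
        self.index[photo_id] = (offset, len(data))

    def get(self, photo_id):
        offset, size = self.index[photo_id]  # pure memory lookup, no disk I/O
        with open(self.path, "rb") as f:
            f.seek(offset)                   # exactly one seek + one read per photo
            return f.read(size)

store = HaystackSketch("haystack_00.dat")
store.put("photo_42", b"fake jpeg bytes")
assert store.get("photo_42") == b"fake jpeg bytes"
```

The point is that the expensive part of a traditional filesystem lookup (directory and inode metadata reads) is replaced by a hash-table lookup in RAM, leaving only one seek for the photo data itself.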

If you’re interested in Haystack, here is a good starting point from Facebook.

Access to data and caching

Facebook relies heavily on memcached to minimize their database load. They of course use in-memory hash tables running on 64-bit machines, efficient serialization, multi-threading, compression and polling drivers. What’s interesting with this approach is that the switches connected to all the memcached servers become the bottleneck! To alleviate this problem, they perform client-side throttling.
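As an illustration of the caching pattern described above, here is a minimal cache-aside sketch. The cache and database are stand-ins (the dictionary, fetch_from_db and MAX_IN_FLIGHT are hypothetical, not Facebook's code), and the semaphore is only meant to show the idea of client-side throttling, i.e. bounding how many cache/network operations a client has in flight at once:

```python
import threading

# Hypothetical stand-ins for a real memcached client and a MySQL query.
cache = {}                              # pretend memcached: key -> value
cache_lock = threading.Lock()

def fetch_from_db(user_id):
    # placeholder for a (slow) database query
    return {"id": user_id, "name": f"user-{user_id}"}

# Client-side throttling: never allow more than N concurrent cache/network
# operations from this process, so switches and servers are not overwhelmed.
MAX_IN_FLIGHT = 8
throttle = threading.BoundedSemaphore(MAX_IN_FLIGHT)

def get_user(user_id):
    key = f"user:{user_id}"
    with throttle:                      # throttle before touching the network
        with cache_lock:
            value = cache.get(key)      # 1) try the cache first
    if value is not None:
        return value
    value = fetch_from_db(user_id)      # 2) cache miss: hit the database
    with throttle:
        with cache_lock:
            cache[key] = value          # 3) populate the cache for next time
    return value

print(get_user(1234))
```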

Database

Facebook uses MySQL servers distributed across multiple data centers (in case anyone doubts the scalability of MySQL, this is a clear demonstration). They avoid shared architecture like the plague so they can manage failures more efficiently (basically avoiding storing non-static and heavily referenced data in a central database). They use services and memcached for global queries. In order to scale across multiple data centers, they also replicate their memcached data to deal with race conditions (this is, I believe, the most challenging aspect of social networks).
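A minimal sketch of the "no central database" idea: each user's data is routed to one of many independent MySQL shards by a stable hash of the user id. The shard count and names below are made up for illustration; Facebook's actual partitioning scheme isn't detailed in the talk.

```python
import hashlib

# Hypothetical shard map: in reality these would be MySQL connection strings
# spread across data centers; the count and names are illustrative only.
SHARDS = [f"mysql-shard-{i:02d}" for i in range(16)]

def shard_for(user_id: int) -> str:
    """Pick a shard deterministically from the user id."""
    digest = hashlib.md5(str(user_id).encode()).hexdigest()
    return SHARDS[int(digest, 16) % len(SHARDS)]

# Every query for a given user goes to the same shard, so no single
# central database has to be touched by every request.
for uid in (7, 42, 1001):
    print(uid, "->", shard_for(uid))
```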

Hardware

It is difficult to get a clear number, but it looks like they have around 30k machines (800 memcached servers!). The typical hardware specification for a Haystack server is 2x quad-core CPUs, 32 GB of RAM, 512 MB of NVRAM cache and 12 1 TB SATA drives in RAID 6. That’s 10 TB in XFS. They don’t rely on external providers to manage their environment, which is fairly typical of large internet companies.
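A rough back-of-envelope check tying these numbers together (my arithmetic, not from the talk): 12 x 1 TB drives in RAID 6 leave about 10 TB usable, and at roughly 1 MB of metadata per GB stored that works out to on the order of 10 GB of Haystack metadata per server, which fits comfortably in 32 GB of RAM. That is what makes the memory-resident index, and therefore the single disk seek per photo, possible.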

Open-Source

Facebook makes good use of open-source software: Tornado, Scribe (for log aggregation, nice!), Cassandra, Thrift, xhprof, Hive, MySQL, memcached, etc., and might release their Haystack architecture as open source as well.

More on the open-source stack Facebook is using here.

The presentation is definitely worth the 75 minutes. If you’re passionate about web scalability, this is a good start! I would also like to recommend a good set of slides I read a while back, with a lot of good practices to take into consideration if you’re in that space:
Real World Web: Performance & Scalability
