July 17, 2009
Over the past fourteen years, Yahoo!’s properties have served a broad swath of the Internet. With several hundred million users across the globe and tens of thousands of page views every second, we’ve generated and served petabytes of content to our users worldwide.
I’m the Product Manager for MObStor, Yahoo!’s unstructured storage cloud, and I’m writing this to share our approach to unstructured storage.
MObStor grew out of the unstructured storage system that was first built to host Yahoo! Photos. It is an unstructured storage cloud that builds in the best practices we've learned and serves the fundamental unstructured storage needs of all of Yahoo!. This allows individual properties to focus on their core mission – building a great product, quickly. Meanwhile, we continue to innovate in the storage layer, lowering Yahoo!'s costs and bringing operational, performance, and technical improvements to all our customers.
When you look at the hot-button issues around unstructured storage for any internet property, you’re likely to see some variations on the following themes:
– Availability is paramount: Every second your data is unavailable means lost visits. Long or repeated outages can destroy an internet business – your customers will go somewhere else.
– Data needs to be stored reliably: More often than not, the content you serve is your most important asset, and losing it is not an option.
– Consistency, shmonsistency: When we really examine the level of consistency we need from our unstructured data store, it's usually quite low. Multiple concurrent updates of the same object are rare. We rarely need more than object-level atomicity – a guarantee that an update to a given object will either succeed completely, or fail completely, leaving no trace of the failed update. This is fortunate, because we need availability, and the CAP theorem tells us that a distributed system can't be both completely available and always consistent in the presence of network partitions. Of course, some pieces of data do need to be consistent, but more often than not they're better handled by Sherpa.
– Reads dominate writes: Most data is WORM (Write-Once, Read-Many).
– Browser-ready: The storage backend should enable the end-user application. It turns out that using REST is a great way to serve data to the browser.
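This post doesn't describe MObStor's internals, but the object-level atomicity mentioned above is commonly achieved with a stage-then-rename pattern on POSIX filesystems. The sketch below is illustrative only – the function name and on-disk layout are mine, not MObStor's: readers see either the old object or the new one in full, never a torn write.

```python
import os
import tempfile

def atomic_put(store_dir: str, key: str, data: bytes) -> None:
    """Write an object so readers see either the old or the new
    version in full -- never a partially written one."""
    final_path = os.path.join(store_dir, key)
    # Stage the new bytes in a temp file in the same directory,
    # so the final rename stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=store_dir)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())  # make it durable before it becomes visible
        os.replace(tmp_path, final_path)  # atomic rename on POSIX
    except BaseException:
        os.unlink(tmp_path)  # a failed update leaves no trace
        raise
```

If the process dies mid-write, the partially written temp file is simply garbage to be collected; the published object is untouched.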
At Yahoo!, we have a few other requirements:
– Scale quickly and efficiently: When we build a new property, it has to scale to serve our worldwide audience of several hundred million users, frequently reaching tens of thousands of requests per second across our global network of datacenters, with occasional load spikes an order of magnitude higher than steady-state usage. Over-provisioning every property to serve peak load would be enormously expensive – but the alternative, being under-provisioned to handle peak load, is far worse.
– Handle multiple types of storage: At Yahoo!, we have a very diverse set of storage needs, and our needs are large enough that it makes sense for us to take on the additional complexity of supporting multiple storage media (e.g., filers, commodity storage, SSD-based storage, etc.) to serve them.
– Support flexible replication: Data needs to be replicated for two main reasons – business continuity and performance. Different properties, and even different pieces of data in a given property, have different replication requirements. We need to be able to handle them all.
– Keep the data close to the application: While in some cases the location of data doesn't matter, in other cases it matters very much. This is particularly true in emerging markets where network bandwidth is unreliable, or at a premium.
– Not all data is created equal: No matter how carefully you plan, stuff happens. When it does, some things absolutely positively must not break.
– Data moves, URLs don’t: As part of ongoing operations, Yahoo! decommissions hardware, consolidates or moves datacenters, or decides to move storage hardware around for other reasons. In all of these cases, the physical hardware that is used to represent a piece of information moves – but once we’ve published a URL that points to an object, we’d like it to work forever.
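The "flexible replication" requirement above can be made concrete with a minimal policy model. Everything here – the property names, fields, and defaults – is hypothetical, not MObStor's actual configuration; it just illustrates per-property defaults with per-object overrides.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass(frozen=True)
class ReplicationPolicy:
    min_copies: int           # replicas that must exist before a write is acknowledged
    regions: Tuple[str, ...]  # datacenters that must eventually hold a copy

# Hypothetical per-property defaults; the property names are invented.
DEFAULTS = {
    "mail-attachments": ReplicationPolicy(min_copies=2, regions=("us-west", "us-east")),
    "thumbnail-cache":  ReplicationPolicy(min_copies=1, regions=("us-west",)),
}

def policy_for(property_name: str,
               override: Optional[ReplicationPolicy] = None) -> ReplicationPolicy:
    # A per-object override wins over the owning property's default,
    # so different pieces of data in one property can replicate differently.
    return override if override is not None else DEFAULTS[property_name]
```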
As mentioned above, Yahoo!’s storage needs are varied. A Yahoo! storage engine must handle three main kinds of data:
– User-generated and user-facing content: This includes photos you store in Flickr, your profile photo or avatar, etc. Access to this data tends to be spread across the data set, although in many cases there's a strong temporal factor – more recent data is accessed more than older data. Furthermore, the most accessed data tends to be small, although that's been changing over the past few years with increasing use of Internet video. This data can't be cached effectively, because end-user accesses are spread across the data set, while caches tend to be optimized for small amounts of data that are read repeatedly.
– CDN origin storage: A lot of Yahoo!'s static content is served through CDNs in order to enable fast delivery across the world. As a result, a second storage need at Yahoo! is to serve as an origin server for the edge cache. Since the edge cache handles repeated reads of data, the requests that reach Yahoo!'s storage backend tend to be occasional, randomly distributed across the data set, and read cold (i.e., read operations go all the way down to the spindle).
– Archival storage: Yahoo! has petabytes of cold storage, some of it user-generated, with the rest generated by our own systems. Access to this data is very infrequent, but it's still important that it is stored reliably.
MObStor is a storage service that was purpose-built to address these needs. It's offered as a cloud service (my team not only continues to innovate on the technology front, we also operate and manage the hardware and the service). It features:
– A very simple REST API that makes data browser-accessible and speeds integration time.
– A simple security model designed to serve the needs of Yahoo! properties (rather than the needs of a generalized filesystem)
– Content-agnostic storage features: We don’t do fancy things like image transcoding/resizing, video transcoding, virus checking, etc. Instead, we’ve taken a lego-block approach – focused on building fast, reliable and secure storage, and allowing our customers to layer additional services on top of the core.
– High performance: GET latencies, even when the read goes all the way to the spindles, are very low. Our web servers are optimized to push bits as fast as a client can download them.
– High availability: The software is built to provide high availability, but we don’t stop there. All critical data is stored on reliable storage, and in most cases we maintain at least one additional replica in a different data center elsewhere in the world.
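MObStor's actual REST API isn't documented in this post, so as a stand-in, here's a toy in-memory object server speaking plain HTTP PUT/GET – the kind of browser-ready interface described above. The URL layout and status codes are illustrative only, and everything here uses Python's standard library.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

OBJECTS = {}  # path -> bytes; a stand-in for real storage

class ObjectHandler(BaseHTTPRequestHandler):
    def do_PUT(self):
        # Store the request body under the request path.
        length = int(self.headers.get("Content-Length", 0))
        OBJECTS[self.path] = self.rfile.read(length)
        self.send_response(201)  # Created
        self.end_headers()

    def do_GET(self):
        body = OBJECTS.get(self.path)
        if body is None:
            self.send_response(404)
            self.end_headers()
            return
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo quiet

server = HTTPServer(("127.0.0.1", 0), ObjectHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

# Any HTTP client -- including a browser, for GETs -- can talk to it.
conn = http.client.HTTPConnection("127.0.0.1", server.server_port)
conn.request("PUT", "/myproperty/photos/cat.jpg", body=b"hello, world")
conn.getresponse().read()  # drain the 201 response

conn.request("GET", "/myproperty/photos/cat.jpg")
data = conn.getresponse().read()
server.shutdown()
```

Because the interface is just HTTP verbs on URLs, "integration" for a read path can be as simple as dropping the object's URL into an `<img>` tag.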
There’s a lot of buzz about some of the larger private and public cloud storage services – Facebook’s Haystack, Amazon S3, Mosso Cloudfiles, etc. Since not much is known about the architecture of the commercial offerings, I’ll quickly compare our architecture and the tradeoffs we’ve made to the one about which we know the most – Facebook’s Haystack.
Haystack seems closest to the bottom of the three layers in MObStor's architecture (albeit with an HTTP API). This is because Haystack is built to solve one problem (storing Facebook photos) really well, while MObStor supports a diverse set of use cases.
Facebook’s Haystack is based on commodity storage. While MObStor does support commodity storage, it doesn’t require it. Instead, we have a storage-layer abstraction we call the ObjectStore. The ObjectStore encapsulates the key storage operations we need to perform, and allows us to have many underlying physical object stores. This allows us to mix, for example, filer-based storage with commodity storage. The upper layers have the routing intelligence that determines which ObjectStore a given piece of data is stored in. However, like Haystack, we do support high request rates using our own optimized ObjectStore written to run on commodity hardware – with one important difference. While Haystack identifies every object using a 64-bit photo key, all objects in MObStor are accessible through logical (i.e., client-supplied) URLs, not object IDs.
In MObStor, the storage layer maintains the mapping between logical URLs and physical storage, and can use any means to do so – the implementation is encapsulated within the storage layer. Needless to say, this operation is a potential performance bottleneck, so we’ve carefully optimized the algorithms used and the hardware that they run on.
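To make this indirection concrete, here's a minimal sketch of a router in front of multiple physical ObjectStores. All class, store, and URL names are invented for illustration; the point is only why bytes can move between backends while published URLs keep working.

```python
class ObjectStore:
    """One physical backend (a filer, a commodity node, ...)."""
    def __init__(self, name):
        self.name = name
        self._blobs = {}  # physical id -> bytes

    def put(self, physical_id, data):
        self._blobs[physical_id] = data

    def get(self, physical_id):
        return self._blobs[physical_id]

class Router:
    """Maps client-supplied logical URLs to (store, physical id) pairs."""
    def __init__(self, stores):
        self.stores = stores   # name -> ObjectStore
        self.mapping = {}      # logical URL -> (store name, physical id)

    def put(self, logical_url, data, store_name):
        physical_id = "blob-%d" % len(self.mapping)  # opaque internal ID
        self.stores[store_name].put(physical_id, data)
        self.mapping[logical_url] = (store_name, physical_id)

    def get(self, logical_url):
        store_name, physical_id = self.mapping[logical_url]
        return self.stores[store_name].get(physical_id)

    def migrate(self, logical_url, new_store_name):
        """Move the bytes; the published URL keeps working."""
        store_name, physical_id = self.mapping[logical_url]
        data = self.stores[store_name].get(physical_id)
        self.stores[new_store_name].put(physical_id, data)
        self.mapping[logical_url] = (new_store_name, physical_id)
```

Every read pays one mapping lookup before touching storage, which is why that lookup path is the natural place to spend optimization effort.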
The top two layers don’t have direct analogs in Haystack:
The Local Object Management Layer implements features like replication (for content that needs to be replicated), metadata management, and access control – problems that Haystack doesn't have to deal with. This layer also collaborates with the edge layer – the Global Object Management Layer – to implement some user features.
The Global Object Management Layer implements our REST API, handles authentication, manages request routing, and implements several user features. It consists of several highly optimized HTTP servers, based on Yahoo! Traffic Server – the same technology we use in Yahoo!'s internal CDN. These servers sit at the network edge, and can be deployed to our global data centers independent of whether or not we store the underlying data locally. This architecture enables a global Tier 2 object cache within MObStor.
You're already seeing some of the benefits of MObStor when you use several of our products. Yahoo! Traffic Server, a core part of our infrastructure, recently became an ASF Incubator project. As we continue to innovate in the Cloud at Yahoo!, you'll continue to see additional benefits from our work.
Sr Product Mgr, MObStor
Posted at July 17, 2009 4:31 PM