Haystack powers allyoucanupload and will soon power all of Webshots (Webshots is a photo sharing community with 19,000,000 members who have uploaded over 375,000,000 photos). Allyoucanupload is an image hosting service that we built to run alongside Webshots.
Haystack is designed to provide a very scalable, reliable and cost-effective platform for object storage and delivery to the Internet. It went live two weeks ago - we are currently using it for the allyoucanupload Webshots image hosting service (gif, jpeg and png). In the very near future, it will serve all Webshots photos, and soon, video.
I'm doing a long post because I'm very proud of what our technical team has accomplished, but also to give some insight into how finance, user and technical strategy intersect in our group. Building a large and sustainable (aka profitable) business in social media requires a balancing act between delivering on a great user promise, building a revenue model and keeping your costs under control. Haystack will allow us to deliver very robust storage solutions for users at a very low marginal cost.
Disclaimer/Credit: Almost all of the technical content in this blog post was written by Paul O, who runs CNET Networks' data center services, including database architecture, network systems and operations and the actual data center. Paul, Jim, Rodolphe, Marcus, and Matthew built Haystack - they don't (yet) have blogs so I'm doing this post. Please do not attribute any technical props to me because of this post. My only contribution to Haystack was to approve its development and cheerlead along the way.
Haystack's content (social media files) has several interesting characteristics: it grows without bound; it tends toward write-once, read-many; the most recent content tends to be the most frequently accessed. Haystack's design leverages these characteristics.
The challenge: A big financial and therefore technical issue is the relationship of storage to delivery. It's relatively easy to deliver a small number of files to a lot of people. And a large number of files to a small number of people. A large number of files to a large number of people gets more complicated and can get very expensive very quickly.
Haystack gives us the ability to finely match the raw storage capacity of the system with its overall IO capacity. Haystack grows very naturally through incremental addition of capacity. Haystack is designed to handle failures automatically and to keep reliability constant as the system grows. Haystack uses commodity hardware and software.
The promise of Haystack is that we can handle reliability at scale at a very low (perhaps the lowest) marginal operating cost. Reliability means that we never have to say we're sorry - we lost your photos. Scale is scale. Low marginal cost goes directly to our ability to give users as much storage as we can and run the least intrusive ads, while running Webshots with a sustainable profit margin - and keeping the data center team focused on talent vs hands, and well paid :) You'll see the effects of Haystack on Webshots in our upcoming redesign and soon-to-be-revised storage limits.
BSU: Haystack consists of many Basic Storage Units (BSUs), which are just servers with a lot of disks. Content is scattered more or less randomly across all BSUs and spindles to maximize the IO throughput of the system. Multiple copies of the content are maintained on disparate equipment so that no single failure can lose all copies of an object.
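To make the scattering idea concrete, here is a minimal sketch in Python. It is not Haystack's actual placement code: the function name, the BSU names and the choice of two copies are all assumptions made for illustration. The only property it demonstrates is that every copy of an object lands on a different BSU, chosen at random.

```python
import random

def place_copies(bsus, copies, rng=random):
    """Pick `copies` distinct BSUs at random for one object.

    Hypothetical helper: the post does not describe the real placement logic.
    """
    if copies > len(bsus):
        raise ValueError("not enough BSUs for the requested number of copies")
    # random.sample never returns the same element twice, so no two copies
    # of an object end up on the same BSU.
    return rng.sample(bsus, copies)

# Illustrative cluster of eight BSUs (names are made up).
bsus = [f"bsu-{i:02d}" for i in range(8)]
rng = random.Random(42)  # seeded only so the example is repeatable

# Two copies per object is an assumption; the post only says "multiple copies".
placement = {obj: place_copies(bsus, 2, rng) for obj in ("a.jpg", "b.png", "c.gif")}
```

Because the BSUs are picked independently for each object, the copies of any one BSU's content end up spread over the rest of the cluster rather than mirrored on a single partner machine.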
Failure: In the event of a component failure, Haystack immediately begins a process to copy the "missing" content from one of the redundant sources to the available components. Because the content is scattered across all available components, recovery time is on the order of 1/N, where N is the number of components. Of course, the failure rate is on the order of N, so the overall availability is on the order of N * 1/N or a constant. Recovery also has very little impact on the overall performance of the system.
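The N * 1/N argument can be checked with a little arithmetic. The sketch below is a back-of-the-envelope model in Python, not a measurement: it assumes every surviving node re-copies an equal share of the lost data in parallel, and the data size, copy rate and failure rate are invented numbers.

```python
def recovery_hours(nodes, data_per_node_gb, copy_rate_gb_per_hour):
    """Time to re-copy one failed node's data when all survivors share the work."""
    survivors = nodes - 1
    return data_per_node_gb / (survivors * copy_rate_gb_per_hour)

def failures_per_year(nodes, per_node_failures_per_year):
    """More nodes means proportionally more failures."""
    return nodes * per_node_failures_per_year

def degraded_hours_per_year(nodes, data_per_node_gb, copy_rate_gb_per_hour,
                            per_node_failures_per_year):
    """Total hours per year spent recovering: failure rate times recovery time."""
    return (failures_per_year(nodes, per_node_failures_per_year)
            * recovery_hours(nodes, data_per_node_gb, copy_rate_gb_per_hour))

# Assumed figures: 4000 GB per node, 100 GB/hour per node, 0.5 failures/node/year.
for n in (10, 100, 1000):
    print(n, round(degraded_hours_per_year(n, 4000, 100, 0.5), 2))
```

Under these assumptions the time spent in recovery each year is 20 * N / (N - 1) hours, which settles toward a constant 20 hours as N grows, even though the number of failures grows a hundredfold between the first and last row.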
As new capacity is added, existing content is migrated to the new capacity to rebalance the storage and IO across all available units. The rebalancing time is approximately constant.
Separation of Church and State
To minimize the overhead in tracking the location of any given object, Haystack puts objects into buckets and needs to track only the location of the buckets. The applications using Haystack must independently track metadata about each object, including its bucket. At some level, a bucket is really just a directory and each BSU knows what buckets it contains. Haystack maintains a proper cache of the bucket locations and each BSU checks in periodically to report the state of its buckets. The proper cache can easily be rebuilt and the overall system is very tolerant of data inconsistencies between clients, the cache and the BSUs.
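A rough sketch of that split, in Python, might look like the following. The class, method and bucket names are hypothetical; the post does not describe Haystack's interfaces. The point is only that the cache holds nothing but bucket-to-BSU mappings, is filled entirely from BSU check-ins, and so can be thrown away and rebuilt.

```python
from collections import defaultdict

class BucketLocationCache:
    """Hypothetical cache: bucket id -> set of BSUs that reported holding it."""

    def __init__(self):
        self._locations = defaultdict(set)

    def check_in(self, bsu, buckets):
        """Replace everything known about one BSU with its latest report."""
        for holders in self._locations.values():
            holders.discard(bsu)
        for bucket in buckets:
            self._locations[bucket].add(bsu)

    def locate(self, bucket):
        """Return the BSUs currently believed to hold the bucket."""
        return sorted(self._locations.get(bucket, ()))

cache = BucketLocationCache()
cache.check_in("bsu-01", ["b17", "b42"])
cache.check_in("bsu-02", ["b42"])

# The application, not Haystack, remembers which bucket each object is in.
photo_index = {"photo-123.jpg": "b42"}
holders = cache.locate(photo_index["photo-123.jpg"])
```

If the cache is lost, an empty one converges back to the same state after each BSU has checked in once, which is one way to read the claim that it "can easily be rebuilt".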
Processes
Various processes monitor the overall condition of Haystack and initiate actions as needed to maintain its health. For example, when new capacity is added, these monitoring processes detect the availability of "under-utilized" capacity and begin a bucket migration process to bring the new capacity up to the same levels as the old capacity.
Some of the jobs have names...they are:
leon's job is to identify and remove extra instances
jeopardy makes more copies of data that has too few copies
scalpel removes outdated copies of data
optimist moves data to fill up less-full nodes
pessimist moves data from busy nodes to less busy nodes
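As a toy illustration of how two of these jobs divide the work, the Python sketch below compares each bucket's copy count against a target and names the job that would pick it up. The target of two copies, the function name and the data layout are assumptions; the post gives only the one-line job descriptions above.

```python
TARGET_COPIES = 2  # assumed; the post does not say how many copies are kept

def plan_jobs(bucket_copies, target=TARGET_COPIES):
    """Map each out-of-balance bucket to the job that would handle it."""
    plan = {}
    for bucket, holders in bucket_copies.items():
        if len(holders) < target:
            plan[bucket] = "jeopardy"   # too few copies: make more
        elif len(holders) > target:
            plan[bucket] = "leon"       # extra instances: remove some
    return plan

state = {
    "b1": ["bsu-01"],
    "b2": ["bsu-01", "bsu-02"],
    "b3": ["bsu-01", "bsu-02", "bsu-03"],
}
plan = plan_jobs(state)
```

Buckets that already have the target number of copies are left out of the plan. Scalpel, optimist and pessimist would need extra inputs (copy age, node fullness, node load) that this sketch does not model.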
Changing the drives over time
Currently we use 400 GB SATA drives. As time goes by, the average age of the content will grow and the average number of accesses per object will decrease. This will allow us to introduce larger capacity disk drives as the system grows, helping to keep the overall hosting and depreciation costs low. The number of SATA drives attached to a given BSU is determined by the overall network throughput of the BSU and the ability of the BSU to effectively use the file system cache.
Caching
Haystack uses various content caching and redirection mechanisms to both hide the complexity of the system from clients and to leverage the raw IO capacity of the spindles. The system is very loosely coupled and designed to be quite tolerant of failures. This means that failures are very localized. In addition, the types of failures that can have the largest negative impact are with very proven technologies, such as file systems and disk arrays. Thus the overall reliability is high and the operational costs are low.
Rant - lots of private social media properties are venture funded and lose tons of money storing/serving content to get big to either get bought or change the model.
Finale: The challenge at public companies like CNET Networks is to build a social media business model as well as a user & technical model from the "get go". That operating plan must be good for users and cheap enough to operate so we can deliver a healthy return on invested capital to our shareholders. Haystack is a significant innovation that should help us to build a better product for users, scale for marketing partners, and generate operating cash flow for shareholders.
Whew - if you're still reading then you are likely one of the Webshots or CNET Networks engineers - nice work folks :)
http://live.focus.msn.de/ just started in Germany (in Alpha mode) and it will be interesting to see which model (the VC model, the big spender model or the intelligent design model) will win in the long term.
Thank you for this update.
thomas
Posted by: wingthom | June 07, 2006 at 12:56 AM
Haystack seems like a great architecture. Will any of its underlying technology be shared with the Open Source community?
Posted by: Scott Johnson | June 07, 2006 at 07:14 PM
I imagine that parts of it will be shared. Not sure which parts yet, but our company has done this in the past with Solr.
Posted by: martin | June 07, 2006 at 08:04 PM
I'd be interested in how the haystack system compares cost-wise to Amazon's S3 service.
15c per G per month
and
20c per G transferred
have you considered opening up haystack to do something similar?
I'm still not sure how you are planning on actually making money on this..
Posted by: Ian Holsman | June 15, 2006 at 05:26 AM
we are working on a commercial api. we are working on the details and should have them nailed down shortly.
we currently make money like flickr, imageshack and photobucket - which is when a person viewing a served image clicks back to see the image hosted at allyoucanupload.com, we serve them ads.
Posted by: Martin | June 16, 2006 at 12:08 AM
Does the system work with geographically dispersed nodes or must they all be on a private network? This sounds like pretty fascinating stuff.
Posted by: Matt | July 08, 2006 at 10:25 AM