Object Storage (like AWS S3) is a key enabler of scalability and reliability in Cloud Computing. We will discuss how Apache CloudStack integrates Object Storage solutions, and specifically how HDFS can provide the storage engine for the Object Storage component.
Object Storage is a new class of storage characterized by extremely large scale, HTTP-based access and immutable objects. One of the first such implementations is the publicly available S3 service from Amazon Web Services. Object Storage improves the reliability and scale of IaaS clouds by providing highly reliable, cheap and scalable storage for backups and machine images. It is considered almost essential for a scalable, reliable IaaS deployment.
Apache CloudStack (incubating) provides the IaaS management layer that orchestrates compute, primary storage and networking. By default, CloudStack uses NFS as the protocol for storing snapshots and machine images. For sufficiently large clouds, however, NFS cannot scale cheaply. We will discuss the storage requirements of such clouds and the limitations of NFS in meeting them.
As a solution, CloudStack also implements the AWS S3 API, with the flexibility to place different backing stores behind it. By default, the S3 API reads from and writes to a POSIX filesystem, but several enterprise storage vendors are integrating their own storage behind this API layer. We will examine some of these integrations.
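The idea of one S3 API layer in front of interchangeable backing stores can be illustrated with a minimal sketch. This is not CloudStack's actual code: the names `BackingStore`, `PosixBackingStore`, `put_object` and `get_object` are hypothetical, chosen only to show how an API layer might delegate to a store that keeps each object as a file on a POSIX filesystem.

```python
import os
import tempfile
from abc import ABC, abstractmethod


class BackingStore(ABC):
    """Hypothetical interface that an S3-style API layer could delegate to."""

    @abstractmethod
    def put_object(self, bucket: str, key: str, data: bytes) -> None:
        ...

    @abstractmethod
    def get_object(self, bucket: str, key: str) -> bytes:
        ...


class PosixBackingStore(BackingStore):
    """Illustrative store: each object is one file at <root>/<bucket>/<key>."""

    def __init__(self, root: str) -> None:
        self.root = os.path.normpath(root)

    def _path(self, bucket: str, key: str) -> str:
        path = os.path.normpath(os.path.join(self.root, bucket, key))
        # Reject bucket/key combinations that would escape the root directory.
        if not path.startswith(self.root + os.sep):
            raise ValueError("invalid bucket or key")
        return path

    def put_object(self, bucket: str, key: str, data: bytes) -> None:
        path = self._path(bucket, key)
        os.makedirs(os.path.dirname(path), exist_ok=True)
        # Objects are written whole and then renamed into place; they are
        # replaced as a unit, never modified in place.
        fd, tmp = tempfile.mkstemp(dir=os.path.dirname(path))
        with os.fdopen(fd, "wb") as f:
            f.write(data)
        os.replace(tmp, path)

    def get_object(self, bucket: str, key: str) -> bytes:
        with open(self._path(bucket, key), "rb") as f:
            return f.read()
```

Under this assumed design, a vendor integration or an HDFS-backed store would implement the same two methods, and the API layer would not need to change.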
Apache Hadoop has a storage engine (HDFS) that provides some of the same features as object storage, namely scale, reliability (with 3 copies of the data) and immutability. Apache CloudStack intends to leverage HDFS as the backing store for the CloudStack S3 API layer. The talk will focus on the strengths and weaknesses of this approach and discuss some possible enhancements to HDFS that would make the solution truly competitive.