Object-based Software Defined Storage Platform
Fully Abstracted From Underlying Hardware
A Himalaya-based system is built using commercial-off-the-shelf hardware and is designed to provide high-throughput, high storage density, low power consumption, and low cost of ownership. It provides the highest levels of durability and availability, through an architecture that has no single points of failure.
System components include a unified, fully abstracted software stack for maximum flexibility and usability, and three component tiers that enable independent scaling of capacity, bandwidth, and performance resources. The components include Controllers, Storage Nodes, and Network Fabric:
Unified Storage Services Software Stack
Interfaces - NFS, CIFS, Native REST, S3 and iRODS
Himalaya™ integrates easily into existing environments through its unified access services layer. All major interfaces are supported, including NFS, CIFS, a native REST interface, and an Amazon S3 compatible interface; the latter two both include multi-part upload for maximum throughput of large objects. Himalaya’s native REST API delivers industry-leading performance for both single-object and multi-stream throughput, scaling from multiple Gigabytes per second in a single rack to 100s of Gigabytes per second and beyond for the largest deployments.
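To make the multi-part upload idea concrete, here is a minimal sketch of how a client might split a large object into parts that can be sent in parallel. It is not Himalaya's or Amazon's client code; the function name `plan_parts` and the part size used are illustrative assumptions.

```python
from typing import List, Tuple


def plan_parts(object_size: int, part_size: int) -> List[Tuple[int, int, int]]:
    """Return (part_number, start_offset, length) for each part.

    Part numbers start at 1; the last part may be shorter than part_size.
    Hypothetical helper for illustration only.
    """
    if object_size < 0 or part_size <= 0:
        raise ValueError("object_size must be >= 0 and part_size must be > 0")
    parts = []
    offset = 0
    number = 1
    while offset < object_size:
        length = min(part_size, object_size - offset)
        parts.append((number, offset, length))
        offset += length
        number += 1
    return parts


# Example: a 250-byte object split into 100-byte parts (sizes chosen for readability).
example_parts = plan_parts(250, 100)
```

Each tuple describes an independent byte range, so a client could upload the parts over separate connections and let the server assemble them once all parts have arrived.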
Popular supported desktop and server tools that provide cloud storage interfaces include:
- Cyberduck S3 (Windows, Mac, and Linux clients)
- Cloudberry Explorer and Drive for S3 (Windows only)
- Webdrive (Windows, Mac)
Storage management, virtualization, and data grid environments that Himalaya integrates with as a storage tier include:
- Quantum StorNext Storage Manager – AmpliStor is available through Quantum Corporation under their Lattus Object Store product. Lattus™ Object Storage meets the extreme scalability, durability and access requirements of large-scale, long-term Big Data archives.
- SGI StorHouse – StorHouse/Trusted Edge is a simple-to-install, user-friendly intelligent content analysis and policy-based migration management tool for automatically moving less active, forever-read data from primary storage to a more cost-effective StorHouse data management system. Once data resides on StorHouse, users benefit from an automatically managed, virtualized storage environment that provides direct, online access to data and automated backup, archive, disaster recovery, replication, retention, HSM, and other functionality at the lowest cost per terabyte of storage.
- Integrated Rule-Oriented Data-management System (iRODS) – Open Source Data Grid, Helping People Organize and Manage Large Collections of Distributed Digital Data. iRODS is in use at some of the largest data archives and distributed data grids in the world.
- Cambridge Computer STARFISH – Starfish is application software that enables the user to associate metadata with files and directories in any NAS or network file system. Starfish enables the storage administrator to leverage these metadata to define reports and enforce storage management rules across a diversity of storage devices.
- Commvault Simpana – CommVault® Simpana software is built from the ground up on a single platform and unifying code base for integrated data and information management. All functions share the same DNA and back-end technologies to deliver the unparalleled advantages and benefits of a truly holistic approach to protecting, managing and accessing data.
Global Namespace / System Manageability
Data is accessed via a locality aware, unlimited global namespace. Multiple namespaces can be created with different attributes and policies, essentially making tiers within the same system.
To further accelerate performance, intelligent local caching is possible at each location. The system is fully secured with end-to-end authentication and encryption.
Multi-tenancy is supported, so capacity, transactions, and bandwidth usage can be tracked per namespace and per user. The system is easily managed through a single pane of glass with a rich GUI console.
BitSpread - Data Protection / Scalability
Amplidata’s patented BitSpread® technology allows users to store and protect data according to a chosen durability policy, which allows for concurrent failures of up to 19 disk drives, storage nodes, racks, or even a complete datacenter. Because of this higher and more diverse level of failure tolerance, BitSpread delivers orders of magnitude greater data durability than RAID-based systems, a necessity for Petabyte scale and beyond.
User data files stored in a Himalaya™ namespace according to a given durability policy are referred to as objects. In the durability policy, two parameters determine the data protection level. The first policy parameter determines the “spread width”, which is the number of disk drives (drives) that will be used to store the encoded data object. The spread width can be up to 20 drives wide. The second policy parameter is the “drive safety”, which is the number of drives in a spread width that can be simultaneously unavailable without impacting BitSpread’s ability to reconstruct the original data.
For example, a 20/4 durability policy refers to a spread width of 20 and a drive safety of 4. The encoded data for each object stored according to this 20/4 policy is spread across 20 drives in the AmpliStor storage system. With a drive safety of 4, the data on any subset of 16 drives out of the 20 drives is sufficient to reconstruct the original object.
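The two policy parameters can be modeled in a few lines. The sketch below only captures the arithmetic described above; the class and method names are hypothetical and are not part of any Amplidata API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DurabilityPolicy:
    """Illustrative model of a spread-width / drive-safety policy."""
    spread_width: int   # number of drives holding encoded data for one object
    drive_safety: int   # drives that may be unavailable at the same time

    def __post_init__(self):
        if not 0 <= self.drive_safety < self.spread_width:
            raise ValueError("drive_safety must be >= 0 and < spread_width")

    @property
    def drives_needed(self) -> int:
        """Drives that must be readable to reconstruct the object."""
        return self.spread_width - self.drive_safety

    def can_reconstruct(self, unavailable_drives: int) -> bool:
        """True if the object survives this many unavailable drives."""
        return unavailable_drives <= self.drive_safety


policy = DurabilityPolicy(spread_width=20, drive_safety=4)
```

With the 20/4 policy, `policy.drives_needed` is 16, and `policy.can_reconstruct(5)` is false because five unavailable drives exceed the drive safety.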
Hierarchy Aware Spreading
Himalaya BitSpread is hierarchy aware, which means a Himalaya-based system is aware of the location of a drive in a storage node, a storage node in a rack, and a rack in a datacenter location. To minimize the impact of a hardware failure, Himalaya automatically makes a selection of drives that spreads the data as wide as possible across the storage system. This is done for each object, randomly and optimally, such that the system remains balanced and storage capacity is used evenly across all resources.
For example, a 9.6 Petabyte AmpliStor system deployed in a single datacenter using 1U/12-drive storage nodes would require 5 racks, each with redundant Ethernet switches. For this single-datacenter example running a 20/4 policy, BitSpread randomly selects 4 drives per rack by choosing 4 storage nodes in each rack and 1 of the 12 drives in each selected storage node. As a result, the current encoded object is spread across 20 drives, while the next object will be stored across a different spread, meaning 20 different drives selected according to the same spread strategy. This is illustrated below, showing two objects’ encoded check blocks, differently colored, being spread across the system.
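The selection rule in this example can be sketched as follows. This is an illustration of the described strategy, not BitSpread's actual algorithm. The function name is hypothetical, and the default of 40 nodes per rack is an inference (9.6 PB over 5 racks of 48 TB nodes), not a figure stated in the text.

```python
import random
from typing import List, Optional, Tuple

Drive = Tuple[int, int, int]  # (rack index, node index within rack, drive index within node)


def select_spread(racks: int = 5,
                  nodes_per_rack: int = 40,   # assumed, see lead-in
                  drives_per_node: int = 12,
                  spread_width: int = 20,
                  rng: Optional[random.Random] = None) -> List[Drive]:
    """Pick drives evenly across racks, one drive per selected node."""
    rng = rng or random.Random()
    if spread_width % racks != 0:
        raise ValueError("this sketch only handles spreads that divide evenly across racks")
    per_rack = spread_width // racks
    if per_rack > nodes_per_rack:
        raise ValueError("not enough nodes per rack for one drive per node")
    spread: List[Drive] = []
    for rack in range(racks):
        # Distinct nodes in this rack, then one random drive in each node.
        for node in rng.sample(range(nodes_per_rack), per_rack):
            spread.append((rack, node, rng.randrange(drives_per_node)))
    return spread


spread_a = select_spread(rng=random.Random(1))
spread_b = select_spread(rng=random.Random(2))
```

Each call returns 20 drives with 4 per rack and no two drives in the same node, so losing a node costs an object at most one encoded block and losing a rack costs at most four, which matches the drive safety of the 20/4 policy.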
Himalaya BitSpread Enables the Use of Consumer Disk Drives
Active archive solutions, whether NAS, block or object stores, use the highest capacity disk drives available, to pack the greatest density into their datacenters at the lowest cost. Yet not all disk drives are alike with respect to reliability or bit error rates.
Enterprise NearLine SAS drives have the best reliability (MTBF) and bit error rates of 1 in 10^15. At significantly lower cost are consumer SAS drives, which have lower MTBF and an order of magnitude higher bit error rates. Higher bit error rates and lower MTBF contribute to higher annual failure rates (AFR). Well-known studies from Google and Microsoft measured AFR for large systems (1M disk drives), and found these drive failure rates can range from 1% to 6% for low to medium duty cycle systems, as in active archive use-cases.
Himalaya BitSpread maintains at least 15 nines of Data Durability, even with Annual Failure Rates of 6.5%, supporting consumer disk drives in our systems depending on the application. This reduces TCO even further over alternative systems.
BitSpread stores three copies of the metadata across three controllers in the AmpliStor system. Part of the object’s metadata is the spread, which is the list of drives that contain the encoded data for that object. The three controllers form an active cluster for the metadata which supports full read/write operation even in case of a full controller failure. In addition to the three copies on three controllers, the metadata is also stored with the data on the storage nodes to protect it against a major disaster where all three controllers fail.
GeoSpread - Disaster Prevention / Access
A Himalaya-based system can be deployed across multiple datacenters in different locations. A bridged or routed TCP/IP connection with adequate security between the sites is sufficient to allow BitSpread® to work across multiple locations.
Data is typically spread across multiple datacenters to protect it against an outage or unavailability of one of those datacenters. After configuring the durability policy to achieve the desired protection level, Himalaya’s hierarchy aware architecture will spread the data evenly across the datacenters and will satisfy read requests from the lowest latency locations, thereby optimizing for WAN latency.
GeoSpread protection policies support spreading across more than just 2 datacenters. As the number of datacenters in the GeoSpread policy increases, the storage overhead decreases, providing greater savings over replication-based ObjectStore solutions in the market today.
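One way to picture latency-aware reads is to pick just enough encoded blocks to reconstruct an object, preferring the closest sites. The sketch below is an assumption about how such a choice could be made, not a description of Himalaya internals; the function name, site labels, and latency values are invented for illustration.

```python
from typing import Dict, List


def choose_read_sources(latency_ms: Dict[str, float], blocks_needed: int) -> List[str]:
    """Return the IDs of the lowest-latency blocks, just enough to reconstruct."""
    if blocks_needed > len(latency_ms):
        raise ValueError("not enough reachable blocks to reconstruct the object")
    # sorted() is stable, so blocks with equal latency keep their original order.
    return sorted(latency_ms, key=latency_ms.get)[:blocks_needed]


# Example: spread width 18 over 3 sites (6 blocks each), drive safety 6.
site_latency = {"local": 1.0, "near": 20.0, "far": 80.0}  # hypothetical values
blocks = {f"{site}-{i}": ms for site, ms in site_latency.items() for i in range(6)}
sources = choose_read_sources(blocks, blocks_needed=18 - 6)
```

In this example the 12 blocks needed all come from the two closest sites, so the read never has to wait on the most distant datacenter.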
For example, to protect 1PByte of user data in a 3 datacenter GeoSpread, AmpliStor requires only 1.71PBytes of raw storage capacity (or 570TBytes raw capacity in each of the 3 datacenter locations). If spread over 4 datacenter locations, AmpliStor requires only 1.52PBytes of raw storage capacity (or 381TBytes raw capacity in each of the 4 datacenter locations).
By contrast, replica-based ObjectStore solutions keep one full copy in each datacenter location and would require 4PBytes of raw capacity in the four-datacenter case. Himalaya therefore saves over 60% in raw storage capacity required in this example.
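The capacity figures above can be checked with simple arithmetic. The helper names below are hypothetical, and the inputs are the rounded raw-capacity figures quoted in the text.

```python
def per_site_tb(raw_pb: float, sites: int) -> float:
    """Raw capacity per datacenter in TB, assuming an even spread (1 PB = 1000 TB)."""
    return raw_pb * 1000 / sites


def savings_vs_replication(raw_pb: float, replica_pb: float) -> float:
    """Fraction of raw capacity saved compared with full replication."""
    return 1 - raw_pb / replica_pb


three_site_tb = per_site_tb(1.71, 3)             # about 570 TB per site
four_site_tb = per_site_tb(1.52, 4)              # about 380 TB per site
four_site_savings = savings_vs_replication(1.52, 4.0)
```

The four-site result comes out at roughly 380 TB per site from the rounded 1.52 PB figure, close to the 381 TB quoted, and the saving against 4 PB of replicated capacity is about 62%.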
The hierarchy rules also apply within the datacenters, meaning the encoded object data is spread wide and evenly across the racks, nodes and drives as they are deployed in the datacenter.
Choosing the Drive Safety for GeoSpread
We also need to ensure the drive safety parameter is high enough to protect against a datacenter outage. With a spread width of 18 across 3 datacenters, an object will be stored on 6 disks in each datacenter. To protect the data against a datacenter outage, the drive safety parameter needs to be at least the number of drives per datacenter as defined by the spread. In this example, a drive safety of 6 on a spread width of 18 allows BitSpread to reconstruct the object from the data stored on any 12 of the 18 disks, such as the 12 disks in the remaining two datacenters.
In a four-datacenter example, with a spread width of 20, objects are stored on 5 drives per datacenter which means the drive safety for the durability policy should be at least 5 to support a full datacenter failure.
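The rule in both examples reduces to one calculation: the drive safety must cover the largest number of drives any single datacenter holds. The function below is a small sketch of that rule; its name is an assumption, and it rounds up for spreads that do not divide evenly.

```python
import math


def min_drive_safety(spread_width: int, datacenters: int) -> int:
    """Smallest drive safety that survives the loss of one full datacenter.

    With an even spread, the fullest datacenter holds
    ceil(spread_width / datacenters) drives of each object.
    """
    if datacenters < 1 or spread_width < datacenters:
        raise ValueError("need at least one drive per datacenter")
    return math.ceil(spread_width / datacenters)


three_dc = min_drive_safety(18, 3)
four_dc = min_drive_safety(20, 4)
```

This gives a minimum drive safety of 6 for the 18-wide, three-datacenter spread and 5 for the 20-wide, four-datacenter spread. A higher value would additionally tolerate drive failures in the surviving sites during the outage.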
Metadata Protection for GeoSpread
As mentioned previously, Himalaya stores 3 copies of the metadata across 3 controllers. To protect against a datacenter outage, the metadata is located in 3 different datacenter locations. The system remains fully read/write operational with 2 out of 3 metadata controllers, supporting the ability to lose a complete datacenter.
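The 2-out-of-3 behavior resembles a majority quorum. The sketch below assumes a simple majority rule, which the text does not confirm as the actual mechanism; the function name and datacenter labels are hypothetical.

```python
from typing import Dict, Set


def metadata_operational(controller_sites: Dict[str, str], failed_sites: Set[str]) -> bool:
    """True if a majority of metadata controllers are in datacenters that are still up.

    controller_sites maps a controller name to its datacenter.
    Majority quorum is an assumption made for this illustration.
    """
    alive = sum(1 for site in controller_sites.values() if site not in failed_sites)
    return alive >= len(controller_sites) // 2 + 1


controllers = {"ctrl-1": "dc-a", "ctrl-2": "dc-b", "ctrl-3": "dc-c"}
one_site_down = metadata_operational(controllers, {"dc-b"})
two_sites_down = metadata_operational(controllers, {"dc-a", "dc-b"})
```

With one controller per datacenter, losing any single site leaves two of three controllers, so `one_site_down` is true. Placing two controllers in the same site would break that property.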
BitDynamics - Data Integrity / Repair
BitDynamics®, Amplidata’s other key patented technology, provides automated out-of-band storage management functions such as continuous data integrity verification, self-monitoring, and automatic data healing, scrubbing and garbage collection. BitDynamics keeps the storage system healthy and optimized without the need for manual intervention.
Automated Capacity Management
BitDynamics keeps track of the available storage space. As disks in the Himalaya-based storage system fill up, the BitSpread agent will automatically generate new spreads that include disks with available capacity. As this happens transparently, Himalaya can scale effectively to petabytes and beyond without requiring virtual disk reconfiguration.
Add Storage Nodes On-the-Fly
In the Himalaya-based system, storage nodes can be added at any time. BitDynamics enables BitSpread encoders to use the added capacity to store data, without requiring reconfiguration of the virtual disks that are in use by your applications.
Self-monitoring and healing
Himalaya monitors and heals itself through BitDynamics agents on every storage node. In case of a disk or node failure, BitDynamics agents across a series of storage nodes will generate additional data blocks to replace the lost data. This processing happens out-of-band and does not impact system performance. BitDynamics agents work in parallel to heal the data, with repair times measured in hours versus 10s of hours, days or weeks in a comparable RAID-based system.
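A repair step of this kind can be sketched as a planning function: for each object, find the blocks that sat on failed drives and assign each one a healthy drive that the object does not already use. This is a simplified illustration under assumed names and data structures, not the BitDynamics implementation.

```python
from typing import Dict, List, Set


def plan_repair(spread: List[str], failed: Set[str],
                candidates: List[str], drive_safety: int) -> Dict[str, str]:
    """Map each failed drive in the spread to a replacement drive.

    Raises ValueError if more blocks are lost than the drive safety allows,
    or if there are not enough healthy drives outside the current spread.
    """
    lost = [d for d in spread if d in failed]
    if len(lost) > drive_safety:
        raise ValueError("too many blocks lost; the object cannot be reconstructed")
    in_use = set(spread)
    healthy = [d for d in candidates if d not in in_use and d not in failed]
    if len(healthy) < len(lost):
        raise ValueError("not enough healthy drives outside the current spread")
    return dict(zip(lost, healthy))


repair = plan_repair(spread=["d1", "d2", "d3", "d4"], failed={"d2"},
                     candidates=["d1", "d2", "d5", "d6"], drive_safety=1)
```

Because every object has its own spread, the plans for different objects point at different source and target drives, which is what lets many agents rebuild in parallel rather than funneling all writes to one spare disk.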
Continuous integrity checking
BitDynamics agents perform frequent background integrity checks on the stored data. If data becomes corrupted due to an unnoticed write error, bit rot or tampering, the BitDynamics agent will detect that error and proactively correct it before it would become an issue for the user.
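Background integrity checking is commonly built on stored checksums. The sketch below uses SHA-256 from Python's standard library to flag blocks whose content no longer matches the checksum recorded at write time. The choice of hash, the function names, and the block IDs are assumptions; the text does not say how BitDynamics detects corruption.

```python
import hashlib
from typing import Dict, List


def checksum(block: bytes) -> str:
    """Hex SHA-256 digest of a block."""
    return hashlib.sha256(block).hexdigest()


def find_corrupt_blocks(blocks: Dict[str, bytes], expected: Dict[str, str]) -> List[str]:
    """Return the sorted IDs of blocks whose current checksum differs from the recorded one."""
    return sorted(bid for bid, data in blocks.items() if checksum(data) != expected[bid])


# Record checksums at write time, then simulate bit rot in one block.
stored = {"blk-0": b"alpha", "blk-1": b"beta", "blk-2": b"gamma"}
recorded = {bid: checksum(data) for bid, data in stored.items()}
stored["blk-1"] = b"bexa"
corrupt = find_corrupt_blocks(stored, recorded)
```

A block flagged this way can be treated like a block on a failed drive and regenerated from the remaining healthy blocks of the same object.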
Intelligent Multi-threaded Grid Architecture
Amplidata offers complete appliances that include the AmpliStor software with Intel-based servers. In addition, there is a growing list of certified hardware from partners such as Quanta, Sanmina, HP and others, for which the AmpliStor software can be licensed separately.
The grid architecture consists of Controllers and Storage Nodes connected through a standard switched Ethernet fabric.
Controllers for Himalaya are high-performance, standard Intel® Xeon® based servers that run Amplidata’s BitSpread software, MetaStore, and management framework. Controllers provide high-performance access over multiple 10 GbE network interfaces, and can serve data over a variety of network protocols including HTTP/REST object interfaces, and language-specific interfaces such as Microsoft .NET, Python or C. Controllers are equipped with additional 10 GbE ports to interface to the back-end storage pool. Controllers operate in a highly available cluster to provide fully shared access to the storage pool, metadata caching on high-performance SSDs, and protection of metadata. Full specifications are provided in the Himalaya data sheet.
Storage Nodes are available in a variety of standard configurations designed to provide high-density, power-efficient and cost-effective modular storage. They offer industry-leading storage density with 36 TB or 48 TB capacity in a 1U rack-mount server enclosure. Using a low-power Intel processor, and equipped with ten (10) or twelve (12) 3 TB or 4 TB SATA disk drives, Storage Nodes are very power efficient at approximately 5 W/TB. Storage Nodes are connected to the network over redundant Gigabit Ethernet network interfaces.
The capacity, density, low power consumption and overall low TCO of Storage Nodes make them an ideal fit for large, unstructured data applications for Active Media Archives, Online Applications, and Enterprise Big Data Archives.
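The density and power figures can be turned into quick sizing arithmetic. The helpers below are illustrative only; they use the twelve-drive configurations and the approximate 5 W/TB figure from the text, and their names are assumptions.

```python
import math


def node_capacity_tb(drives: int, drive_tb: int) -> int:
    """Raw capacity of one storage node in TB."""
    return drives * drive_tb


def node_power_watts(capacity_tb: float, watts_per_tb: float = 5.0) -> float:
    """Approximate node power draw, using the quoted figure of about 5 W/TB."""
    return capacity_tb * watts_per_tb


def nodes_required(system_tb: float, node_tb: float) -> int:
    """Number of storage nodes needed to reach a raw system capacity."""
    return math.ceil(system_tb / node_tb)


small_node = node_capacity_tb(12, 3)    # twelve 3 TB drives
large_node = node_capacity_tb(12, 4)    # twelve 4 TB drives
large_node_power = node_power_watts(large_node)
nodes_for_9_6_pb = nodes_required(9600, large_node)
```

Twelve drives give the quoted 36 TB and 48 TB node capacities, a 48 TB node draws roughly 240 W at 5 W/TB, and a 9.6 PB raw system works out to 200 such nodes.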
Full specifications are provided in the Himalaya data sheet.
Integrated Network Fabric
Himalaya utilizes a dedicated, redundant 10GbE / 1GbE Ethernet fabric to interconnect the Controllers and Storage Nodes that make up the storage pool. Network fabric components include redundant, high-performance network switches. This enables multiple Gigabytes per second of throughput from a single Himalaya rack, and allows a single system to scale across multiple racks, from 100s of Terabytes to Exabytes, with unified namespaces and single-system management views.
Independently Scale Performance and Capacity
Easily scales as you grow, one node at a time