Data Lifecycle Management optimizes HDFS utilization by leveraging the tiered storage that Hadoop provides. You can place big data on storage that matches how frequently it is used, which reduces storage cost. With tiered storage, data files that are used infrequently are stored on nodes with higher storage density, less compute power, and lower cost.
During heavy initial usage, you can configure data sets as Hot. As usage declines over time, the data sets can move to the Warm state. When usage declines further, the data sets can be set to the Cold state. Finally, when the data is very rarely used, you can move the data set to the archived state and store it on cheap storage media, such as Amazon S3.
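The progression described above can be sketched as a simple lookup on how long a data set has been idle. This is only an illustration: the thresholds (30, 90, and 365 days) and the function name `lifecycle_state` are assumptions made for the example, not values defined by Hadoop or by the Platform.

```python
from datetime import date

# Hypothetical idle-time limits in days; tune these to your own usage patterns.
TIERS = [(30, "HOT"), (90, "WARM"), (365, "COLD")]


def lifecycle_state(last_access: date, today: date) -> str:
    """Map the number of days since last access to a lifecycle state."""
    idle_days = (today - last_access).days
    for limit, state in TIERS:
        if idle_days < limit:
            return state
    # Anything idle longer than the last limit is a candidate for archiving.
    return "ARCHIVED"
```

With these assumed limits, a data set last read 9 days ago is Hot, one idle for 60 days is Warm, and one idle for more than a year is a candidate for the archived state.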
Zaloni's Data Lake Management Platform allows you to create policies and associate them with data files through entity type definitions. Based on the policy you define, files can migrate from one storage type to another.
Hadoop supports the following storage types:
- ARCHIVE: Archival storage. This storage type has high storage density with low processing resources.
- DISK: Disk drive storage. This is the default storage type.
- SSD: Solid-state drive storage.
- RAM_DISK: DataNode memory. This storage type has limited capacity but faster access.
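In Hadoop, each DataNode volume is tagged with its storage type in the `dfs.datanode.data.dir` property, using a bracketed prefix such as `[SSD]`. The sketch below builds such a value from a list of volumes. The helper name `data_dir_value` and the example paths are hypothetical.

```python
from typing import List, Tuple

# Storage types listed above.
STORAGE_TYPES = {"ARCHIVE", "DISK", "SSD", "RAM_DISK"}


def data_dir_value(volumes: List[Tuple[str, str]]) -> str:
    """Build a dfs.datanode.data.dir value from (storage_type, absolute_path) pairs."""
    parts = []
    for storage_type, path in volumes:
        if storage_type not in STORAGE_TYPES:
            raise ValueError(f"unknown storage type: {storage_type}")
        # Each location is prefixed with its storage type in square brackets.
        parts.append(f"[{storage_type}]file://{path}")
    return ",".join(parts)


# Example paths are placeholders, not a recommended layout.
value = data_dir_value([("DISK", "/grid/dn/disk0"), ("ARCHIVE", "/grid/dn/archive0")])
```

Here `value` is `[DISK]file:///grid/dn/disk0,[ARCHIVE]file:///grid/dn/archive0`, which would go into `hdfs-site.xml` on the DataNode.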
The Platform allows users to implement the following storage policies, as supported by Hadoop, so that files are stored on different storage types according to their storage policy:
- HOT: This state generally represents the state of data files that are processed frequently. This state is used for both storage and compute. In this state, all block replicas are typically stored in DISK.
- WARM: This state generally represents the state of data files that are processed less frequently than those in the Hot state but more frequently than those in the Cold state. In this state, some block replicas are stored in DISK and the remaining replicas are stored in ARCHIVE.
- COLD: This state generally represents the state of data files that need to be available for limited compute only. The data lake management platform moves unused data, or data that must be archived, to Cold storage. In this state, all block replicas are stored in ARCHIVE.
- ALL_SSD: This state provides faster access to data that is being used for computation. Use it only if you have the SSD storage type. In this state, all block replicas are stored in SSD.
- ONE_SSD: This state provides faster access to data that is being used for computation. Use it only if you have the SSD storage type. In this state, one replica is stored in SSD and the remaining replicas are stored in DISK.
- LAZY_PERSIST: This state is generally used to write blocks with a single replica in memory. The replica is first written to RAM_DISK and then lazily persisted to DISK. Select this state only if you have the RAM_DISK storage type.
- S3_VAULT: This is a Zaloni-exclusive file state. When data files attain this state, they are moved from HDFS to S3 storage. This state can only appear at the end of the policy rule chain, because files in this state are moved out of HDFS to a different storage facility altogether and are no longer available in HDFS.
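The policies above can be summarized as a mapping from policy name to replica placement. The following is a minimal sketch, not the Platform's implementation. The exact splits for WARM and LAZY_PERSIST (one replica on DISK or RAM_DISK respectively, the rest on the fallback type) follow Hadoop's documented policy table rather than the text above. The function names are hypothetical. The commands are only assembled, not executed, and use the standard `hdfs storagepolicies` and `hdfs mover` tools.

```python
from typing import Dict, List

# Replicas per storage type for replication factor n.
# S3_VAULT is a Zaloni state, not an HDFS policy, so it is not listed here.
_PLACEMENT = {
    "HOT": lambda n: {"DISK": n},
    "WARM": lambda n: {"DISK": 1, "ARCHIVE": n - 1},
    "COLD": lambda n: {"ARCHIVE": n},
    "ALL_SSD": lambda n: {"SSD": n},
    "ONE_SSD": lambda n: {"SSD": 1, "DISK": n - 1},
    "LAZY_PERSIST": lambda n: {"RAM_DISK": 1, "DISK": n - 1},
}


def replica_placement(policy: str, replication: int = 3) -> Dict[str, int]:
    """Return how many replicas of a block land on each storage type."""
    if replication < 1:
        raise ValueError("replication must be at least 1")
    counts = _PLACEMENT[policy](replication)
    return {storage: count for storage, count in counts.items() if count > 0}


def validate_chain(states: List[str]) -> None:
    """Reject a rule chain in which S3_VAULT is followed by another state."""
    if "S3_VAULT" in states[:-1]:
        raise ValueError("S3_VAULT must be the last state in the chain")


def hdfs_commands(path: str, policy: str) -> List[List[str]]:
    """Build (but do not run) the hdfs CLI calls that apply a policy to a path."""
    if policy not in _PLACEMENT:
        raise ValueError(f"{policy} is not an HDFS storage policy")
    return [
        ["hdfs", "storagepolicies", "-setStoragePolicy",
         "-path", path, "-policy", policy],
        # The mover migrates existing blocks so they match the new policy.
        ["hdfs", "mover", "-p", path],
    ]
```

For example, `replica_placement("WARM", 3)` gives one replica on DISK and two on ARCHIVE, and `validate_chain(["HOT", "S3_VAULT", "COLD"])` raises an error because S3_VAULT is not last.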
About the Author
Parth Patel, Big Data Solutions Engineer - RTP Raleigh NC