3 Keys to Creating an Enterprise-scale Security Model for the Data Lake

December 21, 2016 Parth Patel

We’re seeing data lake environments grow from tens of terabytes to petabytes. As a result, more enterprises are asking how they should govern something so large and complex. From our perspective, a policy-based (or attribute-based) security model is essential to securing the data lake at enterprise scale. By leveraging metadata, a policy-based security model automates permissions and access. In our view it is the only practical way to confidently secure big data while still allowing the access necessary to democratize use and derive value from the data lake.

Why policy-based rather than other options, such as role-based? A policy-based model offers more flexibility because access decisions consider a combination of attributes in addition to the user’s role, e.g., the type of data being accessed, the desired action, and the context in which the access occurs.
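To make the difference concrete, here is a minimal sketch of an attribute-based check. The attribute names, the `POLICIES` list and the `is_allowed` function are hypothetical, chosen only for illustration; they do not describe any particular product's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Request:
    """One access request, described by the attributes named in the text."""
    role: str       # the user's role, e.g. "analyst"
    data_type: str  # type of data being accessed, e.g. "pii"
    action: str     # desired action, e.g. "read"
    context: str    # access context, e.g. "on_network"


# Hypothetical policies: each maps an attribute to the set of values it accepts.
POLICIES = [
    {"role": {"analyst", "steward"}, "data_type": {"public"},
     "action": {"read"}, "context": {"on_network", "remote"}},
    {"role": {"steward"}, "data_type": {"pii"},
     "action": {"read", "write"}, "context": {"on_network"}},
]


def is_allowed(request, policies=POLICIES):
    """Allow the request if every attribute matches at least one policy."""
    for policy in policies:
        if all(getattr(request, attr) in allowed
               for attr, allowed in policy.items()):
            return True
    return False  # deny by default
```

In this sketch a steward can read PII on the corporate network but not remotely, even though the role is the same in both requests. A purely role-based check could not express that distinction without inventing extra roles.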

As you develop your policy-based data lake security strategy, we recommend taking the time to consider three important areas.

Key #1: Encryption

Determine which data needs to be encrypted in transit, both coming into the data lake and being extracted from it, and which data needs to be encrypted at rest in the data lake. Also decide where in the stack you will enable policy-based encryption, e.g., at the file system, storage or application layer. You’ll want to put rules in place that automate encryption so that you comply with industry and other regulations.
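One way such rules might be expressed is as a lookup from a dataset's metadata classification to its encryption requirements. The sketch below is illustrative only: the classification labels, the `ENCRYPTION_RULES` table and the `encryption_requirements` function are assumptions, not part of any specific platform.

```python
# Hypothetical rules: classification label -> encryption requirements.
ENCRYPTION_RULES = {
    "pii": {"in_transit": True, "at_rest": True, "layer": "application"},
    "financial": {"in_transit": True, "at_rest": True, "layer": "file_system"},
    "public": {"in_transit": False, "at_rest": False, "layer": None},
}

# Assumption: data without a known classification is encrypted everywhere,
# so a missing tag never results in unprotected data.
DEFAULT_RULE = {"in_transit": True, "at_rest": True, "layer": "storage"}


def encryption_requirements(dataset_metadata):
    """Return the encryption rule that applies to a dataset's metadata."""
    classification = dataset_metadata.get("classification")
    return ENCRYPTION_RULES.get(classification, DEFAULT_RULE)
```

Because the rule is derived from metadata rather than set by hand per dataset, tagging a new dataset as `"pii"` is enough for the same encryption requirements to apply to it.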

Key #2: Access control

Ultimately the data lake should be made available to many data users via a self-service data platform that leverages metadata to enable users to discover, curate and prepare datasets from the data lake, as well as from other systems across the enterprise. A data catalog provides access while maintaining the required governance policies and controls. As the data lake environment becomes shared and widely used, a key requirement is policy-based access control to the data, based on attributes such as roles and business units, or on functional groups defined within the organization. It is important to decide how you will define those policies and how you will carry them into the data access layer so that they can be enforced there.
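As a rough sketch of how catalog metadata could drive enforcement, the function below filters a catalog down to the datasets a user may discover. The field names (`business_unit`, `groups`, `allowed_units`, `allowed_groups`) and the `visible_datasets` function are hypothetical.

```python
def visible_datasets(user, catalog):
    """Return names of datasets the user may see, based on catalog metadata.

    A dataset is visible if the user's business unit is allowed, or if the
    user belongs to at least one allowed functional group.
    """
    names = []
    for dataset in catalog:
        unit_ok = user["business_unit"] in dataset["allowed_units"]
        group_ok = bool(user["groups"] & dataset["allowed_groups"])
        if unit_ok or group_ok:
            names.append(dataset["name"])
    return names


# Hypothetical catalog entries and user, for illustration only.
catalog = [
    {"name": "claims", "allowed_units": {"insurance"}, "allowed_groups": set()},
    {"name": "web_logs", "allowed_units": set(), "allowed_groups": {"data_science"}},
    {"name": "payroll", "allowed_units": {"hr"}, "allowed_groups": {"finance_ops"}},
]
user = {"business_unit": "insurance", "groups": {"data_science"}}
```

Calling `visible_datasets(user, catalog)` with these values returns `["claims", "web_logs"]`. In practice the same decision would need to be applied wherever data is actually read, not only in the catalog view.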

Key #3: Data privacy

Data masking for personally identifiable information (PII), protected health information (PHI) or payment card industry (PCI) information helps ensure compliance with industry and other regulations. For sensitive attributes, consider how you will do application-level masking and tokenization, so that you are either obfuscating the field or replacing the original value with a tokenized one. Your strategy should allow you to maintain the mapping to the original value in a secure area accessible only by privileged users, while the majority of users see only the tokenized value.
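The two techniques can be sketched as follows. This is a simplified, in-memory illustration: the `TokenVault` class and `mask` function are hypothetical names, and a real vault would live in secured, persistent storage with proper authentication rather than a boolean flag.

```python
import secrets


class TokenVault:
    """Keeps the token <-> original value mapping in one restricted place."""

    def __init__(self):
        self._value_to_token = {}
        self._token_to_value = {}

    def tokenize(self, value):
        """Return a random token for the value, reusing it on repeat calls."""
        if value not in self._value_to_token:
            token = "tok_" + secrets.token_hex(8)
            self._value_to_token[value] = token
            self._token_to_value[token] = value
        return self._value_to_token[value]

    def detokenize(self, token, user_is_privileged):
        """Recover the original value; only privileged users may do this."""
        if not user_is_privileged:
            raise PermissionError("only privileged users may detokenize")
        return self._token_to_value[token]


def mask(value, visible=4, fill="*"):
    """Obfuscate a field, leaving only the last `visible` characters."""
    keep = value[-visible:] if visible > 0 else ""
    return fill * (len(value) - len(keep)) + keep
```

Here `mask("123-45-6789")` yields `"*******6789"`, which is irreversible, while a token can be mapped back to the original value, but only through the vault and only by a privileged user.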

The amount and types of data that enterprises collect are only increasing, making data ecosystems more and more complex. Putting a data management platform and policies in place to operationalize governance and automate processes is the only way for enterprises to confidently secure data at a level that meets or exceeds industry regulations.


About the Author

Parth Patel

Big Data Solutions Engineer - RTP Raleigh NC
