Data loss protection, antivirus software, and endpoint security are just some of the technologies available to defend against a cybersecurity attack. Organizations use these platforms to keep their existing data secure from potential threats. Often, however, this protection does not extend to information stored in data lakes. As a result, that information is vulnerable to attack, a weakness that hackers exploit when infiltrating data lakes.

The term "data lake" was coined by Pentaho CTO James Dixon in 2010. "If you think of a data mart as a store of bottled water -- cleansed and packaged and structured for easy consumption -- the data lake is a large body of water in a more natural state. The contents of the data lake stream in from a source to fill the lake, and various users of the lake can come to examine, dive in, or take samples," Dixon explained. In other words, data lakes are storage repositories, located on premises or in the cloud, that hold both structured and unstructured data for later use.

Records stored in these vast repositories were either useful once or expected to be useful in the future, so they were kept until they were needed. Like most things that are put away out of sight, the data was forgotten or thought about only from time to time. During this period of neglect, hackers could steal the valuable data or plant a virus in existing data. With no data protection in place, the theft often went unnoticed for months or even years. The hack would be discovered only when a data scientist went looking for the data and found that it was gone or altered. By then it was too late, and data recovery was futile.

Businesses are held accountable for data loss regardless of where the data is stored. It is therefore crucial that companies make sure their data is secure, whether it sits in a data warehouse, in a data lake, or on a PC. In an article with Builtin, Carbon Black Chief Cybersecurity Officer Tom Kellermann warned, “We need to be very, very concerned [about] these massive data lakes. They have become targets not just for traditional hackers and disillusioned individuals, but also for nation states.” Thankfully, there are steps an organization can take to protect its data.

Kellermann goes on to suggest that platforms that store secure data shouldn’t have a permanent administrator. Instead, everyone should have limited access or act as a “temporary administrator” to ensure that records are protected.
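
One way to picture the “temporary administrator” idea is as an admin right that expires on its own. The Python sketch below is only an illustration under that assumption: the names `AdminGrant` and `TemporaryAdminRegistry`, the one-hour duration, and the user names are all hypothetical and do not come from Kellermann or from any particular data lake product.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass(frozen=True)
class AdminGrant:
    """A hypothetical record of who holds admin rights and until when."""
    user: str
    expires_at: datetime


class TemporaryAdminRegistry:
    """Hypothetical registry in which every admin grant has an expiry time."""

    def __init__(self) -> None:
        self._grants: dict[str, AdminGrant] = {}

    def grant(self, user: str, duration: timedelta,
              now: Optional[datetime] = None) -> AdminGrant:
        # Every grant must carry a duration, so no permanent admin can exist.
        now = now or datetime.now(timezone.utc)
        admin_grant = AdminGrant(user=user, expires_at=now + duration)
        self._grants[user] = admin_grant
        return admin_grant

    def is_admin(self, user: str, now: Optional[datetime] = None) -> bool:
        # A user is an admin only while an unexpired grant exists.
        now = now or datetime.now(timezone.utc)
        admin_grant = self._grants.get(user)
        return admin_grant is not None and now < admin_grant.expires_at


# Usage: "alice" is an assumed user who needs admin rights for one hour.
registry = TemporaryAdminRegistry()
start = datetime(2019, 11, 25, 9, 0, tzinfo=timezone.utc)
registry.grant("alice", timedelta(hours=1), now=start)
during = registry.is_admin("alice", now=start + timedelta(minutes=30))
after = registry.is_admin("alice", now=start + timedelta(hours=2))
```

In this sketch `during` is true and `after` is false, because the right lapses without anyone having to remember to revoke it. A real deployment would more likely rely on the access controls of the storage platform itself than on hand-written code like this.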

It’s also important to note that cloud providers such as Amazon Web Services aren’t impenetrable, but they offer more protection than a PC or a small company’s own security. If a business wants to ensure that its data is secure but does not have the resources to protect it on its own, investing in a cloud service provider is a fundamental step.

Securing data lakes is just as important as securing data that is currently being analyzed. Is your data lake protected?

Last modified on Monday, 25 November 2019
Danielle Loughnane

Danielle Loughnane earned her B.F.A. in Creative Writing from Emerson College and has been working in the data science field since 2015. She is the author of a comic book entitled “The Superhighs” and wrote a blog from 2011 to 2015 about working in the restaurant industry called “Sir I Think You've Had Too Much.” In her spare time she likes reading graphic novels and snuggling with her dogs.
