The Waterline Data Blog

Mar 21, 2017

Yada, Yada, Yada. Why Are We Still Just Talking about the Dangers of Dark Data?

Last week I came upon an article entitled, “Beware the Dangerous Databerg Lurking Beneath Your Business' Surface.” Apparently awareness of all the ways dark data can damage a business is still incredibly limited—so much so that Veritas actually determined there was a need to post this sponsored piece on

Read More

Topics: Big Data, Big Data governance, Dark Data, Big Data compliance, Big Data security

Mar 16, 2017

Top 4 reasons you need a data catalog

I’ve been reading a lot lately about what different industry analysts think of data catalogs (or information catalogs) and why you should have one. The data governance market is expected to double in size over the next five years, with data governance solutions such as data catalogs holding the largest market share, so from a market standpoint, this is no surprise. But most of what I hear from analysts is built around a semi-technical argument that can be summed up as “good metadata is good, bad metadata is bad” or what I like to think of as the “four legs good, two legs bad” argument.


Given I spend my days steeped in data cataloging, I figure it is about time I share with you…
Read More

Topics: Big Data discovery, Data Catalog, Data Redundancy, access control, data hoarding, GDPR

Mar 10, 2017

Are we currently living in the Dark Ages of big data?

They called it the Dark Ages. The period following the decline of the Western Roman Empire.


Perhaps one day, data experts will look back on today and call this our dark ages. The Dark Ages of Big Data.


Perhaps they will marvel over the obsession many organizations have in hoarding data—especially given their near equal apathy when it comes to actually putting all that data to use.


Over the past few months, travel industry Big Data specialist Mark Ross-Smith has written about some of the excellent ways airlines, hotels, and other travel and hospitality-related businesses could be converting data, which is otherwise just sitting around collecting dust, into new, game-changing revenue. He calls it the billion-dollar opportunity hidden in plain sight.

Read More

Topics: Big Data, Big Data governance, Hadoop, Smart Data Catalog

Mar 6, 2017



Companies are struggling to make big data work. In fact, most of the companies we work with at Waterline have come to us after their initial big data efforts have failed. And why is that? It is because in many cases the Hadoop vendors have overpromised and underdelivered.


Hadoop and Spark are incredible technologies. But they don’t solve a complete end-to-end problem, and too many people have been misinformed. As a result, Waterline Data is teaming up with a number of other vendors (Streamsets, Trifacta, Arcadia) to be the initial sponsors of  or MBDW for short.

Read More

Topics: Big Data, Data Catalog

Feb 28, 2017

The Top 5 reasons why Big Data in the cloud will outstrip growth of Big Data on premise.

When it comes to integrating Big Data into your business and deriving value from data, it’s all about ease of deployment. At the end of the day, and especially for those of you who have been keeping up with my blog posts so far, Hadoop is just too damn hard. Anything that makes it easier to deploy a big data solution will win out. Empirical evidence supports this position as the number of requests we have been receiving at Waterline for cloud-based deployments vs. on-premises deployments has jumped significantly in the last two quarters. At least from the perspective of this vendor, whose customers are working on managing large quantities of data, there is a clear, measurable increase in demand for cloud deployments of big data projects.

Read More

Topics: Hadoop, on-premise, cluster, cloud-based, big data integration

Feb 16, 2017

Big Data Smack Down: Round 1 – Best of Breed Stacks vs. All-in-One Solutions

Ladies and gentlemen, boys and girls of all ages, welcome to the World Data Federation’s giant cage match! In the corner to your left we have the straggle-toothed veterans (Informatica, IBM, and Oracle) dragging their legacy architectures into the big data age with so-called “end-to-end solutions” and only “one throat to choke.” In the other corner, to your right, we have the young, scrappy up-and-comers (Waterline, Trifacta, Arcadia, and Streamsets) building their platforms from scratch to run natively on Hadoop and Spark, with modern REST API architectures that allow for easy integration to create a custom “best-of-breed” big data stack!

Read More

Topics: Big Data, Hadoop, Data Integration, Legacy Vendor Stack, Data management software

Feb 15, 2017

Introducing Smart Data Catalog 4.0 for Faster Use of Big Data

Today we’re very excited to announce early access availability of Smart Data Catalog 4.0, the latest version of the industry’s most trusted data catalog.

Read More

Topics: Big Data, Big Data governance, Compliance, Smart Data Catalog, tag based security

Feb 9, 2017

There will continue to be a natural conflict between big data and business self service

Two trends in big data have been tugging at each other particularly hard for the past several years, and frankly, I find it amazing that more people aren’t talking about how these two trends are almost fundamentally opposed to one another. On one hand, there is the push for big data, and the general sentiment is more, more, more – more data at a faster pace and with greater variety. This all follows from the line of thought that if I get more data, I can find new and bigger insights that will change my company, or perhaps the world, for the better.

Read More

Topics: Big Data, Big Data governance, Self service analytics, Hadoop, Data Redundancy, Smart Data Catalog

Feb 2, 2017

The Real Cost and Risk of Redundant Data

How big of a problem is data redundancy? If you are like most companies, it is much bigger than any one thing. 

Read More

Topics: Big Data, Data Redundancy

Jan 31, 2017

2017 Prediction #3: The CDO Finally Moves into the BOARD ROOM

This particular prediction will be a little difficult to prove, but I will start off by stating that I am definitely not alone in my opinion about the rise of the CDO. Gartner recently wrote in its second CDO Survey (The State of the Office of the CDO), “Data- and analytics-related crises will continue to plague enterprises that do not implement the chief data officer (CDO) role and the office of the CDO.” This was heartening to see. While this topic has long been under discussion by data nerds like me, it had never really reached the mainstream. I think that’s about to change. Organizations do seem to be waking up (finally) to the strategic value of data.

Read More

Topics: Big Data, Chief Data Officer


About this blog

Waterline Data is all about bringing self-service to finding, understanding, and governing Hadoop data. So you'll see articles here about that topic, its challenges, innovations, and comments from Waterline Data bloggers and guests on the state of the art.
