The Waterline Data Blog

Apr 19, 2017

Search is NOT a Data Catalog

In talking with prospective customers, I am finding there is a lot of confusion about what makes a data catalog. Lately, we’ve seen some companies consider taking their search technology and calling their search product a “data catalog”, which is not entirely accurate. This is happening purely because the enterprise search market is being overrun by open source phenom Solr, and companies that were once in the enterprise search space are looking for a new home.

Read More

Topics: Data Catalog, crowdsourcing, data management, automated tagging

Apr 12, 2017

Danger, Will Robinson! GDPR is Coming!

Or so the robot from Lost in Space might say. Though, as organizations begin working toward GDPR readiness, many may find themselves instead quoting Dr. Smith: “Oh the pain, the pain!”

Read More

Topics: Big Data governance, GDPR, Big Data compliance

Apr 4, 2017

How data management is like cleaning up after my teenage daughters

Do you (or your organization) waste too much time searching for data and not enough time actually using it? Me too. In fact, most Chief Data Officers I speak with complain of the same problem; I’d estimate that 9 out of 10 CDOs raise this particular issue. But when you consider the way most companies manage their data, this shouldn’t really come as much of a shock.

Read More

Topics: Big Data discovery, Data Catalog, Data management software, Data Redundancy, data warehousing

Mar 28, 2017

It’s time to look in the mirror and confess, “I am a data hoarder, and I need help.”

Data hoarding organizations like yours are risking too much.  It’s time for hoarding organizations to shut down their data museums. 

Read More

Topics: data hoarding, data storage, data encryption, data discovery

Mar 21, 2017

Yada, Yada, Yada. Why Are We Still Just Talking about the Dangers of Dark Data?

Last week I came upon an article entitled, “Beware the Dangerous Databerg Lurking Beneath Your Business' Surface.” Apparently awareness of all the ways dark data can damage a business is still incredibly limited—so much so that Veritas actually determined there was a need to post this sponsored piece on

Read More

Topics: Big Data, Big Data governance, Dark Data, Big Data compliance, Big Data security

Mar 16, 2017

Top 4 reasons you need a data catalog

I’ve been reading a lot lately on the thoughts of different industry analysts about the importance of data catalogs or information catalogs and why you should have one. The data governance market is expected to double in size over the next five years, with data governance solutions such as data catalogs holding the largest market share, so from a market standpoint, this is no surprise. But most of the arguments I hear from analysts are built around a semi-technical argument that can be summed up as “good metadata is good, bad metadata is bad” or what I like to think of as the “four legs good, two legs bad” argument. 


Given I spend my days steeped in data cataloging, I figure it is about time I share with you…
Read More

Topics: Big Data discovery, Data Catalog, Data Redundancy, access control, data hoarding, GDPR

Mar 10, 2017

Are we currently living in the Dark Ages of big data?

They called it the Dark Ages. The period following the decline of the Western Roman Empire.


Perhaps one day, data experts will look back on today and call this our dark ages. The Dark Ages of Big Data.


Perhaps they will marvel over the obsession many organizations have in hoarding data—especially given their near equal apathy when it comes to actually putting all that data to use.


Over the past few months, travel industry Big Data specialist Mark Ross-Smith has written about some of the excellent ways airlines, hotels and other travel- and hospitality-related businesses could be converting data, which is otherwise just sitting around collecting dust, into new, game-changing revenue. He calls it the billion dollar opportunity hidden in plain sight.

Read More

Topics: Big Data, Big Data governance, Hadoop, Smart Data Catalog

Mar 6, 2017



Companies are struggling to make big data work.  In fact, most of the companies we work with at Waterline have come to us after their initial big data efforts have failed.  And why is that?  It is because in many cases the Hadoop vendors have overpromised and underdelivered.


Hadoop and Spark are incredible technologies.  But they don’t solve a complete end-to-end problem, and too many people have been misinformed.  As a result, Waterline Data is teaming up with a number of other vendors (Streamsets, Trifacta, Arcadia) to be the initial sponsors of  or MBDW for short.

Read More

Topics: Big Data, Data Catalog

Feb 28, 2017

The Top 5 reasons why Big Data in the cloud will outstrip growth of Big Data on premise.

When it comes to integrating Big Data into your business and deriving value from data, it’s all about ease of deployment. At the end of the day, and especially for those of you who have been keeping up with my blog posts so far, Hadoop is just too damn hard. Anything that makes it easier to deploy a big data solution will win out. Empirical evidence supports this position as the number of requests we have been receiving at Waterline for cloud-based deployments vs. on-premises deployments has jumped significantly in the last two quarters. At least from the perspective of this vendor, whose customers are working on managing large quantities of data, there is a clear, measurable increase in demand for cloud deployments of big data projects.

Read More

Topics: Hadoop, on-premise, cluster, cloud-based, big data integration

Feb 16, 2017

Big Data Smack Down: Round 1 – Best of Breed Stacks vs. All-in-One Solutions

Ladies and gentlemen, boys and girls of all ages, welcome to the World Data Federation’s giant cage match! In the corner to your left we have the straggle-toothed veterans (Informatica, IBM, and Oracle) dragging their legacy architectures into the big data age with so-called “end-to-end solutions” and only “one throat to choke.” In the other corner, to your right, we have the young, scrappy up-and-comers (Waterline, Trifacta, Arcadia, and Streamsets) building their platforms from scratch to run natively on Hadoop and Spark, with modern REST API architectures that allow for easy integration to create a custom “best-of-breed” big data stack!

Read More

Topics: Big Data, Hadoop, Data Integration, Legacy Vendor Stack, Data management software


About this blog

Waterline Data is all about bringing self-service to finding, understanding, and governing Hadoop data. So you'll see articles here about that topic, its challenges, innovations, and comments from Waterline Data bloggers and guests on the state of the art.
