In talking with prospective customers, I am finding there is a lot of confusion about what makes a data catalog. Lately, we’ve seen some companies consider taking their search technology and calling their search product a “data catalog”, which is not entirely accurate. This is happening purely because the enterprise search market is being overrun by the open source phenom Solr, and companies that were once in the enterprise search space are looking for a new home.
Or so the robot from Lost in Space might say. Though, as organizations begin working toward GDPR readiness, many may find themselves instead quoting Dr. Smith: “Oh the pain, the pain!”
Do you (or your organization) spend too much time searching for data and not enough time actually using it? Me too. In fact, I’d estimate that 9 out of 10 Chief Data Officers I speak with complain about this very problem. But when you consider the way most companies manage their data, this shouldn’t really come as much of a shock.
Data-hoarding organizations like yours are risking too much. It’s time for them to shut down their data museums.
Last week I came upon an article entitled, “Beware the Dangerous Databerg Lurking Beneath Your Business' Surface.” Apparently awareness of all the ways dark data can damage a business is still incredibly limited—so much so that Veritas actually determined there was a need to post this sponsored piece on CIO.com.
Given I spend my days steeped in data cataloging, I figure it is about time I share with you…
They called it the Dark Ages. The period following the decline of the Western Roman Empire.
Perhaps one day, data experts will look back on today and call this our dark ages. The Dark Ages of Big Data.
Perhaps they will marvel over the obsession many organizations have with hoarding data, especially given their near-equal apathy when it comes to actually putting all that data to use.
Over the past few months, travel industry Big Data specialist Mark Ross-Smith has written about some of the excellent ways airlines, hotels and other travel- and hospitality-related businesses could be converting data, which is otherwise just sitting around collecting dust, into new, game-changing revenue. He calls it the billion dollar opportunity hidden in plain sight.
Companies are struggling to make big data work. In fact, most of the companies we work with at Waterline have come to us after their initial big data efforts have failed. And why is that? It is because in many cases the Hadoop vendors have overpromised and underdelivered.
Hadoop and Spark are incredible technologies. But they don’t solve a complete end-to-end problem, and too many people have been misinformed. As a result, Waterline Data is teaming up with a number of other vendors (Streamsets, Trifacta, and Arcadia) to be the initial sponsors of www.MakeBigDatawork.org, or MBDW for short.
When it comes to integrating Big Data into your business and deriving value from data, it’s all about ease of deployment. At the end of the day, and especially for those of you who have been keeping up with my blog posts so far, Hadoop is just too damn hard. Anything that makes it easier to deploy a big data solution will win out. Empirical evidence supports this position as the number of requests we have been receiving at Waterline for cloud-based deployments vs. on-premises deployments has jumped significantly in the last two quarters. At least from the perspective of this vendor, whose customers are working on managing large quantities of data, there is a clear, measurable increase in demand for cloud deployments of big data projects.
Ladies and gentlemen, boys and girls of all ages, welcome to the World Data Federation’s giant cage match! In the corner to your left we have the straggle-toothed veterans (Informatica, IBM, and Oracle) dragging their legacy architectures into the big data age with so-called “end-to-end solutions” and only “one throat to choke.” In the other corner, to your right, we have the young, scrappy up-and-comers (Waterline, Trifacta, Arcadia, and Streamsets) building their platforms from scratch to run natively on Hadoop and Spark, with modern REST API architectures that allow for easy integration to create a custom “best-of-breed” big data stack!