Ladies and gentlemen, boys and girls of all ages, welcome to the World Data Federation’s giant cage match! In the corner to your left we have the straggle-toothed veterans (Informatica, IBM, and Oracle), dragging their legacy architectures into the big data age with so-called “end-to-end solutions” and only “one throat to choke.” In the other corner, to your right, we have the young, scrappy up-and-comers (Waterline, Trifacta, Arcadia, and Streamsets), building their platforms from scratch to run natively on Hadoop and Spark, with modern REST API architectures that allow for easy integration to create a custom “best-of-breed” big data stack!
Today we’re very excited to announce early access availability of Smart Data Catalog 4.0, the latest version of the industry’s most trusted data catalog.
Two trends in big data have been tugging at each other particularly hard for the past several years, and frankly, I find it amazing that more people aren’t talking about how these two trends are almost fundamentally opposed to one another. On one hand, there is the push for big data, and the general sentiment is more, more, more: more data, at a faster pace, and with greater variety. This all follows the train of thought that if I get more data, I can find new and bigger insights that will change my company, or perhaps the world, for the better.
How big of a problem is data redundancy? If you are like most companies, it is much bigger than any one thing.
This particular prediction will be a little difficult to prove, but I will start off by stating that I am definitely not alone in my opinion about the rise of the CDO. Gartner recently wrote in its second CDO Survey (The State of the Office of the CDO), “Data- and analytics-related crises will continue to plague enterprises that do not implement the chief data officer (CDO) role and the office of the CDO.” This was heartening to see. While this topic has long been under discussion by data nerds like me, it had never really reached the mainstream. I think that’s about to change. Organizations do seem to be waking up (finally) to the strategic value of data.
As I continue these blog posts about my predictions for 2017, my mind wanders back to 2014 when Cloudera had just taken a massive investment from Intel, and Hortonworks had just gone public. Ah yes, I remember those good old days when Hadoop was going to eliminate the need for ETL, and data quality would no longer be needed, because the volume of data would make data quality issues operate at the noise level.
Do you know what DCE is or was? What about CORBA? These were distributed computing architectures designed to help create scalable applications. But most of you have probably never heard of these technologies, because they never went anywhere. They more or less died on the vine. And do you know why? You guessed it: they were just too damn hard.
In my last post, I wrote about how most organizations are relying on the street light method to find data. They’re searching for data only where they can see, not where the data might actually be located. The result: they don’t know what data is even available. Tribal knowledge can help, but results are often spotty. People forget. People leave. People make mistakes.