It feels good to be a data geek in 2017.
Last year, we asked “Is Big Data Still a Thing?”, observing that since Big Data is largely “plumbing”, it has been subject to enterprise adoption cycles that are much slower than the hype cycle. As a result, it took several years for Big Data to evolve from cool new technologies to core enterprise systems actually deployed in production.
In 2017, we’re now well into this deployment phase. The term “Big Data” continues to gradually fade away, but the Big Data space itself is booming. Everywhere we look, we see anecdotal evidence of more mature products, more substantial adoption in Fortune 1000 companies, and rapid revenue growth for many startups.
Meanwhile, the froth has indisputably moved to the machine learning and artificial intelligence side of the ecosystem. In the last few months, AI has experienced a “Big Bang” in collective consciousness, not entirely dissimilar to the excitement around Big Data a few years ago, but with even more velocity.
2017 is also shaping up to be an exciting year from another perspective: long-awaited IPOs. The first few months of this year have seen a burst of activity for Big Data startups on that front, with warm reception from the public markets.
All in all, in 2017 the data ecosystem is firing on all cylinders. As we do every year, we’ll use the annual revision of our Big Data Landscape to do a long-form, “State of the Union” roundup of the key trends we’re seeing in the industry.
Let’s dig in.
High level trends
Big Data + AI = The New Stack
As any VC privileged to see many pitches will attest, 2016 was the year when every startup became a “machine learning company”, “.ai” became the must-have domain name, and the “wait, but we do this with machine learning” slide became ubiquitous in fundraising decks.
Faced with an enormous avalanche of AI press, panels, newsletters and tweets, many people with a long-standing interest in machine learning reacted the way you do when your local band suddenly becomes huge: on the one hand, pride; on the other hand, a distinct distaste for all the poseurs who show up late to the party, followed by predictions of impending gloom.
While it’s easy to poke gentle fun at the trend, the evolution is both undeniable and major: machine learning is quickly becoming a key building block for many applications.
We’re witnessing the emergence of a new stack, where Big Data technologies are used to handle core data engineering challenges, and machine learning is used to extract value from the data (in the form of analytical insights, or actions).
In other words: Big Data provides the pipes, and AI provides the smarts.
Of course, this symbiotic relationship has existed for years, but its implementation was only available to a privileged few.
The democratization of those technologies has now started in earnest. “Big Data + AI” is becoming the default stack upon which many modern applications (whether targeting consumers or enterprises) are being built. Both startups and some Fortune 1000 companies are leveraging this new stack (see, for example, JP Morgan’s “Contract Intelligence” application).
Often, but not always, the cloud is the third leg of the stool. This trend is accelerated by the efforts of the cloud giants, who are now in an open war to provide access to a machine learning cloud (more on this below).
Does democratization of AI mean commoditization in the short term? The reality is that AI remains technically very hard. While many engineers are scrambling to build AI skills, deep domain experts are, as of now, still in very short supply around the world.
However, there is no reversing this democratization trend, and machine learning is going to evolve from competitive advantage to table stakes sooner or later.
This has consequences both for startups and large companies. For startups: unless you’re building AI software as your final product, it’s quickly going to become meaningless to present yourself as a “machine learning company”. For large organizations: if you’re not actively building a Big Data + AI strategy at this point (either homegrown or by partnering with vendors), you’re exposing yourself to obsolescence. People have been saying this for years about Big Data, but with AI now running on top of it, things are accelerating in earnest.
Enterprise Budgets: Follow the Money
In our conversations with both buyers and vendors of Big Data technologies over the last year, we’re seeing a strong increase in budgets allocated to upgrading core infrastructure and analytics in Fortune 1000 companies, with a key focus on Big Data technologies. Analyst firms seem to concur – IDC expects the Big Data and Analytics market to grow from $130 billion in 2016 to more than $203 billion in 2020.
Many buyers in Fortune 1000 companies are increasingly sophisticated and discerning when it comes to Big Data technologies. They have done a lot of homework over the last few years, and are now in full deployment mode. This is now true across many industries, not just the more technology-oriented ones.
This acceleration is further propelled by the natural cycle of replacement of older technologies, which happens every few years in large enterprises. What was previously a headwind for Big Data technologies (hard to rip and replace existing infrastructure) is now gradually turning into a tailwind (“we need to replace aging technologies, what’s best in class out there?”).
Certainly, many large companies (“late majority”) are still early in their Big Data efforts, but things now seem to be evolving quickly.
Enterprise Data Moving to the Cloud
As recently as a couple of years ago, suggestions that enterprise data could be moving to the public cloud were met with “over my dead body” reactions from large enterprise CIOs, except perhaps as a development environment or to host the odd non-critical, external-facing application.
The tone seems to have started to change, most noticeably in the last year or so. We’re hearing a lot more openness – a gradual acknowledgement that “our customer data is already in the cloud in Salesforce anyway” or that “we’ll never have the same type of cyber-security budget as AWS does” – somewhat ironic considering that security was for many years the major strike against the cloud, but a testament to all the hard work that cloud vendors have put into security and compliance (e.g., HIPAA).
Undoubtedly, we’re still far from a situation where most enterprise data goes to the public cloud, in part because of legacy systems and regulation.
However, the evolution is noticeable, and will keep accelerating. Cloud vendors will do anything to facilitate it, including sending a truck to get your data.
The 2017 Big Data Landscape
Without further ado, here’s our 2017 landscape.
This year again, my FirstMark colleague Jim Hao provided immense help with the landscape.
We’ve detailed some of our methodology in the notes to this post. Thoughts and suggestions are welcome; please use the comment section of this post.
Is Consolidation Coming?
As the Big Data landscape gets busier every year, one obvious question comes to mind: is the industry on the verge of a massive wave of consolidation?
It doesn’t appear so, at least for now.
First, venture capitalists continue to be happy to fund both new and existing companies. The first few months of 2017 saw a flurry of announcements of big funding rounds for growth stage Big Data startups: Looker ($81M Series D), InsideSales ($50M Series F), DataRobot ($54M Series C), Confluent ($50M Series C), Collibra ($50M Series C), Uptake ($40M Series C), WorkFusion ($35M Series D) and MapD ($35M Series B). Also notable is Databricks, which raised a $60M Series C in December.
It’s worth noting that activity in the space is truly global, with great companies being built and funded in Europe, Israel (e.g. Voyager Labs), China (iCarbonX), etc.
Second, since our 2016 landscape, M&A activity has been steady but not particularly remarkable, perhaps in part because private company valuations have remained lofty. Of the companies on our 2016 Big Data Landscape, 41 have been acquired (see the Notes at the end of this post for a full list), a pace roughly consistent with the previous year.
On the other hand, we’ve seen some big ticket acquisitions in 2017 so far, including Mobileye (acquired by Intel for $15.3B), AppDynamics (Cisco; $3.7B), and Nimble Storage (HPE; $1.2B).
Last year was also dominated by a phenomenon that may not last much longer: large tech companies gobbling up AI startups left and right, particularly those tackling horizontal problems with great teams. Some examples: Turi (Apple), Magic Pony (Twitter), Viv Labs (Samsung), MetaMind (Salesforce), Geometric Intelligence (Uber), API.ai (Google) and Wise.io (GE). While they make horizontal AI startups a tricky category to invest in from a VC perspective, those quick acquisitions probably correspond to a specific moment in time dominated by hype and scarcity of AI engineers.
Third, some of the larger Big Data startups are becoming self-standing, public companies. Snap arguably led the revival of the IPO market for tech companies, but so far Big Data companies are the ones capitalizing on the opportunity.
While in 2016 Talend was the lone Big Data company to go public, 2017 so far is turning into an IPO bonanza for the space. MuleSoft and Alteryx went public and have done very well, both trading well above their IPO prices. At the time of writing, Cloudera is about to go public, and the gap between its last private valuation ($4.1B) and its revenue ($261M in 2016) will test the “unicorn” valuation phenomenon. MapR and location intelligence company Yext are lined up as well.
Who’s next? Palantir, after years of being one of the most secretive companies in the industry, has expressed an interest in going public. With a most recent private valuation of $20 billion, Palantir could have a blockbuster IPO if its public valuation lands anywhere near that level.
Fighting the Cloud Wars
The industry may not imminently consolidate through failure or acquisition, but there are increasing signs of “functional consolidation”, particularly in the cloud. Some of the key players there are gradually building a consolidated “Big Data + AI” offering that covers many bases, both through homegrown products and their own implementations of popular open source compute engines, thereby getting increasingly closer to the “one-stop shop” that many buyers have been hoping for.
Amazon Web Services, in particular, continues to impress with the sheer velocity and breadth of its product releases. At this stage, it offers pretty much all things Big Data and AI, including analytics frameworks, real-time analytics, databases (NoSQL, graph, etc.), business intelligence, and increasingly rich AI capabilities, particularly in deep learning. At this rate, there will soon be an AWS product in almost every infrastructure and analytics box in our Big Data Landscape.
Google, late to the cloud party, has also been aggressively building a wide Big Data offering (BigQuery, Dataflow, Dataproc, Datalab, Dataprep, etc.) and views AI as a way to leapfrog competitors. Google has had a lot to announce on the AI front over the last year, including: a new translation engine; the hiring of two great AI experts, Fei-Fei Li and Jia Li, to lead its newly created Cloud AI and Machine Learning group; a new machine learning API for video recognition; and the acquisition of data science community Kaggle.
The larger enterprise IT vendors – Microsoft, IBM, SAP, Oracle and Salesforce in particular – are also pushing hard with Big Data (and occasionally, AI) offerings, both in the cloud (most noticeably, Microsoft) and on-premises. In addition to homegrown technology building efforts, and some acquisitions, there seems to be an increasing desire to partner, especially between companies that “have the data” (repositories) and companies that “have the AI”. Noteworthy partnerships include IBM and Salesforce, and SAP and Google.
Cloud vendors are still small by enterprise IT industry standards, but the convergence between their growing ambitions (including their clear interest in moving up the enterprise stack from IaaS to applications) and the gradual move of enterprise data to the cloud opens the door to an all-out war with legacy IT vendors for control of the gigantic enterprise technology market, with Big Data and AI as the core battlefield.
I'm a venture capital investor at FirstMark in New York. I invest in early-stage technology startups, mostly at the Series A level, anywhere in the US and in Europe. I feel incredibly privileged to get to partner with outstanding entrepreneurs, and I work hard to help them succeed.