matmul() is eating software
Posted by Joseph Reisinger (Co-founder, Premise), September 5, 2017
Last week, Zak Stone from Google Brain gave a talk at South Park Commons that wove together several threads shaping the future of machine learning: TensorFlow, XLA, Cloud TPUs, TFX, and TensorFlow Lite. He also hinted at even more exciting work not quite ready for public consumption. (Fun fact about Zak: he was literally in the first-ever YC batch.)
As a platform, TensorFlow is an ambitious bet on being everything to everyone: fast, flexible, and production-ready (choose three). Execution and experimentation must form a fast loop to keep engineers productive; static computation graphs are expressed in a high-level language (Python) for flexibility; and graph compilation allows precise optimization for specific target architectures.
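The "static graph expressed in Python" idea is easier to see in miniature. Below is a toy define-then-run sketch in plain Python: the graph is first built as data, and only a separate step executes it. The `Node` class, the `placeholder`/`add`/`mul` helpers, and `run` are hypothetical illustrations, not TensorFlow's API (TensorFlow 1.x expressed the same split with `tf.placeholder` and `tf.Session.run`).

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Node:
    """Hypothetical graph node: an operation name, its inputs, and an optional name."""
    op: str
    inputs: tuple = ()
    name: str = ""

def placeholder(name):
    return Node("placeholder", (), name)

def add(a, b):
    return Node("add", (a, b))

def mul(a, b):
    return Node("mul", (a, b))

# Kernels that execute each op type once the graph is run.
KERNELS = {"add": lambda x, y: x + y, "mul": lambda x, y: x * y}

def run(node, feeds):
    """Evaluate `node` by recursively evaluating its inputs; `feeds` binds placeholders."""
    if node.op == "placeholder":
        return feeds[node.name]
    args = [run(inp, feeds) for inp in node.inputs]
    return KERNELS[node.op](*args)

# Step 1: build the graph. This only records structure; nothing is computed yet.
x = placeholder("x")
y = add(mul(x, x), x)  # y = x*x + x

# Step 2: run it with concrete inputs.
print(run(y, {"x": 3}))  # 12
```

Because the whole graph exists as data before anything executes, a compiler can inspect and rewrite it first, which is the hook that graph compilation for a specific target depends on.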
As an open-source project, TensorFlow has been incredibly successful, garnering over 20,000 commits since November 2015. The main TensorFlow GitHub repository is synced bidirectionally at least once a week with Google’s internal mirror and has received major and minor contributions from engineering teams at Intel, Microsoft, IBM, RStudio, Minds.ai, and other companies.
In terms of reach, TensorFlow Lite will make TensorFlow models run more efficiently on mobile and embedded devices later this year, and projects like XLA are even more ambitious. XLA supports ahead-of-time and just-in-time compilation of the linear algebra primitives underlying deep learning, generating accelerated machine code for any target backend. The promise of XLA is a quantum leap in hierarchical optimization, not just on GPGPUs but on any architecture that can parallelize linear algebra primitives.
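One optimization that compilers like XLA apply is operator fusion: rather than materializing a full intermediate tensor after every elementwise operation, the compiler emits a single pass that applies them all. The pure-Python sketch below only illustrates the idea on lists; the function names are hypothetical, and this is not XLA's algorithm or API.

```python
def unfused(xs, w, b):
    # Three separate passes, each allocating a full intermediate list.
    scaled = [x * w for x in xs]
    shifted = [s + b for s in scaled]
    return [max(0.0, s) for s in shifted]  # ReLU

def fused(xs, w, b):
    # One pass and no intermediates: the shape of code a fusing compiler aims to emit.
    return [max(0.0, x * w + b) for x in xs]

xs = [-2.0, -0.5, 0.0, 1.5]
print(fused(xs, 2.0, 1.0))  # [0.0, 0.0, 1.0, 4.0]
```

Both versions compute the same values; the fused one simply avoids writing and re-reading two intermediate buffers, which on real accelerators is often where the time goes.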
Internally, TensorFlow is used in a staggering number of Google projects as the company follows Sundar Pichai’s call to become a true “AI-first” company. And the trend toward machine learning-based software is accelerating well beyond Google: Amazon, Apple, Baidu, Facebook, Microsoft, Salesforce, Uber, Lyft, and nearly every other major tech company have hired dedicated research teams to help push machine learning into production. Alongside these major players, deep learning platforms are also proliferating: PyTorch and Caffe2 from Facebook, CNTK from Microsoft, Core ML from Apple, and MXNet from Amazon, to name just a few.
What does software engineering look like in 10 years?
With the rise of machine learning frameworks, clean abstractions and modular design patterns are being replaced by high-dimensional floating-point tensors and efficient matrix multiplication. As this trend continues, it is fundamentally altering the practice of software engineering.
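At the bottom of all of these frameworks sits dense matrix multiplication. The naive triple loop below shows the operation the title refers to, with matrices as plain lists of rows; real frameworks dispatch to heavily optimized kernels (cuBLAS on NVIDIA GPUs, for example, or compiler-generated code) rather than anything like this.

```python
def matmul(a, b):
    """Naive product of an n x k matrix `a` and a k x m matrix `b` (lists of rows)."""
    n, k, m = len(a), len(b), len(b[0])
    assert all(len(row) == k for row in a), "inner dimensions must match"
    out = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            acc = 0.0
            for p in range(k):
                acc += a[i][p] * b[p][j]  # one multiply and one add per step
            out[i][j] = acc
    return out

print(matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]]))  # [[19.0, 22.0], [43.0, 50.0]]
```

The inner loop does k multiply-adds for each of the n·m outputs, so the product costs about 2·n·k·m floating-point operations, the unit in which accelerator throughput is usually quoted.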
In “Machine Learning: The High-Interest Credit Card of Technical Debt”, D. Sculley and his co-authors map out the myriad ways that machine learning systems encourage (or worse, necessitate) poor software design choices. These systems “have all the basic code complexity issues as normal code, but also have a larger system-level complexity that can create hidden debt.”
Machine learning systems erode model boundaries and abstraction by tightly coupling all system inputs: the desired behavioral invariants flow not from software logic, but from the specific external data driving them. Although tools exist to identify code dependencies via static analysis and linkage graphs, comparable tools for analyzing data dependencies are generally not yet available.
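What the missing data-dependency tooling has to answer can be sketched as a reverse-lineage lookup: given an upstream source that is about to change, which features and models are affected? This minimal sketch assumes the lineage is recorded by hand as a dictionary; every feature, source, and model name in it is hypothetical.

```python
from collections import defaultdict, deque

# Hypothetical lineage: each node lists the upstream nodes it reads from.
LINEAGE = {
    "clicks_7d": ["click_log"],
    "ctr": ["clicks_7d", "impression_log"],
    "user_age_bucket": ["user_profile"],
    "ranking_model": ["ctr", "user_age_bucket"],
}

def downstream(source, lineage):
    """Return every node that transitively depends on `source` (breadth-first search)."""
    consumers = defaultdict(set)
    for node, inputs in lineage.items():
        for inp in inputs:
            consumers[inp].add(node)
    seen, queue = set(), deque([source])
    while queue:
        for nxt in consumers[queue.popleft()]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

print(sorted(downstream("click_log", LINEAGE)))  # ['clicks_7d', 'ctr', 'ranking_model']
```

The hard part in practice is not the graph search but keeping the lineage accurate, since nothing in the code itself declares that a model silently depends on a given log.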
Sculley et al. discuss several system-design anti-patterns that will resonate with machine learning practitioners:
- Glue code, a system design pattern “in which a massive amount of supporting code is written to get data into and out of general-purpose packages.”
- Pipeline jungles, which evolve organically over time as the data preparation system “may become a jungle of scrapes, joins, and sampling steps, often with intermediate files output.”
- Configuration debt, which accumulates as systems and pipelines mature, collecting “a wide range of configurable options, including which features are used, how data is selected, a wide variety of algorithm-specific learning settings, potential pre- or post-processing, verification methods, etc.”
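One common mitigation for configuration debt is to declare every option in a single typed schema and reject anything unrecognized, so a misspelled key fails loudly instead of being silently ignored. A minimal sketch, assuming a Python dataclass as the schema; the `TrainConfig` fields and `load_config` helper are hypothetical, and types are not enforced here (a fuller version would check them too).

```python
from dataclasses import dataclass, fields

@dataclass(frozen=True)
class TrainConfig:
    """Hypothetical training config: every option is declared, with a default, in one place."""
    learning_rate: float = 0.01
    batch_size: int = 128
    features: tuple = ("ctr", "user_age_bucket")

def load_config(raw: dict) -> TrainConfig:
    known = {f.name for f in fields(TrainConfig)}
    unknown = set(raw) - known
    if unknown:
        # Catch typos such as "learning_rte" instead of falling back to a default.
        raise ValueError(f"unknown config keys: {sorted(unknown)}")
    cfg = TrainConfig(**raw)
    if cfg.learning_rate <= 0 or cfg.batch_size <= 0:
        raise ValueError("learning_rate and batch_size must be positive")
    return cfg

cfg = load_config({"batch_size": 256})
print(cfg.batch_size)  # 256
# load_config({"learning_rte": 0.1}) raises ValueError: unknown config keys: ['learning_rte']
```

Freezing the dataclass also means a loaded config cannot be mutated halfway through a run, which keeps the configuration that was logged identical to the one that was used.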
And even in smaller, less complicated projects, practitioners struggle with issues related to:
- Model architecture and weights versioning during experimentation — particularly when models are partially pre-trained with a different regime or weights are borrowed from other runs;
- Data source and feature versioning;
- Domain-shifts between the experimentation environment and production deployment;
- Monitoring inference quality in production.
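Two of these chores can be sketched with the standard library alone, assuming artifacts are small enough to serialize as JSON: content-hash every dataset, weight set, and parent run so an experiment can be traced and reproduced, and compare live feature statistics against training statistics to flag domain shift. The run-record fields and the 3-standard-deviation threshold are hypothetical choices, not a standard, and a mean-shift check only catches the crudest kind of drift.

```python
import hashlib
import json
import statistics

def fingerprint(artifact) -> str:
    """Short content hash of a JSON-serializable artifact (data rows, weights, config)."""
    blob = json.dumps(artifact, sort_keys=True).encode("utf-8")
    return hashlib.sha256(blob).hexdigest()[:12]

def mean_shift(train_values, live_values) -> float:
    """Distance between live and training means, in training standard deviations."""
    gap = abs(statistics.mean(live_values) - statistics.mean(train_values))
    return gap / statistics.stdev(train_values)

# Hypothetical run record: every input to a model is pinned by content hash.
run_record = {
    "data": fingerprint([[0.1, 1], [0.4, 0]]),
    "weights": fingerprint({"w": [0.5, -0.2], "b": 0.1}),
    "init_from": None,  # set to a parent run's weights hash when weights are borrowed
}

train_ctr = [0.10, 0.12, 0.11, 0.09, 0.13]
live_ctr = [0.20, 0.22, 0.19, 0.21]
DRIFT_THRESHOLD = 3.0  # hypothetical alerting threshold
print(mean_shift(train_ctr, live_ctr) > DRIFT_THRESHOLD)  # True: live CTR has drifted
```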
One answer may be found in TFX, an internal platform developed at Google for distributing and serving machine learning models in production:
Creating and maintaining a platform for reliably producing and deploying machine learning models requires careful orchestration of many components — a learner for generating models based on training data, modules for analyzing and validating both data as well as models, and finally infrastructure for serving models in production. This becomes particularly challenging when data changes over time and fresh models need to be produced continuously. Unfortunately, such orchestration is often done ad hoc using glue code and custom scripts developed by individual teams for specific use cases, leading to duplicated effort and fragile systems with high technical debt.
TFX standardizes these patterns and components and integrates them into a single platform that simplifies the platform configuration, and reduces the time to production from the order of months to weeks, while providing platform stability that minimizes service disruptions.
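The orchestration described above (validate the data, train, validate the model against what is already serving, and only then push) can be sketched as a pipeline with explicit gates. Every function here is a hypothetical stand-in written for illustration, not a TFX component; the open-sourced pieces such as TensorFlow Serving and tf.Transform have their own APIs.

```python
def validate_data(rows, schema):
    """Data validation: every row must carry each schema field with the right type."""
    for i, row in enumerate(rows):
        for key, expected_type in schema.items():
            if not isinstance(row.get(key), expected_type):
                raise ValueError(f"row {i}: missing or mistyped field {key!r}")

def train(rows):
    """Toy learner: always predicts the mean training label."""
    mean_label = sum(r["label"] for r in rows) / len(rows)
    return lambda row: mean_label

def evaluate(model, rows):
    """Mean squared error on held-out rows (lower is better)."""
    return sum((model(r) - r["label"]) ** 2 for r in rows) / len(rows)

def run_pipeline(train_rows, eval_rows, schema, baseline_loss):
    """Validate data, train, validate the model, and return it only if it may be pushed."""
    validate_data(train_rows, schema)
    validate_data(eval_rows, schema)
    model = train(train_rows)
    loss = evaluate(model, eval_rows)
    # Model validation gate: never push a model that is worse than the one in production.
    return model if loss <= baseline_loss else None

schema = {"x": float, "label": float}
train_rows = [{"x": 1.0, "label": 1.0}, {"x": 2.0, "label": 3.0}]
eval_rows = [{"x": 3.0, "label": 2.0}]
pushed = run_pipeline(train_rows, eval_rows, schema, baseline_loss=0.5)
print(pushed is not None)  # True: loss 0.0 beats the baseline
```

The design point is that the gates live in one shared pipeline rather than in each team's glue scripts, so a malformed batch or a regressed model is stopped the same way everywhere.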
Some parts of TFX have already been open-sourced, including TensorFlow Serving and tf.Transform.
What does hardware look like in 10 years?
Moore’s Law is slowing down, and we’re poised to re-enter the Golden Age of Architecture, with rapid development across a wider variety of chips and instruction sets. Companies like Nervana (Intel), NVIDIA, Cerebras, and Google are all working on next-generation hardware architectures that accelerate linear algebra for machine learning. By default, each of these architectures would typically require its own low-level, hand-optimized library of primitives à la cuDNN. Countering this fragmentation will require an enormous community effort around more general compiler frameworks such as XLA.
Google’s Tensor Processing Units (TPUs) are perhaps the furthest along in becoming a generally available alternative to the current GPGPU hegemony. Each Cloud TPU provides up to 180 teraflops of floating-point performance and 64 GB of ultra-high-bandwidth memory, and multiple Cloud TPUs can be connected together. Unlike previous supercomputer architectures, TPUs are designed from the ground up to reach peak performance on the linear algebra workloads common in machine learning.
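A back-of-envelope calculation puts those numbers in perspective. A square n×n by n×n matmul costs about 2n³ floating-point operations, so dividing by the 180-teraflop peak gives a best-case time that real workloads rarely reach. The sketch assumes 64 GB means 64×10⁹ bytes and that matrices hold 4-byte float32 values; both are assumptions made for illustration.

```python
PEAK_FLOPS = 180e12      # quoted peak for one Cloud TPU
MEMORY_BYTES = 64e9      # 64 GB, read as 64 * 10**9 bytes (assumption)
BYTES_PER_ELEM = 4       # float32 elements (assumption)

def matmul_seconds(n: int, peak_flops: float = PEAK_FLOPS) -> float:
    """Best-case time for an (n x n) @ (n x n) product running at peak throughput."""
    flops = 2 * n ** 3   # n**3 multiply-add pairs, 2 FLOPs each
    return flops / peak_flops

def fits_in_memory(n: int) -> bool:
    """Whether both inputs and the output (three n x n matrices) fit on one device."""
    return 3 * n * n * BYTES_PER_ELEM <= MEMORY_BYTES

for n in (1024, 8192, 131072):
    print(f"n={n}: {matmul_seconds(n) * 1e3:.3f} ms at peak, fits={fits_in_memory(n)}")
```

For n = 8192 this comes to roughly 6 ms at peak. At n = 131072 the three float32 matrices need about 206 GB and no longer fit on a single device, one motivation for connecting devices together.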
TPUs are integrated with TensorFlow, and Google provides both a paid hosted infrastructure option (Cloud TPU) as well as a grant program for ML experts who want early access to the hardware and are willing to share their research with the world via publications and open-source software:
To accelerate the pace of open machine-learning research, we are introducing the TensorFlow Research Cloud (TFRC), a cluster of 1,000 Cloud TPUs that will be made available free of charge to support a broad range of computationally-intensive research projects that might not be possible otherwise.
Graph computation and deep learning libraries such as TensorFlow are a major driving force behind the future of computing, requiring us to rethink systems architecture, from hardware to compilers to higher level programming languages and design patterns.
It is humbling to see the sheer amount of work ahead of us as software architects, engineers, researchers, and practitioners, but it is also an incredibly exciting time to be working in AI. As Zak summed up the hope:
When I was in grad school, most of these amazing new applications weren’t even possible — what will it be like when people can take this machine learning technology for granted and start doing things we can’t even envision now? What will the first wave of TensorFlow-native products be?
This is a summary of a talk Zak Stone gave at the South Park Commons AI Speaker Series titled “TensorFlow, Cloud TPUs, and ML Progress.” Selected slides from the talk, provided by Google, were used with permission in the original posting.