TensorFlow as a Distributed Virtual Machine

TensorFlow has a flexible API, and it has automatic differentiation, and it can run on GPUs. But the thing that’s really neat about TensorFlow is that it gives you a fairly general way to easily program across multiple computers.

TensorFlow’s distributed runtime, the big bottom box in this figure from the 2016 paper “TensorFlow: A system for large-scale machine learning”, is the part of TensorFlow that runs the computation graph.

[Figure: layered TensorFlow architecture]

The computation graph, specified with protocol buffers, is much higher level than Java virtual machine (JVM) bytecode. But I think it’s interesting to think of the TensorFlow distributed runtime as a sort of virtual machine: not a whole-system virtual machine, but a process virtual machine that, like the JVM, can “execute computer programs in a platform-independent environment.” The Python API then acts like a domain-specific language for programming the TensorFlow graph.
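To make the DSL idea concrete, here’s a minimal sketch using the TF 1.x graph API (the API current when this post was written). The Python calls only build the graph; the runtime, local here but potentially distributed, is what actually executes it.

```python
import tensorflow as tf  # TF 1.x graph API

# These Python calls don't compute anything themselves; they build
# a protocol-buffer graph for the runtime to execute later.
a = tf.constant(2.0, name="a")
b = tf.constant(3.0, name="b")
c = tf.multiply(a, b, name="c")

print(c)  # a symbolic Tensor, not the number 6.0

# The Session hands the graph to the (possibly distributed) runtime.
with tf.Session() as sess:
    print(sess.run(c))  # 6.0
```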

Unlike the JVM, and unlike any other system I know, TensorFlow lets you directly put computation on multiple machines, pretty much however you want, and then it quietly handles all the details for you. Wherever it needs to, TensorFlow adds send and receive nodes so the graph can be executed as specified. This is shown in this figure from the 2015 white paper “TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems” (available on arXiv and as a PDF on tensorflow.org).

[Figure: distributing a TensorFlow graph]

The boxes in the figure are labeled with device names, but with TensorFlow those devices could be on the same machine or on different machines, and it hardly matters.
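Here’s a hedged sketch of what that placement looks like in the TF 1.x API. The cluster addresses are hypothetical placeholders, and each machine would also need to run a tf.train.Server for the session to connect to. The only distributed-systems work you do is choosing the device strings; the runtime inserts the send and receive nodes itself.

```python
import tensorflow as tf

# Hypothetical two-machine cluster; the host:port addresses are
# placeholders. Each machine would run a server such as
#   tf.train.Server(cluster, job_name="worker", task_index=i)
cluster = tf.train.ClusterSpec({
    "worker": ["machine-a.example.com:2222",
               "machine-b.example.com:2222"],
})

with tf.device("/job:worker/task:0"):
    x = tf.constant([[1.0, 2.0]])      # placed on the first machine

with tf.device("/job:worker/task:1"):
    w = tf.constant([[3.0], [4.0]])    # placed on the second machine
    y = tf.matmul(x, w)                # runtime adds send/recv for x

# Connect to any task in the cluster; TensorFlow handles the rest.
with tf.Session("grpc://machine-a.example.com:2222") as sess:
    print(sess.run(y))  # [[11.0]]
```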

The flexibility and ease that result are incredible. It isn’t terribly hard to imagine implementing your own map-reduce system on top of TensorFlow. It might even be reasonably performant, if you have a distributed file system! The common TensorFlow distributed model training techniques are a kind of map-reduce, after all. But you could do nearly anything! This is pretty cool.
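For instance, here’s a toy sketch of the map-reduce idea, under the same assumptions as the sketch above (hypothetical worker devices, with servers already running): each worker maps a function over its shard of the data, and the partial results are reduced on one device.

```python
import tensorflow as tf

# Toy map-reduce: square the numbers in each shard (map), then sum
# everything (reduce). Device strings are hypothetical placeholders.
shards = [tf.constant([1.0, 2.0]), tf.constant([3.0, 4.0])]

partials = []
for i, shard in enumerate(shards):
    with tf.device("/job:worker/task:%d" % i):
        # Map and locally combine on the worker that holds the shard.
        partials.append(tf.reduce_sum(tf.square(shard)))

with tf.device("/job:worker/task:0"):
    total = tf.add_n(partials)  # reduce; runtime ships the partial sums

# Against a running cluster, sess.run(total) would return 30.0.
```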

Originally posted at planspace.org/

Aaron Schumacher

Aaron Schumacher is a data scientist and software engineer for Deep Learning Analytics. He has taught with Python and R for General Assembly and the Metis data science bootcamp. Aaron has also worked with data at Booz Allen Hamilton, New York University, and the New York City Department of Education. He studied mathematics at the University of Wisconsin–Madison and teaching mathematics at Bard College. Aaron's career-best breakdancing result was advancing to the semi-finals of the R16 Korea 2009 individual footwork battle.
