Aaron Schumacher

Senior Data Scientist and Software Engineer - Deep Learning Analytics

Bio: Aaron Schumacher is a data scientist and software engineer for Deep Learning Analytics. He has taught with Python and R for General Assembly and the Metis data science bootcamp. Aaron has also worked with data at Booz Allen Hamilton, New York University, and the New York City Department of Education. He studied mathematics at the University of Wisconsin–Madison and teaching mathematics at Bard College. Aaron's career-best breakdancing result was advancing to the semi-finals of the R16 Korea 2009 individual footwork battle.

Word Vectors and SAT Analogies

This reminded me of SAT analogy questions, which disappeared from the SAT in 2005, but looked like this:

PALTRY : SIGNIFICANCE ::
A. redundant : discussion
B. austere : landscape
C. opulent : wealth
D. oblique : familiarity
E. banal : originality

The king/queen example is not difficult, and I don’t know whether it was tested or discovered. […]

TensorFlow Clusters: Questions and Code

One way to think about TensorFlow is as a framework for distributed computing. I’ve suggested that TensorFlow is a distributed virtual machine. As such, it offers a lot of flexibility. TensorFlow also suggests some conventions that make writing programs for distributed computation tractable. When is there a cluster? A Hadoop or Spark cluster is generally […]

How NOT to program the TensorFlow Graph

Using TensorFlow from Python is like using Python to program another computer. Some Python statements build your TensorFlow program, some Python statements execute that program, and of course some Python statements aren’t involved with TensorFlow at all. Being thoughtful about the graphs you construct can help you avoid confusion and performance pitfalls. Here are a […]

TensorFlow and Queues

There are many ways to implement queue data structures, and TensorFlow has some of its own.

FIFO Queue with a list

In Python, a list can implement a first-in first-out (FIFO) queue, with slightly awkward syntax:

    >>> my_list = []
    >>> my_list.insert(0, 'a')
    >>> my_list.insert(0, 'b')
    >>> my_list.insert(0, 'c')
    >>> my_list.pop()
    'a'
    >>> my_list.pop()
    'b'

[…]
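For comparison, here is a minimal sketch of the same first-in first-out behavior using `collections.deque` from the Python standard library. This is not taken from the original post; the variable names are illustrative only. A deque appends and pops at either end in constant time, so it avoids the `insert(0, ...)` call that makes the list version awkward.

```python
from collections import deque

# A deque supports fast appends and pops at both ends,
# so it can act as a FIFO queue without list.insert(0, ...).
queue = deque()
for item in ["a", "b", "c"]:
    queue.append(item)       # enqueue at the right end

first = queue.popleft()      # dequeue the oldest item from the left end
second = queue.popleft()
remaining = list(queue)      # whatever is still waiting in the queue

print(first, second, remaining)
```

Items come out in the order they went in, matching the list example above, while each enqueue and dequeue stays cheap as the queue grows.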

Does TensorFlow Suffer from the Second-System Effect?

In The Mythical Man-Month, Fred Brooks includes an essay called The Second-System Effect. The second-system effect describes two problems that are likely when building a new project like one you’ve done before:

too many features: “frill after frill and embellishment after embellishment”
focus on the wrong features: “a tendency to refine [obsolete] techniques”

The “too many […]

Use only what you need from TensorFlow

There isn’t just one decision to use TensorFlow or not use TensorFlow; you have to make decisions about which pieces of TensorFlow you’re going to use. I’ve thought about whether TensorFlow suffers from the second-system effect, and my conclusion is that while TensorFlow has a huge abundance of features, it can’t really be said to […]

Scikit-learn trees with D3

The decision trees from scikit-learn are very easy to train and predict with, but it’s not easy to see the rules they learn. The code below makes it easier to see inside sklearn classification trees, enabling visualizations that look like this: This shows, for example, that all the irises with petal length (cm) less than […]