What is recognizable about a particular artist’s style? What parts can be delegated to an assistant? Can AI play the role of assistant or even collaborator? How would we ever get enough data for training? How little data could we get away with?
Exploring these questions with GANs, image-to-image translation and extreme augmentation of very small image data sets, in a series of experiments in human/AI cartoon drawing, led not only to technical insights about machine learning methods but also to insights into the underlying problem domain. A former Hollywood animator and professional sculptor, I learned things about drawing cartoon characters in the process of automating their generation that I hadn’t learned in decades of drawing them.
But back to ML…
The Big Problem of Big Data Requirements
A big problem with supervised machine learning is the need for huge amounts of labeled data. At least, it’s a big problem if you don’t have the labeled data, and even now, in a world awash with big data, most of us don’t. While a few companies have access to enormous quantities of certain kinds of labeled data, for most organizations and many applications, creating sufficient quantities of exactly the kind of labeled data desired is cost prohibitive or impossible. Sometimes the domain is one where there just isn’t much data: diagnosing a rare disease, say, or determining whether a signature matches a few known exemplars. Other times, the volume of data needed and the cost of human labeling by Amazon Mechanical Turk workers or summer interns is just too high. Paying to label every frame of a movie-length video adds up fast, even at a penny a frame.
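To put rough numbers on that last point, a back-of-the-envelope sketch (the film length, frame rate and per-frame price here are illustrative assumptions, not figures from the experiments):

```python
# Back-of-the-envelope: labeling every frame of a feature film.
# Illustrative assumptions: a 90-minute film, 24 frames per second,
# one cent per labeled frame.
minutes, fps, price_per_frame = 90, 24, 0.01
frames = minutes * 60 * fps
cost = frames * price_per_frame
print(f"{frames} frames, ${cost:,.2f}")  # 129600 frames, $1,296.00
```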
With our first experiment, DragonPaint, we confront the problem of deep learning’s big labeled data requirements, using a rule-based strategy for extreme augmentation of small data sets and a borrowed TensorFlow image-to-image translation model to automate cartoon coloring with very limited training data.
How limited? Well, if I were going to write a program to color my cartoon characters for me, I didn’t want to draw and color a lifetime supply of cartoon characters just to train the model. The tens of thousands or hundreds of thousands of examples deep learning models often require were out of the question. How many was I willing to draw? Maybe 30. So I drew a few dozen cartoon flowers and dragons and asked, could I somehow turn these into a training set?
Faced with a shortage of training data, we should first ask whether there is a good non-machine-learning approach to our problem. If there’s not a complete solution, is there a partial solution, and would a partial solution do us any good? Do we even need machine learning to color flowers and dragons, or can we specify geometric rules for coloring?
I can tell a kid how I want my drawings colored. Make the flower’s center orange and the petals yellow. Make the dragon’s body orange and the spikes yellow.
At first, that doesn’t seem helpful, since our computer doesn’t know what a center or petal or body or spike is. But it turns out that we can define the flower or dragon parts in terms of connected components and get a geometric solution for coloring about 80% of our drawings. 80% isn’t enough, but we can bootstrap from that partial rule-based solution to 100% using strategic rule-breaking transformations, augmentations and ML.
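As a sketch of what such a geometric rule might look like, here is a minimal connected-component coloring rule for a flower, using SciPy. The specific rules (the component touching the corner is background, the component under the image center is the flower’s center, everything else is a petal) are illustrative assumptions, not DragonPaint’s actual implementation.

```python
import numpy as np
from scipy import ndimage

def color_flower(drawing):
    """Color a binary line drawing of a flower by connected components.

    Hypothetical rule set, assuming black lines (0) on white (1): the white
    region touching the corner is background, the region containing the image
    center is the flower's center (orange), and the rest are petals (yellow).
    """
    labels, n = ndimage.label(drawing)           # label connected white regions
    h, w = drawing.shape
    background = labels[h - 1, w - 1]            # assume a corner pixel is background
    center_label = labels[h // 2, w // 2]        # region under the image center
    out = np.zeros((h, w, 3), dtype=np.uint8)    # black lines by default
    out[labels == background] = (255, 255, 255)  # white background
    out[labels == center_label] = (255, 128, 0)  # orange center
    petals = (labels != 0) & (labels != background) & (labels != center_label)
    out[petals] = (255, 255, 0)                  # yellow petals
    return out
```

The same pattern extends to dragons: pick out the body as the largest interior component and treat the small components along its boundary as spikes.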
Experiment 1B: DragonPaint – Extreme Augmentations
It’s commonplace in computer vision to augment an image training set with geometric transformations like rotation, translation and zoom.
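A generic version of those affine augmentations, sketched with SciPy (the parameter ranges are illustrative guesses, not DragonPaint’s actual settings):

```python
import numpy as np
from scipy import ndimage

def affine_augment(image, rng, cval=255.0):
    """Return one randomly rotated, translated and zoomed copy of a
    grayscale drawing (white background = 255). Ranges are illustrative."""
    h, w = image.shape
    out = ndimage.rotate(image, rng.uniform(-20, 20), reshape=False,
                         cval=cval, order=1)
    out = ndimage.shift(out, rng.uniform(-5, 5, size=2), cval=cval, order=1)
    out = ndimage.zoom(out, rng.uniform(0.9, 1.1), cval=cval, order=1)
    # zoom changes the array size; center-crop or pad back to (h, w)
    canvas = np.full((h, w), cval)
    oh, ow = out.shape
    ch, cw = min(h, oh), min(w, ow)
    top, left = max((oh - h) // 2, 0), max((ow - w) // 2, 0)
    dt, dl = max((h - oh) // 2, 0), max((w - ow) // 2, 0)
    canvas[dt:dt + ch, dl:dl + cw] = out[top:top + ch, left:left + cw]
    return canvas

# e.g. turn one drawing into a hundred augmented training images
rng = np.random.default_rng(0)
drawing = np.full((64, 64), 255.0)
drawing[28:36, 8:56] = 0.0                    # a simple black stroke
augmented = [affine_augment(drawing, rng) for _ in range(100)]
```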
But what if we need to turn sunflowers into daisies or make a dragon’s nose bulbous or pointy?
Certain homeomorphisms of the unit disk make good daisies and Gaussian filters change a dragon’s nose. Both were extremely useful for creating augmentations for our data set, but they also started to change the style of the drawings in ways that an affine transformation could not.
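One family of such maps can be sketched as a radial homeomorphism of the unit disk, r ↦ r^γ, which fattens or pinches shapes around the image center depending on γ. This particular map and its parameters are an illustrative choice, not necessarily the ones used in the experiments:

```python
import numpy as np
from scipy import ndimage

def radial_warp(image, gamma, cval=255.0):
    """Warp a grayscale drawing by a radial homeomorphism of the unit disk.

    The output pixel at normalized radius r samples the input at r**gamma,
    so gamma > 1 magnifies the center and gamma < 1 shrinks it; points
    outside the disk are left fixed. Illustrative sketch only.
    """
    h, w = image.shape
    yy, xx = np.mgrid[0:h, 0:w]
    u = (xx - w / 2) / (w / 2)               # normalize to [-1, 1]
    v = (yy - h / 2) / (h / 2)
    r = np.sqrt(u**2 + v**2)
    theta = np.arctan2(v, u)
    r_src = np.where(r <= 1, np.minimum(r, 1) ** gamma, r)
    xs = (r_src * np.cos(theta) + 1) * (w / 2)
    ys = (r_src * np.sin(theta) + 1) * (h / 2)
    return ndimage.map_coordinates(image, np.array([ys, xs]),
                                   order=1, cval=cval)
```

At γ = 1 the map is the identity; moving γ away from 1 bends the petal and nose shapes smoothly without tearing the drawing, which is exactly what makes these maps useful augmentations.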
By changing the character of lines and drawings, they inspired questions. What defines an artist’s style, either to an outside viewer or to the artist themselves? When does an artist adopt as their own a drawing they could not have made without the algorithm? When does the subject matter become unrecognizable? What’s the difference between a tool, an assistant and a collaborator?
Experiment 2: ComplexityLayers
In ComplexityLayers, we shifted from coloring to drawing and asked again how little we can draw, but this time we mean not only a small number of drawings but also only the simplest of drawings. We experiment with human/AI collaborative generation of increasingly complex characters by learning temporal stages of drawing complexity. Can the model be trained to draw the petals on a flower? The spikes down a dragon’s back? The spots on a giraffe?
How far can we go?
How little can we draw for input, and how much variation and complexity can we create while staying within a subject and style recognizable as the artist’s? What would we need to do to make an infinite parade of giraffes? And if we had one, what could we do with it?
An Infinite Parade of Giraffes provides a gentle introduction to machine learning applied to computer vision while also offering techniques for the more advanced user.
A former ship designer, national lab mathematician and Hollywood special effects artist, Gretchen Greene is a computer vision scientist and machine learning engineer working with Cambridge startups on everything from wearables to welding. Greene has been interviewed by Forbes China, the Economist and the BBC and her plasma cut steel sculptures have appeared in Architectural Digest and sold to Dior. Greene has a CPhil and MS in math from UCLA and a JD from Yale.