Deep learning has massive potential, but realizing it is often difficult. These models’ complexity demands extensive training before they’re ready for use, making implementation slow and often expensive.
Facebook AI Research (FAIR) and the University of Guelph may have found a solution. In a recent research paper, the team outlined a new open-source model that predicts the initial parameters of deep learning networks. The model, an improved graph hypernetwork dubbed GHN-2, could substantially impact the machine learning industry.
What Is GHN-2?
GHN-2 aims to simplify and speed up neural network development. It does this by predicting the parameters of a network architecture it has never seen, replacing bespoke optimizers and manual tuning.
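Conceptually, a hypernetwork is a model whose output is another model’s weights: feed in a description of an architecture, get back a full set of parameters in a single forward pass. The toy sketch below illustrates that idea in plain NumPy; every name is illustrative, and this is not the released GHN-2 API (the real model encodes the architecture’s computation graph with a graph neural network, not a hand-made feature vector).

```python
import numpy as np

rng = np.random.default_rng(0)

def architecture_features(layer_sizes):
    """Encode an architecture as a fixed-length feature vector.
    (GHN-2 uses a graph neural network over the computation graph;
    this toy version just summarizes the layer sizes.)"""
    return np.array([len(layer_sizes), sum(layer_sizes), max(layer_sizes)],
                    dtype=float)

class ToyHyperNetwork:
    """Maps architecture features to a flat parameter vector."""
    def __init__(self, n_params, n_features=3):
        self.W = rng.normal(scale=0.1, size=(n_params, n_features))

    def predict_parameters(self, layer_sizes):
        # One forward pass: architecture in, weights out -- no gradient
        # descent on the target network itself.
        return self.W @ architecture_features(layer_sizes)

target_arch = [4, 8, 2]                 # an "unseen" architecture
hyper = ToyHyperNetwork(n_params=10)
theta = hyper.predict_parameters(target_arch)
print(theta.shape)                      # a flat vector of 10 parameters
```

The point of the sketch is the shape of the computation: the target network’s parameters are an *output* of another model, not the result of an optimization loop.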
The researchers started by creating DeepNets-1M, a dataset of 1 million neural network architectures. They then trained a modified graph hypernetwork (GHN) on the dataset using meta-learning. This method took inspiration from recent work on another algorithm, Differentiable Architecture Search (DARTS).
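The meta-learning setup can be sketched as a loop: sample an architecture from the training set, predict its parameters with the hypernetwork, evaluate the resulting network on a data batch, and use that loss to update the hypernetwork itself (not the sampled network). Below is a minimal NumPy version of that loop under toy assumptions: the "architectures" are just feature vectors, and the target networks are linear models.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy task: fit y = X @ w_true with a linear model whose weights are
# *predicted* by the hypernetwork rather than trained directly.
d = 3
w_true = np.array([1.0, -2.0, 0.5])
X = rng.normal(size=(64, d))
y = X @ w_true

# Each "architecture" is just a feature vector here; GHN-2 instead
# encodes the full computation graph with a graph neural network.
archs = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]

W_h = rng.normal(scale=0.1, size=(d, 2))   # hypernetwork weights
lr = 0.05

for step in range(500):
    phi = archs[step % len(archs)]         # sample an architecture
    w = W_h @ phi                          # predict its parameters
    resid = X @ w - y
    loss = np.mean(resid ** 2)
    # Backpropagate through the prediction: the gradient updates the
    # hypernetwork, not the sampled network's weights.
    grad_w = 2 * X.T @ resid / len(y)
    W_h -= lr * np.outer(grad_w, phi)

print(loss)   # small after meta-training
```

After enough such steps, the hypernetwork has learned to emit good parameters for the architectures it has seen, which is what lets it generalize to similar unseen ones.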
The researchers then tested GHN-2 by using it to initialize a 24 million-parameter model based on the ResNet-50 architecture. GHN-2 produced parameters that reached 60% accuracy on the CIFAR-10 dataset with no gradient updates. Satisfied with these results, the team released the model and code as open source, along with the DeepNets-1M dataset and several benchmarks.
Benefits of the GHN-2 Model
Most deep learning training projects are long, expensive processes. Popular optimization algorithms like Adam and stochastic gradient descent (SGD) can take thousands of iterations — often hours of computation — to find the parameters that minimize a model’s loss function. By contrast, GHN-2 can perform the same task in less than a second.
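For contrast, here is the kind of loop conventional training relies on: starting from an uninformed initialization, SGD takes many small steps downhill on the loss. The toy NumPy sketch below uses a simple regression problem as a stand-in for a real training run.

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy regression problem standing in for a real training run.
w_true = np.array([2.0, -1.0])
X = rng.normal(size=(128, 2))
y = X @ w_true

w = np.zeros(2)    # conventional start: an uninformed initialization
lr = 0.1

# SGD must take many small steps to find good parameters...
for _ in range(200):
    i = rng.integers(0, len(y), size=16)               # mini-batch
    grad = 2 * X[i].T @ (X[i] @ w - y[i]) / len(i)     # batch gradient
    w -= lr * grad

# ...whereas GHN-2 replaces this entire loop with a single forward
# pass of an already meta-trained hypernetwork.
print(w)
```

The per-step cost here is trivial, but for a real deep network each step means a full forward and backward pass over a batch — and those steps number in the thousands, which is where the hours of computation come from.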
This also translates into energy savings. Running these complex computations takes a considerable amount of power, much of which still comes from fossil fuels. Consequently, training a single large deep learning model can generate 626,000 pounds of CO2, roughly the lifetime emissions of five cars.
Those emissions are concerning, considering 77% of consumers today use some form of AI technology. Deep learning algorithms that aren’t energy-efficient could cause considerable harm to the environment. Since GHN-2 works so much faster than traditional methods, it offers a way to shrink that footprint.
In addition to being remarkably fast, GHN-2 is also highly accurate. The researchers tested its reliability by comparing its performance against two other meta-models and against parameters trained with traditional SGD on two datasets: CIFAR-10 and ImageNet.
GHN-2 substantially outperformed the other meta-models. A single forward pass of GHN-2 produced accuracy comparable to roughly 2,500 iterations of SGD on CIFAR-10 and 5,000 iterations on ImageNet.
What Open-Sourcing GHN-2 Means for Machine Learning
Open-sourcing the model opens the door to widespread benefits across the machine learning industry. Data scientists will spend far less time fitting initial parameters, so they can produce reliable deep learning models faster. Swifter deployment would, in turn, help realize deep learning’s full potential sooner.
These benefits address many companies’ leading barriers to machine learning adoption. For example, 32% of surveyed organizations say limited budgets are their most significant barrier, and 22% cite difficulty building and maintaining models. GHN-2’s speed and accuracy help overcome those obstacles.
Less time setting parameters means less work and money going into development. As a result, companies can comfortably deploy deep learning models with lower budgets and less expertise. Machine learning as a practice could reach new heights as these benefits attract new adopters.
Despite its benefits, GHN-2 isn’t perfect. Its most notable limitation is that you have to train a new meta-model for each domain-specific dataset, which cuts into its primary advantage: speed.
If you’re using GHN-2 for the provided DeepNets-1M dataset or one you’ve already tried it with, it will perform as expected. However, if you wanted to use the meta-model on a new domain-specific dataset, you’d need to retrain it. That could take time and effort, which GHN-2 normally saves.
While GHN-2 showed impressive accuracy during tests, it may not perform as well on every architecture. The researchers note that although its predicted parameters consistently outperform random initialization, a single forward pass may not yield high accuracy for some architectures.
Deep Learning Is Advancing
Not long ago, deep learning was little more than theoretical. Now, it’s a growing part of data science, and advances like the GHN-2 model make it more viable than ever.
GHN-2 isn’t perfect, but it still represents an impressive step forward. Similar projects stemming from the meta-model’s open-source code could take this innovation further. Deep learning could become more affordable and accessible as a result.