In a blog post last week, Google Research introduced AdaTape, a new transformer-based AI approach that utilizes adaptive computation. AdaTape uses an adaptive function to create an elastic input sequence that can modulate its computational budget.
According to the accompanying research paper, AdaTape injects adaptivity directly into the input sequence instead of the model depth. It also uses an adaptive tape reading mechanism to determine a varying number of tape tokens to add to each input, based on that input’s complexity.
The blog mentions that AdaTape represents each input as a vector in order to dynamically select a variable-sized sequence of tape tokens. The team at Google Research goes on to state that AdaTape creates what’s called a “tape bank” to store all the candidate tape tokens.
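To make the selection step concrete, here is a minimal sketch of what dynamic tape reading could look like. Everything here is an assumption for illustration: the function name, the sigmoid scoring, and the threshold-plus-cap rule are not from the paper, which defines its own reading mechanism.

```python
import numpy as np

def select_tape_tokens(x_repr, bank, threshold=0.5, max_tokens=4):
    """Hypothetical sketch of adaptive tape reading.

    x_repr: (d,) vector summarizing the input (e.g. a mean of token embeddings).
    bank:   (n, d) matrix of candidate tape tokens.

    Scores each candidate against the input representation and keeps the
    highest-scoring tokens whose sigmoid score clears a threshold, up to
    max_tokens -- so a harder input can pull in more tokens than an easy one.
    """
    scores = 1.0 / (1.0 + np.exp(-(bank @ x_repr)))  # (n,) scores in (0, 1)
    order = np.argsort(-scores)                       # best candidates first
    keep = [i for i in order[:max_tokens] if scores[i] > threshold]
    return bank[keep]                                 # variable-length result
```

The key property is that the number of returned tokens depends on the input itself, which is what gives the model an input-dependent computational budget.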
To create the tape banks, the researchers used two methods. The first is an input-driven bank, which extracts a bank of tokens from the input itself while employing a different approach than the original model tokenizer for mapping the raw input to a sequence of input tokens.
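A rough way to picture the input-driven bank: tokenize the same raw input twice, once at the granularity the model normally sees and once at a finer granularity for the bank. The chunk sizes and function below are illustrative assumptions, not the paper’s actual tokenization.

```python
import numpy as np

def input_driven_bank(raw, main_chunk=4, bank_chunk=2):
    """Hypothetical sketch of an input-driven tape bank.

    The model tokenizer splits the raw input into coarse chunks, while the
    tape bank re-tokenizes the SAME raw input at a finer granularity,
    yielding candidate tokens the main sequence does not contain.
    """
    raw = np.asarray(raw, dtype=float)
    main_tokens = raw.reshape(-1, main_chunk)  # what the model itself consumes
    bank_tokens = raw.reshape(-1, bank_chunk)  # finer-grained bank candidates
    return main_tokens, bank_tokens
```

For images, the analogous move would be patchifying at a smaller patch size for the bank than for the input sequence.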
The second method is the learnable bank, a more general way of generating the tape bank that uses a set of trainable vectors as tape tokens. Once selection is done, the chosen tape tokens are appended to the original input and sent to the transformer.
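A learnable bank is just a trainable matrix of vectors, and the appending step is a concatenation along the sequence axis. The sizes and names below are assumptions for illustration; in a real model the bank would be a trained parameter, not a random draw.

```python
import numpy as np

rng = np.random.default_rng(0)

d_model, bank_size = 8, 16
# Stand-in for a trainable parameter: in practice this matrix would be
# learned jointly with the rest of the model.
learnable_bank = rng.normal(size=(bank_size, d_model))

def append_tape(input_tokens, tape_tokens):
    """Append the selected tape tokens to the original input sequence
    before the transformer consumes it (a sketch; AdaTape's exact layout
    may differ)."""
    return np.concatenate([input_tokens, tape_tokens], axis=0)
```

The transformer then processes the lengthened sequence as usual, which is how extra tape tokens translate into extra computation.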
Two feed-forward networks are then used: one for the original input tokens and one for the tape tokens. The researchers observed slightly better quality when using separate feed-forward networks for input and tape tokens.
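The routing itself is simple: split the sequence at the boundary between input and tape tokens and apply a different feed-forward network to each part. In this sketch a single ReLU linear map stands in for each full FFN, which is an assumption made to keep the example short.

```python
import numpy as np

def dual_ffn(tokens, n_input, W_in, W_tape):
    """Sketch of the two-feed-forward-network idea.

    tokens:  (seq_len, d) sequence where the first n_input rows are the
             original input tokens and the rest are appended tape tokens.
    W_in / W_tape: (d, d) weights standing in for two separate FFNs.
    """
    out = np.empty_like(tokens)
    out[:n_input] = np.maximum(tokens[:n_input] @ W_in, 0)    # FFN for input tokens
    out[n_input:] = np.maximum(tokens[n_input:] @ W_tape, 0)  # separate FFN for tape
    return out
```

Keeping two parameter sets lets the model treat tape tokens differently from content tokens, which matches the small quality gain the researchers report.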
Google’s team found that AdaTape can outperform all baselines by incorporating recurrence within its input selection mechanism. They also evaluated AdaTape on image classification, testing it on ImageNet-1K, and found that in terms of the quality-and-cost tradeoff, AdaTape performs much better than the alternative adaptive transformer baselines.
According to Google’s conclusion, AdaTape has the potential to solve tasks that are challenging for both standard transformers and existing adaptive transformers. If you’re interested in learning more, you can read the paper here, and Google’s post here.
Editor’s Note: If you’re interested in the latest in transformers, large language models, and AI, then you don’t want to miss ODSC West 2023. Learn from leading experts as they dive into these topics in San Francisco. Get your in-person or virtual pass today!