Going to the Bank: Using Deep Learning For Banking and the Financial Industry
At ODSC London 2018, Pavel Shkadzko explained to the audience how Gini GmbH, where he works as a semantics engineer, uses deep learning to automate information extraction from financial documents, such as invoices.

By applying deep learning to tasks historically handled by optical character recognition and clever regular expression hacks, Shkadzko and his team found that they could match or outperform the most common conventional approaches across a range of different tasks.

Shkadzko explained that one of the main motivations behind the experiment was to create a more robust approach to extracting information from poorly scanned documents. Many of the invoices his company receives are captured in ways that traditional methods are poorly suited to handle, especially where the invoice is photographed at an angle or contains hard-to-read handwriting.

These are some of the key highlights from Shkadzko’s talk:


LSTM Models Work Pretty Well!

Shkadzko and his team started with standard feedforward network configurations, which reached approximately 60 percent accuracy at best. Recurrent and convolutional configurations proved competitive with their existing approach, which relies on regular expressions and conditional random fields. LSTM models achieved up to 82 percent accuracy. The model employed pre-trained embeddings extracted from 100,000 training documents.


Bigger Isn’t Always Better

For applications where speed is important, extremely large networks are often untenable. Rules-based approaches often make the most sense in situations where the tradeoff between speed and accuracy leans towards the former.

Shkadzko’s team tried an approach that illustrates this point. Two LSTM network models were combined, one reading the document left-to-right and the other right-to-left. The outputs of these networks were then concatenated: the word vector for the observed word, the last word in the sequence, and the positional information for both words. The concatenated representation was then fed into a word-level LSTM, which produced the final prediction. The resulting model was extremely complex and much slower than the single-network experiments, yet it produced only a four percent gain in accuracy. In short, bigger isn’t always better.
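To make the two-direction idea concrete, here is a minimal sketch of reading a sequence in both directions and concatenating the per-word states. It is not the team's model: a toy elementwise tanh recurrence stands in for each LSTM, and the function names, weights, and word vectors are all made up for illustration. It covers only the bidirectional read and the concatenation, not the extra word and positional features or the final word-level LSTM.

```python
import math

def run_rnn(sequence, w_in=0.5, w_rec=0.5):
    """Toy elementwise tanh recurrence standing in for an LSTM layer.

    Returns one hidden state per input vector, in input order.
    The weights are arbitrary illustrative constants, not trained values.
    """
    hidden = [0.0] * len(sequence[0])
    states = []
    for vector in sequence:
        hidden = [math.tanh(w_in * x + w_rec * h) for x, h in zip(vector, hidden)]
        states.append(hidden)
    return states

def bidirectional_states(sequence):
    """Concatenate left-to-right and right-to-left states for each position."""
    forward = run_rnn(sequence)
    # Run over the reversed sequence, then flip back so index i lines up with word i.
    backward = run_rnn(sequence[::-1])[::-1]
    return [f + b for f, b in zip(forward, backward)]

# Made-up 2-dimensional "word vectors" for a three-word sequence.
word_vectors = [[0.1, 0.2], [0.3, -0.1], [-0.4, 0.5]]
features = bidirectional_states(word_vectors)

for position, feature in enumerate(features):
    print(position, [round(value, 3) for value in feature])
```

Each position ends up with a feature vector twice the size of a single direction's state, which hints at why the combined model costs more to run: two full passes over the sequence plus a wider input for whatever layer comes next.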


Deep Learning Frameworks are Still Quirky

Shkadzko also highlighted that existing deep learning frameworks are imperfect and can produce unexpected discrepancies when configurations are translated between them. In this case, his team attempted to use the same inputs for models written in both TensorFlow and Torch. The TensorFlow version of the model failed to train, producing worthless results. After puzzling out the source of the error, they concluded it stemmed from TensorFlow’s preference for L2-normalized input vectors – a bizarre quirk that, in theory, should have no bearing on the ability of a model to train (though the quality of the training could certainly vary as a result).
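For readers unfamiliar with the term, L2 normalization rescales a vector so that its Euclidean length is 1. The sketch below shows the operation in plain Python; the function name and the `eps` guard are illustrative choices, not a framework API, and the talk did not specify how the team applied the normalization.

```python
import math

def l2_normalize(vector, eps=1e-12):
    """Scale a vector to unit Euclidean length.

    eps is an illustrative guard that avoids division by zero for all-zero input.
    """
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / max(norm, eps) for x in vector]

embedding = [3.0, 4.0]           # made-up input vector with length 5
unit = l2_normalize(embedding)
print(unit)                      # [0.6, 0.8]
```

The direction of the vector is preserved and only its magnitude changes, which is why it is surprising that the step would decide whether a model trains at all.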


Experimenting with Hyperparameters Pays Off

Experimenting with hyperparameters can produce significant gains when done well, including creating ‘simpler’ versions of the same model. The input size and hidden layer size were both halved; the learning rate was lowered; learning rate decay was eliminated; the LSTM dropout was lowered from 0.5 to 0.25; and lastly, the optimization algorithm was switched from Adam to stochastic gradient descent with momentum. This ‘less complex’ model not only converged significantly faster, but also produced better results after the same number of training epochs.
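As a reminder of what the optimizer switch involves, here is a small sketch of the classic stochastic gradient descent with momentum update, applied to a toy one-parameter objective. The learning rate, momentum value, and objective are assumptions chosen for illustration; the talk did not report the settings the team used.

```python
def sgd_momentum_step(params, grads, velocity, lr=0.1, momentum=0.9):
    """One update: v <- momentum * v - lr * grad, then p <- p + v.

    lr and momentum are illustrative defaults, not values from the talk.
    """
    new_velocity = [momentum * v - lr * g for v, g in zip(velocity, grads)]
    new_params = [p + v for p, v in zip(params, new_velocity)]
    return new_params, new_velocity

# Toy objective f(w) = w^2, whose gradient is 2w and whose minimum is at w = 0.
params, velocity = [5.0], [0.0]
for _ in range(200):
    grads = [2.0 * w for w in params]
    params, velocity = sgd_momentum_step(params, grads, velocity)

print(params[0])  # very close to 0 after 200 steps
```

Unlike Adam, this update keeps a single global learning rate and one velocity term per parameter, so there is less optimizer state to tune and inspect.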


The Final Word

Of course, the main reason for creating a deep network in the first place was to deal with ‘messy’ input data. On their hand-curated selection of ‘messy’ documents (invoices photographed at an angle, upside down and in a variety of other orientations), the model successfully extracted information between 60 and 70 percent of the time. That’s a massive improvement over the rules-based and conditional random field methods, which weren’t able to retrieve any information from these documents (0 percent accuracy).
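To see why a rules-based extractor can come up completely empty on such input, consider the sketch below. The regular expression and both text samples are hypothetical, written for this illustration rather than taken from Gini's system; the second sample imitates the character confusions that OCR might produce on a skewed photo.

```python
import re

# Hypothetical rule for a line such as "Total: 1,234.56 EUR".
TOTAL_PATTERN = re.compile(r"Total:\s*([\d.,]+)\s*EUR")

def extract_total(text):
    """Return the matched amount string, or None if the rule does not fire."""
    match = TOTAL_PATTERN.search(text)
    return match.group(1) if match else None

clean_text = "Invoice 0042\nTotal: 1,234.56 EUR"
# Made-up noisy OCR output: "o" read as "0", ":" as ";", "1" as "l", "5" as "S".
noisy_text = "lnvoice OO42\nT0tal; l,234.S6 EUR"

print(extract_total(clean_text))  # 1,234.56
print(extract_total(noisy_text))  # None
```

A single misread character in the anchor word is enough for the rule to return nothing, whereas a learned model can still draw on the surrounding context.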

You can listen to Shkadzko’s full lecture on ODSC’s YouTube channel now!

Spencer Norris, ODSC

Spencer Norris is a data scientist and freelance journalist. He currently works as a contractor and publishes on his blog on Medium: https://medium.com/@spencernorris