The State of Automatic Text Summarization with NLP
Posted by Luke Coughlin, May 9, 2019
Ideally, NLP will be able to help humans complete tedious text-evaluation tasks, and its potential for use in fields like law and medicine has elicited significant enthusiasm. But where NLP has been applied to processes that do not reduce to strict mathematical evaluation, processes that require judgments of value, progress has been less steady.
[Related Article: The Promise of Retrofitting: Building Better Models for Natural Language Processing]
One such example of a lagging subset of NLP is text summarization. Though there are already applications that peruse technical documents, the complete automation of processes demanding the interpretation of documents is as yet unfulfilled. In their paper "Automatic text summarization: What has been done and what has to be done," researchers Abdelkrime Aries, Djamel Eddine Zegour, and Walid Khaled Hidouci of the University of Algiers discuss the state of research regarding NLP's efficacy in summarizing complex documents. Services exist that purport to provide summaries of long documents; in fact, the idea of automatic text summarization dates back at least to 1958. Despite the considerable academic attention garnered by this technology, Dr. Zegour et al. note that "the [initially construed] challenges…are still holding this field from going forward."
Why has this field lagged behind while so much advancement has been seen elsewhere in NLP? Computers have long been capable of finding definitions of key terms, checking for grammar and spelling errors, and even detecting a particular style, so why shouldn't they be able to complete simple extractions from a text?
According to Dr. Zegour et al., the problem amounts to "the absence of a precise definition of what should be included in a summary." For one thing, readers might be seeking out a summary for different reasons. Consider a scientific paper about research on a certain disease: one person might be reading it for specific information about the disease; another may be interested in the work of a specific doctor or research group; a third might be compiling information about the success rate of surgeries; and so on.
A summary ought to include the most important information, but what exactly is important, and to which audience? As the paper notes, different models have addressed different concerns, but even the success of these has been difficult to gauge. The initial method of automatic summarization, which simply draws relevant-seeming sentences from the text, has been dismissed as crude, but a truly viable replacement is as yet wanting.
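The "crude" extractive approach the article mentions is easy to make concrete. Below is a minimal sketch in the spirit of frequency-based sentence extraction (the idea dating back to 1958): score each sentence by how many high-frequency content words it contains, then keep the top-scoring sentences in their original order. The stopword list, tokenizer, and scoring formula here are illustrative simplifications, not the method of any particular paper.

```python
import re
from collections import Counter

# Tiny, illustrative stopword list; real systems use much larger ones.
STOPWORDS = {"the", "a", "an", "of", "to", "and", "in", "is", "it", "that"}

def extractive_summary(text, n_sentences=2):
    """Return the n highest-scoring sentences, in original order."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())

    # Document-wide frequency of content words.
    words = [w for w in re.findall(r"[a-z']+", text.lower())
             if w not in STOPWORDS]
    freq = Counter(words)

    def score(sentence):
        # Average document frequency of the sentence's content words.
        tokens = [w for w in re.findall(r"[a-z']+", sentence.lower())
                  if w not in STOPWORDS]
        return sum(freq[t] for t in tokens) / (len(tokens) or 1)

    ranked = set(sorted(sentences, key=score, reverse=True)[:n_sentences])
    return " ".join(s for s in sentences if s in ranked)
```

The limitation the researchers describe is visible even here: the scoring function encodes one fixed notion of "importance" (word frequency) and has no way to adapt to what a particular reader actually wants from the summary.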
A potential solution is hinted at by Sheng Shen et al. In a recent paper, "Pragmatically Informative Text Generation," they discuss their findings in applying pragmatic reasoning to text generation, showing that models which reason about how a listener will interpret their output produce more informative text. The findings in this paper build on existing Rational Speech Acts (RSA) models and point toward the expanded application of sophisticated, machine-learning-based text generation.
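The RSA framework that this line of work builds on can be illustrated with the classic one-step reference game. In the sketch below (the two-world scenario, names, and uniform prior are illustrative, not taken from the paper), a literal listener interprets utterances by their truth conditions, a speaker chooses utterances by reasoning about that listener, and a pragmatic listener in turn reasons about the speaker:

```python
# Worlds: w0 = a face with glasses, w1 = a face with glasses AND a hat.
# TRUTH[u][w] = 1.0 if utterance u literally applies to world w.
TRUTH = {
    "glasses": {"w0": 1.0, "w1": 1.0},
    "hat":     {"w0": 0.0, "w1": 1.0},
}
WORLDS = ["w0", "w1"]

def normalize(d):
    total = sum(d.values())
    return {k: v / total for k, v in d.items()} if total else d

def literal_listener(u):
    # L0(w | u) : truth conditions times a uniform prior over worlds.
    return normalize({w: TRUTH[u][w] for w in WORLDS})

def speaker(w, alpha=1.0):
    # S1(u | w) : prefers utterances under which L0 recovers w.
    return normalize({u: literal_listener(u)[w] ** alpha for u in TRUTH})

def pragmatic_listener(u):
    # L1(w | u) : reasons about which world would make S1 say u.
    return normalize({w: speaker(w)[u] for w in WORLDS})
```

Hearing "glasses," the literal listener is split 50/50 between the two worlds, but the pragmatic listener favors the glasses-only face: if the speaker had meant the hat-wearer, "hat" would have been the more informative choice. This is the kind of listener-aware reasoning that pragmatically informative generation scales up to full text.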
[Related Article: An Introduction to Natural Language Processing (NLP)]
Within the boundaries of its traditional functions, such as sorting or translation, the value of a particular NLP process is essentially a matter of its accuracy. Though the development of these technologies has been imperfect, the path toward better models is clear: more data, and better synthesis of that data, will lead to more precise models. These functions, which have already seen major improvements in recent years, are, in a significant sense, mathematical functions that rely on the conversion of language to data. Whether values like significance can be evaluated in a consistent way remains to be seen.