Information Extraction from Documents

Information extraction techniques can be used to extract or summarize the content of textual documents. This is useful wherever large volumes of unstructured text need to be processed or analyzed.


Unstructured text in today's world

Emails, contracts, reports

Even in the age of digital transformation, most information exchanged in day-to-day business is, and remains, unstructured text. This includes, for example, customer support emails, customer feedback, contracts, and reports. Processing these kinds of documents requires a large amount of manual effort and is a major cost driver. As a result, even a small improvement in the efficiency of that process is valuable.

For example, in customer service every second counts. It is therefore crucial for agents to quickly understand what a customer's email is about and, before that, for the email to be forwarded to the right specialist. It also helps if important information, such as the customer's name and ID, has already been extracted from the email, which saves the agent additional time.

Combine NLP techniques to solve the problem

A combination of different NLP techniques can tackle all of these problems. Document classification can be used to select the right specialist for a topic. The customer's name and ID can be extracted using named-entity recognition. In some cases, it might also be useful to determine the sentiment of an email using sentiment analysis.
Another use case is the automatic extraction of information from reports to make them available in a structured database.
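
The customer-email workflow described above can be illustrated with a minimal sketch. It is not a production approach: keyword matching stands in for a trained document classifier, regular expressions stand in for a named-entity recognition model, and a small word list stands in for a sentiment model. The department names, keyword sets, and the assumed email conventions ("Customer ID: ..." and a "Regards, ..." sign-off) are hypothetical and chosen only for illustration.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical routing keywords; a real system would use a trained classifier.
ROUTING_KEYWORDS = {
    "billing": {"invoice", "refund", "payment", "charge"},
    "technical": {"error", "crash", "login", "bug"},
}

# Hypothetical sentiment lexicon; a stand-in for a sentiment analysis model.
POSITIVE_WORDS = {"thanks", "great", "happy", "excellent"}
NEGATIVE_WORDS = {"angry", "unacceptable", "terrible", "disappointed"}

# Assumed email conventions, standing in for named-entity recognition.
ID_PATTERN = re.compile(r"customer id[:\s#]*(\d+)", re.IGNORECASE)
NAME_PATTERN = re.compile(r"[Rr]egards,\s*([A-Z][a-z]+(?: [A-Z][a-z]+)+)")


@dataclass
class EmailInfo:
    department: str
    customer_id: Optional[str]
    customer_name: Optional[str]
    sentiment: str


def tokenize(text: str) -> set:
    """Lowercase the text and return its set of word tokens."""
    return set(re.findall(r"[a-z']+", text.lower()))


def classify(tokens: set) -> str:
    """Pick the department with the most keyword hits, else 'general'."""
    scores = {dept: len(tokens & words) for dept, words in ROUTING_KEYWORDS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "general"


def rate_sentiment(tokens: set) -> str:
    """Compare counts of positive and negative lexicon words."""
    score = len(tokens & POSITIVE_WORDS) - len(tokens & NEGATIVE_WORDS)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def extract(email: str) -> EmailInfo:
    """Combine routing, entity extraction, and sentiment into one record."""
    tokens = tokenize(email)
    id_match = ID_PATTERN.search(email)
    name_match = NAME_PATTERN.search(email)
    return EmailInfo(
        department=classify(tokens),
        customer_id=id_match.group(1) if id_match else None,
        customer_name=name_match.group(1) if name_match else None,
        sentiment=rate_sentiment(tokens),
    )
```

Calling `extract` on an email that mentions a missing refund, contains "Customer ID: 48213", and ends with "Regards, Jane Doe" would route it to the hypothetical billing department and fill in the ID and name fields. The resulting record is also the kind of structured output that could be written to a database, as in the report use case.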