DANIEL
Software for named entity extraction from handwritten texts
Technology No.

Description:
OCR transcribes a written document, captured as an image, into a digital text file that is easier to manipulate. It can be useful to go beyond this first step and produce enriched digital text in which each named entity carries a specific label. This is what DANIEL does: it extracts the named entities from a written document.
How it works:
- Acquisition of the written document as an image
- Scanning of the document to extract its text
- This text is then analysed to detect the named entities
- A label is assigned to each named entity
DANIEL is an end-to-end software system that performs handwritten text recognition and named entity recognition on full-page documents. Its fully convolutional encoder lets it handle images of any size, and it uses an attention network combined with an LLM to extract named entities.
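To make the idea of labelled output concrete, here is a minimal Python sketch of how a downstream tool might read a transcription in which entities are wrapped in inline tags. The tag format (`<PER>...</PER>`), the label names, and the functions `extract_entities` and `strip_tags` are assumptions chosen for illustration; they are not DANIEL's documented output format or API.

```python
import re
from typing import List, Tuple

# Hypothetical inline tag format: <LABEL>entity text</LABEL>.
# The backreference (?P=label) requires the closing tag to match the opening one.
TAG_PATTERN = re.compile(
    r"<(?P<label>[A-Z]+)>(?P<text>.*?)</(?P=label)>", re.DOTALL
)


def extract_entities(tagged_text: str) -> List[Tuple[str, str]]:
    """Return (label, entity text) pairs in order of appearance."""
    return [
        (match.group("label"), match.group("text"))
        for match in TAG_PATTERN.finditer(tagged_text)
    ]


def strip_tags(tagged_text: str) -> str:
    """Return the plain transcription with the entity tags removed."""
    return TAG_PATTERN.sub(lambda match: match.group("text"), tagged_text)


# Illustrative sample, not real system output.
sample = "<PER>Marie Curie</PER> was born in <LOC>Warsaw</LOC> in <DATE>1867</DATE>."
entities = extract_entities(sample)
plain_text = strip_tags(sample)
```

Keeping the labels inline with the text means a single output string carries both the transcription and the entities, which fits an end-to-end system; the sketch does not handle nested tags.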
Applications:
- Creation of databases linking entities within documents
- Analysing historical documents
- Searching through documents for a specific named entity
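The database and search applications above can be sketched as a small inverted index that maps each entity to the documents mentioning it. This is only an illustration under stated assumptions: the document identifiers, the label names, and the functions `build_entity_index` and `find_documents` are hypothetical, and the entity lists stand in for whatever the extraction step actually returns.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

Entity = Tuple[str, str]  # (label, entity text)


def build_entity_index(
    documents: Dict[str, Iterable[Entity]]
) -> Dict[Entity, Set[str]]:
    """Map each (label, normalised text) pair to the ids of documents containing it."""
    index: Dict[Entity, Set[str]] = defaultdict(set)
    for doc_id, entities in documents.items():
        for label, text in entities:
            # casefold() makes the lookup case-insensitive.
            index[(label, text.casefold())].add(doc_id)
    return dict(index)


def find_documents(index: Dict[Entity, Set[str]], label: str, text: str) -> List[str]:
    """Return the sorted ids of documents that mention the given entity."""
    return sorted(index.get((label, text.casefold()), set()))


# Illustrative data only: entities assumed to be already extracted per document.
extracted = {
    "letter_01": [("PER", "Marie Curie"), ("LOC", "Paris")],
    "letter_02": [("PER", "Pierre Curie"), ("LOC", "Paris")],
    "register_03": [("PER", "marie curie"), ("DATE", "1903")],
}
entity_index = build_entity_index(extracted)
paris_docs = find_documents(entity_index, "LOC", "Paris")
marie_docs = find_documents(entity_index, "PER", "Marie Curie")
```

Because the index is keyed on the label as well as the text, a place and a person with the same name stay distinct. Real archives would likely need stronger normalisation than case folding, for example for spelling variants.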
Advantages:
- State-of-the-art results in text recognition and named entity extraction
- Works with multiple languages
- Faster than other solutions
- End-to-end architecture

