indexes 183 million words from debates in the Belgian Chamber of Representatives. The scanned pages are converted to text with OCR. We updated the process and the website with a new research tool, courtesy of UAntwerp digital humanities.
Parliamentary proceedings contain a wealth of information for historians. That is why the History Department of the University of Antwerp approached Textgain to update its database of OCRed Belgian parliamentary proceedings, which runs from the country's inception to the present day. The project aims to make the 183 million words in these debates more accessible, so that essential insights are readily available to anyone who needs them.
To make such an extensive volume of text accessible, we have revamped the existing website and its underlying infrastructure. The parliamentary texts are now stored in MongoDB, a more robust database system, so that the collection scales and is easily retrievable through an API. Natural Language Processing (NLP) techniques have produced a clearer and more refined taxonomy, while advanced methods such as automatic layout analysis have improved the accuracy of the OCR on the scanned documents.
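As a rough illustration of this storage model, one speech could be kept as a single document, and an API handler could translate request parameters into a MongoDB filter. The field names (session_id, date, speaker, text), the sample values, and the helpers to_document and date_range_filter are hypothetical and not the project's actual schema; the sketch only builds plain dictionaries and does not connect to a database.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class Intervention:
    """One speech in a plenary session (hypothetical schema)."""
    session_id: str
    date: str      # ISO 8601 date string, e.g. "1950-03-12"
    speaker: str
    text: str


def to_document(item: Intervention) -> dict:
    """Convert the record to a plain dict that a document store could save."""
    return asdict(item)


def date_range_filter(start: str, end: str) -> dict:
    """Build a MongoDB-style filter for records between two ISO dates.

    ISO 8601 date strings sort lexicographically in chronological order,
    so $gte/$lte on these strings behave like a date comparison.
    """
    return {"date": {"$gte": start, "$lte": end}}


# Placeholder data, for illustration only.
example = Intervention("s-001", "1950-03-12", "Example Speaker", "Mr President, ...")
document = to_document(example)
query = date_range_filter("1950-01-01", "1950-12-31")
payload = json.dumps(document)  # roughly what an API endpoint might return
```

With a driver such as pymongo, the document would typically be passed to collection.insert_one(document) and the filter to collection.find(query).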
To further enhance user experience, we have incorporated a powerful search function and added metadata for speakers and intervention types, enabling more efficient navigation through the records.
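To give an idea of how speaker and intervention-type metadata can narrow a search, here is a minimal in-memory sketch. The records, the field names, the intervention-type values ("question", "answer", "speech") and the search function are all invented for illustration; the real site presumably runs its search against the database and not against a Python list.

```python
from typing import Optional

# Placeholder records, for illustration only.
RECORDS = [
    {"speaker": "Speaker A", "type": "question", "text": "What is the budget for railways?"},
    {"speaker": "Speaker B", "type": "answer", "text": "The railways budget is under review."},
    {"speaker": "Speaker A", "type": "speech", "text": "Education reform is overdue."},
]


def search(records: list, keyword: str,
           speaker: Optional[str] = None,
           intervention_type: Optional[str] = None) -> list:
    """Return records whose text contains the keyword (case-insensitive)
    and that match the optional speaker and intervention-type filters."""
    needle = keyword.lower()
    hits = []
    for record in records:
        if needle not in record["text"].lower():
            continue
        if speaker is not None and record["speaker"] != speaker:
            continue
        if intervention_type is not None and record["type"] != intervention_type:
            continue
        hits.append(record)
    return hits


all_hits = search(RECORDS, "railways")
questions_only = search(RECORDS, "railways", intervention_type="question")
```

Here all_hits contains both railway records, while questions_only keeps only the one tagged as a question, which is the kind of narrowing the added metadata makes possible.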
This project has transformed how parliamentary records are accessed and used. By making the search process more user-friendly, we have empowered researchers, policymakers, and all users of parliamentary records to engage more thoroughly with the available content.