Parsing of Structured and Unstructured Data, Scraping and Crawling

Summary:	Solution for creating arbitrary text and binary parsers, e.g., for websites or CSV, XML, and PDF files. Multiple sources (e.g., files) or URLs (e.g., web shops) can also be downloaded and processed in batch mode, and the results transferred to a database or to files.
Technologies:	Java, Server, Windows/Linux, Databases, XML, CSV etc.
Security:	Secure programming (→ Guideline), secure operation, patch management, update processes (→ Guideline)
Status:	08 / 2025

Over the last 20 years, we have successfully developed parsers for a wide range of tasks and environments. While the processing and conversion of structured data (e.g., CSV, XML) or semi-structured data (e.g., HTML/websites) follows a fixed schema, unstructured data requires a case-by-case analysis in practice, including how far the extraction can be abstracted.
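To illustrate what "follows a fixed schema" means for structured data, here is a minimal sketch of a CSV line parser in plain Java. It is an illustration only, not the implementation used in the solution; the class name `CsvLineParser` and the sample record are assumptions, and the sketch assumes comma-separated fields with optional double quotes and `""` as the escape for a quote inside a quoted field.

```java
import java.util.ArrayList;
import java.util.List;

public class CsvLineParser {

    /** Splits one CSV line into fields, honoring quoted fields and "" escapes. */
    public static List<String> parseLine(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (inQuotes) {
                if (c == '"') {
                    if (i + 1 < line.length() && line.charAt(i + 1) == '"') {
                        current.append('"'); // escaped quote inside a quoted field
                        i++;
                    } else {
                        inQuotes = false;    // closing quote
                    }
                } else {
                    current.append(c);       // separators are literal inside quotes
                }
            } else if (c == '"') {
                inQuotes = true;             // opening quote
            } else if (c == ',') {
                fields.add(current.toString());
                current.setLength(0);        // start the next field
            } else {
                current.append(c);
            }
        }
        fields.add(current.toString());      // last field has no trailing separator
        return fields;
    }

    public static void main(String[] args) {
        // Hypothetical sample record: id, description, price
        for (String field : parseLine("42,\"Widget, large\",19.99")) {
            System.out.println(field);
        }
    }
}
```

The sketch handles one line at a time, so quoted fields that span several lines would need a reader that joins lines before calling `parseLine`.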

In all cases, efficient mass processing and transfer to databases or intermediate formats (e.g., CSV or XML) are possible. We also support custom or special formats used by particular industries or companies. Both bulk and targeted data extraction are supported.
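The transfer of many extracted records into an intermediate format can be sketched as follows, using the `XMLStreamWriter` from the Java standard library. This is a sketch under stated assumptions, not the solution's actual pipeline: the class name `BatchToXmlSketch`, the `id;name` input format, and the element names `items` and `item` are all hypothetical.

```java
import java.io.StringWriter;
import java.util.List;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

public class BatchToXmlSketch {

    /**
     * Converts raw "id;name" lines (hypothetical input format) into one XML document.
     * Lines that do not match the expected two-field schema are skipped.
     */
    public static String toXml(List<String> rawLines) {
        StringWriter out = new StringWriter();
        try {
            XMLStreamWriter xml = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
            xml.writeStartDocument();
            xml.writeStartElement("items");
            for (String line : rawLines) {
                String[] parts = line.split(";", -1);
                if (parts.length != 2 || parts[0].isBlank()) {
                    continue; // malformed record: skip it instead of aborting the batch
                }
                xml.writeStartElement("item");
                xml.writeAttribute("id", parts[0].trim());
                xml.writeCharacters(parts[1].trim()); // the writer escapes & and <
                xml.writeEndElement();
            }
            xml.writeEndElement();
            xml.writeEndDocument();
            xml.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("XML output failed", e);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(toXml(List.of("1;Widget & Co", "broken line", "2;Gadget")));
    }
}
```

Streaming the output record by record keeps memory use flat for large batches. In a real run the `StringWriter` would typically be replaced by a file or network stream, and skipped records would be logged rather than dropped silently.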

If the websites to be tested or parsed generate dynamic content, e.g., via JavaScript, parsing can be based on an integrated browser engine. For mass processing, this requires a trade-off between performance and resource requirements.
