2022-05-19

Hi,

A few minutes ago I got a new challenge from my boss, and to be honest, at the moment I have no idea whether this can be developed at all, and if so, how.

So I thought I would ask all of you for your thoughts and discuss with you how to solve this problem:

We have a process in which customers send us questionnaires. A level 1 team does a first review and then forwards the questionnaire to specialists, who answer the questions the level 1 team can’t. The specialists send it back to the level 1 team, which then returns it to the customer.

What we now want to achieve is to reduce our reliance on the specialists. To do so, we want to store every answered question in a database so that we can “reuse” the answer when the same question comes up again.
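A minimal sketch of what the “reuse” lookup could look like, using only Python's built-in `sqlite3` and `difflib` modules. The table layout, the `normalize` step, the function names and the 0.85 similarity threshold are all hypothetical choices for illustration. Plain string similarity only catches near-identical wording; questions rephrased with different vocabulary would need something smarter.

```python
import sqlite3
from difflib import SequenceMatcher


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial formatting differences don't matter."""
    return " ".join(text.lower().split())


def create_store(path: str = ":memory:") -> sqlite3.Connection:
    # Hypothetical schema: one row per question that has already been answered.
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS answers ("
        " id INTEGER PRIMARY KEY,"
        " question TEXT NOT NULL,"
        " normalized TEXT NOT NULL,"
        " answer TEXT NOT NULL)"
    )
    return conn


def save_answer(conn: sqlite3.Connection, question: str, answer: str) -> None:
    conn.execute(
        "INSERT INTO answers (question, normalized, answer) VALUES (?, ?, ?)",
        (question, normalize(question), answer),
    )
    conn.commit()


def find_answer(conn: sqlite3.Connection, question: str, threshold: float = 0.85):
    """Return (stored_question, answer, score) for the closest stored question, or None."""
    target = normalize(question)
    best = None
    # Linear scan keeps the sketch simple; a large database would want an index or search engine.
    for stored, norm, answer in conn.execute(
        "SELECT question, normalized, answer FROM answers"
    ):
        score = SequenceMatcher(None, target, norm).ratio()
        if score >= threshold and (best is None or score > best[2]):
            best = (stored, answer, score)
    return best
```

With this, a specialist's answer saved for "Do you encrypt data at rest?" would be found again for "do you encrypt  data at rest ?", while an unrelated question returns `None` and would still go to the specialists.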

For this, the questionnaires must somehow be “parsed” and split into sections, questions, and answers.
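Once a questionnaire is available as plain text, the split into sections, questions and answers could be modeled roughly as below. This is only a sketch under an assumed layout: lines ending in ':' are section headings, lines ending in '?' are questions, and every other line is treated as part of the answer to the preceding question. The class and function names are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Question:
    text: str
    answer: str = ""


@dataclass
class Section:
    title: str
    questions: list[Question] = field(default_factory=list)


def parse_questionnaire(text: str) -> list[Section]:
    """Split plain text into sections and questions using the assumed ':' / '?' conventions."""
    sections: list[Section] = []
    section: Section | None = None
    question: Question | None = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.endswith("?"):
            if section is None:  # a question appeared before any heading
                section = Section("(untitled)")
                sections.append(section)
            question = Question(line)
            section.questions.append(question)
        elif line.endswith(":"):
            section = Section(line[:-1].strip())
            sections.append(section)
            question = None
        elif question is not None:
            # Any other line continues the answer to the most recent question.
            question.answer = f"{question.answer} {line}".strip()
        # Text before the first question (e.g. an intro paragraph) is ignored.
    return sections
```

Real questionnaires will rarely be this regular (an answer line that happens to end with ':' would be misread as a heading, for instance), so heuristics like these would probably need tuning per customer or per template.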

The problem is that the questionnaires can come in any format: a simple .txt file, a Word document with tables and checkmarks, an Excel file with multiple sheets and even macros, a PowerPoint presentation, and so on.

In the end, I know it is impossible to have a 100% solution that automatically handles all these different file types, but it would be nice to find the “best result” for the “minimum development effort”.

For example, maybe the best approach is to first convert all the different file formats to .txt files? Or maybe there is already a library that can convert Excel to Word and PowerPoint to Word, so I can reduce these formats to a single one?
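On the “convert everything to text” idea: .docx, .xlsx/.xlsm and .pptx files are ZIP packages containing XML (Office Open XML), so plain text can be pulled out with nothing but the Python standard library. The sketch below assumes only these formats plus UTF-8 .txt files. Legacy binary .doc/.xls/.ppt files are not handled, macros are simply ignored, checkmark states are not interpreted, sheets are read in file-name order rather than workbook tab order, and empty cells that are missing from the XML are skipped, so columns can shift. The function names are hypothetical; dedicated third-party libraries would handle these cases more robustly.

```python
import re
import zipfile
import xml.etree.ElementTree as ET
from pathlib import Path

# XML namespaces used by Office Open XML packages.
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"
S = "{http://schemas.openxmlformats.org/spreadsheetml/2006/main}"
A = "{http://schemas.openxmlformats.org/drawingml/2006/main}"


def _part_number(name: str) -> int:
    """Extract N from part names like 'sheetN.xml' or 'slideN.xml'."""
    return int(re.search(r"(\d+)\.xml$", name).group(1))


def docx_to_text(path: str) -> str:
    with zipfile.ZipFile(path) as z:
        root = ET.fromstring(z.read("word/document.xml"))
    lines = []
    # One output line per paragraph; table cells contain paragraphs, so their text is included.
    for para in root.iter(W + "p"):
        text = "".join(t.text or "" for t in para.iter(W + "t"))
        if text.strip():
            lines.append(text)
    return "\n".join(lines)


def xlsx_to_text(path: str) -> str:
    lines = []
    with zipfile.ZipFile(path) as z:
        names = z.namelist()
        # Most cell strings are stored once in sharedStrings.xml and referenced by index.
        shared = []
        if "xl/sharedStrings.xml" in names:
            for si in ET.fromstring(z.read("xl/sharedStrings.xml")).iter(S + "si"):
                shared.append("".join(t.text or "" for t in si.iter(S + "t")))
        sheets = sorted(
            (n for n in names if re.fullmatch(r"xl/worksheets/sheet\d+\.xml", n)),
            key=_part_number,
        )
        for name in sheets:
            for row in ET.fromstring(z.read(name)).iter(S + "row"):
                cells = []
                for c in row.iter(S + "c"):
                    kind = c.get("t")
                    v = c.find(S + "v")
                    if kind == "s" and v is not None:
                        cells.append(shared[int(v.text)])
                    elif kind == "inlineStr":
                        cells.append("".join(t.text or "" for t in c.iter(S + "t")))
                    else:
                        cells.append((v.text or "") if v is not None else "")
                if any(cells):
                    lines.append("\t".join(cells))  # one tab-separated line per row
    return "\n".join(lines)


def pptx_to_text(path: str) -> str:
    lines = []
    with zipfile.ZipFile(path) as z:
        slides = sorted(
            (n for n in z.namelist() if re.fullmatch(r"ppt/slides/slide\d+\.xml", n)),
            key=_part_number,
        )
        for name in slides:
            for para in ET.fromstring(z.read(name)).iter(A + "p"):
                text = "".join(t.text or "" for t in para.iter(A + "t"))
                if text.strip():
                    lines.append(text)
    return "\n".join(lines)


def to_text(path: str) -> str:
    suffix = Path(path).suffix.lower()
    if suffix == ".txt":
        return Path(path).read_text(encoding="utf-8", errors="replace")
    if suffix == ".docx":
        return docx_to_text(path)
    if suffix in (".xlsx", ".xlsm"):  # .xlsm has the same layout plus a macro part
        return xlsx_to_text(path)
    if suffix == ".pptx":
        return pptx_to_text(path)
    raise ValueError(f"unsupported format: {suffix}")
```

The catch with this route is that flattening to text throws away structure (table columns, which sheet or slide a question came from), which is exactly what the section/question/answer split would need, so it may pay off to keep some of that information alongside the text.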

Or maybe there is another solution that I don’t know about but you have heard of?

Any help is appreciated.

Thallius