I'm working on redesigning a database for an application which allows users to get estimates for different insurance plans by asking plan-specific questions. The user is able to save and retrieve past estimates.
In the current system, the database stores the user's input in a seperate table for each plan. So if there are 25 plans, there are 25 tables. Some of the columns will be the same for all of the plans, but alot are different. You can see an example of the input tables below.
Once the calculations are performed, the results of the estimate are shown to the user. This output is saved in a series of output tables so when the user retrieves the estimate, all of the calculations do not need to be performed again. The model is the same as that of the input tables; if there are 25 plans, there are 25 tables. Some of the columns will be the same for all of the plans but there are differences. The output tables are structured like the input tables, but with different columns.Code:input_plan10 input_plan11 ------------------- ------------------- estimate_id estimate_id column_1 column_1 column_2 column_4 column_3 column_7 column_4 column_8 ..... ..... column_40 column_62
The current database is informix and the new db is SQL Server 2008. We get about 40k estimates run per year and we maintain past year's data. The average number of columns in each plan table for both inputs and outputs is 40.
I know there has to be a better way to do this and wanted to get some opinions here. In our new design, we will have a table that holds all of the unique questions for all plans and then a question cross reference table that has the question id, plan id, and some other information needed by the app.
We thought of possibly having one (very large) table to store the inputs for all of the plans which would be along the lines of what I believe is an EAV approach:
So the data would look like:Code:InputTable ------------- EstimateID PlanID FieldKey Value
This approach is not normalized however. Keep in mind that with at least 40k estimates run per year, and an average of 40 questions per estimate, we're looking at 1,600,000 records to search through for one year so this approach doesn't seem great at all.Code:EstimateID PlanID FieldKey Value ----------------------------------------------------------------------------- 1 10 Old Column Name Column Value 2 11 Old Column Name Column Value
Another approach would be instead of having the FieldKey column, have a foreign key to the question cross reference table and the user's answer as the value. We don't need to store the Plan Id here because the Plan ID is stored in the question cross reference table.
While this helps with the normalization, this table can still grow large and I believe will be slow to query.Code:InputTable ------------ EstimateID QuestionXrefID Value
We thought of possibly having an archive table so if the estimate date is more than 6 months old, move it to an archive table because most people retrieve existing estimates witin a few weeks of creating them. So this could help reduce the size of the primary input table, thus speeding up queries.
I don't feel like any of these approaches is ideal so if you have any recommendations, let me have them! I appreciate any and all opinions on this. Let me know if I didn't explain something clear enough.







Bookmarks