Performance problem in a data warehouse with lots of indexes

Our product runs tests for some 350 candidates at the same time. At the end of the test, the results for each candidate are moved to a data warehouse that has lots of indexes on it. For each test there are some 400 records to be entered into the data warehouse. So 400 x 350 = 140,000 records, which is a lot. If there are not many records in the data warehouse, all goes well. But if there are already lots of records in the data warehouse, then a lot of inserts fail...

Is there a way to have indexes that are only rebuilt at the end of the day, or isn't that the real problem? Or how would you solve this?

Best answer

I've worked with both normalized and Kimball star data warehouses, and this doesn't sound like a problem you should be running into. I would say 140,000 rows is not a lot of rows, even in a small data warehouse.

Why do the inserts fail? Typically in a Kimball-style warehouse, no inserts ever fail - for instance, in a fact table, inserts always have a unique set of primary keys related to the dimensions and the grain (like a date or time snapshot). In a dimension table, changes are detected, new dimension rows are inserted, and existing ones are re-used. In a normalized warehouse, you usually have some kind of revision mechanism, archive process, or effective date which keeps things unique.
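As a rough illustration of the "insert new dimension rows, re-use existing ones" pattern, here is a minimal sketch using Python's built-in `sqlite3` module. The table and column names (`dim_candidate`, `fact_result`, `candidate_code`) and the helper `get_candidate_key` are hypothetical and not taken from the question; the real schema and database engine are unknown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical dimension: one row per distinct candidate.
    CREATE TABLE dim_candidate (
        candidate_key  INTEGER PRIMARY KEY,
        candidate_code TEXT UNIQUE
    );
    -- Hypothetical fact table: the key is the dimension key plus the grain.
    CREATE TABLE fact_result (
        candidate_key INTEGER,
        question_id   INTEGER,
        test_date     TEXT,
        score         REAL,
        PRIMARY KEY (candidate_key, question_id, test_date)
    );
""")

def get_candidate_key(conn, code):
    """Return the surrogate key for a candidate, inserting a row only if it is new."""
    row = conn.execute(
        "SELECT candidate_key FROM dim_candidate WHERE candidate_code = ?", (code,)
    ).fetchone()
    if row is not None:
        return row[0]  # existing dimension row is re-used
    cur = conn.execute(
        "INSERT INTO dim_candidate (candidate_code) VALUES (?)", (code,)
    )
    return cur.lastrowid  # new dimension row

k1 = get_candidate_key(conn, "C-001")
k2 = get_candidate_key(conn, "C-001")  # same candidate, same key
k3 = get_candidate_key(conn, "C-002")  # different candidate, new key
conn.execute("INSERT INTO fact_result VALUES (?, 1, '2024-01-01', 0.5)", (k1,))
conn.commit()
```

Because each fact row is keyed by the dimension key plus the grain, a second load of a different test date or question never collides with rows that are already there.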

It seems to me that regardless of your DW philosophy or architecture, there should be something keeping these rows unique.

If (as you stated in your comments) you have a single index containing every column, that's probably not a very useful index (in any database design). Are you sure your index is even being used for any queries? Is it also marked as unique, and is that constraint being violated? In any case, that's a pretty large multi-column index, and it's going to be relatively expensive to compare against - this could result in a timeout. You can always work around that by setting your connection timeout to wait forever, but I would attack the problem from a design perspective.
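To see whether a unique constraint is what rejects the rows, it helps to catch and log the actual database error instead of only counting failures. The sketch below uses SQLite through Python's `sqlite3` module purely as a stand-in; the `results` table, the `idx_all` index, and the `try_insert` helper are assumptions made for illustration, and other engines raise different exception types.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE results (candidate_id INTEGER, question_id INTEGER, score REAL)"
)
# Hypothetical index covering every column and marked UNIQUE.
conn.execute(
    "CREATE UNIQUE INDEX idx_all ON results (candidate_id, question_id, score)"
)

def try_insert(conn, row):
    """Insert one row; return None on success or the error message on a constraint violation."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO results VALUES (?, ?, ?)", row)
        return None
    except sqlite3.IntegrityError as exc:
        return str(exc)

first = try_insert(conn, (1, 1, 0.5))   # accepted
second = try_insert(conn, (1, 1, 0.5))  # exact duplicate, rejected by the unique index
print(first, second)
```

If the logged message points at a uniqueness violation rather than a timeout, the fix lies in the key design, not in index maintenance.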

Other answers

It is common in data warehousing to drop indexes and constraints before loading, and to re-create them afterwards. If you get rid of constraints (FKs), make sure that your loading process enforces them instead. Drop any check constraints too, and move the check validations into your ETL software.

140K is NOT a lot of rows. Please post your table design and the error that you get when the inserts fail.

I would suggest the following: keep all your data except today's in a separate table (let's call it History), where the indexes are tuned for your reports. Keep today's data in another table (let's call it Today), and run a job at midnight to move the data from the Today table to the History table. The Today table should have minimal indexing to improve insert performance. With this design you can be sure that your reports do not contend with the inserts. In addition, you have two tables, each tuned for its purpose. In general, it is hard to tune one table for both fast inserts and fast selects.
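A minimal sketch of the drop, load, re-create sequence, again using Python's `sqlite3` module as a stand-in for whatever database the warehouse actually runs on. The `results` table, the `idx_results_candidate` index, and the `bulk_load` helper are hypothetical names.

```python
import sqlite3

def bulk_load(conn, rows):
    """Drop the reporting index, insert all rows as one batch, then rebuild the index."""
    with conn:  # commits on success, rolls back the pending inserts on error
        conn.execute("DROP INDEX IF EXISTS idx_results_candidate")
        conn.executemany(
            "INSERT INTO results (candidate_id, question_id, score) VALUES (?, ?, ?)",
            rows,
        )
        # The index is built once over the full table instead of being
        # updated for each of the inserted rows.
        conn.execute("CREATE INDEX idx_results_candidate ON results (candidate_id)")

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE results (candidate_id INTEGER, question_id INTEGER, score REAL)"
)
conn.execute("CREATE INDEX idx_results_candidate ON results (candidate_id)")

# 350 candidates x 400 records each, as in the question.
rows = [(c, q, 1.0) for c in range(350) for q in range(400)]
bulk_load(conn, rows)
row_count = conn.execute("SELECT COUNT(*) FROM results").fetchone()[0]
```

Note that reports querying the table during the load run without the index, so this approach fits best when loads happen in a quiet window.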
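The Today/History split could look roughly like this. It is only a sketch under assumptions: SQLite via Python's `sqlite3` stands in for the real database, the column layout is invented, and `record_result` and `nightly_move` are hypothetical helpers (the nightly job would be triggered by a scheduler that is not shown).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Today: no indexes, so inserts stay cheap.
    CREATE TABLE today   (candidate_id INTEGER, question_id INTEGER, score REAL);
    -- History: indexed for reporting queries.
    CREATE TABLE history (candidate_id INTEGER, question_id INTEGER, score REAL);
    CREATE INDEX idx_history_candidate ON history (candidate_id);
""")

def record_result(conn, candidate_id, question_id, score):
    """Called during the day: write into the lightly indexed Today table."""
    with conn:
        conn.execute(
            "INSERT INTO today VALUES (?, ?, ?)", (candidate_id, question_id, score)
        )

def nightly_move(conn):
    """Called at midnight: copy Today into History and empty Today in one transaction."""
    with conn:
        conn.execute("INSERT INTO history SELECT * FROM today")
        conn.execute("DELETE FROM today")

record_result(conn, 1, 1, 0.5)
record_result(conn, 1, 2, 1.0)
nightly_move(conn)
history_count = conn.execute("SELECT COUNT(*) FROM history").fetchone()[0]
today_count = conn.execute("SELECT COUNT(*) FROM today").fetchone()[0]
```

Reports that need the current day's results would have to query both tables, for example through a `UNION ALL` view.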




