The fact is that most organizations have bad data and still allow bad data
to flow into the system. There are ways to ameliorate this but eliminating it entirely
is often more costly than just living with some level of dirty data. It??™s important for
business users to understand why bad data is in the system and what options exist to
fix it, and why it might be necessary to see some percentage of data listed as unknown.
C h a p t e r 3 : D a t a W a r e h o u s i n g a n d B u s i n e s s I n t e l l i g e n c e 47
Building Cubes
When data warehousing was originally born, the star schema was the data
warehouse. In fact, it??™s still often called the relational data warehouse. There??™s
nothing wrong with this at all, and many companies query directly against the
relational warehouse. The star schema is designed for the speed of retrieval, so
running reports against it is certainly acceptable.
One thing that star schemas may not have, however, are summary levels. Imagine
that a particular schema has a daily grain, and that there are two billion fact records
in the fact table, representing 4 years of sales. If a customer wants to see yearly
sales totals, this requires performing a Sum across all two billion records, summing
approximately 500 million records at a time in order to get the values for each year.
This can be an expensive operation on large warehouses, especially when there are
many users of the system.
One solution was to preaggregate the data in the relational system.
Pages:
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90