What is a column-oriented database?
A conventional database stores information by rows. A column-oriented
database stores information by columns. For data-crunching applications
such as a data warehouse or a CDR (Call Detail Record) processing
application, this organization usually performs much better.
Here is a simple example:
Our data is represented in the table below.
| Call ID | Time  | Duration | Number |
|---------|-------|----------|--------|
| 1       | 10:17 | 1:37     | 12345  |
| 2       | 11:11 | 5:13     | 23456  |
| 3       | 13:15 | 3:59     | 34567  |
A column-oriented database will store the information on disk by
column: 1, 2, 3 - 10:17, 11:11, 13:15 - 1:37, 5:13, 3:59 - 12345, 23456,
34567. A traditional database will store the same information on disk
by row: 1, 10:17, 1:37, 12345 - 2, 11:11, 5:13, 23456 - 3, 13:15, 3:59,
34567.
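The two layouts can be sketched in a few lines of Python. This is only an
illustration of the ordering described above, not how any particular
database writes its files; the names `rows`, `row_layout` and
`column_layout` are made up for this example.

```python
# Sample records from the table: (call_id, time, duration, number).
rows = [
    (1, "10:17", "1:37", "12345"),
    (2, "11:11", "5:13", "23456"),
    (3, "13:15", "3:59", "34567"),
]


def row_layout(records):
    """Write the records one after another, as a row store would."""
    return [value for record in records for value in record]


def column_layout(records):
    """Write every value of the first column, then the second, and so on."""
    return [value for column in zip(*records) for value in column]


print(row_layout(rows))     # starts with 1, '10:17', '1:37', '12345', 2, ...
print(column_layout(rows))  # starts with 1, 2, 3, '10:17', '11:11', ...
```

Both functions return the same twelve values; only the order in which
they would land on disk differs.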
Now imagine you would like to average the call duration column. In a
column-oriented database all durations sit in sequential order on the
disk, so the query can be answered with one sequential read, which
makes the average fast to compute. On top of that, a column-oriented
database will only fetch that one duration column, while a row-oriented
database has to fetch all four columns of every row. Less I/O equals
more speed.
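The I/O difference can be made concrete with a small sketch that counts
how many stored values each layout has to touch to compute the average.
It assumes the durations are written as minutes:seconds, which the
example does not state explicitly, and it models the disk as plain
Python lists. All names here are hypothetical.

```python
def to_seconds(duration):
    """Convert 'm:ss' to seconds (assumes a minutes:seconds format)."""
    minutes, seconds = duration.split(":")
    return int(minutes) * 60 + int(seconds)


# Row layout: the records stored one after another.
row_store = [
    1, "10:17", "1:37", "12345",
    2, "11:11", "5:13", "23456",
    3, "13:15", "3:59", "34567",
]

# Column layout: each column stored contiguously.
column_store = {
    "call_id": [1, 2, 3],
    "time": ["10:17", "11:11", "13:15"],
    "duration": ["1:37", "5:13", "3:59"],
    "number": ["12345", "23456", "34567"],
}


def average_from_rows(flat, num_columns, index):
    """Read whole records and pick out one field from each."""
    total = count = values_read = 0
    for start in range(0, len(flat), num_columns):
        record = flat[start:start + num_columns]  # the full row is fetched
        values_read += len(record)
        total += to_seconds(record[index])
        count += 1
    return total / count, values_read


def average_from_columns(columns, name):
    """Read only the requested column."""
    column = columns[name]
    total = sum(to_seconds(value) for value in column)
    return total / len(column), len(column)


print(average_from_rows(row_store, 4, 2))
print(average_from_columns(column_store, "duration"))
```

Both calls return the same average, roughly 216 seconds, but the row
version touches 12 values and the column version only 3. On a real
table with many columns and millions of rows the same ratio applies to
the amount of data read from disk.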