当前位置导航:炫浪网>>网络学院>>编程开发>>Oracle教程

histogram与10053

histogram是oracle为cbo提供更精确的成本估计而设计的一种直方图数据。histogram能提供列的数据分布,每次分析表后列的分布信息将会被保存在统计表里面,分析时默认的histogram size是75,意思就是采用75个buckets来表示数据分布。

 

histogram分为2种类型,基于高度的histogram和基于值的histogram

 

基于高度的histogram

 

当histogram buckets的数量少于列的distinct value时,oracle会采用基于高度的直方图反映数据分布,每个bucket容纳相同数量的值。

 

基本格式如下所示

 

SQL>  column column_name format a20;

SQL>  column ENDPOINT_ACTUAL_VALUE format a20;

SQL> SELECT COLUMN_NAME,ENDPOINT_NUMBER, ENDPOINT_VALUE,ENDPOINT_ACTUAL_VALUE

  2       FROM DBA_HISTOGRAMS

  3       WHERE TABLE_NAME ='TEST' AND COLUMN_NAME='OBJECT_ID'

  4       ORDER BY ENDPOINT_NUMBER;

COLUMN_NAME          ENDPOINT_NUMBER ENDPOINT_VALUE ENDPOINT_ACTUAL_VALU

-------------------- --------------- -------------- --------------------

OBJECT_ID                         29              1

OBJECT_ID                         44              2

OBJECT_ID                         59              3

OBJECT_ID                         74              4

OBJECT_ID                         75             76

 

这里'OBJECT_ID'列有从1到76不同的76个值,bucket数量为75个,所以采取了基于高度的直方图。

 

可以看到1的值占据了1-29号bucket,2的值占据了30-44号bucket,3的值占据了45-59号bucket,4的值占据了60-74号bucket,值5-76占据了75号bucket。所以值1占的比例最大。

 

基于值的histogram

 

当histogram buckets>=列的distinct values时,那么Oracle会使用基于值的histogram,每个值将会占据一个bucket,来看一下

 

SQL>  column column_name format a20;

SQL>  column ENDPOINT_ACTUAL_VALUE format a20;

SQL> SELECT COLUMN_NAME,ENDPOINT_NUMBER, ENDPOINT_VALUE,ENDPOINT_ACTUAL_VALUE

  2       FROM DBA_HISTOGRAMS

  3       WHERE TABLE_NAME ='TEST' AND COLUMN_NAME='OBJECT_ID'

  4       ORDER BY ENDPOINT_NUMBER;

COLUMN_NAME          ENDPOINT_NUMBER ENDPOINT_VALUE ENDPOINT_ACTUAL_VALU

-------------------- --------------- -------------- --------------------

OBJECT_ID                      19928              1

OBJECT_ID                      29927              2

OBJECT_ID                      39926              3

OBJECT_ID                      49838              4

OBJECT_ID                      49839              5

OBJECT_ID                      49840              6

OBJECT_ID                      49841              7

OBJECT_ID                      49842              8

OBJECT_ID                      49843              9

OBJECT_ID                      49844             10

OBJECT_ID                      49845             11

OBJECT_ID                      49846             12

OBJECT_ID                      49847             13

OBJECT_ID                      49848             14

OBJECT_ID                      49849             15

OBJECT_ID                      49850             16

OBJECT_ID                      49851             17

OBJECT_ID                      49852             18

OBJECT_ID                      49853             19

OBJECT_ID                      49854             20

OBJECT_ID                      49855             21

OBJECT_ID                      49856             22

OBJECT_ID                      49857             23

OBJECT_ID                      49858             24

OBJECT_ID                      49859             25

OBJECT_ID                      49860             26

OBJECT_ID                      49861             27

OBJECT_ID                      49862             28

OBJECT_ID                      49863             29

OBJECT_ID                      49864             30

OBJECT_ID                      49865             31

OBJECT_ID                      49866             32

OBJECT_ID                      49867             33

OBJECT_ID                      49868             34

OBJECT_ID                      49869             35

OBJECT_ID                      49870             36

OBJECT_ID                      49871             37

OBJECT_ID                      49872             38

OBJECT_ID                      49873             39

OBJECT_ID                      49874             40

OBJECT_ID                      49875             41

OBJECT_ID                      49876             42

OBJECT_ID                      49877             43

OBJECT_ID                      49878             44

OBJECT_ID                      49879             45

OBJECT_ID                      49880             46

OBJECT_ID                      49881             47

OBJECT_ID                      49882             48

OBJECT_ID                      49883             49

OBJECT_ID                      49884             50

OBJECT_ID                      49885             51

OBJECT_ID                      49886             52

OBJECT_ID                      49887             53

OBJECT_ID                      49888             54

OBJECT_ID                      49889             55

OBJECT_ID                      49890             56

OBJECT_ID                      49891             57

OBJECT_ID                      49892             58

OBJECT_ID                      49893             59

OBJECT_ID                      49894             60

OBJECT_ID                      49895             61

OBJECT_ID                      49896             62

OBJECT_ID                      49897             63

OBJECT_ID                      49898             64

OBJECT_ID                      49899             65

OBJECT_ID                      49900             66

OBJECT_ID                      49901             67

OBJECT_ID                      49902             68

OBJECT_ID                      49903             69

OBJECT_ID                      49904             70

OBJECT_ID                      49905             71

OBJECT_ID                      49906             72

OBJECT_ID                      49907             73

OBJECT_ID                      49908             74

OBJECT_ID                      49909             75

 

很明显可以看出与基于高度的histogram的区别,基于值的histogram为每一个值提供了一个bucket,从上面看到值1有19928行,值2有9999行,值3有9999行,值4有9912行,值5-75都分别只有1行,同样我们可以看出值1占的比例最大。

 

直方图的数据我们也看到了,但cbo怎么利用它来计算成本呢,做个10053看一下

 

alter session set events'10053 trace name context forever,level 1';

select object_name from test where object_id=1;

alter session set events'10053 trace name context off';

 

提取trace文件中的一段

***************************************

BASE STATISTICAL INFORMATION

***********************

Table Stats::

  Table: TEST  Alias: TEST

    #Rows: 49909  #Blks:  707  AvgRowLen:  95.00

Index Stats::

  Index: IND_TEST_OBJECT_ID  Col#: 4

    LVLS: 1  #LB: 179  #DK: 5  LB/K: 35.00  DB/K: 175.00  CLUF: 879.00

***************************************

SINGLE TABLE ACCESS PATH

  Column (#4): OBJECT_ID(NUMBER)

    AvgLen: 2.00 NDV: 75 Nulls: 0 Density: 1.0018e-05 Min: 1 Max: 75

    Histogram: Freq  #Bkts: 75  UncompBkts: 49909  EndPtVals: 75

  Table: TEST  Alias: TEST

    Card: Original: 49909  Rounded: 19928  Computed: 19928.00  Non Adjusted: 19928.00

  Access Path: TableScan

    Cost:  158.56  Resp: 158.56  Degree: 0

      Cost_io: 156.00  Cost_cpu: 18011198

      Resp_io: 156.00  Resp_cpu: 18011198

  Access Path: index (AllEqRange)

    Index: IND_TEST_OBJECT_ID

    resc_io: 423.00  resc_cpu: 11183699

    ix_sel: 0.39929  ix_sel_with_filters: 0.39929

    Cost: 424.59  Resp: 424.59  Degree: 1

  Best:: AccessPath: TableScan

         Cost: 158.56  Degree: 1  Resp: 158.56  Card: 19928.00  Bytes: 0

 

上面的selectivity就是通过histogram来计算的,如果不存在histogram的话那么selectivity=(1/number of distinct values),这里的selectivity=(值为1的行数/总行数)=(19928/49909)=0.39929

 

再根据索引扫描成本计算公式

 

io_cost=resc_io= blevel+FF*leaf_blocks+FF*clustering_factor

=1+0.39929*179+0.39929*879

=1+71.47291+350.97591

=423

 

所以index访问成本大于table scan成本,最终cbo选择的路线为采用tablescan.

 

如果选择object_id=75呢?

 

alter session set events'10053 trace name context forever,level 1';

select object_name from test where object_id=75;

alter session set events'10053 trace name context off';

 

***************************************

BASE STATISTICAL INFORMATION

***********************

Table Stats::

  Table: TEST  Alias: TEST

    #Rows: 49909  #Blks:  707  AvgRowLen:  95.00

Index Stats::

  Index: IND_TEST_OBJECT_ID  Col#: 4

    LVLS: 1  #LB: 179  #DK: 75  LB/K: 2.00  DB/K: 11.00  CLUF: 879.00

***************************************

SINGLE TABLE ACCESS PATH

  Column (#4): OBJECT_ID(NUMBER)

    AvgLen: 2.00 NDV: 75 Nulls: 0 Density: 1.0018e-05 Min: 1 Max: 75

    Histogram: Freq  #Bkts: 75  UncompBkts: 49909  EndPtVals: 75

  Table: TEST  Alias: TEST

    Card: Original: 49909  Rounded: 1  Computed: 0.50  Non Adjusted: 0.50

  Access Path: TableScan

    Cost:  158.56  Resp: 158.56  Degree: 0

      Cost_io: 156.00  Cost_cpu: 18011198

      Resp_io: 156.00  Resp_cpu: 18011198

  Access Path: index (AllEqRange)

    Index: IND_TEST_OBJECT_ID

    resc_io: 2.00  resc_cpu: 15503

    ix_sel: 1.0018e-05  ix_sel_with_filters: 1.0018e-05

    Cost: 2.00  Resp: 2.00  Degree: 1

  Best:: AccessPath: IndexRange  Index: IND_TEST_OBJECT_ID

         Cost: 2.00  Degree: 1  Resp: 2.00  Card: 0.50  Bytes: 0

 

对于non-popular value(数量比较少的值,比如在基于高度histogram里跨度不超过一个bucket或在基于值的histogram里行数比较少的值),oracle采取了不再去计算selectivity,而是采用density来做为

 

做为selectivity,从HIST_HEAD$里可以看到density,oracle采用density function(密度函数)来估计列的密度。

 

SQL> select OBJ#,COL#,BUCKET_CNT,ROW_CNT,SAMPLE_SIZE,MINIMUM,MAXIMUM,DISTCNT,DENSITY from sys.HIST_HEAD$ where obj#=51933 AND COL#=4;

      OBJ#       COL# BUCKET_CNT    ROW_CNT SAMPLE_SIZE    MINIMUM    MAXIMUM    DISTCNT    DENSITY

---------- ---------- ---------- ---------- ----------- ---------- ---------- ---------- ----------

     51933          4      49909         75       49909          1         75         75 .000010018

 

io_cost=resc_io= blevel+FF*leaf_blocks+FF*clustering_factor

=1+1.0018e-05*179+1.0018e-05*879

=1+0.001793222+0.008805822

=2(因为最少会读2个块)

 

所以最后cbo选择了index scan。

 

这篇文章的目的是为了给需要的人一点提示,希望有兴趣的人能够举一反三。

相关内容
赞助商链接