一个很有意义的SQL的优化过程（一个电子化支局中的大数据量的统计SQL）

yangy608

浏览: 72878 次
性别:
来自: 深圳

最近访客更多访客>>

H880125

cslwoaini2010

listeningweather

无知的1234

博主相关

博客

微博

相册

留言

关于我

文章分类

社区版块

存档分类

SQL

select count(distinct v_yjhm)
from (select v_yjhm
          from zjjk_t_yssj_o_his a
         where n_yjzl > 0
           and d_sjrq between to_date('20070301', 'yyyymmdd') and
               to_date('20070401', 'yyyymmdd')
           and v_yjzldm like '40%'
           and not exists(select 'a' from INST_TRIG_ZJJK_T_YSSJ_O b where a.v_yjtm=b.yjbh)
           --and v_yjtm not in (select yjbh from INST_TRIG_ZJJK_T_YSSJ_O)
        union
        select v_yjhm
          from zjjk_t_yssj_u_his a
         where n_yjzl > 0
           and d_sjrq between to_date('20070301', 'yyyymmdd') and
               to_date('20070401', 'yyyymmdd')
           and v_yjzldm like '40%'
           and not exists(select 'a' from INST_TRIG_ZJJK_T_YSSJ_U b where a.v_yjtm=b.yjbh))
           --and v_yjtm not in (select yjbh from INST_TRIG_ZJJK_T_YSSJ_U))

说明：1、zjjk_t_yssj_o_his 、zjjk_t_yssj_u_his 的d_sjrq 上都有一个索引了

2、zjjk_t_yssj_o_his 、zjjk_t_yssj_u_his 的v_yjtm 都为 not null 字段

3、INST_TRIG_ZJJK_T_YSSJ_O、INST_TRIG_ZJJK_T_YSSJ_U 的 yjbh 为PK

优化建议：

1、什么是DISTINCT ? 就是分组排序后取唯一值，底层行为分组排序

2、什么是 UNION 、 UNION ALL ？ UNION ：对多个结果集取DISTINCT ，生成一个不含重复记录的结果集，返回给前端，UNION ALL ：不对结果集进行去重复操作底层行为：分组排序

3、什么是 COUNT(*) ？累加

4、需要有什么样的索引？ S_sjrq + v_yjzldm ：理由：假如全省的数据量在表中全部数为1000万，查询月数据量为200万，1000万中特快占50%，则通过 beween 时间(d_sjrq)+ 种类( v_yjzldm )，可过滤出约100万，这是最好的检索方式了。

5、两表都要进行一次 NOT EXISTS 运算，如何做最优？ NOT EXISTS 是不好做的运算，但是我们可以合并两次的NOT EXISTS 运算。让这费资源的活只干一次。

综合以上，我们可以如下优化这个SQL：

1、内部的UNION 也是去重复，外部的DISTINCT 也是去重复，可左右去掉一个，建议内部的改为 UNION ALL ，这里稍请注意一下，如果V_YJHM 有NULL的情况，可能会引起COUNT值不对实际数的情况。

2、建一个 D_SJRQ+V_YJZLDM 的复合索引

3、将两个子查询先 UNION ALL 联结，另两个用来做 NOT EXISTS 的表也 UNION ALL 联结

4、在3的基础上再做 NOT EXISTS

5、将NOT EXISTS 替换为NOT IN ，同时加提示 HASH_AJ 做半连接HASH运算

6、最后为外层的COUNT(DISTINCT … 获得结果数

SQL书写如下：

select count(distinct v_yjhm)
from (select v_yjtm, v_yjhm
          from zjjk_t_yssj_o_his a
         where n_yjzl > 0
           and d_sjrq between to_date('20070301', 'yyyymmdd') and
               to_date('20070401', 'yyyymmdd')
           and v_yjzldm like '40%'
        union all
        select v_yjtm, v_yjhm
          from zjjk_t_yssj_u_his a
         where n_yjzl > 0
           and d_sjrq between to_date('20070301', 'yyyymmdd') and
               to_date('20070401', 'yyyymmdd')
           and v_yjzldm like '40%'
        ) a
where a.v_yjtm not IN
       (select /*+ HASH_AJ */
         yjbh
          from (select yjbh
                  from INST_TRIG_ZJJK_T_YSSJ_O
                union all
                select yjbh from INST_TRIG_ZJJK_T_YSSJ_U))

经过上述改造，原来这个SQL的执行时间如果为2分钟的话，现在应该20秒足够！

分享到：