查询优化技术之逻辑查询优化

查询优化中的罗即查询优化就是依据启发式规则和关系代数对sql语句进行语义上的等价转化,使之成为更高效的形式。具体的优化技术有下列几种:

1. 视图重写

2. 子查询优化

3. 等价谓词重写

4. 条件化简

5. 外连接消除

测试语句

create table t_1(
    t_1_id int UNIQUE,
    t_1_col_1 int,
    t_1_col_2 varchar(10)
);
create table t_2(
    t_2_id int UNIQUE,
    t_2_col_1 int,
    t_2_col_2 varchar(10)
);

insert into t_1 values(1,11,'t_1_1');
insert into t_1 values(2,12,NULL); 
insert into t_1 values(3, NULL, 't_1_3');
insert into t_1 values(4, 14,'t_1_4');
insert into t_1 values(5,15,NULL);
insert into t_1 values(7,NULL,NULL);

insert into t_2 values(1, 11,'t_2_1');
insert into t_2 values(2, NULL,'t_2_2');
insert into t_2 values(3 ,13,NULL);
insert into t_2 values(4, 14,'t_2_4');
insert into t_2 values(6, 16,'t_2_6');
insert into t_2 values(7,NULL,NULL);

唯一键作为连接条件,观察where和join/on所起的作用是否有区别

语句一

testDB=# select * from t_1 left join t_2 on true where t_1_id=t_2_id;
 t_1_id | t_1_col_1 | t_1_col_2 | t_2_id | t_2_col_1 | t_2_col_2 
--------+-----------+-----------+--------+-----------+-----------
      1 |        11 | t_1_1     |      1 |        11 | t_2_1
      2 |        12 |           |      2 |           | t_2_2
      3 |           | t_1_3     |      3 |        13 | 
      4 |        14 | t_1_4     |      4 |        14 | t_2_4
      7 |           |           |      7 |           | 
(5 rows)

testDB=# explain select * from t_1 left join t_2 on true where t_1_id=t_2_id;
                             QUERY PLAN                             
--------------------------------------------------------------------
 Hash Join  (cost=59.73..119.72 rows=2210 width=92)
   Hash Cond: (t_1.t_1_id = t_2.t_2_id)
   ->  Seq Scan on t_1  (cost=0.00..32.10 rows=2210 width=46)
   ->  Hash  (cost=32.10..32.10 rows=2210 width=46)
         ->  Seq Scan on t_2  (cost=0.00..32.10 rows=2210 width=46)
(5 rows)

外连接消除

语句二

testDB=# select * from t_1 left join t_2 on t_1_id = t_2_id;
 t_1_id | t_1_col_1 | t_1_col_2 | t_2_id | t_2_col_1 | t_2_col_2 
--------+-----------+-----------+--------+-----------+-----------
      1 |        11 | t_1_1     |      1 |        11 | t_2_1
      2 |        12 |           |      2 |           | t_2_2
      3 |           | t_1_3     |      3 |        13 | 
      4 |        14 | t_1_4     |      4 |        14 | t_2_4
      5 |        15 |           |        |           | 
      7 |           |           |      7 |           | 
(6 rows)

testDB=# explain select * from t_1 left join t_2 on t_1_id = t_2_id;
                             QUERY PLAN                             
--------------------------------------------------------------------
 Hash Left Join  (cost=59.73..119.72 rows=2210 width=92)
   Hash Cond: (t_1.t_1_id = t_2.t_2_id)
   ->  Seq Scan on t_1  (cost=0.00..32.10 rows=2210 width=46)
   ->  Hash  (cost=32.10..32.10 rows=2210 width=46)
         ->  Seq Scan on t_2  (cost=0.00..32.10 rows=2210 width=46)
(5 rows)

没有消除外连接,观察where和join/on所起作用是否有区别

语句三

testDB=# select * from t_1 left join t_2 on t_1_id = t_2_id where t_1_id = t_2_id;
 t_1_id | t_1_col_1 | t_1_col_2 | t_2_id | t_2_col_1 | t_2_col_2 
--------+-----------+-----------+--------+-----------+-----------
      1 |        11 | t_1_1     |      1 |        11 | t_2_1
      2 |        12 |           |      2 |           | t_2_2
      3 |           | t_1_3     |      3 |        13 | 
      4 |        14 | t_1_4     |      4 |        14 | t_2_4
      7 |           |           |      7 |           | 
(5 rows)

testDB=# explain select * from t_1 left join t_2 on t_1_id = t_2_id where t_1_id = t_2_id;
                             QUERY PLAN                             
--------------------------------------------------------------------
 Hash Join  (cost=59.73..119.72 rows=2210 width=92)
   Hash Cond: (t_1.t_1_id = t_2.t_2_id)
   ->  Seq Scan on t_1  (cost=0.00..32.10 rows=2210 width=46)
   ->  Hash  (cost=32.10..32.10 rows=2210 width=46)
         ->  Seq Scan on t_2  (cost=0.00..32.10 rows=2210 width=46)
(5 rows)

普通列作为连接条件

语句四

testDB=# select * from t_1 left join t_2 on true where t_1_col_1 = t_2_col_1;
 t_1_id | t_1_col_1 | t_1_col_2 | t_2_id | t_2_col_1 | t_2_col_2 
--------+-----------+-----------+--------+-----------+-----------
      1 |        11 | t_1_1     |      1 |        11 | t_2_1
      4 |        14 | t_1_4     |      4 |        14 | t_2_4
(2 rows)

testDB=# explain select * from t_1 left join t_2 on true where t_1_col_1 = t_2_col_1;
                             QUERY PLAN                             
--------------------------------------------------------------------
 Merge Join  (cost=309.73..687.08 rows=24420 width=92)
   Merge Cond: (t_1.t_1_col_1 = t_2.t_2_col_1)
   ->  Sort  (cost=154.86..160.39 rows=2210 width=46)
         Sort Key: t_1.t_1_col_1
         ->  Seq Scan on t_1  (cost=0.00..32.10 rows=2210 width=46)
   ->  Sort  (cost=154.86..160.39 rows=2210 width=46)
         Sort Key: t_2.t_2_col_1
         ->  Seq Scan on t_2  (cost=0.00..32.10 rows=2210 width=46)
(8 rows)

语句五

testDB=# select * from t_1 left join t_2 on t_1_col_1 = t_2_col_1;
 t_1_id | t_1_col_1 | t_1_col_2 | t_2_id | t_2_col_1 | t_2_col_2 
--------+-----------+-----------+--------+-----------+-----------
      1 |        11 | t_1_1     |      1 |        11 | t_2_1
      2 |        12 |           |        |           | 
      4 |        14 | t_1_4     |      4 |        14 | t_2_4
      5 |        15 |           |        |           | 
      3 |           | t_1_3     |        |           | 
      7 |           |           |        |           | 
(6 rows)

testDB=# explain select * from t_1 left join t_2 on t_1_col_1 = t_2_col_1;
                             QUERY PLAN                             
--------------------------------------------------------------------
 Merge Left Join  (cost=309.73..687.08 rows=24420 width=92)
   Merge Cond: (t_1.t_1_col_1 = t_2.t_2_col_1)
   ->  Sort  (cost=154.86..160.39 rows=2210 width=46)
         Sort Key: t_1.t_1_col_1
         ->  Seq Scan on t_1  (cost=0.00..32.10 rows=2210 width=46)
   ->  Sort  (cost=154.86..160.39 rows=2210 width=46)
         Sort Key: t_2.t_2_col_1
         ->  Seq Scan on t_2  (cost=0.00..32.10 rows=2210 width=46)
(8 rows)

语句六

testDB=# select * from t_1 left join t_2 on t_1_col_1 = t_2_col_1 where t_1_col_1 = t_2_col_1;
 t_1_id | t_1_col_1 | t_1_col_2 | t_2_id | t_2_col_1 | t_2_col_2 
--------+-----------+-----------+--------+-----------+-----------
      1 |        11 | t_1_1     |      1 |        11 | t_2_1
      4 |        14 | t_1_4     |      4 |        14 | t_2_4
(2 rows)

testDB=# explain select * from t_1 left join t_2 on t_1_col_1 = t_2_col_1 where t_1_col_1 = t_2_col_1;
                             QUERY PLAN                             
--------------------------------------------------------------------
 Merge Join  (cost=309.73..687.08 rows=24420 width=92)
   Merge Cond: (t_1.t_1_col_1 = t_2.t_2_col_1)
   ->  Sort  (cost=154.86..160.39 rows=2210 width=46)
         Sort Key: t_1.t_1_col_1
         ->  Seq Scan on t_1  (cost=0.00..32.10 rows=2210 width=46)
   ->  Sort  (cost=154.86..160.39 rows=2210 width=46)
         Sort Key: t_2.t_2_col_1
         ->  Seq Scan on t_2  (cost=0.00..32.10 rows=2210 width=46)
(8 rows)

where条件是非连接条件,且条件分别在左表和右表上,查看是否消除了外连接。

语句七

testDB=# select * from t_1 left join t_2 on true where t_1_id > 0 and t_2_id>0;
 t_1_id | t_1_col_1 | t_1_col_2 | t_2_id | t_2_col_1 | t_2_col_2 
--------+-----------+-----------+--------+-----------+-----------
      1 |        11 | t_1_1     |      1 |        11 | t_2_1
      1 |        11 | t_1_1     |      2 |           | t_2_2
      1 |        11 | t_1_1     |      3 |        13 | 
      1 |        11 | t_1_1     |      4 |        14 | t_2_4
      1 |        11 | t_1_1     |      6 |        16 | t_2_6
      1 |        11 | t_1_1     |      7 |           | 
      2 |        12 |           |      1 |        11 | t_2_1
      2 |        12 |           |      2 |           | t_2_2
      2 |        12 |           |      3 |        13 | 
      2 |        12 |           |      4 |        14 | t_2_4
      2 |        12 |           |      6 |        16 | t_2_6
      2 |        12 |           |      7 |           | 
      3 |           | t_1_3     |      1 |        11 | t_2_1
      3 |           | t_1_3     |      2 |           | t_2_2
      3 |           | t_1_3     |      3 |        13 | 
      3 |           | t_1_3     |      4 |        14 | t_2_4
      3 |           | t_1_3     |      6 |        16 | t_2_6
      3 |           | t_1_3     |      7 |           | 
      4 |        14 | t_1_4     |      1 |        11 | t_2_1
      4 |        14 | t_1_4     |      2 |           | t_2_2
      4 |        14 | t_1_4     |      3 |        13 | 
      4 |        14 | t_1_4     |      4 |        14 | t_2_4
      4 |        14 | t_1_4     |      6 |        16 | t_2_6
      4 |        14 | t_1_4     |      7 |           | 
      5 |        15 |           |      1 |        11 | t_2_1
      5 |        15 |           |      2 |           | t_2_2
      5 |        15 |           |      3 |        13 | 
      5 |        15 |           |      4 |        14 | t_2_4
      5 |        15 |           |      6 |        16 | t_2_6
      5 |        15 |           |      7 |           | 
      7 |           |           |      1 |        11 | t_2_1
      7 |           |           |      2 |           | t_2_2
      7 |           |           |      3 |        13 | 
      7 |           |           |      4 |        14 | t_2_4
      7 |           |           |      6 |        16 | t_2_6
      7 |           |           |      7 |           | 
(36 rows)

testDB=# explain select * from t_1 left join t_2 on true where t_1_id > 0 and t_2_id>0;
                                        QUERY PLAN                                         
-------------------------------------------------------------------------------------------
 Nested Loop  (cost=19.73..6849.61 rows=543169 width=92)
   ->  Bitmap Heap Scan on t_1  (cost=9.87..29.08 rows=737 width=46)
         Recheck Cond: (t_1_id > 0)
         ->  Bitmap Index Scan on t_1_t_1_id_key  (cost=0.00..9.68 rows=737 width=0)
               Index Cond: (t_1_id > 0)
   ->  Materialize  (cost=9.87..32.76 rows=737 width=46)
         ->  Bitmap Heap Scan on t_2  (cost=9.87..29.08 rows=737 width=46)
               Recheck Cond: (t_2_id > 0)
               ->  Bitmap Index Scan on t_2_t_1_id_key  (cost=0.00..9.68 rows=737 width=0)
                     Index Cond: (t_2_id > 0)
(10 rows)

满足空值拒绝,外连接消除

语句八


testDB=# select * from t_1 left join t_2 on t_1_id = t_2_id where t_1_id>0;
 t_1_id | t_1_col_1 | t_1_col_2 | t_2_id | t_2_col_1 | t_2_col_2 
--------+-----------+-----------+--------+-----------+-----------
      1 |        11 | t_1_1     |      1 |        11 | t_2_1
      2 |        12 |           |      2 |           | t_2_2
      3 |           | t_1_3     |      3 |        13 | 
      4 |        14 | t_1_4     |      4 |        14 | t_2_4
      7 |           |           |      7 |           | 
      5 |        15 |           |        |           | 
(6 rows)

testDB=# explain select * from t_1 left join t_2 on t_1_id = t_2_id where t_1_id>0;
                                        QUERY PLAN                                         
-------------------------------------------------------------------------------------------
 Hash Right Join  (cost=38.29..98.28 rows=737 width=92)
   Hash Cond: (t_2.t_2_id = t_1.t_1_id)
   ->  Seq Scan on t_2  (cost=0.00..32.10 rows=2210 width=46)
   ->  Hash  (cost=29.08..29.08 rows=737 width=46)
         ->  Bitmap Heap Scan on t_1  (cost=9.87..29.08 rows=737 width=46)
               Recheck Cond: (t_1_id > 0)
               ->  Bitmap Index Scan on t_1_t_1_id_key  (cost=0.00..9.68 rows=737 width=0)
                     Index Cond: (t_1_id > 0)
(8 rows)

外连接未消除

语句九

testDB=# select * from t_1 left join t_2 on t_1_id = t_2_id where t_2_id>0;
 t_1_id | t_1_col_1 | t_1_col_2 | t_2_id | t_2_col_1 | t_2_col_2 
--------+-----------+-----------+--------+-----------+-----------
      1 |        11 | t_1_1     |      1 |        11 | t_2_1
      2 |        12 |           |      2 |           | t_2_2
      3 |           | t_1_3     |      3 |        13 | 
      4 |        14 | t_1_4     |      4 |        14 | t_2_4
      7 |           |           |      7 |           | 
(5 rows)

testDB=# explain select * from t_1 left join t_2 on t_1_id = t_2_id where t_2_id>0;
                                        QUERY PLAN                                         
-------------------------------------------------------------------------------------------
 Hash Join  (cost=38.29..98.28 rows=737 width=92)
   Hash Cond: (t_1.t_1_id = t_2.t_2_id)
   ->  Seq Scan on t_1  (cost=0.00..32.10 rows=2210 width=46)
   ->  Hash  (cost=29.08..29.08 rows=737 width=46)
         ->  Bitmap Heap Scan on t_2  (cost=9.87..29.08 rows=737 width=46)
               Recheck Cond: (t_2_id > 0)
               ->  Bitmap Index Scan on t_2_t_1_id_key  (cost=0.00..9.68 rows=737 width=0)
                     Index Cond: (t_2_id > 0)
(8 rows)

满足控制拒绝,外连接消除

语句十

testDB=# select * from t_1 left join t_2 on t_1_id = t_2_id where t_1_col_1>0 or t_2_col_1>0;
 t_1_id | t_1_col_1 | t_1_col_2 | t_2_id | t_2_col_1 | t_2_col_2 
--------+-----------+-----------+--------+-----------+-----------
      1 |        11 | t_1_1     |      1 |        11 | t_2_1
      2 |        12 |           |      2 |           | t_2_2
      3 |           | t_1_3     |      3 |        13 | 
      4 |        14 | t_1_4     |      4 |        14 | t_2_4
      5 |        15 |           |        |           | 
(5 rows)

testDB=# explain select * from t_1 left join t_2 on t_1_id = t_2_id where t_1_col_1>0 or t_2_col_1>0;
                             QUERY PLAN                             
--------------------------------------------------------------------
 Hash Left Join  (cost=59.73..130.76 rows=1228 width=92)
   Hash Cond: (t_1.t_1_id = t_2.t_2_id)
   Filter: ((t_1.t_1_col_1 > 0) OR (t_2.t_2_col_1 > 0))
   ->  Seq Scan on t_1  (cost=0.00..32.10 rows=2210 width=46)
   ->  Hash  (cost=32.10..32.10 rows=2210 width=46)
         ->  Seq Scan on t_2  (cost=0.00..32.10 rows=2210 width=46)
(6 rows)

外连接未消除

6. 嵌套连接消除

7. 连接消除

8. 语义优化

9. 选择操作下推

(未完)