InnoDB存储引擎

介绍InnoDB的体系架构及其相关技术。

查看运行状态

>show engine innodb status\G
*************************** 1. row ***************************
  Type: InnoDB
  Name:
Status:
=====================================
2018-08-10 12:17:04 0x7f3c9c1f5700 INNODB MONITOR OUTPUT
=====================================
Per second averages calculated from the last 16 seconds
-----------------
BACKGROUND THREAD
-----------------
srv_master_thread loops: 246 srv_active, 0 srv_shutdown, 1448608 srv_idle
srv_master_thread log flush and writes: 1448854
----------
SEMAPHORES
----------
OS WAIT ARRAY INFO: reservation count 13400
OS WAIT ARRAY INFO: signal count 32841
RW-shared spins 0, rounds 66721, OS waits 7076
RW-excl spins 0, rounds 547490, OS waits 4609
RW-sx spins 8508, rounds 127346, OS waits 1128
Spin rounds per wait: 66721.00 RW-shared, 547490.00 RW-excl, 14.97 RW-sx
------------------------
LATEST DETECTED DEADLOCK
------------------------
2018-08-10 12:03:48 0x7f3c9c1f5700
*** (1) TRANSACTION:
TRANSACTION 421374926765608, ACTIVE 22 sec fetching rows
mysql tables in use 1, locked 1
LOCK WAIT 3 lock struct(s), heap size 1136, 3 row lock(s)
MySQL thread id 144020, OS thread handle 139898296178432, query id 5829 localhost root Sending data
select * from t where a<=4 lock in share mode
*** (1) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 73 page no 3 n bits 72 index PRIMARY of table `testrep`.`t` trx id 421374926765608 lock mode S waiting
*** (2) TRANSACTION:
TRANSACTION 3993, ACTIVE 54 sec inserting
mysql tables in use 1, locked 1
3 lock struct(s), heap size 1136, 2 row lock(s)
MySQL thread id 144023, OS thread handle 139898294064896, query id 5831 localhost root update
insert into t values(3)
*** (2) HOLDS THE LOCK(S):
RECORD LOCKS space id 73 page no 3 n bits 72 index PRIMARY of table `testrep`.`t` trx id 3993 lock_mode X locks rec but not gap
*** (2) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 73 page no 3 n bits 72 index PRIMARY of table `testrep`.`t` trx id 3993 lock_mode X locks gap before rec insert intention waiting
*** WE ROLL BACK TRANSACTION (2)
------------
TRANSACTIONS
------------
Trx id counter 3994
Purge done for trx's n:o < 3993 undo n:o < 0 state: running but idle
History list length 31
LIST OF TRANSACTIONS FOR EACH SESSION:
---TRANSACTION 421374926764416, not started
0 lock struct(s), heap size 1136, 0 row lock(s)
--------
FILE I/O
--------
I/O thread 0 state: waiting for completed aio requests (insert buffer thread)
I/O thread 1 state: waiting for completed aio requests (log thread)
I/O thread 2 state: waiting for completed aio requests (read thread)
I/O thread 3 state: waiting for completed aio requests (read thread)
I/O thread 4 state: waiting for completed aio requests (read thread)
I/O thread 5 state: waiting for completed aio requests (read thread)
I/O thread 6 state: waiting for completed aio requests (write thread)
I/O thread 7 state: waiting for completed aio requests (write thread)
I/O thread 8 state: waiting for completed aio requests (write thread)
I/O thread 9 state: waiting for completed aio requests (write thread)
I/O thread 10 state: waiting for completed aio requests (write thread)
I/O thread 11 state: waiting for completed aio requests (write thread)
I/O thread 12 state: waiting for completed aio requests (write thread)
I/O thread 13 state: waiting for completed aio requests (write thread)
I/O thread 14 state: waiting for completed aio requests (write thread)
I/O thread 15 state: waiting for completed aio requests (write thread)
I/O thread 16 state: waiting for completed aio requests (write thread)
I/O thread 17 state: waiting for completed aio requests (write thread)
I/O thread 18 state: waiting for completed aio requests (write thread)
I/O thread 19 state: waiting for completed aio requests (write thread)
I/O thread 20 state: waiting for completed aio requests (write thread)
I/O thread 21 state: waiting for completed aio requests (write thread)
Pending normal aio reads: [0, 0, 0, 0] , aio writes: [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0] ,
 ibuf aio reads:, log i/o's:, sync i/o's:
Pending flushes (fsync) log: 0; buffer pool: 0
12858 OS file reads, 34690 OS file writes, 2892 OS fsyncs
0.00 reads/s, 0 avg bytes/read, 0.00 writes/s, 0.00 fsyncs/s
-------------------------------------
INSERT BUFFER AND ADAPTIVE HASH INDEX
-------------------------------------
Ibuf: size 1, free list len 0, seg size 2, 0 merges
merged operations:
 insert 0, delete mark 0, delete 0
discarded operations:
 insert 0, delete mark 0, delete 0
Hash table size 276671, node heap has 1 buffer(s)
Hash table size 276671, node heap has 444 buffer(s)
Hash table size 276671, node heap has 1 buffer(s)
Hash table size 276671, node heap has 1 buffer(s)
Hash table size 276671, node heap has 1 buffer(s)
Hash table size 276671, node heap has 1 buffer(s)
Hash table size 276671, node heap has 0 buffer(s)
Hash table size 276671, node heap has 1 buffer(s)
0.00 hash searches/s, 0.00 non-hash searches/s
---
LOG
---
Log sequence number 1057174941
Log flushed up to   1057174941
Pages flushed up to 1057174941
Last checkpoint at  1057174932
Max checkpoint age    2608479868
Checkpoint age target 2526964873
Modified age          0
Checkpoint age        9
0 pending log flushes, 0 pending chkp writes
1832 log i/o's done, 0.00 log i/o's/second
----------------------
BUFFER POOL AND MEMORY
----------------------
Total large memory allocated 1117782016
Dictionary memory allocated 352887
Internal hash tables (constant factor + variable factor)
    Adaptive hash index 25113408 	(17706944 + 7406464)
    Page hash           277384 (buffer pool 0 only)
    Dictionary cache    4779623 	(4426736 + 352887)
    File system         845736 	(812272 + 33464)
    Lock system         2659448 	(2657176 + 2272)
    Recovery system     0 	(0 + 0)
Buffer pool size   65528
Buffer pool size, bytes 1073610752
Free buffers       28233
Database pages     36845
Old database pages 13519
Modified db pages  0
Pending reads      0
Pending writes: LRU 0, flush list 0, single page 0
Pages made young 0, not young 0
0.00 youngs/s, 0.00 non-youngs/s
Pages read 12818, created 24027, written 31282
0.00 reads/s, 0.00 creates/s, 0.00 writes/s
No buffer pool page gets since the last printout
Pages read ahead 0.00/s, evicted without access 0.00/s, Random read ahead 0.00/s
LRU len: 36845, unzip_LRU len: 0
I/O sum[0]:cur[0], unzip sum[0]:cur[0]
----------------------
INDIVIDUAL BUFFER POOL INFO
----------------------
---BUFFER POOL 0
Buffer pool size   16382
Buffer pool size, bytes 268402688
Free buffers       7096
Database pages     9176
Old database pages 3367
Modified db pages  0
Pending reads      0
Pending writes: LRU 0, flush list 0, single page 0
Pages made young 0, not young 0
0.00 youngs/s, 0.00 non-youngs/s
Pages read 3169, created 6007, written 8652
0.00 reads/s, 0.00 creates/s, 0.00 writes/s
No buffer pool page gets since the last printout
Pages read ahead 0.00/s, evicted without access 0.00/s, Random read ahead 0.00/s
LRU len: 9176, unzip_LRU len: 0
I/O sum[0]:cur[0], unzip sum[0]:cur[0]
---BUFFER POOL 1
Buffer pool size   16382
Buffer pool size, bytes 268402688
Free buffers       7027
Database pages     9242
Old database pages 3391
Modified db pages  0
Pending reads      0
Pending writes: LRU 0, flush list 0, single page 0
Pages made young 0, not young 0
0.00 youngs/s, 0.00 non-youngs/s
Pages read 3215, created 6027, written 7895
0.00 reads/s, 0.00 creates/s, 0.00 writes/s
No buffer pool page gets since the last printout
Pages read ahead 0.00/s, evicted without access 0.00/s, Random read ahead 0.00/s
LRU len: 9242, unzip_LRU len: 0
I/O sum[0]:cur[0], unzip sum[0]:cur[0]
---BUFFER POOL 2
Buffer pool size   16382
Buffer pool size, bytes 268402688
Free buffers       7102
Database pages     9166
Old database pages 3363
Modified db pages  0
Pending reads      0
Pending writes: LRU 0, flush list 0, single page 0
Pages made young 0, not young 0
0.00 youngs/s, 0.00 non-youngs/s
Pages read 3224, created 5942, written 6812
0.00 reads/s, 0.00 creates/s, 0.00 writes/s
No buffer pool page gets since the last printout
Pages read ahead 0.00/s, evicted without access 0.00/s, Random read ahead 0.00/s
LRU len: 9166, unzip_LRU len: 0
I/O sum[0]:cur[0], unzip sum[0]:cur[0]
---BUFFER POOL 3
Buffer pool size   16382
Buffer pool size, bytes 268402688
Free buffers       7008
Database pages     9261
Old database pages 3398
Modified db pages  0
Pending reads      0
Pending writes: LRU 0, flush list 0, single page 0
Pages made young 0, not young 0
0.00 youngs/s, 0.00 non-youngs/s
Pages read 3210, created 6051, written 7923
0.00 reads/s, 0.00 creates/s, 0.00 writes/s
No buffer pool page gets since the last printout
Pages read ahead 0.00/s, evicted without access 0.00/s, Random read ahead 0.00/s
LRU len: 9261, unzip_LRU len: 0
I/O sum[0]:cur[0], unzip sum[0]:cur[0]
--------------
ROW OPERATIONS
--------------
0 queries inside InnoDB, 0 queries in queue
0 read views open inside InnoDB
0 RW transactions active inside InnoDB
Process ID=25745, Main thread ID=139898338137856, state: sleeping
Number of rows inserted 4583431, updated 443593, deleted 263, read 123935029
0.00 inserts/s, 0.00 updates/s, 0.00 deletes/s, 0.00 reads/s
----------------------------
END OF INNODB MONITOR OUTPUT
============================

1 row in set (0.00 sec)

后台线程

InnoDB存储引擎是多线程模型,因此其后台有不同的后台线程,负责处理不同的任务。

master thread

主要负责将缓冲池中的数据异步刷新到磁盘,保证数据的一致性,包括脏页的刷新、合并插入缓冲、undo页的回收等。

master thread所做工作伪代码

void master_thread(){
    goto loop;
loop:
for(int i=0; i<10; i++){
    thread_sleep(1) 
    do log buffer flush to disk

    if(last_one_second_ios < 5% * innodb_io_capacity)
        do merge 5% innodb_io_capacity insert buffer 

    if(buf_get_modified_ratio_pct > innodb_max_dirty_pages_pct)
        do buffer pool flush 100% innodb_io_capacity dirty page
    else if enable adaptive flush
        do buffer pool flush desired amount dirty page

    if(no user activity)
        goto backgroud loop
}
if(last_ten_second_ios < innodb_io_capacity)
    do buffer pool flush 100% innodb_io_capacity dirty page

do merge 5% innodb_io_capacity insert buffer
do log buffer flush to disk
do full purge

if(buf_get_modified_ratio_pct > 70%)
    do buffer pool flush 100% innodb_io_capacity dirty page
else
    do buffer pool flush 10% innodb_io_capacity dirty page

goto loop

backgroud loop:
do full purge 
do merge 100% innodb_io_capacity isnert buffer 
if not idle:
    goto loop
else:
 goto flush loop

flush loop:
do buffer pool flush 100% innodb_io_capacity dirty page
if(buf_get_modified_ratio_pct > innodb_max_dirty_pages_pct)
    goto flush loop
    goto suspend loop

suspend loop:
suspend_thread()
waiting event
goto loop
}

IO thread

在innodb存储引擎中使用了大量的AIO来处理IO请求,IO thread负责这些IO请求的回调(call back)处理。

>show variables like 'innodb_version'\G
*************************** 1. row ***************************
Variable_name: innodb_version
        Value: 5.7.21-21
1 row in set (0.00 sec)

>show variables like 'innodb_%io_threads'\G
*************************** 1. row ***************************
Variable_name: innodb_read_io_threads
        Value: 4
*************************** 2. row ***************************
Variable_name: innodb_write_io_threads
        Value: 16
2 rows in set (0.00 sec)

purge thread

回收已经使用并分配的undo页。

>show variables like 'innodb_purge_threads'\G
*************************** 1. row ***************************
Variable_name: innodb_purge_threads
        Value: 4
1 row in set (0.00 sec)

page cleaner thread

脏页的刷新

内存

缓冲池

缓冲池就是一块内存区域,在数据库中进行读取页的操作时,首先将读取到的页缓存到缓冲池中,下一次再读取相同页时,首先判断该页是否在缓冲池中,如果在(命中),则直接读取该页,若不在,则再从磁盘中取。对页的修改,首先修改在缓冲池中的页,然后再通过checkpoint机制刷新到磁盘上。下图为InnoDB中的内存数据对象。

-- 查看缓冲池大小
>show variables like 'innodb_buffer_pool_size'\G
*************************** 1. row ***************************
Variable_name: innodb_buffer_pool_size
        Value: 1073741824
1 row in set (0.00 sec)

-- 查看缓冲池实例个数
>show variables like 'innodb_buffer_pool_instances'\G
*************************** 1. row ***************************
Variable_name: innodb_buffer_pool_instances
        Value: 4
1 row in set (0.00 sec)

-- 查看每个缓冲池实例
>select pool_id, pool_size, free_buffers, database_pages
    -> from information_schema.innodb_buffer_pool_stats\G
*************************** 1. row ***************************
       pool_id: 0
     pool_size: 16382
  free_buffers: 7096
database_pages: 9176
*************************** 2. row ***************************
       pool_id: 1
     pool_size: 16382
  free_buffers: 7027
database_pages: 9242
*************************** 3. row ***************************
       pool_id: 2
     pool_size: 16382
  free_buffers: 7096
database_pages: 9172
*************************** 4. row ***************************
       pool_id: 3
     pool_size: 16382
  free_buffers: 7008
database_pages: 9261
4 rows in set (0.00 sec)

-- 查看缓冲池运行状态
>select pool_id, hit_rate,
    -> pages_made_young, pages_not_made_young
    -> from information_schema.INNODB_BUFFER_POOL_STATS\G
*************************** 1. row ***************************
             pool_id: 0
            hit_rate: 0
    pages_made_young: 0
pages_not_made_young: 0
*************************** 2. row ***************************
             pool_id: 1
            hit_rate: 0
    pages_made_young: 0
pages_not_made_young: 0
*************************** 3. row ***************************
             pool_id: 2
            hit_rate: 0
    pages_made_young: 0
pages_not_made_young: 0
*************************** 4. row ***************************
             pool_id: 3
            hit_rate: 0
    pages_made_young: 0
pages_not_made_young: 0
4 rows in set (0.00 sec)

LRU list、Free list和flush list

缓冲池中的页是通过LRU算法来进行管理的,总体上来说,最频繁使用的页在LRU列表的前端,而最少使用的页在LRU列表的尾端。

flush列表即为脏页列表,数据库会通过checkpoint机制将脏页刷新会磁盘中。

-- 新读取的页插入到LRU列表尾端的37%位置处
>show variables like 'innodb_old_blocks_pct'\G
*************************** 1. row ***************************
Variable_name: innodb_old_blocks_pct
        Value: 37
1 row in set (0.00 sec)

-- 查看缓冲池LRU列表中space为71的表的页类型
>select table_name, space, page_number, page_type 
from information_schema.innodb_buffer_page_lru 
where space = 71\G
*************************** 1. row ***************************
 table_name: NULL
      space: 71
page_number: 0
  page_type: FILE_SPACE_HEADER
*************************** 2. row ***************************
 table_name: NULL
      space: 71
page_number: 1
  page_type: IBUF_BITMAP
*************************** 3. row ***************************
 table_name: NULL
      space: 71
page_number: 2
  page_type: INODE
*************************** 4. row ***************************
 table_name: `testrep`.`t1`
      space: 71
page_number: 3
  page_type: INDEX
4 rows in set (0.15 sec)

-- 查看unzip_LRU列表
>select table_name, space, page_number, compressed_size 
from information_schema.innodb_buffer_page_lru 
where COMPRESSED_SIZE <> 0\G

-- 查看LRU列表中的脏页
>select table_name, space, page_number, page_type
from information_schema.innodb_buffer_page_lru 
where oldest_modification>0\G

重做日志缓冲

innodb存储引擎首先将重做日志放入到缓冲区,然后再按一定频率刷新到重做日志。

-- 查看重做日志缓冲大小
>show variables like 'innodb_log_buffer_size'\G
*************************** 1. row ***************************
Variable_name: innodb_log_buffer_size
Value: 16777216
1 row in set (0.00 sec)

额外的内存池

对一些数据结构本身的内存进行分配时,需要从额外的内存池中进行申请。缓冲控制对象,比如LRU、锁等信息所用内存也需要从额外的内存中中申请。

checkpoint技术

 

插入缓冲

[MySQL] InnoDB三大特性之 – 插入缓冲 

两次写

[MySQL] InnoDB三大特性之 – 两次写

自适应哈希索引

[MySQL] InnoDB三大特性之 – 自适应哈希索引