48fcfdf50f
* InnoDB corrupts files due to incorrect st_blksize calculation PR: 257728 Reported by: mfechner Obtained from: https://jira.mariadb.org/projects/MDEV/issues/MDEV-26537 MFH: 2021Q3
110 lines
4.5 KiB
Plaintext
110 lines
4.5 KiB
Plaintext
From 716f04a40f225af11fa88e793441da96dd0ddf21 Mon Sep 17 00:00:00 2001
|
|
From: =?UTF-8?q?Marko=20M=C3=A4kel=C3=A4?= <marko.makela@mariadb.com>
|
|
Date: Fri, 10 Sep 2021 11:20:12 +0300
|
|
Subject: [PATCH] MDEV-26537 InnoDB corrupts files due to incorrect st_blksize
|
|
calculation
|
|
|
|
The st_blksize returned by fstat(2) is not documented to be
|
|
a power of 2, like we assumed in
|
|
commit 58252fff15acfe7c7b0452a87e202e3f8e454e19 (MDEV-26040).
|
|
While on Linux, the st_blksize appears to report the file system
|
|
block size, on FreeBSD it seems to be something similar to st_size.
|
|
|
|
Also IBM AIX was affected by the bug. A simple test case that would
|
|
lead to a crash when using the minimum innodb_buffer_pool_size=5m:
|
|
|
|
seq -f 'create table t%g engine=innodb select * from seq_1_to_200000;' \
|
|
1 100|mysql test&
|
|
seq -f 'create table u%g engine=innodb select * from seq_1_to_200000;' \
|
|
1 100|mysql test&
|
|
|
|
We will fix this by not trusting st_blksize at all, and assuming that
|
|
the file system block size is 4096 bytes. We hope that no storage systems
|
|
with larger block size exist. Anything larger than 4096 bytes should be
|
|
unlikely, given that it is the minimum virtual memory page size of many
|
|
contemporary processors.
|
|
|
|
While the block size 512 bytes of the venerable Seagate ST-225 is still
|
|
in widespread use, the minimum innodb_page_size is 4096 bytes, and
|
|
innodb_log_file_size can be set in integer multiples of 65536 bytes.
|
|
|
|
The only occasion where InnoDB uses smaller block sizes than 4096 bytes
|
|
is with ROW_FORMAT=COMPRESSED tables with KEY_BLOCK_SIZE=1
|
|
or KEY_BLOCK_SIZE=2 (or innodb_page_size=4096). For such tables,
|
|
we will from now on preallocate space in integer multiples of 4096 bytes
|
|
and let regular writes extend the file by 1024, 2048, or 3072 bytes.
|
|
|
|
The view INFORMATION_SCHEMA.INNODB_SYS_TABLESPACES.FS_BLOCK_SIZE
|
|
should report the raw st_blksize.
|
|
|
|
For page_compressed tables, the function fil_space_get_block_size()
|
|
will map to 512 any st_blksize value that is larger than 4096.
|
|
|
|
os_file_set_size(): Assume that the file system block size is 4096 bytes,
|
|
and only support extending files to integer multiples of 4096 bytes.
|
|
|
|
fil_space_extend_must_retry(): Round down the preallocation size to
|
|
an integer multiple of 4096 bytes.
|
|
---
|
|
mysql-test/suite/innodb/r/check_ibd_filesize,4k.rdiff | 2 +-
|
|
storage/innobase/fil/fil0fil.cc | 11 ++++++++---
|
|
storage/innobase/os/os0file.cc | 7 ++++---
|
|
3 files changed, 13 insertions(+), 7 deletions(-)
|
|
|
|
diff --git a/storage/innobase/fil/fil0fil.cc b/storage/innobase/fil/fil0fil.cc
|
|
index 6af8b729d78c5..b9421df3ffbad 100644
|
|
--- storage/innobase/fil/fil0fil.cc
|
|
+++ storage/innobase/fil/fil0fil.cc
|
|
@@ -578,10 +578,15 @@ fil_space_extend_must_retry(
|
|
|
|
const unsigned page_size = space->physical_size();
|
|
|
|
- /* Datafile::read_first_page() expects srv_page_size bytes.
|
|
- fil_node_t::read_page0() expects at least 4 * srv_page_size bytes.*/
|
|
+ /* Datafile::read_first_page() expects innodb_page_size bytes.
|
|
+ fil_node_t::read_page0() expects at least 4 * innodb_page_size bytes.
|
|
+ os_file_set_size() expects multiples of 4096 bytes.
|
|
+ For ROW_FORMAT=COMPRESSED tables using 1024-byte or 2048-byte
|
|
+ pages, we will preallocate up to an integer multiple of 4096 bytes,
|
|
+ and let normal writes append 1024, 2048, or 3072 bytes to the file. */
|
|
os_offset_t new_size = std::max(
|
|
- os_offset_t(size - file_start_page_no) * page_size,
|
|
+ (os_offset_t(size - file_start_page_no) * page_size)
|
|
+ & ~os_offset_t(4095),
|
|
os_offset_t(FIL_IBD_FILE_INITIAL_SIZE << srv_page_size_shift));
|
|
|
|
*success = os_file_set_size(node->name, node->handle, new_size,
|
|
diff --git a/storage/innobase/os/os0file.cc b/storage/innobase/os/os0file.cc
|
|
index dde1975ea08b5..9e1eeff202d4b 100644
|
|
--- storage/innobase/os/os0file.cc
|
|
+++ storage/innobase/os/os0file.cc
|
|
@@ -3303,6 +3303,8 @@ os_file_set_size(
|
|
os_offset_t size,
|
|
bool is_sparse)
|
|
{
|
|
+ ut_ad(!(size & 4095));
|
|
+
|
|
#ifdef _WIN32
|
|
/* On Windows, changing file size works well and as expected for both
|
|
sparse and normal files.
|
|
@@ -3344,7 +3346,7 @@ os_file_set_size(
|
|
if (current_size >= size) {
|
|
return true;
|
|
}
|
|
- current_size &= ~os_offset_t(statbuf.st_blksize - 1);
|
|
+ current_size &= ~4095ULL;
|
|
err = posix_fallocate(file, current_size,
|
|
size - current_size);
|
|
}
|
|
@@ -3384,8 +3386,7 @@ os_file_set_size(
|
|
if (fstat(file, &statbuf)) {
|
|
return false;
|
|
}
|
|
- os_offset_t current_size = statbuf.st_size
|
|
- & ~os_offset_t(statbuf.st_blksize - 1);
|
|
+ os_offset_t current_size = statbuf.st_size & ~4095ULL;
|
|
#endif
|
|
if (current_size >= size) {
|
|
return true;
|
|
|