freebsd-ports/databases/mariadb105-server/files/patch-MDEV-26537
Bernard Spil 48fcfdf50f databases/mariadb105-server: Fix DB corruption
* InnoDB corrupts files due to incorrect st_blksize calculation

PR:		257728
Reported by:	mfechner
Obtained from:	https://jira.mariadb.org/projects/MDEV/issues/MDEV-26537
MFH:		2021Q3
2021-09-11 12:18:23 +00:00

110 lines
4.5 KiB
Plaintext

From 716f04a40f225af11fa88e793441da96dd0ddf21 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Marko=20M=C3=A4kel=C3=A4?= <marko.makela@mariadb.com>
Date: Fri, 10 Sep 2021 11:20:12 +0300
Subject: [PATCH] MDEV-26537 InnoDB corrupts files due to incorrect st_blksize
calculation
The st_blksize returned by fstat(2) is not documented to be
a power of 2, like we assumed in
commit 58252fff15acfe7c7b0452a87e202e3f8e454e19 (MDEV-26040).
While on Linux, the st_blksize appears to report the file system
block size, on FreeBSD it seems to be something similar to st_size.
Also IBM AIX was affected by the bug. A simple test case that would
lead to a crash when using the minimum innodb_buffer_pool_size=5m:
seq -f 'create table t%g engine=innodb select * from seq_1_to_200000;' \
1 100|mysql test&
seq -f 'create table u%g engine=innodb select * from seq_1_to_200000;' \
1 100|mysql test&
We will fix this by not trusting st_blksize at all, and assuming that
the file system block size is 4096 bytes. We hope that no storage systems
with larger block size exist. Anything larger than 4096 bytes should be
unlikely, given that it is the minimum virtual memory page size of many
contemporary processors.
While the block size 512 bytes of the venerable Seagate ST-225 is still
in widespread use, the minimum innodb_page_size is 4096 bytes, and
innodb_log_file_size can be set in integer multiples of 65536 bytes.
The only occasion where InnoDB uses smaller block sizes than 4096 bytes
is with ROW_FORMAT=COMPRESSED tables with KEY_BLOCK_SIZE=1
or KEY_BLOCK_SIZE=2 (or innodb_page_size=4096). For such tables,
we will from now on preallocate space in integer multiples of 4096 bytes
and let regular writes extend the file by 1024, 2048, or 3072 bytes.
The view INFORMATION_SCHEMA.INNODB_SYS_TABLESPACES.FS_BLOCK_SIZE
should report the raw st_blksize.
For page_compressed tables, the function fil_space_get_block_size()
will map to 512 any st_blksize value that is larger than 4096.
os_file_set_size(): Assume that the file system block size is 4096 bytes,
and only support extending files to integer multiples of 4096 bytes.
fil_space_extend_must_retry(): Round down the preallocation size to
an integer multiple of 4096 bytes.
---
mysql-test/suite/innodb/r/check_ibd_filesize,4k.rdiff | 2 +-
storage/innobase/fil/fil0fil.cc | 11 ++++++++---
storage/innobase/os/os0file.cc | 7 ++++---
3 files changed, 13 insertions(+), 7 deletions(-)
diff --git a/storage/innobase/fil/fil0fil.cc b/storage/innobase/fil/fil0fil.cc
index 6af8b729d78c5..b9421df3ffbad 100644
--- storage/innobase/fil/fil0fil.cc
+++ storage/innobase/fil/fil0fil.cc
@@ -578,10 +578,15 @@ fil_space_extend_must_retry(
const unsigned page_size = space->physical_size();
- /* Datafile::read_first_page() expects srv_page_size bytes.
- fil_node_t::read_page0() expects at least 4 * srv_page_size bytes.*/
+ /* Datafile::read_first_page() expects innodb_page_size bytes.
+ fil_node_t::read_page0() expects at least 4 * innodb_page_size bytes.
+ os_file_set_size() expects multiples of 4096 bytes.
+ For ROW_FORMAT=COMPRESSED tables using 1024-byte or 2048-byte
+ pages, we will preallocate up to an integer multiple of 4096 bytes,
+ and let normal writes append 1024, 2048, or 3072 bytes to the file. */
os_offset_t new_size = std::max(
- os_offset_t(size - file_start_page_no) * page_size,
+ (os_offset_t(size - file_start_page_no) * page_size)
+ & ~os_offset_t(4095),
os_offset_t(FIL_IBD_FILE_INITIAL_SIZE << srv_page_size_shift));
*success = os_file_set_size(node->name, node->handle, new_size,
diff --git a/storage/innobase/os/os0file.cc b/storage/innobase/os/os0file.cc
index dde1975ea08b5..9e1eeff202d4b 100644
--- storage/innobase/os/os0file.cc
+++ storage/innobase/os/os0file.cc
@@ -3303,6 +3303,8 @@ os_file_set_size(
os_offset_t size,
bool is_sparse)
{
+ ut_ad(!(size & 4095));
+
#ifdef _WIN32
/* On Windows, changing file size works well and as expected for both
sparse and normal files.
@@ -3344,7 +3346,7 @@ os_file_set_size(
if (current_size >= size) {
return true;
}
- current_size &= ~os_offset_t(statbuf.st_blksize - 1);
+ current_size &= ~4095ULL;
err = posix_fallocate(file, current_size,
size - current_size);
}
@@ -3384,8 +3386,7 @@ os_file_set_size(
if (fstat(file, &statbuf)) {
return false;
}
- os_offset_t current_size = statbuf.st_size
- & ~os_offset_t(statbuf.st_blksize - 1);
+ os_offset_t current_size = statbuf.st_size & ~4095ULL;
#endif
if (current_size >= size) {
return true;