The post MariaDB 11.5.0 preview release available appeared first on MariaDB.org.
Comparing Postgres and MySQL on the insert benchmark with a small server
The per-DBMS results are here for Postgres, InnoDB and MyRocks. Those posts also have links to the configurations and builds that I used. This post shares the same results but makes it easier to compare across DBMS.
Results here are from a small server (8 cores) with a low concurrency workload (1 client, <= 3 concurrent connections). Results from a larger server are pending and might not be the same as what I share here.
Summary of throughput for the IO-bound workload
The post Comparing Postgres and MySQL on the insert benchmark with a small server appeared first on MariaDB.org.
The post Percona XtraBackup 8.0.28 Supports Encrypted Table Backups with AWS KMS appeared first on MariaDB.org.
The post Yet another Insert Benchmark result: MyRocks, MySQL and a small server appeared first on MariaDB.org.
The post Yet another Insert Benchmark result: MySQL, InnoDB and a small server appeared first on MariaDB.org.
The post Release Roundup March 18, 2024 appeared first on MariaDB.org.
Trying to tune Postgres for the Insert Benchmark: small server
The results here are from Postgres 16.2 and a small server (8 CPU cores) with a low concurrency workload. Previous benchmark reports for Postgres on this setup are here for cached and IO-bound runs.
tl;dr
The l.i1 benchmark step deletes more rows per statement, so the optimizer overhead is more significant on the l.i2 step. The ratios are much larger for InnoDB and MyRocks (they have perf problems, just not this perf problem).
I hope for a Postgres storage engine that provides MVCC without vacuum. In theory, more frequent vacuum might help and the perf overhead from frequent vacuum might be OK for the heap table given the usage of visibility bits. But when vacuum then has to do a full index scan (no visibility bits there) then that is a huge cost which limits vacuum frequency.
Build + Configuration
The post Trying to tune Postgres for the Insert Benchmark: small server appeared first on MariaDB.org.
Identifying Performance Bottlenecks: Assessing IO Subsystem Reads in MySQL
High disk latency is a primary indicator of IO struggles. Use tools like iostat, vmstat, or atop on Linux systems to monitor disk read latency. Look for increased await and r_await times, which suggest that read operations are taking longer than usual.
innodb_io_capacity
The innodb_io_capacity setting in MySQL determines the number of IO operations per second (IOPS) that InnoDB believes the disk can handle. If your actual disk IOPS is consistently near or exceeding this value, it might indicate that your disk is struggling to keep up with the workload. Adjust this setting based on your disk’s capabilities and workload requirements.
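As a minimal sketch (the numbers below are placeholders, not recommendations), the setting can be inspected and adjusted at runtime:

```sql
-- Inspect the current values
SHOW GLOBAL VARIABLES LIKE 'innodb_io_capacity%';

-- Adjust at runtime; pick values that match your disk's measured IOPS
SET GLOBAL innodb_io_capacity     = 2000;
SET GLOBAL innodb_io_capacity_max = 4000;
```

Remember to persist any runtime change in my.cnf so it survives a restart.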
SHOW GLOBAL STATUS output
The SHOW GLOBAL STATUS command can provide insights into various IO-related metrics. Pay attention to:

Innodb_data_reads and Innodb_data_read: an increase in these values indicates higher read operations.
Innodb_buffer_pool_reads: high values suggest that many reads had to access the disk directly because the needed data was not in the buffer pool.
Innodb_buffer_pool_wait_free: non-zero values indicate that InnoDB had to wait for clean pages to be written to disk before continuing.

The InnoDB buffer pool is crucial for reducing disk IO by caching data and indexes. Key metrics include:

Innodb_buffer_pool_read_requests: shows the number of requests to read a page.
Innodb_buffer_pool_reads: indicates the number of times a read had to go to disk.

A low ratio of Innodb_buffer_pool_reads to Innodb_buffer_pool_read_requests suggests good buffer pool efficiency. A high ratio means the buffer pool may be too small or the workload is too large for the current configuration.
innodb_buffer_pool_size configuration
Ensure your innodb_buffer_pool_size is adequately sized for your dataset. A small buffer pool relative to your database size can lead to increased disk reads because less data can be cached in memory.
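To compare the buffer pool against the data it should cache, a rough footprint query (sizes in information_schema are estimates, not exact):

```sql
-- Approximate data + index footprint per schema, in MB
SELECT table_schema,
       ROUND(SUM(data_length + index_length) / 1024 / 1024) AS size_mb
FROM information_schema.tables
WHERE table_schema NOT IN ('mysql', 'information_schema', 'performance_schema', 'sys')
GROUP BY table_schema
ORDER BY size_mb DESC;

-- Current buffer pool size, in bytes
SHOW GLOBAL VARIABLES LIKE 'innodb_buffer_pool_size';
```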
Long-running queries can also indicate IO struggles, especially if those queries involve large table scans or complex joins that are not optimized. Use the MySQL Slow Query Log to identify and optimize such queries.
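The slow query log can be switched on at runtime; a sketch (persist the settings in my.cnf so they survive a restart):

```sql
SET GLOBAL slow_query_log  = ON;
SET GLOBAL long_query_time = 1;  -- log statements that run longer than 1 second
SET GLOBAL log_queries_not_using_indexes = ON;  -- optional: also catch unindexed scans
```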
MySQL’s Performance Schema and Sys Schema (a collection of views, functions, and procedures to simplify Performance Schema usage) can help diagnose IO issues. For instance, you can query file I/O events to see detailed file-level IO activity.
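For example, the sys schema ships a view that aggregates file-level IO by bytes; a quick look at the top consumers (view and column names as in the stock MySQL sys schema):

```sql
-- Files with the most IO, human-formatted by the sys schema
SELECT file, total_read, total_written, total
FROM sys.io_global_by_file_by_bytes
LIMIT 5;
```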
Lastly, consider your hardware. SSDs significantly reduce read latency compared to traditional HDDs. Ensure your hardware is suitable for your database’s IO demands.
By combining these approaches, you can get a comprehensive view of your MySQL IO subsystem’s health, especially concerning read operations. Addressing issues in IO can involve query optimization, hardware upgrades, or MySQL configuration adjustments.
The post Identifying Performance Bottlenecks: Assessing IO Subsystem Reads in MySQL appeared first on The WebScale Database Infrastructure Operations Experts in PostgreSQL, MySQL, MariaDB and ClickHouse.
Optimizing PostgreSQL Performance: Navigating the Use of Bind Variables in PostgreSQL 16
Bind variables are incredibly useful for optimizing database interactions, but their overuse can introduce some challenges:
Given the absence of a hard limit on the number of bind variables, developers must use judgment and best practices to determine the appropriate number:
Tuning memory-related settings, such as work_mem and maintenance_work_mem, can help accommodate queries with a large number of bind variables more effectively.

In PostgreSQL 16, while there is no explicit upper limit on the number of bind variables you can use, the practical limit is influenced by the specifics of your application, database design, and server capabilities. The key to effectively using bind variables is to balance their benefits in security and performance optimization against the potential overhead they introduce when used in large numbers. By adhering to best practices in query design, system configuration, and performance testing, developers can make informed decisions on the appropriate use of bind variables in their PostgreSQL applications.
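A minimal illustration of bind variables at the SQL level, using a hypothetical orders table (application drivers typically do the PREPARE/EXECUTE dance for you):

```sql
-- $1 and $2 are bind variables; the parsed statement can be reused
PREPARE find_orders (int, date) AS
  SELECT order_id, total
    FROM orders
   WHERE customer_id = $1
     AND created_at >= $2;

EXECUTE find_orders(42, '2024-01-01');
DEALLOCATE find_orders;
```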
The post Optimizing PostgreSQL Performance: Navigating the Use of Bind Variables in PostgreSQL 16 appeared first on The WebScale Database Infrastructure Operations Experts in PostgreSQL, MySQL, MariaDB and ClickHouse.
Functionality of dbstat
How does dbstat work
How to install dbstat
Query dbstat
table_size
processlist
trx_and_lck
metadata_lock
global_variables
global_status
Testing
Sources
An idea that I have been thinking about for a long time and have now, thanks to a customer, finally tackled is dbstat for MariaDB/MySQL. The idea is based on sar/sysstat by Sebastien Godard:
sar - Collect, report, or save system activity information.
and Oracle Statspack:
Statspack is a performance tuning tool ... to quickly gather detailed analysis of the performance of that database instance.
Functionality of dbstat
Although we have had the Performance Schema for some time, it does not cover some points that we see as problems in practice and that customers ask for:
The table_size module collects data on the growth of tables. This allows statements to be made about the growth of individual tables, databases, future MariaDB Catalogs or the entire instance. This is interesting for users who are using multi-tenant systems or are otherwise struggling with uncontrolled growth.
The processlist module takes a snapshot of the process list at regular intervals and saves it. This information is useful for post-mortem analysis if the user was too slow to save their process list, or to understand how a problem built up.
The problem is often caused by long-running transactions, row locks or metadata locks. These are recorded and saved by the trx_and_lck and metadata_lock modules. This means that we can see problems that we did not even notice before or we can see what led to the problem after the accident (analogous to a tachograph in a vehicle).
Another question that we sometimes encounter in practice is: When was which database variable changed and what did it look like before? This is covered by the global_variables module. Unfortunately, it is not possible to find out who changed the variable or why. Operational processes are required for this.
The last module, global_status, actually covers what sar/sysstat does. It collects the values from SHOW GLOBAL STATUS; and saves them for later analysis purposes or to simply create graphs.
How does dbstat work
dbstat uses the database Event Scheduler as a scheduler. This must first be switched on for MariaDB (event_scheduler = ON). With MySQL it is already switched on by default. The Event Scheduler has the advantage that we can activate the jobs at a finer granularity, for example 10 s, which would not be possible with the crontab.
The Event Scheduler then executes SQL/PSM code to collect the data on the one hand and to delete the data on the other, so that the dbstat database does not grow immeasurably.
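As an illustrative sketch (not the actual dbstat code; the target table layout is assumed), a collector job looks roughly like this:

```sql
-- Requires on MariaDB: SET GLOBAL event_scheduler = ON;
CREATE EVENT dbstat.snapshot_processlist
  ON SCHEDULE EVERY 1 MINUTE
DO
  INSERT INTO dbstat.processlist
  SELECT NOW(), p.*
    FROM information_schema.PROCESSLIST AS p;
```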
The following jobs are currently planned:
| Module | Collect | Delete | Quantity structure | Remarks |
|---|---|---|---|---|
| table_size | 1/d at 02:04 | 12/h, 1000 rows, > 31 d | 1000 tab × 31 d = 31k rows | Should work up to 288k tables. |
| processlist | 1/min | 1/min, 1000 rows, > 7 d | 1000 con × 1440 min × 7 d = 10M rows | Should work up to 1000 concurrent connections. |
| trx_and_lck | 1/min | 1/min, 1000 rows, > 7 d | 100 lck × 1440 min × 7 d = 1M rows | Depends very much on the application. |
| metadata_lock | 1/min | 12/h, 1000 rows, > 30 d | 100 mdl × 1440 × 30 d = 4M rows | Depends very much on the application. |
| global_variables | 1/min | never | 1000 rows | Normally this table should not grow. |
| global_status | 1/min | 1/min, 1000 rows, > 30 d | 1000 rows × 1440 × 30 d = 40M rows | Can become large. |
How to install dbstat
dbstat can be downloaded from GitHub and is licensed under the GPLv2.
The installation is simple: First execute the SQL file create_user_and_db.sql. Then execute the corresponding create_*.sql files for the respective modules in the dbstat database. There are currently no direct dependencies between the modules. If you want to use a different user or a different database than dbstat, you have to take care of this yourself.
Query dbstat
Some possible queries on the data have already been prepared. They can be found in the query_*.sql files. Here are a few examples:
table_size
SELECT `table_schema`, `table_name`, `ts`, `table_rows`, `data_length`, `index_length`
FROM `table_size`
WHERE `table_catalog` = 'def'
AND `table_schema` = 'dbstat'
AND `table_name` = 'table_size'
ORDER BY `ts` ASC
;
+--------------+------------+---------------------+------------+-------------+--------------+
| table_schema | table_name | ts | table_rows | data_length | index_length |
+--------------+------------+---------------------+------------+-------------+--------------+
| dbstat | table_size | 2024-03-09 20:01:00 | 0 | 16384 | 16384 |
| dbstat | table_size | 2024-03-10 17:26:33 | 310 | 65536 | 16384 |
| dbstat | table_size | 2024-03-11 08:28:12 | 622 | 114688 | 49152 |
| dbstat | table_size | 2024-03-12 08:02:38 | 934 | 114688 | 49152 |
| dbstat | table_size | 2024-03-13 08:08:55 | 1247 | 278528 | 81920 |
+--------------+------------+---------------------+------------+-------------+--------------+
processlist
SELECT connection_id, ts, time, state, SUBSTR(REGEXP_REPLACE(REPLACE(query, '\n', ' '), ' +', ' '), 1, 64) AS query
FROM processlist
WHERE command != 'Sleep'
AND connection_id = @connection_id
ORDER BY ts ASC
LIMIT 5
;
+---------------+---------------------+---------+---------------------------------+---------------------------------------------+
| connection_id | ts | time | state | query |
+---------------+---------------------+---------+---------------------------------+---------------------------------------------+
| 14956 | 2024-03-09 20:21:12 | 13.042 | Waiting for table metadata lock | update test set data = \'bla\' where id = 100 |
| 14956 | 2024-03-09 20:22:12 | 73.045 | Waiting for table metadata lock | update test set data = \'bla\' where id = 100 |
| 14956 | 2024-03-09 20:23:12 | 133.044 | Waiting for table metadata lock | update test set data = \'bla\' where id = 100 |
| 14956 | 2024-03-09 20:24:12 | 193.044 | Waiting for table metadata lock | update test set data = \'bla\' where id = 100 |
| 14956 | 2024-03-09 20:25:12 | 253.041 | Waiting for table metadata lock | update test set data = \'bla\' where id = 100 |
+---------------+---------------------+---------+---------------------------------+---------------------------------------------+
trx_and_lck
SELECT * FROM trx_and_lck\G
*************************** 1. row ***************************
machine_name:
connection_id: 14815
trx_id: 269766
ts: 2024-03-09 20:05:57
user: root
host: localhost
db: test
command: Query
time: 41.000
running_since: 2024-03-09 20:05:16
state: Statistics
info: select * from test where id = 6 for update
trx_state: LOCK WAIT
trx_started: 2024-03-09 20:05:15
trx_requested_lock_id: 269766:821:5:7
trx_tables_in_use: 1
trx_tables_locked: 1
trx_lock_structs: 2
trx_rows_locked: 1
trx_rows_modified: 0
lock_mode: X
lock_type: RECORD
lock_table_schema: test
lock_table_name: test
lock_index: PRIMARY
lock_space: 821
lock_page: 5
lock_rec: 7
lock_data: 6
*************************** 2. row ***************************
machine_name:
connection_id: 14817
trx_id: 269760
ts: 2024-03-09 20:05:57
user: root
host: localhost
db: test
command: Sleep
time: 60.000
running_since: 2024-03-09 20:04:57
state:
info:
trx_state: RUNNING
trx_started: 2024-03-09 20:04:56
trx_requested_lock_id: NULL
trx_tables_in_use: 0
trx_tables_locked: 1
trx_lock_structs: 2
trx_rows_locked: 1
trx_rows_modified: 1
lock_mode: X
lock_type: RECORD
lock_table_schema: test
lock_table_name: test
lock_index: PRIMARY
lock_space: 821
lock_page: 5
lock_rec: 7
lock_data: 6
metadata_lock
SELECT lock_mode, ts, user, host, lock_type, table_schema, table_name, time, started, state, query
FROM metadata_lock
WHERE connection_id = 14347
ORDER BY started DESC
LIMIT 5
;
+-------------------------+---------------------+------+-----------+----------------------+--------------+------------+-------+---------------------+----------------+------------------------------------------------------+
| lock_mode | ts | user | host | lock_type | table_schema | table_name | time | started | state | query |
+-------------------------+---------------------+------+-----------+----------------------+--------------+------------+-------+---------------------+----------------+------------------------------------------------------+
| MDL_SHARED_WRITE | 2024-03-13 10:27:33 | root | localhost | Table metadata lock | test | test | 1.000 | 2024-03-13 10:27:32 | Updating | UPDATE test set data3 = MD5(id) |
| MDL_BACKUP_TRANS_DML | 2024-03-13 10:27:33 | root | localhost | Backup lock | | | 1.000 | 2024-03-13 10:27:32 | Updating | UPDATE test set data3 = MD5(id) |
| MDL_BACKUP_ALTER_COPY | 2024-03-13 10:22:33 | root | localhost | Backup lock | | | 0.000 | 2024-03-13 10:22:33 | altering table | ALTER TABLE test DROP INDEX ts, ADD INDEX (ts, data) |
| MDL_SHARED_UPGRADABLE | 2024-03-13 10:22:33 | root | localhost | Table metadata lock | test | test | 0.000 | 2024-03-13 10:22:33 | altering table | ALTER TABLE test DROP INDEX ts, ADD INDEX (ts, data) |
| MDL_INTENTION_EXCLUSIVE | 2024-03-13 10:22:33 | root | localhost | Schema metadata lock | test | | 0.000 | 2024-03-13 10:22:33 | altering table | ALTER TABLE test DROP INDEX ts, ADD INDEX (ts, data) |
+-------------------------+---------------------+------+-----------+----------------------+--------------+------------+-------+---------------------+----------------+------------------------------------------------------+
global_variables
SELECT variable_name, COUNT(*) AS cnt
FROM global_variables
GROUP BY variable_name
HAVING COUNT(*) > 1
;
+-------------------------+-----+
| variable_name | cnt |
+-------------------------+-----+
| innodb_buffer_pool_size | 7 |
+-------------------------+-----+
SELECT variable_name, ts, variable_value
FROM global_variables
WHERE variable_name = 'innodb_buffer_pool_size'
;
+-------------------------+---------------------+----------------+
| variable_name | ts | variable_value |
+-------------------------+---------------------+----------------+
| innodb_buffer_pool_size | 2024-03-09 21:36:28 | 134217728 |
| innodb_buffer_pool_size | 2024-03-09 21:40:25 | 268435456 |
| innodb_buffer_pool_size | 2024-03-09 21:48:14 | 134217728 |
+-------------------------+---------------------+----------------+
global_status
SELECT s1.ts
, s1.variable_value AS 'table_open_cache_misses'
, s2.variable_value AS 'table_open_cache_hits'
FROM global_status AS s1
JOIN global_status AS s2 ON s1.ts = s2.ts
WHERE s1.variable_name = 'table_open_cache_misses'
AND s2.variable_name = 'table_open_cache_hits'
AND s1.ts BETWEEN '2024-03-13 11:55:00' AND '2024-03-13 12:05:00'
ORDER BY ts ASC
;
+---------------------+-------------------------+-----------------------+
| ts | table_open_cache_misses | table_open_cache_hits |
+---------------------+-------------------------+-----------------------+
| 2024-03-13 11:55:47 | 1001 | 60711 |
| 2024-03-13 11:56:47 | 1008 | 61418 |
| 2024-03-13 11:57:47 | 1015 | 62125 |
| 2024-03-13 11:58:47 | 1022 | 62829 |
| 2024-03-13 11:59:47 | 1029 | 63533 |
| 2024-03-13 12:00:47 | 1036 | 64237 |
| 2024-03-13 12:01:47 | 1043 | 64944 |
| 2024-03-13 12:02:47 | 1050 | 65651 |
| 2024-03-13 12:03:47 | 1057 | 66355 |
| 2024-03-13 12:04:47 | 1064 | 67059 |
+---------------------+-------------------------+-----------------------+
Testing
We have currently rolled out dbstat on our test and production systems to see whether our assumptions regarding stability and the quantity structure calculations are correct. In addition, using it ourselves is the best way to find out if something is missing or if the handling is impractical (eat your own dog food).
Sources
sar
Using Oracle Statspack
dbstat on Github
SQL/PSM
The post dbstat for MariaDB (and MySQL) appeared first on MariaDB.org.
]]>An idea that I have been thinking about for a long time and have now, thanks to a customer, finally tackled is dbstat
for MariaDB/MySQL. The idea is based on sar/sysstat
by Sebastien Godard:
sar – Collect, report, or save system activity information.
and Oracle Statspack:
Statspack is a performance tuning tool … to quickly gather detailed analysis of the performance of that database instance.
dbstat
Although we have had the performance schema for some time, it does not cover some points that we see as a problem in practice and that are requested by customers:
table_size
module collects data on the growth of tables. This allows statements to be made about the growth of individual tables, databases, future MariaDB Catalogs or the entire instance. This is interesting for users who are using multi-tenant systems or are otherwise struggling with uncontrolled growth.processlist
module takes a snapshot of the process list at regular intervals and saves it. This information is useful for post-mortem analyses if the user was too slow to save his process list or to understand how a problem has built up.trx_and_lck
and metadata_lock
modules. This means that we can see problems that we did not even notice before or we can see what led to the problem after the accident (analogous to a tachograph in a vehicle).global_variables
module. Unfortunately, it is not possible to find out who changed the variable or why. Operational processes are required for this.global_status
, actually covers what sar/sysstat does
. It collects the values from SHOW GLOBAL STATUS;
and saves them for later analysis purposes or to simply create graphs.dbstat
workdbstat
uses the database Event Scheduler as a scheduler. This must first be switched on for MariaDB (event_scheduler = ON
). With MySQL it is already switched on by default. The Event Scheduler has the advantage that we can activate the jobs at a finer granularity, for example 10 s, which would not be possible with the crontab.
The Event Scheduler then executes SQL/PSM code to collect the data on the one hand and to delete the data on the other, so that the dbstat
database does not grow immeasurably.
The following jobs are currently planned:
Module | Collect | Delete | Quantity structure | Remarks |
---|---|---|---|---|
table_size | 1/d at 02:04 | 12/h, 1000 rows, > 31 d | 1000 tab × 31 d = 31k rows | Should work up to 288k tables. |
processlist | 1/min | 1/min, 1000 rows, > 7 d | 1000 con × 1440 min × 7 d = 10M rows | Should work up to 1000 concurrent connections. |
trx_and_lck | 1/min | 1/min, 1000 rows, > 7 d | 100 lck × 1440 min × 7 d = 1M rows | Depends very much on the application. |
metadata_lock | 1/min | 12/h, 1000 rows, > 30 d | 100 mdl × 1440 × 30 d = 4M rows | Depends very much on the application. |
global_variables | 1/min | never | 1000 rows | Normally this table should not grow. |
global_status | 1/min | 1/min, 1000 rows, > 30 d | 1000 rows × 1440 × 30 d = 40M | Rows Can become large? |
dbstat
dbstat
can be downloaded from Github and is licensed under GPLv2.
The installation is simple: First execute the SQL file create_user_and_db.sql
. Then execute the corresponding create_*.sql
files for the respective modules in the dbstat
database. There are currently no direct dependencies between the modules. If you want to use a different user or a different database than dbstat, you have to take care of this yourself.
dbstat
Some possible queries on the data have already been prepared. They can be found in the query_*.sql
files. Here are a few examples:
SELECT `table_schema`, `table_name`, `ts`, `table_rows`, `data_length`, `index_length` FROM `table_size` WHERE `table_catalog` = 'def' AND `table_schema` = 'dbstat' AND `table_name` = 'table_size' ORDER BY `ts` ASC ; +--------------+------------+---------------------+------------+-------------+--------------+ | table_schema | table_name | ts | table_rows | data_length | index_length | +--------------+------------+---------------------+------------+-------------+--------------+ | dbstat | table_size | 2024-03-09 20:01:00 | 0 | 16384 | 16384 | | dbstat | table_size | 2024-03-10 17:26:33 | 310 | 65536 | 16384 | | dbstat | table_size | 2024-03-11 08:28:12 | 622 | 114688 | 49152 | | dbstat | table_size | 2024-03-12 08:02:38 | 934 | 114688 | 49152 | | dbstat | table_size | 2024-03-13 08:08:55 | 1247 | 278528 | 81920 | +--------------+------------+---------------------+------------+-------------+--------------+
SELECT connection_id, ts, time, state, SUBSTR(REGEXP_REPLACE(REPLACE(query, "n", ' '), ' +', ' '), 1, 64) AS query FROM processlist WHERE command != 'Sleep' AND connection_id = @connection_id ORDER BY ts ASC LIMIT 5 ; +---------------+---------------------+---------+---------------------------------+---------------------------------------------+ | connection_id | ts | time | state | query | +---------------+---------------------+---------+---------------------------------+---------------------------------------------+ | 14956 | 2024-03-09 20:21:12 | 13.042 | Waiting for table metadata lock | update test set data = 'bla' where id = 100 | | 14956 | 2024-03-09 20:22:12 | 73.045 | Waiting for table metadata lock | update test set data = 'bla' where id = 100 | | 14956 | 2024-03-09 20:23:12 | 133.044 | Waiting for table metadata lock | update test set data = 'bla' where id = 100 | | 14956 | 2024-03-09 20:24:12 | 193.044 | Waiting for table metadata lock | update test set data = 'bla' where id = 100 | | 14956 | 2024-03-09 20:25:12 | 253.041 | Waiting for table metadata lock | update test set data = 'bla' where id = 100 | +---------------+---------------------+---------+---------------------------------+---------------------------------------------+
SELECT * FROM trx_and_lckG *************************** 1. row *************************** machine_name: connection_id: 14815 trx_id: 269766 ts: 2024-03-09 20:05:57 user: root host: localhost db: test command: Query time: 41.000 running_since: 2024-03-09 20:05:16 state: Statistics info: select * from test where id = 6 for update trx_state: LOCK WAIT trx_started: 2024-03-09 20:05:15 trx_requested_lock_id: 269766:821:5:7 trx_tables_in_use: 1 trx_tables_locked: 1 trx_lock_structs: 2 trx_rows_locked: 1 trx_rows_modified: 0 lock_mode: X lock_type: RECORD lock_table_schema: test lock_table_name: test lock_index: PRIMARY lock_space: 821 lock_page: 5 lock_rec: 7 lock_data: 6 *************************** 2. row *************************** machine_name: connection_id: 14817 trx_id: 269760 ts: 2024-03-09 20:05:57 user: root host: localhost db: test command: Sleep time: 60.000 running_since: 2024-03-09 20:04:57 state: info: trx_state: RUNNING trx_started: 2024-03-09 20:04:56 trx_requested_lock_id: NULL trx_tables_in_use: 0 trx_tables_locked: 1 trx_lock_structs: 2 trx_rows_locked: 1 trx_rows_modified: 1 lock_mode: X lock_type: RECORD lock_table_schema: test lock_table_name: test lock_index: PRIMARY lock_space: 821 lock_page: 5 lock_rec: 7 lock_data: 6
SELECT lock_mode, ts, user, host, lock_type, table_schema, table_name, time, started, state, query FROM metadata_lock WHERE connection_id = 14347 ORDER BY started DESC LIMIT 5 ; +-------------------------+---------------------+------+-----------+----------------------+--------------+------------+-------+---------------------+----------------+------------------------------------------------------+ | lock_mode | ts | user | host | lock_type | table_schema | table_name | time | started | state | query | +-------------------------+---------------------+------+-----------+----------------------+--------------+------------+-------+---------------------+----------------+------------------------------------------------------+ | MDL_SHARED_WRITE | 2024-03-13 10:27:33 | root | localhost | Table metadata lock | test | test | 1.000 | 2024-03-13 10:27:32 | Updating | UPDATE test set data3 = MD5(id) | | MDL_BACKUP_TRANS_DML | 2024-03-13 10:27:33 | root | localhost | Backup lock | | | 1.000 | 2024-03-13 10:27:32 | Updating | UPDATE test set data3 = MD5(id) | | MDL_BACKUP_ALTER_COPY | 2024-03-13 10:22:33 | root | localhost | Backup lock | | | 0.000 | 2024-03-13 10:22:33 | altering table | ALTER TABLE test DROP INDEX ts, ADD INDEX (ts, data) | | MDL_SHARED_UPGRADABLE | 2024-03-13 10:22:33 | root | localhost | Table metadata lock | test | test | 0.000 | 2024-03-13 10:22:33 | altering table | ALTER TABLE test DROP INDEX ts, ADD INDEX (ts, data) | | MDL_INTENTION_EXCLUSIVE | 2024-03-13 10:22:33 | root | localhost | Schema metadata lock | test | | 0.000 | 2024-03-13 10:22:33 | altering table | ALTER TABLE test DROP INDEX ts, ADD INDEX (ts, data) | +-------------------------+---------------------+------+-----------+----------------------+--------------+------------+-------+---------------------+----------------+------------------------------------------------------+
SELECT variable_name, COUNT(*) AS cnt FROM global_variables GROUP BY variable_name HAVING COUNT(*) > 1 ; +-------------------------+-----+ | variable_name | cnt | +-------------------------+-----+ | innodb_buffer_pool_size | 7 | +-------------------------+-----+ SELECT variable_name, ts, variable_value FROM global_variables WHERE variable_name = 'innodb_buffer_pool_size' ; +-------------------------+---------------------+----------------+ | variable_name | ts | variable_value | +-------------------------+---------------------+----------------+ | innodb_buffer_pool_size | 2024-03-09 21:36:28 | 134217728 | | innodb_buffer_pool_size | 2024-03-09 21:40:25 | 268435456 | | innodb_buffer_pool_size | 2024-03-09 21:48:14 | 134217728 | +-------------------------+---------------------+----------------+
SELECT s1.ts , s1.variable_value AS 'table_open_cache_misses' , s2.variable_value AS 'table_open_cache_hits' FROM global_status AS s1 JOIN global_status AS s2 ON s1.ts = s2.ts WHERE s1.variable_name = 'table_open_cache_misses' AND s2.variable_name = 'table_open_cache_hits' AND s1.ts BETWEEN '2024-03-13 11:55:00' AND '2024-03-13 12:05:00' ORDER BY ts ASC ; +---------------------+-------------------------+-----------------------+ | ts | table_open_cache_misses | table_open_cache_hits | +---------------------+-------------------------+-----------------------+ | 2024-03-13 11:55:47 | 1001 | 60711 | | 2024-03-13 11:56:47 | 1008 | 61418 | | 2024-03-13 11:57:47 | 1015 | 62125 | | 2024-03-13 11:58:47 | 1022 | 62829 | | 2024-03-13 11:59:47 | 1029 | 63533 | | 2024-03-13 12:00:47 | 1036 | 64237 | | 2024-03-13 12:01:47 | 1043 | 64944 | | 2024-03-13 12:02:47 | 1050 | 65651 | | 2024-03-13 12:03:47 | 1057 | 66355 | | 2024-03-13 12:04:47 | 1064 | 67059 | +---------------------+-------------------------+-----------------------+
We have currently rolled out dbstat
on our test and production systems to test it and see whether our assumptions regarding stability and calculations of the quantity structure are correct. In addition, using it ourselves is the best way to find out if something is missing or if the handling is impractical (Eat your own dog food).
The post dbstat for MariaDB (and MySQL) appeared first on MariaDB.org.
]]> Functionality of dbstat
How does dbstat work
How to install dbstat
Query dbstat
table_size
processlist
trx_and_lck
metadata_lock
global_variables
global_status
Testing
Sources
An idea that I have been thinking about for a long time and have now, thanks to a customer, finally tackled is dbstat for MariaDB/MySQL. The idea is based on sar/sysstat by Sebastien Godard:
sar - Collect, report, or save system activity information.
and Oracle Statspack:
Statspack is a performance tuning tool ... to quickly gather detailed analysis of the performance of that database instance.
Functionality of dbstat
Although we have had the performance schema for some time, it does not cover some points that we see as a problem in practice and that are requested by customers:
The table_size module collects data on the growth of tables. This allows statements to be made about the growth of individual tables, databases, future MariaDB Catalogs or the entire instance. This is interesting for users who are using multi-tenant systems or are otherwise struggling with uncontrolled growth.
The processlist module takes a snapshot of the process list at regular intervals and saves it. This information is useful for post-mortem analyses if the user was too slow to save his process list or to understand how a problem has built up.
The problem is often caused by long-running transactions, row locks or metadata locks. These are recorded and saved by the trx_and_lck and metadata_lock modules. This means that we can see problems that we did not even notice before or we can see what led to the problem after the accident (analogous to a tachograph in a vehicle).
Another question that we sometimes encounter in practice is: When was which database variable changed and what did it look like before? This is covered by the global_variables module. Unfortunately, it is not possible to find out who changed the variable or why. Operational processes are required for this.
The last module, global_status, actually covers what sar/sysstat does. It collects the values from SHOW GLOBAL STATUS; and saves them for later analysis purposes or to simply create graphs.
How does dbstat work
dbstat uses the database Event Scheduler as a scheduler. This must first be switched on for MariaDB (event_scheduler = ON). With MySQL it is already switched on by default. The Event Scheduler has the advantage that we can activate the jobs at a finer granularity, for example 10 s, which would not be possible with the crontab.
The Event Scheduler then executes SQL/PSM code to collect the data on the one hand and to delete the data on the other, so that the dbstat database does not grow immeasurably.
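The mechanism can be sketched as a pair of scheduler jobs, one collecting and one purging (the event bodies below are illustrative, not the actual dbstat code; the table layout is an assumption):

```sql
-- Make sure the scheduler is running (required on MariaDB).
SET GLOBAL event_scheduler = ON;

-- Hypothetical collector: snapshot the process list once per minute.
CREATE EVENT IF NOT EXISTS dbstat_gather_processlist
ON SCHEDULE EVERY 1 MINUTE
DO
  INSERT INTO dbstat.processlist
  SELECT NOW(), p.* FROM information_schema.PROCESSLIST AS p;

-- Hypothetical purge job: delete old rows in small batches so the
-- dbstat database does not grow without bound.
CREATE EVENT IF NOT EXISTS dbstat_purge_processlist
ON SCHEDULE EVERY 1 MINUTE
DO
  DELETE FROM dbstat.processlist
   WHERE ts < NOW() - INTERVAL 7 DAY
   LIMIT 1000;
```

Deleting with a LIMIT keeps each purge run short, which matters when the jobs fire as often as every minute.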
The following jobs are currently planned:
Module           | Collect      | Delete                   | Quantity structure                   | Remarks
table_size       | 1/d at 02:04 | 12/h, 1000 rows, > 31 d  | 1000 tab × 31 d = 31k rows           | Should work up to 288k tables.
processlist      | 1/min        | 1/min, 1000 rows, > 7 d  | 1000 con × 1440 min × 7 d = 10M rows | Should work up to 1000 concurrent connections.
trx_and_lck      | 1/min        | 1/min, 1000 rows, > 7 d  | 100 lck × 1440 min × 7 d = 1M rows   | Depends very much on the application.
metadata_lock    | 1/min        | 12/h, 1000 rows, > 30 d  | 100 mdl × 1440 × 30 d = 4M rows      | Depends very much on the application.
global_variables | 1/min        | never                    | 1000 rows                            | Normally this table should not grow.
global_status    | 1/min        | 1/min, 1000 rows, > 30 d | 1000 rows × 1440 × 30 d = 40M rows   | Can become large.
How to install dbstat
dbstat can be downloaded from GitHub and is licensed under GPLv2.
The installation is simple: First execute the SQL file create_user_and_db.sql. Then execute the corresponding create_*.sql files for the respective modules in the dbstat database. There are currently no direct dependencies between the modules. If you want to use a different user or a different database than dbstat, you have to take care of this yourself.
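Assuming the default user and database names, an installation session might look like this (the per-module file name is a placeholder for the actual create_*.sql files shipped with dbstat):

```sql
-- Run from the mariadb/mysql command-line client as a privileged user.
SOURCE create_user_and_db.sql;   -- creates the dbstat user and database
USE dbstat;
SOURCE create_processlist.sql;   -- placeholder name; repeat for each module you want
```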
Query dbstat
Some possible queries on the data have already been prepared. They can be found in the query_*.sql files. Here are a few examples:
table_size
SELECT `table_schema`, `table_name`, `ts`, `table_rows`, `data_length`, `index_length`
FROM `table_size`
WHERE `table_catalog` = 'def'
AND `table_schema` = 'dbstat'
AND `table_name` = 'table_size'
ORDER BY `ts` ASC
;
+--------------+------------+---------------------+------------+-------------+--------------+
| table_schema | table_name | ts | table_rows | data_length | index_length |
+--------------+------------+---------------------+------------+-------------+--------------+
| dbstat | table_size | 2024-03-09 20:01:00 | 0 | 16384 | 16384 |
| dbstat | table_size | 2024-03-10 17:26:33 | 310 | 65536 | 16384 |
| dbstat | table_size | 2024-03-11 08:28:12 | 622 | 114688 | 49152 |
| dbstat | table_size | 2024-03-12 08:02:38 | 934 | 114688 | 49152 |
| dbstat | table_size | 2024-03-13 08:08:55 | 1247 | 278528 | 81920 |
+--------------+------------+---------------------+------------+-------------+--------------+
processlist
SELECT connection_id, ts, time, state, SUBSTR(REGEXP_REPLACE(REPLACE(query, '\n', ' '), ' +', ' '), 1, 64) AS query
FROM processlist
WHERE command != 'Sleep'
AND connection_id = @connection_id
ORDER BY ts ASC
LIMIT 5
;
+---------------+---------------------+---------+---------------------------------+---------------------------------------------+
| connection_id | ts | time | state | query |
+---------------+---------------------+---------+---------------------------------+---------------------------------------------+
| 14956 | 2024-03-09 20:21:12 | 13.042 | Waiting for table metadata lock | update test set data = 'bla' where id = 100 |
| 14956 | 2024-03-09 20:22:12 | 73.045 | Waiting for table metadata lock | update test set data = 'bla' where id = 100 |
| 14956 | 2024-03-09 20:23:12 | 133.044 | Waiting for table metadata lock | update test set data = 'bla' where id = 100 |
| 14956 | 2024-03-09 20:24:12 | 193.044 | Waiting for table metadata lock | update test set data = 'bla' where id = 100 |
| 14956 | 2024-03-09 20:25:12 | 253.041 | Waiting for table metadata lock | update test set data = 'bla' where id = 100 |
+---------------+---------------------+---------+---------------------------------+---------------------------------------------+
trx_and_lck
SELECT * FROM trx_and_lck\G
*************************** 1. row ***************************
machine_name:
connection_id: 14815
trx_id: 269766
ts: 2024-03-09 20:05:57
user: root
host: localhost
db: test
command: Query
time: 41.000
running_since: 2024-03-09 20:05:16
state: Statistics
info: select * from test where id = 6 for update
trx_state: LOCK WAIT
trx_started: 2024-03-09 20:05:15
trx_requested_lock_id: 269766:821:5:7
trx_tables_in_use: 1
trx_tables_locked: 1
trx_lock_structs: 2
trx_rows_locked: 1
trx_rows_modified: 0
lock_mode: X
lock_type: RECORD
lock_table_schema: test
lock_table_name: test
lock_index: PRIMARY
lock_space: 821
lock_page: 5
lock_rec: 7
lock_data: 6
*************************** 2. row ***************************
machine_name:
connection_id: 14817
trx_id: 269760
ts: 2024-03-09 20:05:57
user: root
host: localhost
db: test
command: Sleep
time: 60.000
running_since: 2024-03-09 20:04:57
state:
info:
trx_state: RUNNING
trx_started: 2024-03-09 20:04:56
trx_requested_lock_id: NULL
trx_tables_in_use: 0
trx_tables_locked: 1
trx_lock_structs: 2
trx_rows_locked: 1
trx_rows_modified: 1
lock_mode: X
lock_type: RECORD
lock_table_schema: test
lock_table_name: test
lock_index: PRIMARY
lock_space: 821
lock_page: 5
lock_rec: 7
lock_data: 6
metadata_lock
SELECT lock_mode, ts, user, host, lock_type, table_schema, table_name, time, started, state, query
FROM metadata_lock
WHERE connection_id = 14347
ORDER BY started DESC
LIMIT 5
;
+-------------------------+---------------------+------+-----------+----------------------+--------------+------------+-------+---------------------+----------------+------------------------------------------------------+
| lock_mode | ts | user | host | lock_type | table_schema | table_name | time | started | state | query |
+-------------------------+---------------------+------+-----------+----------------------+--------------+------------+-------+---------------------+----------------+------------------------------------------------------+
| MDL_SHARED_WRITE | 2024-03-13 10:27:33 | root | localhost | Table metadata lock | test | test | 1.000 | 2024-03-13 10:27:32 | Updating | UPDATE test set data3 = MD5(id) |
| MDL_BACKUP_TRANS_DML | 2024-03-13 10:27:33 | root | localhost | Backup lock | | | 1.000 | 2024-03-13 10:27:32 | Updating | UPDATE test set data3 = MD5(id) |
| MDL_BACKUP_ALTER_COPY | 2024-03-13 10:22:33 | root | localhost | Backup lock | | | 0.000 | 2024-03-13 10:22:33 | altering table | ALTER TABLE test DROP INDEX ts, ADD INDEX (ts, data) |
| MDL_SHARED_UPGRADABLE | 2024-03-13 10:22:33 | root | localhost | Table metadata lock | test | test | 0.000 | 2024-03-13 10:22:33 | altering table | ALTER TABLE test DROP INDEX ts, ADD INDEX (ts, data) |
| MDL_INTENTION_EXCLUSIVE | 2024-03-13 10:22:33 | root | localhost | Schema metadata lock | test | | 0.000 | 2024-03-13 10:22:33 | altering table | ALTER TABLE test DROP INDEX ts, ADD INDEX (ts, data) |
+-------------------------+---------------------+------+-----------+----------------------+--------------+------------+-------+---------------------+----------------+------------------------------------------------------+
global_variables
SELECT variable_name, COUNT(*) AS cnt
FROM global_variables
GROUP BY variable_name
HAVING COUNT(*) > 1
;
+-------------------------+-----+
| variable_name | cnt |
+-------------------------+-----+
| innodb_buffer_pool_size | 7 |
+-------------------------+-----+
SELECT variable_name, ts, variable_value
FROM global_variables
WHERE variable_name = 'innodb_buffer_pool_size'
;
+-------------------------+---------------------+----------------+
| variable_name | ts | variable_value |
+-------------------------+---------------------+----------------+
| innodb_buffer_pool_size | 2024-03-09 21:36:28 | 134217728 |
| innodb_buffer_pool_size | 2024-03-09 21:40:25 | 268435456 |
| innodb_buffer_pool_size | 2024-03-09 21:48:14 | 134217728 |
+-------------------------+---------------------+----------------+
global_status
SELECT s1.ts
, s1.variable_value AS 'table_open_cache_misses'
, s2.variable_value AS 'table_open_cache_hits'
FROM global_status AS s1
JOIN global_status AS s2 ON s1.ts = s2.ts
WHERE s1.variable_name = 'table_open_cache_misses'
AND s2.variable_name = 'table_open_cache_hits'
AND s1.ts BETWEEN '2024-03-13 11:55:00' AND '2024-03-13 12:05:00'
ORDER BY ts ASC
;
+---------------------+-------------------------+-----------------------+
| ts | table_open_cache_misses | table_open_cache_hits |
+---------------------+-------------------------+-----------------------+
| 2024-03-13 11:55:47 | 1001 | 60711 |
| 2024-03-13 11:56:47 | 1008 | 61418 |
| 2024-03-13 11:57:47 | 1015 | 62125 |
| 2024-03-13 11:58:47 | 1022 | 62829 |
| 2024-03-13 11:59:47 | 1029 | 63533 |
| 2024-03-13 12:00:47 | 1036 | 64237 |
| 2024-03-13 12:01:47 | 1043 | 64944 |
| 2024-03-13 12:02:47 | 1050 | 65651 |
| 2024-03-13 12:03:47 | 1057 | 66355 |
| 2024-03-13 12:04:47 | 1064 | 67059 |
+---------------------+-------------------------+-----------------------+
Testing
We have currently rolled out dbstat on our test and production systems to test it and see whether our assumptions regarding stability and calculations of the quantity structure are correct. In addition, using it ourselves is the best way to find out if something is missing or if the handling is impractical (Eat your own dog food).
Sources
sar
Using Oracle Statspack
dbstat on Github
SQL/PSM
The post Shinguz: dbstat for MariaDB (and MySQL) appeared first on MariaDB.org.
]]>The post ClusterControl adds in-place major PostgreSQL upgrade and pgvector extension in latest release appeared first on MariaDB.org.
Let’s dive into the details to help you get up and running with these latest enhancements.
An in-place PostgreSQL major upgrade involves upgrading your current PostgreSQL instance to a newer major version directly on the same server. This method uses the pg_upgrade tool to facilitate a smooth transition to the latest version.
If your PostgreSQL database is running on an outdated version, we strongly advise upgrading to a newer version to reap the following benefits:
In summary, performing an in-place PostgreSQL major upgrade will enable you to take advantage of vital built-in functionalities, security updates, performance improvements, and implementations beneficial for database management.
How to upgrade your PostgreSQL version from the ClusterControl GUI:
Upgrading your PostgreSQL version with ClusterControl is easy with our intuitive wizard. Just follow these simple steps:
To help you through the upgrade process, refer to our documentation.
Pgvector, an open-source extension designed for PostgreSQL, functions as a robust tool for managing embeddings within PostgreSQL environments.
A significant addition to the PostgreSQL ecosystem, pgvector excels in identifying both precise and approximate nearest neighbors, thereby facilitating search functionality, recommendations, and anomaly detection.
Let’s delve into the powerful features and compelling use cases of pgvector.
Key features and benefits of pgvector:
Explore pgvector use cases:
How to enable pgvector from the ClusterControl GUI:
For more information, check our documentation and this pgvector resource.
If you’re a MongoDB user looking to perform a minor upgrade, we have you covered! This release also includes support for minor upgrades for replicaset and sharded clusters.
Head to the Upgrades tab in ClusterControl, conduct a quick version check and proceed with the upgrade process.
For more information, see our documentation for all the details.
ClusterControl v1.9.8 offers a range of other enhancements across MongoDB, MySQL, and the user interface:
MongoDB updates:
MySQL updates:
Interface improvements:
We’re committed to adding new ClusterControl capabilities as we strive to empower you to deploy, monitor, and scale your open-source databases in various environments (cloud, on-premise, hybrid).
Stay tuned for the next exciting ClusterControl release (v1.9.9), featuring support for the Redis Cluster, additional upgrades to multiple databases, and the ability to scale CC to thousands of nodes.
For more insights into v1.9.8, visit our changelogs for the details!
New to ClusterControl? Try our Enterprise edition free for 30 days and get technical support to guide you throughout your journey.
The post ClusterControl adds in-place major PostgreSQL upgrade and pgvector extension in latest release appeared first on Severalnines.
The post ClusterControl adds in-place major PostgreSQL upgrade and pgvector extension in latest release appeared first on MariaDB.org.
]]>The post MariaDB Enterprise Server Q1 2024 maintenance releases appeared first on MariaDB.org.
The post Self-Hosted ServiceNow Quick Start Guide for MariaDB Enterprise Server 10.6 appeared first on MariaDB.org.
The post Percona Operator for MySQL Now Supports Automated Volume Expansion in Technical Preview appeared first on MariaDB.org.
The post Efficient Integration of PostgreSQL 16 with LDAP: Best Practices and Tips appeared first on MariaDB.org.
The pg_hba.conf file is where you configure client authentication in PostgreSQL. To set up LDAP authentication, you will need to add entries to this file specifying ldap as the authentication method for the desired databases and users.
In your pg_hba.conf, add an entry like the following to specify LDAP authentication:
host all all 0.0.0.0/0 ldap ldapserver=ldap.example.com ldapport=389 ldapbinddn="cn=admin,dc=example,dc=com" ldapbindpasswd=secret ldapprefix="uid=" ldapsuffix=",dc=example,dc=com"
Adjust the parameters to fit your LDAP server’s configuration:
- ldapserver: The hostname of your LDAP server.
- ldapport: The port on which your LDAP server is listening (389 is the default, 636 for LDAPS).
- ldapbinddn and ldapbindpasswd: The distinguished name (DN) and password for binding to the LDAP server. These are required if your LDAP server does not allow anonymous binds.
- ldapprefix and ldapsuffix: Strings that are prepended and appended to the username to form the user’s DN. This depends on your LDAP schema.
To ensure that authentication credentials and information are securely transmitted, configure LDAP over SSL (LDAPS) or StartTLS:
- For LDAPS, use ldaps:// in your ldapserver URL and set the port to 636.
- For StartTLS, add ldapstarttls=1 to your pg_hba.conf entry.
Make sure your PostgreSQL server trusts your LDAP server’s SSL certificate. You might need to add the LDAP server’s CA certificate to the PostgreSQL server’s trust store.
command-line tool or another PostgreSQL client to test logging in with LDAP credentials.
If your LDAP directory structure requires it, you can use a custom search filter with the ldapsearchattribute and ldapsearchfilter options in pg_hba.conf:
ldapsearchattribute=uid ldapsearchfilter="(|(memberOf=cn=dbadmins,ou=groups,dc=example,dc=com)(memberOf=cn=developers,ou=groups,dc=example,dc=com))"
This allows more complex queries, like restricting authentication to members of certain groups.
After making changes to pg_hba.conf, reload the PostgreSQL configuration for the changes to take effect without restarting the database:
pg_ctl reload
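The same reload can also be triggered from an SQL session, which is handy when you have no shell access to the database host:

```sql
-- Equivalent to "pg_ctl reload": asks the server to re-read its
-- configuration files; returns true if the signal was sent.
SELECT pg_reload_conf();
```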
Initially, it’s useful to increase logging for connection and authentication issues. Adjust the log_connections, log_disconnections, and log_line_prefix settings in postgresql.conf to help diagnose any problems.
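A minimal postgresql.conf fragment for such a diagnostic phase might look like this (the prefix format is just one reasonable choice):

```
log_connections = on
log_disconnections = on
log_line_prefix = '%m [%p] user=%u db=%d host=%h '
```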
Integrating PostgreSQL with LDAP is a powerful way to manage database authentication centrally. By following these tips and ensuring secure LDAP connections, you can streamline user management while maintaining high security standards. Always refer to the PostgreSQL documentation for the most current information and best practices.
The post Efficient Integration of PostgreSQL 16 with LDAP: Best Practices and Tips appeared first on The WebScale Database Infrastructure Operations Experts in PostgreSQL, MySQL, MariaDB and ClickHouse.
The post Efficient Integration of PostgreSQL 16 with LDAP: Best Practices and Tips appeared first on MariaDB.org.
The post How to define and capture Baselines in PostgreSQL Performance Troubleshooting? appeared first on MariaDB.org.
First, identify which metrics are crucial for understanding the health and performance of your PostgreSQL database.
PostgreSQL provides several views that can be queried to collect baseline data; query statistics, for example, come from pg_stat_statements (which requires the pg_stat_statements
extension to be enabled).Choose a period of normal operation that represents typical usage patterns of your database. This might be a few hours, days, or even weeks, depending on the variability of your workload.
Collect data on the key performance metrics identified earlier. This can be done manually by running queries against the relevant PostgreSQL views at regular intervals, or automatically using monitoring tools.
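As a sketch of manual collection, the following query snapshots per-database activity from the cumulative statistics views; run it on a schedule and store the rows to build the baseline:

```sql
-- Point-in-time snapshot of per-database activity and cache hit ratio.
SELECT now() AS ts,
       datname,
       xact_commit,
       xact_rollback,
       blks_read,
       blks_hit,
       round(100.0 * blks_hit / nullif(blks_hit + blks_read, 0), 2) AS cache_hit_pct
  FROM pg_stat_database
 WHERE datname IS NOT NULL;
```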
Aggregate the collected data to produce summary statistics for each metric. Analyze this data to establish average and peak values, identify patterns, and understand the normal range of variability for each metric.
Create a report or dashboard summarizing the baseline data. This should include not just the raw metrics, but also any insights or patterns observed during the baseline period. This documentation will be your reference point for future performance troubleshooting.
Set up continuous monitoring to track the key performance metrics against the established baseline. Many tools and extensions can help with this, including pg_stat_statements for query analysis and extensions that facilitate integration with external monitoring solutions.
As your database workload evolves, periodically review and update your performance baselines to ensure they remain relevant. Significant changes in application behavior, data volume, or infrastructure might necessitate a new baseline.
Capturing performance baselines is a proactive step in database administration that enables you to quickly identify deviations from normal performance, making it easier to diagnose and resolve issues. By understanding the normal operational profile of your PostgreSQL database, you can more effectively troubleshoot performance issues, plan for capacity, and ensure optimal performance over time.
The post How to define and capture Baselines in PostgreSQL Performance Troubleshooting? appeared first on The WebScale Database Infrastructure Operations Experts in PostgreSQL, MySQL, MariaDB and ClickHouse.
The post How to define and capture Baselines in PostgreSQL Performance Troubleshooting? appeared first on MariaDB.org.
The post Exploring Alternatives to SQL Server Query Store in PostgreSQL appeared first on MariaDB.org.
The closest equivalent in PostgreSQL combines the pg_stat_statements extension with additional logging and monitoring tools. While PostgreSQL does not have a built-in feature identical to Query Store, pg_stat_statements and other tools can provide deep insights into query performance and help with performance tuning and troubleshooting.
The pg_stat_statements module is included with PostgreSQL and provides a means to track execution statistics of all SQL statements executed by the server, not just queries. This includes the number of times a statement was executed, the total time spent in the database for those executions, and more.
To use pg_stat_statements, you need to:
1. Enable the extension in each database where you want to track statements:
CREATE EXTENSION pg_stat_statements;
2. Add the module to shared_preload_libraries in your postgresql.conf file to ensure it is loaded at server start:
shared_preload_libraries = 'pg_stat_statements'
After changing the configuration, you’ll need to restart your PostgreSQL server.
3. Query the pg_stat_statements view to analyze query performance:
SELECT query, calls, total_time, rows,
       100.0 * shared_blks_hit / nullif(shared_blks_hit + shared_blks_read, 0) AS hit_percent
  FROM pg_stat_statements
 ORDER BY total_time DESC;
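When starting a fresh measurement window, the accumulated counters can be cleared so that subsequent numbers reflect only the new workload:

```sql
-- Discard all statistics gathered so far by pg_stat_statements.
SELECT pg_stat_statements_reset();

-- After re-running the workload, inspect the freshly accumulated
-- top consumers, e.g. the five most frequently called statements.
SELECT query, calls, rows
  FROM pg_stat_statements
 ORDER BY calls DESC
 LIMIT 5;
```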
For a more comprehensive solution akin to SQL Server's Query Store, consider integrating pg_stat_statements with other PostgreSQL features and third-party tools:
- Extensions such as pg_qualstats, pg_stat_kcache, and auto_explain for deeper insights into query execution and performance issues.
- External monitoring tools like Prometheus with Grafana, or commercial platforms like pganalyze, which provide powerful interfaces for visualizing and analyzing PostgreSQL performance data over time.
- Custom dashboards and reports that correlate pg_stat_statements with other PostgreSQL statistics.

While PostgreSQL's approach requires a bit more setup and integration work compared to SQL Server's Query Store, it offers flexibility and powerful options for monitoring query performance and planning optimizations. The key is to leverage the pg_stat_statements extension as the foundation of your query performance analysis strategy and integrate it with other tools and practices for a comprehensive solution.
The post Exploring Alternatives to SQL Server Query Store in PostgreSQL appeared first on The WebScale Database Infrastructure Operations Experts in PostgreSQL, MySQL, MariaDB and ClickHouse.
The post Understanding the Internal Locking Hierarchy and Mechanisms in PostgreSQL appeared first on MariaDB.org.
PostgreSQL takes locks at several levels of granularity:
- Database-level locks, used by operations such as DROP DATABASE.
- Table-level locks, with modes including ACCESS SHARE (used for SELECT operations), ROW EXCLUSIVE (used for UPDATE, DELETE), ACCESS EXCLUSIVE (used for operations like DROP TABLE, TRUNCATE, which block all other operations), and several others. Each lock mode determines the compatibility with other lock modes.
- Row-level locks, implemented through the xmin and xmax system columns for each row (used for MVCC), and explicit row locks (SELECT FOR UPDATE, for example). Row-level locks offer the finest granularity, allowing high concurrency.

PostgreSQL defines various lock modes, which determine whether different operations are compatible with each other. For example, multiple transactions can hold SHARE UPDATE EXCLUSIVE locks on a table simultaneously, but an ACCESS EXCLUSIVE lock is incompatible with any other lock, effectively serializing access to the resource.
PostgreSQL automatically detects deadlocks, situations where two or more transactions are waiting for each other to release locks. When detected, PostgreSQL will abort one of the transactions to break the cycle, allowing the other transactions to proceed.
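A minimal way to observe the detector is two concurrent sessions updating the same rows in opposite order; the table name t and its rows here are assumptions for this sketch:

```sql
-- Session 1:
BEGIN;
UPDATE t SET v = 1 WHERE id = 1;
-- Session 2:
BEGIN;
UPDATE t SET v = 1 WHERE id = 2;
-- Session 1 (blocks, waiting for session 2's row lock):
UPDATE t SET v = 1 WHERE id = 2;
-- Session 2 (closes the cycle; after deadlock_timeout one of the two
-- transactions is aborted with "ERROR: deadlock detected"):
UPDATE t SET v = 1 WHERE id = 1;
```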
Internally, PostgreSQL uses a lock table to manage most types of locks. The lock table maps lockable objects to the list of transactions holding or waiting for locks on them. For row-level locks, PostgreSQL uses a combination of predicate locks for Serializable transactions and lightweight locks or flags directly on rows for other isolation levels, minimizing overhead and maximizing performance.
PostgreSQL administrators can view lock information using system views like pg_locks
, pg_class
, and pg_stat_activity
. These views can be queried to analyze current locks, which sessions are holding them, and potential locking issues.
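As an illustration using only standard system views, a query along these lines lists each waiting session together with the sessions blocking it (pg_blocking_pids() is available in PostgreSQL 9.6 and later):

```sql
-- For each session waiting on a lock, show the session(s) blocking it.
SELECT waiting.pid    AS waiting_pid,
       waiting.query  AS waiting_query,
       blocking.pid   AS blocking_pid,
       blocking.query AS blocking_query
FROM pg_stat_activity AS waiting
JOIN LATERAL unnest(pg_blocking_pids(waiting.pid)) AS blocked_by(pid) ON true
JOIN pg_stat_activity AS blocking ON blocking.pid = blocked_by.pid;
```

Sessions that are not blocked produce an empty array from pg_blocking_pids(), so they simply drop out of the lateral join.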
Understanding the locking hierarchy and behavior in PostgreSQL is essential for database administration, performance tuning, and application development. Proper use of locks ensures data integrity and consistency while maximizing concurrency. Administrators and developers should design database operations with an understanding of lock compatibility and granularity to avoid unnecessary locking and potential performance issues.
The post Understanding the Internal Locking Hierarchy and Mechanisms in PostgreSQL appeared first on The WebScale Database Infrastructure Operations Experts in PostgreSQL, MySQL, MariaDB and ClickHouse.
The post Why do We Need Databases and SQL? appeared first on MariaDB.org.
The post Help Us Improve MySQL Usability and Double Win! appeared first on MariaDB.org.
The post Release Roundup March 4, 2024 appeared first on MariaDB.org.
]]>The post Using Linux perf: do we need to pass identifying info as arguments to important functions? appeared first on MariaDB.org.
Short version:
- One can use perf to collect info about where the statement is spending its time.
- perf allows recording variables, but for some reason it doesn't allow recording this->member_var.
- We could pass identifying info like this->name as arguments to "important" functions like sp_head::execute (run a stored routine). Should we?

Longer:
Sometimes one has to analyze statement execution at a finer detail than ANALYZE FORMAT=JSON has. I was involved in such case recently: an UPDATE statement invoked a trigger which ran multiple SQL statements and invoked two stored functions. ANALYZE FORMAT=JSON showed that the top-level UPDATE statement didn’t have any issues. The issue was inside the trigger but where exactly?
We used the Linux perf tool. In MariaDB (and MySQL), a stored routine is executed by sp_head::execute(). One can track it like so:
# Add the probe:
perf probe -x `which mariadbd` --add _ZN7sp_head7executeEP3THDb
perf probe -x `which mariadbd` --add _ZN7sp_head7executeEP3THDb%return
# Collect a list of probes
PROBES=`perf probe -l 'probe_mariadb*' | awk '{ printf " -e %s", $1 } '`;
# Now, PROBES has " -e probe_mariadbd:_ZN7sp_head7executeEP3THDb
# -e probe_mariadbd:_ZN7sp_head7executeEP3THDb__return"
Then you can note your session's thread id (TODO: does this work when using a thread pool?):
select tid from information_schema.processlist where id=connection_id();
Then have perf record the profile and run your query:
perf record $PROBES -t $THREAD_ID
^C
perf script
mariadbd 1625339 [005] 874721.854399: probe_mariadbd:_ZN7sp_head7executeEP3THDb: (55942da55c3a)
mariadbd 1625339 [005] 874722.855064: probe_mariadbd:_ZN7sp_head7executeEP3THDb__return: (55942da55c3a <- 55942da586eb)
mariadbd 1625339 [005] 874722.855102: probe_mariadbd:_ZN7sp_head7executeEP3THDb: (55942da55c3a)
mariadbd 1625339 [005] 874724.855253: probe_mariadbd:_ZN7sp_head7executeEP3THDb__return: (55942da55c3a <- 55942da586eb)
Column #4 is time in seconds. This is nice but it’s not possible to tell which SP is which.
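For the timing part, a quick sketch that pairs entry and return events with awk and prints per-call durations; it assumes the perf script output format shown above (timestamp in the 4th field, e.g. "874721.854399:") and non-nested calls — recursive stored routines would need a stack:

```shell
# Pair sp_head::execute entry/return probe events and print call durations.
# In real use, feed it from perf:  perf script | awk '...'
# Here the sample output shown above is used as input.
durations=$(awk '
  /executeEP3THDb:/        { start = $4 + 0 }                     # entry event: remember timestamp
  /executeEP3THDb__return/ { printf "%.6f\n", ($4 + 0) - start }  # return event: elapsed seconds
' <<'EOF'
mariadbd 1625339 [005] 874721.854399: probe_mariadbd:_ZN7sp_head7executeEP3THDb: (55942da55c3a)
mariadbd 1625339 [005] 874722.855064: probe_mariadbd:_ZN7sp_head7executeEP3THDb__return: (55942da55c3a <- 55942da586eb)
EOF
)
echo "$durations"   # one duration (in seconds) per completed call
```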
perf allows access to local variables:
perf probe -x `which mariadbd` -V _ZN7sp_head7executeEP3THDb%return
Available variables at _ZN7sp_head7executeEP3THDb%return
@<execute+0>
CSET_STRING old_query
Diagnostics_area* da
...
sp_head* this
...
But when I tried to record this->m_name.str, that failed:
perf probe -x `pwd`/mariadbd --add '_ZN7sp_head7executeEP3THDb this->m_name.str'
Probe on address 0x880c3a to force probing at the function entry.
this is not a data structure nor a union.
Error: Failed to add events.
What if the name of the stored function was passed as function argument? I edited MariaDB’s source code and added it (full diff):
--- a/sql/sp_head.cc
+++ b/sql/sp_head.cc
@@ -1192,7 +1192,7 @@
*/
bool
-sp_head::execute(THD *thd, bool merge_da_on_success)
+sp_head::execute(THD *thd, bool merge_da_on_success, const char *sp_name)
{
DBUG_ENTER("sp_head::execute");
char saved_cur_db_name_buf[SAFE_NAME_LEN+1];
Compiled, restarted the server and I was able to add a perf probe that records the name:
perf probe -x `pwd`/mariadbd \
    --add '_ZN7sp_head7executeEP3THDbPKc sp_name:string'
perf record
^C
perf script
This produced (line breaks added by me):
mariadbd 1627629 [003] 877069.642164: probe_mariadbd:_ZN7sp_head7executeEP3THDbPKc: (564e917f1c67)
sp_name_string="test.func1"
mariadbd 1627629 [003] 877070.642395: probe_mariadbd:_ZN7sp_head7executeEP3THDbPKc: (564e917f1c67)
sp_name_string="test.func2"
The sp_name_string field shows which stored function was invoked.
The question: should we now go and add such function arguments?
This would also help with crashing bugs – the stack trace would be informative. Currently, crash reports benefit from the arguments of the dispatch_command() function; a random example from MDEV-22262:
#14 0x0000562e90545488 in dispatch_command ( ...
packet=packet@entry=0x7efdf0007a19 "UPDATE t1 PARTITION (p1) SET a=3 WHERE a=8" ... )
but if the crash happened when running a Prepared Statement, one is out of luck.
The post Using Linux perf: do we need to pass identifying info as arguments to important functions? appeared first on MariaDB.org.
The post Make SHOW as good as SELECT appeared first on MariaDB.org.
SHOW AUTHORS GROUP BY `Location` INTO OUTFILE 'tmp.txt';
You’re thinking “Hold it, MySQL and MariaDB won’t allow SHOW (and similar statements like ANALYZE or CHECK or CHECKSUM or DESCRIBE or EXPLAIN or HELP) to work with the same clauses as SELECT, or in the same places.” You’re right — but they work anyway. “Eppur si muove”, as Galileo maybe didn’t say.
I’ll explain that the Ocelot GUI client transforms the queries so that this is transparent, that is, the user types such things where SELECTs would work, and gets result sets the same way that SELECT would do them.
I’ll call these statements “semiselects” because they do what a SELECT does — they produce result sets — but they can’t be used where SELECT can be used — no subqueries, no GROUP BY or ORDER BY or INTO clauses, no way to choose particular columns and use them in expressions.
There are three workarounds …
You can select from a system table, such as sys or information_schema or performance_schema if available and if you have the privileges and if their information corresponds to what the semiselect produces.
For the semiselects that allow WHERE clauses, you can use the bizarre “:=” assignment operator, such as
SHOW COLUMNS IN table_name WHERE (@field:=`Field`) > '';
and now @field will have one of the field values.
You can get the result set into a log file or copy-paste it, then write or acquire a program that parses, for example by extracting what’s between |s in a typical ASCII-decorated display.
Those three workarounds can be good solutions, I’m not going to quibble about their merits. I’m just going to present a method that’s not a workaround at all. You just put the semiselect where you’d ordinarily put a SELECT. It involves no extra privileges or globals or file IO.
CHECK TABLE c1, m WHERE `Msg_text` <> 'OK';
SELECT * FROM (DESCRIBE information_schema.tables) AS x ORDER BY 1;
SHOW COLLATION ORDER BY `Id` INTO OUTFILE 'tmp.txt';
SELECT `Type` FROM (SHOW COLUMNS IN Employees) AS x GROUP BY `Type`;
SELECT UPPER(`Name`) from (SHOW Contributors) as x;
SHOW ENGINES ORDER BY `Engine`;
(SELECT `Name` FROM (SHOW CONTRIBUTORS) AS x UNION ALL SELECT `Name` FROM (SHOW AUTHORS) AS y) ORDER BY 1;
CREATE TABLE engines AS SHOW ENGINES;
The client has to see where the semiselects are within the statement. That is easy, any client that can parse SQL can do it.
The client passes each semiselect to the server, and gets back a result, which ordinarily contains field names and values.
The client changes the field names and values to SELECTs, e.g. for SHOW CONTRIBUTORS the first row is
(SELECT 'Alibaba Cloud' AS `Name`, 'https://www.alibabacloud.com' AS `Location`, 'Platinum Sponsor of the MariaDB Foundation' AS `Comment`)
and that gets UNION ALLed with the second row, and so on.
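Spelled out as a complete statement (the second row below is a made-up placeholder, not a real contributor entry), the client-generated query has this shape:

```sql
SELECT `Name` FROM (
  SELECT 'Alibaba Cloud' AS `Name`,
         'https://www.alibabacloud.com' AS `Location`,
         'Platinum Sponsor of the MariaDB Foundation' AS `Comment`
  UNION ALL
  SELECT 'Example Sponsor', 'https://example.com', 'Hypothetical row'  -- placeholder row
) AS x
ORDER BY `Name`;
```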
The client passes this SELECT to the server, and gets back a result as a select result set.
Or, in summary, what the client must do is: pass the SHOW to the server, intercept the result, convert it to tabular form, send a SELECT … UNION ALL SELECT …; to the server, and display the result.
However, these steps are all hidden. The user doesn’t have to care how it works.
It requires two trips to the server instead of one. The client log will only show the semiselect, but the server sees the SELECT UNION too.
It will not work inside routines. You will have to CREATE TEMPORARY TABLE AS semiselect; before invoking a routine, in order to use the semiselect’s result set inside CREATE FUNCTION | PROCEDURE | TRIGGER.
Speaking of CREATE TEMPORARY TABLE AS semiselect, if there are VARCHAR columns, they will only be as big as the largest item in the result set.
It will not work inside CREATE VIEW.
Sometimes it will not work with nesting, that is semiselects within semiselects might not be allowed.
Some rare situations will expose the SELECT result in very long column names.
On Linux this is easy — download libraries that ocelotgui needs, download ocelotgui, cmake, make. (On Windows it’s not as easy, sorry.) The source, and the README instructions for building, are on github.
After you’ve started up ocelotgui and connected to a MySQL or MariaDB server, there is one preparatory step: you have to enable the feature. (It’s not default because these aren’t standard SQL statements.) You can do this by going to the Settings|Statement menu and changing the Syntax Checker value to 7 and clicking OK. Or you can enter the statement
SET OCELOT_STATEMENT_SYNTAX_CHECKER = '7';
Now the feature is enabled and you can try all the examples I’ve given. You’ll see that they all work.
Of course it’s made available this way because the status is beta.
This will be available in executable form in the next release of ocelotgui, real soon now. If you have a github account, you can go to the github page and click Watch to keep track of updates.
The post Implement advanced replication features with Amazon RDS for MySQL and Amazon Aurora MySQL using intermediate replication servers appeared first on MariaDB.org.
We discuss two replication capabilities in Amazon RDS and Amazon Aurora: multi-source replication and replication filtering. Multi-source replication is supported only in Amazon RDS for MySQL (minor versions 8.0.35 and higher, and 5.7.44 and higher); at the time of writing this post, it’s not supported for Aurora. We then implement those capabilities using an intermediate MySQL replication instance (a relay server) running in Amazon Elastic Compute Cloud (Amazon EC2).
A MySQL replication topology begins with a primary database server receiving write traffic and recording equivalent replication events in the binary log (or binlog). The binary log events describe all changes that happen on the primary server. Replicas connect to the primary server, download the binary logs, and apply the events locally in order to synchronize themselves with the primary.
In the most common scenario, the primary server records all changes, and each replica receives and applies all changes from a single primary. This is sufficient in most scenarios, but advanced use cases may require a tailored approach where the replicas aren’t simply one-to-one mirrors of the primary database.
Multi-source replication enables data from multiple MySQL-compatible sources to replicate to a single target (replica). Multi-source replication can facilitate a variety of use cases, including the following:
In our solution we are using an intermediate replication instance, which reads the binary log streams from multiple sources and produces a single replication stream that can be consumed by Amazon Aurora MySQL or Amazon RDS for MySQL.
The following diagram shows multiple MySQL databases instances being used as a source and replicating to another MySQL instance using an intermediate instance.
Replication filtering allows database administrators to exclude schemas or tables from replication. The configuration is flexible: you can list objects that should be replicated (and ignore everything else), or list objects that should be ignored (and replicate everything else). You can also choose to apply the filtering rules on the source or on the target.
Replication filtering can be helpful in the following situations:
Replication filtering is partially supported in Amazon Aurora MySQL and Amazon RDS for MySQL. Consult the user guides for Amazon RDS and Amazon Aurora for details. At the time of writing, feature limitations include the following:
- The --binlog-do-db and --binlog-ignore-db parameters aren’t supported.
- The mysql schema can’t be ignored.

The solution outlined in this post allows you to bypass these limitations.
The BLACKHOLE storage engine is an alternative way of implementing replication filtering at a table level. In this approach, a table converted to the BLACKHOLE engine ignores all writes and doesn’t contain any data. The binary log format (STATEMENT or ROW) determines whether or not replication records are written to the binary log, and therefore whether the intermediate replication instance sends these records further down the replication stream. This post includes a BLACKHOLE example for completeness. However, due to the relative complexity of BLACKHOLE behavior, we recommend using the replication filtering parameters instead where possible.
The following diagram shows a MySQL database instance being used as a source and replicating to another MySQL instance using an intermediate instance that is using the BLACKHOLE engine.
You can use the aforementioned replication features individually, or you can run a combination of filtering, multi-source replication, and BLACKHOLE tables on the same intermediate replication instance. The intermediate instance acts as a processing layer that replicates from binary log sources, applies the desired operations (multi-source aggregation, filtering), and generates new binary logs of its own. This new binary log stream can then be consumed by Amazon Aurora MySQL or Amazon RDS for MySQL.
You need the following components to implement this solution:
Note that for replication to work reliably throughout the entire chain, all MySQL servers must be replication-compatible in terms of their versions and configuration. For example, MySQL supports replication to the next higher major version (for example, 5.7 to 8.0), but doesn’t officially support replication from a higher to a lower major version. Similarly, the gtid_mode configuration must be compatible across all servers.
In this post, the test databases start out empty and aren’t receiving any write traffic until after replication is configured. Consequently, there’s no need to synchronize binary log positions between the servers, and we can use the current positions without running into replication conflicts. In real-world migration scenarios where the intermediate and target servers are provisioned from physical backups or logical dumps, you must ensure the binary log positions are correct in the context of those backups and dumps.
The examples provided in this post were tested using MySQL 8.0 and Amazon Aurora MySQL version 3 (compatible with MySQL 8.0).
MySQL servers acting as a replication source must meet the following configuration requirements:
- A replication user with REPLICATION SLAVE permissions, which will be used to accept replication connections from the intermediate server.
If you don’t have existing MySQL servers you could use to test this solution, you can provision one or more RDS for MySQL instances by completing the following steps:
Create a replication user with REPLICATION SLAVE permissions:
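A typical grant would look like this; the user name, host mask, and password are placeholders:

```sql
-- Placeholder credentials; restrict the host mask in production.
CREATE USER 'repl_user'@'%' IDENTIFIED BY 'repl_password';
GRANT REPLICATION SLAVE ON *.* TO 'repl_user'@'%';
```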
Complete the following steps to provision the intermediate MySQL instance:
At this time, you can make additional MySQL configuration changes according to your requirements. At a minimum, verify the following prerequisites:
- The binary log uses the ROW format (binlog_format setting). This should be the default in MySQL 8.0.
The target preparation steps depend on the nature of your project. Migration projects might create the target database from a backup; others might start with an empty database to be filled with data after creation.
If you don’t already have a target database to use with this solution, you can create a new Aurora MySQL cluster. Make sure to use a major engine version that’s compatible with the MySQL version of the intermediate EC2 instance. For example, if using MySQL 8.0 on the EC2 instance, use Amazon Aurora MySQL version 3.
We now configure multi-source replication between the two RDS for MySQL source instances and the intermediate MySQL server we created in the preceding section.
Multi-source replication uses the concept of replication channels, with each channel connecting to a different binary log source. Note that MySQL doesn’t perform automatic conflict resolution for changes coming from multiple sources, which means that the changes must be non-conflicting for the replication to work.
Use the CHANGE REPLICATION SOURCE TO statement to configure each channel. In MySQL versions before 8.0.23, the equivalent command is CHANGE MASTER TO.
This example involves two RDS for MySQL source instances, so the setup requires the creation of two replication channels.
To configure multi-source replication, complete the following steps:
1. To configure the channel for source-mysql-instance-1, use the following code:
2. To configure the channel for source-mysql-instance-2, use the following code:
3. Start replication for the channel pointing at source-mysql-instance-1:
4. Start replication for the channel pointing at source-mysql-instance-2:
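As a hedged sketch of such a channel definition (hostname, user, and password are placeholders, and GTID auto-positioning is just one of the possible options):

```sql
-- Define a named channel for the first source (MySQL 8.0.23+ syntax).
CHANGE REPLICATION SOURCE TO
  SOURCE_HOST = 'source-mysql-instance-1.example.com',  -- placeholder endpoint
  SOURCE_USER = 'repl_user',                            -- placeholder user
  SOURCE_PASSWORD = 'repl_password',                    -- placeholder password
  SOURCE_AUTO_POSITION = 1
FOR CHANNEL 'source-mysql-instance-1';

-- Start replication for that channel only; repeat both statements,
-- adjusted accordingly, for 'source-mysql-instance-2'.
START REPLICA FOR CHANNEL 'source-mysql-instance-1';
```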
As the final step in our setup, we configure replication from the intermediate MySQL server to the Amazon Aurora MySQL target:
You can verify replication with the SHOW REPLICA STATUS command as demonstrated previously:
At this point, binary log replication is up and running between all three database layers: the Amazon RDS for MySQL sources, the intermediate MySQL server in Amazon EC2, and the Amazon Aurora MySQL target. Let’s proceed with the demonstration of replication features.
To demonstrate multi-source replication, we create a schema with a couple of tables on each of the RDS for MySQL source instances. The objects are replicated to the intermediate MySQL server in Amazon EC2, so that the server sees both schemas, each replicated from a different source. Complete the following steps:
1. Create a schema with a couple of tables on source-mysql-instance-1:
2. Create a schema on source-mysql-instance-2. Make sure to use a different schema name, so that it doesn’t conflict with the schema we created in the previous step:
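A sketch of what such schemas could look like; only the schema name demo_source1_db appears elsewhere in the post, and the second schema name and the table definitions are illustrative:

```sql
-- On source-mysql-instance-1:
CREATE SCHEMA demo_source1_db;
CREATE TABLE demo_source1_db.t1 (id INT PRIMARY KEY, val VARCHAR(32));

-- On source-mysql-instance-2 (schema name assumed):
CREATE SCHEMA demo_source2_db;
CREATE TABLE demo_source2_db.t1 (id INT PRIMARY KEY, val VARCHAR(32));
```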
Building upon our existing replication setup, we now introduce replication filtering to ignore certain tables on the intermediate MySQL server. Let’s say that one of the source instances (source-mysql-instance-1) runs regular data archiving jobs on tables called demo_source1_db.archive_*. We still want those tables to be binary logged for other reasons (like backup and restore), but we don’t need them in our Amazon Aurora MySQL target.
We use our intermediate MySQL server to filter those tables out, so that the Aurora MySQL cluster never has to process them. Complete the following steps:
1. Edit the MySQL configuration file (/etc/my.cnf) and add the following setting, then restart the MySQL service:
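One way to express such a rule — replicate-wild-ignore-table is a real MySQL option, and the pattern here mirrors the archive_* naming used in the post:

```ini
[mysqld]
replicate-wild-ignore-table = demo_source1_db.archive_%
```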
2. Connect to source-mysql-instance-1 and create a table that’s matched by the replication filtering rule:
The same is true on the Aurora MySQL cluster. Although we didn’t configure any replication filtering rules on the Aurora side, the events for the filtered table were already ignored on the intermediate database, and they never made it to the Aurora cluster.
There are several replication filtering parameters that you can use according to your requirements. Filtering settings can list objects that should be replicated (and ignore everything else), or list objects that should be ignored (and replicate everything else).
Note that if you configure multiple filtering settings, there’s a specific order in which the server evaluates them. This can sometimes lead to unexpected results, such as when the same table names are listed in both the do and the ignore parameters. Refer to How Servers Evaluate Replication Filtering Rules for details.
Using the BLACKHOLE storage engine
As the final step in our demonstration, let’s take one of the tables that have already been replicated and convert that table to the BLACKHOLE storage engine. We then observe how it affects replication on that table.
1. Connect to source-mysql-instance-1 and insert a few rows into one of the tables:
2. Convert the table to the BLACKHOLE engine on the intermediate MySQL server. Note that we want to modify the table on the intermediate server, but we want the table to stay on InnoDB in the Aurora cluster. To achieve that, we’re using the session-level sql_log_bin variable to temporarily disable binary logging while we’re altering the table:
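The statements for that step might look like this; the table name is a placeholder:

```sql
-- On the intermediate server: keep this ALTER out of the binary log so the
-- engine change does not replicate down to Aurora.
SET SESSION sql_log_bin = 0;
ALTER TABLE demo_source1_db.t1 ENGINE = BLACKHOLE;  -- placeholder table name
SET SESSION sql_log_bin = 1;
```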
At this point, the table is a regular InnoDB table on the Amazon RDS for MySQL source and in Amazon Aurora MySQL, but it’s a BLACKHOLE table on the intermediate server. Let’s see how that affects replication.
1. On source-mysql-instance-1, insert a couple more rows into the table:
2. Observe the intermediate server, where the table now uses the BLACKHOLE engine:
3. Connect to source-mysql-instance-1 and delete all the rows from the table:
Out of all the changes initially made on the source, the inserted rows made it to the target, but the deletes did not. This is due to the following reasons:
- When the first rows were inserted, the table on the intermediate server was still InnoDB, so it accepted and logged the inserts normally.
- Inserts into BLACKHOLE tables are always logged, regardless of the binary log format. Because the inserts were logged, Aurora received and replicated them.
- The intermediate server’s binary log format is ROW, so it didn’t record those deletes in its binary log. That’s because the BLACKHOLE engine treats updates and deletes differently than inserts. Refer to Replication and BLACKHOLE Tables for details.

This demonstration shows why BLACKHOLE tables might be seen as unpredictable in complex replication setups. For that reason, we recommend using replication filtering instead of the BLACKHOLE engine where possible. Nevertheless, BLACKHOLE tables are an interesting concept and might be useful in scenarios that can take advantage of their unique characteristics.
In this post, we demonstrated how you can use advanced replication features by inserting an intermediate replication component between two MySQL servers. This technique can be very useful in situations when the source or target servers are constrained in their features or configuration options, or when you want to perform data transformations such as schema aggregation or data filtering without having to modify the source databases directly.
We hope you find this post helpful. Please let us know your thoughts and questions in the comments section.
Shyam Sunder Rakhecha is a Lead Consultant with the Professional Services team at AWS based out of Hyderabad, India, and specializes in database migrations and modernization. He helps customers in migration and optimization in the AWS Cloud. He is curious to explore emerging technology in terms of databases. He is fascinated with RDBMS and big data. He also loves to organize team building events and activities.
Neha Sharma is a Database Consultant with Amazon Web Services. With over a decade of experience in working with databases, she enables AWS customers to migrate their databases to AWS Cloud. Besides work, she likes to be actively involved in various sports activities and likes to socialize with people.
Szymon Komendera is a Database Solutions Architect at AWS, with nearly 20 years of experience in databases, software development, and application availability. He spent the majority of his 8-year AWS tenure developing Aurora MySQL, and supporting other AWS databases such as Amazon Redshift and Amazon ElastiCache.
]]>The post Notable optimizer fixes released in February, 2024 appeared first on MariaDB.org.
This is a follow-up to MDEV-32203 I’ve covered for the previous release: MariaDB now emits a warning for conditions in the form indexed_column CMP_OP const that are not usable by the optimizer. The most common case where they are not usable is varchar_column=INTEGER_CONSTANT, but there are less obvious cases as well, like mismatched collations.
The original patch failed to produce the warning in some cases. Now, this is fixed.
Writing varchar_col=INTEGER_CONSTANT looks like a newbie mistake, but it is not. I’ve encountered several such cases in the last couple of months alone. They were in fairly complex and well-written queries. MDEV-32203 was a good idea.
This is added to address poor join query plans. Consider a query plan using ref access:
select * from t1, t2 where t2.key1=t1.col1 and t2.key2='foo'
+------+-------------+-------+------+---------------+------+---------+---------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+------+-------------+-------+------+---------------+------+---------+---------+------+-------------+
| 1 | SIMPLE | t1 | ALL | NULL | NULL | NULL | NULL | 1000 | Using where |
| 1 | SIMPLE | t2 | ref | key1 | key1 | 5 | t1.col1 | 200 | Using where |
+------+-------------+-------+------+---------------+------+---------+---------+------+-------------+
When computing cost of reading table t2 by doing index lookups using t2.key1=t1.col1, MariaDB tried to take into account that some of the reads would hit the cache. Basically, the total cost of all lookups was capped by “worst_seeks” value which was a function of how much we would read from table t2 if we read its matching rows “independently” of table t1.
However, this cap didn’t apply to all possible ref accesses. Ref accesses that have a constant key part (like t2.key2='foo' in this example) “borrowed” the #rows and cost estimate from the range optimizer, and that number was not capped.
This resulted in very poor query plan choices in some scenarios. The visible effect was that the optimizer picked an obviously bad ref access plan when a better option was clearly present.
Another related issue was the relative costs of clustered index scans and secondary index scans. Secondary index scans cost was too low.
Both of these issues are fixed in The Big Cost Model Rewrite in MariaDB 11.0. But if one can’t upgrade to 11.0 yet, they can get these fixes in MariaDB 10.6+ by setting optimizer_adjust_secondary_key_costs accordingly.
MariaDB (and MySQL) has two datatypes for storing points in time: TIMESTAMP and DATETIME.
DATETIME is a “YYYY-MM-DD HH:MM:SS” value, without specifying which time zone it is in.
TIMESTAMP is a point in time. It is “the number of [micro]seconds since midnight January 1st, 1970 GMT”. When you read a TIMESTAMP column, it is converted to ‘YYYY-MM-DD …’ datetime in your local @@session.time_zone.
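This difference is easy to see in a session; the table name and values here are illustrative, while the time_zone settings are standard offsets:

```sql
CREATE TABLE tz_demo (ts TIMESTAMP, dt DATETIME);  -- illustrative table
SET time_zone = '+00:00';
INSERT INTO tz_demo VALUES ('2024-01-01 12:00:00', '2024-01-01 12:00:00');
SET time_zone = '+03:00';
-- ts is converted to the new session time zone and reads '2024-01-01 15:00:00';
-- dt is stored verbatim and still reads '2024-01-01 12:00:00'.
SELECT ts, dt FROM tz_demo;
```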
Consider Query-1 which compares a timestamp column with a datetime literal:
SELECT ... FROM tbl WHERE timestamp_column <= 'YYYY-MM-DD ...'
Here, for each row considered, MariaDB would convert the value of timestamp_column into a DATETIME structure consisting of Year, Month, Date, … and then compare it with the DATETIME structure representing ‘YYYY-MM-DD…’. But if TIMESTAMPs are just integers, why not compare as TIMESTAMPs instead?
This is surprisingly complex. First, DATETIME values have a wider range: they span from year 0000 to 9999 while TIMESTAMP covers only 1970 to 2038. Second, DST time changes mean that some DATETIME values map to two possible TIMESTAMP points-in-time: one before the clock is moved backwards, and one after. When the clock is moved forward, there are DATETIME values that do not map to any TIMESTAMP.
The patch for MDEV-32148 carefully takes all these limitations into account and makes queries like Query-1 use TIMESTAMP comparisons whenever it’s safe.
The new MyEnv can be downloaded here. How to install MyEnv is described in the MyEnv Installation Guide.
In the inconceivable case that you find a bug in the MyEnv please report it to the FromDual bug tracker.
Any feedback, statements and testimonials are welcome as well! Please send them to feedback@fromdual.com.
Upgrade from 1.1.x to 2.0
Please look at the MyEnv 2.0.0 Release Notes.
Upgrade from 2.0.x to 2.1.0
shell > cd ${HOME}/product
shell > tar xf /download/myenv-2.1.0.tar.gz
shell > rm -f myenv
shell > ln -s myenv-2.1.0 myenv
Plug-ins
If you are using plug-ins for showMyEnvStatus create all the links in the new directory structure:
shell > cd ${HOME}/product/myenv
shell > ln -s ../../utl/oem_agent.php plg/showMyEnvStatus/
Upgrade of the instance directory structure
From MyEnv 1.0 to 2.0 the directory structure of instances has fundamentally changed. Nevertheless MyEnv 2.0 works fine with MyEnv 1.0 directory structures.
Changes in MyEnv 2.1.0
MyEnv
Removed hard coded parts for running MyEnv under O/S user mariadb.
Function substitute_path was refactored.
Branch guessing improved.
Warnings and errors are in color now.
MyEnv log file is now touched to avoid problems with O/S user root.
O/S user mysql removed in start/stop script.
Checks for DB start improved.
/var/run replaced by the more modern location /run.
Should now be completely MariaDB compatible (mariadbd vs. mysqld).
Wrapper mysqld_safe was extended to mariadbd-safe.
Replaced getVersionFromMysqld by getVersionAndBranchFromDaemon and extended functionality of this function.
LD_LIBRARY_PATH was set to the wrong directory.
Reverted commit fcc93c5 from v2.0.3 related to CDPATH, which broke commands like cd log or cd etc.
Database mysql_innodb_cluster_metadata is hidden now.
Database #innodb_redo is suppressed now as well for MySQL 8.0, and hideschema is not added to every new instance any more to not overwrite the default.
Bug while stopping instance with missing my.cnf fixed.
Function getDistribution cleaned-up.
MySQL should now also be detected correctly from Ubuntu repository.
Function my_exec rewritten.
Debian GNU/Linux tag added for distros.
Function extractBranch made better to work on Debian and Ubuntu with distribution packages.
Oracle Linux is considered as well now.
Made scripts ready for new MariaDB behaviour.
my.cnf template adapted to newest knowledge.
Directory changed from /tmp to /var/tmp, code cleaned-up and renewal, PID file code and message improved in stopInstance.
Distributions cleaned-up and cloudlinux, rocky linux and almalinux added as centos compatible distros.
MyEnv Installer
Debian 10 and 11 do not support PHP 8.0 yet, fixed.
Unit file is copied now correctly.
MyEnv instance installation can now be automated.
Instance creation automation added.
my.cnf template together with installMyenv should now work without errors or warnings for MariaDB 10.5 - 11.2 and MySQL 8.0 - 8.3.
Command yum replaced by dnf.
Command apt-get comments replaced by apt.
MyEnv Utilities
Client utility adapted in *monitor scripts.
InnoDB cluster monitor added.
wsrep_last_committed was added in galera_monitor.sh.
AWR added, sharding stuff added, lock and trx analysis scripts added.
Memory analysis added, NUMA maps output made ready for new variables.
connect_maxout utility added.
For subscriptions of commercial use of MyEnv please get in contact with us.
The post Shinguz: MariaDB/MySQL Environment MyEnv 2.1.0 has been released appeared first on MariaDB.org.
The PUBLIC pseudo-role allows the following SQL statements to be implemented:
GRANT ... TO PUBLIC – grants some privileges to all users.
REVOKE ... FROM PUBLIC – revokes privileges from PUBLIC. However, roles and users that have explicitly been granted those privileges will retain them.
SHOW GRANTS FOR PUBLIC – shows the GRANT statement that can be used to restore PUBLIC permissions.
A typical use case for PUBLIC would be an instance where you want all users to have UPDATE privileges on a certain table.
Usually on replicas we set read_only=1 to make MariaDB instances read-only and ensure that no accidental writes are done on them. However, there are cases where a DBA might need to change data on replicas due to inconsistencies with the primary. In previous MariaDB versions, the SUPER or READ-ONLY ADMIN privilege was required to write data into a replica. Now, to write on read-only instances, the READ-ONLY ADMIN privilege is necessary, and the SUPER privilege no longer allows it.
MariaDB’s unix_socket plugin in Linux systems allows the binding of one system user to MariaDB users. This allows for passwordless authentication for a system user to the MariaDB instance, eliminating the need for double authentication (MariaDB and system). With MariaDB 10.11, the GSSAPI authentication plugin is included in the server — this allows for the same passwordless local authentication in Windows.
Some InnoDB performance enhancements were also introduced via easier configuration tuning.
innodb_undo_tablespaces determines the number of InnoDB undo logs. In previous MariaDB versions, the default value was 0, which meant the undo log was written into the system tablespace. With MariaDB 10.11, the default value is 3: there are three undo logs, each written in the defined innodb_undo_directory. The most significant aspect is that in previous versions the innodb_undo_tablespaces value could not be changed after database creation, so one wouldn't try to find the optimal value.
The InnoDB background IO threads are represented by the innodb_write_io_threads and innodb_read_io_threads variables. In previous versions, these variables were not dynamic and changes required a MariaDB restart. In MariaDB 10.11 these variables are dynamic and no longer require a restart.
innodb_change_buffering is now deprecated and ignored.
innodb_log_file_size is now dynamic.
innodb_buffer_pool_chunk_size is now allocated dynamically.
FULLTEXT searches can now find words with apostrophes, like O’Connor.
MariaDB introduced temporal tables in 2018. MariaDB 10.11 includes improvements to a type of temporal table called system-versioned tables. These tables use row versioning, which is controlled by MariaDB automatically, and preserve past data, allowing queries to show how data has evolved or what it was at a certain point in time.
Previously, system-versioned tables’ history could not be modified, so it was impossible to take a mariadb-dump and restore it. The following changes have been made:
The system_versioning_insert_history variable was added. It is set to OFF by default, but it is dynamic and, if enabled, allows the insertion of past row versions with specified timestamps. Without this option, we could take a dump but not restore it.
DELETEs and UPDATEs with semi-joins are now properly optimized. Previously, it was recommended to use multi-table syntax with DELETE and UPDATE.
These are some of the improvements done to the information_schema database, which contains informational system tables. In previous versions, queries executed on PARAMETERS and ROUTINES ran with a full table scan, even if there was a specific WHERE clause that could efficiently use an index — this behavior has been fixed.
Previously, queries executed on PARAMETERS and ROUTINES loaded all examined procedure codes, which translated to slowness, especially in the case of a full table scan. Now, if the query only returns procedures’ and parameters’ names, the procedure codes are not loaded.
One issue with variable names is that they are not always grouped cleanly. From MariaDB 10.11, we can see all variables that affect the slow query log by running SHOW VARIABLES LIKE 'log_slow%';. The log_slow prefix has been added as an alias for the following variables:
min_examined_row_limit (log_slow_min_examined_row_limit)
slow_query_log (log_slow_query)
slow_query_log_file (log_slow_query_file)
long_query_time (log_slow_query_time)
To instruct a replica to change a database name we use replicate_rewrite_db. This is often done to replicate a test database. Previously, this option could only be specified at startup, which meant modifying the service configuration — not a good idea, since services should only be modified to change the start/stop/restart logic. Now the corresponding setting exists as a dynamic system variable.
The above are some of the more interesting features introduced in MariaDB 10.11. If you want to see the full list of improvements, you can go here.
With ClusterControl, you can easily upgrade to MariaDB 10.11 stress-free. Go to our MariaDB on ClusterControl page to see all of the ops features available to you. Once familiar, see how ClusterControl gives you unprecedented control in administering MariaDB in any environment — try our free 30 day trial, no CC required. In the meantime, stay up to date with all the latest news and best practices for the most popular open-source databases by following us on Twitter and LinkedIn, or subscribing to our monthly newsletter below.
The post The most noteworthy improvements in MariaDB 10.11 appeared first on Severalnines.
The post The most noteworthy improvements in MariaDB 10.11 appeared first on MariaDB.org.
What You Will Learn:
* Core Best Practices: Dive into essential practices, from employing primary keys and leveraging InnoDB to deciding whether to optimise read/write splits and managing AUTO_INCREMENT settings.
* Advanced Configuration: Uncover advanced techniques for error monitoring, configuring Galera across networks, and fine-tuning the gcache for optimal performance.
* Innovative Features: Stay ahead with insights on implementing Non-Blocking Operations for seamless schema changes, coordinating distributed transactions with XA transactions, and securing your GCache through encryption.
* Protocol and Network Enhancements: Discover the latest advancements in handling unstable networks, protocol improvements, and explore new options to elevate your cluster operations.
Have in-depth questions or faced intricate production challenges? This extended Q&A session is your opportunity to seek advice, clarify doubts, and engage directly with Galera Cluster experts.
The post Webinar recording: Mastering Galera Cluster, Best Practices and New Features appeared first on MariaDB.org.
In this post, we show you how to invoke Lambda functions from Amazon Relational Database Service (Amazon RDS) for MySQL and Amazon RDS for MariaDB using Amazon CloudWatch and messages published in audit logs. The same architecture can also be used with Amazon Aurora MySQL-Compatible Edition.
This solution consists of publishing RDS for MySQL or MariaDB audit logs to a CloudWatch log group, and creating a CloudWatch subscription filter for Lambda to trigger a Lambda function.
The following diagram illustrates the solution architecture and flow.
In this solution, audit logs generated by Amazon RDS for MySQL or Amazon RDS for MariaDB are published to a CloudWatch log group. The CloudWatch subscription filter filters the logs based on a user-defined pattern—when there is a pattern match, the subscription filter triggers the specified Lambda function and sends the log event to the Lambda function.
The logs received are base64 encoded and compressed with gzip format. We show you how to decode the log event to only get the desired payload for the Lambda function.
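For illustration, here is a minimal sketch of that decoding step. The helper name and sample log line are made up, but the awslogs/data envelope is the shape CloudWatch Logs delivers to a subscribed Lambda function:

```python
import base64
import gzip
import json

def decode_log_event(event):
    """Unwrap the payload a CloudWatch Logs subscription filter delivers:
    event['awslogs']['data'] is gzip-compressed JSON, base64-encoded."""
    compressed = base64.b64decode(event["awslogs"]["data"])
    payload = json.loads(gzip.decompress(compressed))
    # Each logEvents entry carries one matched audit-log line.
    return [e["message"] for e in payload["logEvents"]]

# Build a simulated event the same way CloudWatch encodes it.
raw = json.dumps({
    "logGroup": "aws/rds/instance/Lambda-trigger-mariadb/audit",
    "logEvents": [
        {"id": "1", "timestamp": 0, "message": "INSERT INTO MyLambda VALUES (42)"},
    ],
}).encode()
event = {"awslogs": {"data": base64.b64encode(gzip.compress(raw)).decode()}}
print(decode_log_event(event))  # ['INSERT INTO MyLambda VALUES (42)']
```

In a real handler, the returned messages would then be parsed to extract the JSON payload inserted into the table.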
To deploy the solution, we complete the following steps (corresponding to the numbered components in the architecture diagram):
To follow along with this post, you should have an RDS for MySQL or MariaDB database instance for which you wish to trigger the Lambda function. For this post, our DB instance is named Lambda-trigger-mariadb. For instructions to create an RDS instance, refer to Create a RDS Instance.
In this step, we create a table with the same name as the Lambda function we need to trigger, with a single column to insert the JSON payload for the Lambda function. Complete the following steps to create your table:
You can capture DML queries running in the RDS instance by enabling audit logs. Unlike error logs, general logs, and slow query logs, there is no direct parameter in the parameter group to enable audit logs. You must use the MariaDB Audit Plugin using an option group to enable auditing in your RDS for MySQL or MariaDB instance. Complete the following steps:
You can enable audit logs in Amazon Aurora MySQL from the DB cluster parameter group by setting the parameter server_audit_logging to 1. Refer to Configuring an audit log to capture database activities for Amazon RDS for MySQL and Amazon Aurora with MySQL compatibility for detailed steps.
After you have enabled audit logs on the RDS instance, the logs are listed on the Amazon RDS console in the Logs section on the instance details page, as shown in the following screenshot.
Now we need to publish the logs to CloudWatch. Complete the following steps:
After you publish the audit logs to CloudWatch, the log group is listed on the CloudWatch console with the naming convention aws/rds/instance/<database-name>/audit.
To create your Lambda function, complete the following steps:
For this post, the function is named MyLambda (matching the table created earlier).
Note that the invocation payload size for an asynchronous Lambda function is 256 KB. For more details, refer to Lambda quotas.
A CloudWatch subscription filter lets you filter log data coming from a CloudWatch log group based on the terms or pattern you design and send it to Amazon Kinesis Data Streams, Amazon Kinesis Data Firehose, or Lambda. For this post, we create a filter on the audit logs for any INSERT operation that happens in the table named MyLambda. You can choose any tables or filter pattern published in the audit logs as needed for your use case.
Complete the following steps to create a CloudWatch subscription filter:
Choose the audit log group (aws/rds/instance/<database-name>/audit).
Note: Subscription filters are case-sensitive. It is not possible to add more than one pattern for a given filter; however, you can add more than one subscription filter in a log group.
For this post, the filter is named MyInsertFilter and the log data to test is selected from the instance Lambda-trigger-mariadb.
Note that the test pattern only shows the 50 most recent entries in the logs. Therefore, it’s possible that the complete logs have a matching pattern but the test returns zero matches.
The filter is now listed on the Subscription filters tab in the log group page on the CloudWatch console.
To verify the solution is working, connect to the RDS instance and run the query for which you created the metric filter.
In the following example, we connect to our MariaDB instance using the MySQL CLI client and run an insert query on the MyLambda table.
We can see the Lambda function is triggered and a message is printed in the logs in the CloudWatch log group. The log group is listed on the CloudWatch console with the naming convention aws/lambda/<lambda-function-name>.
In place of audit logs, you can use any of the other MySQL or MariaDB logs (such as general logs, slow query logs, or error logs) to trigger the Lambda function using the same architecture illustrated in this post.
Note that when using general or slow query logs, you have to set the log_output parameter to FILE from the RDS parameter group for both Amazon RDS for MySQL and Amazon RDS for MariaDB to push the logs to CloudWatch.
The example in this post shows how to trigger a Lambda function and get the payload. You can also use the Lambda function as a router function and trigger other Lambda functions from it: simply pass the name of the function to be triggered in the JSON payload.
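A hedged sketch of that router pattern follows; route and fake_invoke are hypothetical names, and in a real deployment the injected invoke callable would be a boto3 Lambda client's invoke method:

```python
import json

def route(payload, invoke):
    """Router sketch: the incoming payload names the downstream Lambda
    function; `invoke` forwards the call (injected here so the sketch
    runs without AWS; in production, pass lambda_client.invoke)."""
    return invoke(
        FunctionName=payload["function"],       # target named in the payload
        InvocationType="Event",                 # asynchronous fan-out
        Payload=json.dumps(payload.get("args", {})),
    )

# Stand-in for the AWS call, recording what would have been invoked.
calls = []
def fake_invoke(**kwargs):
    calls.append(kwargs)
    return {"StatusCode": 202}

resp = route({"function": "process-insert", "args": {"id": 7}}, fake_invoke)
print(resp["StatusCode"], calls[0]["FunctionName"])  # 202 process-insert
```

Injecting the invoke callable keeps the routing logic testable in isolation from AWS.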
If the Lambda function does not trigger, check the following:
When you’re done with the solution, delete the resources you created to avoid ongoing charges.
In this post, we showed how to deploy a solution to trigger a Lambda function from your RDS for MySQL or MariaDB instance.
Try it out today and leave a comment if you have any questions or suggestions.
Asad Aazam is a Solutions Architect at AWS with expertise in AWS Security services and AWS Database technologies such as Amazon Aurora, Amazon RDS, and Amazon DynamoDB. He helps homogeneous and heterogeneous database migrations and optimizations in the AWS Cloud. He currently holds 11 of 12 AWS Certificates. When not working, he likes to go on bike rides, travel, and enjoy the beauty of nature.
The post Trigger an AWS Lambda function from Amazon RDS for MySQL or Amazon RDS for MariaDB using audit logs and Amazon CloudWatch appeared first on MariaDB.org.
You have to consider what you want to encrypt: the data communication (data in transit) or the data on the instance (data at rest).
This post is going to focus on the data at rest option using an AWS free tier node running on Amazon Linux. I will be using the world database on 2 different instances to show updating current tables with encryption as well as newly loaded tables being auto-encrypted.
First, we will start with the installs: quick and simple, just for this demo.
We will load the world db into server_id 100 instance.
Now we can see that currently, both instances are not using encryption.
Now across both systems, I am going to set up Random Keys and encrypt them.
# mkdir /etc/mysql/
Now we can set up the cnf file to enable the plugin as well as options for encryption.
## Temp & Log Encryption
encrypt-tmp-disk-tables = 1
encrypt-tmp-files = 1
encrypt_binlog = ON
Load up the world data into the server_id 200 instance as well.
According to information_schema.INNODB_TABLESPACES_ENCRYPTION we are encrypted now. However, the tables do not show that at the schema level. While a table is considered encrypted if it shows up in the INNODB_TABLESPACES_ENCRYPTION table, I would rather be sure and see it both in that table and on the schema.
Up to this point, you can see that both instances have been accounted for in the INNODB_TABLESPACES_ENCRYPTION schema after the restart or loading of the schema and data.
So… a few table alters will help here…
Simple enough so far. Now we need to enable binlogs and double-check more.
Checking via a look at the binlogs….
mariadb-binlog --base64-output=DECODE-ROWS --verbose demo.000001
/*!50530 SET @@SESSION.PSEUDO_SLAVE_MODE=1*/;
/*!40019 SET @@session.max_insert_delayed_threads=0*/;
/*!50003 SET @OLD_COMPLETION_TYPE=@@COMPLETION_TYPE,COMPLETION_TYPE=0*/;
DELIMITER /*!*/;
# at 4
#240225 0:06:06 server id 100 end_log_pos 256 CRC32 0x04ce3741 Start: binlog v 4, server v 10.5.23-MariaDB-log created 240225 0:06:06 at startup
# Warning: this binlog is either in use or was not closed properly.
ROLLBACK/*!*/;
# at 256
# Encryption scheme: 1, key_version: 1, nonce: eb7991b210f3f4d2f7f21537
# The rest of the binlog is encrypted!
ERROR: Error in Log_event::read_log_event(): ‘Event decryption failure’, data_len: 2400465656, event_type: 240
DELIMITER ;
# End of log file
ROLLBACK /* added by mysqlbinlog */;
/*!50003 SET COMPLETION_TYPE=@OLD_COMPLETION_TYPE*/;
/*!50530 SET @@SESSION.PSEUDO_SLAVE_MODE=0*/;
Good to see that it says it is being encrypted now.
I want to see these transactions in the binlog, though. How? You can use mariadb-binlog along with --read-from-remote-server to see the data in the logs.
Hopefully, this can at least help get you started ….
https://mariadb.com/kb/en/securing-mariadb-encryption/
The post MariaDB Encryption ( data at rest ) appeared first on MariaDB.org.
The post Announcing MariaDB Connector/R2DBC 1.2 appeared first on MariaDB.org.
The post MariaDB plc – looking forward to business as usual appeared first on MariaDB.org.
The post MariaDB Python Connector 1.1.10 now available appeared first on MariaDB.org.
The post Post-mortem: PHP and MariaDB Docker issue appeared first on MariaDB.org.
The post Get Started with MariaDB in Kubernetes and mariadb-operator appeared first on MariaDB.org.
]]>The post The benefits of MariaDB ColumnStore appeared first on MariaDB.org.
Since then, I’ve got some questions from customers, colleagues and friends: why did you guys decide to invest heavily in ColumnStore, and offer your ColumnStore services? After all, it’s not a new or trendy technology. So, why ColumnStore and not, for example, the latest NoSQL database?
The main reason is pretty simple: MariaDB ColumnStore can bring huge benefits to our customers. In this article I’m going to elaborate the main advantages of ColumnStore, from both a technical and a strategic perspective.
The first point is about avoiding the introduction of too many technologies in companies that, typically, already use too many technologies.
Just in case you don’t understand what I’m talking about… make a list of the technologies used in your company, or by your team. Include everything that plays a key role: databases, load balancers, caches, object storages, and so on. Each of them should have proper monitoring and alerts, automated backups, an upgrade policy, people able to troubleshoot in depth, inventory/documentation, and more. But the budget is often too low, the team is often too small, and in the end this just doesn’t happen.
ColumnStore is a technology for analytics and data warehouse that runs on top of MariaDB, which is an operational database. Using MariaDB for OLTP and MariaDB ColumnStore for OLAP allows the reuse of people skills. Several tools can be shared: monitoring, ProxySQL, Ansible roles and more. Data pipelines can be greatly improved by using MariaDB replication. The use of the MariaDB CONNECT engine can also simplify ETL processes from other data sources.
MariaDB ColumnStore has a distributed, highly scalable architecture. ColumnStore architecture consists of the following parts:
We can add Performance Nodes to improve a cluster IO capacity, and User Nodes to be able to run more complex queries at the same time.
This architecture allows us to store Petabytes of data (compressed, handled by Performance Nodes), and answer complex queries on billions of rows in seconds (User Nodes).
Even on a single node MariaDB ColumnStore massively scales up taking full advantage of the CPUs.
As ColumnStore name suggests, it is a columnar technology. But this is a simplification. The traditional difference between row-based and columnar architectures is that the latter typically stores each table column in a different file, to allow fast aggregations and better compression. But, in order to scale better on each node, ColumnStore has a more complex storage design, in order to store big amounts of data while making sorting and grouping fast and reducing contention.
Column data is split into partitions, that is, logical blocks that contain a range of values. Partitions are stored in big units called extents. Typically, each file contains up to two extents from the same column. There are no indexes; instead, an extent map records the location of each partition and the range of values (minimum and maximum) it contains.
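As a rough illustration of how such a map lets whole extents be skipped — all file names and values below are invented, not real ColumnStore metadata:

```python
# Hypothetical extent map for one column: each entry records where the
# extent lives and the range of values it contains (no index needed).
extent_map = [
    {"file": "col.000", "extent": 0, "min": 1,     "max": 4999},
    {"file": "col.000", "extent": 1, "min": 5000,  "max": 9999},
    {"file": "col.001", "extent": 0, "min": 10000, "max": 14999},
]

def extents_to_scan(lo, hi):
    """Keep only extents whose [min, max] overlaps the predicate [lo, hi];
    every other extent is skipped without any I/O."""
    return [e for e in extent_map if e["min"] <= hi and e["max"] >= lo]

# WHERE col BETWEEN 6000 AND 12000 touches two of the three extents.
hits = extents_to_scan(6000, 12000)
print([(e["file"], e["extent"]) for e in hits])
```

The same overlap test prunes entire extents for range predicates, which is why equality and range scans stay fast without any index maintenance.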
ColumnStore has a pool of threads that remain running even when not in use. When PrimProc receives a request, it will split the job into multiple parts and each thread will run one of these parts, in parallel. So jobs are split not just over multiple nodes, but even over multiple threads within each node.
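That fan-out can be sketched generically — this is plain Python, not ColumnStore internals: split a job into parts, run each part on a pooled thread, then merge the partial results:

```python
from concurrent.futures import ThreadPoolExecutor

def scan_part(part):
    """One worker's share of the job: aggregate its slice of values."""
    return sum(part)

values = list(range(1_000))
n_workers = 4
# Split the job into equal parts, one per pooled thread.
parts = [values[i::n_workers] for i in range(n_workers)]

with ThreadPoolExecutor(max_workers=n_workers) as pool:
    partials = list(pool.map(scan_part, parts))

total = sum(partials)  # merge step: combine the per-thread results
print(total)           # same answer as a single-threaded scan
```

In ColumnStore the split happens twice: once across nodes, and again across the threads of each node's pool.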
If an S3-compatible service is used as main storage, all Performance Nodes have access to it, but a local cache exists on a shared storage device.
Whether S3 is used or not, a shared device allows high availability. Each node has a corresponding directory in the shared storage, but each directory is mounted on all nodes.
MariaDB ColumnStore is designed for intensive reads with occasional huge data updates, which is typical of OLAP databases. The locking system is therefore minimal, and designed to serve this type of usage without reducing scalability. Reads take no locks, and no operation blocks them (not even ALTER TABLEs). Writes acquire locks on whole tables, rather than locking every modified row.
Let me stress this again:
This is the result of the highly scalable MCS architecture illustrated above.
Each MariaDB node sees ColumnStore as a regular storage engine. All it knows is that when data needs to be written into a ColumnStore table or read from it, MariaDB needs to call the proper methods of the ColumnStore storage engine API.
As a result, almost all MariaDB SQL syntaxes work on ColumnStore tables. There are some exceptions, but they’re not very relevant.
More importantly, an SQL statement can involve ColumnStore tables and tables built with any other engine. This opens up new scenarios that would be impossible if you use different technologies for OLTP and for analytics. Some examples:
For example, you can run an INSERT SELECT statement from a cron job and skip the most complex parts of ETL processes. For the reasons explained above, technologies that integrate with MariaDB will also work with ColumnStore. Some relevant examples for data analysis professionals are:
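As a sketch of that INSERT SELECT approach (all schema, table and column names below are hypothetical illustrations, not from the original post):

```sql
-- Operational data lives in an InnoDB table (app.orders);
-- analytics run against a ColumnStore table.
CREATE TABLE dwh.orders_facts (
    order_id    BIGINT,
    customer_id BIGINT,
    amount      DECIMAL(10,2),
    created_at  DATETIME
) ENGINE=ColumnStore;

-- A nightly cron job can replace a multi-step ETL pipeline
-- with a single cross-engine statement:
INSERT INTO dwh.orders_facts
SELECT order_id, customer_id, amount, created_at
FROM app.orders
WHERE created_at >= CURRENT_DATE - INTERVAL 1 DAY;
```

Because both tables live in the same MariaDB server, no export, transfer, or import step is needed.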
ColumnStore (as part of the MariaDB community edition) is distributed under the terms of the GNU GPL, version 2. There are no license costs.
MariaDB ColumnStore, community edition, won’t bind you to any particular vendor:
MariaDB ColumnStore is an outstanding solution for analytics. This is because of its peculiar features built to scale OLAP workloads, and because it’s based on MariaDB, one of the most widely used DBMSs for OLTP. And it’s open source, and free.
To begin with, you can read our unofficial documentation and try ColumnStore as a single node on your laptop with our Vagrant or Docker image.
Do you need help to evaluate ColumnStore for your specific use case? Do you need help to deploy and configure it? Do you need help with data integration? Or maybe a training for your data analysts? Contact us to discuss your needs!
Federico Razzoli
The post The benefits of MariaDB ColumnStore appeared first on MariaDB.org.
]]>The post Announcing General Availability of MariaDB Connector/C++1.1 appeared first on MariaDB.org.
]]>The post MariaDB Java Connector 3.3.3 and 2.7.12 now available appeared first on MariaDB.org.
]]>The post Release Roundup February 21, 2024 appeared first on MariaDB.org.
]]>The post MariaDB 11.4.1, 11.3.2 now available appeared first on MariaDB.org.
]]>The post Codership shines with other EIC-funded companies at Mobile World Congress Barcelona 2024 appeared first on MariaDB.org.
]]>Join us at EIC Pavilion-4YFN-booth 8.1A20
]]>The post Perf regressions in Postgres from 9.0 to 16 with sysbench and a small server appeared first on MariaDB.org.
]]>My results here aren’t universal, but you have to start somewhere:
The configuration files are in the subdirectories named pg9, pg10, pg11, pg12, pg13, pg14, pg15 and pg16 from here. They are named conf.diff.cx9a2_bee.
The benchmark is run with:
The post Perf regressions in Postgres from 9.0 to 16 with sysbench and a small server appeared first on MariaDB.org.
]]>The post Optimizing PostgreSQL Performance: A Comprehensive Guide to Rowstore Index Implementation and Tuning appeared first on MariaDB.org.
- Consider the index fillfactor, which defines how full index pages should be before splitting. A lower fillfactor on a heavily updated table can reduce page splits, improving performance.
- Running the ANALYZE and VACUUM commands helps keep indexes efficient by updating statistics and reclaiming space from deleted rows. This is crucial for maintaining query performance over time.
- Use pg_stat_user_indexes and pg_stat_statements to monitor index usage and query performance. Over time, query patterns may change, and some indexes may become unnecessary or suboptimal, requiring adjustments.

By carefully implementing and tuning rowstore indexes according to these guidelines, you can significantly enhance the performance of your PostgreSQL database.
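As a hedged sketch of those three points (the orders table and index name are hypothetical examples, not from the original post):

```sql
-- A lower fillfactor leaves free space in index pages,
-- reducing page splits on a heavily updated table.
CREATE INDEX orders_customer_idx ON orders (customer_id)
    WITH (fillfactor = 70);

-- Refresh planner statistics and reclaim space from dead rows.
ANALYZE orders;
VACUUM orders;

-- Spot indexes that are rarely scanned and may be candidates for removal.
SELECT relname, indexrelname, idx_scan
FROM pg_stat_user_indexes
ORDER BY idx_scan ASC;
```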
The post Optimizing PostgreSQL Performance: A Comprehensive Guide to Rowstore Index Implementation and Tuning appeared first on The WebScale Database Infrastructure Operations Experts in PostgreSQL, MySQL, MariaDB and ClickHouse.
The post Optimizing PostgreSQL Performance: A Comprehensive Guide to Rowstore Index Implementation and Tuning appeared first on MariaDB.org.
]]>The post Optimizing SQL Server Performance: Implementing RowStore vs. ColumnStore Indexes appeared first on MariaDB.org.
]]>RowStore indexes are the traditional way of storing data in SQL Server, where data is stored in rows within pages. Each page can contain multiple rows, depending on the size of the rows and the page size (8KB).
Implementation and Example:
CREATE TABLE Employees (
    EmployeeID INT PRIMARY KEY,
    Name NVARCHAR(100),
    DepartmentID INT
);

-- Creating a clustered index
CREATE CLUSTERED INDEX IX_Employees ON Employees(DepartmentID);
-- Creating a non-clustered index
CREATE NONCLUSTERED INDEX IX_Employees_Name ON Employees(Name);
Performance Considerations:
Introduced in SQL Server 2012, ColumnStore indexes store data column-wise, meaning each column’s data is stored together, allowing for high compression rates and fast query performance on large datasets.
Implementation and Example:
-- Creating a clustered columnstore index
CREATE CLUSTERED COLUMNSTORE INDEX CCI_Employees ON Employees;
-- Creating a non-clustered columnstore index
CREATE NONCLUSTERED COLUMNSTORE INDEX NCCI_Employees_Name ON Employees(Name);
Performance Considerations:
By strategically implementing RowStore and ColumnStore indexes based on the specific needs of your workload, you can optimize SQL Server’s performance, enhancing both the efficiency and speed of data retrieval and analysis.
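One common way to combine the two (the table and index names here are hypothetical): keep a rowstore clustered index for transactional lookups and add a nonclustered columnstore index on the same table for analytics:

```sql
CREATE TABLE Sales (
    SaleID INT PRIMARY KEY,   -- rowstore clustered index serves OLTP lookups
    ProductID INT,
    Quantity INT,
    SaleDate DATE
);

-- The columnstore index covers the analytic columns.
CREATE NONCLUSTERED COLUMNSTORE INDEX NCCI_Sales
    ON Sales (ProductID, Quantity, SaleDate);

-- Scan-heavy aggregations can now be served from the columnstore index:
SELECT ProductID, SUM(Quantity) AS TotalQuantity
FROM Sales
GROUP BY ProductID;
```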
The post Optimizing SQL Server Performance: Implementing RowStore vs. ColumnStore Indexes appeared first on The WebScale Database Infrastructure Operations Experts in PostgreSQL, MySQL, MariaDB and ClickHouse.
The post Optimizing SQL Server Performance: Implementing RowStore vs. ColumnStore Indexes appeared first on MariaDB.org.
]]>The post Percona Monthly Bug Report: January 2024 appeared first on MariaDB.org.
]]>The post How to implement Parallel Redo Logging in InnoDB? appeared first on MariaDB.org.
]]>Here are generalized steps and considerations for optimizing redo log performance in InnoDB, which can indirectly influence how redo operations are executed:
- Adjust the innodb_log_file_size configuration to a larger size to reduce the frequency of redo log flushes, but ensure it's balanced with recovery time considerations.
- The innodb_flush_log_at_trx_commit parameter controls the balance between ACID compliance and performance. Setting it to 2 can improve write performance by reducing disk flush operations, but at a slight risk to data durability.
- Use multiple redo log files (innodb_log_files_in_group). While this doesn't parallelize the logging within a single transaction, it can optimize I/O operations across transactions.
- Tune innodb_io_capacity and innodb_write_io_threads to align with your hardware's capabilities.

While InnoDB does not offer a direct setting named "parallel redo logging," the combination of configuration optimizations, adequate hardware, and MySQL version updates can collectively enhance the efficiency of redo log operations. These improvements can lead to better overall performance, especially for write-intensive applications. Always test configuration changes in a development environment before applying them to production to understand their impact on your specific workload.
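Collected into a my.cnf fragment, the settings above might look like this (the values are illustrative assumptions, not recommendations; size them against your own hardware and recovery-time targets):

```ini
[mysqld]
# Larger redo log: fewer flushes, but longer crash recovery
innodb_log_file_size           = 2G
innodb_log_files_in_group      = 2
# 2 = write at commit, flush to disk roughly once per second
innodb_flush_log_at_trx_commit = 2
# Align background I/O with what the storage device can sustain
innodb_io_capacity             = 2000
innodb_write_io_threads        = 8
```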
The post How to implement Parallel Redo Logging in InnoDB? appeared first on The WebScale Database Infrastructure Operations Experts in PostgreSQL, MySQL, MariaDB and ClickHouse.
The post How to implement Parallel Redo Logging in InnoDB? appeared first on MariaDB.org.
]]>The post Perf regressions in MySQL from 5.6.21 to 8.0.36 using sysbench and a small server appeared first on MariaDB.org.
]]>My results here aren’t universal.
tl;dr
The benchmark is run with one connection and a database cached by InnoDB.
From 5.6.21 to 8.0.36
This section uses 5.6.21 as the base version and then compares that with 5.6.51, 5.7.10, 5.7.44, 8.0.13, 8.0.14, 8.0.20, 8.0.28, 8.0.35 and 8.0.36 to show how performance has changed from oldest tested (5.6.21) to newest tested (8.0.36).
The post Perf regressions in MySQL from 5.6.21 to 8.0.36 using sysbench and a small server appeared first on MariaDB.org.
]]>The post Announcing MariaDB Community Server 11.3 GA and 11.4 RC appeared first on MariaDB.org.
]]>The post Installing Galera Cluster 4 with MySQL on Ubuntu 22.04 appeared first on MariaDB.org.
]]>First, you will need to ensure that the Galera Cluster GPG key is installed:
apt-key adv --keyserver keyserver.ubuntu.com --recv 8DA84635
You will see the message as follows:
Warning: apt-key is deprecated. Manage keyring files in trusted.gpg.d instead (see apt-key(8)).
Executing: /tmp/apt-key-gpghome.pEjdHcaXNs/gpg.1.sh --keyserver keyserver.ubuntu.com --recv 8DA84635
gpg: key 45460A518DA84635: public key "Codership Oy (Codership Signing Key) <info@galeracluster.com>" imported
gpg: Total number processed: 1
gpg: imported: 1
You can now edit the /etc/apt/sources.list.d/galera.list to include the following lines:
deb https://releases.galeracluster.com/galera-4.17/ubuntu jammy main
deb https://releases.galeracluster.com/mysql-wsrep-8.0.35-26.16/ubuntu jammy main
You should also pin the repository by editing /etc/apt/preferences.d/galera.pref:
# Prefer the Codership repository
Package: *
Pin: origin releases.galeracluster.com
Pin-Priority: 1001
You should now run an apt update and then install Galera 4 with MySQL 8:
apt install galera-4 mysql-wsrep-8.0
You are now asked to enter a root password, as apt/dpkg supports interactivity during installations. Please enter a reasonably secure password. You are then asked whether to use the strong password authentication plugin, caching_sha2_password (you are encouraged to pick this over the older mysql_native_password).
Now it is as simple as configuring your my.cnf to enable Galera Cluster. You can edit /etc/mysql/mysql.conf.d/mysqld.cnf and add a basic configuration:
[mysqld]
pid-file = /var/run/mysqld/mysqld.pid
socket = /var/run/mysqld/mysqld.sock
datadir = /var/lib/mysql
log-error = /var/log/mysql/error.log
binlog_format=ROW
default-storage-engine=innodb
innodb_autoinc_lock_mode=2
bind-address=0.0.0.0

# Galera Provider Configuration
wsrep_on=ON
wsrep_provider=/usr/lib/galera/libgalera_smm.so

# Galera Cluster Configuration
wsrep_cluster_name="galera"
wsrep_cluster_address="gcomm://128.199.161.224,188.166.183.120,188.166.242.246"

# Galera Synchronization Configuration
wsrep_sst_method=rsync

# Galera Node Configuration
wsrep_node_address="128.199.161.224"
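Before bootstrapping, it can be handy to sanity-check the node list in wsrep_cluster_address. A small sketch (the helper function below is our own assumption, not part of Galera):

```shell
#!/bin/sh
# Count the node addresses listed in a gcomm:// cluster address line.
count_cluster_nodes() {
    echo "$1" \
        | sed -e 's/.*gcomm:\/\///' -e 's/"//g' \
        | tr ',' '\n' \
        | grep -c .
}

count_cluster_nodes 'wsrep_cluster_address="gcomm://128.199.161.224,188.166.183.120,188.166.242.246"'
# prints: 3
```

A three-node count here matches the three-node cluster this walkthrough builds.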
Execute systemctl stop mysql on all nodes. Then, on the first node only, run mysqld_bootstrap to bootstrap the cluster.
You can execute: mysql -u root -p -e "show status like 'wsrep_cluster_size'" and see:
mysql -u root -p -e "show status like 'wsrep_cluster_size'"
Enter password:
+--------------------+-------+
| Variable_name      | Value |
+--------------------+-------+
| wsrep_cluster_size | 1     |
+--------------------+-------+
Now, when you bring up the second node as simply as systemctl start mysql, you can execute the same command above and will see that the wsrep_cluster_size has increased to 2. Repeat this again for the third node. You can also choose to test replication by creating a database and table on one node, and see that the replication is happening in real time.
To find out more, start MySQL and execute show status like 'wsrep%';.
Remember that you did not have to look for passwords in the error log because this was already executed via the interactive installer. Enjoy your deployment of a 3-node Ubuntu 22.04 LTS Galera Cluster.
The post Installing Galera Cluster 4 with MySQL on Ubuntu 22.04 appeared first on MariaDB.org.
]]>The post Let’s go, MariaDB ColumnStore at Vettabase! appeared first on MariaDB.org.
]]>Having a columnar based engine built right into MariaDB, aptly named ColumnStore, means there is no excuse for you to not give it a whirl where you probably already have a MariaDB replica server for OLAP workloads like reporting, archiving, or data warehousing.
So for the new year I thought, what better way to share my love of ColumnStore than to host a webinar just before the Spring?
While I personally run ColumnStore on my trusty server in the garage, I needed to create a portable environment using Docker in a repository for people attending to follow along with. Pretty standard so far.
Alas, to my surprise, the official ColumnStore image is out of date, built manually, and actually does not work, as the required processes are run by a tool that has since been migrated to systemd. And there is no official Vagrant image. See MCOL-5646 and MCOL-3906.
I guess we should just make our own ColumnStore images for Docker and Vagrant then!
And so the ColumnStore adventure begins here at Vettabase.
Another issue we quickly encountered was the lack of documentation. MariaDB has great documentation, the MariaDB Knowledge Base, and it used to include ColumnStore documentation. It is a wiki that anyone can edit, and the contents are covered by the GNU FDL and CC-BY-SA3 licenses. Unfortunately all ColumnStore documentation was removed and only a small part of it migrated to MariaDB Enterprise documentation. See MCOL-5655.
So we decided to start the MariaDB ColumnStore Unofficial Documentation Project! It is a public wiki that the community can edit, and the contents are covered by the same licenses as the original documentation. See our manifesto.
At the time of this writing registration is broken due to a problem with Amazon SES. We are working to fix it. In the meanwhile, feel free to ask us to give you access by writing an email to co**************@ve*******.com.
So what does this mean going forward? Should yee abandon all hope for those who dare adventure outside of our new realm? No, of course not.
While we are working on a solution for customers and the community to run and understand ColumnStore, you can help by voting on existing issues to bring the official docker image up to date and have ColumnStore enabled in the community image:
While our own OCI image, at the time of writing, is still facing some issues, we are making progress:
MariaDB [(none)]> create schema test; create table test.t (a int) engine=columnstore;
Query OK, 1 row affected (0.001 sec)
Query OK, 0 rows affected (0.300 sec)
MariaDB [(none)]> insert into test.t () values (1);
Query OK, 1 row affected (0.059 sec)
The current state is:
Work in progress is published under the dev tag. Once there is a latest tag, you will know that the image is fairly stable. So for both our container and Vagrant efforts, please do raise an issue or submit a pull request.
ColumnStore is a great engine, which I will be demonstrating in my first and next webinar at Vettabase, it really does deserve some love.
Do you have any ColumnStore success stories? We would love to hear them.
(vettadock/mariadb-columnstore)
(vettabase/mariadb-columnstore)
Richard Bensley
The post Let’s go, MariaDB ColumnStore at Vettabase! appeared first on MariaDB.org.
]]>The post Webinar: Mastering Galera Cluster, Best Practices and New Features 27th February appeared first on MariaDB.org.
]]>What You Will Learn:
* Core Best Practices: Dive into essential practices, from employing primary keys and leveraging InnoDB to deciding if to optimise read/write splits and managing AUTO_INCREMENT settings.
* Advanced Configuration: Uncover advanced techniques for error monitoring, configuring Galera across networks, and fine-tuning the gcache for optimal performance.
* Innovative Features: Stay ahead with insights on implementing Non-Blocking Operations for seamless schema changes, coordinating distributed transactions with XA transactions, and securing your GCache through encryption.
* Protocol and Network Enhancements: Discover the latest advancements in handling unstable networks, protocol improvements, and explore new options to elevate your cluster operations.
Have in-depth questions or faced intricate production challenges? This extended Q&A session is your opportunity to seek advice, clarify doubts, and engage directly with Galera Cluster experts.
JOIN EMEA WEBINAR 27th FEBRUARY 1 PM CET
JOIN USA WEBINAR 27th FEBRUARY 9 AM PST
Do not forget Galera Cluster Advanced Database Administration with Galera Cluster training Emea 4th-5th of March and USA 6th-7th of March.
Check training content and join 2 days training
The post Webinar: Mastering Galera Cluster, Best Practices and New Features 27th February appeared first on MariaDB.org.
]]>The post It wasn’t a performance regression in Postgres 14 appeared first on MariaDB.org.
]]>The reason for the false alarm is that index cleanup was skipped during vacuum starting with Postgres 14 and the impact is that the optimizer had more work to do (more not-visible index entries to skip) in the get_actual_variable_range function. Output like this from the vacuum command makes that obvious:
table “pi1”: index scan bypassed: 48976 pages from table (0.62% of total) have 5000000 dead item identifiers
The problem is solved by adding INDEX_CLEANUP ON to the vacuum command.
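Concretely, for the table named in the vacuum output above:

```sql
-- Force index cleanup even when Postgres would otherwise skip it
-- (the INDEX_CLEANUP option is available since Postgres 12):
VACUUM (INDEX_CLEANUP ON, VERBOSE) pi1;
```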
tl;dr
Editorial
At a high-level there are several issues:
Third, the Postgres optimizer can use too much CPU time in get_actual_variable_range. I don’t mind that get_actual_variable_range exists because it is useful for cases where index statistics are not current. But the problem is that for the problematic SQL statement (see the DELETE above and this blog post) there is only one good index for the statement. So I prefer the optimizer not do too much work in that case. I have experienced this problem a few times with MySQL. One of the fixes from upstream MySQL was to change the optimizer to do less work when there was a FORCE INDEX hint. And with some OLTP workloads where the same statements are so frequent I really don’t want the optimizer to use extra CPU time. For the same reason, I get much better throughput from Postgres when prepared statements are enabled and now I always enable them for the range and point queries with Postgres during the insert benchmark, but not for MySQL (because they don’t help much with MySQL).
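The prepared-statement point can be shown in plain SQL: the statement is parsed and planned once, then re-executed with new parameters. The table name pi1 comes from the vacuum output quoted earlier; the column name k is an assumption for illustration:

```sql
-- Parse and plan once...
PREPARE get_range (int) AS
    SELECT * FROM pi1 WHERE k >= $1 ORDER BY k LIMIT 10;

-- ...then execute repeatedly without re-parsing.
EXECUTE get_range(100);
EXECUTE get_range(200);

DEALLOCATE get_range;
```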
Build + Configuration
The post It wasn’t a performance regression in Postgres 14 appeared first on MariaDB.org.
]]>The post Tips and Tricks for reducing Leaf Block Contention happening to InnoDB appeared first on MariaDB.org.
- Consider the innodb_page_size configuration. The default page size is 16KB, but if your workload involves large rows or if you're experiencing high contention, increasing the page size can reduce the number of row locks within the same leaf block. However, be cautious, as this change requires recreating the database and can affect disk usage and memory utilization.
- Use LOW_PRIORITY write operations for less critical updates to decrease their priority and reduce contention.
- Tune innodb_autoinc_lock_mode. Setting it to 2 (interleaved) mode reduces contention on auto-increment locks by allowing statements to get the next auto-increment value without waiting for other statements to complete, suitable for high concurrency INSERT operations.
- Using READ COMMITTED instead of REPEATABLE READ can decrease the number of locks set by a transaction, reducing contention. However, ensure that this change is compatible with your application's consistency requirements.
- SHOW ENGINE INNODB STATUS and performance schema tables can help identify contention points. Query the INFORMATION_SCHEMA.INNODB_TRX and INNODB_LOCKS tables to analyze locking behavior and identify contentious queries.
- Row formats such as DYNAMIC and COMPRESSED can store more data on a page, reducing the need for accessing multiple leaf blocks for queries. This change, however, should be tested as it might have implications on CPU usage due to compression.

Reducing leaf block contention in InnoDB requires a combination of database configuration adjustments, query optimization, and strategic schema design. By implementing these tips and continuously monitoring your database's performance, you can significantly mitigate the impact of contention on your database's throughput and response times.
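A few of the monitoring steps above can be sketched in SQL (note that INFORMATION_SCHEMA.INNODB_LOCKS exists up to MySQL 5.7; MySQL 8.0 moved this information to performance_schema.data_locks):

```sql
-- Less lock-heavy isolation for the current session
SET SESSION TRANSACTION ISOLATION LEVEL READ COMMITTED;

-- Transactions currently running, with the statement they execute
SELECT trx_id, trx_state, trx_started, trx_query
FROM INFORMATION_SCHEMA.INNODB_TRX;

-- Locks currently held or requested (MySQL 5.7 and earlier)
SELECT lock_id, lock_mode, lock_type, lock_table, lock_index
FROM INFORMATION_SCHEMA.INNODB_LOCKS;
```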
The post Tips and Tricks for reducing Leaf Block Contention happening to InnoDB appeared first on The WebScale Database Infrastructure Operations Experts in PostgreSQL, MySQL, MariaDB and ClickHouse.
The post Tips and Tricks for reducing Leaf Block Contention happening to InnoDB appeared first on MariaDB.org.
]]>The post Considering Alternatives for Your MySQL Migration? Why Percona Should Be Your First Choice appeared first on MariaDB.org.
]]>The post Can Disk Space Be Saved in MySQL by Adding a Primary Key? appeared first on MariaDB.org.
]]>The post FOSDEM 2024 follow-up appeared first on MariaDB.org.
]]>The post MariaDB 11.2.3, 11.1.4, 11.0.5, 10.11.7, 10.6.17, 10.5.24, 10.4.33 now available appeared first on MariaDB.org.
]]>The post Maximizing Database High Availability with MariaDB MaxScale appeared first on MariaDB.org.
]]>The post Migration with Docker Official Images appeared first on MariaDB.org.
]]>The post MariaDB Community Server Q1 2024 maintenance releases appeared first on MariaDB.org.
]]>The post PostgreSQL for SQL Server DBAs – What is an alternative to sys.dm_exec_query_stats in the PostgreSQL world? appeared first on MariaDB.org.
sys.dm_exec_query_stats is a Dynamic Management View (DMV) that provides performance statistics for cached query plans in Microsoft SQL Server. It's used for monitoring and identifying performance issues with SQL queries. This DMV can be very useful for database administrators and developers to analyze the performance of SQL queries, understand how often they are executed, and identify which queries are consuming the most resources.
Here is a basic example of how you might use sys.dm_exec_query_stats to get information about query execution times, CPU time, logical reads, and so on:
SELECT
    qs.execution_count,
    qs.total_logical_reads,
    qs.total_logical_writes,
    qs.total_worker_time,
    qs.total_elapsed_time,
    qs.total_elapsed_time / qs.execution_count AS avg_elapsed_time,
    SUBSTRING(st.text, (qs.statement_start_offset/2) + 1,
        ((CASE qs.statement_end_offset
              WHEN -1 THEN DATALENGTH(st.text)
              ELSE qs.statement_end_offset
          END - qs.statement_start_offset)/2) + 1) AS statement_text
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
ORDER BY qs.total_elapsed_time DESC;
This query returns statistics about the executed SQL statements, including the number of times each statement was executed (execution_count), total logical reads and writes, total worker (CPU) time, total elapsed time, average elapsed time per execution, and the text of the SQL statement itself.
In PostgreSQL, there isn't a direct equivalent to SQL Server's sys.dm_exec_query_stats DMV, but you can get similar insights using a combination of PostgreSQL's system catalogs and views, particularly the pg_stat_statements extension. This extension provides a means to track execution statistics of all SQL statements executed by a server.
First, ensure that the pg_stat_statements module is enabled in your PostgreSQL instance. This can usually be done by adding pg_stat_statements to shared_preload_libraries in your PostgreSQL configuration file (postgresql.conf), and then restarting the PostgreSQL server. You may also need to create the extension in your database with:
CREATE EXTENSION pg_stat_statements;
Once pg_stat_statements is enabled, you can query its view to get query performance statistics. Here's an example query similar in spirit to the SQL Server example:
SELECT
    query,
    calls,
    total_time,
    rows,
    min_time,
    max_time,
    mean_time,
    stddev_time,
    blocks_hit,
    blocks_read
FROM pg_stat_statements
ORDER BY total_time DESC;
This will give you:
- query: Text of a representative query, with some values anonymized.
- calls: Number of times the statement was executed.
- total_time: Total time spent in the statement, in milliseconds.
- rows: Total number of rows retrieved or affected by the statement.
- min_time, max_time, mean_time, stddev_time: Minimum, maximum, mean, and standard deviation of the execution times for the statement, respectively.
- blocks_hit: Number of times disk blocks were found already in the buffer cache, avoiding disk reads.
- blocks_read: Number of disk blocks read.

This view is extremely useful for identifying slow queries, frequently executed queries, and queries that are reading a lot of data from disk.
Keep in mind that pg_stat_statements tracks queries across all databases in the server by default, and its data persists across server restarts until it's explicitly reset using functions like pg_stat_reset() or pg_stat_statements_reset(). Permissions to access pg_stat_statements data can be managed at the PostgreSQL role level.
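For example, to start a fresh measurement window after a tuning change (appropriate privileges are assumed; by default this requires superuser or a suitably granted role):

```sql
-- Discard the accumulated statistics and start measuring afresh
SELECT pg_stat_statements_reset();
```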
The post PostgreSQL for SQL Server DBAs – What is an alternative to sys.dm_exec_query_stats in the PostgreSQL world? appeared first on The WebScale Database Infrastructure Operations Experts in PostgreSQL, MySQL, MariaDB and ClickHouse.
The post PostgreSQL for SQL Server DBAs – What is an alternative to sys.dm_exec_query_stats in the PostgreSQL world? appeared first on MariaDB.org.
]]>The post MySQL 8.2.0 Community vs. Enterprise; Is There a Winner? appeared first on MariaDB.org.
]]>The post Are Your MySQL Users Using ‘password’ or ‘thebossisajerk’ as Passwords? appeared first on MariaDB.org.
]]>The post Five Notable Changes in Percona Everest Alpha appeared first on MariaDB.org.
]]>The post Adjusting the MariaDB Server release model appeared first on MariaDB.org.
INTRODUCTION
On occasions, DBAs come across segmentation fault issues while executing some queries. However, this is one of the least explored topics to date. I tried to search for details related to segmentation faults on the internet and found many articles; however, none of them had the answer I was looking for. So, I decided to gather the information and write about this issue in detail.
In order to understand a “segmentation fault”, it is essential to know the basic idea of segmentation and its implementation in C programming. In this blog, I will also cover a scenario that causes a “segmentation fault”.
BASIC UNDERSTANDING
In order to understand segmentation fault, it is necessary to understand memory management methods for processes.
When we need to execute any program, it must first be loaded into memory, where it can be allocated any available space. When a program leaves memory, its space becomes available again; however, the OS may not be able to allocate that vacant space to another program or process. The space required by a new program may be larger than any single available fragment, so the program has to be broken into different chunks before it is loaded into memory. This makes memory management challenging, because it leads to fragmentation.
In order to overcome these issues, the concept of paging and segmentation was introduced where physical address space and virtual address space were designed. A detailed description of these concepts are as below.
Paging
This was designed to allow non-contiguous space allocation to processes. Here, memory is divided into equal-sized partitions where the code of a program resides. The chunks in main memory are called frames, while in secondary storage (the HDD) they are called pages. To handle memory management, a structure called the memory management unit (MMU) is built, which divides memory addresses into two major sections: logical address space and physical address space.
- Logical address space: comprises the logical addresses generated by the CPU for a program.
- Physical address space: holds the physical addresses that point to actual locations in memory.
To perform the actual translation of a logical address to a physical address, the MMU needs to perform memory mapping operations, which are accomplished by another structure called the page table. A page table holds the references to the relevant physical address for each logical address.
The figure below describes the same.
Segmentation
This scheme was introduced to overcome the disadvantages of paging, and it works similarly. Instead of fixed-size pages, it creates segments of different sizes that are based on the program code. In this case we do not need a separate physical address space; here, a segment table manages everything.
Here, virtual (logical) to physical address translation is a little easier, as segment tables store adequate information.
I will not dive further into this topic, as it requires a bit more technical background. The purpose of this section was to give some basic understanding of the mapping from logical to physical addresses.
WHAT IS SEGMENTATION FAULT
As explained above, the CPU first fetches a logical address, and by using a page table or a segment table, it finds/calculates the physical address of the desired memory location. That is how memory management works.
In an attempt to access the desired location, we sometimes come across the issues described below.
On occasions, after calculating the physical address using a page/segment table, the program finds that the required contents (a piece of code, variables or anything else) are not present at the physical memory location. This phenomenon is called a “page fault”. It is not unusual and doesn’t affect the course of execution, as the OS simply loads the desired items into memory.
The other is the classical case of an inaccessible memory location: the generated physical address points to a physical location that is not accessible by the program. This is called a “segmentation fault”, and it terminates the process. It happens when a program tries to access a read-only portion of memory or another program’s space.
Although the segmentation fault is often maligned as a showstopper, it is essential: it is a mechanism that protects against internal corruption.
Note: the segmentation fault has nothing to do with the segmentation memory management method described above.
A REPRODUCIBLE SCENARIO
At the code level there are a number of scenarios that result in a segmentation fault, such as buffer overflows, stack overflows and so on. However, this blog is written from the database perspective, hence I will not dive into those scenarios, as they are deep programming topics.
In this section, I will focus on a scenario in PostgreSQL database that causes segmentation fault.
This is one that I came across, where the database got restarted due to a “segmentation fault”. Below is a snippet of code that results in a segmentation fault on PostgreSQL 13.4 and 12.8:
CREATE SCHEMA debug;
CREATE TABLE debug.downloaded_images (
itemid text NOT NULL,
download_time timestamp,
PRIMARY KEY(itemId)
);
INSERT INTO debug.downloaded_images (itemid, download_time) VALUES ('1190300','2021-09-07 11:00:10.255831');
BEGIN;
CREATE TABLE IF NOT EXISTS "debug"."foo"
(itemId TEXT,
last_update TIMESTAMP,
PRIMARY KEY(itemId)
);
DECLARE "test-cursor-crash" CURSOR WITH HOLD FOR
SELECT di.itemId FROM "debug".downloaded_images di
LEFT JOIN (SELECT itemId, MIN(last_update) as last_update FROM
"debug"."foo" GROUP BY itemId) computed ON di.itemId=computed.itemId
WHERE COALESCE(last_update, '1970-01-01') < download_time;
FETCH 10000 IN "test-cursor-crash";
COMMIT;
The above example is taken from the hyperlinked page. Further analysis brought to light that it causes issues with LEFT JOIN only; in the case of an equi-join, it works as expected. This error was fixed in later versions of PostgreSQL.
CAUSES
As described above, the actual cause of this error is an attempt to access a memory address that is not accessible to the program, and there are various reasons for this to happen. However, even experienced users may have a limited understanding of such concepts, so I will try to explain them in the simplest possible terms.
The following are possible causes for segmentation fault.
Operating system issues
Buggy OS kernel
Faulty hardware (specifically memory)
Bug in a product (e.g. PostgreSQL, MySQL)
Database corruption
Though the scope of this error is not limited to the above-mentioned reasons, these are the most probable ones. In order to find the root cause of the issue, one needs to troubleshoot it with the help of programmers.
TROUBLESHOOTING
To get to the root cause of a segmentation fault, it is imperative to install debug symbols and enable the creation of a core dump on failure. This helps analyze the issue, showing which function or part of the code causes it. If these requirements are not met, the core dump cannot be generated and it becomes impossible to trace the issue.
Enable core dump generation
Every database has different methods to generate core dump files. In order to enable generation of core dump, one needs to set some kernel settings as below.
# echo 'kernel.core_pattern=/var/crash/core-%e-%p' >> /etc/sysctl.conf
# ulimit -c unlimited
Here, any other path can be used instead of /var/crash.
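The placeholders in the pattern are expanded by the kernel when the core file is written; %e is the executable name and %p the process ID. As a rough illustration (a simplification; the real kernel supports many more specifiers), the expansion boils down to:

```python
def expand_core_pattern(pattern, executable, pid):
    """Expand a small subset of kernel core_pattern specifiers (%e, %p)."""
    return pattern.replace("%e", executable).replace("%p", str(pid))

# A postgres backend with PID 64807 crashing under the pattern above:
print(expand_core_pattern("/var/crash/core-%e-%p", "postgres", 64807))
# /var/crash/core-postgres-64807
```

This is why the core file read later in this post is named core-postgres-64807.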
Enable debugging
Debug symbols enable code-level debugging: they show details about the file being executed and the line of code where execution is happening. It is the responsibility of software developers to build debug symbols. In PostgreSQL, debug symbols can be enabled at build time as below.
# ./configure CFLAGS="-O0 -g3"
Also, there are certain packages available in PostgreSQL, such as postgresql-12-dbg
In the case of MySQL, passing the following option to cmake during a source build turns on debugging.
# cmake -DWITH_DEBUG=1
Allow database to generate core dumps
After enabling core dump generation and debugging, the database must also cooperate with the host OS to produce the core dump. Hence, the database should be started with an option that allows it to create core files.
In the case of PostgreSQL, the pg_ctl command should be started with the -c option as shown below.
$ /usr/local/pgsql/bin/pg_ctl -D $PGDATA -c start
While in MySQL, following lines can be added in my.cnf or my.ini
[mysqld]
core-file
Note: in the event of a crash, the OS dumps all the contents of memory into the core file. So, before enabling this, be sure you have sufficient space to accommodate the core dump.
Debugging core files
Core files are version specific: they can only be read with the binary of the specific database version that generated them. For example, a core file generated by MySQL 8 cannot be read with a mysqld binary from any other version.
A core dump can be traced with the GNU debugger (gdb). Below is an example of reading a core dump.
$ gdb /usr/local/pgsql/bin/postgres /var/crash/core-postgres-64807
GNU gdb (Ubuntu 9.2-0ubuntu1~20.04.1) 9.2
Copyright (C) 2020 Free Software Foundation, Inc.
.
.
.
Reading symbols from /usr/local/pgsql/bin/postgres...
[New LWP 64807]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `postgres: postgres postgres [local] COMMIT'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0 slot_deform_heap_tuple (natts=2, offp=0x5560dcfd1d58, tuple=, slot=0x5560dcfd1d10) at execTuples.c:930
930 execTuples.c: No such file or directory.
(gdb)
Apart from gdb, Valgrind is another tool that can be used to debug such issues.
PERCONA’S INITIATIVE
As described, segmentation faults are caused by various issues that are sometimes not even in the control of programmers. But in many cases, programs themselves are the culprits and trigger segmentation faults, often without users knowing about it. Percona is committed to strengthening the open-source community and has acknowledged the issue. The Percona team strongly believes that users should be aware of the perils associated with some non-standard modules (or PostgreSQL extensions) that are identified as troublemakers.
These details are planned to be added in pg_gather reports. At present, this is in a development phase. The next version of the pg_gather will have these details available.
SUMMARY
Indeed, the segmentation fault is a kind of issue that has not been widely explored yet. Having said that, it surfaces frequently on database systems for a variety of reasons. Basically, it occurs due to an attempt to access an unauthorized area or segment of memory, something a typical DBA is rarely aware of. The issue can be troubleshot by enabling core dump generation and installing debug symbols.
The post Segmentation Fault – A DBA Perspective appeared first on MariaDB.org.
]]>The post Codership partners with Ordix AG in Germany appeared first on MariaDB.org.
]]>Paderborn, Germany (February 2nd) – ORDIX, an IT consulting house in Germany, announced its partnership with Codership, the company behind Galera Cluster, the leading clustering solution for MySQL and MariaDB databases. This strategic partnership supports all ORDIX clients in Germany who need to secure business continuity for their services and applications. The partnership between ORDIX and Codership demonstrates their commitment to providing best-of-breed services to German MySQL users.
Since Codership released the first version of Galera Cluster back in 2010, the product has become the leading high availability and business continuity solution for MySQL and MariaDB databases. Thousands of companies, in telecommunications, e-commerce, lottery, travel, payments, government, just to mention a few, are using Galera Cluster to protect their businesses from lost data and customers. Galera's solution can be used on-premises or in the cloud. Codership's Galera Cluster offers 80-90% cost savings compared to proprietary and legacy database costs.
“Knowledge increases by sharing it” is the motto of ORDIX AG, an IT service provider that has been acting according to this principle for more than 30 years. Only by sharing our knowledge and experience with each other are we able to act innovatively, improve together and thus advance digitalization. Through the constant development and transfer of knowledge, we have created a multi-layered and valuable know-how network in the field of information technology.
“As Galera Cluster has become the industry standard for MySQL high availability, we see more and more demand for Galera Cluster services,” said Matthias Jung, Area Manager, Data Management. “We’re thrilled to expand our existing MySQL services with Galera Cluster support. We look forward to leveraging this strategic partnership with our customers in Germany.”
“We’re excited to partner with ORDIX,” said Sakari Keskitalo, Chief Operating Officer of Codership. “Galera Cluster is trusted by thousands of companies all over the world, and Germany is one of the biggest markets for Galera Cluster. ORDIX’s service organization offers tremendous value to our German customers who wish to be assured that they are supported in the case of an emergency and advised on best practices. By partnering with ORDIX we are able to serve customers locally, faster and in German.”
More about ORDIX AG
ORDIX AG – “Consulting with focus on the essential, development with an eye for detail, and project management with a comprehensive overview. With over 30 years of experience, we propel digital transformation forward. From the initial idea to the roll-out and beyond, we deliver tailored IT-solutions all under one roof.”
More about Codership
Codership develops replication and clustering solutions for open source databases, adopting ideas from the latest DBMS and distributed computing research to build fundamentally new high availability solutions. Our flagship product, Codership’s Galera Cluster for MySQL, provides high system uptime without data loss, guaranteeing scalability for future growth. Galera is an open-source product, and we offer high-quality support to help our customers increase their business continuity and lower their total cost of ownership.
The post Codership partners with Ordix AG in Germany appeared first on MariaDB.org.
]]>The post Updated Insert benchmark: InnoDB/MySQL 5.6, 5.7 and 8.0, small server, IO-bound database appeared first on MariaDB.org.
]]>tl;dr
I used the cz10a_bee my.cnf files that are here for 5.6, for 5.7 and for 8.0. For 5.7 and 8.0 there are many variants of that file to make them work on a range of the point releases.
The post Updated Insert benchmark: InnoDB/MySQL 5.6, 5.7 and 8.0, small server, IO-bound database appeared first on MariaDB.org.
]]>The post Updated Insert benchmark: Postgres 9.x to 16.x, large server, cached database appeared first on MariaDB.org.
]]>tl;dr
Build + Configuration
The post Updated Insert benchmark: Postgres 9.x to 16.x, large server, cached database appeared first on MariaDB.org.
]]>The post Simplify the Use of ENV Variables in Percona Monitoring and Management AMI appeared first on MariaDB.org.
]]>The post Simplify the Use of ENV Variables in Percona Monitoring and Management AMI appeared first on MariaDB.org.
]]>The post MySQL vs PostgreSQL: Which is Better? Exploring Key Differences and Similarities appeared first on MariaDB.org.
]]>The post MySQL vs PostgreSQL: Which is Better? Exploring Key Differences and Similarities appeared first on MariaDB.org.
]]>The post Choosing the Best MySQL High Availability Solution: 20 Key Questions and Considerations appeared first on MariaDB.org.
]]>The post Choosing the Best MySQL High Availability Solution: 20 Key Questions and Considerations appeared first on MariaDB.org.
]]>The post Resources to help you get started with Galera Manager in 2024 appeared first on MariaDB.org.
]]>If you are not inclined to watch videos, we also have the appropriate blog posts:
While we haven’t updated the videos and blog posts around Galera Manager deploying Galera Clusters (or just monitoring them), they are still relevant to helping you get started in 2024.
The post Resources to help you get started with Galera Manager in 2024 appeared first on MariaDB.org.
]]>The post Deploying a MariaDB Galera Cluster with Galera Manager on your own on-premise hosts appeared first on MariaDB.org.
]]>So to start, we will deploy 3 hosts, running Ubuntu 22.04 LTS. These are just deployed with the base operating system (OS). You are advised to read the supported OS matrix which can change as releases abound. You will need a fourth pristine host to run Galera Manager (it can be a different OS, but for ease of installation and simplicity, we will keep it uniform). So with four provisioned hosts, you’re ready to get started with your Galera Manager + MariaDB Galera Cluster on-premise installation. Obtain Galera Manager by filling in the form.
Now you can login to the host that you’re installing Galera Manager on. Ensure that you are the root user.
Now grab the gm-installer either via scp or wget it to your host. The direct link is in the video or the documentation! It is time to make the installer executable, which you do by typing: chmod +x gm-installer. Verify the version:
./gm-installer version
gm-installer version 1.12.0 (linux/amd64)
To get started, simply execute:
./gm-installer install
Accept the license agreement, enter the admin password, enter the IP (this means that you will get an install over insecure HTTP) or hostname (this install thus executes over secure HTTPS), and you’re on your way to getting your Galera Manager host installed.
Typically this installation process takes less than 5 minutes, as it has to pull in packages from multiple repositories. Yes, it goes without saying that the current install method requires access to the Internet (we do not support an offline install mode). It is also important, if you have a firewall, to open up TCP ports 80 and 8081 (and 443 if you're using HTTPS). Once the installation is complete, you will see something similar to the following:
INFO[0299] Galera Manager installation finished. Enter http://206.189.153.240 in a web browser to access. Please note, you chose to use an unencrypted http protocol, such connections are prone to several types of security issues. Always use only trusted networks when connecting to the service.
INFO[0299] Logs DB url: http://206.189.153.240:8081 IMPORTANT: ensure TCP ports 80, 8081 are open in firewall.
INFO[0299] Below you can see Logs DB credentials:
DB name: gmd
DB user: gmd
DB password: hOfuUXqZdQ
The installation log is located at /tmp/gm-installer.log
Logon via the web URL. As you can see, by default there are no clusters, and when you click on it, you are given options to deploy a fully managed cluster, or to deploy a cluster on user-provided hosts, and finally just to monitor an existing cluster. For the purpose of this document, we are going to take option 2, which is “Deploy cluster on user-provided hosts”.
You now have to create the cluster type, and what database engine you are choosing. Don’t forget to give your cluster a name! Galera Manager can manage multiple clusters in one instance.
Almost immediately after, you're told to copy the SSH keys onto every host you plan to deploy to. A point to note is that if you select the copy icon while using HTTP, you will get an error saying “Couldn’t copy public key”; this is a security measure, as the copy functionality only works over HTTPS. You will instead have to select and manually copy the SSH key using Ctrl+C/Command+C, etc. The key must be present on all hosts in /root/.ssh/authorized_keys. Only once you have ticked “I have added public key to /root/.ssh/authorized_keys file on all nodes” can you move on with the installation.
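If you are scripting this step across many nodes, the manual copy boils down to appending the public key once to /root/.ssh/authorized_keys on each host. Here is a minimal, hypothetical Python sketch of that idempotent append (not part of Galera Manager; paths and key are illustrative):

```python
import os

def add_authorized_key(pubkey, path="/root/.ssh/authorized_keys"):
    """Append a public key to authorized_keys unless it is already present."""
    os.makedirs(os.path.dirname(path), mode=0o700, exist_ok=True)
    existing = ""
    if os.path.exists(path):
        with open(path) as f:
            existing = f.read()
    if pubkey.strip() in existing:
        return False  # key already authorized, nothing to do
    with open(path, "a") as f:
        f.write(pubkey.strip() + "\n")
    os.chmod(path, 0o600)  # sshd refuses keys in world-readable files
    return True
```

Run it once per node (over existing SSH access or via your provisioning tool); the second call is a no-op, so it is safe to re-run.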
Now it is time to click: Add node. The popup should now enable you to add an SSH address, and don’t forget to give it a node name. One thing we like to ensure is that the SSH access is working, so you really should click Check Access and ensure that the SSH connection is working sufficiently. Then go right ahead and Deploy.
This is a blocking operation, and you can only deploy each node one-by-one. Repeat the process for nodes 2 and 3.
Voila! You now have a 3-node MariaDB Galera Cluster, deployed by Galera Manager, and one you can fully manage through the GUI.
You can SSH into any of your hosts, and you will be able to login without a password, and execute mysql to login.
With Galera Manager, you can also stop the node, restart it, delete a node, but better yet, you can also have a browser-based SSH Terminal. Naturally you can add monitoring metrics, change the frequency, and so much more.
Happy deploying your MariaDB Galera Clusters on your own on-premise hosts with Galera Manager!
The post Deploying a MariaDB Galera Cluster with Galera Manager on your own on-premise hosts appeared first on MariaDB.org.
]]>The post Deploying a MariaDB Galera Cluster with Galera Manager automatically on Amazon Web Services (AWS) appeared first on MariaDB.org.
]]>On AWS EC2, it is worth noting that Galera Manager itself can be deployed on the free tier for testing purposes. However, in production environments, you might expect up to 100GB of logs on a monthly basis, so you should plan accordingly.
Obtain Galera Manager by filling in the form. Logon to your AWS Console. Launch just one EC2 instance. You are advised to read the supported OS matrix which can change as releases abound; for this particular example, we will use a base of Ubuntu Server 22.04 LTS. Please ensure to use the 64-bit (x86) option, not the Arm variant, as Galera Manager is meant for x86_64 platforms only. Either create a new key pair, or ensure you already have an existing key pair. We cover all this in the first minute of the video. The rest of the defaults are fine (you can tick the Allow HTTPS and HTTP traffic from the internet as options), so go right ahead and launch an instance.
Now you’ll need to login, and you can do so similarly:
ssh -i gmd.pem ubuntu@3.64.252.66
You can now execute:
sudo su
to become the root user, then type cd to ensure that your current working directory is /root. Now grab the gm-installer either via scp or wget it to your host. The direct link is in the video or the documentation! It is time to make the installer executable, which you do by typing: chmod +x gm-installer. Verify the version:
./gm-installer version
gm-installer version 1.12.0 (linux/amd64)
To get started, simply execute:
./gm-installer install
Accept the license agreement, enter the admin password, enter the IP (this means that you will get an install over insecure HTTP) or hostname (this install thus executes over secure HTTPS), and you’re on your way to getting your Galera Manager host installed.
Typically this installation process takes less than 5 minutes, as it has to pull in packages from multiple repositories. Once the installation is complete, you will see something similar to the following:
INFO[0218] Galera Manager installation finished. Enter http://3.64.252.66 in a web browser to access. Please note, you chose to use an unencrypted http protocol, such connections are prone to several types of security issues. Always use only trusted networks when connecting to the service.
INFO[0218] Logs DB url: http://3.64.252.66:8081 IMPORTANT: ensure TCP ports 80, 8081 are open in firewall.
INFO[0218] Below you can see Logs DB credentials:
DB name: gmd
DB user: gmd
DB password: Soq3EXzYcn
The installation log is located at /tmp/gm-installer.log
Typically this tells you how to access Galera Manager. It also tells you that you need to open up ports 80 and 8081. And if anything did go wrong, you will be able to find out more at the installer log.
So let us do so within Amazon's configuration for inbound rules in Security groups, opening up TCP ports 80 and 8081 (and 443 if you're using HTTPS). This is shown at 5:30 in the video.
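If you want to sanity-check reachability after adjusting the security group, a small Python sketch (not part of Galera Manager) can probe the ports; the host IP in the comment is the example address used in this post:

```python
import socket

def port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# e.g. check the Galera Manager UI and logs DB ports on your own instance:
# for port in (80, 8081):
#     print(port, "open" if port_open("3.64.252.66", port) else "blocked")
```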
Enter the URL and you will now see the login screen, fill in your credentials that you entered on the command line earlier. As you can see, by default there are no clusters, and when you click on it, you are given options to deploy a fully managed cluster, or to deploy a cluster on user-provided hosts, and finally just to monitor an existing cluster. For the purpose of this document, we are going to take option 1 and deploy fully managed clusters.
You’ll notice that you’re asked for an AWS Access Key ID and an AWS Secret Access Key. It is only with that, and the ability to pass the credential check, that you’ll be able to select a region and instance type. Go back to your AWS console and get to Security Credentials by clicking on your name on the top right hand corner. Then you should create an Access Key and you will be able to retrieve your access key and show your secret access key. Copy and paste those details into your Galera Manager setup. If there are errors, you will know, and if everything is green, you’re good to select a region and instance type.
In this example, we will continue using the eu-central-1 region and use a t2.medium instance for the 3 MariaDB Galera Cluster nodes. Note that you are unlikely to be able to deploy successfully on anything smaller, so it is also disabled in the Galera Manager GUI.
You can now just go right ahead and click ADD NODES. Ensure that you’re adding 3 nodes, and let Galera Manager do the magic of deploying for you. In under five minutes or so, you’ll likely have a 3-node MariaDB Galera Cluster running for you.
We opted to log in using the SSH terminal within the web browser and can verify that we have deployed a 3-node Galera Cluster.
Happy deploying your MariaDB Galera Clusters in Amazon EC2 with Galera Manager!
The post Deploying a MariaDB Galera Cluster with Galera Manager automatically on Amazon Web Services (AWS) appeared first on MariaDB.org.
The post Deploying a Percona XtraDB Cluster (PXC) with Galera Manager on your own on-premise hosts appeared first on MariaDB.org.
So to start, we will deploy 3 hosts, running Ubuntu 22.04 LTS. These are just deployed with the base operating system (OS). You are advised to read the supported OS matrix which can change as releases abound. You will need a fourth pristine host to run Galera Manager (it can be a different OS, but for ease of installation and simplicity, we will keep it uniform). So with four provisioned hosts, you’re ready to get started with your Galera Manager + Percona XtraDB Cluster on-premise installation.
A point to note: if you are trying to do the above on DigitalOcean, it currently does not work, as Galera Manager does not support automatic deployment on that platform, and the base OS image that they provide makes Percona XtraDB Cluster (PXC) fail. It will however work on Amazon EC2, and with any other base OS image (e.g. Hetzner, OVH, etc.).
Obtain Galera Manager by filling in the form.
Now you can log in to the host that you’re installing Galera Manager on. Ensure that you are the root user.
Now grab the gm-installer either via scp or wget it to your host. The direct link is in the video or the documentation! It is time to make the installer executable, which you do by typing: chmod +x gm-installer. Verify the version:
./gm-installer version
gm-installer version 1.12.0 (linux/amd64)
To get started, simply execute:
./gm-installer install
Accept the license agreement, enter the admin password, enter the IP (this means that you will get an install over insecure HTTP) or hostname (this install thus executes over secure HTTPS), and you’re on your way to getting your Galera Manager host installed.
Typically this installation process takes less than 5 minutes, as it has to pull in packages from multiple repositories. Yes, it goes without saying, the current install method requires access to the Internet (we do not support an offline install mode). It is also important to ensure that if you have a firewall, to open up TCP ports 80, 8081. 443 will also apply if you’re using HTTPS. Once the installation is complete, you will see something similar to the following:
INFO[0299] Galera Manager installation finished. Enter http://206.189.153.240 in a web browser to access.
           Please note, you chose to use an unencrypted http protocol, such connections are prone to
           several types of security issues. Always use only trusted networks when connecting to the service.
INFO[0299] Logs DB url: http://206.189.153.240:8081
IMPORTANT: ensure TCP ports 80, 8081 are open in firewall.
INFO[0299] Below you can see Logs DB credentials:
           DB name: gmd
           DB user: gmd
           DB password: hOfuUXqZdQ
The installation log is located at /tmp/gm-installer.log
Log on via the web URL. As you can see, by default there are no clusters, and when you click on it, you are given options to deploy a fully managed cluster, or to deploy a cluster on user-provided hosts, and finally just to monitor an existing cluster. For the purpose of this document, we are going to take option 2, which is “Deploy cluster on user-provided hosts”.
You now have to create the cluster type, and what database engine you are choosing. Don’t forget to give your cluster a name! Galera Manager can manage multiple clusters in one instance.
Almost immediately after, you’re told to copy the SSH key into every host you plan to deploy to. A point to note is that if you select the copy icon and do this over HTTP, you will get an error saying “Couldn’t copy public key”; this is a security measure, as the copy functionality only works over HTTPS. You will have to select and manually copy the SSH key using Ctrl+C/Command+C, etc. This SSH key must be present on all hosts in /root/.ssh/authorized_keys. Only once you have ticked “I have added public key to /root/.ssh/authorized_keys file on all nodes” can you move on in the installation.
Now it is time to click: Add node. The popup should now enable you to add an SSH address, and don’t forget to give it a node name. One thing we like to ensure is that the SSH access is working, so you really should click Check Access and ensure that the SSH connection is working sufficiently. Then go right ahead and Deploy.
This is a blocking operation, and you can only deploy each node one-by-one. Repeat the process for nodes 2 and 3.
Voila! You now have a 3-node Percona XtraDB Cluster, deployed by Galera Manager, and one you can fully manage through the GUI.
You can SSH into any of your hosts, and you will be able to login without a password, and execute mysql to login.
With Galera Manager, you can also stop the node, restart it, delete a node, but better yet, you can also have a browser-based SSH Terminal. Naturally you can add monitoring metrics, change the frequency, and so much more.
Happy deploying your Percona XtraDB Clusters on your own on-premise hosts with Galera Manager!
The post Deploying a Percona XtraDB Cluster (PXC) with Galera Manager automatically on Amazon Web Services appeared first on MariaDB.org.
On AWS EC2, it is worth noting that Galera Manager itself can be deployed on the free tier for testing purposes. However, in production environments, you might expect up to 100GB of logs on a monthly basis, so you should plan accordingly.
Obtain Galera Manager by filling in the form. Logon to your AWS Console. Launch just one EC2 instance. You are advised to read the supported OS matrix which can change as releases abound; for this particular example, we will use a base of Ubuntu Server 22.04 LTS. Please ensure to use the 64-bit (x86) option, not the Arm variant, as Galera Manager is meant for x86_64 platforms only. Either create a new key pair, or ensure you already have an existing key pair. We cover all this in the first minute of the video. The rest of the defaults are fine (you can tick the Allow HTTPS and HTTP traffic from the internet as options), so go right ahead and launch an instance.
Now you’ll need to login, and you can do so similarly:
ssh -i gmd.pem ubuntu@3.64.252.66
You can now execute:
sudo su
to become the root user, then type cd to ensure that your current working directory is /root. Now grab the gm-installer either via scp or wget it to your host. The direct link is in the video or the documentation! It is time to make the installer executable, which you do by typing: chmod +x gm-installer. Verify the version:
./gm-installer version
gm-installer version 1.12.0 (linux/amd64)
To get started, simply execute:
./gm-installer install
Accept the license agreement, enter the admin password, enter the IP (this means that you will get an install over insecure HTTP) or hostname (this install thus executes over secure HTTPS), and you’re on your way to getting your Galera Manager host installed.
Typically this installation process takes less than 5 minutes, as it has to pull in packages from multiple repositories. Once the installation is complete, you will see something similar to the following:
INFO[0218] Galera Manager installation finished. Enter http://3.64.252.66 in a web browser to access.
           Please note, you chose to use an unencrypted http protocol, such connections are prone to
           several types of security issues. Always use only trusted networks when connecting to the service.
INFO[0218] Logs DB url: http://3.64.252.66:8081
IMPORTANT: ensure TCP ports 80, 8081 are open in firewall.
INFO[0218] Below you can see Logs DB credentials:
           DB name: gmd
           DB user: gmd
           DB password: Soq3EXzYcn
The installation log is located at /tmp/gm-installer.log
Typically this tells you how to access Galera Manager. It also tells you that you need to open up ports 80 and 8081. And if anything did go wrong, you will be able to find out more at the installer log.
So let us do so within Amazon’s configuration for inbound rules in Security groups: opening up TCP ports 80 and 8081. Port 443 will also apply if you’re using HTTPS. It is at 4:05 in the video.
Enter the URL and you will now see the login screen, fill in your credentials that you entered on the command line earlier. As you can see, by default there are no clusters, and when you click on it, you are given options to deploy a fully managed cluster, or to deploy a cluster on user-provided hosts, and finally just to monitor an existing cluster. For the purpose of this document, we are going to take option 1 and deploy fully managed clusters.
You’ll notice that you’re asked for an AWS Access Key ID and an AWS Secret Access Key. It is only with that, and the ability to pass the credential check, that you’ll be able to select a region and instance type. Go back to your AWS console and get to Security Credentials by clicking on your name on the top right hand corner. Then you should create an Access Key and you will be able to retrieve your access key and show your secret access key. Copy and paste those details into your Galera Manager setup. If there are errors, you will know, and if everything is green, you’re good to select a region and instance type.
In this example, we will continue using the eu-central-1 region and use a t2.medium instance for the 3 Percona XtraDB Cluster nodes. Note that you are unlikely to be able to deploy successfully on anything smaller, so it is also disabled in the Galera Manager GUI.
You can now just go right ahead and click ADD NODES. Ensure that you’re adding 3 nodes, and let Galera Manager do the magic of deploying for you. In under five minutes or so, you’ll likely have a 3-node Percona XtraDB Cluster running for you.
You can opt to log in using the SSH terminal within the web browser and verify that you have deployed a 3-node Percona XtraDB Cluster (PXC).
Happy deploying your Percona XtraDB Clusters in Amazon EC2 with Galera Manager!
The post Packages in MariaDB default mode appeared first on MariaDB.org.
A package is a group of routines (procedures or functions) for which I can CREATE and GRANT and DROP as a unit, all at once.
Roland Bouman wrote a feature request for it in 2005 for MySQL, but MySQL hasn’t got it yet, the workaround is to create whole databases. MariaDB has had CREATE PACKAGE since version 10.3 but only when sql_mode=’oracle’, and only with Oracle syntax (“PL/SQL”) for defining the routines.
Now MariaDB has CREATE PACKAGE with the default sql_mode, i.e. anything except sql_mode=’oracle’, and with ordinary standard-like syntax (“SQL/PSM”) for defining the routines. But it’s a bit of a hybrid because, although the routine definitions within the package are SQL/PSM, the CREATE PACKAGE statements themselves are not.
CREATE PACKAGE is a PL/SQL statement. CREATE MODULE is the SQL/PSM statement for something functionally very similar.
Here I compare the way MariaDB creates packages versus the way the standard prescribes for modules. I ignore trivial clauses that appear in most CREATE statements.
The MariaDB way
+------------------------------------------------------------+
| CREATE PACKAGE package_name                                |
| [ COMMENT or SQL SECURITY clause ... ]                     |
| [ FUNCTION | PROCEDURE name + COMMENT or SQL clauses ... ] |
| END                                                        |
+------------------------------------------------------------+

+-------------------------------+
| CREATE PACKAGE BODY           |
| [ variable declaration ... ]  |
| | routine definition ... ]    |
| END                           |
+-------------------------------+
The standard way
+-------------------------------------+
| CREATE MODULE module_name           |
| [ NAMES ARE character_set_name ]    |
| [ SCHEMA default_schema_name ]      |
| [ path specification ]              |
| [ temporary table declaration ... ] |
| [DECLARE] routine-definition; ... ] |
| END MODULE                          |
+-------------------------------------+
The most prominent vendor with CREATE PACKAGE is of course Oracle, but others, for example PostgreSQL and IBM, have it too.
The most prominent vendor with CREATE MODULE is IBM but Mimer has it too.
So the absolute smallest example of statements that have all the relevant features is:
CREATE PACKAGE pkg1
  PROCEDURE p1();
  FUNCTION f1() RETURNS INT;
END;
CREATE PACKAGE BODY pkg1
  DECLARE var1 INT;
  FUNCTION f1() RETURNS INT RETURN var1;
  PROCEDURE p1() SELECT f1();
  SET var1=1;
END;
SELECT pkg1.f1();
CALL pkg1.p1();
SHOW CREATE PACKAGE pkg1;
SHOW CREATE PACKAGE BODY pkg1;
GRANT EXECUTE ON PACKAGE db.pkg TO PUBLIC;
DROP PACKAGE pkg1;
In the Canadian Football League there used to be an official term “non-import” for a player who, essentially, wasn’t from the States or Europe or Samoa etc. This caused some complaint because there were simpler terms, like, um, “Canadian” or “national” i.e. native.
Eventually the League realized that adding “non-” was being negative about the default player situation.
I was reminded of that when reading the MariaDB manual, which now has split up the sections for CREATE PACKAGE and CREATE PACKAGE BODY to put “Oracle mode” and “non-Oracle mode”. I am hopeful that someday MariaDB, like the Canadian Football League, will come up with a less negative term such as “default”, or “when sql_mode is the default”. Also I am hopeful — here I speak as the former head of documentation for MySQL — that there will be rearrangement so that the default is shown first, as it will be more important than sql_mode=’oracle’, won’t it?
Another change will happen soon — perhaps by the time you read this — to the BNF. Currently it is
CREATE [ OR REPLACE]
    [DEFINER = { user | CURRENT_USER | role | CURRENT_ROLE }]
    PACKAGE [ IF NOT EXISTS ] [ db_name . ] package_name
    [ package_characteristic ... ]
    [ package_specification_element ... ]
END [ package_name ]
… which is wrong, adding [ package_name ] after END will just cause an error.
And later
package_specification_function:
    func_name [ ( func_param [, func_param]... ) ]
    RETURN func_return_type
    [ package_routine_characteristic... ]
… which is wrong, it should be RETURNS not RETURN.
Also, since CREATE FUNCTION documentation says “RETURNS type” not “RETURNS func_return_type”, there’s no need to introduce a new term here.
As for CREATE PACKAGE BODY the default mode BNF is undocumented, only Oracle mode BNF is documented. So my description above might be missing some detail, for example maybe it’s possible somehow to declare package-wide cursors and handlers as well as variables.
I see two package-related error messages in sql/share/errmsg-utf8.txt
"Subroutine '%-.192s' is declared in the package specification but is not defined in the package body"
and
"Subroutine '%-.192s' has a forward declaration but is not defined"
… which is wrong, there is no such thing as a subroutine, the term is “routine”. (Oracle has a thing called “subprogram” but it too would be a wrong term.)
After I create a package named pkg6 with a procedure p1, if I say
DROP PROCEDURE pkg6.p1;
I get told “PROCEDURE pkg6.p1 does not exist”.
… which is wrong, pkg6.p1 does exist, I can CALL it. It would be better to re-use the message “The used command is not allowed with this MariaDB version”. (Yes, it’s a statement not a command, but I can’t ask for the moon.)
If I say
GRANT EXECUTE ON PACKAGE no_such_package TO PUBLIC;
I get told “FUNCTION or PROCEDURE no_such_package does not exist”
which is wrong, I’m trying to grant on a nonexistent package not a nonexistent routine.
Suppose we have a package named pkg containing a procedure p1. “CALL p1();” is legal inside another routine in the same package, but outside the package we have to add a qualifier: “CALL pkg.p1();”.
Here is an example that shows why this is dangerous. (Delimiters added so mysql client understands.)
DROP DATABASE pkg;
DROP PACKAGE pkg;
CREATE DATABASE pkg;
CREATE PROCEDURE pkg.p1() SELECT 'database';
CALL pkg.p1();
DELIMITER $
CREATE PACKAGE pkg PROCEDURE p1(); END;
$
DELIMITER ;
DELIMITER $
CREATE PACKAGE BODY PKG PROCEDURE p1() SELECT 'package'; END;
$
DELIMITER ;
CALL pkg.p1();
…
The first “CALL pkg.p1();” will display “database”, the second “CALL pkg.p1();” will display “package”. The package has shadowed the database!
People can avoid the danger by adopting a naming convention that database names and package names will always have different prefixes, but they won’t.
Or people can “fully” qualify the package’s P1 by saying “CALL [database_name.][package_name.]p1();”. But they cannot “fully” qualify the database’s P1 by saying “CALL [catalog_name.][database_name.]p1();” — you’ll see a CATALOG_NAME column in INFORMATION_SCHEMA tables, but it is useless.
Therefore MariaDB should emit a warning message when there’s ambiguity, or support a different qualifier syntax. I’m hopeful that will happen in some future version.
By the way, Mimer “solves” this by disallowing: “The module name is never used to qualify the name of a routine.” It’s unstated, but I suppose this would mean that no two procedures can have the same name in the same schema, even if they are in different packages of the schema.
Also the standard allows SCHEMA and PATH which might be another way to evade the ambiguity, but it’s not necessary.
The obvious question after creation is: how can I see what’s in a package?
SHOW CREATE PACKAGE works. SHOW CREATE PACKAGE BODY works.
SHOW PACKAGE STATUS works. SHOW PACKAGE BODY STATUS works.
But they’re SHOW statements and therefore they’re no good.
In INFORMATION_SCHEMA.ROUTINES the package will appear with routine_type = ‘PACKAGE’ and routine_definition = ‘procedure pkg(); end’.
This is odd because
(a) a package is not a routine
(b) there is no procedure named pkg
(c) the actual routine is not a row in information_schema!
I can dig the routine out of another row that has routine_type = ‘PACKAGE BODY’ but I can do it because I have an SQL parser available, other people would be stalled because the body is a mishmash of routines and contents.
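For the record, the digging looks something like this (‘db1’ and the package name ‘pkg’ are placeholder names); today it returns only the PACKAGE and PACKAGE BODY rows, not one row per routine:

```sql
-- Placeholder names: db1, pkg. Today this returns only the
-- 'PACKAGE' and 'PACKAGE BODY' rows, not the routines inside them.
SELECT routine_schema, routine_name, routine_type
FROM information_schema.routines
WHERE routine_schema = 'db1'
  AND routine_type IN ('PACKAGE', 'PACKAGE BODY');
```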
Similar cluttering occurs for mysql.proc, although at least there I see PROCEDURE and FUNCTION entries. Remember that the ‘body’ field might be blank unless you have appropriate privileges.
The obvious answer, similar to what the standard has, is: put routines in INFORMATION_SCHEMA.ROUTINES, and add a PACKAGE_NAME column. Probably something needs to be added to mysql.proc too. Until that happens, since SHOW is not useful, getting metadata for package routines is awkward.
The answer hasn’t appeared in code yet but I’ll assume that what’s obvious will happen.
I can declare variables that are accessible from all routines in the package. This is possible in CREATE PACKAGE BODY and alas might soon be in CREATE PACKAGE too, if this is done.
Here is an illustration.
DELIMITER $
CREATE OR REPLACE PACKAGE BODY pkg
  -- variable declarations
  DECLARE a INT DEFAULT 11;
  DECLARE b INT DEFAULT 10;
  FUNCTION f1() RETURNS INT
  BEGIN
    SET a=a-1;
    RETURN a;
  END;
  -- routine declarations
  PROCEDURE p1()
  BEGIN
    SELECT a,f1(),a;
  END;
  -- package initialization section
  SET a=a-b;
END;
$
DELIMITER ;
And the question is: what should “CALL pkg.p1();” display?
If you guessed 1, 0, 0 then good for you, but notice what’s unpleasant here. First: we have a procedure’s variable’s value being changed in a way that the procedure doesn’t see. Second: the value changes between the first time it’s selected and the second time it’s selected, in the same statement.
Now, this won’t startle any experienced person, since MariaDB user variables (the ones whose names start with ‘@’) have always worked that way. But I can’t think of any case where that can happen with a DECLAREd variable, so it might startle people who have only worked with standard-like syntax.
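The user-variable behaviour I mean can be seen without any packages at all; a quick sketch (bearing in mind that the order of evaluation of expressions containing user variables is officially undefined, so this is illustrative):

```sql
SET @a = 1;
SELECT @a, (@a := @a - 1), @a;
-- with left-to-right evaluation this displays 1, 0, 0:
-- the third read already sees the assignment's side effect
```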
I like globals, but I am just expecting that some people will consider it should be noted in a style guide. One of the suggestions I’ve seen (for Oracle) is that package variables are a way to do “constants”. I must emphasize, though, that I’m only talking about what some people might like in style guides, and I’m recognizing that many more people will see an advantage to sharing dynamic variables.
Suppose I say
CREATE PACKAGE pkg12
  PROCEDURE p1();
END;
CREATE PACKAGE BODY pkg12
  PROCEDURE p0() SELECT 5;
  PROCEDURE p1() CALL p0();
END;
CALL pkg12.p1() /* This succeeds */;
CALL pkg12.p0() /* This fails */;
Thus p0 is not in CREATE PACKAGE but p0 is in CREATE PACKAGE BODY. That is legal provided p0 comes before p1 (no forward references please). In this case p1 is a “public” routine — I can CALL pkg12.p1() from outside the package. However, p0 is a “private” routine — I cannot CALL pkg12.p0() from outside the package. I will see “Error 1305 (42000) PROCEDURE pkg12.p0 does not exist”.
Nothing against private, but since pkg12.p0 does exist, I think a message that’s more explicit would help somebody in ages to come. Otherwise, it should be made obvious. Probably a naming convention would be a good way to do that. A comment would not be a good way because many clients, including mysql and ocelotgui, have –skip-comments as a default.
To allow CREATE PACKAGE (example);
GRANT CREATE ROUTINE ON w2.* TO k@localhost;
To allow EXECUTE of a package (example):
GRANT EXECUTE ON PACKAGE w2.pkg TO k@localhost;
This is a good thing: the usual privileges affecting routines will affect packages, as a whole. It’s a bit odd, however, that a qualifier is necessary for GRANT but not for CALL.
To allow SHOW CREATE PACKAGE (example):
GRANT EXECUTE ON PACKAGE w2.pkg TO k@localhost; GRANT ALTER ROUTINE ON PACKAGE w2.pkg TO k@localhost;
This is a strange thing, currently one way to make SHOW CREATE possible is to grant ALTER ROUTINE.
MariaDB has eleven ALTER statements, but ALTER PACKAGE is not one of them. Given that Oracle has one, and DB2 has ALTER MODULE, and it’s mentioned in a MariaDB document, I expect this will eventually be added with an excuse of “orthogonality”.
The debugger in the Ocelot GUI does not yet work with routines inside packages. However, in a version which will be released soon, the “recognizer” will see MariaDB 11.4 syntax and be able to alert typists about what syntax is expected as they type, the same experience that they get for other statements.
This enhancement is already in the source code, in this patch.
The post MySQL Table Size Is Way Bigger After Adding a Simple Index; Why? appeared first on MariaDB.org.
The post Accelerating MariaBackup with Intel QuickAssist appeared first on MariaDB.org.
The post Explaining a performance regression in Postgres 14 appeared first on MariaDB.org.
The primary problem appears to be more CPU used by the query planner for DELETE statements when the predicates in the WHERE clause have constants that fall into either the max or min histogram bucket for a given column. An example is a DELETE statement like the following and transactionid is the primary key so there is an index on it.
delete from t1 where (transactionid>=100 and transactionid<110)
The table is used like a queue — inserts are done in increasing order with respect to transactionid and when N rows are inserted, then N more rows are deleted to keep the size of the table constant. The rows to be deleted are the N rows with the smallest value for transactionid.
The problem is worse for IO-bound workloads (see here) than for cached workloads (see here) probably because the extra work done by the query planner involves accessing the index and possibly reading data from storage.
It is always possible I am doing something wrong but I suspect there is a fixable performance regression in Postgres 14 for this workload. The workload is explained here and note that vacuum (analyze) is done between the write-heavy and read-heavy benchmark steps.
Request 1
Can the query planner do less work when there is only one index that should be used? The full DDL for the table is here.
An abbreviated version of the DDL is below and the PK is on transactionid which uses a sequence.
For a DELETE statement like the following, the only efficient index is pi1_pkey. So I prefer that the query planner do less work to figure that out.
delete from t1 where (transactionid>=100 and transactionid<110)
CPU overhead
When I run the Insert Benchmark there are 6 read-write benchmark steps — 3 that do range queries as fast as possible, 3 that do point queries as fast as possible. For all of them there are also inserts and deletes done concurrent with the range queries and they are rate limited — first at 100 inserts/s and 100 deletes/s, then at 500 inserts/s and 500 deletes/s and finally at 1000 inserts/s and 1000 deletes/s. So the work for writes (inserts & deletes) is fixed per benchmark step while the work done by queries is not. Also, for each benchmark step there are three connections — one for queries, one for inserts, one for deletes.
Using separate connections makes it easier to spot changes in CPU overhead and below I show the number of CPU seconds for the range query benchmark steps (qr100, qr500, qr1000) where the number indicates the write (insert & delete) rate. Results are provided for Postgres 13.13 and 14.10 from the benchmark I described here (small server, IO-bound).
From below I see two problems. First, the CPU overhead for the delete connection is much larger with Postgres 14.10 for all benchmark steps (qr100, qr500, qr1000). Second, the CPU overhead for the query connection is much larger with Postgres 14.10 for qr1000, the benchmark step with the largest write rate.
Debugging after the fact: CPU profiling
I repeated the benchmark for Postgres 13.13 and 14.10 and after it finished repeated the qr100 benchmark step a few times for each of Postgres 13.13 and 14.10. The things that I measure here don’t match exactly what happens during the benchmark because the database might be in a better state with respect to write back and vacuum.
While this is far from scientific, I used explain analyze on a few DELETE statements some time after they were used. The results are here. I repeated the statement twice for each Postgres release and the planning time for the first explain is 49.985ms for Postgres 13.13 vs 100.660ms for Postgres 14.10.
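A sketch of how to observe this directly (statement shape from the post, constants illustrative; the transaction is rolled back so the queue is left intact):

```sql
BEGIN;
EXPLAIN (ANALYZE, BUFFERS)
  DELETE FROM t1 WHERE transactionid >= 100 AND transactionid < 110;
-- compare the "Planning Time" and "Execution Time" lines in the output
ROLLBACK;
```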
So I assume the problem is the CPU overhead from the planner and not from executing the statement.
Then I looked at the CPU seconds used by the connection that does deletes after running for 10 minutes and it was ~50s for Postgres 13.13 vs ~71s for 14.10. So the difference at this point is large, but much smaller than what I report above which means the things I want to spot via CPU profiling might be harder to spot. Also, if the problem is IO latency rather than CPU overhead then CPU profiling won’t be as useful.
This gist has the top-5 call stacks from hierarchical profiling with perf for the connection that does deletes. While there isn’t an obvious difference between Postgres 13.13 and 14.10 there is something I don’t like — all stacks are from the query planner and include the function get_actual_variable_range.
IO profiling
It looks like the query planner does more read IO for delete statements in Postgres 14.10 than in 13.13.
From the full benchmark I see the following for the range query benchmark steps which means there is more read IO (see rps column) with Postgres 14.10 for the qr100 and qr500 benchmark steps but not with the qr1000 benchmark step. And in all cases the range query rate (see qps column) is significantly less with Postgres 14.10.
The post DBAs’ Inconceivable Tales: A Rare Cause of Replication Lag appeared first on MariaDB.org.
The post DBAs’ Inconceivable Tales: A Rare Cause of Replication Lag appeared first on Shattered Silicon.
The post Updated Insert benchmark: Postgres 9.x to 16.x, small server, IO-bound database appeared first on MariaDB.org.
Comparing throughput in Postgres 16.1 to 9.0.23
Build + Configuration
The post Unexpected Stalled Upgrade to MySQL 8.0 appeared first on MariaDB.org.
The post Updated Insert benchmark: InnoDB/MySQL 5.6, 5.7 and 8.0, small server, cached database appeared first on MariaDB.org.
tl;dr
I used the cz10a_bee my.cnf files that are here for 5.6, for 5.7 and for 8.0. For 5.7 and 8.0 there are many variants of that file to make them work on a range of the point releases.
The post Updated Insert benchmark: Postgres 9.x to 16.x, small server, cached database, v3 appeared first on MariaDB.org.
In previous blog posts I claimed that there are large regressions from old to new MySQL but not from old to new Postgres. And I shared results for MySQL 5.6, 5.7 and 8.0 along with Postgres versions 10 through 16. A comment about these results is the comparison was unfair because the first GA MySQL 5.6 release is 5.6.10 from 2013 while the first Postgres 10 GA release is 10.0 from 2017.
Here I have results going back to Postgres 9.0.23 and the first 9.0 release is 9.0.0 from 2010.
tl;dr
Build + Configuration
The post How to Implement Encryption at Rest Using Hashicorp Vault and MariaDB appeared first on MariaDB.org.
The post Galera Cluster for Debian 12 “bookworm” is now available appeared first on MariaDB.org.
As always we have binary installation documentation available, and if you are just after a quick install, the following is what is in your /etc/apt/sources.list.d/galera.list:
deb https://releases.galeracluster.com/galera-4.17/debian bookworm main
deb https://releases.galeracluster.com/mysql-wsrep-8.0.35-26.16/debian bookworm main
Happy installing, and the next step is to ensure that Debian 12 is enabled on Galera Manager, which is a very popular feature request.
The post FOSDEM Fringe Schedule is Up appeared first on MariaDB.org.
The post MariaDB C++ Connector 1.0.3 now available appeared first on MariaDB.org.
The post InnoDB Locking Mechanisms Explained: From Flush Locks to Deadlocks appeared first on MariaDB.org.
InnoDB uses flush locks primarily for managing the flushing of dirty pages (modified data that hasn’t been written to disk) from the buffer pool to disk. These locks are internal to InnoDB and not directly exposed to database users. They are used to ensure consistency between the in-memory buffer pool and the on-disk data.
Meta locks (metadata locks) are used to manage access to database objects like tables, ensuring that structural changes (like dropping a table) don’t occur while queries that access the table are running. They arbitrate between DDL operations (like DROP TABLE, ALTER TABLE) and DML (Data Manipulation Language) operations (like SELECT, INSERT, UPDATE, DELETE).
Schema locks are similar to meta locks but are specifically used to protect the schema or structure of a database object. They prevent simultaneous operations that could modify the database schema.
Record (row-level) locks are the most granular level of locking in InnoDB, allowing multiple transactions to work on different rows of the same table concurrently.
Gap locks are a type of record-level lock in InnoDB, but instead of locking a single row, they lock a range of records.
A deadlock occurs when two or more transactions are waiting for each other to release locks, creating a cycle of dependency with no resolution.
Understanding these locks is vital for database administrators and developers working with InnoDB in MySQL. Optimal use of these locking mechanisms can significantly affect the performance, scalability, and reliability of applications interacting with the database.
The post InnoDB Locking Mechanisms Explained: From Flush Locks to Deadlocks appeared first on The WebScale Database Infrastructure Operations Experts in PostgreSQL, MySQL, MariaDB and ClickHouse.
The post Tips and Tricks for troubleshooting MySQL Thread Cache performance in high concurrent update applications appeared first on MariaDB.org.
Troubleshooting thread cache performance in high-concurrency update applications in MySQL involves several strategies:
– Monitor the Threads_created status variable; a high number indicates a too small thread cache.
– Adjust thread_cache_size. While increasing it can improve performance under high concurrency, be cautious of using too much memory.
In conclusion, effectively troubleshooting thread cache performance in MySQL for high-concurrency update applications involves careful monitoring and adjustment of the thread cache size. It’s crucial to balance the thread cache against the specific load and connection patterns of the server. Using tools like MySQL’s Performance Schema can provide valuable insights for optimization. Remember, changes to server configurations should always be tested and monitored to ensure they positively impact performance. This approach helps in achieving an optimized, efficient environment for handling high-concurrency scenarios.
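As a back-of-the-envelope illustration (my own sketch, not from the original post), the thread cache "miss rate" can be computed from the Threads_created and Connections status counters; a rate well above a few percent suggests raising thread_cache_size:

```python
def thread_cache_miss_rate(threads_created: int, connections: int) -> float:
    """Fraction of connections that forced creation of a new thread.

    Inputs are the values of the Threads_created and Connections status
    variables (fetched e.g. via SHOW GLOBAL STATUS); the numbers used
    below are hypothetical. A rate near 0 means the thread cache is
    absorbing almost all connection churn.
    """
    if connections == 0:
        return 0.0
    return threads_created / connections

# Hypothetical sample: 1,500 threads created over 120,000 connections.
rate = thread_cache_miss_rate(1500, 120000)
print(f"miss rate: {rate:.2%}")  # → miss rate: 1.25%
```

A rate like 1.25% would usually be acceptable; rates of 10% or more are the classic sign of a too-small cache.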
The post Tips and Tricks for troubleshooting MySQL Thread Cache performance in high concurrent update applications appeared first on The WebScale Database Infrastructure Operations Experts in PostgreSQL, MySQL, MariaDB and ClickHouse.
The post Optimizing Query Performance in PostgreSQL 16 with the Advanced auto_explain Extension appeared first on MariaDB.org.
To use the auto_explain extension in PostgreSQL 16, you should first load it into the server. This can be done by adding auto_explain to either session_preload_libraries or shared_preload_libraries in the postgresql.conf file. This setup allows you to track slow queries as they occur.
The auto_explain module has several configurable parameters:
– auto_explain.log_min_duration: Sets the minimum execution time for a statement to have its plan logged.
– auto_explain.log_analyze: Enables the logging of EXPLAIN ANALYZE output.
– auto_explain.log_buffers and auto_explain.log_wal: Control the logging of buffer and WAL usage statistics.
– auto_explain.log_timing: Toggles the logging of per-node timing information.
– auto_explain.log_triggers: Includes trigger execution statistics in logs.
– auto_explain.log_verbose: Enables verbose output.
– auto_explain.log_settings: Logs information about modified configuration options.
– auto_explain.log_format: Sets the output format (text, xml, json, yaml).
– auto_explain.log_level: Determines the log level for the query plan.
– auto_explain.log_nested_statements: Controls whether nested statements are logged.
– auto_explain.sample_rate: Sets the fraction of statements to explain in each session.
For example, to log every query with its execution plan, you can set auto_explain.log_min_duration to 0 and enable auto_explain.log_analyze.
Remember, enabling some of these features, especially auto_explain.log_analyze, can impact performance due to the overhead of collecting detailed statistics.
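Putting the parameters above together, a postgresql.conf fragment that logs every statement’s plan might look roughly like the following — these are development settings, and log_min_duration should be raised (e.g. to a few hundred milliseconds) in production:

```ini
shared_preload_libraries = 'auto_explain'  # load the module at server start

auto_explain.log_min_duration = 0   # 0 ms threshold: log every statement's plan
auto_explain.log_analyze = on       # include EXPLAIN ANALYZE output (adds overhead)
auto_explain.log_buffers = on       # include buffer usage statistics
auto_explain.log_format = json      # machine-readable plans in the server log
```

A server restart is required when the module is loaded via shared_preload_libraries.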
For detailed information and examples, please refer to the PostgreSQL 16 documentation on auto_explain.
The post Optimizing Query Performance in PostgreSQL 16 with the Advanced auto_explain Extension appeared first on The WebScale Database Infrastructure Operations Experts in PostgreSQL, MySQL, MariaDB and ClickHouse.
The post Composite Indexes in PostgreSQL appeared first on MariaDB.org.
The functionality of composite indexes lies in the creation of a unique data structure. This structure stores the values of the specified columns along with a pointer to the row containing them. This setup allows PostgreSQL to bypass scanning the entire table to locate the rows that match the query; instead, it can directly refer to the index, enabling faster retrieval of matching rows.
However, there are some important considerations associated with the use of composite indexes. Firstly, they add an overhead to the database due to the additional space they consume on the disk. This is because an entry is added to the index for each row in the table. Secondly, composite indexes can potentially slow down insert and update operations. This is due to the fact that the index needs to be updated whenever a row is inserted or updated. If a table has many indexes or if the indexes include many columns, these operations can significantly decelerate.
Therefore, understanding composite indexes and their impact on database operations is crucial for efficient database management and query performance.
Implementing a composite index in PostgreSQL involves specifying the columns to be indexed during the CREATE INDEX operation. The syntax is as follows: CREATE INDEX index_name ON table_name (column1, column2, …). The order of the columns can play a significant role in the efficiency of the index, especially when performing queries that don’t involve all the columns in the index.
Composite indexes are typically used in scenarios where queries frequently involve more than one column in their WHERE clause. For instance, if a table of employees has columns for ‘last_name’ and ‘first_name’, and queries often search for both these fields, a composite index on both ‘last_name’ and ‘first_name’ would optimize these queries. Similarly, they are beneficial for JOIN operations involving multiple columns. However, it’s important to remember that composite indexes are most effective when the cardinality of the indexed columns is high.
Let’s consider a practice dataset employees with columns employee_id, first_name, last_name, email, and department.
Here’s the SQL command to create a composite index on first_name and last_name:
CREATE INDEX idx_employee_names ON employees (first_name, last_name);
Now, when you run a query that involves both first_name and last_name in the WHERE clause, PostgreSQL can use this composite index to speed up the search. For example:
SELECT * FROM employees WHERE first_name = 'John' AND last_name = 'Doe';
Similarly, if you often run queries that join employees with another table departments based on department and employee_id, you might consider a composite index on these columns:
CREATE INDEX idx_employee_dept ON employees (department, employee_id);
This composite index could speed up a JOIN operation like this:
SELECT e.first_name, e.last_name, d.department_name FROM employees e JOIN departments d ON e.department = d.department_id AND e.employee_id = d.manager_id;
Remember, composite indexes are a powerful tool, but they require careful consideration of your query patterns and data characteristics to use effectively. Always test different index configurations to find the optimal solution for your specific use case.
In conclusion, composite indexes in PostgreSQL offer significant benefits in terms of optimizing database performance, especially for complex queries involving multiple columns. They create a unique data structure that bypasses the need to scan the entire table, resulting in faster retrieval of matching rows. However, it’s crucial to carefully select the columns for indexing, maintain the correct order, and consider the cardinality of the columns to maximize the efficiency of composite indexes. Over-indexing and neglecting to maintain indexes can lead to slower operations and degraded performance.
For more in-depth insights and best practices, refer to expert PostgreSQL blogs on minervadb.xyz. These blogs provide a wealth of knowledge on not only composite indexes but also a wide variety of other PostgreSQL topics. Whether you’re a novice or a seasoned database administrator, these resources can help you leverage the full potential of PostgreSQL in your database operations.
The post Composite Indexes in PostgreSQL appeared first on The WebScale Database Infrastructure Operations Experts in PostgreSQL, MySQL, MariaDB and ClickHouse.
The post Vettabase Milestones: Year 2023 in Review appeared first on MariaDB.org.
New customers
Last year we started working with several new customers, and two of them went public about their collaboration with us. They are:
One remarkable renewal also happened in 2023. Treedom renewed their contract with us, which proved 4 years of successful work described in a case study.
We appreciate the trust that all our customers place in Vettabase, and hope to continue to take care of the customers’ database infrastructures in the future.
New partnerships
In addition to our existing partnerships with MariaDB Foundation and Treedom, we onboarded two more partners:
MindsDB is a meta-database that connects to remote data sources (integrations) and answers SQL queries. These SQL queries can ask for predictions about the future, or for data that is otherwise missing. For example, how many sales will be performed next year, or how customer behaviour would change under certain conditions.
Vettabase currently maintains MindsDB integration with MySQL. We work to make integration between these technologies smoother and easier. Things we’ve done until now include:
– Make some parameters optional when you connect MindsDB to MySQL.
– Allow using a MySQL database URI rather than specifying each parameter separately.
– Implement handling of MySQL query timeout from MindsDB, to handle circumstances where the timeout is too short, or the opposite case, when we want to set a stricter timeout for MindsDB queries.
We also look forward to doing joint partner events and creating partner content together for the benefit of our customers.
Our first joint webinar with MindsDB will take place on January 24, 2024. We have officially started rendering services to MindsDB users willing to employ traditional databases as external data sources.
Webinars
We introduced free Vettabase webinars in April 2023 and decided to make them our regular monthly practice. Here are the links to our 2023 webinars for you to revisit:
Concepts of ProxySQL configuration for Galera cluster
MariaDB 10.11, key features overview for DBAs
Key Reasons to Upgrade to MySQL 8 or MariaDB 10.11
MySQL 8: improvements in asynchronous replication
MariaDB Temporal Tables: A Demonstration
What Database Professionals, DevOps and Others Can Learn from Flight Safety?
A first look at MariaDB 11 features and ideas on how to use them
MariaDB Security Best Practices
In 2024, we’ll continue hosting free webinars and hope to have more joint events with our partners, as I have already mentioned above.
Offline events
In October 2023, I presented at MariaDB (Un)Conference covering MariaDB stored procedures (the recording is available here). The whole event was about “shaping MariaDB future”, and I brought some suggestions about why and how stored procedures should be improved. My talk was well received; I got some questions from MariaDB Foundation members, and apparently some JIRA tasks I pointed out received some attention. I won’t stop here, and I started to write a series of blog posts on the topic.
Amongst other talks, I found Monty’s presentation on MariaDB catalogs very interesting. I wrote a blog post about it to show how, in my opinion, catalogs use cases go well beyond the use cases that Monty seemed to have in mind.
Blog posts
In addition to two posts I mentioned in the previous paragraph, the Vettabase team authored 8 more technical blog posts in 2023. Here’s the list in reverse chronological order:
First steps with pgbackrest, a backup solution for PostgreSQL
MariaDB/MySQL: working with storage engines
MySQL and MariaDB storage engines: an overview
Overview of detailed slow query logging in MySQL 8: log_slow_extra
MariaDB 10.11 LTS: New types and functions, more dynamic InnoDB configuration
A summary of MariaDB 10.10: INET4 type, RANDOM_BYTES() and more
Next year we’ll continue to blog covering the technologies we support and will probably provide more expert advice on database automation, which we believe is key to managing large database setups.
Interviews
In mid October, I was interviewed by The Register’s Lindsay Clark to provide insights on the recent MariaDB Corporation (a.k.a. MariaDB plc) restructuring:
https://www.theregister.com/AMP/2023/10/19/mariadb_restructure_analysts/
In early December, DevRims Tech Talk #037 was published, which I strongly recommend to anyone interested in the principles backing our work at Vettabase:
For interview enquiries, please email in**@ve*******.com, we’re open to media placements.
Plans for 2024
This year we’re planning to blog more about database automation. Automation remains a very important aspect of Vettabase services, because it’s essential to make databases scalable and (as much as possible) error-free. Particularly, we’ll cover Ansible as our core, first-choice technology for automating database tasks. If you are interested in certain topics, feel free to propose them in the comments.
We have already mentioned that some joint activities with partners will take place in 2024. There are solid grounds for it as our new partners, MindsDB and Bytebase, offer products that are extremely useful for our customers:
Vettabase is working on a number of exciting projects, but they cannot be disclosed at these early stages of development. We’ll continue to improve them, and share the news in due time.
All in all, we expect 2024 to become an even more fruitful year for the Vettabase team in terms of technical webinars, blog posts, and, of course, establishing more customer and partner relationships.
The post Optimizing MySQL 8 Performance: Strategies for Using Workload Statistics Effectively appeared first on MariaDB.org.
]]>The post Optimizing MySQL 8 Performance: Strategies for Using Workload Statistics Effectively appeared first on The WebScale Database Infrastructure Operations Experts in PostgreSQL, MySQL, MariaDB and ClickHouse.
The post Maximizing MySQL Database Performance: Advanced Statistical Analysis of Query Throughput Capacity appeared first on MariaDB.org.
Query Throughput Capacity in MySQL performance troubleshooting is a metric that quantifies the number of queries processed by the server within a specific time frame (e.g., queries per second). It’s a vital indicator of the database’s ability to handle its workload and is critical in assessing both current performance and in forecasting future performance needs.
Throughput = Total Queries Executed / Total Time Period
By applying these statistical approaches to the query throughput capacity metric, database administrators can gain a comprehensive understanding of the current performance landscape and make informed predictions about future performance. This foresight is crucial for ensuring the scalability, reliability, and efficiency of the MySQL database in response to changing demands.
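As an illustration of the formula above (the counter readings are hypothetical), cumulative samples of a query counter such as the Questions status variable can be turned into a per-interval QPS series, which can then feed whatever statistics you want:

```python
def qps_series(samples, interval_s):
    """Convert cumulative query-counter samples into per-interval QPS.

    samples: cumulative values of e.g. the Questions status variable,
    taken every interval_s seconds (hypothetical numbers below).
    """
    return [(b - a) / interval_s for a, b in zip(samples, samples[1:])]

# Hypothetical counter readings taken 10 seconds apart.
counts = [100_000, 104_200, 109_000, 112_600]
qps = qps_series(counts, 10)
print(qps)                  # → [420.0, 480.0, 360.0]
print(sum(qps) / len(qps))  # → 420.0 (mean queries per second)
```

The same series can be fed into percentile or trend calculations for capacity forecasting.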
The post Maximizing MySQL Database Performance: Advanced Statistical Analysis of Query Throughput Capacity appeared first on The WebScale Database Infrastructure Operations Experts in PostgreSQL, MySQL, MariaDB and ClickHouse.
The post Syscalls Analysis in MySQL When Using innodb_flush_method and innodb_use_fdatasync appeared first on MariaDB.org.
The post Galera Manager January 2024 release appeared first on MariaDB.org.
The major reason to release this was to ensure that Galera Manager would accept the new signing keys of Galera Cluster (key ID: 8DA84635).
One will now also note that gm-installer reports a new version: gm-installer version 1.12.0 (linux/amd64). And when you install it, Galera Manager itself is now at version 1.8.3. One of the major fixes is that Ubuntu 22.04 and Debian 12 support for self-provided hosts are now exposed in the UI. This fixes galera-manager-support#85.
One more important thing to note: if you create a database called test, it will never be deleted. This was a bug in the execution of mysql_secure_installation. This fixes galera-manager-support#84. It is worth remembering that you probably should not have a test database in production, anyway.
Please evaluate Galera Manager now!
The post Updated Insert benchmark: MyRocks 5.6 and 8.0, medium server, IO-bound database, v2 appeared first on MariaDB.org.
tl;dr
Build + Configuration
See the previous report.
Benchmark
See the previous report.
The post Fast Analytics with MariaDB ColumnStore appeared first on MariaDB.org.
The post Securing MariaDB Server & MariaDB MaxScale Connections (TLS) appeared first on MariaDB.org.
The post Quick Peek: MySQL 8.0.36 and 8.3 appeared first on MariaDB.org.
The post MySQL's random number generator appeared first on MariaDB.org.
The post Optimizing PostgreSQL Performance: Mastering Checkpointing Configuration appeared first on MariaDB.org.
Checkpointing behavior is tuned through several parameters in the postgresql.conf file. Here are some key settings:
– checkpoint_timeout: This parameter determines the maximum time between automatic WAL checkpoints. Setting it too low can cause frequent disk writes, while setting it too high can lead to longer recovery times. A balanced value based on your workload is essential.
– max_wal_size: This setting controls the maximum size of WAL files between two checkpoints. Increasing it can reduce the frequency of checkpoints but requires more disk space.
– min_wal_size: This setting controls the minimum size of WAL files retained in the pg_wal directory. It helps in providing sufficient WAL files for replication and recovery without overburdening disk space.
– checkpoint_completion_target: This parameter is a fraction that determines how much of the checkpoint interval should be used for writing WAL records to disk. A higher value can help in spreading out the I/O load, reducing the performance impact.
– wal_buffers: Determines the amount of memory used for WAL data that hasn’t been written to disk yet. Increasing it can help in situations with high write loads.
– effective_io_concurrency: If your storage supports multiple concurrent I/O operations, adjusting this parameter can help optimize the I/O performance.
Remember, the optimal configuration for checkpointing depends on the specific workload and hardware of your PostgreSQL server. It’s often a good idea to monitor your system’s performance and adjust these settings incrementally. Additionally, using tools like pg_stat_bgwriter can provide insights into checkpoint activity and help in tuning the parameters more effectively.
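A sketch of how these settings might be combined in postgresql.conf — the values are illustrative starting points, not recommendations, and the right numbers depend heavily on hardware, disk space, and workload:

```ini
checkpoint_timeout = 15min           # default 5min; fewer but larger checkpoints
max_wal_size = 4GB                   # default 1GB; needs matching disk space
min_wal_size = 1GB
checkpoint_completion_target = 0.9   # spread checkpoint writes over 90% of the interval
wal_buffers = 64MB                   # default is -1 (auto-sized, typically 16MB)
effective_io_concurrency = 200       # only meaningful on SSD/NVMe storage
```

After changing these, watching pg_stat_bgwriter over a few days tells you whether checkpoints are being triggered by timeout (good) or by max_wal_size (consider raising it).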
The post Optimizing PostgreSQL Performance: Mastering Checkpointing Configuration appeared first on The WebScale Database Infrastructure Operations Experts in PostgreSQL, MySQL, MariaDB and ClickHouse.
The post Enhancing MySQL Performance: Strategic CPU Affinity and Priority Management appeared first on MariaDB.org.
Using CPU affinity and nice levels to prioritize MySQL processes can significantly enhance performance, especially on multi-core systems or servers with other demanding applications. Here’s how to do it:
CPU affinity can be set with the taskset command in Linux.
– Find the MySQL process ID, e.g. via ps -aux | grep mysql.
– Bind the process to specific CPUs with taskset. For example, taskset -cp 0,1 [PID] binds the MySQL process to CPUs 0 and 1.
The nice command in Linux adjusts the priority of a process. A lower nice value increases the priority, giving the process more CPU time.
– Set the nice level when starting MySQL, e.g., nice -n -5 mysqld_safe &.
– To change the nice level of a running process, use renice. For example, renice -n -5 -p [PID] sets a higher priority for the MySQL process.
Optimizing CPU usage through affinity and nice levels can significantly improve MySQL performance. However, it’s crucial to balance MySQL’s needs with the overall system requirements. Fine-tuning these settings based on your specific workload and server environment will help achieve the best performance outcomes. Always monitor the system’s overall health and performance to ensure that changes are having the desired effect without negatively impacting other critical operations.
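The same controls are also exposed programmatically. As a sketch (Linux-specific standard-library APIs; output values depend on the machine), Python offers equivalents of taskset and renice:

```python
import os

# os.sched_getaffinity / os.sched_setaffinity are the taskset equivalents.
# They are Linux-only, so guard for portability.
if hasattr(os, "sched_getaffinity"):
    cpus = sorted(os.sched_getaffinity(0))  # CPUs this process may run on
    print("eligible CPUs:", cpus)
    # Pinning a process (e.g. mysqld's PID) to CPUs 0-1 would be:
    #   os.sched_setaffinity(pid, {0, 1})
    # which needs the same permissions as taskset.

# os.nice(increment) adjusts niceness like renice; an increment of 0
# simply reads the current level. Lowering niceness requires root.
print("current nice level:", os.nice(0))
```

The commented-out sched_setaffinity call is deliberately not executed here, since changing the affinity of a running database server should be a conscious operational step.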
The post Enhancing MySQL Performance: Strategic CPU Affinity and Priority Management appeared first on The WebScale Database Infrastructure Operations Experts in PostgreSQL, MySQL, MariaDB and ClickHouse.
The post Galera Cluster for MySQL 5.7.44 and MySQL 8.0.35 released appeared first on MariaDB.org.
For MySQL 5.7, one should continue using the Galera replication library 3.37 implementing the wsrep API version 25 for MySQL-wsrep 5.7.32. One can presume that unless there is a bug, we will not be releasing a new version of the library, as MySQL 5.7 is End of Life (EOL) since October 2023. We strongly encourage you to upgrade from MySQL-wsrep 5.7 to MySQL-wsrep 8.0. In addition, since the EOL announcement, FreeBSD removed the expired port, so mysqlwsrep57-server is no more.
There has been a new package signing key, with key ID: 8DA84635. The old signing key is still active for older packages, but note the new signing key. This is to ensure it is compatible with Red Hat Enterprise Linux 9 and greater.
The only major changes in MySQL 5.7.44-25.36 are the new merge with the upstream release, plus the manpages. As stated above, you are encouraged to upgrade to MySQL 8.0, as this is likely the last release in the MySQL 5.7 series. Supported operating systems are CentOS 7, RHEL 7 and 8.
In MySQL 8.0.35-26.16, the joiner CLONE SST process used to create and drop a temporary user; this was written to the binary log by default and advanced the MySQL GTID state. This is now fixed, as the joiner process operations are excluded from binlogging. In addition, when it comes to SST user account management, temporary accounts for SST are now created automatically, with credentials passed via socket (and not environment variables). Account credentials can be passed directly to the SST script. This is more secure, and also helps with simple node configuration. It should also be noted that wsrep_sst_auth set by the administrator is also respected. It works for mysqldump, CLONE, and xtrabackup methods.
When using wsrep_notify_cmd, the script is now only called when Galera Cluster has formed a cluster view or when it is synced or is the donor, and this ensures that it prevents any untoward hangs. It should be noted that INFORMATION_SCHEMA.PROCESSLIST is now deprecated, and one should use PERFORMANCE_SCHEMA.PROCESSLIST instead. One can find information on appliers and rollback threads via SELECT * FROM performance_schema.threads WHERE NAME = 'thread/sql/wsrep_applier_thread'; or SELECT * FROM performance_schema.threads WHERE NAME = 'thread/sql/wsrep_rollback_thread';.
There is now a new foreign key constraint check retrying implementation, as we have found that on occasion foreign key constraint checks may fail even though the constraints themselves are not violated (e.g. the same transaction inserts into the parent table, and the next insert into the child table fails the FK checks). The number of retries is set to 1 by default, and can be controlled by the new system variable wsrep_applier_FK_failure_retries. If the constraint check fails despite retries, the final retry prints out a warning with an error code and InnoDB system monitor output for further troubleshooting.
For MySQL 8.0.35, we build packages for many operating systems: Debian 10 and 11, Ubuntu 20.04 and 22.04, CentOS 7, and Red Hat Enterprise Linux 7, 8 and 9.
Please download the latest software and update your Galera Clusters! We continue to provide repositories for popular Linux distributions, and we encourage you to use them. Contact us for more information about what Galera Cluster Enterprise Edition can do for you.
The post Updated Insert benchmark: MyRocks 5.6 and 8.0, small(est) server, cached database, v2 appeared first on MariaDB.org.
tl;dr
The post Updated Insert benchmark: MyRocks 5.6 and 8.0, small server, cached database, v2 appeared first on MariaDB.org.
tl;dr
Noise
I recently improved the benchmark scripts to remove writeback and compaction debt after the l.i2 benchmark step to reduce noise in the read-write steps that follow. At least for MyRocks, the range query benchmark steps (qr100, qr500, qr1000) have more noise. The worst case for noise with MyRocks is the qr100 step, and this is more obvious on a small server.
For MyRocks, the benchmark script now does the following after l.i2:
Build + Configuration
Benchmark
The post Can’t We Assign a Default Value to the BLOB, TEXT, GEOMETRY, and JSON Data Types? appeared first on MariaDB.org.
The post Understanding MySQL’s Thread-Based Architecture: Internal Workings, Connection Handling, and Performance Optimization appeared first on MariaDB.org.
– Monitoring thread states (e.g., via SHOW PROCESSLIST) can be crucial for diagnosing performance issues.
– Tuning thread-related variables (innodb_thread_concurrency, thread_cache_size) is vital for performance.
– Tools like SHOW PROCESSLIST, the Performance Schema, and SHOW ENGINE INNODB STATUS are essential for monitoring thread activity and identifying bottlenecks.
– Adjusting settings in my.cnf/my.ini based on the workload and hardware can significantly impact performance.
Understanding MySQL’s thread model is essential for database administration, especially for performance tuning and troubleshooting. Each thread plays a specific role in the overall operation of the MySQL server, and effective management of these threads is key to ensuring optimal database performance.
The post Understanding MySQL’s Thread-Based Architecture: Internal Workings, Connection Handling, and Performance Optimization appeared first on The WebScale Database Infrastructure Operations Experts in PostgreSQL, MySQL, MariaDB and ClickHouse.
The post Updated Insert benchmark: MyRocks 5.6 and 8.0, medium server, cached database, v2 appeared first on MariaDB.org.
tl;dr – context matters
The biggest concerns I have are the ~16% slowdown on the initial load (l.i0) benchmark step from MyRocks 5.6.35 to 8.0.32 and the ~5% slowdown for benchmark steps that do point queries (qp*) from MyRocks 8.0.28 to 8.0.32.
Comparing latest MyRocks 8.0.32 relative to latest MyRocks 5.6.35
Comparing latest MyRocks 8.0.32 to an old build of MyRocks 5.6.35
Comparing latest MyRocks 8.0.32 to latest MyRocks 8.0.28
Build + Configuration
See the previous report.
Benchmark
See the previous report.
The post MariaDB Contribution Statistics, January 2024 appeared first on MariaDB.org.
The post Is MySQL Router 8.2 Any Better? appeared first on MariaDB.org.
The post Updated Insert benchmark: Postgres 9.x to 16.x, small server, cached database, v2 appeared first on MariaDB.org.
tl;dr
The PG planner has code in get_actual_variable_range to determine the min or max value of a column when there is a predicate on that column like X < $const or X > $const and $const falls into the largest or smallest histogram bucket. From PMP thread stacks, what I see is too much time with that function on the call stack. From ps output, the session that does delete statements can use 10X to 100X more CPU than the session that does insert statements. From explain analyze I see that the planner spends ~100 milliseconds per delete statement.
The benchmark report is here.
There are big regressions in 11.19 and 11.22 and a small one in 13.13 for the l.i1 and l.i2 benchmark steps, which is visible in the summary.
This table shows the value of cpupq (CPU overhead) per version for the l.i1 and l.i2 benchmark steps. All of the numbers for iostat and vmstat are here for l.i1 and for l.i2.
The post Getting Started with MindsDB and MySQL appeared first on MariaDB.org.
In this post we shall take a look at getting started with MindsDB by connecting to MySQL and some of the improvements to date.
We have a ready to go example environment which will run MindsDB and MySQL in two separate Docker containers. We will be creating a new database user for MindsDB to connect, creating a database connection, and running a couple of queries to test the connection.
I have prepared the example environment using Docker Compose. You can either clone the example repo with git, or download and extract a zip file of the repository from Github.
From the root of the directory, we can start the containers:
docker compose up -d
NOTE: The MindsDB image is quite large, around 8GB at the time of writing. The image itself uses the lightwood tag, which includes the AutoML framework.
Run docker ps
to see the running containers
$ docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
54cbac9642d7 mysql "docker-entrypoint.s…" 11 minutes ago Up 11 minutes 3306/tcp, 33060/tcp vettabase-mindsdb-intro-mysql-1
6cfe06520b07 mindsdb/mindsdb "sh -c 'python -m mi…" 11 minutes ago Up 11 minutes 0.0.0.0:47334-47335->47334-47335/tcp, 47336/tcp vettabase-mindsdb-intro-mindsdb-1
The MySQL container will automatically load the SQL in data/sample.sql
to create the schema named sample
and the test_table
table.
Let’s also perform a quick check to see if all the data was loaded into MySQL by counting the rows and performing a sum on the two columns:
$ docker exec -it vettabase-mindsdb-intro-mysql-1 mysql sample -uroot -pSuperPass123 -e "select count(*) as c,sum(total) as t,sum(value) as v from sample.test_table;"
mysql: [Warning] Using a password on the command line interface can be insecure.
+----+------+------+
| c | t | v |
+----+------+------+
| 10 | 6146 | 94 |
+----+------+------+
MindsDB provides a very handy web User Interface so we can start exploring our data right away.
As per the docker-compose.yml
and in docker ps
we can see that port 47334 is exposed. From your host machine, point a web browser to 127.0.0.1:47334
or the IP address of the machine where the containers are running.
You should now see a fresh MindsDB instance:
Before we can connect MindsDB to MySQL, we need to create a user in MySQL.
First let us open a MySQL command shell:
$ docker exec -it vettabase-mindsdb-intro-mysql-1 mysql -uroot -pSuperPass123
Then create a MySQL database user for the sample schema:
mysql> CREATE USER IF NOT EXISTS 'mindsdb'@'%' IDENTIFIED BY 'sampleData_12345';
Query OK, 0 rows affected (0.02 sec)
mysql> GRANT SELECT ON sample.* TO 'mindsdb'@'%';
Query OK, 0 rows affected (0.01 sec)
We can now create a connection in the MindsDB web interface using our new user:
CREATE DATABASE sample
WITH ENGINE = "mysql",
PARAMETERS = {
"user": "mindsdb",
"password": "sampleData_12345",
"host": "mysql",
"port": "3306",
"database": "sample"
};
Alternatively, you can use the new URL parameter, which was also committed by Vettabase:
CREATE DATABASE sample
WITH ENGINE = "mysql",
PARAMETERS = {
"url": "mysql://mindsdb:sampleData_12345@mysql/sample"
};
You may notice I have excluded the port number: as we are using the default MySQL port of 3306, it is not necessary to set it explicitly.
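To illustrate how the URL form bundles the same parameters, here is a hypothetical shell helper (build_mysql_url is my own name, not part of MindsDB) that assembles the URL and omits the default port:

```shell
#!/bin/sh
# Hypothetical helper (not part of MindsDB) that assembles a mysql://
# URL from individual connection parameters, omitting the port when it
# is the MySQL default of 3306.
build_mysql_url() {
  user="$1"; pass="$2"; host="$3"; port="$4"; db="$5"
  if [ -z "$port" ] || [ "$port" = "3306" ]; then
    printf 'mysql://%s:%s@%s/%s\n' "$user" "$pass" "$host" "$db"
  else
    printf 'mysql://%s:%s@%s:%s/%s\n' "$user" "$pass" "$host" "$port" "$db"
  fi
}

build_mysql_url mindsdb sampleData_12345 mysql 3306 sample
```

Running it with the credentials from this post prints the same URL used in the CREATE DATABASE statement above.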
You may already be familiar with adding database-specific parameters to your connection details. One enhancement committed by Vettabase was enabling autocommit by default, which means no transactions are left open.
Once the connection has been completed, we can use the sidebar to see what objects have been created and preview the data.
Using the same query as earlier, we can also query our MySQL data directly within the MindsDB interface:
select count(*) as c,
sum(total) as t,
sum(value) as v
from sample.test_table;
The user we created only has SELECT
privileges. If your user also has write privileges (INSERT/UPDATE/DELETE/...
) then you can also use the MindsDB interface to edit your database! This is another useful scenario for when autocommit is enabled by default.
Using the methods above you can start by connecting MindsDB to your own data and start building machine learning models.
If you haven’t already, be sure to register for our Webinar, Unlocking Real-Time Insights: AI-Powered Forecasting with MySQL and MindsDB!
Richard Bensley
The post Volunteering as a Program Committee Member for Data on Kubernetes Day Europe 2024 appeared first on MariaDB.org.
The post Harness the Power of Generative AI by Training Your LLM on Custom Data appeared first on MariaDB.org.
The post How to Use Group Replication with Haproxy appeared first on MariaDB.org.
The post Explaining changes in RocksDB performance for IO-bound workloads appeared first on MariaDB.org.
tl;dr
I repeated the IO-bound benchmark using buffered IO in 3 setups:
The performance summaries from the benchmark scripts are here and the iostat summary is here.
The post The Underlying Importance of the server_id Parameter appeared first on MariaDB.org.
The post Expert Guide to MySQL Performance Troubleshooting: Best Practices and Optimization Techniques appeared first on MariaDB.org.
- Use top, vmstat, iostat, and mpstat to identify system-level bottlenecks.
- Run EXPLAIN and EXPLAIN ANALYZE on slow queries to understand their execution plans and optimize them accordingly.
- Size innodb_buffer_pool_size to ensure efficient data caching, typically set to about 70-80% of available memory on a dedicated database server.
- Tune thread_cache_size to optimize thread handling.
Troubleshooting MySQL performance is an ongoing process that requires a thorough understanding of both MySQL internals and the specific characteristics of your workload. By systematically applying these principles, you can identify performance issues, implement optimizations, and maintain a high-performing MySQL database environment.
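The 70-80% guidance is easy to turn into a quick calculation. A sketch that suggests a buffer pool size at the 75% midpoint, assuming total memory is known in MB; the ratio and numbers are illustrative, not a rule for every host:

```shell
#!/bin/sh
# Suggest an innodb_buffer_pool_size at 75% of total memory.
# total_mb would normally come from: free -m | awk '/^Mem:/ {print $2}'
# Here it is hard-coded so the example is self-contained.
total_mb=32768

suggest=$(awk -v mb="$total_mb" 'BEGIN { printf "%d", mb * 0.75 }')
echo "innodb_buffer_pool_size = ${suggest}M"
```

For a 32GB host this suggests 24576M; leave more headroom when other services share the machine.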
The post Expert Guide to MySQL Performance Troubleshooting: Best Practices and Optimization Techniques appeared first on The WebScale Database Infrastructure Operations Experts in PostgreSQL, MySQL, MariaDB and ClickHouse.
The post Comprehensive MySQL Health Check Guide: Scripts and Strategies for Optimal Database Performance appeared first on MariaDB.org.
#!/bin/bash
top -n 1
iostat
free -m
SHOW VARIABLES LIKE 'slow_query_log';
SHOW VARIABLES LIKE 'long_query_time';
Review the configuration settings in the my.cnf or my.ini file:
cat /etc/mysql/my.cnf
SHOW GLOBAL VARIABLES;
CHECK TABLE tablename;
pt-table-checksum --host=localhost --user=root --password=yourpassword
SELECT user, host, authentication_string FROM mysql.user;
mysqlcheck --all-databases --check-backup
df -h
SELECT table_schema AS 'Database', table_name AS 'Table', round(((data_length + index_length) / 1024 / 1024), 2) 'Size in MB' FROM information_schema.TABLES ORDER BY (data_length + index_length) DESC;
SHOW SLAVE STATUS\G
ping your-database-host
SELECT VERSION();
Regularly performing these health checks can help you proactively manage your MySQL installation, ensuring it runs efficiently and securely. Automation of these checks where possible will help maintain consistent monitoring and timely identification of potential issues.
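Several of these checks lend themselves to automated thresholds. As one example, a sketch that flags filesystems above a usage threshold; it parses captured df -P-style output so the result here is deterministic, and the 90% cutoff is an arbitrary choice:

```shell
#!/bin/sh
# Flag filesystems above a usage threshold. The input below is a
# captured df -P-style sample so the check is deterministic; in a real
# health check you would pipe in `df -P` instead.
threshold=90

df_sample='Filesystem 1024-blocks Used Available Capacity Mounted
/dev/sda1 1000000 950000 50000 95% /
/dev/sdb1 1000000 400000 600000 40% /data'

warnings=$(echo "$df_sample" | awk -v t="$threshold" \
  'NR > 1 { use = $5; sub(/%/, "", use)
            if (use + 0 > t) printf "WARN %s at %s%% on %s\n", $1, use, $6 }')
echo "$warnings"
```

Wired into cron with a mail or alerting step, this turns the manual df -h check above into a standing monitor.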
The post Comprehensive MySQL Health Check Guide: Scripts and Strategies for Optimal Database Performance appeared first on The WebScale Database Infrastructure Operations Experts in PostgreSQL, MySQL, MariaDB and ClickHouse.
The post Configuring Keyring for Encryption Using AWS Key Management Service in Percona Server for MySQL appeared first on MariaDB.org.
The post RocksDB 8.x benchmarks: large server, IO-bound appeared first on MariaDB.org.
tl;dr
Builds
I used my fork of the RocksDB benchmark scripts that are wrappers to run db_bench. These run db_bench tests in a special sequence — load in key order, read-only, do some overwrites, read-write and then write-only. The benchmark was run using 24 threads. How I do benchmarks for RocksDB is explained here and here. The command line to run them is:
bash x3.sh 24 no 3600 c40r256bc180 40000000 4000000000 iobuf iodir
The post Failover and Recovery Scenarios in InnoDB Cluster and ClusterSet appeared first on MariaDB.org.
Ensure the group_replication plugin is installed and enabled:
INSTALL PLUGIN group_replication SONAME 'group_replication.so';
Set a unique server_id for each member and configure replication settings:
SET GLOBAL server_id = 1; -- different for each member
SET GLOBAL group_replication_group_name = 'uuid()'; -- replace with an actual UUID string
SET GLOBAL group_replication_start_on_boot = ON;
To unblock a group that has lost quorum, force the membership list:
SET GLOBAL group_replication_force_members = 'member_uuid';
#!/bin/bash
# Script to rejoin a node to the cluster
mysql -e "STOP GROUP_REPLICATION; START GROUP_REPLICATION;"
-- On the primary
CREATE CLUSTERSET primary_cluster;
-- On replicas
CLUSTERSET REPLICATE FROM primary_cluster AT primary_host:port;
#!/bin/bash
# Switchover to a new primary cluster
mysql -e "CLUSTERSET SWITCHOVER TO replica_cluster;"
#!/bin/bash
# Resynchronize a cluster after recovery
mysql -e "CLUSTERSET REPLICATE FROM new_primary_cluster AT host:port;"
To make transactions on a newly elected primary wait until the replication backlog is applied, raise the consistency level:
SET GLOBAL group_replication_consistency = 'BEFORE_ON_PRIMARY_FAILOVER';
#!/bin/bash
# Monitor cluster health
mysql -e "SELECT MEMBER_STATE FROM performance_schema.replication_group_members;"
#!/bin/bash
# Backup script
mysqldump -u root -p --all-databases > all_databases.sql
Implementing failover and recovery in InnoDB Cluster and ClusterSet requires careful planning and configuration. Utilizing scripts can help automate many aspects of this process, enhancing the reliability and efficiency of failover operations. Regular testing and validation of these scripts and configurations are critical to ensure the high availability and durability of your MySQL deployment.
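One number worth keeping at hand when planning these scenarios: group replication requires a majority of members online, so a group of n members has a quorum of floor(n/2)+1 and tolerates n minus quorum failures. A small sketch of that arithmetic:

```shell
#!/bin/sh
# Group replication quorum arithmetic: a group of n members needs a
# majority (floor(n/2)+1) online, so it tolerates n - quorum failures.
for n in 3 5 7; do
  awk -v n="$n" 'BEGIN {
    q = int(n / 2) + 1
    printf "members=%d quorum=%d tolerated_failures=%d\n", n, q, n - q
  }'
done
```

This is why odd-sized groups are the usual recommendation: going from 3 to 4 members adds cost without raising the number of tolerated failures.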
The post Failover and Recovery Scenarios in InnoDB Cluster and ClusterSet appeared first on The WebScale Database Infrastructure Operations Experts in PostgreSQL, MySQL, MariaDB and ClickHouse.
The post Connecting to Oracle from MariaDB Enterprise Server using Spider appeared first on MariaDB.org.
The post MySQL General Tablespaces: A Powerful Storage Option for Your Data appeared first on MariaDB.org.
The post innodb_log_writer_threads and the Insert Benchmark appeared first on MariaDB.org.
The MySQL docs suggest only using =ON for high-concurrency workloads, alas it is =ON by default.
Dedicated log writer threads can improve performance on high-concurrency systems, but for low-concurrency systems, disabling dedicated log writer threads provides better performance.
tl;dr, v1
tl;dr, v2
The bugs
The redo log code was changed in a big way in MySQL 8.0 and my experience with that has not been great. It was nice to get the ability to disable the new features, but that (innodb_log_writer_threads) didn’t arrive until 8.0.22.
The table below lists the fsync ratio which is:
(fsyncs with innodb_log_writer_threads =ON) / (fsyncs with it =OFF)
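The ratio itself is simple to reproduce from the collected counters; a sketch with made-up fsync counts:

```shell
#!/bin/sh
# fsync ratio: fsyncs with innodb_log_writer_threads=ON divided by
# fsyncs with it =OFF, for the same benchmark step. Counts are made up.
fsync_on=250000
fsync_off=100000

ratio=$(awk -v on="$fsync_on" -v off="$fsync_off" \
        'BEGIN { printf "%.2f", on / off }')
echo "fsync ratio = ${ratio}"
```

A ratio well above 1 means the log writer threads issued many more fsyncs for the same work, which is the overhead the table quantifies.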
Explaining: 40-core server