Federico RazzoliCreating JSON documents with MariaDB (23.11.2014, 14:50 UTC)
Peter ZaitsevGoogle Compute Engine adds Percona XtraDB Cluster to click-to-deploy process (21.11.2014, 16:00 UTC)

I’m happy to announce that Google has added Click-to-deploy functionality for Percona XtraDB Cluster (PXC) on Google Cloud Compute nodes. This gives you the ability to rapidly spin up a cluster for experimentation, performance testing, or even production use on Google Compute Engine virtual machines.

What is Percona XtraDB Cluster?

Google Cloud Platform adds Percona XtraDB Cluster to click-to-deploy processPercona XtraDB Cluster is a virtually synchronous cluster of MySQL Innodb nodes. Unlike conventional MySQL asynchronous replication which has a specific network topology of master and slaves, PXC’s nodes have no specific topology.  Functionally, this means that there are no masters and slaves, so you can read and write on any node.

Further, any failure in the cluster does not require any re-arranging of the replication topology. Instead, clients just reconnect to another node and continue reading and writing.

We have a ton of material about Percona XtraDB Cluster in previous posts, in the PXC manual, and in various webinars. If you want a concentrated hour overview of Percona XtraDB Cluster, I’d recommend watching this webinar.

How do I use Click-to-deploy?

Simply visit Google Cloud’s solutions page here: https://cloud.google.com/solutions/percona to get started. You are given a simple setup wizard that allows you choose the size and quantity of nodes you want, disk storage type and volume, etc.  Once you ‘Deploy Cluster’, your instances will launch and form a cluster automatically with sane default tunings. After that, it’s all up to you what you want to do.

Seeing it in action

Once your instances launch, you can add an SSH key to access (you can also install the Google Cloud SDK). This, handily, creates a new account based on the username in the SSH public key text. Once I add the key, I can easily just ssh to the public IP of the instance and I have my own account:

jayj@~ [500]$ ssh [public ip]
Linux percona-mysql-niyr 3.2.0-4-amd64 #1 SMP Debian 3.2.60-1+deb7u3 x86_64
The programs included with the Debian GNU/Linux system are free software;
the exact distribution terms for each program are described in the
individual files in /usr/share/doc/*/copyright.
Debian GNU/Linux comes with ABSOLUTELY NO WARRANTY, to the extent
permitted by applicable law.

Once there I installed sysbench 0.5 from the Percona apt repo:

jayj@percona-mysql-niyr:~$ sudo -i
root@percona-mysql-niyr:~# apt-key adv --keyserver keys.gnupg.net --recv-keys 1C4CBDCDCD2EFD2A
root@percona-mysql-niyr:~# echo "deb http://repo.percona.com/apt wheezy main" > /etc/apt/sources.list.d/percona.list
root@percona-mysql-niyr:~# apt-get update; apt-get install sysbench -y
Setting up sysbench (0.5-3.wheezy) ...
root@percona-mysql-niyr:~# logout

Now we can load some data and run a quick test:

jayj@percona-mysql-niyr:~$ sysbench --test=/usr/share/doc/sysbench/tests/db/parallel_prepare.lua --oltp-tables-count=16 --oltp-table-size=1000000 --oltp-auto-inc=off --num-threads=8 --mysql-user=root --mysql-password=[the admin password you set in the wizard] --mysql-db=test run
sysbench 0.5:  multi-threaded system evaluation benchmark
Running the test with following options:
Number of threads: 8
Random number generator seed is 0 and will be ignored
Threads started!
thread prepare4
Creating table 'sbtest5'...
thread prepare2
Creating table 'sbtest3'...
thread prepare6
Creating table 'sbtest7'...
thread prepare0
Creating table 'sbtest1'...
thread prepare7
Creating table 'sbtest8'...
thread prepare1
Creating table 'sbtest2'...
thread prepare5
Creating table 'sbtest6'...
thread prepare3
Creating table 'sbtest4'...
Inserting 100000 records into 'sbtest5'
Inserting 100000 records into 'sbtest3'
Inserting 100000

Truncated by Planet PHP, read more at the original (another 5688 bytes)

Peter ZaitsevSys Schema for MySQL 5.6 and MySQL 5.7 (20.11.2014, 14:31 UTC)

Performance Schema (P_S) has been available since MySQL 5.5, more than 4 years ago. It is still difficult to see production servers with P_S enabled, even with MySQL 5.6 where the feature is enabled by default. There have been several complaints like the overhead, that the fix is a work in progress, and the ease of use. 52 tables to query and 31 configuration variables is enough to scare people.

There is a solution for the second problem, the usability. It’s name is “sys schema“. It is a collection of views, functions and procedures to make P_S more user friendly.


If you are a MySQL Workbench user the installation is pretty easy because sys schema is already included. You just need to install it. Click on “Performance – Performance Reports” and there you will find the “Install Helper” button that will install sys schema.

sys schema mysql workbench

If you don’t use MySQL Workbench you need to download sys_56.sql or sys_57.sql (depends if you use 5.6 or 5.7) from the github repository. Then, just import the sql file as usual:

mysql -u root -p < ./sys_56.sql


After the import, you will have a new “sys” schema with some very descriptive table names. Let’s see an example. Do you want to know what tables are using most of our InnoDB buffer memory? Easy:

mysql> select * from sys.innodb_buffer_stats_by_table;
| object_schema | object_name        | allocated | data      | pages | pages_hashed | pages_old | rows_cached |
| test          | t                  | 63.61 MiB | 58.06 MiB |  4071 |         4071 |      4071 |     2101222 |
| InnoDB System | SYS_FOREIGN        | 32.00 KiB | 0 bytes   |     2 |            2 |         2 |           0 |
| InnoDB System | SYS_TABLES         | 32.00 KiB | 1.11 KiB  |     2 |            2 |         2 |          10 |
| InnoDB System | SYS_COLUMNS        | 16.00 KiB | 4.68 KiB  |     1 |            1 |         1 |          71 |
| InnoDB System | SYS_DATAFILES      | 16.00 KiB | 324 bytes |     1 |            1 |         1 |           6 |
| InnoDB System | SYS_FIELDS         | 16.00 KiB | 722 bytes |     1 |            1 |         1 |          17 |
| InnoDB System | SYS_INDEXES        | 16.00 KiB | 836 bytes |     1 |            1 |         1 |          12 |
| InnoDB System | SYS_TABLESPACES    | 16.00 KiB | 318 bytes |     1 |            1 |         1 |           6 |
| mysql         | innodb_index_stats | 16.00 KiB | 274 bytes |     1 |            1 |         1 |           3 |
| mysql         | innodb_table_stats | 16.00 KiB | 53 bytes  |     1 |            1 |         1 |           1 |

Pretty easy and useful, right? You can also get what is the database using more memory in the buffer pool querying innodb_buffer_stats_by_schema.

For each table there is another similar ones that it’s name starts with x$. For example, you have user_summary_by_file_io and x$user_summary_by_file_io. The difference is that the first table has a human readable output values while the second one has the real values. Let’s see an example:

mysql> select * from sys.user_summary_by_file_io;
| user       | ios   | io_latency |
| root       | 19514 | 2.87 s     |
| background |  5916 | 1.91 s     |
2 rows in set (0.00 sec)
mysql> select * from sys.x$user_summary_by_file_io;
| user       | ios   | io_latency    |
| root       | 19514 | 2871847094292 |
| background |  5916 | 1905079715132 |

For humans, at least for me, it is easier to read seconds rather than picoseconds :)

There are multiple tables with very descriptive names.

– io_by_thread_by_latency
– schema_unused_indexes
– statements_with_temp_tables
– statements_with_errors_or_warnings
– user_summary_by_

Truncated by Planet PHP, read more at the original (another 2468 bytes)

Jean-Jerome SchmidtMulti-source Replication with Galera Cluster for MySQL (20.11.2014, 10:25 UTC)
November 20, 2014
By Severalnines

Multi-source replication means that one server can have multiple masters from which it replicates. Why multi-source? One good reason is to consolidate databases (e.g. merge your shards) for analytical reporting or as a centralized backup server. MariaDB 10 already has this feature, and MySQL 5.7 will also support it. 

It is possible to set up your Galera Cluster as an aggregator of your masters in a multi-source replication setup, we’ll walk you through the steps in this blog. Note that the howto is for Galera Cluster for MySQL (Codership) and Percona XtraDB Cluster. In a separate post, we’ll show you how to configure MariaDB Cluster 10 instead. If you would like to use MySQL Cluster (NDB) as aggregator, then check out this blog.


Galera Cluster as Aggregator/Slave


Galera cluster can operate both as MySQL master and slave. Each Galera node can act as a slave channel accepting replication from a master. The number of slave channels should be equal or less to the number of Galera master nodes in the cluster. So, if you have a three-node Galera cluster, you can have up to three different replication sources connected to it. Note that in MariaDB Galera Cluster 10, you can configure as many sources as you want since each node supports multi-source replication. 

To achieve multi-source replication in MySQL 5.6, you cannot have GTID enabled for Galera Cluster. GTID will cause our Galera cluster to work as a single unit (imagine one single slave server), since it globally preserves the MySQL GTID events on the cluster.  So the cluster will not be able to replicate from more than one master. Hence, we will use the “legacy” way to determine the starting binary log file and position. On a side note, enabling GTID is highly recommended if your Galera Cluster acts as a MySQL master, as described in this blog post.

We will setup multi-source replication as below:

We have 3 standalone MySQL servers (masters), and each master has a separate database: mydb1, mydb2 and mydb3. We would like to consolidate all 3 databases into our Galera cluster.


Setting Up Masters


1. On each standalone MySQL server, configure it as a master by adding a server ID, enabling binary logging with ROW format:

# mysql1 my.cnf


# mysql2 my.cnf


# mysql3 my.cnf


read more

Open QueryImproving InnoDB index statistics (20.11.2014, 02:45 UTC)

The MySQL/MariaDB optimiser likes to know things like the cardinality of an index – that is, the number of distinct values the index holds. For a PRIMARY KEY, which only has unique values, the number is the same as the number of rows.  For an indexed column that is boolean (such as yes/no) the cardinality would be 2.

There’s more to it than that, but the point is that the optimiser needs some statistics from indexes in order to try and make somewhat sane decisions about which index to use for a particular query. The statistics also need to be updated when a significant number of rows have been added, deleted, or modified.

In MyISAM, ANALYZE TABLE does a tablescan where everything is tallied, and the index stats are updated. InnoDB, on the other hand, has always done “index dives”, looking at a small sample and deriving from that. That can be ok as a methodology, but unfortunately the history is awkward. The number used to be a constant in the code (4), and that is inadequate for larger tables. Later the number was made a server variable innodb_stats_sample_pages and its default is now 8 – but that’s still really not enough for big(ger) tables.

We recently encountered this issue again with a client, and this time it really needed addressing as no workarounds were effective across the number of servers and of course over time. Open Query engineer Daniel filed https://mariadb.atlassian.net/browse/MDEV-7084 which was picked up by MariaDB developed Jan Lindström.

Why not just set the innodb_stats_sample_pages much higher? Well, every operation takes time, so setting the number appropriate for your biggest table means that the sampling would take unnecessarily long for all the other (smaller, or even much smaller) tables. And that’s just unfortunate.

So why doesn’t InnoDB just scale the sample size along with the table size? Because, historically, it didn’t know the table size: InnoDB does not maintain a row count (this has to do with its multi-versioned architecture and other practicalities – as with everything, it’s a trade-off). However, these days we have persistent stats tables – rather than redoing the stats the first time a table is opened after server restart, they’re stored in a table. Good improvement. As part of that information, InnoDB now also knows how many index pages (and leaf nodes in its B+Tree) it has for each table. And while that’s not the same as a row count (rows have a variable length so there’s no fixed number of rows per index page), at least it grows along with the table. So now we have something to work with! The historical situation is no longer a hindrance.

In order to scale the sample size sanely, that is not have either too large a number for small tables, or a number for big tables that’s over the top, we’ll want some kind of logarithmic scale. For instance, log2(16 thousand) = 14, and log2(1 billion) = 30. That’s small enough to be workable. The new code as I suggested:

n_sample_pages = max(min(srv_stats_sample_pages, index->stat_index_size), log2(index->stat_index_size) * srv_stats_sample_pages);

This is a shorter construct (using min/max instead of ifs) of what was already there, combined with the logarithmic sample basis. For very small tables, either the innodb_stats_sample_pages number if used or the actual number of pages, whichever is smaller – for bigger tables, the log2 of the #indexpages is used, multiplied by the dynamic system variable innodb_stats_sample_pages. So we can still scale and thus influence the system in case we want more samples. Simple, but it seems effective – and it any case we get decent results in many more cases than before, so it’s a worthwhile improvement. Obviously, since it’s a statistical sample, it could still be wrong for an individual case.

Jan reckons that just like MyISAM, InnoDB should do a table scan and work things out properly – I agree, this makes sense now that we have persistent stats. So the above is a good fix for 5.5 and 10.0, and the more significant change to comprehensive stats can be in a near future major release. So then we have done away with the sampling altogether, instead basing the info on the full dataset. Excellent.

Another issue that needed to be dealt with is when InnoDB recalculates the statistics. You don’t want to do it on every change, but regularly if there has been some change is good as it might affect which indexes should be chosen for optimal query execution. The hardcoded rule was 1/16th of the table or 2 billion rows, whichever comes first. Again that’s unfortunate, because for a bigger table 1/16th still amounts to a very signifi

Truncated by Planet PHP, read more at the original (another 792 bytes)

Peter ZaitsevHow a set of queries can be killed in MySQL using Percona Toolkit’s pt-kill (19.11.2014, 15:18 UTC)

You might have encountered situations where you had to kill some specific select queries that were running for long periods and choking the database. This post will go into more detail with an example of report query offloading.

Report query (select) offloading to a slave server is a common practice to reduce the workload of the master DB server. The long running selects will be executed in the slave for report generation. I have observed in many cases where the slave used to get delayed or the slave DB encounters a slowdown due to some heavy long-running orphaned selects from some wrong reports.

There are two main ways to kill queries in MySQL: 1. use custom scripts that match on a regular expression, or 2. use a tool written and supported by Percona that is designed to kill queries based on matching conditions. Below is one script that will help you to kill those queries. The script will take the process list from MySQL and filter the long-running select query considering the “User”, “Time” and “State” from the list. However I suggest that you use the pt-kill tool from Percona Toolkit which provides a more reliable choice and options based on your requirements.

Process list:

| Id  | User | Host      | db   | Command | Time | State                        | Info                        |
| 103 | root | localhost | test | Query   |    0 | init                         | show processlist            |
| 104 | root | localhost | test | Sleep   |  383 |                              | NULL                        |
| 106 | root | localhost | test | Query   |  377 | Waiting for table level lock | SELECT * FROM t FOR UPDATE  |
| 107 | root | localhost | test | Query   |  364 | Waiting for table level lock | insert into t value(5)      |
| 108 | rpt  | localhost | test | Query   |  345 | Waiting for table level lock | SELECT c1 FROM t FOR UPDATE |

1. Shell script example:

List all queries from rpt user having query time greater than 1 minute:
[root@test3 ~]# for id in $(mysql -t -e "SHOW FULL PROCESSLIST" | /bin/awk -F "|" '($6 ~ /Query/) && (60< $7) &&
 ($3 ~ /rpt/) { print $2}'); do echo "KILL QUERY $id;"; done
Kill all queries from rpt user having query time greater than 1 minute:
[root@test3 ~]# for id in $(mysql -t -e "SHOW FULL PROCESSLIST" | /bin/awk -F "|" '($6 ~ /Query/) && (60< $7) &&
 ($3 ~ /rpt/)  { print $2}'); do mysql  -e "KILL QUERY $id;"; done

2. Pt-kill example:

List all queries from rpt user having query time greater than 1 minute (–print):

[root@test3 ~]# pt-kill --busy-time 60 --match-command Query --user root --print --match-user rpt
--socket /tmp/mysql.sock
# 2014-11-12T03:51:01 KILL 108 (Query 485 sec) SELECT c1 FROM t FOR UPDATE
# 2014-11-12T03:51:31 KILL 108 (Query 515 sec) SELECT c1 FROM t FOR UPDATE
# 2014-11-12T03:52:01 KILL 108 (Query 545 sec) SELECT c1 FROM t FOR UPDATE

Kill all queries from rpt user having query time greater than 1 minute (–kill-query):

[root@test3 ~]# pt-kill --busy-time 60 --match-command Query --user root --print --kill-query --match-user rpt
--socket /tmp/mysql.sock
# 2014-11-12T03:53:26 KILL QUERY 108 (Query 630 sec) SELECT c1 FROM t FOR UPDATE

Process list:

| Id  | User | Host      | db   | Command | Time | State                        | Info                       |
| 103 | root | localhost | test | Query   |    0 | init                         | show processlist           |
| 104 | root | localhost | test | Sleep   |  843 |                              | NULL                       |
| 106 | root | localhost | test | Query   |  837 | Waiting for table level lock | SELECT * FROM t FOR UPDATE |
| 107 | root | local

Truncated by Planet PHP, read more at the original (another 2048 bytes)

Jean-Jerome SchmidtNew ClusterControl User Guide for MySQL based Clusters (19.11.2014, 14:13 UTC)
November 19, 2014
By Severalnines

Following the release of ClusterControl 1.2.8 with a range of cool new features, we have now also published a new ClusterControl User Guide that provides all the steps, tips & tricks to follow in order to successfully deploy, monitor, manage and scale database clusters.  

This user guide predominantly covers ClusterControl with MySQL-based clusters, namely:

  • Percona XtraDB Cluster
  • MariaDB Cluster
  • Galera Cluster for MySQL (Codership)
  • MySQL Cluster
  • MySQL Replication
  • A pool of MySQL single instances


New features in ClusterControl 1.2.8 include: 

  • Deployment and scaling of single-instance MySQL, MariaDB and MongoDB
  • YUM/APT repositories for ClusterControl
  • Alerts and incident tracking with PagerDuty
  • New flexible alarms/email notification system
  • Cluster-wide MySQL User Management
  • New default dashboards for MySQL/MariaDB
  • Puppet Module for ClusterControl


read more

Peter ZaitsevAvoiding MySQL ALTER table downtime (18.11.2014, 17:50 UTC)

MySQL table alterations can interrupt production traffic causing bad customer experience or in worst cases, loss of revenue. Not all DBAs, developers, syadmins know MySQL well enough to avoid this pitfall. DBAs usually encounter these kinds of production interruptions when working with upgrade scripts that touch both application and database or if an inexperienced admin/dev engineer perform the schema change without knowing how MySQL operates internally.

* Direct MySQL ALTER table locks for duration of change (pre-5.6)
* Online DDL in MySQL 5.6 is not always online and may incurr locks
* Even with Percona Toolkit‘s pt-online-schema-change there are several workloads that can experience blocking

Here on the Percona MySQL Managed Services team we encourage our clients to work with us when planning and performing schema migrations. We aim to ensure that we are using the best method available in their given circumstance. Our intentions to avoid blocking when performing DDL on large tables ensures that business can continue as usual whilst we strive to improve response time or add application functionality. The bottom line is that a business relying on access to its data cannot afford to be down during core trading hours.

Many of the installations we manage are still below MySQL 5.6, which requires us to seek workarounds to minimize the amount of disruption a migration can cause. This may entail slave promotion or changing the schema with an ‘online schema change’ tool. MySQL version 5.6 looks to address this issue by reducing the number of scenarios where a table is rebuilt and locked but it doesn’t yet cover all eventualities, for example when changing the data type of a column a full table rebuild is necessary. The topic of 5.6 Online Schema Change was discussed in great detail last year in the post, “Schema changes – what’s new in MySQL 5.6?” by Przemysław Malkowski

With new functionality arriving in MySQL 5.7, we look forward to non-blocking DDL operations such as; OPTIMIZE TABLE and RENAME INDEX. (More info)

The best advice for MySQL 5.6 users is to review the matrix to familiarize with situations where it might be best to look outside of MySQL to perform schema changes, the good news is that we’re on the right path to solving this natively.

Truth be told, a blocking alter is usually going to go unnoticed on a 30MB table and we tend to use a direct alter in this situation, but on a 30GB or 300GB table we have some planning to do. If there is a period of time where activity is low and the this is permissive of locking the table then sometimes it is better execute within this window. Frequently though we are reactive to new SQL statements or a new performance issue and an emergency index is required to reduce load on the master in order to improve the response time.

To pt-osc or not to pt-osc?

As mentioned, pt-online-schema-change is a fixture in our workflow. It’s usually the right way to go but we still have occasions where pt-online-schema-change cannot be used, for example; when a table already uses triggers. It’s an important to remind ourselves of the the steps that pt-online-schema-change traverses to complete it’s job. Lets look at the source code to identify these;

[moore@localhost]$ egrep 'Step' pt-online-schema-change
# Step 1: Create the new table.
# Step 2: Alter the new, empty table. This should be very quick,
# Step 3: Create the triggers to capture changes on the original table and <--(metadata lock)
# Step 4: Copy rows.
# Step 5: Rename tables: orig -> old, new -> orig <--(metadata lock)
# Step 6: Update foreign key constraints if there are child tables.
# Step 7: Drop the old table.

I pick out steps 3 and 5 from above to highlight a source of a source of potential downtime due to locks, but step 6 is also an area for concern since foreign keys can have nested actions and should be considered when planning these actions to avoid related tables from being rebuilt with a direct alter implicitly. There are several ways to approach a table with referential integrity constraints and they are detailed within the pt-osc documentation a good preparation step is to review the structure of your table inc

Truncated by Planet PHP, read more at the original (another 5297 bytes)

Peter ZaitsevMySQL’s INNODB_METRICS table: How much is the overhead? (18.11.2014, 13:00 UTC)

Starting with MySQL 5.6 there is an INNODB_METRICS table available in INFORMATION_SCHEMA which contains some additional information than provided in the SHOW GLOBAL STATUS output – yet might be more lightweight than PERFORMANCE_SCHEMA.

Too bad INNODB_METRICS was designed during the Oracle-Sun split under MySQL leadership and so it covers only InnoDB counters. I think this would be a great replacement to all counters that are currently provided though SHOW STATUS – it captures more information such as providing MIN/MAX counts for variables as well as providing the type of the counter (whenever it is current or commutative) as well as human readable comment – describing what such counter means.

The examples of data you can get only from the INNODB_METRICS table includes information about InnoDB Page Splits and merging (which can cause quite an impact to the database performance).

As well as details of InnoDB purging performance, adaptive hash index activity, details about InnoDB flushing or how index condition pushdown (ICP) is working for you.

The InnoDB Metrics come disabled by default as of MySQL 5.6 and it provides very elaborate configuration commands – you can enable/disable individual counters or counters for specific subsystems you’re interested in. I would expect most users though would need only basic configuration:

set global innodb_monitor_disable = all;
set global innodb_monitor_enable = all;

…which enables and disables all InnoDB Metrics appropriately. Of course if you just rather keep changes permanently you would want to keep it as a setting in the MySQL Configuration file. Small side note – some of the InnoDB metrics correspond to SHOW STATUS counters and those are permanently enabled.

As those metrics are disabled by default I was wondering if they really do have so huge a overhead that we can’t keep them enabled. In my tests I’ve measured less than 1% overhead, really the variance between runs of benchmark was larger than caused by having metrics enabled. It might be on very large systems with heavy workloads that the overhead might be higher – if you observe any significant overhead from using INNODB_METRICS please comment on this post so we know.

Finally let me post a selection of metrics that have been actively running a simple sysbench test, in total there are 214 metrics as of the current MySQL 5.6 release.

mysql> select name,count,round(avg_count,3) avg_cnt ,type,comment from innodb_metrics where count>0 or max_count>0;
| name                                     | count         | avg_cnt      | type           | comment                                                                                                           |
| metadata_table_handles_opened            |             1 |        0.000 | counter        | Number of table handles opened                                                                                    |
| metadata_table_reference_count           |            -6 |       -0.000 | counter        | Table reference counter                                                                                           |
| metadata_mem_pool_size                   |       8388608 |         NULL | value          | Size of a memory pool InnoDB uses to store data dictionary and internal data structures in bytes                  |
| lock_deadlocks                           |           521 |        0.002 | counter        | Number of deadlocks                                                                                               |
| lock_timeouts                            |          6857 |        0.027 | counter        | Number of lock timeouts                                                                                           |
| lock_rec_lock_waits                      |       1837441 |       16.150 | counter        | Number of times enqueued into record lock wait queue                                                              |
| lock_rec_lock_requests          

Truncated by Planet PHP, read more at the original (another 27405 bytes)

Open QueryOptimising multi-threaded replication (18.11.2014, 01:40 UTC)

Multi-threaded replication is a new feature introduced in MySQL 5.6 and MariaDB 10.0. In traditional single-threaded replication, the slaves have a disadvantage as they have to process in sequence what a master executed in parallel. This, plus the fact that slaves usually have a lot of read-only connections to deal with as well, can easily create performance problems. That is, a single-threaded slave needs to be set to allow fewer connections, otherwise there’s a higher risk of it not being able to keep up with the replication stream. There is no exact rule for this, as it relates to general I/O capacity and fsync latency, as well as general CPU and RAM considerations and query patterns.

Currently, it appears that the MariaDB implementation is a bit more mature in terms of design and effective implementation. For instance, MySQL 5.6 does not currently support retrying transactions while doing parallel replication. This can easily cause problems as commit conflicts are possible and obviously need to be handled. So for the purpose of this blog post, we’re going to focus on MariaDB 10.0, and it is what we currently use with some of our clients. MariaDB developer Kristian Nielsen has done awesome work and is very responsive to questions and bug reports. Rock on, Kristian!

The fundamental challenge for parallel replication is that some queries are safe to be executed in parallel, and some are not – and somehow, the server needs to know which is which. MariaDB employs two strategies to assist with this:

  1. Group commit. Since 5.5, transactions (remember, a standalone statement without START TRANSACTION/COMMIT is technically also a transaction) that happen around the same time in different connections are grouped in the binary log and effectively committed together. This is accomplished by the server trying to gather at least a certain number of transactions (binlog_commit_wait_count) and having individual connections wait just a fraction (binlog_commit_wait_usec) to increase the chances of gathering a nice number. This strategy reduces I/O and fsyncs, and thus helps quite a bit with write scaling. The miniscule delay that a transaction might incur because it has to wait is easily offset by the overall better performance. It’s good stuff. For the purpose of parallel replication, any transactions in the same group commit can in principle be executed in parallel on a slave – conflicts are possible, so deadlock handling and retries are essential.
  2. Global Transaction IDs (GTID) Domain IDs (gtid_domain_id) in MariaDB 10.0, which an application can set within a connection. Quite often, different applications and different components of applications use the same database server, but their actions are completely independent: no write operations will ever conflict between the different applications. GTID Domain IDs allows us to tell the server about this, allowing it to always run those transactions in parallel even if they weren’t part of the same group commit! Now that’s a real bonus!

Now, as a practicality, we’re not always able to modify applications to for instance set the GTID Domain ID. Plus, a magic (integer) number is required and so we need some planning/coordination between completely independent applications! Through database server consolidation, you may get applications on your server that were previously on a different one – strictly speaking having two applications use the same GTID Domain ID is harmless (after all, by default all transactions run in the same domain!) but obviously it doesn’t improve performance.

Open Query engineer Daniel Black and I came up with the following. It’s a combination of MySQL’s init_connect system variable (gets called when a user connects, except if they have SUPER privilege), a few stored procedures, and an event to keep the domain map reasonably up-to-date. The premise of this implementation is that each database username uniquely identifies an application, and that no two usernames refer to the same application. So, if you have for instance a general application user but also one for background scripts or one with special administrative privileges, then you need to modify the code in setdomain() a bit to take this into account. If you have transactions with a different GTID Domain ID execute in parallel on the same database, obviously this can cause conflicts. The MariaDB slave threads will retry, but in some cases conflicts cannot be resolved by retrying.

Obviously it’s not perfect, but it does resolve the issue for many situations. Feedback and improvements welco

Truncated by Planet PHP, read more at the original (another 2144 bytes)

LinksRSS 0.92   RDF 1.
Atom Feed