Planet MariaDB

April 25, 2017

Peter Zaitsev

Percona Live 2017 Tutorials Day


Welcome to the first day of the Percona Live Open Source Database Conference: Percona Live 2017 tutorials day! While technically the first day of the conference, this day focused on providing hands-on tutorials for people interested in learning directly how to use open source tools and technologies.

Today attendees went to training sessions taught by open source database experts and got first-hand experience configuring, working with, and experimenting with various open source technologies and software.

The first full day (which includes opening keynote speakers and breakout sessions) starts Tuesday 4/25 at 9:00 am.

Some of the tutorial topics covered today were:

MySQL Performance Schema in Action

Sveta Smirnova, Alexander Rubin

Performance Schema in MySQL is becoming more mature from version to version. In version 5.7, it includes extended lock instrumentation, memory usage statistics, new tables for server variables, and first-ever instrumentation for user variables, prepared statements and stored routines. In this tutorial, Sveta and Alexander helped the participants try these new instruments out. They provided a test environment and a few typical problems that were hard to solve before MySQL 5.7.

Just a few examples: “Where is memory going?”, “Why do these queries hang?”, “How huge is the overhead of my stored procedures?”, “Why are queries waiting for metadata locks?”. Attendees learned how to collect and use this information.
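
For a flavor of the new 5.7 memory instrumentation this tutorial covered, here is a minimal sketch of a “where is memory going?” query (the table and columns are stock performance_schema; which allocations appear depends on the memory instruments enabled on your server):

-- Top instrumented memory consumers on this instance
SELECT event_name,
       current_number_of_bytes_used / 1024 / 1024 AS current_mb
FROM performance_schema.memory_summary_global_by_event_name
ORDER BY current_number_of_bytes_used DESC
LIMIT 10;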

Best Practices for MySQL High Availability in 2017

Colin Charles

The MySQL world is full of tradeoffs, and choosing a high availability (HA) solution is no exception. This session looked at the alternatives in an unbiased manner. Topics covered included, but weren’t limited to, MySQL replication, MHA, DRBD, Tungsten Replicator, Galera Cluster, NDB Cluster and Vitess. The focus of the talk was what is recommended today, and what to look out for.

InnoDB Architecture and Performance Optimization

Peter Zaitsev

InnoDB is the most commonly used storage engine for MySQL and Percona Server and is the focus for the majority of storage engine development by the MySQL and Percona Server teams. This tutorial looked at the InnoDB architecture, including new developments in MySQL 5.6 as well as Percona Server. It provided specific advice on server configuration, schema design, application architecture, and hardware choices.

MongoDB 101: What NoSQL is All About

Jon Tobin, Rick Golba, Barrett Chambers

MongoDB is quickly becoming one of the NoSQL standards, but represents a very different way of thinking from traditional RDBMSs. Many database users tend to think of things from the perspective of the transactional DBs that they know and love, but there are other ways of doing things. The Percona Solutions Engineering team helped attendees fill out their database resume and become a more knowledgeable user by showing them the basics. This tutorial gave users with little or no experience with NoSQL databases an introduction to MongoDB.

Join us tomorrow for the first full day of the Percona Live Open Source Database Conference 2017!

by Dave Avery at April 25, 2017 05:38 AM

April 24, 2017

Peter Zaitsev

Improved wsrep-stages and related instrumentation in Percona XtraDB Cluster


In this blog post, we’ll look at how we’ve improved wsrep-stages and related instrumentation in Percona XtraDB Cluster.

Introduction

When you execute a workload and need to find out what the given thread is working on, “SHOW PROCESSLIST” comes to the top of your mind. It is an effective way to track the thread status. We decided to improve the stages in Percona XtraDB Cluster to make “SHOW PROCESSLIST” more meaningful.

In the blog below, we will check out the different wsrep-stages and the significance associated with them.

Loading of data

Running a simple insert/sysbench prepare workload, the state is fairly static, as it mainly captures MySQL stages indicating that the table is being updated:

| 9 | root | localhost | test | Query | 0 | update | INSERT INTO sbtest3(id, k, c, pad) VALUES(893929,515608,'28459951974-62599818307-78562787160-7859397 | 0 | 0 |

Running UPDATE workload

Running a simple sysbench update-key workload. Let’s look at the different states that the user sees and their significance. (MASTER and SLAVE states are different and are marked accordingly.)

MASTER view:

  • This stage indicates that the write-set is trying to replicate. Global sequence numbers are assigned after the write-set is replicated and so the global-seqno is currently -1:

| 80 | root | localhost | test | Query | 0 | wsrep: initiating replication for write set (-1) | UPDATE sbtest4 SET k=k+1 WHERE id=502338 | 0 | 1 |

  • This stage indicates successful replication of the write-set. This means the write-set is now added to the group-channel. Global-seqno is updated in the message too:

| 79 | root | localhost | test | Query | 0 | wsrep: write set replicated (196575) | UPDATE sbtest3 SET k=k+1 WHERE id=502723 | 0 | 1 |

  • This stage indicates the write-set has successfully passed the certification stage (making its path clear for commit):

| 85 | root | localhost | test | Query | 0 | wsrep: pre-commit/certification passed (196574) | UPDATE sbtest7 SET k=k+1 WHERE id=495551 | 0 | 1 |

  • This stage indicates that InnoDB commit has been triggered for the write-set:

| 138 | root | localhost | test | Query | 0 | innobase_commit_low (585721) | UPDATE sbtest6 SET k=k+1 WHERE id=500551 | 0 | 1 |

SLAVE/Replicating node view:

  • This stage indicates that the slave thread is trying to commit the replicated write-set with the given seqno. It is likely waiting for its turn in the CommitMonitor:

|  6 | system user |           | NULL | Sleep   |    0 | wsrep: committing write set (224905) | NULL             |         0 |             0 |

  • This stage indicates a successful commit of the replicated write-set with the given seqno:

| 2 | system user | | NULL | Sleep | 0 | wsrep: committed write set (224896) | NULL | 0 | 0 |

  • This stage indicates that updating the rows is in progress. (It was often difficult to know what the workload was trying to do: UPDATE/INSERT/DELETE.) Now there is an easy way to find out:

| 13 | system user |           | NULL | Sleep   |    0 | wsrep: updating row for write-set (178711) | NULL             |         0 |             0 |

| 18 | system user | | NULL | Sleep | 0 | wsrep: writing row for write-set (793454) | NULL | 0 | 0 |

| 11 | system user | | NULL | Sleep | 0 | wsrep: deleting row for write-set (842123) | NULL | 0 | 0 |

  • This stage indicates that the given write-set is being applied:

| 10 | system user | | NULL | Sleep | 0 | wsrep: applying write-set (899370) | NULL | 0 | 0 |

Improved Instrumentation

Let’s answer some simple questions that most profiling experts will face:

  • How long did replication take (adding write-set to channel)?

mysql> select event_name, timer_wait/1000000 as time_in_mics from events_stages_history where event_name like '%in replicate%' order by time_in_mics desc limit 5;
+---------------------------------------+--------------+
| event_name | time_in_mics |
+---------------------------------------+--------------+
| stage/wsrep/wsrep: in replicate stage | 1.2020 |
| stage/wsrep/wsrep: in replicate stage | 0.7880 |
| stage/wsrep/wsrep: in replicate stage | 0.7740 |
| stage/wsrep/wsrep: in replicate stage | 0.7550 |
| stage/wsrep/wsrep: in replicate stage | 0.7480 |
+---------------------------------------+--------------+
5 rows in set (0.01 sec)

  • How long did it take for pre-commit/certification checks?

mysql> select event_name, timer_wait/1000000 as time_in_mics from events_stages_history where event_name like '%in pre-commit%' order by time_in_mics desc limit 5;
+----------------------------------------+--------------+
| event_name | time_in_mics |
+----------------------------------------+--------------+
| stage/wsrep/wsrep: in pre-commit stage | 1.3450 |
| stage/wsrep/wsrep: in pre-commit stage | 1.0000 |
| stage/wsrep/wsrep: in pre-commit stage | 0.9480 |
| stage/wsrep/wsrep: in pre-commit stage | 0.9180 |
| stage/wsrep/wsrep: in pre-commit stage | 0.9030 |
+----------------------------------------+--------------+
5 rows in set (0.01 sec)

  • How long did it take to commit a transaction on the slave (slave_thread=16)?

mysql> select thread_id, event_name, timer_wait/1000000 as time_in_mics from events_stages_history where event_name like '%committing%' order by time_in_mics desc limit 5;
+-----------+-------------------------------+--------------+
| thread_id | event_name | time_in_mics |
+-----------+-------------------------------+--------------+
| 56 | stage/wsrep/wsrep: committing | 0.5870 |
| 58 | stage/wsrep/wsrep: committing | 0.5860 |
| 47 | stage/wsrep/wsrep: committing | 0.5810 |
| 59 | stage/wsrep/wsrep: committing | 0.5740 |
| 60 | stage/wsrep/wsrep: committing | 0.5220 |
+-----------+-------------------------------+--------------+
5 rows in set (0.00 sec)

  • Increasing the number of slave threads creates more contention (slave_thread=64):

mysql> select thread_id, event_name, timer_wait/1000000 as time_in_mics from events_stages_history where event_name like '%committing%' order by time_in_mics desc limit 5;
+-----------+-------------------------------+--------------+
| thread_id | event_name | time_in_mics |
+-----------+-------------------------------+--------------+
| 90 | stage/wsrep/wsrep: committing | 1.6930 |
| 97 | stage/wsrep/wsrep: committing | 1.5870 |
| 103 | stage/wsrep/wsrep: committing | 1.5140 |
| 87 | stage/wsrep/wsrep: committing | 1.2560 |
| 102 | stage/wsrep/wsrep: committing | 1.1040 |
+-----------+-------------------------------+--------------+
5 rows in set (0.00 sec)

  • The time taken to apply a write-set:

mysql> select thread_id, event_name, timer_wait/1000000 as time_in_mics from events_stages_history where event_name like '%applying%' order by time_in_mics desc limit 5;
+-----------+---------------------------------------+--------------+
| thread_id | event_name | time_in_mics |
+-----------+---------------------------------------+--------------+
| 166 | stage/wsrep/wsrep: applying write set | 1.6880 |
| 168 | stage/wsrep/wsrep: applying write set | 1.5820 |
| 146 | stage/wsrep/wsrep: applying write set | 1.5270 |
| 124 | stage/wsrep/wsrep: applying write set | 1.4760 |
| 120 | stage/wsrep/wsrep: applying write set | 1.4440 |
+-----------+---------------------------------------+--------------+
5 rows in set (0.00 sec)

Conclusion

The improved wsrep-stage framework makes it easier for a user to find out the state of a given thread. Using the instrumentation exposed through wsrep-stages is a good way to understand where the time is being spent.

by Krunal Bauskar at April 24, 2017 09:22 PM

Percona XtraDB Cluster: SST tmpdir option


In this blog post, I’ll discuss some changes to the behavior of the Percona XtraDB Cluster SST tmpdir option in the latest versions of Percona XtraDB Cluster 5.6.35-26.20-3 and 5.7.17-29.30.

Previously, we did not use the path specified by tmpdir. Starting with Percona XtraDB Cluster 5.6.35-26.20-3 and 5.7.17-29.20, the tmpdir option specifies the location of temporary files used by the SST (on the DONOR this may be very large, since the backup logs may be stored here).

Specifying the tmpdir works as expected: you supply the directory path, as in the following example:

[sst]
tmpdir=/path/to/tmp/dir/

We look for the tmpdir in the [sst], [xtrabackup], and [mysqld] sections, in that order. If left unspecified, “mktemp” follows the default behavior (on Linux distributions this will typically create the directory in $TMPDIR, followed by /tmp).

Normal security settings and permissions apply here. The directory must already exist and be readable and writable by MySQL; otherwise, a fatal error will be generated by the SST script.
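
For example, a hedged sketch of pre-creating such a directory (the path, owner and mode are assumptions; match them to how your mysqld runs):

# Create the SST temporary directory and make it usable by the mysql user
mkdir -p /path/to/tmp/dir/
chown mysql:mysql /path/to/tmp/dir/
chmod 750 /path/to/tmp/dir/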

by Kenn Takara at April 24, 2017 08:46 PM

Jean-Jerome Schmidt

Webinar Replay and Q&A: High Availability in ProxySQL for HA MySQL infrastructures

Thanks to everyone who participated in our recent webinar on High Availability in ProxySQL and on how to build a solid, scalable and manageable proxy layer using ProxySQL for highly available MySQL infrastructures.

This second joint webinar with ProxySQL creator René Cannaò saw lots of interest and some nice questions from our audience, which we’re sharing below with this blog post as well as the answers to them.

Building a highly available proxy layer creates additional challenges, such as how to manage multiple proxy instances, how to ensure that their configuration is in sync, Virtual IP and fail-over; and more, which we’ve covered in this webinar with René. And we demonstrated how you can make your ProxySQL highly available when deploying it from ClusterControl (download & try it free).

If you missed the webinar, would like to watch it again or browse through the slides, it is available for viewing online.

Watch the webinar replay

Webinar Questions & Answers

Q.: In a MySQL master/slave pair, I am inclined to deploy ProxySQL instances directly on both master and slave hosts. In an environment of 100s of master/slave pairs, with new hosts being built all the time, I can see this as a good way to combine host / MySQL / ProxySQL master/slave pair deploys via a single Ansible playbook. Do you guys have any thoughts on this?

A.: Our only concern here is that co-locating ProxySQL with database servers can make the debugging of database performance issues harder - the proxy will add overhead for CPU and memory and MySQL may have to compete for those resources.

Additionally, we’re not really sure what you’d like to achieve by deploying ProxySQL on all database servers - where would you like to connect? To one instance or to both? In the first case, you’d have to come up with a solution to handling potentially hundreds of failovers - when a master goes down, you’d have to re-route traffic to the ProxySQL instance on a slave. It adds more complexity than it’s really worth. The second case also creates complexity: instead of connecting to one proxy, the application would have to connect to both.

Co-locating ProxySQL on the application hosts is not that much more complex regarding configuration management than deploying it on database hosts. Yet it makes it so much easier for the application to route traffic - just connect to the local ProxySQL instance over the UNIX socket and that’s all.
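
As an illustration, the application-side connection then reduces to something like the following (the user name and socket path are assumptions; use whatever your ProxySQL mysql-ifaces setting exposes):

# Connect to the co-located ProxySQL instance over its UNIX socket
mysql -u app_user -p -S /tmp/proxysql.sock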

Q.: Do you recommend for multiple ProxySQL instances to talk to each other or is it preferable for config changes to rely on each ProxySQL instance detecting the same issue at the same time? For example, would you make ProxySQL01 apply config changes in proxysql_master_switchover.sh to both itself and ProxySQL02 to ensure they stay the same? (I hope this isn't a stupid question... I've not yet succeeded in making this work so I thought maybe I'm missing something!)

A.: This is a very good question indeed. As long as you have scripts which would ensure that the configuration is the same on all of the ProxySQL instances - it should result in more consistent configuration across the whole infrastructure.

Q.: Sometimes I get the following warning 2017-04-04T02:11:43.996225+02:00 Keepalived_vrrp: Process [113479] didn't respond to SIGTERM. and VIP was moved to another server ... I can send you the complete configuration keepalived ... I didn't find a solution as to why I am getting this error/warning.

A.: This may happen from time to time. Timeout results in a failed check which triggers VIP failover. And as to why the monitored process didn't respond to signal in time, that’s really hard to tell. It is possible to increase the number of health-check fails required to trigger a VIP move to minimize the impact of such timeouts.
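
For illustration, a hypothetical keepalived vrrp_script block along those lines; the check command, interval and thresholds are assumptions to adapt to your environment:

vrrp_script chk_proxysql {
    script "killall -0 proxysql"   # check fails if no proxysql process answers the signal
    interval 2                     # run the check every 2 seconds
    fall 4                         # require 4 consecutive failures before marking FAULT
    rise 2                         # require 2 consecutive successes before recovering
}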

Q.: What load balancer can we use in front of ProxySQL?

A.: You can use virtually any load balancer out there, including ProxySQL itself - this is actually a topology we’d suggest. It’s better to rely on a single piece of software than to use ProxySQL plus another tool that would be redundant: a steeper learning curve and more issues to debug.

Q.: When I started using ProxySQL I had this issue "access denied for MySQL user"; it was random, what is the cause of it?

A.: If it is random and not systematic, it may be worth investigating whether it is a bug. We strongly recommend opening an issue on GitHub.

Q.: I have tried ProxySQL and the issue we faced was that after using ProxySQL to split read/write , the connection switched to Master for all reads. How can we prevent the connection?

A.: This is most likely a configuration issue, and there are multiple reasons why this may happen. For example, if transaction_persistent was set to 1 and reads were all within a transaction. Or perhaps the query rules in mysql_query_rules weren’t configured correctly, and all traffic was being sent to the default hostgroup (the master).
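
As a reference point, here is a minimal sketch of a typical read/write split on the ProxySQL admin interface (the hostgroup numbers, rule_ids and username are assumptions, not a recommendation for your setup):

-- transaction_persistent=1 pins all statements inside a transaction to the
-- writer; set it to 0 only if your application tolerates reads inside
-- transactions going to the readers.
UPDATE mysql_users SET transaction_persistent = 0 WHERE username = 'app';
-- Route SELECT ... FOR UPDATE to the writer hostgroup (10), other SELECTs to
-- the readers (20); everything else falls through to the default hostgroup.
INSERT INTO mysql_query_rules (rule_id, active, username, match_digest, destination_hostgroup, apply)
VALUES (100, 1, 'app', '^SELECT .* FOR UPDATE', 10, 1),
       (200, 1, 'app', '^SELECT ', 20, 1);
LOAD MYSQL USERS TO RUNTIME;
LOAD MYSQL QUERY RULES TO RUNTIME;
SAVE MYSQL USERS TO DISK;
SAVE MYSQL QUERY RULES TO DISK;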

Q.: How can Service Discovery help me?

A.: If your infrastructure is constantly changing, tools like etcd, Zookeeper or Consul can help you to track those changes, detect and push configuration changes to proxies. When your database clusters are going up and down, this can simplify configuration management.

Q.: In the discussion on structure, the load balancer scenario was quickly moved on from because of its single point of failure. How about when having a HA load balancer using CNAMES (not IP) for example AWS ElasticLoadBalancer on TCP ports. Would that be a structure that could work well in production?

A.: As long as the load balancer is highly available, this is not a problem, because it’s not a single point of failure. ELB itself is deployed in HA mode, so having a single ELB in front of anything (database servers, a pool of ProxySQL instances) will not introduce a single point of failure.

Q.: Don't any of the silo approaches have a single point of failure in the proxy that is fronting the silo?

A.: Indeed, although it is not a single point of failure - it’s more like multiple points of failure introduced in the infrastructure. If we are talking about huge infrastructure of hundreds or thousands of proxies, a loss of very small subset of application hosts should be acceptable. If we are talking about smaller setups, it should be manageable to have a ProxySQL per application host setup.

Watch the webinar replay

by krzysztof at April 24, 2017 10:06 AM

April 23, 2017

Peter Zaitsev

Percona XtraDB Cluster: “dh key too small” error during an SST using SSL


If you’ve tried to use SSL in Percona XtraDB Cluster and saw an error in the logs like SSL3_CHECK_CERT_AND_ALGORITHM:dh key too small, we’ve implemented some changes in Percona XtraDB Cluster 5.6.34 and 5.7.16 that get rid of these errors.

Some background

dh key too small refers to the Diffie-Hellman parameters used by the SSL code that are shorter than recommended.

Due to the Logjam vulnerability (https://weakdh.org/), the required key-lengths for the Diffie-Hellman parameters were changed from 512 bits to 2048 bits. Unfortunately, older versions of OpenSSL/socat still use 512 bits (and thus caused the error to appear).

Changes made to Percona XtraDB Cluster

Since versions of socat greater than 1.7.3 now use 2048 bits for the Diffie-Hellman parameters, we only do extra work for the older versions of socat (less than 1.7.3). The SST code now:

  1. Looks for a file with the DH params
    1. Uses the “ssl_dhparams” option in the [sst] section if it exists
    2. Looks for a “dhparams.pem” file in the datadir
  2. If the file is specified and exists, uses that file as a source for the DH parameters
  3. If the file does not exist, creates a dhparams.pem file in the datadir

Generating the dhparams yourself

Unfortunately, it can take several minutes to create the dhparams file. We recommend that the dhparams.pem be created prior to starting the SST:

openssl dhparam -out path/to/datadir/dhparams.pem 2048
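
If you generate the file somewhere other than the datadir, you can point the SST script at it explicitly via the option mentioned above (the path below is an assumption):

[sst]
ssl_dhparams=/path/to/dhparams.pem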

by Kenn Takara at April 23, 2017 04:16 PM

Percona XtraDB Cluster Transaction Replay Anomaly


In this blog post, we’ll look at a transaction replay anomaly in Percona XtraDB Cluster.

Introduction

Percona XtraDB Cluster/Galera replays a transaction if the data is non-conflicting but the transaction happens to have conflicting locks.

Anomaly

Let’s understand this with an example:

  • Let’s assume a two-node cluster (node-1 and node-2)
  • Base table “t” is created as follows:

create database test;
use test;
create table t (i int, c char(20), primary key pk(i)) engine=innodb;
insert into t values (1, 'abc'), (2, 'abc'), (4, 'abc');
select * from t;
mysql> select * from t;
+---+------+
| i | c |
+---+------+
| 1 | abc |
| 2 | abc |
| 4 | abc |
+---+------+

  • node-2 starts running a transaction (trx-2):

trx-2: update t set c = 'pqr';

  • node-2 creates a write-set and is just about to replicate it. At the same time, node-1 executes the following transaction (trx-1), and is first to add it to the group-channel (before node-2 adds transaction (trx-2))

trx-1: insert into t values (3, 'a');

  • trx-1 is replicated on node-2, and it proceeds with the apply action. Since there is a lock conflict (no certification conflict), node-2 local transaction (trx-2) is aborted and scheduled for replay.
  • trx-1 causes the addition of (3, ‘a’), and then the node-2 transaction is REPLAYed.
  • REPLAY is done using the pre-created write-set that only modifies existing entries (1,2,4).

End-result:

mysql> select * from t;
+---+------+
| i | c |
+---+------+
| 1 | pqr |
| 2 | pqr |
| 3 | a |
| 4 | pqr |
+---+------+

  • At first, nothing looks wrong. If you look closely, however, the REPLAYed transaction “UPDATE t set c = ‘pqr’” is last to commit, but its effect is not fully seen: row (3, ‘a’) still has ‘a’ instead of ‘pqr’.

| mysql-bin.000003 | 792 | Gtid | 2 | 857 | SET @@SESSION.GTID_NEXT= '6706fa1f-e3df-ee18-6621-c4e0bae533bd:4' |
| mysql-bin.000003 | 857 | Query | 2 | 925 | BEGIN |
| mysql-bin.000003 | 925 | Table_map | 2 | 972 | table_id: 219 (test.t) |
| mysql-bin.000003 | 972 | Write_rows | 2 | 1014 | table_id: 219 flags: STMT_END_F |
| mysql-bin.000003 | 1014 | Xid | 2 | 1045 | COMMIT /* xid=4 */ |
| mysql-bin.000003 | 1045 | Gtid | 3 | 1110 | SET @@SESSION.GTID_NEXT= '6706fa1f-e3df-ee18-6621-c4e0bae533bd:5' |
| mysql-bin.000003 | 1110 | Query | 3 | 1187 | BEGIN |
| mysql-bin.000003 | 1187 | Table_map | 3 | 1234 | table_id: 219 (test.t) |
| mysql-bin.000003 | 1234 | Update_rows | 3 | 1324 | table_id: 219 flags: STMT_END_F |
| mysql-bin.000003 | 1324 | Xid | 3 | 1355 | COMMIT /* xid=5 */ |
+------------------+------+----------------+-----------+-------------+---------------------------------------------------------------------------------+
21 rows in set (0.00 sec)

  • We have used a simple char string, but if there were a constraint here, say that c should have value X after the UPDATE is complete, then the constraint would be violated even though the application reports the UPDATE as a success.
  • It is interesting to note what happens on node-1:
    • node-1 applies the local transaction (trx-1) and then gets the replicated write-set from node-2 (trx-2) that has changes only for (1,2,4). Thereby data consistency is not compromised.

by Krunal Bauskar at April 23, 2017 04:05 PM

BEWARE: Increasing fc_limit can affect SELECT latency


In this blog post, we’ll look at how increasing the fc_limit can affect SELECT latency.

Introduction

Recent Percona XtraDB Cluster optimizations have exposed fc_limit contention. It was always there, but was never exposed as the Commit Monitor contention was more significant. As it happens with any optimization, once we solve the bigger contention issues, smaller contention issues start popping up. We have seen this pattern in InnoDB, and Percona XtraDB Cluster is no exception. In fact, it is good because it tells us that we are on the right track.

If you haven’t yet checked the performance blogs, then please visit here and here.

What is FC_LIMIT?

Percona XtraDB Cluster has the concept of Flow Control. If any member of the cluster (not garbd) is unable to match the apply speed with the replicated write-set speed, then the queue builds up. If this queue crosses some threshold (dictated by gcs.fc_limit), then flow control kicks in. Flow control causes members of the cluster to temporarily halt or slow down so that the slower node can catch up.

The user can, of course, disable this by setting wsrep_desync=1 on the slower node, but make sure you understand the effect of doing so. Unless you have a good reason, you should avoid setting it.

mysql> show status like 'wsrep_flow_control_interval';
+-----------------------------+------------+
| Variable_name | Value |
+-----------------------------+------------+
| wsrep_flow_control_interval | [ 16, 16 ] |
+-----------------------------+------------+
1 row in set (0.01 sec)

Increasing fc_limit

Until recently, the default fc_limit was 16 (starting with Percona XtraDB Cluster 5.7.17-29.20, the default is 100). This worked until now, since Percona XtraDB Cluster failed to scale and rarely hit the limit of 16. With new optimizations, Percona XtraDB Cluster nodes can process more write-sets in a given time period, and thereby can replicate more write-sets (anywhere in the range of three to ten times). Of course, the replicating/slave nodes are also performing at a higher speed. But depending on the slave threads, it is easy to start hitting this limit.

So what is the solution?

  • Increase fc_limit from 16 to something really big. Say 1600.

Is this correct?

YES and NO.

Why YES?

  • If you don’t care about the freshness of data on the replicated nodes, then increasing the limit to a higher value is not an issue. Setting it to, say, 10K means the replicating node may be holding 10K write-sets still to apply, and a SELECT fired during this time will not see the changes from those 10K write-sets.
  • But if you insist on having fresh data, then Percona XtraDB Cluster has a solution for this (set wsrep_sync_wait=7).
  • Setting wsrep_sync_wait places the SELECT request in a queue that is serviced only after the replicated write-sets that existed at the point when the SELECT was fired have been applied (a small sketch follows below). If the queue has 8K write-sets, then the SELECT is placed at position 8K+1. As the queue progresses, the SELECT gets serviced only when all those 8K write-sets are done. This dramatically increases SELECT latency and can cause monitoring alarms to fire.
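
A minimal sketch of enabling this causality check on a replicating node (session scope shown; it can also be set globally or in my.cnf, and the table comes from the experiment below):

-- With wsrep_sync_wait=7, reads and writes wait until the write-sets already
-- replicated at the moment they are issued have been applied locally.
SET SESSION wsrep_sync_wait = 7;
SELECT SUM(k) FROM sbtest1 WHERE id > 5000 AND id < 50000;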

Why NO?

  • For the reason mentioned above, we feel it is not a good idea to increase the fc_limit beyond some value unless you don’t care about data freshness and in turn don’t care to set wsrep_sync_wait.
  • We did a small experiment with the latest Percona XtraDB Cluster release to understand the effects.

- Started 2 node cluster.
- Fired 64-threads workload on node-1 of the cluster.
- node-2 is acting as replicating slave without any active workload.
- Set wsrep_sync_wait=7 on node-2 to ensure data-freshness.
Using default fc_limit (= 16)
-----------------------------
mysql> select sum(k) from sbtest1 where id > 5000 and id < 50000;
+-------------+
| sum(k) |
+-------------+
| 22499552612 |
+-------------+
1 row in set (0.03 sec)
Increasing it from 16 -> 1600
-----------------------------
mysql> set global wsrep_provider_options="gcs.fc_limit=1600";
Query OK, 0 rows affected (0.00 sec)
mysql> select sum(k) from sbtest1 where id > 5000 and id < 50000;
+-------------+
| sum(k) |
+-------------+
| 22499552612 |
+-------------+
1 row in set (0.46 sec)
That is a whopping 15x increase in SELECT latency.
Increasing it even further (1600 -> 25000)
-------------------------------------------
mysql> set global wsrep_provider_options="gcs.fc_limit=25000";
Query OK, 0 rows affected (0.00 sec)
mysql> select sum(k) from sbtest1 where id > 5000 and id < 50000;
+-------------+
| sum(k) |
+-------------+
| 22499552612 |
+-------------+
1 row in set (7.07 sec)

Note: wsrep_sync_wait=7 will enforce the check for all DMLs (INSERT/UPDATE/DELETE/SELECT). We highlighted the SELECT example, as that is the most immediately concerning. But latency for other DMLs also increases for the same reasons mentioned above.

Conclusion

Let’s conclude with the following observation:

  • Avoid increasing fc_limit to an insanely high value as it can affect SELECT latency (if you are running a SELECT session with wsrep_sync_wait=7 for data freshness).

by Krunal Bauskar at April 23, 2017 02:22 AM

April 22, 2017

Peter Zaitsev

Better Than Linear Scaling

Scalability

In this blog, we’ll look at how to achieve better-than-linear scaling.

Scalability is the capability of a system, network or process to handle a growing amount of work, or its potential to be enlarged to accommodate that growth. For example, we consider a system scalable if it is capable of increasing its total output under an increased load when resources (typically hardware) are added: https://en.wikipedia.org/wiki/Scalability.

It is often accepted as a fact that systems (in particular databases) can’t scale better than linearly. By this I mean when you double resources, the expected performance doubles, at best (and often is less than doubled).  

We can attribute this assumption to Amdahl’s law (https://en.wikipedia.org/wiki/Amdahl%27s_law), and later to the Universal Scalability Law (http://www.perfdynamics.com/Manifesto/USLscalability.html). Both these laws prescribe that it is impossible to achieve better than linear scalability. To be totally precise, this is practically correct for single server systems when the added resources are only CPU units.

Multi-nodes systems

However, I think database systems should no longer be seen as single-server systems. MongoDB and Cassandra have had multi-node auto-sharding capabilities for a long time. We are about to see the rise of strongly-consistent, SQL-based multi-node systems. And even MySQL is frequently deployed with manual sharding across multiple nodes.

Products like Vitess (http://vitess.io/) propose auto-sharding for MySQL, and with ProxySQL (which I will use in my experiment) you can set up a basic sharding scheme.

I describe multi-node setups because, in this environment, it is possible to achieve much better than linear scalability. I will show this below.

Why is this important?

Understanding scalability of multi-node systems is important for resource planning, and understanding how much of a potential performance gain we can expect when we add more nodes. This is especially interesting for cloud deployments.

How is it possible?

I’ve written about how the size of available memory (cache) affects the performance. When we add additional nodes to the deployment, effectively we increase not only CPU cores, but also the memory that comes with the node (and we are adding extra IO capacity). So, with increasing node counts, we also increase available memory (and cache). As we can see from these graphs, the effect of extra memory could be non-linear (and actually better than linear). Playing on this fact, we can achieve better-than-linear scaling in a sharded setup. I am going to show the experimental setup of how to achieve this.

Experimental setup

To show the sharded setup we will use ProxySQL in front of N MySQL servers (shards). We also will use sysbench with 60 tables (4 million rows each, uniform distribution).

  • For one shard, this shard contains all 60 tables
  • For two shards, each shard contains 30 tables
  • For three shards, each shard contains 20 tables
  • For six shards, each shard contains ten tables

So schematically, it looks like this:

One shard:

Scaling

Two shards:

Scaling

Six shards:

Scaling

We want to measure how the performance (for both throughput and latency) changes when we go from 1 to 2, to 3, to 4, to 5 and to 6 shards.

For the single shard, I used a Google Cloud instance with eight virtual CPUs and 16GB of RAM, where 10GB is allocated for the innodb_buffer_pool_size.

The database size (for all 60 tables) is about 51GB for the data, and 7GB for indexes.

For this we will use a sysbench read-only uniform workload, and ProxySQL helps to perform query routing. We will use ProxySQL query rules, and set sharding as:

mysql -u admin -padmin -h 127.0.0.1 -P6032 -e "DELETE FROM mysql_query_rules"
shards=$1
for i in {1..60}
do
  # Distribute the 60 tables round-robin across the shard hostgroups
  hg=$(( $i % $shards + 1 ))
  # Match "sbtest<N> " with a trailing space so that sbtest1 does not also catch sbtest10..sbtest19
  mysql -u admin -padmin -h 127.0.0.1 -P6032 -e "INSERT INTO mysql_query_rules (rule_id,active,username,match_pattern,destination_hostgroup,apply) VALUES ($i,1,'root','sbtest${i} ',$hg,1);"
done
mysql -u admin -padmin -h 127.0.0.1 -P6032 -e "LOAD MYSQL QUERY RULES TO RUNTIME;"

Command line for sysbench 1.0.4:
sysbench oltp_read_only.lua --mysql-socket=/tmp/proxysql.sock --mysql-user=root --mysql-password=test --tables=60 --table-size=4000000 --threads=60 --report-interval=10 --time=900 --rand-type=pareto run

The results

Nodes   Throughput   Speedup vs. 1 node   Latency (ms)
1          245              1.00              244.88
2          682              2.78               87.95
3         1659              6.77               36.16
4         2748             11.22               21.83
5         3384             13.81               17.72
6         3514             14.34               17.07

Scaling
As we can see, the performance improves much faster than linearly.

With five nodes, the improvement is 13.81 times compared to the single node.

The 6th node does not add much benefit, as by that point the data practically fits into memory (with five nodes, the total cache size is 50GB, compared to the 51GB data size).

Factors that affects multi-node scaling

How can we model/predict the performance gain? There are multiple factors to take into account: the size of the active working set, the available memory size and (also importantly) the distribution of access across the working set (with uniform distribution being the best-case scenario, and access concentrated on a single row being the opposite corner case, where speedup is impossible). We also need to keep network speed in mind: if we come close to using all available network bandwidth, it will be impossible to get significant improvement.

Conclusion

In multi-node, auto-scaling, auto-sharding distributed systems, the traditional scalability models do not provide much help. We need to have a better framework to understand how multiple nodes affect performance.

by Vadim Tkachenko at April 22, 2017 10:54 PM

MariaDB Foundation

MariaDB 10.3.0 Alpha, 5.5.55 Stable, and Connector/J 2.0.0 RC now available

The MariaDB project is pleased to announce the immediate availability of MariaDB 5.5.55 Stable (GA), MariaDB Galera Cluster 5.5.55 Stable (GA) and MariaDB Connector/J 2.0.0 RC. See the release notes and changelogs for details. Download MariaDB 5.5.55 Release Notes Changelog What is MariaDB 5.5? MariaDB APT and YUM Repository Configuration Generator Download MariaDB Galera Cluster […]

The post MariaDB 10.3.0 Alpha, 5.5.55 Stable, and Connector/J 2.0.0 RC now available appeared first on MariaDB.org.

by Daniel Bartholomew at April 22, 2017 04:03 PM

April 21, 2017

Peter Zaitsev

Percona Monitoring and Management 1.1.3 is Now Available


Percona announces the release of Percona Monitoring and Management 1.1.3 on April 21, 2017.

For installation instructions, see the Deployment Guide.

This release includes several new graphs in dashboards related to InnoDB and MongoDB operation, as well as smaller fixes and improvements.

New in PMM Server

  • PMM-649: Added the InnoDB Page Splits and InnoDB Page Reorgs graphs to the MySQL InnoDB Metrics Advanced dashboard.
  • Added the following graphs to the MongoDB ReplSet dashboard:
    • Oplog Getmore Time
    • Oplog Operations
    • Oplog Processing Time
    • Oplog Buffered Operations
    • Oplog Buffer Capacity
  • Added descriptions for graphs in the following dashboards:
    • MongoDB Overview
    • MongoDB ReplSet
    • PMM Demo

Changes in PMM Client

  • PMM-491: Improved pmm-admin error messages.
  • PMM-523: Added the --verbose option for pmm-admin add.
  • PMM-592: Added the --force option for pmm-admin stop.
  • PMM-702: Added the db.serverStatus().metrics.repl.executor stats to mongodb_exporter. These new stats will be used for graphs in future releases.
  • PMM-731: Added real-time checks to pmm-admin check-network output.
  • The following commands no longer require connection to PMM Server:
    • pmm-admin start --all
    • pmm-admin stop --all
    • pmm-admin restart --all
    • pmm-admin show-passwords

    NOTE: If you want to start, stop, or restart a specific service, connection to PMM Server is still required.

About Percona Monitoring and Management

Percona Monitoring and Management is an open-source platform for managing and monitoring MySQL and MongoDB performance. Percona developed it in collaboration with experts in the field of managed database services, support and consulting.

PMM is a free and open-source solution that you can run in your own environment for maximum security and reliability. It provides thorough time-based analysis for MySQL and MongoDB servers to ensure that your data works as efficiently as possible.

A live demo of PMM is available at pmmdemo.percona.com.

Please provide your feedback and questions on the PMM forum.

If you would like to report a bug or submit a feature request, use the PMM project in JIRA.

by Alexey Zhebel at April 21, 2017 10:32 PM

Percona Server for MySQL in Docker Swarm with Secrets

This quick post demonstrates using Percona Server for MySQL in Docker Swarm with some new authentication provisioning practices.

Some small changes to the startup script for the Percona-Server container image allow us to specify a file that contains the password value to set as our root user’s secret. “Why do we need this functionality?” I hear you cry. When we use an environment variable, it’s not terribly hard to locate the value someone has set as their database root password. Environment variables are not well suited for sensitive data, and we preach against leaving important passwords in easy-to-reach places. So moving towards something more secure whilst retaining usability is desirable. I’ll detail the current methods and their problems, and finish off with Docker Secrets, which, in my opinion, is the right direction to be heading.

Environment Variables

I’ll elaborate on the main reason why we would want to change from the default given behavior. In the documentation for using the MySQL/Percona and MariaDB containers, we are invited to start containers with an environment variable to control what the instance’s root password is set as upon startup. Let’s demonstrate with the latest official Percona-Server image from Percona’s repository of images on the Docker Hub registry:

moore@chom:~$ docker pull percona/percona-server:latest
latest: Pulling from percona/percona-server
e12c678537ae: Already exists
65ab4b835640: Pull complete
f63269a127d1: Pull complete
757a4fef28b8: Pull complete
b0cb547a5105: Pull complete
4214179ba9ea: Pull complete
155dafd2fd9c: Pull complete
848020b1da10: Pull complete
771687fe7e8b: Pull complete
Digest: sha256:f3197cac76cccd40c3525891ce16b0e9f6d650ccef76e993ed7a22654dc05b73
Status: Downloaded newer image for percona/percona-server:latest

Then start a container:

moore@chom:~$ docker run -d \
  --name percona-server-1 \
  -e MYSQL_ROOT_PASSWORD='secret' \
  percona/percona-server
d08f299a872f1408c142b58bc2ce8e59004acfdb26dca93d71f5e9367b4f2a57
moore@chom:~$ docker ps
CONTAINER ID        IMAGE                            COMMAND             CREATED             STATUS              PORTS               NAMES
d08f299a872f        percona/percona-server           "/entrypoint.sh "   32 seconds ago      Up 31 seconds       3306/tcp            percona-server-1

Looks good, eh? Let’s inspect this container a little closer to reveal why this method is flawed:

moore@chom:~$ docker inspect --format '{{ index (index .Config.Env) 0}}' percona-server-1
MYSQL_ROOT_PASSWORD=secret

*facepalm*

We don’t want the root password exposed here, not really. If we wanted to use this method in docker-compose files, we would also be storing passwords inline, which isn’t considered a secure practice.

Environment File

Another approach is to use an environment file. This is simply a file that we can provide docker run or docker-compose in order to instantiate the environment variables within the container. It’s a construct for convenience. So just to illustrate that we have the same problem, the next example uses the mechanism of an environment file for our database container:

moore@chom:~$ echo 'MYSQL_ROOT_PASSWORD=secret' > /tmp/ps.env
moore@chom:~$ docker run -d --name percona-server-2 --env-file=/tmp/ps.env percona/percona-server
d5105d044673bd5912e0e29c2f56fa37c5f174d9d2a4811ceaba284092837c84
moore@chom:~$ docker inspect --format '{{ index (index .Config.Env) 0}}' percona-server-2
MYSQL_ROOT_PASSWORD=secret
NOTE: shortly after starting this container failed because we didn't provide mysql root password options

While we’re not specifying it in our docker run command or our docker-compose.yml file, the password value remains on our filesystem within the environment file. Again, not ideal.

Password File

Using a password file obscures the password from the inspect output. Let’s roll through the steps we would use to leverage this new option. With our new Percona-Server image, we’re going to start a container, but first let’s create an arbitrary file containing our desired password:

moore@chom:~$ docker:cloud> echo "secret" > /tmp/mysql_pwd_file

Now start a container where we’re going to bind mount the file, and use our new environment variable to point to it:

moore@chom:~$ docker run -v /tmp/mysql_pwd_file:/tmp/mysqlpwd --name percona-secret \
  -e MYSQL_ROOT_PASSWORD_FILE=/tmp/mysqlpwd percona/percona-server:latest

With the same inspect command, let’s show that there’s no snooping on our password value:

moore@chom:~$ docker inspect --format '{{ index (index .Config.Env) 0}}' percona-secret
MYSQL_ROOT_PASSWORD_FILE=/tmp/mysqlpwd

We are revealing the path where our password was read from within the container. For those eagle-eyed readers, this file was just a bind mounted file in the docker run command, and it’s still on the host’s filesystem.

moore@chom:~$ cat /tmp/mysql_pwd_file
secret
moore@chom:~$ docker exec percona-secret cat /tmp/mysqlpwd
secret

Not perfect, because we need to have that file available on all of our Docker hosts, but it works and we’re closer to a more robust solution.

Docker Secrets

The main reason for the new environment variable is to leverage the Docker Secrets feature. Since Docker version 1.13 (17.03 is now GA), we have the Docker Secrets feature; however, it’s only available in the Docker Swarm workflow. If you’re not already working with Docker Swarm mode, I can’t recommend it enough. It’s part of Docker Engine, simple to get started with, and intuitive, and since 1.13 it is compatible with docker-compose files. You don’t need a cluster of hardware; it’s entirely valid to use Docker Swarm on a single node. This allows you to test in your local environment with ease.

I won’t waste pixels explaining what’s already well documented in official channels, but in summary: Docker secrets is a new feature that allows us to keep sensitive information out of source code and configuration files. Secrets are stored in the Raft log which is encrypted and replicated throughout the Docker Swarm cluster. The protection and distribution come for free out of the box, which is a fantastic feature if you ask me.

So, let’s create a Docker Secret. Please note that I’ve moved to my Docker Swarm installation for this next part:

moore@chom:~$ docker:cloud> docker info | egrep -i 'swarm|version'
Server Version: 17.03.0-ce
Swarm: active

Operating as a swarm manager we have the ability to create a new secret to serve as our root user’s password:

moore@chom:~$ docker:cloud> echo "{secret_string}" | docker secret create mysql_root_password -
ugd8dx0kae9hbyt4opbolukgi

We can list all of our existing secrets:

moore@chom:~$ docker:cloud> docker secret ls
ID                          NAME                  CREATED                  UPDATED
ugd8dx0kae9hbyt4opbolukgi   mysql_root_password   Less than a second ago   Less than a second ago

Now that our secret has been created, it is obscured from us. We are unable to see its value.

moore@chom:~$ docker secret inspect mysql_root_password
[
    {
        "ID": "ugd8dx0kae9hbyt4opbolukgi",
        "Version": {
            "Index": 905780
        },
        "CreatedAt": "2017-04-11T23:33:08.118037434Z",
        "UpdatedAt": "2017-04-11T23:33:08.118037434Z",
        "Spec": {
            "Name": "mysql_root_password"
        }
    }
]

Now we can use our secret to set our authentication for the MySQL instance by doing the following:

moore@chom:~$ docker service create \
  --name percona-secret \
  --secret mysql_root_password \
  -e MYSQL_ROOT_PASSWORD_FILE=/run/secrets/mysql_root_password \
  percona/percona-server:latest

You can see that instead of docker run, I’ve issued the Swarm equivalent docker service create, which is going to start a new Percona-Server container in the scope of my Swarm workflow. I’m also using the --secret option to tell Docker to mount my secret in the container, where it gets mounted as a file under the path /run/secrets/{secret_name}. The final point here: I’m passing MYSQL_ROOT_PASSWORD_FILE=/path/to/secret as an environment variable to let the startup script know where to find the file with my secret value for the root password. Once the startup routine has completed and the container has started successfully, I can connect to my container to test that the password was set correctly:

moore@chom:~$ docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
397bdf9b75f9 percona/percona-server "/entrypoint.sh " 46 seconds ago Up 44 seconds 3306/tcp percona-secret.1.9xvbneset9363dr5xv4fqqxua
moore@chom:~$ docker exec -ti 397bdf9b75f9 bash
mysql@397bdf9b75f9:/$ cat /run/secrets/mysql_root_password
{secret_string}
mysql@397bdf9b75f9:/$ mysql -uroot -p
Enter password:
Welcome to the MySQL monitor.  Commands end with ; or g.
Your MySQL connection id is 4
Server version: 5.7.17-11 Percona Server (GPL), Release '11', Revision 'f60191c'
Copyright (c) 2009-2016 Percona LLC and/or its affiliates
Copyright (c) 2000, 2016, Oracle and/or its affiliates. All rights reserved.
Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.
Type 'help;' or 'h' for help. Type 'c' to clear the current input statement.
mysql>

The secret can be shared around any container where it’s necessary, simply by telling Docker to use the secret when instantiating a container. For example, if I wanted to start an application container such as a WordPress instance, I can use a secret object to easily share credentials to the data source safely and consistently.
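
To illustrate, here is a minimal, hypothetical stack-file sketch (Compose file format 3.1 or later, deployed with docker stack deploy) that wires the existing mysql_root_password secret into a Percona Server service; the service name and file layout are assumptions:

version: "3.1"
services:
  db:
    image: percona/percona-server:latest
    environment:
      MYSQL_ROOT_PASSWORD_FILE: /run/secrets/mysql_root_password
    secrets:
      - mysql_root_password        # mounted at /run/secrets/mysql_root_password
secrets:
  mysql_root_password:
    external: true                 # created earlier with "docker secret create"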

This method is totally viable for other forms of sensitive data. For example, I can generate SSL certificates and use Docker secrets to add them to my containers for encrypted replication or forcing secure logins from remote clients. I’m still thinking of all the possible use cases for this option and no doubt will have some more to share with you in the near future.

Please share your comments, suggestions and corrections in the comments below. Thank you for reading.

by Andrew Moore at April 21, 2017 09:43 PM

Simplified Percona XtraDB Cluster SSL Configuration


In this blog post, we’ll look at a feature recently added to Percona XtraDB Cluster 5.7.16 that makes it easier to configure Percona XtraDB Cluster SSL for all related communications. It uses mode “encrypt=4”, and configures SSL for both IST/Galera communications and SST communications using the same SSL files. “encrypt=4” is a new encryption mode added in Percona XtraDB Cluster 5.7.16 (we’ll cover it in a later blog post).

If this option is used, this will override all other Galera/SST SSL-related file options. This is to ensure that a consistent configuration is applied.
Using this option also means that the Galera/SST communications are using the same keys as client connections.

Example

This example shows how to start up a cluster using this option. We will use the default SSL files created by the bootstrap node. Basically, there are two steps:

  1. Set pxc_encrypt_cluster_traffic=ON on all nodes
  2. Ensure that all nodes share the same SSL files

Step 1: Configuration (on all nodes)

We enable the pxc_encrypt_cluster_traffic option in the configuration files on all nodes. The default value of this option is “OFF”, so we enable it here.

[mysqld]
 pxc_encrypt_cluster_traffic=ON

Step 2: Start up the bootstrap node

After initializing and starting up the bootstrap node, the datadir will contain the necessary data files. Here is some SSL-related log output:

[Note] Auto generated SSL certificates are placed in data directory.
 [Warning] CA certificate ca.pem is self signed.
 [Note] Auto generated RSA key files are placed in data directory.

The required files are ca.pem, server-cert.pem and server-key.pem, which are the Certificate Authority (CA) file, the server certificate and the server private key, respectively.

Step 3: Copy the SSL files to all other nodes

Galera views the cluster as a set of homogeneous nodes, so the same configuration is expected on all nodes. Therefore, we have to copy the CA file, the server’s certificate and the server’s private key. By default, MySQL names these: ca.pem, server-cert.pem, and server-key.pem, respectively.
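
For example, a hedged copy step from the bootstrap node (the datadir path and hostname are assumptions; keep ownership and restrictive permissions on the key file after copying):

# Copy the CA, server certificate and server key to the second node's datadir
scp /var/lib/mysql/ca.pem /var/lib/mysql/server-cert.pem /var/lib/mysql/server-key.pem \
    node2:/var/lib/mysql/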

Step 4: Start up the other nodes

This is some log output showing that the SSL certificate files have been found. The other nodes should be using the files that were created on the bootstrap node.

[Note] Found ca.pem, server-cert.pem and server-key.pem in data directory. Trying to enable SSL support using them.
[Note] Skipping generation of SSL certificates as certificate files are present in data directory.
[Warning] CA certificate ca.pem is self signed.
[Note] Skipping generation of RSA key pair as key files are present in data directory.

This is some log output (with log_error_verbosity=3), showing the SST reporting on the configuration used.

WSREP_SST: [DEBUG] pxc_encrypt_cluster_traffic is enabled, using PXC auto-ssl configuration
WSREP_SST: [DEBUG] with encrypt=4 ssl_ca=/my/data//ca.pem ssl_cert=/my/data//server-cert.pem ssl_key=/my/data//server-key.pem

Customization

The “ssl-ca”, “ssl-cert”, and “ssl-key” options in the “[mysqld]” section can be used to specify the location of the SSL files. If these are not specified, then the datadir is searched (using the default names of “ca.pem”, “server-cert.pem” and “server-key.pem”).

[mysqld]
 pxc_encrypt_cluster_traffic=ON
 ssl-ca=/path/to/ca.pem
 ssl-cert=/path/to/server-cert.pem
 ssl-key=/path/to/server-key.pem

If you want to implement this yourself, the equivalent configuration file options are:

[mysqld]
wsrep_provider_options="socket.ssl_key=server-key.pem;socket.ssl_cert=server-cert.pem;socket.ssl_ca=ca.pem"
[sst]
encrypt=4
ssl-ca=ca.pem
ssl-cert=server-cert.pem
ssl-key=server-key.pem

How it works

  1. Determine the location of the SSL files
    1. Uses the values if explicitly specified (via the “ssl-ca”, “ssl-cert” and “ssl-key” options in the “[mysqld]” section)
    2. If the SSL file options are not specified, we look in the data directory for files named “ca.pem”, “server-cert.pem” and “server-key.pem” for the CA file, the server certificate, and the server key, respectively.
  2. Modify the configuration
    1. Overrides the values for socket.ssl_ca, socket.ssl_cert, and socket.ssl_key in wsrep_provider_options in the “[mysqld]” section.
    2. Sets “encrypt=4” in the “[sst]” section.
    3. Overrides the values for ssl-ca, ssl-cert and ssl-key in the “[sst]” section.

This is not a dynamic setting, and is only available on startup.

by Kenn Takara at April 21, 2017 05:09 PM

How to Setup and Troubleshoot Percona PAM with LDAP for External Authentication


In this blog, we’ll look at how to set up and troubleshoot the Percona PAM authentication plugin.

We occasionally get requests from our support clients on how to get Percona Server for MySQL to authenticate with an external authentication service via LDAP or Active Directory. However, we normally do not have access to the client’s infrastructure to help troubleshoot these cases. To help them effectively, we need to set up a testbed to reproduce their issues and guide them on how to get authentication to work. Fortunately, we only need to install Samba to provide an external authentication service for both LDAP and AD.

In this article, I will show you how to (a) compile and install Samba, (b) create a domain environment with Samba, (c) add users and groups to this domain and (d) get Percona Server to use these accounts for authentication via LDAP. In my follow-up article, I will discuss how to get MySQL to authenticate credentials with Active Directory.

My testbed environment consists of two machines:

Samba PDC
OS: CentOS 7
IP Address: 172.16.0.10
Hostname: samba-10.example.com
Domain name: EXAMPLE.COM
DNS: 8.8.8.8(Google DNS), 8.8.4.4(Google DNS), 172.16.0.10(Samba)
Firewall: none

Percona Server 5.7 with LDAP authentication
OS: CentOS 7
IP Address: 172.16.0.20
Hostname: ps-ldap-20.example.com

and have several users and groups:

Domain Groups and Users
Support: jericho, jervin and vishal
DBA: sidd, paul and arunjith
Search: ldap

Compile and Install Samba

We will install an NTP client on the Samba PDC/samba-10.example.com machine because time synchronization is a requirement for domain authentication. We will also compile and install Samba from source because the Samba implementation in the official repository doesn’t include the Active Directory Domain Controller role. Hence, samba-tool is not included in the official repository. For our testbed, we need this tool because it makes it easier to provision a domain and manage users and groups. So, for CentOS 7, you can either build from source or use a trusted 3rd party build of Samba (as discussed in Samba’s wiki).

For more information, please read Setting up Samba as an Active Directory Domain Controller as well.

  1. Install, configure, and run the NTP client. Ensure that this client service runs when the server boots up:

[root@samba-10 ~]# yum -y install ntp
* * *
Installed:
  ntp.x86_64 0:4.2.6p5-25.el7.centos.1
Dependency Installed:
  autogen-libopts.x86_64 0:5.18-5.el7                     ntpdate.x86_64 0:4.2.6p5-25.el7.centos.1
[root@samba-10 ~]# ntpdate 0.centos.pool.ntp.org
 7 Apr 06:06:07 ntpdate[9788]: step time server 202.90.132.242 offset 0.807640 sec
[root@samba-10 ~]# systemctl enable ntpd.service
Created symlink from /etc/systemd/system/multi-user.target.wants/ntpd.service to /usr/lib/systemd/system/ntpd.service.
[root@samba-10 ~]# systemctl start ntpd.service

  2. Install compilers and library dependencies for compiling Samba:

[root@samba-10 ~]# yum -y install gcc perl python-devel gnutls-devel libacl-devel openldap-devel
* * *
Installed:
  gcc.x86_64 0:4.8.5-11.el7  gnutls-devel.x86_64 0:3.3.24-1.el7  libacl-devel.x86_64 0:2.2.51-12.el7  openldap-devel.x86_64 0:2.4.40-13.el7  perl.x86_64 4:5.16.3-291.el7  python-devel.x86_64 0:2.7.5-48.el7
Dependency Installed:
  cpp.x86_64 0:4.8.5-11.el7                            cyrus-sasl.x86_64 0:2.1.26-20.el7_2               cyrus-sasl-devel.x86_64 0:2.1.26-20.el7_2             glibc-devel.x86_64 0:2.17-157.el7_3.1
  glibc-headers.x86_64 0:2.17-157.el7_3.1              gmp-devel.x86_64 1:6.0.0-12.el7_1                 gnutls-c++.x86_64 0:3.3.24-1.el7                      gnutls-dane.x86_64 0:3.3.24-1.el7
  kernel-headers.x86_64 0:3.10.0-514.10.2.el7          ldns.x86_64 0:1.6.16-10.el7                       libattr-devel.x86_64 0:2.4.46-12.el7                  libevent.x86_64 0:2.0.21-4.el7
  libmpc.x86_64 0:1.0.1-3.el7                          libtasn1-devel.x86_64 0:3.8-3.el7                 mpfr.x86_64 0:3.1.1-4.el7                             nettle-devel.x86_64 0:2.7.1-8.el7
  p11-kit-devel.x86_64 0:0.20.7-3.el7                  perl-Carp.noarch 0:1.26-244.el7                   perl-Encode.x86_64 0:2.51-7.el7                       perl-Exporter.noarch 0:5.68-3.el7
  perl-File-Path.noarch 0:2.09-2.el7                   perl-File-Temp.noarch 0:0.23.01-3.el7             perl-Filter.x86_64 0:1.49-3.el7                       perl-Getopt-Long.noarch 0:2.40-2.el7
  perl-HTTP-Tiny.noarch 0:0.033-3.el7                  perl-PathTools.x86_64 0:3.40-5.el7                perl-Pod-Escapes.noarch 1:1.04-291.el7                perl-Pod-Perldoc.noarch 0:3.20-4.el7
  perl-Pod-Simple.noarch 1:3.28-4.el7                  perl-Pod-Usage.noarch 0:1.63-3.el7                perl-Scalar-List-Utils.x86_64 0:1.27-248.el7          perl-Socket.x86_64 0:2.010-4.el7
  perl-Storable.x86_64 0:2.45-3.el7                    perl-Text-ParseWords.noarch 0:3.29-4.el7          perl-Time-HiRes.x86_64 4:1.9725-3.el7                 perl-Time-Local.noarch 0:1.2300-2.el7
  perl-constant.noarch 0:1.27-2.el7                    perl-libs.x86_64 4:5.16.3-291.el7                 perl-macros.x86_64 4:5.16.3-291.el7                   perl-parent.noarch 1:0.225-244.el7
  perl-podlators.noarch 0:2.5.1-3.el7                  perl-threads.x86_64 0:1.87-4.el7                  perl-threads-shared.x86_64 0:1.43-6.el7               unbound-libs.x86_64 0:1.4.20-28.el7
  zlib-devel.x86_64 0:1.2.7-17.el7
Complete!

  3. Download, compile and install Samba:

[root@samba-10 ~]# yum -y install wget
* * *
[root@samba-10 ~]# wget https://www.samba.org/samba/ftp/samba-latest.tar.gz
* * *
2017-04-07 06:16:59 (337 KB/s) - 'samba-latest.tar.gz' saved [21097045/21097045]
[root@samba-10 ~]# tar xzf samba-latest.tar.gz
[root@samba-10 ~]# cd samba-4.6.2/
[root@samba-10 samba-4.6.2]# ./configure --prefix=/opt/samba
Checking for program gcc or cc           : /usr/bin/gcc
Checking for program cpp                 : /usr/bin/cpp
Checking for program ar                  : /usr/bin/ar
Checking for program ranlib              : /usr/bin/ranlib
* * *
Checking compiler for PIE support                                                               : yes
Checking compiler for full RELRO support                                                        : yes
Checking if toolchain accepts -fstack-protector                                                 : yes
'configure' finished successfully (39.119s)
[root@samba-10 samba-4.6.2]# make
WAF_MAKE=1 python ./buildtools/bin/waf build
Waf: Entering directory `/root/samba-4.6.2/bin'
symlink: tevent.py -> python/tevent.py
* * *
[3773/3775] Linking default/source3/modules/libvfs_module_acl_xattr.so
[3774/3775] Linking default/source3/modules/libvfs_module_shadow_copy.so
[3775/3775] Linking default/source3/modules/libvfs_module_dirsort.so
Waf: Leaving directory `/root/samba-4.6.2/bin'
'build' finished successfully (6m58.144s)
[root@samba-10 samba-4.6.2]# make install
WAF_MAKE=1 python ./buildtools/bin/waf install
Waf: Entering directory `/root/samba-4.6.2/bin'
* creating /opt/samba/etc
* creating /opt/samba/private
* * *
* installing bin/default/source3/nmbd/nmbd.inst as /opt/samba/sbin/nmbd
* installing bin/default/file_server/libservice_module_s3fs.inst.so as /opt/samba/lib/service/s3fs.so
Waf: Leaving directory `/root/samba-4.6.2/bin'
'install' finished successfully (1m44.377s)

Please take note that when I downloaded Samba, the latest version was 4.6.2. If you have a problem with compiling the latest version of Samba, try using version 4.6.2.

  4. Include the Samba executable paths in the PATH variable so we can call Samba binaries without specifying their absolute paths:

[root@samba-10 samba-4.6.2]# echo "PATH=/opt/samba/sbin:/opt/samba/bin:/usr/sbin:/usr/bin" >> /etc/environment
[root@samba-10 samba-4.6.2]# PATH=/opt/samba/sbin:/opt/samba/bin:/usr/sbin:/usr/bin
[root@samba-10 samba-4.6.2]# which samba-tool
/opt/samba/bin/samba-tool

  5. Set up a systemd service for Samba and ensure that the service starts automatically on server boot:

[root@samba-10 samba-4.6.2]# echo "[Unit]
Description=Samba PDC
After=syslog.target network.target
[Service]
Type=forking
PIDFile=/opt/samba/var/run/samba.pid
ExecStart=/opt/samba/sbin/samba -D
ExecReload=/usr/bin/kill -HUP $MAINPID
ExecStop=/usr/bin/kill $MAINPID
[Install]
WantedBy=multi-user.target" > /etc/systemd/system/samba.service
[root@samba-10 samba-4.6.2]# systemctl enable samba.service
Created symlink from /etc/systemd/system/multi-user.target.wants/samba.service to /etc/systemd/system/samba.service.

  6. Remove the existing /etc/krb5.conf, because its configuration prevents us from provisioning a new domain:

[root@samba-10 samba-4.6.2]# rm -f /etc/krb5.conf
[root@samba-10 samba-4.6.2]# cd
[root@samba-10 ~]#

  7. Done.

Create a domain environment with Samba

  1. To set up a domain, all we need to do is run “samba-tool domain provision” and pass the following details:

Realm: EXAMPLE.COM
Domain: EXAMPLE
Server Role: dc(domain controller)
DNS backend: SAMBA_INTERNAL
DNS forwarder IP address: 8.8.8.8

You will also need to supply the Administrator password. This account is used to join a workstation or server to a domain:

[root@samba-10 ~]# samba-tool domain provision
Realm [EXAMPLE.ORG]: EXAMPLE.COM
 Domain [EXAMPLE]: EXAMPLE
 Server Role (dc, member, standalone) [dc]: dc
 DNS backend (SAMBA_INTERNAL, BIND9_FLATFILE, BIND9_DLZ, NONE) [SAMBA_INTERNAL]: SAMBA_INTERNAL
 DNS forwarder IP address (write 'none' to disable forwarding) [8.8.8.8]: 8.8.8.8
Administrator password:
Retype password:
Looking up IPv4 addresses
Looking up IPv6 addresses
No IPv6 address will be assigned
Setting up secrets.ldb
Setting up the registry
Setting up the privileges database
Setting up idmap db
Setting up SAM db
Setting up sam.ldb partitions and settings
Setting up sam.ldb rootDSE
Pre-loading the Samba 4 and AD schema
Adding DomainDN: DC=example,DC=com
Adding configuration container
Setting up sam.ldb schema
Setting up sam.ldb configuration data
Setting up display specifiers
Modifying display specifiers
Adding users container
Modifying users container
Adding computers container
Modifying computers container
Setting up sam.ldb data
Setting up well known security principals
Setting up sam.ldb users and groups
Setting up self join
Adding DNS accounts
Creating CN=MicrosoftDNS,CN=System,DC=example,DC=com
Creating DomainDnsZones and ForestDnsZones partitions
Populating DomainDnsZones and ForestDnsZones partitions
Setting up sam.ldb rootDSE marking as synchronized
Fixing provision GUIDs
A Kerberos configuration suitable for Samba AD has been generated at /opt/samba/private/krb5.conf
Once the above files are installed, your Samba4 server will be ready to use
Server Role:           active directory domain controller
Hostname:              samba-10
NetBIOS Domain:        EXAMPLE
DNS Domain:            example.com
DOMAIN SID:            S-1-5-21-1337223342-1741564684-602463608

Please take note that if you get the error below, it’s likely due to not removing the existing /etc/krb5.conf before using samba-tool:

ERROR(ldb): uncaught exception - operations error at ../source4/dsdb/samdb/ldb_modules/password_hash.c:2820
  File "/opt/samba/lib64/python2.7/site-packages/samba/netcmd/__init__.py", line 176, in _run
    return self.run(*args, **kwargs)
  File "/opt/samba/lib64/python2.7/site-packages/samba/netcmd/domain.py", line 471, in run
    nosync=ldap_backend_nosync, ldap_dryrun_mode=ldap_dryrun_mode)
  File "/opt/samba/lib64/python2.7/site-packages/samba/provision/__init__.py", line 2175, in provision
    skip_sysvolacl=skip_sysvolacl)
  File "/opt/samba/lib64/python2.7/site-packages/samba/provision/__init__.py", line 1787, in provision_fill
    next_rid=next_rid, dc_rid=dc_rid)
  File "/opt/samba/lib64/python2.7/site-packages/samba/provision/__init__.py", line 1447, in fill_samdb
    "KRBTGTPASS_B64": b64encode(krbtgtpass.encode('utf-16-le'))
  File "/opt/samba/lib64/python2.7/site-packages/samba/provision/common.py", line 55, in setup_add_ldif
    ldb.add_ldif(data, controls)
  File "/opt/samba/lib64/python2.7/site-packages/samba/__init__.py", line 225, in add_ldif
    self.add(msg, controls)

You could also get an error if you entered a simple password for the Administrator account.
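
As an aside, the same provisioning can be scripted non-interactively by passing the answers as options. This is only a sketch, not part of the original walkthrough; replace the password with a strong one of your own. The DNS forwarder prompt corresponds to the “dns forwarder” parameter in the generated smb.conf, which you can adjust afterwards:

samba-tool domain provision --realm=EXAMPLE.COM --domain=EXAMPLE \
  --server-role=dc --dns-backend=SAMBA_INTERNAL \
  --adminpass='Sup3rS3cretPassw0rd!'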

  2. Create a symlink to the generated krb5.conf in /etc. This configuration is used to authenticate machines, accounts and services:

[root@samba-10 ~]# ln -s /opt/samba/private/krb5.conf /etc

  3. Start the Samba service:

[root@samba-10 ~]# systemctl start samba.service

  4. Check network ports to see if Samba is running:

[root@samba-10 ~]# yum -y install net-tools
* * *
[root@samba-10 ~]# netstat -tapn
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
tcp        0      0 0.0.0.0:464             0.0.0.0:*               LISTEN      13296/samba
tcp        0      0 0.0.0.0:53              0.0.0.0:*               LISTEN      13302/samba
tcp        0      0 0.0.0.0:22              0.0.0.0:*               LISTEN      875/sshd
tcp        0      0 0.0.0.0:88              0.0.0.0:*               LISTEN      13296/samba
tcp        0      0 127.0.0.1:25            0.0.0.0:*               LISTEN      1327/master
tcp        0      0 0.0.0.0:636             0.0.0.0:*               LISTEN      13294/samba
tcp        0      0 0.0.0.0:445             0.0.0.0:*               LISTEN      13307/smbd
tcp        0      0 0.0.0.0:1024            0.0.0.0:*               LISTEN      13291/samba
tcp        0      0 0.0.0.0:1025            0.0.0.0:*               LISTEN      13291/samba
tcp        0      0 0.0.0.0:3268            0.0.0.0:*               LISTEN      13294/samba
tcp        0      0 0.0.0.0:3269            0.0.0.0:*               LISTEN      13294/samba
tcp        0      0 0.0.0.0:389             0.0.0.0:*               LISTEN      13294/samba
tcp        0      0 0.0.0.0:135             0.0.0.0:*               LISTEN      13291/samba
tcp        0      0 0.0.0.0:139             0.0.0.0:*               LISTEN      13307/smbd
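
Optionally, you can go a step further and confirm that the AD DNS records and the file service respond. This is a sketch based on the standard Samba verification steps; the host utility comes from the bind-utils package, and smbclient was built together with Samba under /opt/samba:

[root@samba-10 ~]# yum -y install bind-utils
[root@samba-10 ~]# host -t SRV _ldap._tcp.example.com 172.16.0.10
[root@samba-10 ~]# host -t SRV _kerberos._udp.example.com 172.16.0.10
[root@samba-10 ~]# host -t A samba-10.example.com 172.16.0.10
[root@samba-10 ~]# smbclient -L localhost -U%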

  5. Done.

Add users and groups to this domain

Now that Samba is running, we can add users and groups, and assign users to groups with samba-tool.

  1. Add groups by running “samba-tool group add group_name”:

[root@samba-10 ~]# samba-tool group add support
Added group support
[root@samba-10 ~]# samba-tool group add dba
Added group dba
[root@samba-10 ~]# samba-tool group add search
Added group search

  2. Add users by running “samba-tool user create username”:

[root@samba-10 ~]# samba-tool user create jericho
New Password:
Retype Password:
User 'jericho' created successfully
[root@samba-10 ~]# samba-tool user create jervin
New Password:
Retype Password:
User 'jervin' created successfully
[root@samba-10 ~]# samba-tool user create vishal
New Password:
Retype Password:
User 'vishal' created successfully
[root@samba-10 ~]# samba-tool user create sidd
New Password:
Retype Password:
User 'sidd' created successfully
[root@samba-10 ~]# samba-tool user create paul
New Password:
Retype Password:
User 'paul' created successfully
[root@samba-10 ~]# samba-tool user create arunjith
New Password:
Retype Password:
User 'arunjith' created successfully
[root@samba-10 ~]# samba-tool user create ldap
New Password:
Retype Password:
User 'ldap' created successfully
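
If you are scripting this, samba-tool also accepts the password as a second positional argument (an assumption worth double-checking against “samba-tool user create --help” on your build, and note that the password will then end up in your shell history), for example:

samba-tool user create jericho 'MyUserPasswordDontCopyIt2017'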

  3. Add users to their corresponding groups with “samba-tool group addmembers group_name user1,user2,...,userN”:

[root@samba-10 ~]# samba-tool group addmembers support jericho,jervin,vishal
Added members to group support
[root@samba-10 ~]# samba-tool group addmembers dba sidd,paul,arunjith
Added members to group dba
[root@samba-10 ~]# samba-tool group addmembers search ldap
Added members to group search

  4. Verify that users, groups and memberships exist with commands “samba-tool user list”, “samba-tool group list” and “samba-tool group listmembers group_name”:

[root@samba-10 ~]# samba-tool user list
Administrator
arunjith
jericho
jervin
krbtgt
vishal
Guest
ldap
paul
sidd
[root@samba-10 ~]# samba-tool group list
Allowed RODC Password Replication Group
Enterprise Read-Only Domain Controllers
Denied RODC Password Replication Group
Pre-Windows 2000 Compatible Access
Windows Authorization Access Group
Certificate Service DCOM Access
Network Configuration Operators
Terminal Server License Servers
Incoming Forest Trust Builders
Read-Only Domain Controllers
Group Policy Creator Owners
Performance Monitor Users
Cryptographic Operators
Distributed COM Users
Performance Log Users
Remote Desktop Users
Account Operators
Event Log Readers
RAS and IAS Servers
Backup Operators
Domain Controllers
Server Operators
Enterprise Admins
Print Operators
Administrators
Domain Computers
Cert Publishers
DnsUpdateProxy
Domain Admins
Domain Guests
Schema Admins
Domain Users
Replicator
IIS_IUSRS
DnsAdmins
Guests
Users
support
search
dba
[root@samba-10 ~]# samba-tool group listmembers support
jervin
jericho
vishal
[root@samba-10 ~]# samba-tool group listmembers dba
arunjith
sidd
paul
[root@samba-10 ~]# samba-tool group listmembers search
ldap

For more information on using samba-tool, just run:

samba-tool --help

  5. Done.

How to get Percona Server to use these accounts for authentication via LDAP

We will be using the machine ps-ldap-20.example.com to offer MySQL service with LDAP authentication via Percona PAM. If you’re not familiar with Percona PAM, please have a look at this before moving forward.

At this point, our Samba service is running with users, groups and memberships added. We can now query Samba via LDAP ports 389 and 636. We will configure the server to do LDAP lookups when searching for users and groups. This is necessary because we use the name service to validate group membership. We will then install Percona Server for MySQL and configure our PAM plugin to use nss-pam-ldapd to authenticate to LDAP. Finally, we will test LDAP authentication on Percona Server for MySQL using a regular user and a proxy user.

  1. Install nss-pam-ldapd and nscd. We will use these packages to query the LDAP server from our server:

[root@ps-20 ~]# yum -y install nss-pam-ldapd

  2. Configure nss-pam-ldapd by incorporating our Samba server’s LDAP settings:

[root@ps-20 ~]# echo "uid nslcd
gid ldap
pagesize 1000
referrals off
idle_timelimit 800
filter passwd (&(objectClass=user)(objectClass=person)(!(objectClass=computer)))
map    passwd uid           sAMAccountName
map    passwd uidNumber     objectSid:S-1-5-21-1337223342-1741564684-602463608
map    passwd gidNumber     objectSid:S-1-5-21-1337223342-1741564684-602463608
map    passwd homeDirectory "/home/$cn"
map    passwd gecos         displayName
map    passwd loginShell    "/bin/bash"
filter group (|(objectClass=group)(objectClass=person))
map    group gidNumber      objectSid:S-1-5-21-1337223342-1741564684-602463608
uri ldaps://172.16.0.10
base dc=example,dc=com
tls_reqcert never
binddn cn=ldap,cn=Users,dc=example,dc=com
bindpw MyLdapPasswordDontCopyIt2017" > /etc/nslcd.conf

As you can see above, this config contains the LDAP settings, custom LDAP attribute mappings, and the LDAP credentials. The value of objectSid was taken from the “DOMAIN SID” that was generated when I created the new domain. So, be sure to use the value of “DOMAIN SID” generated on your end, otherwise your LDAP queries will not match any record. If you’re authenticating against an existing Windows AD server instead, you can obtain the value of “DOMAIN SID” by running “Get-ADDomain”. Also, you can take a look at this link to learn more about the other configuration options for nslcd.conf.
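
Because this file contains the LDAP bind password in clear text, it is worth tightening its permissions right after creating it (a small precaution, not part of the original steps; nslcd reads the file as root before dropping privileges):

[root@ps-20 ~]# chown root:root /etc/nslcd.conf
[root@ps-20 ~]# chmod 600 /etc/nslcd.conf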

  3. Add LDAP lookup to the nsswitch service by editing /etc/nsswitch.conf:

Find:
passwd: files sss
shadow: files sss
group: files sss

Replace with:
passwd: files sss ldap
shadow: files sss ldap
group: files sss ldap
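
If you prefer to make the change from the command line, a sed one-liner along these lines does the same thing (a sketch that assumes the stock CentOS 7 entries shown above; review /etc/nsswitch.conf afterwards):

sed -i -e 's/^passwd:.*/passwd:     files sss ldap/' \
       -e 's/^shadow:.*/shadow:     files sss ldap/' \
       -e 's/^group:.*/group:      files sss ldap/' /etc/nsswitch.conf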

  4. Run nslcd in debug mode:

[root@ps-20 ~]# nslcd -d
nslcd: DEBUG: add_uri(ldaps://172.16.0.10)
nslcd: DEBUG: ldap_set_option(LDAP_OPT_X_TLS_REQUIRE_CERT,0)
nslcd: version 0.8.13 starting
nslcd: DEBUG: unlink() of /var/run/nslcd/socket failed (ignored): No such file or directory
nslcd: DEBUG: initgroups("nslcd",55) done
nslcd: DEBUG: setgid(55) done
nslcd: DEBUG: setuid(65) done
nslcd: accepting connections

  5. Test if LDAP lookups work by running “id <username>” and “getent passwd” in another terminal:

[root@ps-20 ~]# id jervin
uid=1107(jervin) gid=1107(jervin) groups=1107(jervin),1103(support)
[root@ps-20 ~]# id paul
uid=1110(paul) gid=1110(paul) groups=1110(paul),1104(dba)
[root@ps-20 ~]# getent passwd
root:x:0:0:root:/root:/bin/bash
bin:x:1:1:bin:/bin:/sbin/nologin
daemon:x:2:2:daemon:/sbin:/sbin/nologin
adm:x:3:4:adm:/var/adm:/sbin/nologin
lp:x:4:7:lp:/var/spool/lpd:/sbin/nologin
sync:x:5:0:sync:/sbin:/bin/sync
shutdown:x:6:0:shutdown:/sbin:/sbin/shutdown
halt:x:7:0:halt:/sbin:/sbin/halt
mail:x:8:12:mail:/var/spool/mail:/sbin/nologin
operator:x:11:0:operator:/root:/sbin/nologin
games:x:12:100:games:/usr/games:/sbin/nologin
ftp:x:14:50:FTP User:/var/ftp:/sbin/nologin
nobody:x:99:99:Nobody:/:/sbin/nologin
avahi-autoipd:x:170:170:Avahi IPv4LL Stack:/var/lib/avahi-autoipd:/sbin/nologin
systemd-bus-proxy:x:999:997:systemd Bus Proxy:/:/sbin/nologin
systemd-network:x:998:996:systemd Network Management:/:/sbin/nologin
dbus:x:81:81:System message bus:/:/sbin/nologin
polkitd:x:997:995:User for polkitd:/:/sbin/nologin
tss:x:59:59:Account used by the trousers package to sandbox the tcsd daemon:/dev/null:/sbin/nologin
postfix:x:89:89::/var/spool/postfix:/sbin/nologin
sshd:x:74:74:Privilege-separated SSH:/var/empty/sshd:/sbin/nologin
user:x:1000:1000:user:/home/user:/bin/bash
mysql:x:27:27:Percona Server:/var/lib/mysql:/bin/false
nscd:x:28:28:NSCD Daemon:/:/sbin/nologin
nslcd:x:65:55:LDAP Client User:/:/sbin/nologin
Administrator:*:500:500::/home/Administrator:/bin/bash
arunjith:*:1111:1111::/home/arunjith:/bin/bash
jericho:*:1106:1106::/home/jericho:/bin/bash
jervin:*:1107:1107::/home/jervin:/bin/bash
krbtgt:*:502:502::/home/krbtgt:/bin/bash
vishal:*:1108:1108::/home/vishal:/bin/bash
Guest:*:501:501::/home/Guest:/bin/bash
ldap:*:1112:1112::/home/ldap:/bin/bash
paul:*:1110:1110::/home/paul:/bin/bash
sidd:*:1109:1109::/home/sidd:/bin/bash

If you take a look at the nslcd terminal again, you will see that it’s trying to resolve the user and group identification with LDAP searches:

* * *
nslcd: [7b23c6] <passwd=1107> DEBUG: ldap_simple_bind_s("cn=ldap,cn=Users,dc=example,dc=com","***") (uri="ldaps://172.16.0.10")
nslcd: [7b23c6] <passwd=1107> DEBUG: ldap_result(): CN=jervin,CN=Users,DC=example,DC=com
nslcd: [7b23c6] <passwd=1107> DEBUG: ldap_result(): end of results (1 total)
nslcd: [3c9869] DEBUG: connection from pid=10468 uid=0 gid=0
nslcd: [3c9869] <passwd=1107> DEBUG: myldap_search(base="dc=example,dc=com", filter="(&(&(objectClass=user)(objectClass=person)(!(objectClass=computer)))(objectSid=�1�5�0�0�0�0�0�515�0�0�0ae68b44f�c2bce6778dde8...
* * *
nslcd: [5558ec] <passwd="paul"> DEBUG: myldap_search(base="dc=example,dc=com", filter="(&(&(objectClass=user)(objectClass=person)(!(objectClass=computer)))(sAMAccountName=paul))")
nslcd: [5558ec] <passwd="paul"> DEBUG: ldap_result(): CN=paul,CN=Users,DC=example,DC=com
nslcd: [5558ec] <passwd="paul"> DEBUG: ldap_result(): end of results (1 total)
* * *
nslcd: [e2a9e3] <passwd(all)> DEBUG: myldap_search(base="dc=example,dc=com", filter="(&(objectClass=user)(objectClass=person)(!(objectClass=computer)))")
nslcd: [e2a9e3] <passwd(all)> DEBUG: ldap_result(): CN=Administrator,CN=Users,DC=example,DC=com
nslcd: [e2a9e3] <passwd(all)> DEBUG: ldap_result(): CN=arunjith,CN=Users,DC=example,DC=com
nslcd: [e2a9e3] <passwd(all)> DEBUG: ldap_result(): CN=jericho,CN=Users,DC=example,DC=com
nslcd: [e2a9e3] <passwd(all)> DEBUG: ldap_result(): CN=jervin,CN=Users,DC=example,DC=com
nslcd: [e2a9e3] <passwd(all)> DEBUG: ldap_result(): CN=krbtgt,CN=Users,DC=example,DC=com
nslcd: [e2a9e3] <passwd(all)> DEBUG: ldap_result(): CN=vishal,CN=Users,DC=example,DC=com
nslcd: [e2a9e3] <passwd(all)> DEBUG: ldap_result(): CN=Guest,CN=Users,DC=example,DC=com
nslcd: [e2a9e3] <passwd(all)> DEBUG: ldap_result(): CN=ldap,CN=Users,DC=example,DC=com
nslcd: [e2a9e3] <passwd(all)> DEBUG: ldap_result(): CN=paul,CN=Users,DC=example,DC=com
nslcd: [e2a9e3] <passwd(all)> DEBUG: ldap_result(): CN=sidd,CN=Users,DC=example,DC=com
nslcd: [e2a9e3] <passwd(all)> DEBUG: ldap_result(): end of results (10 total)

Now that we know nslcd is working, shut it down by pressing “Ctrl-C”.

  6. Run nslcd normally and make sure it starts up on boot:

[root@ps-20 ~]# systemctl start nslcd.service
[root@ps-20 ~]# systemctl enable nslcd.service
Created symlink from /etc/systemd/system/multi-user.target.wants/nslcd.service to /usr/lib/systemd/system/nslcd.service.

  7. Install and run Percona Server for MySQL 5.7 and make sure it runs when the server boots up:

[root@ps-20 ~]# rpm -Uvh https://www.percona.com/redir/downloads/percona-release/redhat/percona-release-0.1-4.noarch.rpm
Retrieving https://www.percona.com/redir/downloads/percona-release/redhat/percona-release-0.1-4.noarch.rpm
Preparing...                          ################################# [100%]
Updating / installing...
   1:percona-release-0.1-4            ################################# [100%]
[root@ps-20 ~]# yum -y install Percona-Server-server-57
* * *
[root@ps-20 ~]# mysqld --initialize-insecure --user=mysql
[root@ps-20 ~]# systemctl start mysqld.service
[root@ps-20 ~]# systemctl enable mysqld.service
Created symlink from /etc/systemd/system/mysql.service to /usr/lib/systemd/system/mysqld.service.
Created symlink from /etc/systemd/system/multi-user.target.wants/mysqld.service to /usr/lib/systemd/system/mysqld.service.

  8. Log in to MySQL and change the root password:

[root@ps-20 ~]# mysql -uroot
mysql> SET PASSWORD=PASSWORD('MyNewAndImprovedPassword');

  9. Install the Percona PAM plugin:

mysql> delete from mysql.user where user='';
Query OK, 0 rows affected (0.00 sec)
mysql> INSTALL PLUGIN auth_pam SONAME 'auth_pam.so';
Query OK, 0 rows affected (0.01 sec)
mysql> INSTALL PLUGIN auth_pam_compat SONAME 'auth_pam_compat.so';
Query OK, 0 rows affected (0.00 sec)

  10. Configure Percona PAM to authenticate to LDAP by creating /etc/pam.d/mysqld with this content:

auth required pam_ldap.so
account required pam_ldap.so
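
Following the same echo style used earlier, the file can be created like this (pam_ldap.so here comes from the nss-pam-ldapd package installed previously):

[root@ps-20 ~]# echo "auth required pam_ldap.so
account required pam_ldap.so" > /etc/pam.d/mysqld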

  11. Create a MySQL user that will authenticate via auth_pam:

mysql> CREATE USER jervin@'%' IDENTIFIED WITH auth_pam;
Query OK, 0 rows affected (0.00 sec)
mysql> GRANT ALL PRIVILEGES ON support.* TO jervin@'%';
Query OK, 0 rows affected (0.00 sec)
mysql> FLUSH PRIVILEGES;
Query OK, 0 rows affected (0.00 sec)

  12. Log in as this user and check its grants:

[root@ps-20 ~]# mysql -u jervin
Password:
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 22
Server version: 5.7.17-13 Percona Server (GPL), Release 13, Revision fd33d43
Copyright (c) 2009-2016 Percona LLC and/or its affiliates
Copyright (c) 2000, 2016, Oracle and/or its affiliates. All rights reserved.
Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
mysql> SHOW GRANTS;
+-----------------------------------------------------+
| Grants for jervin@%                                 |
+-----------------------------------------------------+
| GRANT USAGE ON *.* TO 'jervin'@'%'                  |
| GRANT ALL PRIVILEGES ON `support`.* TO 'jervin'@'%' |
+-----------------------------------------------------+
2 rows in set (0.00 sec)

It works! However, if you have 100 support users who all need the same MySQL privileges, creating 100 MySQL users is tedious and difficult to maintain. If membership in a group should confer certain MySQL privileges, set up proxy users instead to map each user’s privileges to its group. We will implement this for both dba and support users in the next step.

For now, delete the user we just created:

mysql> DROP USER jervin@'%';
Query OK, 0 rows affected (0.00 sec)

  13. Create the proxy user and proxied accounts:

mysql> CREATE USER ''@'' IDENTIFIED WITH auth_pam as 'mysqld,support=support_users,dba=dba_users';
Query OK, 0 rows affected (0.00 sec)
mysql> CREATE USER support_users@'%' IDENTIFIED BY 'some_password';
Query OK, 0 rows affected (0.00 sec)
mysql> CREATE USER dba_users@'%' IDENTIFIED BY 'some_password';
Query OK, 0 rows affected (0.00 sec)
mysql> GRANT ALL PRIVILEGES ON support.* TO support_users@'%';
Query OK, 0 rows affected (0.00 sec)
mysql> GRANT ALL PRIVILEGES ON *.* TO dba_users@'%';
Query OK, 0 rows affected (0.00 sec)
mysql> GRANT PROXY ON support_users@'%' TO ''@'';
Query OK, 0 rows affected (0.00 sec)
mysql> GRANT PROXY ON dba_users@'%' TO ''@'';
Query OK, 0 rows affected (0.00 sec)
mysql> FLUSH PRIVILEGES;
Query OK, 0 rows affected (0.00 sec)

To learn more about setting up proxy users, see this article written by Stephane.

  14. Let’s try logging in as “jericho” and “paul” and see if they inherit the privileges of their group.

[root@ps-20 ~]# mysql -ujericho -p
Enter password:
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 25
Server version: 5.7.17-13 Percona Server (GPL), Release 13, Revision fd33d43
Copyright (c) 2009-2016 Percona LLC and/or its affiliates
Copyright (c) 2000, 2016, Oracle and/or its affiliates. All rights reserved.
Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
mysql> SELECT user(), current_user(), @@proxy_user;
+-------------------+-----------------+--------------+
| user()            | current_user()  | @@proxy_user |
+-------------------+-----------------+--------------+
| jericho@localhost | support_users@% | ''@''        |
+-------------------+-----------------+--------------+
1 row in set (0.00 sec)
mysql> SHOW GRANTS;
+------------------------------------------------------------+
| Grants for support_users@%                                 |
+------------------------------------------------------------+
| GRANT USAGE ON *.* TO 'support_users'@'%'                  |
| GRANT ALL PRIVILEGES ON `support`.* TO 'support_users'@'%' |
+------------------------------------------------------------+
2 rows in set (0.00 sec)
mysql> quit
Bye
[root@ps-20 ~]# mysql -upaul -p
Enter password:
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 27
Server version: 5.7.17-13 Percona Server (GPL), Release 13, Revision fd33d43
Copyright (c) 2009-2016 Percona LLC and/or its affiliates
Copyright (c) 2000, 2016, Oracle and/or its affiliates. All rights reserved.
Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
mysql> SELECT user(), current_user(), @@proxy_user;
+----------------+----------------+--------------+
| user()         | current_user() | @@proxy_user |
+----------------+----------------+--------------+
| paul@localhost | dba_users@%    | ''@''        |
+----------------+----------------+--------------+
1 row in set (0.00 sec)
mysql> SHOW GRANTS;
+------------------------------------------------+
| Grants for dba_users@%                         |
+------------------------------------------------+
| GRANT ALL PRIVILEGES ON *.* TO 'dba_users'@'%' |
+------------------------------------------------+
1 row in set (0.00 sec)

As you can see, they did inherit the MySQL privileges of their groups.

  15. Done.

Conclusion

To be honest, setting up Percona PAM with LDAP can be challenging if you are adding this functionality to existing infrastructure. But hopefully, by setting this up in a lab environment from scratch and doing some tests, you’ll be confident enough to incorporate this feature in production environments.

by Jaime Sicam at April 21, 2017 04:20 PM

Jean-Jerome Schmidt

ClusterControl for Galera Cluster for MySQL

ClusterControl allows you to easily manage your database infrastructure on premise or in the cloud. With in-depth support for technologies like Galera Cluster for MySQL and MariaDB setups, you can truly automate mixed environments for next-level applications.

Since the launch of ClusterControl in 2012, we’ve experienced growth in new industries with customers who are benefiting from the advancements ClusterControl has to offer - in particular when it comes to Galera Cluster for MySQL.

In addition to reaching new highs in ClusterControl demand, this past year we’ve doubled the size of our team allowing us to continue to provide even more improvements to ClusterControl.

Take a look at this infographic for our top Galera Cluster for MySQL resources and information about how ClusterControl works with Galera Cluster.

by Severalnines at April 21, 2017 03:04 PM

Peter Zaitsev

Enabling Percona XtraDB Cluster SST Traffic Encryption

In this blog post, we’ll look at enabling Percona XtraDB Cluster SST Traffic Encryption, and some of the changes to the SSL-based encryption of SST traffic in Percona XtraDB Cluster 5.7.16.

Some background

Percona XtraDB Cluster versions prior to 5.7 support encryption methods 0, 1, 2 and 3:

  • encrypt = 0 : (default) No encryption
  • encrypt = 1 : Symmetric encryption using AES-128, user-supplied key
  • encrypt = 2 : SSL-based encryption with a CA and cert files (via socat)
  • encrypt = 3 : SSL-based encryption with cert and key files (via socat)

We are deprecating modes encrypt=1,2,3 in favor of the new mode, encrypt=4. “encrypt=3” is not recommended, since it does not verify the cert being used (it cannot verify since no Certificate Authority (CA) file is provided). “encrypt=2” and “encrypt=3” use a slightly different way of building the SSL files than MySQL does. In order to remove confusion, we’ve deprecated these modes in favor of “encrypt=4”, which can use the MySQL generated SSL files.

New feature: encrypt= 4

The previous SSL methods (encrypt=2 and encrypt=3) are based on socat usage (see http://www.dest-unreach.org/socat/doc/socat-openssltunnel.html). The certs are not built the same way as the certs created by MySQL (for encryption of client communication with MySQL). To simplify SSL configuration and usage, we added a new encryption method (encrypt=4) so that the SSL files generated by MySQL can now be used for SSL encryption of SST traffic.

For instructions on how to create these files, see https://www.percona.com/doc/percona-xtradb-cluster/LATEST/howtos/encrypt-traffic.html.
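
As a quick sketch of one way to do it: MySQL/Percona Server 5.7 ships the mysql_ssl_rsa_setup utility, which generates a CA, server certificate and key in the data directory. The resulting ca.pem, server-cert.pem and server-key.pem can then be copied to every node and referenced in the [sst] section (adjust paths and ownership to your setup):

mysql_ssl_rsa_setup --datadir=/var/lib/mysql --uid=mysql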

Configuration

In general, Galera views the cluster as homogeneous, so it expects that all nodes are identically configured. This extends to the SSL configuration, so the preferred configuration is that all machines share the same certs/keys. The security implication is that possession of these certs/keys allows a machine to join the cluster and receive all of the data. So proper care must be taken to secure the SSL files.

The mode “encrypt=4” uses the same option names as MySQL, so it reads the SSL information from “ssl-ca”, “ssl-cert”, and “ssl-key” in the “[sst]” section of the configuration file.

Example my.cnf:

[sst]
 encrypt=4
 ssl-ca=/path/to/ca.pem
 ssl-cert=/path/to/server-cert.pem
 ssl-key=/path/to/server-key.pem

All three options (ssl-ca, ssl-cert, and ssl-key) must be specified otherwise the SST will return an error.

ssl-ca

This is the location of the Certificate Authority (CA) file. Only servers that have certificates generated from this CA file will be allowed to connect if SSL is enabled.

ssl-cert

This is the location of the certificate file. This is the digital certificate that will be sent to the other side of the SSL connection. The remote server then verifies that this certificate was generated from the Certificate Authority file it is using.

ssl-key

This is the location of the private key for the certificate specified in ssl-cert.

by Kenn Takara at April 21, 2017 02:48 PM

April 20, 2017

MariaDB AB

Webinar – Get Started with MariaDB on Docker

While containers can be great ephemeral vessels for your applications, your data needs to be able to survive containers coming and going, maintain its availability and reliability, and grow when you need it. 

Join MariaDB Field CTO Alvin Richards on April 27 at 10:00 a.m. PT to get started with MariaDB on Docker. This webinar will cover: 

  • Existing deployment models vs. containers 
  • Items to consider when deploying a database on Docker 
  • Steps to build and scale an application with MariaDB on Docker

Date: April 27, 2017
Time: 10:00 a.m. PT

Register here for the webinar. 

Alvin Richards
Field CTO, MariaDB

Alvin Richards is the Field CTO at MariaDB, the leading high-performance open source relational database, where he connects the dots between practitioners, innovators and MariaDB products. In his prior life, Alvin was vice president of product at Aerospike; ran engineering teams at Docker and MongoDB, leading the revolution of microservices and NoSQL; was technical director at NetApp, working to integrate databases and virtual infrastructures with storage; and worked at Oracle on data warehousing products.

Join our upcoming webinar to learn how to build your own MariaDB container.

by MariaDB Team at April 20, 2017 09:52 PM

Peter Zaitsev

Percona Live Featured Session with Alibaba Cloud – Flashback: Rolling Back a MySQL/MariaDB Instance, Database or Table to a Previous Snapshot

Welcome to another post in the series of Percona Live featured session blogs! In these blogs, we’ll highlight some of the session speakers that will be at this year’s Percona Live conference. We’ll also discuss how these sessions can help you improve your database environment. Make sure to read to the end to get a special Percona Live 2017 registration bonus!

In this Percona Live featured session, we’ll meet Staff Engineer Guangzhou Zhang (who focuses on PostgreSQL), Staff Engineer Lixun Peng (who focuses on Replication), Senior Engineer Weixiang Zhai (who focuses on InnoDB) and Senior Engineer Xin Liu (who focuses on MongoDB) who are all from Alibaba Cloud, the cloud computing arm of Alibaba Group.

Alibaba Cloud is holding a session called Flashback: Rolling Back a MySQL/MariaDB Instance, Database or Table to a Previous Snapshot. The talk will discuss how Flashback is currently implemented, what it currently can and can’t do, and what features are in the pipeline for future MariaDB/AliSQL releases.

I had a chance to speak with them about Flashback:

Percona: How did you get into database technology? What do you love about it?

Guangzhou Zhang: Database technology is fundamental to every IT system as it lays the foundation to provide persistency, concurrency and availability. What makes it even more attractive and exciting is that in recent years the “old” database technology has found new directions and innovations in today’s age of cloud computing. There is so much work that can be done fitting open source databases into cloud environments, or even innovating new “cloud native” database architectures in the public cloud.

Lixun: When I was in university, I found database theory very interesting. I decided to be a DBA after graduation. Then I studied lots of Oracle Database books. When I graduated, funnily enough, I became a MySQL DBA, which has meant that I have focused on MySQL-related work until now. MySQL is a great database, but it’s not perfect! I always have optimization requirements to enhance its performance and improve the functionality step by step. I have found it very interesting though and continue to be happy with what it makes possible. And now many of Alibaba Cloud’s users are using my code: this is a great feeling.

Percona: Your talk is called Flashback: Rolling Back a MySQL/MariaDB Instance, Database or Table to a Previous Snapshot. Why would somebody need to rollback a database instance?

Lixun: Anyone can make mistakes, including DBAs. After users mishandle their data, we need to recover from the failure as soon as possible. Then we need a way to recover the data from the correct snapshot and, if possible, do it online and fast. That’s why I have implemented the Flashback feature: it provides the ability to achieve this.

Percona: What are the issues you face rolling back an instance? How does Flashback help?

Lixun: We can, of course, recover data from the last full backup set and incremental binary logs, but if a user’s database is huge, that could take a while! This is particularly frustrating when only a small amount of the data needs to be restored, yet we still have to recover the whole database.

Flashback allows you to reverse the last misoperation from the binary logs. More often than not this is a small change, so it is much faster than recovery from a full backup. And we don’t need to stop the server to carry this out. That’s very important for cloud users.

Percona: What do you want attendees to take away from your session? Why should they attend?

Lixun: I hope the attendees of my session can learn how and why Flashback works, the best way to use it and when they should try to use it.

And Flashback still has some limitations that the users should be aware of. I plan to address these in future versions.

I contributed the Flashback feature to MySQL and MariaDB at the same time, and MariaDB 10.2 released it. We are still developing the feature, and I want attendees to know what’s on the roadmap during my session.

Percona: What are you most looking forward to at Percona Live 2017?

Xin Liu: There are two things I’m looking forward to at Percona Live. Firstly, holding technical discussion groups around the subject of our talks or about other open source databases. I’m also interested in other NoSQL-focused database topics, such as HBase, Redis, Cassandra, etc., and I want to learn more about MongoDB’s core storage engines, especially WiredTiger and MongoRocks. Gathering more details, design information or ideas for improvements will benefit us and our work.

Lixun: The best thing for me is meeting with the best MySQL engineers at the conference. There are very few chances to communicate with the engineers from around the world about the latest technology, and share updates with each other.

Percona: Talk about your team’s other topics . . .

Lixun: The topic proposed by Xin Liu (Multi Active-Active and Disaster Recovery with MongoDB Database Center) demonstrates how we can recover a MongoDB cloud service from a disaster failure, even if we lose the whole cluster in a region. Active-Active deployment is the typical way in our production environment, and we developed a system called “Lamda” that handles asynchronous replication within each region.

The talk from Weixiang Zhai (Scale Read Workload by Sharing Data Files of InnoDB) will introduce how we changed InnoDB so that MySQL can be deployed on shared storage, providing the ability to scale out read-only workloads.

Guangzhou Zhang (On Building Alibaba’s Public Cloud Database Service for PostgreSQL and MySQL) will talk about the problems we solved while fitting PostgreSQL engines into our public cloud database services. We introduced a lot of enhancements in the database engine to solve disk IO or memory isolation problems. The talk also includes a comparison of PostgreSQL and MySQL covering why and how to deal with them differently within our service.

Register for Percona Live Data Performance Conference 2017, and see Lixun present Flashback: Rolling Back a MySQL/MariaDB Instance, Database or Table to a Previous Snapshot. Use the code FeaturedTalk and receive $100 off the current registration price!

Percona Live Data Performance Conference 2017 is the premier open source event for the data performance ecosystem. It is the place to be for the open source community, as well as businesses that thrive in the MySQL, NoSQL, cloud, big data and Internet of Things (IoT) marketplaces. Attendees include DBAs, sysadmins, developers, architects, CTOs, CEOs, and vendors from around the world.

The Percona Live Data Performance Conference will be April 24-27, 2017 at the Hyatt Regency Santa Clara and the Santa Clara Convention Center.

by Dave Avery at April 20, 2017 09:13 PM

Tracking IST Progress in Percona XtraDB Cluster

In this blog post, we’ll look at how Percona XtraDB Cluster uses IST.

Introduction

Percona XtraDB Cluster uses the concept of an Incremental State Transfer (IST). When a node of the cluster leaves the cluster for a short period of time, it can rejoin the cluster by getting the delta set of missing changes from any active node in the cluster.

This process of getting the delta set of changes is named as IST in Percona XtraDB Cluster.

Tracking IST Progress

The number of write-sets/changes that the joining node needs to catch up on when rejoining the cluster is dictated by:

  1. The duration the node was not present in the cluster
  2. The workload of the cluster during that time frame

This catch-up process can be time-consuming. Until this process is complete, the rejoining node is not ready to process any active workloads.

We believe that any process that is time-consuming should have a progress monitor attached to it. This is exactly what we have done.

In the latest release of Percona XtraDB Cluster 5.7.17-29.20, we added an IST progress monitor that is exposed through SHOW STATUS. This helps you to monitor the percentage of write-sets which has been applied by the rejoining node.

Let’s see this in a working example:

  • Start a two-node cluster
  • Process some basic workloads, allow cluster replication
  • Shutdown node-2
  • Node-1 then continues to process more workloads (the workload fits the allocated gcache)
  • Restart Node-2, causing it to trigger an IST

mysql> show status like 'wsrep_ist_receive_status';
+--------------------------+--------------------------------------------------------+
| Variable_name | Value |
+--------------------------+--------------------------------------------------------+
| wsrep_ist_receive_status | 3% complete, received seqno 1421771 of 1415410-1589676 |
+--------------------------+--------------------------------------------------------+
1 row in set (0.00 sec)
....
mysql> show status like 'wsrep_ist_receive_status';
+--------------------------+---------------------------------------------------------+
| Variable_name | Value |
+--------------------------+---------------------------------------------------------+
| wsrep_ist_receive_status | 52% complete, received seqno 1506799 of 1415410-1589676 |
+--------------------------+---------------------------------------------------------+
1 row in set (0.00 sec)
....
mysql> show status like 'wsrep_ist_receive_status';
+--------------------------+---------------------------------------------------------+
| Variable_name | Value |
+--------------------------+---------------------------------------------------------+
| wsrep_ist_receive_status | 97% complete, received seqno 1585923 of 1415410-1589676 |
+--------------------------+---------------------------------------------------------+
1 row in set (0.00 sec)
mysql> show status like 'wsrep_ist_receive_status';
+--------------------------+-------+
| Variable_name | Value |
+--------------------------+-------+
| wsrep_ist_receive_status | |
+--------------------------+-------+
1 row in set (0.00 sec)

As you can see, the wsrep_ist_receive_status monitoring string indicates the percentage completed, the currently received seqno, and the range of write-sets applicable to the IST.

Once the IST activity is complete, the variable shows an empty string.
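
If you would rather follow the progress from a shell than from an interactive client, a simple polling loop works too (a sketch; supply credentials as appropriate for your setup):

watch -n 5 "mysql -e \"SHOW STATUS LIKE 'wsrep_ist_receive_status'\""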

Closing Comments

I hope you enjoy this newly added feature. Percona Engineering would be happy to hear from you about more features like this that can help you make effective use of Percona XtraDB Cluster. We will try our best to include them in our future plans (based on feasibility).

Note: Special thanks to Kenn Takara and Roel Van de Paar for helping me edit this post.

by Krunal Bauskar at April 20, 2017 08:33 PM

More Trackable Flow Control for Percona XtraDB Cluster

In this blog post, we’ll discuss trackable flow control in Percona XtraDB Cluster.

Introduction

Percona XtraDB Cluster has a self-regulating mechanism called Flow Control. This mechanism helps to avoid a situation wherein the weakest/slowest member of the cluster falls significantly behind other members of the cluster.

When a member of a cluster is slow at applying write-sets (while simultaneously continuing to receive write-sets from the cluster group channel), then the incoming/receive queue grows in size. If this queue crosses a set threshold (gcs.fc_limit), the node emits a FLOW_CONTROL message asking other members to slow down or halt processing.

While this happens transparently for the end-user, it’s important for the cluster administrator to know whether a node is using flow-control. If it is, it can affect overall cluster productivity.

Finding if a node is in flow control

FLOW_CONTROL is not a persistent state. A node enters FLOW_CONTROL once the queue size exceeds the set threshold, and it is released again once the queue size falls below the low-end watermark.

So how can one view the higher and lower watermarks?

Up until now, this was exposed only in the log files. However, starting with Percona XtraDB Cluster 5.7.17-29.20, this is now exposed and available through SHOW STATUS:

mysql> show status like 'wsrep_flow_control_interval';
+-----------------------------+------------+
| Variable_name | Value |
+-----------------------------+------------+
| wsrep_flow_control_interval | [ 12, 23 ] |
+-----------------------------+------------+
1 row in set (0.00 sec)
mysql> set global wsrep_provider_options="gcs.fc_limit=100";
Query OK, 0 rows affected (0.00 sec)
mysql> show status like 'wsrep_flow_control_interval';
+-----------------------------+-------------+
| Variable_name | Value |
+-----------------------------+-------------+
| wsrep_flow_control_interval | [ 71, 141 ] |
+-----------------------------+-------------+
1 row in set (0.00 sec)
mysql> set global wsrep_provider_options="gcs.fc_factor=0.75";
Query OK, 0 rows affected (0.00 sec)
mysql> show status like 'wsrep_flow_control_interval';
+-----------------------------+--------------+
| Variable_name | Value |
+-----------------------------+--------------+
| wsrep_flow_control_interval | [ 106, 141 ] |
+-----------------------------+--------------+
1 row in set (0.01 sec)

As you can see, the wsrep_flow_control_interval status variable emits a range representing the lower and higher watermark. A value set of (12, 23) means that if the incoming queue size is greater than 23, then FLOW_CONTROL is enabled. If the size falls below 12, FLOW_CONTROL is released.
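
To see where those numbers come from: with the default gcs.fc_master_slave=NO, Galera scales gcs.fc_limit by the square root of the cluster size to get the upper watermark, and multiplies that by gcs.fc_factor to get the lower one. A back-of-the-envelope check for the two-node example above (my own arithmetic, not an official formula reference):

upper watermark ≈ gcs.fc_limit x sqrt(2) = 100 x 1.414 ≈ 141
lower watermark ≈ upper x gcs.fc_factor = 141 x 0.75  ≈ 106

The initial [ 12, 23 ] range likewise appears consistent with this build’s defaults (gcs.fc_limit=16, gcs.fc_factor=0.5) on two nodes.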

Still, this does not show us if the node is using FLOW_CONTROL at any given moment.

To address this, we simultaneously introduced a wsrep_flow_control_status status variable in the same release. This boolean status variable tells you whether the node is in FLOW_CONTROL or not: it is ON while the node is employing flow control, and switches back to OFF once the node leaves flow control:

show status like '%flow%';
.....
| wsrep_flow_control_sent | 28156 |
| wsrep_flow_control_recv | 28156 |
| wsrep_flow_control_interval | [ 106, 141 ] |
| wsrep_flow_control_status | ON |

Finally, the wsrep_flow_control_sent/recv counters can be used to track FLOW_CONTROL activity. They show an aggregated count of how many times flow control kicked in. You can clear them using FLUSH STATUS.

mysql> show status like '%flow%';
.....
| wsrep_flow_control_sent | 31491 |
| wsrep_flow_control_recv | 31491 |
| wsrep_flow_control_interval | [ 106, 141 ] |
| wsrep_flow_control_status | OFF |

Takeaway thoughts

We hope you enjoy the new features added to Percona XtraDB Cluster, including flow control tracking!

We like feedback from customers (via the usual Customer Support website), and where possible from the community (via Percona JIRA and Launchpad), on which features they most want to see. We recently re-engineered Percona XtraDB Cluster to have much-increased performance (a request originating from the community), and the new IST Progress Meter and Flow Control Monitoring were introduced to aid in solving customer-facing challenges.

If you have a feature you would like to see, let us know!

Note: Special thanks to Kenn Takara and Roel Van de Paar for helping me edit this post.

by Krunal Bauskar at April 20, 2017 05:38 PM

Percona Live Featured Session with Casper Kejlberg-Rasmussen: Placing Databases @ Uber

Welcome to another post in the series of Percona Live featured session blogs! In these blogs, we’ll highlight some of the session speakers that will be at this year’s Percona Live conference. We’ll also discuss how these sessions can help you improve your database environment. Make sure to read to the end to get a special Percona Live 2017 registration bonus!

In this Percona Live featured session, we’ll meet Casper Kejlberg-Rasmussen, Software Developer at Uber. His session is Placing Databases @ Uber. Uber has many thousands of MySQL databases running inside of Docker containers on thousands of hosts. When deciding exactly which host a database should run on, it is important that you avoid hosts running databases of the same cluster as the one you are placing, and that you avoid placing all databases of a cluster on the same rack or in the same data center.

I had a chance to talk to Casper about Uber and database placement:

Percona: How did you get into database technology? What do you love about it?

Casper: When I took my Ph.D., my thesis area was about dynamic data structures. During my bachelor, master and Ph.D., I took all the algorithms and data structure classes I could. So it was natural for me to also work with databases in my professional career. Databases are a prime example of a very useful dynamic data structure.

Percona: Your talk is called Placing Database @ Uber. What do you mean by placing databases, and why is it important?

Casper: At Uber, the storage team manages all of Uber’s storage offerings. Our main database technology is an in-house NoSQL database called Schemaless. Schemaless builds on top of MySQL (and we specifically use our own fork of Percona’s MySQL variant, found here in GitHub). We have many thousands of databases that run inside of Docker containers. Whenever we need to create a new Schemaless instance for our internal customers, or we need to add capacity to an existing Schemaless instance, we need to place new Docker containers with Percona Server for MySQL running inside. For our Schemaless instances to be reliable, durable and highly available, we need to place databases in at least two different data centers. We want to avoid placing two databases of the same instance on the same rack or the same host. So consideration needs to be taken when deciding where to place the databases.

Percona: What are some of the conditions that affect where you place databases?

Casper: When we place a database we have to take into account the labels on the hosts we consider. These labels can be which data center or rack the host is part of, or what Clusto (Clusto is an internal hardware management tool we have) pools a host belongs to. This can be a testing, staging or production host, etc. A host also has “relations.” A relation is also a label, but instead of stating facts about the host, a relation states what other databases are running on the host. An example of a relation label is schemaless.instance.mezzanine, which indicates that the host is running a Schemaless database from the Mezzanine instance. Another example is schemaless.cluster.percona-cluster-mezzanine-us1-db01, which indicates that the database is a Schemaless database belonging to the cluster percona-cluster-mezzanine-us1-db01.

Percona: What do you want attendees to take away from your session? Why should they attend?

Casper: I want the attendees to remember that there are three steps when placing a database or any container:

  1. Filter out any host that fails the hard requirements (like not having enough resources) or fails the label or relation requirements (like having databases of the same instance as the one we want to place)
  2. Rank the hosts to select the best one, for example the host with the most free space left or the one with the fewest other databases on it
  3. As time passes and databases consume more resources, we want to relocate databases to other hosts at which it makes more sense to place them.

People should attend my session to see how to get good database placements in a simple way.

Percona: What are you most looking forward to at Percona Live 2017?

Casper: I look forward to hearing about other NoSQL solutions, and hearing about different storage engines for MySQL systems. And of course meeting other exciting people from the database community! 🙂

Register for Percona Live Data Performance Conference 2017, and see Casper present Placing Databases @ Uber. Use the code FeaturedTalk and receive $100 off the current registration price!

Percona Live Data Performance Conference 2017 is the premier open source event for the data performance ecosystem. It is the place to be for the open source community, as well as businesses that thrive in the MySQL, NoSQL, cloud, big data and Internet of Things (IoT) marketplaces. Attendees include DBAs, sysadmins, developers, architects, CTOs, CEOs, and vendors from around the world.

The Percona Live Data Performance Conference will be April 24-27, 2017 at the Hyatt Regency Santa Clara and the Santa Clara Convention Center.

by Dave Avery at April 20, 2017 04:17 PM

Jean-Jerome Schmidt

MySQL on Docker: ClusterControl and Galera Cluster on Docker Swarm

Our journey in adopting MySQL and MariaDB in containerized environments continues, with ClusterControl coming into the picture to facilitate deployment and management. We already have our ClusterControl image hosted on Docker Hub, where it can deploy different replication/cluster topologies on multiple containers. With the introduction of Docker Swarm, a native orchestration tool embedded inside Docker Engine, scaling and provisioning containers has become much easier. It also has high availability covered by running services on multiple Docker hosts.

In this blog post, we’ll be experimenting with automatic provisioning of Galera Cluster on Docker Swarm with ClusterControl. ClusterControl usually deploys database clusters on bare metal, virtual machines and cloud instances. It relies on SSH (through libssh) as its core communication module to connect to the managed hosts, so no agents are required on them. The same approach can be applied to containers, and that’s what we are going to show in this blog post.

ClusterControl as Docker Swarm Service

We have built a Docker image with extended logic to handle deployment in container environments in a semi-automatic way. The image is now available on Docker Hub and the code is hosted in our Github repository. Please note that only this image is capable of deploying on containers; this logic is not available in the standard ClusterControl installation packages.

The extended logic is inside deploy-container.sh, a script that monitors a custom table inside the CMON database called “cmon.containers”. Each newly created database container reports and registers itself into this table, and the script looks for new entries and performs the necessary actions using the ClusterControl CLI. The deployment is automatic, and you can monitor the progress directly from the ClusterControl UI or by using the “docker logs” command.

Before we go further, take note of some prerequisites for running ClusterControl and Galera Cluster on Docker Swarm:

  • Docker Engine version 1.12 and later.
  • Docker Swarm Mode is initialized.
  • ClusterControl must be connected to the same overlay network as the database containers.

To run ClusterControl as a service using “docker stack”, the following definition should be enough:

 clustercontrol:
    deploy:
      replicas: 1
    image: severalnines/clustercontrol
    ports:
      - 5000:80
    networks:
      - galera_cc

Or, you can use the “docker service” command as per below:

$ docker service create --name cc_clustercontrol -p 5000:80 --replicas 1 severalnines/clustercontrol

Or, you can combine the ClusterControl service together with the database container service and form a “stack” in a compose file as shown in the next section.

Base Containers as Docker Swarm Service

The base container image, called “centos-ssh”, is based on the CentOS 6 image. It comes with a couple of basic packages like the SSH server and clients, curl and the mysql client. The entrypoint script downloads ClusterControl’s public key for passwordless SSH during startup, and also registers the container in ClusterControl’s CMON database for automatic deployment.

Running this container requires a couple of environment variables to be set:

  • CC_HOST - Mandatory. By default the container will try to connect to the “cc_clustercontrol” service name. Otherwise, set its value to an IP address, hostname or service name. The container will download the SSH public key from the ClusterControl node automatically for passwordless SSH.
  • CLUSTER_TYPE - Mandatory. Defaults to “galera”.
  • CLUSTER_NAME - Mandatory. This name distinguishes the cluster from others from ClusterControl’s perspective. No spaces are allowed and it must be unique.
  • VENDOR - Default is “percona”. Other supported values are “mariadb” and “codership”.
  • DB_ROOT_PASSWORD - Mandatory. The database root password for the database server. In this case, it should be the MySQL root password.
  • PROVIDER_VERSION - Default is 5.6. The database version from the chosen vendor.
  • INITIAL_CLUSTER_SIZE - Default is 3. This indicates how ClusterControl should treat newly registered containers, whether they are for new deployments or for scaling out. For example, if the value is 3, ClusterControl will wait for 3 containers to be running and registered in the CMON database before starting the cluster deployment job. Otherwise, it waits 30 seconds for the next cycle and retries. Subsequent containers (4th, 5th and Nth) will fall under the “Add Node” job instead.

To run the container, simply use the following stack definition in a compose file:

  galera:
    deploy:
      replicas: 3
    image: severalnines/centos-ssh
    ports:
      - 3306:3306
    environment:
      CLUSTER_TYPE: "galera"
      CLUSTER_NAME: "PXC_Docker"
      INITIAL_CLUSTER_SIZE: 3
      DB_ROOT_PASSWORD: "mypassword123"
    networks:
      - galera_cc

By combining both (the ClusterControl and database base containers), we can deploy them under a single stack as shown below:

version: '3'

services:

  galera:
    deploy:
      replicas: 3
      restart_policy:
        condition: on-failure
        delay: 10s
    image: severalnines/centos-ssh
    ports:
      - 3306:3306
    environment:
      CLUSTER_TYPE: "galera"
      CLUSTER_NAME: "Galera_Docker"
      INITIAL_CLUSTER_SIZE: 3
      DB_ROOT_PASSWORD: "mypassword123"
    networks:
      - galera_cc

  clustercontrol:
    deploy:
      replicas: 1
    image: severalnines/clustercontrol
    ports:
      - 5000:80
    networks:
      - galera_cc

networks:
  galera_cc:
    driver: overlay

Save the above lines into a file, for example docker-compose.yml in the current directory. Then, start the deployment:

$ docker stack deploy --compose-file=docker-compose.yml cc
Creating network cc_galera_cc
Creating service cc_clustercontrol
Creating service cc_galera

Docker Swarm will deploy one container for ClusterControl (replicas: 1) and another three containers for the database cluster (replicas: 3). Each database container will then register itself in the CMON database for deployment.

Wait for a Galera Cluster to be ready

The deployment will be automatically picked up by the ClusterControl CLI. So you basically don’t have to do anything but wait. The deployment usually takes around 10 to 20 minutes depending on the network connection.

Open the ClusterControl UI at http://{any_Docker_host}:5000/clustercontrol, fill in the default administrator user details and log in. Monitor the deployment progress under Activity -> Jobs, as shown in the following screenshot:

Or, you can look at the progress directly from the docker logs command of the ClusterControl container:

$ docker logs -f $(docker ps | grep clustercontrol | awk {'print $1'})
>> Found the following cluster(s) is yet to deploy:
Galera_Docker

>> Number of containers for Galera_Docker is lower than its initial size (3).
>> Nothing to do. Will check again on the next loop.
>> Found the following cluster(s) is yet to deploy:
Galera_Docker

>> Found a new set of containers awaiting for deployment. Sending deployment command to CMON.
>> Cluster name         : Galera_Docker
>> Cluster type         : galera
>> Vendor               : percona
>> Provider version     : 5.7
>> Nodes discovered     : 10.0.0.6 10.0.0.7 10.0.0.5
>> Initial cluster size : 3
>> Nodes to deploy      : 10.0.0.6;10.0.0.7;10.0.0.5

>> Deploying Galera_Docker.. It's gonna take some time..
>> You shall see a progress bar in a moment. You can also monitor
>> the progress under Activity (top menu) on ClusterControl UI.
Create Galera Cluster
- Job  1 RUNNING    [██▊       ]  26% Installing MySQL on 10.0.0.6

That’s it. Wait until the deployment completes and you will then be all set with a three-node Galera Cluster running on Docker Swarm, as shown in the following screenshot:

In ClusterControl, the cluster has the same look and feel as what you have seen with Galera running in a standard (non-container) host environment.

ClusterControl
Single Console for Your Entire Database Infrastructure
Find out what else is new in ClusterControl

Management

Managing database containers is a bit different with Docker Swarm. This section provides an overview of how the database containers should be managed through ClusterControl.

Connecting to the Cluster

To verify the status of the replicas and service name, run the following command:

$ docker service ls
ID            NAME               MODE        REPLICAS  IMAGE
eb1izph5stt5  cc_clustercontrol  replicated  1/1       severalnines/clustercontrol:latest
ref1gbgne6my  cc_galera          replicated  3/3       severalnines/centos-ssh:latest

If the application/client is running on the same Swarm network space, you can connect to it directly via the service name endpoint. If not, use the routing mesh by connecting to the published port (3306) on any of the Docker Swarm nodes. The connection to these endpoints will be load balanced automatically by Docker Swarm in a round-robin fashion.
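As a quick connectivity check through the routing mesh, you could run something along these lines from any host that can reach a Swarm node. This is a sketch only: the IP address is a placeholder, the password is the one set via DB_ROOT_PASSWORD in the example stack, and it assumes the root account is allowed to connect remotely:

$ mysql -uroot -pmypassword123 -h192.168.55.111 -P3306 -e "SHOW STATUS LIKE 'wsrep_cluster_size'"

Repeating the command should land on different Galera containers in round-robin fashion, while the reported cluster size stays at 3.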

Scale up/down

Typically, when adding a new database node, we need to prepare a new host with the base operating system and passwordless SSH. In Docker Swarm, you just need to scale the service to the desired number of replicas using the following command:

$ docker service scale cc_galera=5
cc_galera scaled to 5

ClusterControl will then pick up the new containers registered in the cmon.containers table and trigger add node jobs, one container at a time. You can look at the progress under Activity -> Jobs:

Scaling down is similar, using the “service scale” command. However, ClusterControl doesn’t know whether the containers removed by Docker Swarm were part of automatic rescheduling or a deliberate scale down (which indicates that we intentionally wanted the containers to be removed). Thus, to scale down from 5 nodes to 3 nodes, one would run:

$ docker service scale cc_galera=3
cc_galera scaled to 3

Then, remove the stopped hosts from the ClusterControl UI by going to Nodes -> roll over the removed container -> click the ‘X’ icon at the top right -> Confirm & Remove Node:

ClusterControl will then execute a remove node job and bring back the cluster to the expected size.

Failover

In case of container failure, Docker Swarm automatic rescheduling will kick in and there will be a new replacement container with the same IP address as the old one (but with a different container ID). ClusterControl will then start to provision this node from scratch: performing the installation process, configuring it and getting it to rejoin the cluster. The old container will be removed automatically from ClusterControl before the deployment starts.

Go ahead and try to kill one of the database containers:

$ docker kill [container ID]

You’ll see that the new containers Swarm creates are provisioned automatically by ClusterControl.
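To watch the rescheduling happen, you can list the tasks of the service; the failed task shows up as shut down and a replacement task takes its place (the service name matches the stack example above):

$ docker service ps cc_galera

Once the replacement container registers itself in cmon.containers, the re-provisioning job appears under Activity -> Jobs in the ClusterControl UI.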

Creating a new cluster

To create a new cluster, just create another service or stack with a different CLUSTER_NAME and service name. The following example creates another Galera Cluster running MariaDB 10.1 (some extra environment variables are required for MariaDB 10.1):

version: '3'
services:
  galera2:
    deploy:
      replicas: 3
    image: severalnines/centos-ssh
    ports:
      - 3306
    environment:
      CLUSTER_TYPE: "galera"
      CLUSTER_NAME: "MariaDB_Galera"
      VENDOR: "mariadb"
      PROVIDER_VERSION: "10.1"
      INITIAL_CLUSTER_SIZE: 3
      DB_ROOT_PASSWORD: "mypassword123"
    networks:
      - cc_galera_cc

networks:
  cc_galera_cc:
    external: true

Then, create the service:

$ docker stack deploy --compose-file=docker-compose.yml db2

Go back to the ClusterControl UI -> Activity -> Jobs and you should see that a new deployment has started. After a couple of minutes, the new cluster will be listed in the ClusterControl dashboard:

Destroying everything

To remove everything (including the ClusterControl container), you just need to remove the stack created by Docker Swarm:

$ docker stack rm cc
Removing service cc_clustercontrol
Removing service cc_galera
Removing network cc_galera_cc

That’s it, the whole stack has been removed. Pretty neat huh? You can start all over again by running the “docker stack deploy” command and everything will be ready after a couple of minutes.

Summary

The flexibility you get by running a single command to deploy or destroy a whole environment can be useful in different types of use cases, such as backup verification, DDL procedure testing, query performance tweaking, proof-of-concept experiments and staging temporary data. These use cases are closer to a development environment. With this approach, you can now treat a stateful service “statelessly”.

Would you like to see ClusterControl manage the whole database container stack through the UI via point and click? Let us know your thoughts in the comments section below. In the next blog post, we are going to look at how to perform automatic backup verification on Galera Cluster using containers.

by ashraf at April 20, 2017 09:16 AM

April 19, 2017

Peter Zaitsev

How We Made Percona XtraDB Cluster Scale

Percona XtraDB Cluster SST Traffic Encryption

In this blog post, we’ll look at the actions and efforts Percona experts took to scale Percona XtraDB Cluster.

Introduction

When we first started analyzing Percona XtraDB Cluster performance, it was pretty bad. We would see contention even with 16 threads. Performance was even worse with sync_binlog=1, although the same pattern was observed even with the binary log disabled. The effect was not limited to OLTP workloads: other workloads (like update-key/non-key) were affected as well.

That’s when we started analyzing the contention issues and found multiple problems. We will discuss all these problems and the solutions we adopted. But before that, let’s look at the current performance level.

Check this blog post for more details.

The good news is that Percona XtraDB Cluster is now optimized to scale well for all scenarios, with gains in the range of 3x-10x.

Understanding How MySQL Commits a Transaction

Percona XtraDB Cluster contention is associated mainly with Commit Monitor contention, which comes into the picture during commit time. It is important to understand the context around it.

When a commit is invoked, it proceeds in two phases:

  • Prepare phase: mark the transaction as PREPARE, updating the undo segment to capture the updated state.
    • If bin-log is enabled, redo changes are not persisted immediately. Instead, a batch flush is done during Group Commit Flush stage.
    • If bin-log is disabled, then redo changes are persisted immediately.
  • Commit phase: Mark the transaction commit in memory.
    • If bin-log is enabled, Group Commit optimization kicks in, causing a flush of the redo logs (which persists the changes done to db-objects plus the PREPARE state of the transaction), followed by a flush of the binary logs. Since the binary logs are flushed, the redo log record of the transaction commit doesn’t need to be flushed immediately (saving an fsync).
    • If bin-log is disabled, redo logs are flushed on completion of the transaction to persist the updated commit state of the transaction.

What is a Monitor in Percona XtraDB Cluster World?

Monitors help maintain transaction ordering. For example, the Commit Monitor ensures that no transaction with a global-seqno greater than the current commit-processing transaction’s global seqno is allowed to proceed.

How Percona XtraDB Cluster Commits a Transaction

Percona XtraDB Cluster follows the existing MySQL semantics of course, but has its own step to commit the transaction in the replication world. There are two important themes:

  1. Apply/execution of transactions can proceed in parallel
  2. Commit is serialized based on cluster-wide global seqno.

Let’s understand the commit flow with Percona XtraDB Cluster involved (Percona XtraDB Cluster registers wsrep as an additional storage engine for replication).

  • Prepare phase:
    • wsrep prepare: executes two main actions:
      • Replicate the transaction (adding the write-set to group-channel)
      • Entering the Commit Monitor, thereby enforcing transaction ordering.
    • binlog prepare: nothing significant (for this flow).
    • innobase prepare: mark the transaction in PREPARE state.
      • As discussed above, the persistence of the REDO log depends on whether the binlog is enabled or disabled.
  • Commit phase
    • If bin-log is enabled
      • MySQL Group Commit Logic kicks in. The semantics ensure that the order of transaction commit is the same as the order in which they were added to the group-commit flush queue.
    • If bin-log is disabled
      • Normal commit action for all registered storage engines is called with immediate persistence of redo log.
    • Percona XtraDB Cluster then invokes the post_commit hook, thereby releasing the Commit Monitor so that the next transaction can make progress.

With that understanding, let’s look at the problems and solutions:

PROBLEM-1:

The Commit Monitor is exercised such that the complete commit operation is serialized. This limits the parallelism associated with the prepare stage. With log-bin enabled, this is still OK since redo logs are flushed at the group-commit flush stage (starting with 5.7). But if log-bin is disabled, then each commit causes an independent redo log flush (and in turn a probable fsync).

OPTIMIZATION-1:

Split the replication pre-commit hook into two explicit actions: replicate (add write-set to group-channel) + pre-commit (enter commit-monitor).

The replicate action is performed just like before (as part of storage engine prepare). That will help complete the InnoDB prepare action in parallel (exploiting much-needed parallelism in REDO flush with log-bin disabled).

On completion of replication, the pre-commit hook is called. That leads to entering the Commit Monitor for enforcing the commit ordering of the transactions. (Note: Replication action assigns the global seqno. So even if a transaction with a higher global seqno finishes the replication action earlier (due to CPU scheduling) than the transaction with a lower global seqno, it will wait in the pre-commit hook.)

Improved parallelism in the innodb-prepare stage helps accelerate log-bin enabled flow, and the same improved parallelism significantly helps in the log-bin disabled case by reducing redo-flush contention, thereby reducing fsyncs.


PROBLEM-2:

MySQL Group Commit already has a concept of ordering transactions based on the order of their addition to the GROUP COMMIT queue (FLUSH STAGE queue to be specific). Commit Monitor enforces the same, making the action redundant but limiting parallelism in MySQL Group Commit Logic (including redo-log flush that is now delayed to the flush stage).

With the existing flow (due to the involvement of the Commit Monitor), only one transaction at a time can enter the GROUP COMMIT queue, thereby limiting optimal use of the Group Commit logic.

OPTIMIZATION-2:

Release the Commit Monitor once the transaction is successfully added to flush-stage of group-commit. MySQL will take it from there to maintain the commit ordering. (We call this interim-commit.)

Releasing the Commit Monitor early helps other transactions make progress, and the real MySQL Group Commit leader-follower optimization (batch flushing/sync/commit) comes into play.

This also helps ensure batch REDO log flushing.


PROBLEM-3:

This problem is specific to when the log-bin is disabled. Percona XtraDB Cluster still generates the log-bin, as it needs it for forming a replication write-set (it just doesn’t persist this log-bin information). If disk space is not a constraint, then I would suggest operating Percona XtraDB Cluster with log-bin enabled.

With log-bin disabled, OPTIMIZATION-1 is still relevant, but OPTIMIZATION-2 isn’t, as there is no group-commit protocol involved. Instead, MySQL ensures that the redo-log (capturing state change of transaction) is persisted before reporting COMMIT as a success. As per the original flow, the Commit Monitor is not released till the commit action is complete.

OPTIMIZATION-3:

The transaction is already committed to memory and the state change is captured. This is about persisting the REDO log only (the REDO log modification is already captured by mtr_commit). This means we can release the Commit Monitor just before the REDO flush stage kicks in. Correctness is still ensured as the REDO log flush always persists the data sequentially. So even if trx-1 loses its slot before the flush kicks in, and trx-2 is allowed to make progress, trx-2’s REDO log flush ensures that trx-1’s REDO log is also flushed.


Conclusion

With these three main optimizations, and some small tweaks, we have tuned Percona XtraDB Cluster to scale better and made it fast enough for the growing demands of your applications. All of this is available with the recently released Percona XtraDB Cluster 5.7.17-29.20. Give it a try and watch your application scale in a multi-master environment, making Percona XtraDB Cluster the best fit for your HA workloads.

by Krunal Bauskar at April 19, 2017 09:46 PM

Percona XtraDB Cluster 5.6.35-26.20-3 is now available

Percona XtraDB Cluster 5.7

Percona announces the release of Percona XtraDB Cluster 5.6.35-26.20-3 on April 13, 2017. Binaries are available from the downloads section or our software repositories.

Percona XtraDB Cluster 5.6.35-26.20-3 is now the current release, based on the following:

All Percona software is open-source and free.

NOTE: Due to end of life, Percona will stop producing packages for the following distributions after July 31, 2017:

  • Red Hat Enterprise Linux 5 (Tikanga)
  • Ubuntu 12.04 LTS (Precise Pangolin)

You are strongly advised to upgrade to the latest stable versions if you want to continue using Percona software.

Fixed Bugs

  • Updated semantics for gcache page cleanup to trigger when either gcache.keep_pages_size or gcache.keep_pages_count exceeds the limit, instead of both at the same time.
  • Added support for passing the XtraBackup buffer pool size with the use-memory option under [xtrabackup] and the innodb_buffer_pool_size option under [mysqld] when the --use-memory option is not passed with the inno-apply-opts option under [sst].
  • Fixed gcache page cleanup not triggering when limits are exceeded.
  • PXC-782: Updated xtrabackup-v2 script to use the tmpdir option (if it is set under [sst][xtrabackup] or [mysqld], in that order).
  • PXC-784: Fixed the pc.recovery procedure to abort if the gvwstate.dat file is empty or invalid, and fall back to normal joining process. For more information, see 1669333.
  • PXC-794: Updated the sockopt option to include a comma at the beginning if it is not set by the user.
  • PXC-797: Blocked wsrep_desync toggling while the node is paused to avoid halting the cluster when running FLUSH TABLES WITH READ LOCK. For more information, see 1370532.
  • Fixed several packaging and dependency issues.

Help us improve our software quality by reporting any bugs you encounter using our bug tracking system. As always, thanks for your continued support of Percona!

by Alexey Zhebel at April 19, 2017 08:40 PM

Performance improvements in Percona XtraDB Cluster 5.7.17-29.20

Percona XtraDB Cluster

In our latest release of Percona XtraDB Cluster, we’ve introduced major performance improvements to the MySQL write-set replication layer. In this post, we want to show what these improvements look like.

For the test, we used the sysbench OLTP_RW, UPDATE_KEY and UPDATE_NOKEY workloads with 100 tables of 4 million rows each, which gives about 100GB of data. In all the tests we used a three-node setup, connected via a 10GB network, with the sysbench load directed to one primary node.
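For reference, a sysbench 1.0 invocation along these lines would reproduce the OLTP_RW part of the test. This is a sketch only: the thread count, host name and credentials are placeholders and not taken from the original benchmark scripts:

$ sysbench oltp_read_write --tables=100 --table-size=4000000 --mysql-host=node1 --mysql-user=sbtest --mysql-password=sbtest --threads=64 --time=300 run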

In the first chart, we show improvements comparing to the previous version (5.7.16):

Percona XtraDB Cluster

The main improvements come from concurrent workloads, under multiple threads.

The previous chart is for cases with binary logs enabled, but in some situations we will have deployments without binary logs (Percona XtraDB Cluster does not require them). The latest release significantly improves performance for this case as well.

Here is a chart showing throughput without binary logs:

Percona XtraDB Cluster

How does Percona XtraDB Cluster compare with similar technologies? To find out, we’ll compare this release with MySQL 5.7.17 Group Replication and with the recently released MariaDB 10.2.5 RC.

For MySQL 5.7.17 Group Replication, I’ve reviewed two cases: “durable” with sync_binlog=1, and “relaxed durability” with sync_binlog=0.

Also for MySQL 5.7.17 Group Replication, we want to review two cases with different flow_control settings. The first setting is flow_control=25000 (the default). It provides better performance, but with the drawback that non-primary nodes fall behind significantly, and MySQL Group Replication does not provide a way to protect against reading stale data. So with the default flow_control=25000, we risk reading very outdated data. We also tested MySQL Group Replication with flow_control=1000 to minimize stale data on non-primary nodes.

A note on the Flow Control topic: it is worth mentioning that we also changed the flow_control default for Percona XtraDB Cluster. The default value is 100 instead of 16 (as in version 5.7.16).
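If you want to experiment with this setting yourself, the Galera flow control limit can be adjusted at runtime through the provider options. This is a sketch: fc_limit=100 simply mirrors the new Percona XtraDB Cluster default mentioned above, and you would normally also make the change permanent in wsrep_provider_options in my.cnf:

mysql> SET GLOBAL wsrep_provider_options = 'gcs.fc_limit=100';
mysql> SHOW STATUS LIKE 'wsrep_flow_control_paused';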

Comparison chart with sync_binlog=1 (for MySQL Group Replication):

Percona XtraDB Cluster

Comparison chart with sync_binlog=0 (for MySQL Group Replication):

Percona XtraDB Cluster

So there are a couple of conclusions we can draw from these charts.

  1. The new version of Percona XtraDB Cluster performs on the same level as MySQL Group Replication
  2. flow_control for MySQL Group Replication really makes a difference for performance, and the default flow_control=25000 is better (with the risk of a lot of outdated data on non-primary nodes)

For reference, our benchmark files and config files are here.

by Vadim Tkachenko at April 19, 2017 01:50 PM

Percona XtraDB Cluster 5.7.17-29.20 is now available

Percona XtraDB Cluster 5.7

Percona announces the release of Percona XtraDB Cluster 5.7.17-29.20 on April 19, 2017. Binaries are available from the downloads section or our software repositories.

NOTE: You can also run Docker containers from the images in the Docker Hub repository.

Percona XtraDB Cluster 5.7.17-29.20 is now the current release, based on the following:

All Percona software is open-source and free.

Performance Improvements

This release is focused on performance and scalability with increasing workload threads. Tests show up to a 10 times increase in performance.

Fixed Bugs

  • Updated semantics for gcache page cleanup to trigger when either gcache.keep_pages_size or gcache.keep_pages_count exceeds the limit, instead of both at the same time.
  • Added support for passing the XtraBackup buffer pool size with the use-memory option under [xtrabackup] and the innodb_buffer_pool_size option under [mysqld] when the --use-memory option is not passed with the inno-apply-opts option under [sst].
  • Fixed gcache page cleanup not triggering when limits are exceeded.
  • Improved SST and IST log messages for better readability and unification.
  • Excluded the garbd node from flow control calculations.
  • Added extra checks to verify that SSL files (certificate, certificate authority, and key) are compatible before opening a connection.
  • Improved parallelism for better scaling with multiple threads.
  • Added validations for DISCARD TABLESPACE and IMPORT TABLESPACE in PXC Strict Mode to prevent data inconsistency.
  • Added the wsrep_flow_control_status variable to indicate if node is in flow control (paused).
  • PXC-766: Added the wsrep_ist_receive_status variable to show progress during an IST.
  • Allowed CREATE TABLE ... AS SELECT (CTAS) statements with temporary tables (CREATE TEMPORARY TABLE ... AS SELECT) in PXC Strict Mode. For more information, see 1666899.
  • PXC-782: Updated xtrabackup-v2 script to use the tmpdir option (if it is set under [sst][xtrabackup] or [mysqld], in that order).
  • PXC-783: Improved the wsrep stage framework.
  • PXC-784: Fixed the pc.recovery procedure to abort if the gvwstate.dat file is empty or invalid, and fall back to normal joining process. For more information, see 1669333.
  • PXC-794: Updated the sockopt option to include a comma at the beginning if it is not set by the user.
  • PXC-795: Set --parallel=4 as default option for wsrep_sst_xtrabackup-v2 to run four threads with XtraBackup.
  • PXC-797: Blocked wsrep_desync toggling while node is paused to avoid halting the cluster when running FLUSH TABLES WITH READ LOCK. For more information, see 1370532.
  • PXC-805: Inherited upstream fix to avoid using deprecated variables, such as INFORMATION_SCHEMA.SESSION_VARIABLE. For more information, see 1676401.
  • PXC-811: Changed default values for the following variables:
    • fc_limit from 16 to 100
    • send_window from 4 to 10
    • user_send_window from 2 to 4
  • Moved wsrep settings into a separate configuration file (/etc/my.cnf.d/wsrep.cnf).
  • Fixed mysqladmin shutdown to correctly stop the server on systems using systemd.
  • Fixed several packaging and dependency issues.

Help us improve our software quality by reporting any bugs you encounter using our bug tracking system. As always, thanks for your continued support of Percona!

by Alexey Zhebel at April 19, 2017 01:37 PM

Jean-Jerome Schmidt

How to Deploy Asynchronous Replication Slave to MariaDB Galera Cluster 10.x using ClusterControl

Combining Galera and asynchronous replication in the same MariaDB setup, aka Hybrid Replication, can be useful - e.g. as a live backup node in a remote datacenter or reporting/analytics server. We already blogged about this setup for Codership/Galera or Percona XtraDB Cluster users, but a master failover as described in that post does not work for MariaDB because of its different GTID approach. In this post, we will show you how to deploy an asynchronous replication slave to MariaDB Galera Cluster 10.x (with master failover!), using GTID with ClusterControl.

Preparing the Master

First and foremost, you must ensure that the master and slave nodes are running MariaDB Galera 10.0.2 or later. A MariaDB replication slave requires at least one master with GTID among the Galera nodes. However, we would recommend configuring all the MariaDB Galera nodes as masters. GTID, which is automatically enabled in MariaDB, will be used to do master failover. The following must be true for the masters:

  • At least one master among the Galera nodes
  • All masters must be configured with the same domain ID
  • log_slave_updates must be enabled
  • All masters’ MariaDB port is accessible by ClusterControl and slaves
  • Must be running MariaDB version 10.0.2 or later

From ClusterControl this is easily done by selecting Enable Binary Logging in the drop down for each node.

Enabling binary logging through ClusterControl
Enabling binary logging through ClusterControl

And then enable GTID in the dialogue:

Once Proceed has been clicked, a job will automatically configure the Galera node according to the settings described earlier.

If you wish to perform this action by hand, you can configure a Galera node as master by changing the MariaDB configuration file for that node as shown below:

gtid_domain_id=<must be same across all mariadb servers participating in replication>
server_id=<must be unique>
binlog_format=ROW
log_slave_updates=1
log_bin=binlog

After making these changes, restart the nodes one by one or use a rolling restart (ClusterControl > Manage > Upgrades > Rolling Restart).

ClusterControl
Single Console for Your Entire Database Infrastructure
Find out what else is new in ClusterControl

Preparing the Slave

For the slave, you would need a separate host or VM, with or without MariaDB installed. If you do not have MariaDB installed, you need to perform the following tasks: configure the root password (based on monitored_mysql_root_password), create the slave user (based on repl_user, repl_password), configure MariaDB, start the server and finally start replication.

When adding the slave using ClusterControl, all these steps are automated in the Add Replication Slave job, as described below.

Add replication slave to MariaDB Cluster
Add replication slave to MariaDB Cluster

After adding our slave node, our deployment will look like this:

MariaDB Galera asynchronous slave topology
MariaDB Galera asynchronous slave topology

Master Failover and Recovery

Since we are using MariaDB with GTID enabled, master failover is supported via ClusterControl when Cluster and Node Auto Recovery has been enabled. Whether the master fails due to network connectivity or any other reason, ClusterControl will automatically fail over to the most suitable other master node in the cluster.

Automatic slave failover to another master in Galera cluster
Automatic slave failover to another master in Galera cluster

This way ClusterControl will add a robust asynchronous slave capability to your MariaDB Cluster!

by Art at April 19, 2017 10:22 AM

Shlomi Noach

Practical Orchestrator, BoF, GitHub and other talks at Percona Live 2017

Next week I will be presenting Practical Orchestrator at Percona Live, Santa Clara.

As opposed to previous orchestrator talks I gave, which were either high level or algorithmic, Practical Orchestrator will be, well... practical.

The objective for this talk is that attendees leave the classroom with a good grasp of orchestrator's powers, and know how to set up orchestrator in their environment.

We will walk through discovery, refactoring, recovery, HA. I will walk through the most important configuration settings, share advice on what makes a good deployment, and tell you how we and others run orchestrator. We'll present a few scripting/automation examples. We will literally set up orchestrator on my computer.

It's a 50 minute talk and it will be fast paced!

ProxySQL & Orchestrator BoF

ProxySQL is all the rage, and throughout the past 18 months René Cannaò and I have discussed a few times the potential for integration between ProxySQL and Orchestrator. We've also received several requests from the community.

We will run a BoF, a very informal session where we openly discuss our thoughts on possible integration, what makes sense and what doesn't, and above all else would love to hear the attendees' thoughts. We might come out of this session with some plan to pick low hanging fruit, who knows?

The current link to the BoF sessions is this. It seems terribly broken, and hopefully I'll replace it later on.

GitHub talks

GitHub engineers will further present these talks:

  • gh-ost: triggerless, painless, trusted online schema migrations
    Jonah Berquist, 25 April - 2:20 PM - 3:10 PM @ Ballroom D
    This is the "classic" introduction to gh-ost, our very own open source schema migration tool. gh-ost enjoys good traction and adoption. For us, it made a significant impact on our development cycle and on our availability and reliability. We hear similar stories from users. This talk will explain how gh-ost works and why it works for us and others so much better. Also: superpowers.
  • Automating Schema Changes using gh-ost
    Tom Krouper, 27 April - 12:50 PM - 1:40 PM @ Ballroom D
    What's the greater picture? gh-ost runs the migration, but what's the migration cycle? How do our engineers design the change, get this to run, verify it doesn't break our environment? What automation do we have in place? Expect very interesting and definitely not trivial issues.
  • Practical JSON in MySQL 5.7 and beyond
    Ike Walker, 27 April - 3:00 PM - 3:50 PM @ Ballroom A
    Ike will review the JSON data type along with most recent changes, virtual columns, operational concerns and more. I'm looking forward to a great review!

Related

  • Experiences using gh-ost in a multi-tier topology
    Ivan Groenewold, Valerie Parham-Thompson, Pythian, 26 April - 5:00 PM - 5:25 PM @ Ballroom C
    Don't take it from us. Pythian are running gh-ost in production, and on completely different environments than us. I'm absolutely curious to hear about their experience and findings.
  • High Availability in GCE
    Carmen Mason, Allan Mason, 26 April - 3:30 PM - 4:20 PM @ Ballroom D
    This is a story on the road to MySQL HA on Google Cloud Engine. It ends up with MHA picked as the HA solution while orchestrator is not. I'm curious to hear!

See you in Santa Clara!

 

by shlomi at April 19, 2017 09:59 AM

April 18, 2017

Monty Says

MariaDB 10.3-alpha released

While most of the MariaDB developers have been working hard on getting MariaDB 10.2 out as GA, a small team, including me, has been working on the next release, MariaDB 10.3.

The theme of MariaDB 10.2 is complex operations, like window functions, common table expressions and JSON functions; the theme of MariaDB 10.3 is compatibility.

Compatibility refers to functionality that exists in other databases but has been missing in MariaDB:
In MariaDB 10.2, ORACLE mode was limited to removing MariaDB-specific options in SHOW CREATE TABLE and SHOW CREATE VIEW, and setting SQL_MODE to "PIPES_AS_CONCAT, ANSI_QUOTES, IGNORE_SPACE, ORACLE, NO_KEY_OPTIONS, NO_TABLE_OPTIONS, NO_FIELD_OPTIONS, NO_AUTO_CREATE_USER".

In MariaDB 10.3, SQL_MODE=ORACLE mode allows MariaDB to understand a large subset of Oracle's PL/SQL language. The documentation for what is supported is still lacking, but the interested reader can find what is supported in the test suite in the "mysql-test/suite/compat/oracle" directory.

If things go as planned, the features we will add to 10.3 prior to beta are:
Most of the above features are already close to ready (to be added in future Alphas), so I expect that it will not take many months before we can make a first MariaDB 10.3 beta!

This is in line with what was discussed at the MariaDB developer conference in New York one week ago, where most attendees wanted to see new MariaDB releases more often.

MariaDB 10.3 can be downloaded here

Happy testing!

by Michael "Monty" Widenius (noreply@blogger.com) at April 18, 2017 10:23 PM

Peter Zaitsev

Percona XtraBackup 2.4.7 is Now Available

Percona XtraBackup

Percona announces the GA release of Percona XtraBackup 2.4.7 on April 18, 2017. You can download it from our download site and apt and yum repositories.

Percona XtraBackup enables MySQL backups without blocking user queries, making it ideal for companies with large data sets and mission-critical applications that cannot tolerate long periods of downtime. Offered free as an open source solution, Percona XtraBackup drives down backup costs while providing unique features for MySQL backups.

New features:
  • Percona XtraBackup now uses hardware accelerated implementation of crc32 where it is supported.
  • Percona XtraBackup has implemented new options: --tables-exclude and --databases-exclude, which work similarly to the --tables and --databases options but exclude the given names/paths from the backup.
  • The xbstream binary now supports parallel extraction with the --parallel option.
  • The xbstream binary now supports the following new options: --decrypt, --encrypt-threads, --encrypt-key, and --encrypt-key-file. When the --decrypt option is specified, xbstream automatically decrypts encrypted files when extracting the input stream. Either the --encrypt-key or the --encrypt-key-file option must be specified to provide the encryption key, but not both. The --encrypt-threads option specifies the number of worker threads doing the encryption; the default is 1. (See the usage sketch after this list.)
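To illustrate the options listed above, here is a sketch of an excluding backup and a parallel, decrypting extraction with xbstream. The paths, key file and exclusion pattern are placeholders, and the algorithm value passed to --decrypt is assumed to match the one used when the backup was encrypted:

$ xtrabackup --backup --tables-exclude='^mysql[.]' --target-dir=/backups/full
$ xbstream -x --parallel=4 --decrypt=AES256 --encrypt-key-file=/etc/xb_keyfile -C /data/restore < backup.xbstream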
Bugs Fixed:
  • Backups were missing *.isl files for general tablespace. Bug fixed #1658692.
  • In 5.7, MySQL changed the default checksum algorithm to crc32, while xtrabackup was using innodb. This caused xtrabackup to perform extra checksum calculations that were not needed. Bug fixed #1664405.
  • For system tablespaces consisting of multiple files, xtrabackup updated the LSN only in the first file. This caused MySQL versions lower than 5.7 to fail on startup. Bug fixed #1669592.
  • xtrabackup --export can now export tables that have more than 31 indexes. Bug fixed #1089681.
  • An “Unrecognized character x01; marked by” message could be seen if backups were taken with the version check enabled. Bug fixed #1651978.

Release notes with all the bugfixes for Percona XtraBackup 2.4.7 are available in our online documentation. Please report any bugs to the launchpad bug tracker.

by Hrvoje Matijakovic at April 18, 2017 07:19 PM

Percona XtraBackup 2.3.8 is Now Available

Percona XtraBackup

Percona announces the release of Percona XtraBackup 2.3.8 on April 18, 2017. Downloads are available from our download site or Percona Software Repositories.

Percona XtraBackup enables MySQL backups without blocking user queries, making it ideal for companies with large data sets and mission-critical applications that cannot tolerate long periods of downtime. Offered free as an open source solution, Percona XtraBackup drives down backup costs while providing unique features for MySQL backups.

This release is the current GA (Generally Available) stable release in the 2.3 series.

New Features
  • Percona XtraBackup now uses hardware accelerated implementation of crc32 where it is supported.
  • Percona XtraBackup has implemented new options: --tables-exclude and --databases-exclude, which work similarly to the --tables and --databases options but exclude the given names/paths from the backup.
  • The xbstream binary now supports parallel extraction with the --parallel option.
  • The xbstream binary now supports the following new options: --decrypt, --encrypt-threads, --encrypt-key, and --encrypt-key-file. When the --decrypt option is specified, xbstream automatically decrypts encrypted files when extracting the input stream. Either the --encrypt-key or the --encrypt-key-file option must be specified to provide the encryption key, but not both. The --encrypt-threads option specifies the number of worker threads doing the encryption; the default is 1.
Bugs Fixed:
  • xtrabackup would not create fresh InnoDB redo logs when preparing incremental backup. Bug fixed #1669592.
  • xtrabackup --export can now export tables that have more than 31 indexes. Bug fixed #1089681.
  • An “Unrecognized character x01; marked by” message could be seen if backups were taken with the version check enabled. Bug fixed #1651978.

Release notes with all the bugfixes for Percona XtraBackup 2.3.8 are available in our online documentation. Bugs can be reported on the launchpad bug tracker.

by Hrvoje Matijakovic at April 18, 2017 06:36 PM

M17 Conference Observations on the Future of MariaDB

M17 Conference

In this blog post, I’ll discuss some of my thoughts about the future of MariaDB after attending the M17 Conference.

Let me start with full disclosure: I’m the CEO of Percona, and we compete with the MariaDB Corporation in providing Support for MariaDB and other services. I probably have some biases!

Last week I attended the MariaDB Developers UnConference and the M17 Conference, which provided great insights into MariaDB’s positioning as a project and as a business. Below are some of my thoughts as I attended various sessions at the conference:

Breaking away from their MySQL past. Michael Howard’s (MariaDB CEO) keynote focused on breaking away from the past and embracing the future. In this case, the “past” means proprietary databases. But I think MariaDB is also trying to break away from their past of being a MySQL variant, and focus on becoming completely independent technology. If I didn’t know their history, I wouldn’t recognize how much codebase MariaDB shares with MySQL – and how much MariaDB innovation Oracle still drives.

MySQL compatibility is no longer the primary goal. In its first version, MariaDB 5.1 was truly compatible with MySQL (it had relatively few differences). By contrast, MariaDB 10.2 has different replication, JSON support and a very different optimizer. With MariaDB 10.3, more changes are planned for the InnoDB on-disk format, and no plans exist to remove .frm files and use the MySQL 8 Data Dictionary. With these features, another level of MySQL compatibility is removed. The MariaDB knowledgebase states: “For all practical purposes, MariaDB is a binary drop in replacement for the same MySQL version.” The argument can still be made that this is true for MySQL 5.7 (as long as your application does not use some of the new features), but this does not seem to be the case for MySQL 8.

The idea seems to be that since MariaDB has replaced MySQL in many (most?) Linux distributions, and many people use MariaDB when they think they are using MySQL, compatibility is not that big of a deal anymore.

Embracing contributions and keeping the development process open is a big focus of MariaDB. Facebook, Google, Alibaba and Tencent have contributed to MariaDB, along with many independent smaller companies and independent developers (Percona among them). This is different from the MySQL team at Oracle, who have provided some contributions, but not nearly to the extent that MariaDB has. An open source model is a double-edged sword – while it gives you more features, it also makes it harder to maintain a consistent user experience and consistent quality of code and documentation. It will be interesting to see how MariaDB deals with these challenges.

Oracle compatibility. MariaDB strives to be the open source database that is the most compatible with Oracle, and therefore the easiest to migrate to. I have heard people compare MariaDB’s desire for Oracle compatibility to EDB Postgres – only with the advantage of being open source as opposed to proprietary software. For MariaDB 10.3 (alpha), they are developing support for Oracle PL/SQL syntax for stored procedures, to be able to migrate applications with few, if any, changes. They are also developing support for SEQUENCE and other Oracle features, including a special sql_mode=ORACLE to maximize compatibility.

BSL as a key for success. When it comes to business source licensing (BSL), I couldn’t quite resolve the conflict I found in MariaDB’s messaging. On the one hand, MariaDB promotes open source as a great way to escape vendor lock-in (which we at Percona completely agree with). But on the other hand, Michael Howard stated that BSL software (“Eventual Open Source”) is absolutely critical for MariaDB’s commercial success. Is the assumption here that if vendor lock-in is not forever, it is not really lock-in? Currently, only MariaDB MaxScale is BSL, but it sounds like we should expect more of their software to follow this model.

Note. As MariaDB Server and MariaDB Columnstore use a lot of Oracle’s GPL code, these will most likely remain GPL.

I enjoyed attending both conferences. I had a chance to meet with many old friends and past colleagues, as well as get a much better feel for where MariaDB is going and when it is appropriate to advise its usage.

by Peter Zaitsev at April 18, 2017 05:52 PM

Jean-Jerome Schmidt

How to Set Up Asynchronous Replication from Galera Cluster to Standalone MySQL server with GTID

Hybrid replication, i.e. combining Galera and asynchronous MySQL replication in the same setup, became much easier since GTID got introduced in MySQL 5.6. Although it was fairly straightforward to replicate from a standalone MySQL server to a Galera Cluster, doing it the other way round (Galera → standalone MySQL) was a bit more challenging. At least until the arrival of GTID.

There are a few good reasons to attach an asynchronous slave to a Galera Cluster. For one, long-running reporting/OLAP type queries on a Galera node might slow down an entire cluster, if the reporting load is so intensive that the node has to spend considerable effort coping with it. So reporting queries can be sent to a standalone server, effectively isolating Galera from the reporting load. In a belts and suspenders approach, an asynchronous slave can also serve as a remote live backup.

In this blog post, we will show you how to replicate a Galera Cluster to a MySQL server with GTID, and how to failover the replication in case the master node fails.

Hybrid Replication in MySQL 5.5

In MySQL 5.5, resuming broken replication requires you to determine the last binary log file and position, which are distinct on all Galera nodes if binary logging is enabled. We can illustrate this situation with the following figure:

Galera cluster asynchronous slave topology without GTID
Galera cluster asynchronous slave topology without GTID

If the MySQL master fails, replication breaks and the slave will need to switch over to another master. You will need to pick a new Galera node, and manually determine the binary log file and position of the last transaction executed by the slave. Another option is to dump the data from the new master node, restore it on the slave and start replication with the new master node. These options are of course doable, but not very practical in production.

How GTID Solves the Problem

GTID (Global Transaction Identifier) provides better transaction mapping across nodes, and is supported in MySQL 5.6. In Galera Cluster, all nodes will generate different binlog files. The binlog events are the same and in the same order, but the binlog file names and offsets may vary. With GTID, slaves can see a unique transaction coming in from several masters, and this can easily be mapped into the slave execution list if it needs to restart or resume replication.

Galera cluster asynchronous slave topology with GTID failover
Galera cluster asynchronous slave topology with GTID failover

All necessary information for synchronizing with the master is obtained directly from the replication stream. This means that when you are using GTIDs for replication, you do not need to include the MASTER_LOG_FILE or MASTER_LOG_POS options in the CHANGE MASTER TO statement. Instead, it is only necessary to enable the MASTER_AUTO_POSITION option. You can find more details about GTID on the MySQL Documentation page.

Setting Up Hybrid Replication by hand

Make sure the Galera nodes (masters) and slave(s) are running on MySQL 5.6 before proceeding with this setup. We have a database called sbtest in Galera, which we will replicate to the slave node.

1. Enable required replication options by specifying the following lines inside each DB node’s my.cnf (including the slave node):

For master (Galera) nodes:

gtid_mode=ON
log_bin=binlog
log_slave_updates=1
enforce_gtid_consistency
expire_logs_days=7
server_id=1         # 1 for master1, 2 for master2, 3 for master3
binlog_format=ROW

For slave node:

gtid_mode=ON
log_bin=binlog
log_slave_updates=1
enforce_gtid_consistency
expire_logs_days=7
server_id=101         # 101 for slave
binlog_format=ROW
replicate_do_db=sbtest
slave_net_timeout=60

2. Perform a cluster rolling restart of the Galera Cluster (from ClusterControl UI > Manage > Upgrade > Rolling Restart). This will reload each node with the new configurations, and enable GTID. Restart the slave as well.

3. Create a slave replication user by running the following statement on one of the Galera nodes:

mysql> GRANT REPLICATION SLAVE ON *.* TO 'slave'@'%' IDENTIFIED BY 'slavepassword';

4. Log into the slave and dump database sbtest from one of the Galera nodes:

$ mysqldump -uroot -p -h192.168.0.201 --single-transaction --skip-add-locks --triggers --routines --events sbtest > sbtest.sql

5. Restore the dump file onto the slave server:

$ mysql -uroot -p < sbtest.sql

6. Start replication on the slave node:

mysql> STOP SLAVE;
mysql> CHANGE MASTER TO MASTER_HOST = '192.168.0.201', MASTER_PORT = 3306, MASTER_USER = 'slave', MASTER_PASSWORD = 'slavepassword', MASTER_AUTO_POSITION = 1;
mysql> START SLAVE;

To verify that replication is running correctly, examine the output of slave status:

mysql> SHOW SLAVE STATUS\G
       ...
       Slave_IO_Running: Yes
       Slave_SQL_Running: Yes
       ...

Setting up Hybrid Replication using ClusterControl

In the previous paragraph we described all the necessary steps to enable the binary logs, restart the cluster node by node, copy the data and then set up replication. The procedure is tedious and you can easily make errors in one of these steps. In ClusterControl we have automated all the necessary steps.

1. For ClusterControl users, you can go to each node on the Nodes page and enable binary logging.

Enable binary logging on Galera cluster using ClusterControl
Enable binary logging on Galera cluster using ClusterControl

This will open a dialogue that allows you to set the binary log expiration, enable GTID and auto restart.

Enable binary logging with GTID enabled
Enable binary logging with GTID enabled

This initiates a job that safely writes these changes to the configuration, creates replication users with the proper grants and restarts the node.


Repeat this process for each Galera node in the cluster, until all nodes indicate they are master.

All Galera Cluster nodes are now master
All Galera Cluster nodes are now master

2. Add the asynchronous replication slave to the cluster

Adding an asynchronous replication slave to Galera Cluster using ClusterControl
Adding an asynchronous replication slave to Galera Cluster using ClusterControl

And this is all you have to do. The entire process described in the previous paragraph has been automated by ClusterControl.

Changing Master

If the designated master goes down, the slave will retry the connection after slave_net_timeout (60 seconds in our setup; the default is 1 hour). You should see the following error in the slave status:

       Last_IO_Errno: 2003
       Last_IO_Error: error reconnecting to master 'slave@192.168.0.201:3306' - retry-time: 60  retries: 1

Since we are using Galera with GTID enabled, master failover is supported via ClusterControl when Cluster and Node Auto Recovery has been enabled. Whether the master fails due to network connectivity or any other reason, ClusterControl will automatically fail over to the most suitable other master node in the cluster.

If you wish to perform the failover manually, simply change the master node as follows:

mysql> STOP SLAVE;
mysql> CHANGE MASTER TO MASTER_HOST = '192.168.0.202', MASTER_PORT = 3306, MASTER_USER = 'slave', MASTER_PASSWORD = 'slavepassword', MASTER_AUTO_POSITION = 1;
mysql> START SLAVE;

In some cases, you might encounter a “Duplicate entry .. for key” error after the master node has changed:

       Last_Errno: 1062
       Last_Error: Could not execute Write_rows event on table sbtest.sbtest; Duplicate entry '1089775' for key 'PRIMARY', Error_code: 1062; handler error HA_ERR_FOUND_DUPP_KEY; the event's master log mysqld-bin.000009, end_log_pos 85789000

In older versions of MySQL, you could just use SET GLOBAL SQL_SLAVE_SKIP_COUNTER = n to skip statements, but this does not work with GTID. Miguel from Percona wrote a great blog post on how to repair this by injecting empty transactions.
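The essence of that approach is to commit an empty transaction on the slave under the GTID that keeps failing, so the slave considers it applied and moves on. A minimal sketch follows; the GTID value is hypothetical and must be taken from your own slave's error and status output:

mysql> STOP SLAVE;
mysql> SET GTID_NEXT='xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx:123';
mysql> BEGIN; COMMIT;
mysql> SET GTID_NEXT='AUTOMATIC';
mysql> START SLAVE;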

Another approach, for smaller databases, could be to just get a fresh dump from any of the available Galera nodes, restore it and use the RESET MASTER statement:

mysql> STOP SLAVE;
mysql> RESET MASTER;
mysql> DROP SCHEMA sbtest; CREATE SCHEMA sbtest; USE sbtest;
mysql> SOURCE /root/sbtest_from_galera2.sql; -- repeat step #4 above to get this dump
mysql> CHANGE MASTER TO MASTER_HOST = '192.168.0.202', MASTER_PORT = 3306, MASTER_USER = 'slave', MASTER_PASSWORD = 'slavepassword', MASTER_AUTO_POSITION = 1;
mysql> START SLAVE;

You may also use pt-table-checksum to verify replication integrity; more information is in this blog post.
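As a sketch, a checksum run against the current master could look like the following. The host, user and password are placeholders, the account needs sufficient privileges on both master and slave, and depending on your binlog_format settings you may need additional options:

$ pt-table-checksum h=192.168.0.202,u=checksum,p=checksumpassword --databases=sbtest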

Note: Since the slave applier in MySQL replication is still single-threaded by default, do not expect the async replication performance to be the same as Galera’s parallel replication. For MySQL 5.6 and 5.7 there are options to execute the asynchronous replication in parallel on the slave nodes, but in principle this replication still depends on the correct ordering of transactions within the same schema. If the replication load is intensive and continuous, the slave lag will just keep growing. We have seen cases where the slave could never catch up with the master.

by Art at April 18, 2017 09:52 AM

Peter Zaitsev

Experimental Build of MyRocks with Percona Server for MySQL

MyRocks with Percona Server for MySQL

We have been working on bringing out a build of MyRocks with Percona Server for MySQL.

MyRocks is a RocksDB-based storage engine. You can find more information about MyRocks here.

While there is still a lot of work to do, I want to share an experimental build of Percona Server for MySQL with MyRocks, which you can use to evaluate and test this engine.

(WARNING: in NO WAY is this build supposed to be for production usage! Consider this ALPHA quality.)

The tar.gz binaries are available from our TESTING area. To start Percona Server for MySQL with the MyRocks engine, use the following line in my.cnf:

plugin-load=rocksdb=ha_rocksdb.so;rocksdb_cfstats=ha_rocksdb.so;rocksdb_dbstats=ha_rocksdb.so;rocksdb_perf_context=ha_rocksdb.so;rocksdb_perf_context_global=ha_rocksdb.so;rocksdb_cf_options=ha_rocksdb.so;rocksdb_compaction_stats=ha_rocksdb.so;rocksdb_global_info=ha_rocksdb.so;rocksdb_ddl=ha_rocksdb.so;rocksdb_index_file_map=ha_rocksdb.so;rocksdb_locks=ha_rocksdb.so;rocksdb_trx=ha_rocksdb.so
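Once the server starts with the plugins loaded, a quick smoke test could look like this. This is only a sketch to verify that the ROCKSDB engine is registered and usable; the table name is arbitrary:

mysql> SHOW ENGINES;
mysql> CREATE TABLE rocks_test (id INT PRIMARY KEY, v VARCHAR(32)) ENGINE=ROCKSDB;
mysql> INSERT INTO rocks_test VALUES (1, 'hello');
mysql> SELECT * FROM rocks_test;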

Later we will provide experimental RPM and DEB packages from our testing repositories. Please let us know how MyRocks is working for you!

by Vadim Tkachenko at April 18, 2017 12:07 AM

April 17, 2017

Peter Zaitsev

The mysqlpump Utility

mysqlpump

In this blog, we’ll look at the mysqlpump utility.

mysqlpump is a utility that performs logical backups (which means backing up your data as SQL statements instead of a raw copy of data files). It was added in MySQL Server version 5.7.8, and can be used to dump a database or a set of databases to a file that can then be loaded on another SQL server (not necessarily a MySQL server).

Its usage is similar to mysqldump, but it includes a new set of features. Many of the options are the same, but it was written from scratch to avoid being limited to mysqldump compatibility.

The Main Features Include:

  • To make the dump process faster, it allows parallel processing of databases and objects within databases.
  • There are more options to customize your dumps and choose which databases and objects to dump (tables, stored programs, user accounts), using the --include-* and --exclude-* parameters.
  • User accounts can now be dumped as CREATE USER and GRANT statements, instead of inserting directly into the MySQL system database.
  • Information between the client and the server can be compressed using the --compress option. This feature is very useful for remote backups, as it saves bandwidth and transfer time. You can also compress the output file using --compress-output, which supports the ZLIB and LZ4 compression algorithms.
  • It has an estimated progress indicator. This is really useful to check the current status of the dump process. You can see the total amount of rows dumped and the number of databases completed. It also reports an estimate of the total time to complete the dump.
  • Creation of secondary indexes for InnoDB tables happens after data load for shorter load times.

Exclude/Include:

This feature provides more control over customizing your dumps, and lets you filter the data that you need. Using this feature, you can be more selective with the data you want to dump (databases, tables, triggers, events, routines, users) and save file size, processing time and transfer time while copying/moving the file to another host.

Keep in mind that there are some options that are mutually exclusive: e.g., if you use the --all-databases option, the --exclude-databases parameter won’t take effect. By default, mysqlpump will not dump the following databases unless you specify them using the --include-databases option: INFORMATION_SCHEMA, performance_schema, ndbinfo and sys.

Values for these options need to be declared as a comma-separated list. Using a “%” as a value for any of the exclude/include options acts as a wildcard. For example, you can dump all databases starting with “t” and “p” by adding the option --include-databases=t%,p% to the command line.
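
Putting that together, a hedged example of the full command line (the output file name is arbitrary):

$ mysqlpump --include-databases=t%,p% > t_and_p_databases.sql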

For users, routines, triggers and events, mysqlpump has --include-* and --exclude-* options with similar usage. Some specific notes:

  • Triggers are dumped by default, but you can also filter them using the --include-triggers/--exclude-triggers options
  • Routines and events are not dumped by default, and need to be specified in the command line with --routines and --events, or the corresponding --include and --exclude options (see the example command after this list)
  • Keep in mind that if a stored procedure and a function have the same name, then include/exclude applies to both
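
For instance, here is a hedged example that dumps a single schema (the name app is made up) together with its routines and events while skipping all of its triggers:

$ mysqlpump --include-databases=app --routines --events --exclude-triggers=% > app_with_programs.sql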

Parallel Processing:

This feature allows you to process several databases, and tables within those databases, in parallel. By default, mysqlpump uses one processing queue with two threads. You can increase the number of threads for this default queue with --default-parallelism. Unless you create additional queues, all the databases and/or tables you elect to dump go through the default queue.

To create additional queues you can use the --parallel-schemas option, which takes two parameters: the number of threads for the queue and the subset of databases this queue processes. As an example, you could run:

mysqlpump --include-databases=a,b,c,d,e,f,g,h --default-parallelism=3 --parallel-schemas=4:a,b

so that schemas c, d, e, f, g and h are processed by the default queue (which uses three threads), while tables from schemas a and b are processed by a separate queue (that uses four threads). Database names should be included as a comma-separated list:

$ mysqlpump --parallel-schemas=4:example1,example2,example3 --parallel-schemas=3:example4,example5 > examples.sql
Dump progress: 0/1 tables, 250/261184 rows
Dump progress: 24/30 tables, 1204891/17893833 rows
Dump progress: 29/30 tables, 1755611/17893833 rows
Dump progress: 29/30 tables, 2309111/17893833 rows
...
Dump completed in 42424 milliseconds

User Accounts:

User accounts can be dumped using this tool. Here’s a comparison of our Percona Toolkit tool pt-show-grants versus mysqlpump to check their differences.

By default, mysqlpump doesn’t dump user account definitions (even while dumping the MySQL database). To include user accounts in the dump, you must specify the --users option.

Here’s an example of how to use mysqlpump to get only user accounts dumped to a file:

$ mysqlpump --exclude-databases=% --exclude-triggers=% --users
-- Dump created by MySQL dump utility, version: 5.7.8-rc, linux-glibc2.5 (x86_64)
-- Dump start time: Thu Aug 27 17:10:10 2015
-- Server version: 5.7.8
SET @OLD_UNIQUE_CHECKS=@@UNIQUE_CHECKS, UNIQUE_CHECKS=0;
SET @OLD_FOREIGN_KEY_CHECKS=@@FOREIGN_KEY_CHECKS, FOREIGN_KEY_CHECKS=0;
SET @OLD_TIME_ZONE=@@TIME_ZONE;
SET TIME_ZONE='+00:00';
SET @OLD_CHARACTER_SET_CLIENT=@@CHARACTER_SET_CLIENT;
SET @OLD_CHARACTER_SET_RESULTS=@@CHARACTER_SET_RESULTS;
SET @OLD_COLLATION_CONNECTION=@@COLLATION_CONNECTION;
SET NAMES utf8;
CREATE USER 'msandbox'@'127.%' IDENTIFIED WITH 'mysql_native_password' AS '*6C387FC3893DBA1E3BA155E74754DA6682D04747' REQUIRE NONE PASSWORD EXPIRE DEFAULT ACCOUNT UNLOCK;
GRANT ALL PRIVILEGES ON *.* TO 'msandbox'@'127.%';
CREATE USER 'msandbox_ro'@'127.%' IDENTIFIED WITH 'mysql_native_password' AS '*6C387FC3893DBA1E3BA155E74754DA6682D04747' REQUIRE NONE PASSWORD EXPIRE DEFAULT ACCOUNT UNLOCK;
GRANT SELECT, EXECUTE ON *.* TO 'msandbox_ro'@'127.%';
CREATE USER 'msandbox_rw'@'127.%' IDENTIFIED WITH 'mysql_native_password' AS '*6C387FC3893DBA1E3BA155E74754DA6682D04747' REQUIRE NONE PASSWORD EXPIRE DEFAULT ACCOUNT UNLOCK;
GRANT SELECT, INSERT, UPDATE, DELETE, CREATE, DROP, INDEX, ALTER, SHOW DATABASES, CREATE TEMPORARY TABLES, LOCK TABLES, EXECUTE ON *.* TO 'msandbox_rw'@'127.%';
CREATE USER 'rsandbox'@'127.%' IDENTIFIED WITH 'mysql_native_password' AS '*B07EB15A2E7BD9620DAE47B194D5B9DBA14377AD' REQUIRE NONE PASSWORD EXPIRE DEFAULT ACCOUNT UNLOCK;
GRANT REPLICATION SLAVE ON *.* TO 'rsandbox'@'127.%';
CREATE USER 'furrywall'@'localhost' IDENTIFIED WITH 'mysql_native_password' AS '*AB8D50A9E3B8D1F3ACE85C54736B5BF472B44539' REQUIRE NONE PASSWORD EXPIRE DEFAULT ACCOUNT LOCK;
GRANT USAGE ON *.* TO 'furrywall'@'localhost';
CREATE USER 'msandbox'@'localhost' IDENTIFIED WITH 'mysql_native_password' AS '*6C387FC3893DBA1E3BA155E74754DA6682D04747' REQUIRE NONE PASSWORD EXPIRE DEFAULT ACCOUNT UNLOCK;
GRANT ALL PRIVILEGES ON *.* TO 'msandbox'@'localhost';
CREATE USER 'msandbox_ro'@'localhost' IDENTIFIED WITH 'mysql_native_password' AS '*6C387FC3893DBA1E3BA155E74754DA6682D04747' REQUIRE NONE PASSWORD EXPIRE DEFAULT ACCOUNT UNLOCK;
GRANT SELECT, EXECUTE ON *.* TO 'msandbox_ro'@'localhost';
CREATE USER 'msandbox_rw'@'localhost' IDENTIFIED WITH 'mysql_native_password' AS '*6C387FC3893DBA1E3BA155E74754DA6682D04747' REQUIRE NONE PASSWORD EXPIRE DEFAULT ACCOUNT UNLOCK;
GRANT SELECT, INSERT, UPDATE, DELETE, CREATE, DROP, INDEX, ALTER, SHOW DATABASES, CREATE TEMPORARY TABLES, LOCK TABLES, EXECUTE ON *.* TO 'msandbox_rw'@'localhost';
CREATE USER 'root'@'localhost' IDENTIFIED WITH 'mysql_native_password' AS '*6C387FC3893DBA1E3BA155E74754DA6682D04747' REQUIRE NONE PASSWORD EXPIRE DEFAULT ACCOUNT UNLOCK;
GRANT ALL PRIVILEGES ON *.* TO 'root'@'localhost' WITH GRANT OPTION;
CREATE USER 'testuser'@'localhost' IDENTIFIED WITH 'mysql_native_password' AS '*6E543F385210D9BD42A4FDB4BB23FD2C31C95462' REQUIRE NONE PASSWORD EXPIRE INTERVAL 30 DAY ACCOUNT UNLOCK;
GRANT USAGE ON *.* TO 'testuser'@'localhost';
SET TIME_ZONE=@OLD_TIME_ZONE;
SET CHARACTER_SET_CLIENT=@OLD_CHARACTER_SET_CLIENT;
SET CHARACTER_SET_RESULTS=@OLD_CHARACTER_SET_RESULTS;
SET COLLATION_CONNECTION=@OLD_COLLATION_CONNECTION;
SET FOREIGN_KEY_CHECKS=@OLD_FOREIGN_KEY_CHECKS;
SET UNIQUE_CHECKS=@OLD_UNIQUE_CHECKS;
-- Dump end time: Thu Aug 27 17:10:10 2015
Dump completed in 823 milliseconds

As you can see above, the tool makes sure the session uses known values for the time zone and character sets. This won’t affect users; it’s part of the dump process to ensure correctness while restoring on the destination.

Comparing it with pt-show-grants from Percona Toolkit, we can see that mysqlpump dumps the CREATE USER information as well. The statements produced by mysqlpump are the right thing to run to recreate users (and should be the preferred method), especially because of the sql_mode NO_AUTO_CREATE_USERS: if enabled, it renders pt-show-grants useless.

Here’s an example of pt-show-grants usage:

$ pt-show-grants --host 127.0.0.1 --port 5708 --user msandbox --ask-pass
Enter password:
-- Grants dumped by pt-show-grants
-- Dumped from server 127.0.0.1 via TCP/IP, MySQL 5.7.8-rc at 2015-08-27 17:06:52
-- Grants for 'furrywall'@'localhost'
GRANT USAGE ON *.* TO 'furrywall'@'localhost';
-- Grants for 'msandbox'@'127.%'
GRANT ALL PRIVILEGES ON *.* TO 'msandbox'@'127.%';
-- Grants for 'msandbox'@'localhost'
GRANT ALL PRIVILEGES ON *.* TO 'msandbox'@'localhost';
-- Grants for 'msandbox_ro'@'127.%'
GRANT EXECUTE, SELECT ON *.* TO 'msandbox_ro'@'127.%';
-- Grants for 'msandbox_ro'@'localhost'
GRANT EXECUTE, SELECT ON *.* TO 'msandbox_ro'@'localhost';
-- Grants for 'msandbox_rw'@'127.%'
GRANT ALTER, CREATE, CREATE TEMPORARY TABLES, DELETE, DROP, EXECUTE, INDEX, INSERT, LOCK TABLES, SELECT, SHOW DATABASES, UPDATE ON *.* TO 'msandbox_rw'@'127.%';
-- Grants for 'msandbox_rw'@'localhost'
GRANT ALTER, CREATE, CREATE TEMPORARY TABLES, DELETE, DROP, EXECUTE, INDEX, INSERT, LOCK TABLES, SELECT, SHOW DATABASES, UPDATE ON *.* TO 'msandbox_rw'@'localhost';
-- Grants for 'root'@'localhost'
GRANT ALL PRIVILEGES ON *.* TO 'root'@'localhost' WITH GRANT OPTION;
GRANT PROXY ON ''@'' TO 'root'@'localhost' WITH GRANT OPTION;
-- Grants for 'rsandbox'@'127.%'
GRANT REPLICATION SLAVE ON *.* TO 'rsandbox'@'127.%';
-- Grants for 'testuser'@'localhost'
GRANT USAGE ON *.* TO 'testuser'@'localhost';

Some Miscellaneous Notes:

  • One of the differences with mysqldump is that mysqlpump adds CREATE DATABASE statements to the dump by default, unless specified otherwise with the --no-create-db option.
    • There’s an important, closely related difference in the dump process: it includes the database name when writing the CREATE TABLE statement. This causes a problem when trying to use the tool to create a duplicate of a database under a different name (see the hedged illustration below).
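
As a hedged illustration of that difference (exact output varies by version, and the db1/t1 names are made up), mysqldump emits an unqualified table name, while mysqlpump qualifies it with the original schema:

-- mysqldump-style statement: restores into whatever database is currently selected
CREATE TABLE `t1` (`id` int(11) NOT NULL, PRIMARY KEY (`id`));
-- mysqlpump-style statement: always targets the original schema
CREATE TABLE `db1`.`t1` (`id` int(11) NOT NULL, PRIMARY KEY (`id`));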

by Pablo Padua at April 17, 2017 05:45 PM

Jean-Jerome Schmidt

Video: ClusterControl for Galera Cluster

This video walks you through the features that ClusterControl offers for Galera Cluster and how you can use them to deploy, manage, monitor and scale your open source database environments.

ClusterControl and Galera Cluster

ClusterControl provides advanced deployment, management, monitoring, and scaling functionality to get your Galera clusters up-and-running using proven methodologies that you can depend on to work.

At the core of ClusterControl is its automation functionality that lets you automate many of the database tasks you have to perform regularly, like deploying new clusters, adding and scaling new nodes, running backups and upgrades, and more.

ClusterControl is cluster-aware, topology-aware and able to provision all members of the Galera Cluster, including the replication chain connected to it.

To learn more check out the following resources…

by Severalnines at April 17, 2017 09:59 AM

April 15, 2017

Valeriy Kravchuk

How To Find What Thread Had Executed FTWRL

This week one of MariaDB Support customers asked how to find what thread had executed FLUSH TABLES WITH READ LOCK (FTWRL) and thus blocked all changes to data. I decided to list several ways to find this out (as eventually customer wanted to know how to find this out not only in MariaDB 10.1, but also in MySQL 5.6 etc).

Let me start with a quick summary. I know the following ways (other than having all queries logged in the general query log, the slow log, by some audit plugin or on the client side, and checking that log) to find the thread that executed FLUSH TABLES WITH READ LOCK successfully:
  1. In MariaDB starting from 10.0.7 you can use METADATA_LOCK_INFO plugin.
  2. In MySQL starting from 5.7 you can use performance_schema.metadata_locks table.
  3. In MySQL starting from 5.6 (or MariaDB 10.x.y) you can use performance_schema.events_statements_history table.
  4. In all versions of MySQL or MariaDB you can attach gdb and check threads one by one (and I explain one of the ways to do this in the post).
Now it's time to present the details. Unfortunately, when FTWRL is executed successfully, it is visible neither in SHOW PROCESSLIST nor in the related INFORMATION_SCHEMA table:
MariaDB [test]> select version();
+-----------------+
| version()       |
+-----------------+
| 10.1.18-MariaDB |
+-----------------+
1 row in set (0.00 sec)

MariaDB [test]> flush tables with read lock;
Query OK, 0 rows affected (0.06 sec)

MariaDB [test]> show full processlist\G
*************************** 1. row ***************************
      Id: 10
    User: root
    Host: localhost
      db: test
 Command: Query
    Time: 0
   State: init
    Info: show full processlist
Progress: 0.000
*************************** 2. row ***************************
      Id: 11
    User: root
    Host: localhost
      db: test
 Command: Query
    Time: 743
   State: Waiting for global read lock
    Info: delete from t0
Progress: 0.000
2 rows in set (0.00 sec)

MariaDB [test]> select * from information_schema.processlist\G
*************************** 1. row ***************************
           ID: 11
         USER: root
         HOST: localhost
           DB: test
      COMMAND: Query
         TIME: 954
        STATE: Waiting for global read lock
         INFO: delete from t0
      TIME_MS: 954627.587
        STAGE: 0
    MAX_STAGE: 0
     PROGRESS: 0.000
  MEMORY_USED: 67464
EXAMINED_ROWS: 0
     QUERY_ID: 1457
  INFO_BINARY: delete from t0
          TID: 8838
*************************** 2. row ***************************
           ID: 10
         USER: root
         HOST: localhost
           DB: test
      COMMAND: Query
         TIME: 0
        STATE: Filling schema table
         INFO: select * from information_schema.processlist
      TIME_MS: 0.805
        STAGE: 0
    MAX_STAGE: 0
     PROGRESS: 0.000
  MEMORY_USED: 84576
EXAMINED_ROWS: 0
     QUERY_ID: 1461
  INFO_BINARY: select * from information_schema.processlist
          TID: 8424
2 rows in set (0.02 sec)
There is usually nothing useful in the INNODB STATUS also:
MariaDB [test]> show engine innodb status\G
...
------------
TRANSACTIONS
------------
Trx id counter 20439
Purge done for trx's n:o < 20422 undo n:o < 0 state: running but idle
History list length 176
LIST OF TRANSACTIONS FOR EACH SESSION:
---TRANSACTION 0, not started
MySQL thread id 11, OS thread handle 0x7f7f5cdb8b00, query id 1457 localhost root Waiting for global read lock
delete from t0
---TRANSACTION 0, not started
MySQL thread id 10, OS thread handle 0x7f7f5ce02b00, query id 1462 localhost root init
show engine innodb status
--------
...
No wonder, global read lock that FTWRL sets is not an InnoDB lock.

In this case we have just 2 user threads connected, so it's easy to find out which thread could have set the global read lock we are waiting for. With dozens or hundreds of threads connected, one has to guess which one it might be (often those that have been sleeping the longest are blamed), unless you know for sure how to find the exact one. So, let me list some of the ways to find this out.

I'd like to start with MariaDB 10.x.y. It provides a special METADATA_LOCK_INFO plugin. Surely you cannot install it while FTWRL is already in effect (as the plugin ends up recorded in the mysql.plugin table when installed):
MariaDB [test]> INSTALL SONAME 'metadata_lock_info';
ERROR 1223 (HY000): Can't execute the query because you have a conflicting read lock
MariaDB [test]> unlock tables;
Query OK, 0 rows affected (0.00 sec)

MariaDB [test]> INSTALL SONAME 'metadata_lock_info';
Query OK, 0 rows affected (0.37 sec)

MariaDB [test]> select * from mysql.plugin;
+--------------------+-----------------------+
| name               | dl                    |
+--------------------+-----------------------+
| BLACKHOLE          | ha_blackhole.so       |
| METADATA_LOCK_INFO | metadata_lock_info.so |
+--------------------+-----------------------+
2 rows in set (0.00 sec)
But once the plugin is installed, we can get the information we need from the information_schema.metadata_lock_info table:
MariaDB [test]> flush tables with read lock;
Query OK, 0 rows affected (0.01 sec)

MariaDB [test]> select * from information_schema.metadata_lock_info;
+-----------+------------+---------------+------------------+--------------+------------+
| THREAD_ID | LOCK_MODE  | LOCK_DURATION | LOCK_TYPE        | TABLE_SCHEMA | TABLE_NAME |
+-----------+------------+---------------+------------------+--------------+------------+
|        10 | MDL_SHARED | NULL          | Global read lock |              |            |
|        10 | MDL_SHARED | NULL          | Commit lock      |              |            |
+-----------+------------+---------------+------------------+--------------+------------+
2 rows in set (0.00 sec)

MariaDB [test]> select connection_id();
+-----------------+
| connection_id() |
+-----------------+
|              10 |
+-----------------+
1 row in set (0.01 sec)

MariaDB [test]> show processlist;
+----+------+-----------+------+---------+------+------------------------------+------------------+----------+
| Id | User | Host      | db   | Command | Time | State                        | Info             | Progress |
+----+------+-----------+------+---------+------+------------------------------+------------------+----------+
| 10 | root | localhost | test | Query   |    0 | init                         | show processlist |    0.000 |
| 12 | root | localhost | test | Query   |   47 | Waiting for global read lock | delete from t0   |    0.000 |
+----+------+-----------+------+---------+------+------------------------------+------------------+----------+
2 rows in set (0.00 sec)
We are looking for a row where lock_type is "Global read lock":
MariaDB [test]> select thread_id from information_schema.metadata_lock_info where lock_type = 'Global read lock';
+-----------+
| thread_id |
+-----------+
|        10 |
+-----------+
1 row in set (0.00 sec)
Now, let me proceed to MySQL 5.7. It introduced the performance_schema.metadata_locks table, and we can use it as follows (I've used Percona Server 5.7.17 for this example, but it's no different from upstream MySQL 5.7 in this regard):
openxs@ao756:~/dbs/maria10.1$ mysql -uroot -proot test
mysql: [Warning] Using a password on the command line interface can be insecure.
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 12
Server version: 5.7.17-13-log Percona Server (GPL), Release '13', Revision 'fd33d43'

Copyright (c) 2009-2016 Percona LLC and/or its affiliates
Copyright (c) 2000, 2016, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> select @@performance_schema;
+----------------------+
| @@performance_schema |
+----------------------+
|                    1 |
+----------------------+
1 row in set (0.00 sec)

mysql> UPDATE performance_schema.setup_consumers SET ENABLED = 'YES' WHERE NAME = 'global_instrumentation';
Query OK, 0 rows affected (0.18 sec)
Rows matched: 1  Changed: 0  Warnings: 0

mysql> UPDATE performance_schema.setup_instruments SET ENABLED = 'YES' WHERE NAME = 'wait/lock/metadata/sql/mdl';
Query OK, 1 row affected (0.04 sec)
Rows matched: 1  Changed: 1  Warnings: 0

mysql> flush tables with read lock;
Query OK, 0 rows affected (0.06 sec)

mysql> select * from performance_schema.metadata_locks;
+-------------+--------------------+----------------+-----------------------+-------------+---------------+-------------+-------------------+-----------------+----------------+
| OBJECT_TYPE | OBJECT_SCHEMA      | OBJECT_NAME    | OBJECT_INSTANCE_BEGIN | LOCK_TYPE   | LOCK_DURATION | LOCK_STATUS | SOURCE            | OWNER_THREAD_ID | OWNER_EVENT_ID |
+-------------+--------------------+----------------+-----------------------+-------------+---------------+-------------+-------------------+-----------------+----------------+
| GLOBAL      | NULL               | NULL           |       139917187957472 | SHARED      | EXPLICIT      | GRANTED     | lock.cc:1128      |              39 |             10 |
| COMMIT      | NULL               | NULL           |       139917300586224 | SHARED      | EXPLICIT      | GRANTED     | lock.cc:1212      |              39 |             10 |
| TABLE       | performance_schema | metadata_locks |       139917192091536 | SHARED_READ | TRANSACTION   | GRANTED     | sql_parse.cc:6406 |              39 |             11 |
+-------------+--------------------+----------------+-----------------------+-------------+---------------+-------------+-------------------+-----------------+----------------+
3 rows in set (0.01 sec)

mysql> show processlist;
+----+----------+--------------------+------+---------+------+----------+------------------+-----------+---------------+
| Id | User     | Host               | db   | Command | Time | State    | Info             | Rows_sent | Rows_examined |
+----+----------+--------------------+------+---------+------+----------+------------------+-----------+---------------+
| 11 | maxscale | 192.168.1.37:55399 | NULL | Sleep   |    4 |          | NULL             |         0 |             0 |
| 12 | root     | localhost          | test | Query   |    0 | starting | show processlist |         0 |             0 |
+----+----------+--------------------+------+---------+------+----------+------------------+-----------+---------------+
2 rows in set (0.02 sec)
In this case we are looking for the value "GLOBAL" in the OBJECT_TYPE column and "SHARED" in the LOCK_TYPE column. In this row we see OWNER_THREAD_ID, but the value, 39, is not present in the SHOW PROCESSLIST output. We need to search for the proper thread id in the performance_schema.threads table, like this:
mysql> select t.processlist_id from performance_schema.threads t join performance_schema.metadata_locks ml on ml.owner_thread_id = t.thread_id where ml.object_type='GLOBAL' and ml.lock_type='SHARED';
+----------------+
| processlist_id |
+----------------+
|             12 |
+----------------+
1 row in set (0.03 sec)
Now, what if we run an older MySQL 5.6.x that does not have the metadata_locks table? If we make sure that the history of statements executed by each thread is maintained in performance_schema, we can easily search for the FTWRL statement in the performance_schema.events_statements_history table! The steps are also simple:
mysql> update performance_schema.setup_consumers set enabled = 'YES' where NAME = 'events_statements_history';
ERROR 1223 (HY000): Can't execute the query because you have a conflicting read lock
mysql> unlock tables;
Query OK, 0 rows affected (0.00 sec)

mysql> update performance_schema.setup_consumers set enabled = 'YES' where NAME = 'events_statements_history';
Query OK, 0 rows affected (0.00 sec)
Rows matched: 1  Changed: 0  Warnings: 0

mysql> flush tables with read lock;
Query OK, 0 rows affected (0.00 sec)

mysql> select * from performance_schema.events_statements_history where sql_text like 'flush tables%'\G
*************************** 1. row ***************************
              THREAD_ID: 39
               EVENT_ID: 21
           END_EVENT_ID: 21
             EVENT_NAME: statement/sql/flush
                 SOURCE: socket_connection.cc:95
            TIMER_START: 94449505549959000
              TIMER_END: 94449505807116000
             TIMER_WAIT: 257157000
              LOCK_TIME: 0
               SQL_TEXT: flush tables with read lock
                 DIGEST: 03682cc3e0eaed3d95d665c976628d02
            DIGEST_TEXT: FLUSH TABLES WITH READ LOCK
...
    NESTING_EVENT_LEVEL: 0
1 row in set (0.00 sec)
Again, if we need the thread id for the processlist, we have to join with performance_schema.threads:
mysql> select t.processlist_id from performance_schema.threads t join performance_schema.events_statements_history h on h.thread_id = t.thread_id where h.digest_text like 'FLUSH TABLES%';
+----------------+
| processlist_id |
+----------------+
|             12 |
+----------------+
1 row in set (0.01 sec)
Note that using digest_text while searching the history is a bit more reliable, as the text is already normalized, so you do not need to care about extra spaces, etc. On the other hand, it requires the statements_digest consumer to be enabled:

mysql> select * from performance_schema.setup_consumers where enabled = 'YES';
+---------------------------+---------+
| NAME                      | ENABLED |
+---------------------------+---------+
| events_statements_current | YES     |
| events_statements_history | YES     |
| global_instrumentation    | YES     |
| thread_instrumentation    | YES     |
| statements_digest         | YES     |
+---------------------------+---------+
5 rows in set (0.00 sec)
Now, what if you do not have performance_schema enabled (or compiled in), use an older MySQL version, or it's too late to run any updates or install any plugins? As usual, gdb to the rescue!

We already know that a global shared metadata lock is set, so one can try to apply the approach I've already written about in this post. It looks a bit complicated though, so I decided to try something easier and just traverse the linked list of all MySQL threads in gdb, in a way similar to what Shane Bester did years ago while looking for the SQL statements executed by each thread in a core file.

So, in MariaDB 10.1 with performance_schema disabled I, again, executed the following (surely, delete was executed in another session after FTWRL):
openxs@ao756:~/dbs/maria10.1$ bin/mysql -uroot --socket=/tmp/mariadb.sock test
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Welcome to the MariaDB monitor.  Commands end with ; or \g.
Your MariaDB connection id is 13
Server version: 10.1.18-MariaDB MariaDB Server

Copyright (c) 2000, 2016, Oracle, MariaDB Corporation Ab and others.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

MariaDB [test]> select @@performance_schema;
+----------------------+
| @@performance_schema |
+----------------------+
|                    0 |
+----------------------+
1 row in set (0.00 sec)

MariaDB [test]> flush tables with read lock;
Query OK, 0 rows affected (0.00 sec)

MariaDB [test]> show processlist;
+----+------+-----------+------+---------+------+------------------------------+------------------+----------+
| Id | User | Host      | db   | Command | Time | State                        | Info             | Progress |
+----+------+-----------+------+---------+------+------------------------------+------------------+----------+
| 13 | root | localhost | test | Query   |    0 | init                         | show processlist |    0.000 |
| 14 | root | localhost | test | Query   |    9 | Waiting for global read lock | delete from t0   |    0.000 |
+----+------+-----------+------+---------+------+------------------------------+------------------+----------+
2 rows in set (0.00 sec)
and then found the PID for MariaDB server and attached gdb to the process:
openxs@ao756:~$ ps aux | grep 'mysqld ' | grep maria
openxs    7444  0.1  5.7 748028 220744 ?       Sl   кві14   1:47 /home/openxs/dbs/maria10.1/bin/mysqld --no-defaults --basedir=/home/openxs/dbs/maria10.1 --datadir=/home/openxs/dbs/maria10.1/data --plugin-dir=/home/openxs/dbs/maria10.1/lib/plugin --log-error=/home/openxs/dbs/maria10.1/data/ao756.err --pid-file=ao756.pid --socket=/tmp/mariadb.sock --port=3307
openxs@ao756:~$ sudo gdb -p 7444
...
Loaded symbols for /home/openxs/dbs/maria10.1/lib/plugin/metadata_lock_info.so
0x00007f7f5a2f084d in poll () at ../sysdeps/unix/syscall-template.S:81
81      ../sysdeps/unix/syscall-template.S: No such file or directory.
(gdb) p threads
$1 = {<base_ilist> = {first = 0x7f7f3fbf3008, last = {
      _vptr.ilink = 0x7f7f5cea3510 <vtable for ilink+16>,
      prev = 0x7f7f3ffa1018, next = 0x0}}, <No data fields>}
As you can see, we have a global variable threads. threads->first is a pointer to (the structure that describes) the first thread (of THD * type) in the doubly linked list of all MySQL threads, so we can do the following:
(gdb) set $thd=(THD *)(threads->first)
(gdb) p $thd
$3 = (THD *) 0x7f7f3fbf3008
(gdb) p $thd->thread_id
$4 = 14
(gdb) p $thd->global_read_lock
$5 = {m_state = Global_read_lock::GRL_NONE, m_mdl_global_shared_lock = 0x0,
  m_mdl_blocks_commits_lock = 0x0}
I have not checked any version before MySQL 5.6 yet, but I assume that all of them have a list of threads, that each thread's structure has a thread_id item, and I expect to see a global_read_lock item as well: if not as a structure with related MDL information, then as something else (to be studied later).

In the case above, I clearly see that the thread with id 14 in the processlist had not set the global read lock. Now, let me proceed to the next thread:
(gdb) set $thd=(THD *)($thd->next)
(gdb) p $thd
$6 = (THD *) 0x7f7f3ffa1008
(gdb) p $thd->thread_id
$7 = 13
(gdb) p $thd->global_read_lock
$8 = {m_state = Global_read_lock::GRL_ACQUIRED_AND_BLOCKS_COMMIT,
  m_mdl_global_shared_lock = 0x7f7f2e90e080,
  m_mdl_blocks_commits_lock = 0x7f7f2e90e180}
So, it is the thread with id 13 that had executed FTWRL and had not yet released the global read lock. I was lucky to find it fast, but I can repeat the steps above to move on to the next list item until next is 0x0. Shane had written similar code as a Python macro for gdb in the post I mentioned. One day I may grow up and write something similar for this case.

Another approach was invented and described in a blog post on the same topic of getting SQL statements from core files. One can just create a big enough input file for gdb that switches to the process threads one by one and tries to get information from the MySQL thread structure there, assuming it was created (and ignoring errors), like this:
(gdb) thread 1
[Switching to thread 1 (Thread 0x7f7f5ce6d7c0 (LWP 7444))]
#0  0x00007f7f5a2f084d in poll () at ../sysdeps/unix/syscall-template.S:81
81      ../sysdeps/unix/syscall-template.S: No such file or directory.
(gdb) p do_command::thd->thread_id
No frame is currently executing in block do_command(THD*).
(gdb) thread 2
[Switching to thread 2 (Thread 0x7f7f5ce02b00 (LWP 9232))]
#0  pthread_cond_timedwait@@GLIBC_2.3.2 ()
    at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:238
238     ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S: No such file or directory.
(gdb) p do_command::thd->thread_id
$9 = 14
(gdb) p do_command::thd->global_read_lock
$10 = {m_state = Global_read_lock::GRL_NONE, m_mdl_global_shared_lock = 0x0,
  m_mdl_blocks_commits_lock = 0x0}
(gdb) thread 3
[Switching to thread 3 (Thread 0x7f7f5cd6eb00 (LWP 9230))]
#0  0x00007f7f5a2f084d in poll () at ../sysdeps/unix/syscall-template.S:81
81      ../sysdeps/unix/syscall-template.S: No such file or directory.
(gdb) p do_command::thd->thread_id
$11 = 13
(gdb) p do_command::thd->global_read_lock
$12 = {m_state = Global_read_lock::GRL_ACQUIRED_AND_BLOCKS_COMMIT,
  m_mdl_global_shared_lock = 0x7f7f2e90e080,
  m_mdl_blocks_commits_lock = 0x7f7f2e90e180}

...
And so on, for 1000 threads or more, if we expect that many. With some smart use of the shell and tools like grep and awk on the resulting output, we can probably eventually get just what we need: the thread id of the thread that did FTWRL and is still holding the global read lock.

One day I may work on this kind of scripts, but for now I just wanted to show how gdb can help you to get this information.

by Valeriy Kravchuk (noreply@blogger.com) at April 15, 2017 03:59 PM

April 14, 2017

Peter Zaitsev

Run Percona Server on Bash on Windows on Ubuntu

Bash on Windows on Ubuntu

In this post, I’ll explain how to run Percona Server for MySQL and Percona Server for MongoDB on Bash on Windows on Ubuntu.

We are getting a good number of questions about whether Percona Server (for both MySQL and MongoDB) is available for Windows for evaluation or development purposes. I want to provide a guide to how to get it running.

In comments to the post Running Percona XtraBackup on Windows … in Docker, Peter Laursen recommended Bash on Ubuntu on Windows. That hadn’t occurred to me before, so the credit goes to Peter Laursen.

At the time of that older post, it appeared that Percona XtraBackup was not working correctly in Bash on Ubuntu on Windows. But in my latest test on Windows 10 Creators Edition, the problem seems resolved.

But you can get Percona Server for MySQL and Percona Server for MongoDB running in Bash on Ubuntu on Windows right away. It is quite easy and does not require extra work. Probably the biggest step is to get Bash on Ubuntu on Windows enabled by itself. A manual on how to do this is here:

https://www.howtogeek.com/249966/how-to-install-and-use-the-linux-bash-shell-on-windows-10/

After this, follow the steps to install Percona Server for MySQL or Percona Server for MongoDB from our Ubuntu repositories:
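
As a rough sketch inside the Bash shell (assuming the standard Percona apt repository setup of that time; package and release names may differ), installing the MySQL variant looks roughly like this:

$ wget https://repo.percona.com/apt/percona-release_0.1-4.$(lsb_release -sc)_all.deb
$ sudo dpkg -i percona-release_0.1-4.$(lsb_release -sc)_all.deb
$ sudo apt-get update
$ sudo apt-get install percona-server-server-5.7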

After this, you can start the server as you would in a regular Linux environment.

by Vadim Tkachenko at April 14, 2017 08:54 PM

TokuDB Hotbackup and Replication

TokuDB Hotbackup

TokuDB Hotbackup is a solution that allows you to take backups on the fly. It works as a library that intercepts certain system calls and duplicates data written to already-copied parts of files, so that at the end of the backup process the copied files contain the same content as the original files. There are several blog posts describing how TokuDB Hotbackup works in detail:

Replication setups often use backups to create slaves. For this purpose, we need to know the last executed GTID or binary log position for both the slave and master configurations.

You can obtain the corresponding information with SHOW MASTER/SLAVE STATUS. But if we request this information after the backup, the binlog position or executed GTID may already have changed, and the backed-up data won’t correspond to the master/slave state.

It’s possible to lock tables with FLUSH TABLES WITH READ LOCK, or use smarter locks like LOCK TABLES/BINLOG FOR BACKUP. See this blog post for details: Introducing backup locks in Percona Server.

But the larger question is: when can we use the corresponding locks? We could lock the binlog or some tables before taking a backup, and then release the lock after it’s done. But if the data being backed up is big enough, the backup itself can take time, and all that time the data is locked for changes.

There must be a more suitable solution. Let’s recall how TokuDB Hotbackup works: it uses a special library that intercepts some system calls, and it has an API. This API includes such commands as “start capturing” and “stop capturing”. “Capturing” means that if some part of a file gets copied to a backup location, and this part is later changed (by the MySQL server in our case), then these changes are also applied to the backup location. The TokuDB Hotbackup plugin uses the API: it starts capturing, makes copies and stops capturing.

There is no need to lock the binlog during the copy process, because when the binlog is flushed the changes are copied to backup by a “capturing” mechanism. After everything has been copied, and with the capturing still running, this is a good time for LOCK BINLOG FOR BACKUP execution. After this statement is executed, the binlog is flushed, the flushed changes are captured and applied to a backup location, and any queries that could change the binlog position or executed GTID are blocked. After this we can stop capturing, get the binlog position or the last executed GTID and unlock the binlog.

This is how it’s implemented in TokuDB Hotbackup. After a backup is taken, there are “tokubackup_slave_info” and “tokubackup_binlog_info” files in the backup directory that contain the corresponding information about the slave and master (in human-readable format). You can use this information to start a new slave from either the master or the slave. MySQL 5.7 supports multisource replication, and “tokubackup_slave_info” contains information for all replication channels.

There could be a happy ending here, but the devil is in the details. For example, there are several binary logging formats: RBR, SBR, MBR (for details see https://dev.mysql.com/doc/refman/5.7/en/replication-formats.html). In the case of SBR or MBR, a binary log event can contain statements that produce temporary tables on the slave side, and the result of further statements can depend on the content of the temporary tables.

Usually, temporary tables are created in a separate directory that is out of a MySQL data directory, and aren’t backed up. That is why if we create a backup when temporary tables produced by binary log events exist, and then try to restore the backup, the temporary tables aren’t restored (as they were not backed up). If we try to restore the replication from the point saved during the backup, and after this point the binary log contains events that use the content of non-restored temporary tables, the data will be inconsistent.

That is why the so-called “safe-slave” mode is implemented in TokuDB Hotbackup. The same mode is implemented in Percona XtraBackup, and its name was also inherited from Percona XtraBackup. In this mode, along with LOCK BINLOG FOR BACKUP statement execution, the slave SQL thread is stopped. After this, it is checked whether temporary tables produced by the slave SQL thread exist or not. If yes, then the slave SQL thread is restarted until there are no temporary tables (or a certain timeout is reached).

For those purposes, we introduced the following new system variables:

  • --tokudb-backup-safe-slave – turns safe-slave mode on/off
  • --tokudb-backup-safe-slave-timeout – maximum amount of time in seconds to wait until temp tables disappear

Note that for the case of multisource replication, a simplified version of this feature is implemented. If there are several replication channels, and for some of them the SQL thread is started while for others it is stopped, TokuDB Hotbackup does not restore the channels’ state. It just restarts the SQL threads for all channels, so after the backup the SQL threads for all channels will be started.

You can’t use this option for group-replication.
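
Putting the pieces together, a minimal usage sketch (assuming the backup plugin is loaded; the timeout and path are placeholder values, and the backup itself is triggered by setting tokudb_backup_dir):

SET GLOBAL tokudb_backup_safe_slave = ON;
SET GLOBAL tokudb_backup_safe_slave_timeout = 120;
SET GLOBAL tokudb_backup_dir = '/backups/tokudb-2017-04-14';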

This could be the happy ending, but… well, you know, it’s complicated. Apart from replication features in the MySQL server, there are a lot of engine-specific features. One of them is how frequently the recovery log is synchronized.

For TokuDB, there are two system variables you can use to tune recovery log synchronization frequency: tokudb_commit_sync and tokudb_fsync_log_period. By playing with these variables, you can establish some tradeoff between durability and performance. Here is a good blog post about these parameters: High Insertion Rates into a TokuDB Table with Durable Transactions.

But let’s imagine we have a certain recovery log synchronization period, and the backup finished somewhere in the middle of this period. It’s OK if we use the backup just to restore some data, because the data is restored using recovery and rollback logs during the server’s start.

But if we also want to start the slave using the information stored in the “tokubackup_slave_info” and “tokubackup_binlog_info” files, there might be a situation where after the recovery stage the data in the database is behind the replication position stored during backup. See this bug description https://bugs.launchpad.net/percona-server/+bug/1533174 for details. Now, when capturing is still active, TokuDB forcibly synchronizes the recovery log when the copy process is finished.

So this could be the happy ending but… Well, actually I hope this is a happy ending! At least we’ve implemented the general functionality for creating slaves using TokuDB Hotbackup. However, as “ the show must go on”, or “the spice must flow” etc., we are happy to discuss any concerns, ideas, proposals and bug reports.

Thanks to all who took part in development: George O. Lorch III, Sergey Glushchenko, Shahriyar Rzaev, Alexey Kopytov (as the author of “LOCK BINLOG FOR BACKUP” feature) and other people from Percona team and community.

by Vlad Lesin at April 14, 2017 06:10 PM

April 13, 2017

Peter Zaitsev

TokuDB Troubleshooting: Q & A

TokuDB

In this blog, we will provide answers to the Q & A for the TokuDB Troubleshooting webinar.

First, we want to thank everybody for attending the March 22, 2017 webinar. The recording and slides for the webinar are available here. Below is the list of your questions that we were unable to answer during the webinar:

Q: Is it possible to load specific tables or data from the backup?

A: Do you mean the backup created by TokuBackup? No, this is not possible. You have to restore the full backup to a temporary instance, then perform a logical dump/reload of the specific tables you want to restore.

Q: Since it is not recoverable when corruption happens, we have a risk of losing data. Is incremental backup the only option?

A: TokuBackup currently does not support incremental backup options. You can only create a kind of incremental backup if you copy and store the binary logs on a separate server, and then apply them on top of a previous full backup.
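
A hedged sketch of that approach (binlog file names and the start position are placeholders): restore the full backup first, then replay the binary logs collected since it was taken.

$ mysqlbinlog --start-position=154 mysql-bin.000042 mysql-bin.000043 | mysql -u root -p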

by George O. Lorch III at April 13, 2017 05:08 PM

Percona Live Featured Session with Wei Hu – AliSQL: Breakthrough for the Future

Percona Live Featured Session

Welcome to another post in the series of Percona Live featured session blogs! In these blogs, we’ll highlight some of the session speakers that will be at this year’s Percona Live conference. We’ll also discuss how these sessions can help you improve your database environment. Make sure to read to the end to get a special Percona Live 2017 registration bonus!

In this Percona Live featured session, we’ll meet Wei Hu, Staff Engineer at Alibaba. His session (along with co-presenter Dengcheng He, Senior Staff Engineer at Alibaba) is AliSQL: Breakthrough for the Future. AliSQL is a MySQL branch maintained by the Alibaba Database Team. AliSQL has made many improvements over the last year in its efforts to make it a high-performance, high-availability and low-maintenance storage engine option.

I had a chance to speak with Wei about AliSQL:

Percona: How did you get into database technology? What do you love about it?

Wei: I worked on an RDBMS storage engine project in graduate school, where I spent years studying database theory, and experienced the charm of database systems.

Before joining the Alibaba Group, I worked for Netease. I developed another storage engine called TNT (Transactional/Non-Transactional) for MySQL 5.1 and 5.5. During this project, I had the opportunity to learn about and gain a deep understanding of the MySQL source code. Last year I joined the Alibaba group. Alibaba’s E-Commerce business has extreme RDBMS demands, and my work here is making AliSQL faster, safer and more efficient.

Percona: Your talk is called AliSQL: Breakthrough for the Future. What is AliSQL, and what workloads could benefit from it?

Wei: Last year, we joined Percona Live for the very first time. We brought the world AliSQL. AliSQL is a fork of MySQL (based on the community version) tailored for Alibaba’s business characteristics and requirements. AliSQL is focused on extreme performance and data consistency. As many people know, AliSQL supports the OLTP system with the world’s largest throughput, as demonstrated during the Singles’ Day shopping festival. Compared to the community version, AliSQL can offer high throughput, high concurrency and low latency at the same time.

At last year’s Percona Live, we shared many of the improvements we made, including Column Compression, Double Read Buffer, SQL Firewall and so on. This year we’re bringing the world a brand new AliSQL.

Firstly, we developed the new “Hot SKU” feature. We were not satisfied with AliSQL’s previous performance (5,000 single-key updates per second), so we developed a new Group update algorithm to improve throughput to 100,000 single-key updates per second. Panic buying is no longer an annoying problem in our e-commerce scenario.

Secondly, based on the InnoDB memcached plugin, AliSQL developed X-KV, a powerful new key-value interface. X-KV implements a new protocol with more operation and data type support. Our customers use X-KV as a memory cache, saving hundreds of machines in their production environment.

In addition, based on AliSQL we have developed X-Cluster. X-Cluster uses X-Paxos (Alibaba’s consensus library) to replicate data among instances. It supports high availability and high reliability. X-Cluster has better performance compared to Group Replication. Our benchmarking shows that X-Cluster has five times the throughput of Group Replication (for MySQL 5.7.17) in our high latency network. Furthermore, X-Cluster has many customization features for Alibaba’s production environment, such as leader election priority, LogType instance (low cost), etc.

Percona: How does the latest version of AliSQL make DBAs’ work easier?

Wei: With the new “Hot SKU” feature, DBAs do not need to scale out instances for panic buying. With AliSQL X-KV, DBAs do not need to care about schema changes anymore. With AliSQL X-Cluster, DBAs don’t need to worry about data inconsistency problems. All the data transfer systems for AliSQL can use the X-Paxos SDK to communicate with X-Cluster. DBAs do not need to set the log position; it is all handled by X-Cluster itself.

Percona: What do you want attendees to take away from your session? Why should they attend?

Wei: In my session, I will share the AliSQL HOT SKU, X-KV and X-Cluster internals. Other developers can gain insights and spark new ideas from the talk.

Percona: What are you most looking forward to at Percona Live 2017?

Wei: I am looking forward to chatting with MySQL developers from all over the world, and making more friends.

Register for Percona Live Data Performance Conference 2017, and see Wei and Dengcheng present AliSQL: Breakthrough for the Future. Use the code FeaturedTalk and receive $100 off the current registration price!

Percona Live Data Performance Conference 2017 is the premier open source event for the data performance ecosystem. It is the place to be for the open source community, as well as businesses that thrive in the MySQL, NoSQL, cloud, big data and Internet of Things (IoT) marketplaces. Attendees include DBAs, sysadmins, developers, architects, CTOs, CEOs, and vendors from around the world.

The Percona Live Data Performance Conference will be April 24-27, 2017 at the Hyatt Regency Santa Clara and the Santa Clara Convention Center.

by Dave Avery at April 13, 2017 03:31 PM

April 12, 2017

Peter Zaitsev

At Percona Live 2017: Many Technical MariaDB Server Sessions!

Percona Live MariaDB Server

At Percona, we support MariaDB Server (in addition to MySQL, Percona Server for MySQL, MongoDB and Percona Server for MongoDB, of course!), and that is reflected in the good, high-quality technical content about MariaDB Server at the Percona Live Open Source Database Conference 2017 in Santa Clara.

MariaDB is a fork of MySQL, developed and maintained by some of the original creators of MySQL (most notably, Michael “Monty” Widenius). At Percona Live, learn about how MariaDB promises to be a drop-in replacement for MySQL – with better performance and more flexibility – in our MariaDB track. These sessions tackle subjects on analytics, architecture and design, security, operations, scalability and performance for MariaDB.

If you’re hungry for good MariaDB Server content, this is what you’ll probably want to attend:

Monday (tutorial day)

  • Come to the MyRocks Deep Dive Tutorial by Yoshinori Matsunobu. Percona Server for MySQL and MariaDB Server will include the new storage engine in production. You should attend this tutorial if you want to learn how to use it.

Tuesday

Wednesday

Thursday

So there is plenty of MariaDB Server-related content to fill you up while attending Percona Live. Use the code SeeMeSpeak to get 10% off your tickets. What are you waiting for? Register now!

by Colin Charles at April 12, 2017 07:56 PM

ProxySQL Rules: Applying and Chaining the Rules

ProxySQL Rules

In this post, I am going to show you how you can minimize the performance impact of ProxySQL rules by using some finesse.

Apply Test

In my previous post, we could see the effect of the rules on ProxySQL performance. As we could also see, the “apply” option does not help with 1000 tables. Are we sure about this? Let’s consider: if we know 90% of our traffic won’t match any rules, it doesn’t matter if we have 10 or 500 rules – it has to check all of them. And this is going to have a serious effect on performance. How can we avoid that?

Let’s insert rule number ONE, which matches all queries, like this:

insert into mysql_query_rules (username,destination_hostgroup,active,retries,match_digest,apply) values('testuser_rw',600,1,3,'(from|into|update|into table) sbtest([1-9]\d{3,}|[1-9][0-9][1-9])\b',1);

This rule matches all queries where table names > sbtest100. But again, this logic can also be applied to “userids” or any other keys. We just have to know our application and our query distribution.
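
One practical note, as a hedged reminder using ProxySQL’s standard admin commands: rules inserted through the admin interface only take effect once loaded to runtime, and you probably also want to persist them to disk:

LOAD MYSQL QUERY RULES TO RUNTIME;
SAVE MYSQL QUERY RULES TO DISK;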

With this rule, 90% of the queries have to check only one rule (the first one):

Now we have 101 rules, but the performance is almost the same as when we had only ten rules! As we can see, creating the rules based on our query distribution has a huge impact!

But what if we don’t know which queries are the busiest, or every query has the same amount of hits? Can we do anything? Yes, we can.

Chaining

In my previous post, I mentioned the “flagIN”, “flagOUT” options. With these options we can chain the rules. But why is that good for us?

If we have 100 rules and 100 tables, even with applying, on average ProxySQL has to check 50 rules. But if we write rules like these:

insert into mysql_query_rules (flagin,flagout,username,active,retries,match_digest,apply) VALUES
(0,1000,'testuser_rw',1,3,'(from|into|update|into table) sbtest.b',0),
(0,1100,'testuser_rw',1,3,'(from|into|update|into table) sbtest1.b',0),
(0,1200,'testuser_rw',1,3,'(from|into|update|into table) sbtest2.b',0),
(0,1300,'testuser_rw',1,3,'(from|into|update|into table) sbtest3.b',0),
(0,1400,'testuser_rw',1,3,'(from|into|update|into table) sbtest4.b',0),
(0,1500,'testuser_rw',1,3,'(from|into|update|into table) sbtest5.b',0),
(0,1600,'testuser_rw',1,3,'(from|into|update|into table) sbtest6.b',0),
(0,1700,'testuser_rw',1,3,'(from|into|update|into table) sbtest7.b',0),
(0,1800,'testuser_rw',1,3,'(from|into|update|into table) sbtest8.b',0),
(0,1900,'testuser_rw',1,3,'(from|into|update|into table) sbtest9.b',0);
insert into mysql_query_rules (flagin,destination_hostgroup,active,match_digest,apply) VALUES
(1100,600,1,'(from|into|update|into table) sbtest11b',1),
(1100,600,1,'(from|into|update|into table) sbtest12b',1),
(1100,600,1,'(from|into|update|into table) sbtest13b',1),
(1100,600,1,'(from|into|update|into table) sbtest14b',1),
(1100,600,1,'(from|into|update|into table) sbtest15b',1),
(1100,600,1,'(from|into|update|into table) sbtest16b',1),
(1100,600,1,'(from|into|update|into table) sbtest17b',1),
(1100,600,1,'(from|into|update|into table) sbtest18b',1),
(1100,600,1,'(from|into|update|into table) sbtest19b',1);
...

We are going to have more than 100 rules, but first we match on the first digit after the “sbtest” prefix, and only then go on to match the exact table name. With this approach, ProxySQL only has to check 15 rules on average.

Let’s see the results:

As we can see, even with more rules, chaining is way faster than without chaining.

Tips

Hits

ProxySQL keeps statistics about each rule’s hits. When you add a rule, you can see how many queries it was applied to:

select * from stats_mysql_query_rules;
+---------+------+
| rule_id | hits |
+---------+------+
| 2 | 6860 |
| 3 | 6440 |
| 4 | 6880 |
| 5 | 6610 |
| 6 | 6850 |
| 7 | 7130 |
| 8 | 6620 |
| 9 | 7300 |
| 10 | 6750 |
| 11 | 7050 |
| 12 | 7280 |
| 13 | 6780 |
| 14 | 6670 |
...

Query_Processor_time_nsec

ProxySQL does not record how much time it spends on a rule (not yet, anyway: https://github.com/sysown/proxysql/issues/966), but it has a global stat:

select * from stats_mysql_global where Variable_name="Query_Processor_time_nsec";
+---------------------------+----------------+
| Variable_Name | Variable_Value |
+---------------------------+----------------+
| Query_Processor_time_nsec | 3184114671740 |
+---------------------------+----------------+

You can monitor this statistic, and if you see a huge increase after you added a rule, you might want to review it again.

Conclusion

ProxySQL can handle many rules, and of course they have some costs. But if you design your rules based on your workload and your query distribution, you can minimize this cost a lot.

by Tibor Korocz at April 12, 2017 06:55 PM

MariaDB Foundation

2016 in the MariaDB Foundation

We have been very busy in 2016 and early 2017 so this yearly report is long overdue, but we are glad we can report on so many positive things. Good growth has been visible all over the MariaDB ecosystem, and the mariadb.org website had almost 1.9 million page views in 2016, a growth of 72% […]

The post 2016 in the MariaDB Foundation appeared first on MariaDB.org.

by Otto Kekäläinen at April 12, 2017 11:59 AM

Daniël van Eeden

Network attacks on MySQL, Part 6: Loose ends

Backup traffic

After securing application-to-database and replication traffic, you should also do the same for backup traffic.

If you use Percona XtraBackup with streaming, then you should use SSH to send your backup to a secure location. The same is true for MySQL Enterprise Backup. Both also have options to encrypt the backup itself. If you send your backup to a cloud service, this is something you should really do, especially if it is not sent via SSH or HTTPS.

Both mysqldump and mysqlbinlog support SSL, and you could use GnuPG, OpenSSL, WinZip or any other tool to encrypt the result.
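
A hedged sketch of both ideas (host names, paths and recipients are placeholders; exact streaming and SSL options vary between tool versions):

# stream an XtraBackup over SSH to a secure backup host
$ innobackupex --stream=xbstream /tmp | ssh backup@backuphost "cat > /backups/db.xbstream"
# dump over an SSL connection and encrypt the result with GnuPG
$ mysqldump --ssl-mode=REQUIRED --all-databases | gpg --encrypt --recipient backup@example.com > all.sql.gpg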

Sending credentials

You could try to force the client to send credentials elsewhere. This can be done if you can control the parameters passed to the mysql client. It reads its config from /etc/my.cnf, ~/.my.cnf and ~/.mylogin.cnf, but if you, for example, specify both a login-path and a hostname, it connects to that host with the username and password from the login-path in the encrypted ~/.mylogin.cnf file.

You could use --enable-cleartext-plugin to make it even easier to get to the stored password. Note that if you have direct access to the ~/.mylogin.cnf file, there are tools to decrypt it.

See Bug #74545: mysql allows to override login-path for details.

MySQL Cluster (NDB)

Make sure your machines use a private network (VLAN) which can only be accessed from cluster nodes. Your API nodes should be in this network and have a public interface where mysqld listens. Another option might be to use a firewall device or host-based firewalls. Just make sure you are aware of the risks.

As usual, there is extensive documentation about this: MySQL Cluster Security and Networking Issues from the MySQL Reference Manual.

Network storage

And use proper security for iSCSI, NFS, FCP or any other kind of network storage you might be using. I've seen setups where iSCSI and/or NFS were publicly available and even with data-at-rest encryption this is not really safe, especially if read-write access is available.

Future

In both MySQL 5.6 and MySQL 5.7, Oracle improved the SSL/TLS support a lot. More improvements are still needed, as a lot has changed in how SSL/TLS is used over the past 10 years. Assumptions made years ago are no longer true.

The creators of YaSSL have also been busy: wolfSSL/mysql-patch on GitHub

by Daniël van Eeden (noreply@blogger.com) at April 12, 2017 07:00 AM

April 11, 2017

Peter Zaitsev

Correct Index Choices for Equality + LIKE Query Optimization

Query Optimization

As part of our support services, we do a lot of query optimization. This is where most performance gains come from. Here’s an example of the work we do.

Some days ago a customer arrived with the following table:

CREATE TABLE `infamous_table` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `member_id` int(11) NOT NULL DEFAULT '0',
  `email` varchar(200) NOT NULL DEFAULT '',
  `msg_type` varchar(255) NOT NULL DEFAULT '',
  `t2send` int(11) NOT NULL DEFAULT '0',
  `flag` char(1) NOT NULL DEFAULT '',
  `sent` varchar(100) NOT NULL DEFAULT '',
  PRIMARY KEY (`id`),
  KEY `f` (`flag`),
  KEY `email` (`email`),
  KEY `msg_type` (`msg_type`(5)),
  KEY `t_msg` (`t2send`,`msg_type`(5))
) ENGINE=InnoDB DEFAULT CHARSET=latin1

And a query that looked like this:

SELECT COUNT(*)
  FROM `infamous_table`
 WHERE `t2send` > 1234
   AND `msg_type` LIKE 'prefix%';

The table had an index t_msg that wasn’t helping at all: the EXPLAIN for our 1000000 rows test table looked like this:

id: 1
select_type: SIMPLE
table: infamous_table
type: range
possible_keys: t_msg
key: t_msg
key_len: 4
ref: NULL
rows: 107478
Extra: Using where

You can see the index used is the one that was expected: “t_msg”. But the key_len is 4. This indicates that the INT part was used, but that the msg_type(5) part was ignored. This resulted in examining 100k+ rows. If you have MySQL 5.6, you can see it more clearly with EXPLAIN FORMAT=JSON under used_key_parts:

EXPLAIN: {
  "query_block": {
    "select_id": 1,
    "table": {
      "table_name": "infamous_table",
      "access_type": "range",
      "possible_keys": [
        "t_msg"
      ],
      "key": "t_msg",
      "used_key_parts": [
        "t2send"
      ],
      "key_length": "4",
      "rows": 107478,
      "filtered": 100,
      "index_condition": "(`test`.`infamous_table`.`t2send` > 1234)",
      "attached_condition": "(`test`.`infamous_table`.`msg_type` like 'prefix%')"
    }
  }
}

The customer had multi-valued strings like “PREFIX:INT:OTHER-STRING” stored in the column msg_type, and that made it impossible to convert it to an ENUM or similar field type that would have allowed changing the LIKE into an equality.

So the solution was rather simple: just like for point and range queries over numeric values, you must define the index with the ranged field as the rightmost part. This means the correct index would have looked like msg_type(5),t2send. The EXPLAIN for the new index provided the customer with some happiness:

id: 1
select_type: SIMPLE
table: infamous_table
type: range
possible_keys: t_msg,better_multicolumn_index
key: better_multicolumn_index
key_len: 11
ref: NULL
rows: 4716
Extra: Using where

You can see the key_len is now what we would have expected: four bytes for the INT and another seven bytes for the VARCHAR (five for our chosen prefix + two for prefix length). More importantly, you can notice the rows count decreased by approximately 22 times.
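
For reference, the new index could have been added with something along these lines (this is just a sketch of the index definition discussed above):

ALTER TABLE `infamous_table` ADD KEY `better_multicolumn_index` (`msg_type`(5),`t2send`);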

We used pt-online-schema-change on the customer’s environment to apply the ALTER and avoid downtime. This made it an easy and painless solution, and the query effectively executed in under 1/20 of the time! So, all fine and dandy? Well, almost. We did a further test, and the query looked like this:

SELECT COUNT(*)
  FROM `infamous_table`
 WHERE `t2send` > 1234
   AND `msg_type` LIKE 'abc%';

So where’s the difference? The length of the string used for the LIKE condition is shorter than the prefix length we chose for the VARCHAR part of the index (the customer intended to look up strings with only three chars, so we needed to check this). This query also scanned 100k rows, and EXPLAIN showed the key_len was 4, meaning the VARCHAR part was being ignored once again.

This means the index prefix needed to be shorter. We ALTERed the table and made the prefix four characters long, counting on the fact that the multi-valued strings were using “:” to separate the values, so we suggested the customer include the colon in the look-up string for the shortest strings. In this case,  'abc%' would be 'abc:%' (which is also four characters long).

As a final optimization, we suggested dropping old indexes that were covered by the new better_multicolumn_index, and that were most likely created by the customer while testing optimization.

Conclusion

Just like in point-and-range queries, the right order for multi-column indexes is putting the ranged part last. Equally important is that the length of the string prefix needs to match the length of the shortest string you intend to look-up. Just remember, you can’t make this prefix too short or you’ll lose specificity and the query will end up scanning rows unnecessarily.

by marcos.albe at April 11, 2017 07:51 PM

Webinar Wednesday 4/12: Tuning MongoDB Consistency

Tuning MongoDB

Please join Percona’s Senior Technical Operations Architect Tim Vaillancourt as he presents Tuning MongoDB Consistency on April 12, 2017 at 10:00 am PDT / 1:00 pm EDT (UTC-7).

Welcome to part two of Percona’s tuning series. In our previous webinar, we mentioned some of the best practices for MongoDB tuning. What if you still need better performance after following the tuning advice in the first webinar? Part two takes a closer look at some of the other options to consider when tuning queries.

In this webinar, we will cover:

  • Consistency, atomicity and isolation in MongoDB
  • Replica set rollbacks, and the risks to your data
  • Integrity vs. scalability tradeoffs to consider during development
  • Using read concerns and write concerns to tune your application data consistency
  • When to use Read Preference, and the tradeoffs of doing so
  • Tuning your MongoDB deployment and server configuration for data integrity/consistency
  • Performing cluster-wide consistent backups

By the end of the webinar you will have a better understanding of how to use MongoDB’s features to achieve a required balance of consistency and scalability.

Register for the webinar here.

Timothy Vaillancourt, Senior Technical Operations Architect

Tim joined Percona in 2016 as Sr. Technical Operations Architect for MongoDB with a goal to make the operations of MongoDB as smooth as possible. With experience operating infrastructures in industries such as government, online marketing/publishing, SaaS and gaming, combined with experience tuning systems from the hard disk all the way up to the end-user, Tim has spent time in nearly every area of the modern IT stack with many lessons learned.

Tim is based in Amsterdam, NL and enjoys traveling, coding and music. Before Percona Tim was the Lead MySQL DBA of Electronic Arts’ DICE studios, helping some of the largest games in the world (“Battlefield” series, “Mirrors Edge” series, “Star Wars: Battlefront”) launch and operate smoothly while also leading the automation of MongoDB deployments for EA systems. Before the role of DBA at EA’s DICE studio, Tim served as a subject matter expert in NoSQL databases, queues and search on the Online Operations team at EA SPORTS. Before moving to the gaming industry, Tim served as a Database/Systems Admin operating a large MySQL-based SaaS infrastructure at AbeBooks/Amazon Inc.

by Dave Avery at April 11, 2017 04:54 PM

Valeriy Kravchuk

Fun with Bugs #52 - On Some Bugs Fixed in MySQL 5.7.18

I had not reviewed MySQL 5.7 release notes for quite some time in this series. The last time I checked, it was MySQL 5.7.15. So, as soon as I noted the new release, 5.7.18, I decided to check the release notes for any interesting fixed bugs (reported by Community users in public) in the areas I am interested in: InnoDB, replication, optimizer and a few others.

Note that recently most of the fixed bugs are internal only, found by Oracle engineers who never cared (or are not allowed, who knows) to report them in public, so this blog post is not even remotely a full review of what's fixed in MySQL 5.7.18 and is not a replacement for reading the detailed release notes.

So, here is the result. I've noted that the following InnoDB bugs from Community users are fixed (as usual, I explicitly state who reported and verified each bug):
  • Bug #83245 - "Log parsing buffer overflow". This regression bug was reported by my former colleague from Percona, Tomislav Plavcic (based on the original findings by Peter Zaitsev), and verified by Umesh Shastry.
  • Bug #82968 "InnoDB's estimate for rows-in-table can return rows=1 due to a race condition", was reported by my colleague Sergey Petrunya while working on my MariaDB bug report, MDEV-10649 (based, in turn, on a problem one of our key customers hit in production). The bug was kindly verified by Sinisa Milivojevic.
  • Bug #80060 - "Concurrent TRUNCATE TABLEs cause stalls". It's great to see this case reported by Domas Mituzas fixed at least in 5.7.x. But it's even better to just read all the comments and arguments in that great and funny discussion he had with Sinisa Milivojevic who was working on verification...
  • Bug #80580 - "count(*) much slower on 5.7 than 5.6". This regression bug (not tagged as such!) was reported by Qi Xiaobin as an optimizer one (but it turned out to be in InnoDB), and verified by Umesh Shastry. A simple public test case there was provided by me.
  • Bug #84202 is not public, so let me simply quote (after telling you again that I hate such cases):
    "InnoDB: The row_search_mvcc() function unnecessarily traversed the entire table for a range query, which occurred when the record was not in the transaction read view. (Bug #84202, Bug #23481444, Bug #25251375)"
These bugs reported by Community users were fixed in replication:
  • Bug #83537 - "WAIT_UNTIL_SQL_THREAD_AFTER_GTIDS timeout value handles some inputs badly". It was reported by Dan Reif and verified by Umesh Shastry. Now fractional and negative timeouts are processed in a reasonable and expected way.
  • Bug #84674 - "Having an unresolvable hostname in group_repl should not block group replication", by Kenny Gryp. I've already reviewed active group replication bugs in a separate post some time ago. Nice to see some of them fixed. The bug was probably verified by Nuno Carvalho.
  • Bug #83918 - "RBR: Wrong field length in error message". It was reported by Luis Soares from Oracle,  who probably fixed it himself. Nice example when Oracle engineers do not hesitate to report bugs in public!
  • Bug #83270 - "Performance degradation when the server has many gtids", was reported by Daniël van Eeden and quickly verified by Umesh Shastry.
The following interesting bugs in other categories were also fixed:
  • Bug #84786 - "--performance-schema-digests-size=1 leads to SIGSEGV". It was reported by Roel Van de Paar and promptly verified by Miguel Solorzano.
  • Bug #84263 - "mysql.server (MySQL Server Startup Script) can not work,and export some error.". This bug was reported by Song Boomballa. See also my related bug report, Bug #84173 - "mysqld_safe --no-defaults & silently does NOT work any more". Nice to know this regression is finally fixed.
  • Bug #84172 - "The pid-file value is ignored in the /etc/my.cnf option file". This nice packaging bug/difference was noted and reported by Monty Solomon.
  • Bug #83253 - "performance_schema digests the same query to multiple different digests", was reported by my former colleague Justin Swanhart. The bug was verified and probably fixed by Mayank Prasad.
  • Bug #83110 - "found_rows() returns 1 when no rows found". This nice regression bug was found and reported by Yamada Isami, and verified by Umesh Shastry
  • Bug #83005 - "When the Optimiser is using index for group-by it often gives wrong results", was reported by Yoseph Phillips and verified, again by Umesh Shastry. Somehow this regression affecting versions 5.6+ was not noted for a long time.
  • Bug #82313 - "Wrong result for EXPLAIN SELECT COUNT(*) FROM TABLE;". This is yet another optimizer regression bug in 5.7 found by Justin Swanhart. I wonder if MySQL 8 (that seemed to be affected) will include the fix... Release notes for 8.0.1 are empty at the moment!
  • Bug #81854 - "Force index is skipped while executing select count(*)", was reported by Zhai Weixiang and verified by Umesh Shastry.
  • Bug #78244 - "SELECT DISTINCT, wrong results combined with use_index_extensions=off". It was reported by Daniel G and verified by Miguel Solorzano.
  • Bug #41908 - "Trying to start MySQL while another instance is starting: confusing error messag". The bug was reported by  Roel Van de Paar more than 8 years ago! It was verified by Sveta Smirnova. Somehow it seems that 5.7.18 had introduced a lot of changes into the way mysqld_safe works...
Those are all the fixed bugs that I've considered relevant for myself and this post. I've just built MySQL 5.7.18 from source on my Ubuntu 14.04 netbook while writing this post, so I do not have any personal experience with this new release yet. But I know it works:
openxs@ao756:~/dbs/5.7$ bin/mysqld_safe --no-defaults --port=3308 --socket=/tmp/mysql57.sock --basedir=/home/openxs/dbs/5.7 --datadir=/home/openxs/dbs/5.7/data &
[1] 5202
openxs@ao756:~/dbs/5.7$ 2017-04-11T10:51:24.224518Z mysqld_safe Logging to '/home/openxs/dbs/5.7/data/ao756.err'.
2017-04-11T10:51:24.264873Z mysqld_safe Starting mysqld daemon with databases from /home/openxs/dbs/5.7/data

openxs@ao756:~/dbs/5.7$ bin/mysql -uroot --socket=/tmp/mysql57.sock
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 3
Server version: 5.7.18 MySQL Community Server (GPL)

Copyright (c) 2000, 2017, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql>
Time to start testing and looking for something new...

by Valeriy Kravchuk (noreply@blogger.com) at April 11, 2017 11:38 AM

Jean-Jerome Schmidt

Updated - Full Restore of a MySQL or MariaDB Galera Cluster from Backup

Performing regular backups of your database cluster is imperative for high availability and disaster recovery. If for any reason you lost your entire cluster and had to do a full restore from backup, you would need a reliable and up-to-date backup to start from.

Best Practices for Backups

Some recommendations to consider for a good scheduled backup regime:

  • You should be able to completely recover from a catastrophic failure from at least two previous full backups. Just in case the most recent full backup is damaged, lost, or corrupt,
  • Your backup should contain at least one full backup within a chosen cycle, normally weekly,
  • Store backups away from the current data location, preferably off site,
  • Use a mixture of mysqldump and Xtrabackup for extra safety, rather than relying on one method,
  • Test restore your backups on a regular basis, e.g. every two months.

A weekly full backup combined with daily incremental backup is normally enough. Keeping a number of backups for a period of time is always a good plan, maybe keep each weekly backup for one month. This allows you to recover an older database in case of emergencies or if for some reason you have local backup file corruption.

mysqldump or Xtrabackup

mysqldump is very likely the most popular way of backing up MySQL. It does a logical backup of the data, reading from each table using SQL statements then exporting the data into text files. Restoration of a mysqldump is as easy as creating the dump file. The main drawbacks are that it is very slow for large databases, it is not ‘hot’ and it wipes out the InnoDB buffer pool.

Xtrabackup performs hot backups, does not lock the database during the backup and is generally faster. Hot backups are important for high availability, as they run without blocking the application. This is also an important factor when used with Galera, as Galera relies on synchronous replication. However, restoring an xtrabackup can be a little tricky if done manually.
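
As a rough sketch (the target directories are illustrative, using Percona XtraBackup 2.4-style syntax), a weekly full backup plus hourly incrementals could be taken like this:

# full backup
xtrabackup --backup --target-dir=/backups/full
# incremental backup based on the previous full one
xtrabackup --backup --target-dir=/backups/inc1 --incremental-basedir=/backups/full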

ClusterControl supports the scheduling of both mysqldump and Xtrabackup (full and incremental), as well as the backup restoration right from the UI.

Full Restore from Backup

In this post, we will show you how to restore Xtrabackup (full + incremental) onto an empty cluster running on MariaDB Galera Cluster. These steps should also work on Percona XtraDB Cluster or Galera Cluster for MySQL from Codership.

In our original cluster, we had a full xtrabackup scheduled daily, with incremental backups created every hour. The backups are stored on ClusterControl as shown in the following screenshot:

Now, let’s assume we have lost our original cluster and have to do a full restore onto a new cluster. The steps include:

  1. Set up a new ClusterControl server.
  2. Set up a new MariaDB Cluster.
  3. Export the backup records and files to the new ClusterControl server.
  4. Start the restoration process.
  5. Start the remaining nodes.

The following diagram illustrates our architecture for this exercise:

Step 1 - Set up New MariaDB Cluster

Install ClusterControl and deploy a new MariaDB Cluster. Go to ClusterControl -> Deploy -> Deploy Database Cluster -> MySQL Galera and specify the required information in the deployment dialog:

Click on the Deploy button and start the deployment. Since we only had one cluster on the old server, the cluster ID should be identical (cluster ID: 1) in this new instance.


Step 2 - Export and import the backup files

Once the cluster is deployed, we will have to import the backups from the old ClusterControl server into the new one. First, export the content of cmon.backup_records to dump files. Since the old cluster ID and the new one are identical, we just need to modify the dump file with the new IP address and import it into the new ClusterControl node. If the cluster IDs are different, then you have to change the “cid” value accordingly inside the dump files before importing them into the CMON DB on the new node. Also, it is easier to keep the same backup storage location as on the old server, so the new ClusterControl can locate the backup files on the new server.

On the old ClusterControl server, export the backup_records table into dump files:

$ mysqldump -uroot -p --single-transaction --no-create-info cmon backup_records > backup_records.sql

Then, perform remote copy of the backup files from the old server into the new ClusterControl server:

$ scp -r /root/backups 192.168.55.150:/root/
$ scp ~/backup_records.sql 192.168.55.150:~

Next is to modify the dump files to reflect the new ClusterControl server IP address. Don’t forget to escape the dot in the IP address:

$ sed -i "s/192\.168\.55\.170/192\.168\.55\.150/g" backup_records.sql

On the new ClusterControl server, import the dump files:

$ mysql -uroot -p cmon < backup_records.sql

Verify that the backup list is correct in the new ClusterControl server:

As you can see, all occurrences of the previous IP address (192.168.55.170) have been replaced by the new IP address (192.168.55.150). Now we are ready to perform the restoration on the new server.

Step 3 - Perform the Restoration

Performing restoration through the ClusterControl UI is a simple point-and-click step. Choose which backup to restore and click on the “Restore” button. We are going to restore the latest incremental backup available (Backup: 9). Click on the “Restore” button just below the backup name and you will be presented with the following pre-restoration dialog:

Looks like the backup size is pretty small (165.6 kB). It doesn’t really matter because ClusterControl will prepare all incremental backups grouped under Backup Set 6, which holds the full Xtrabackup. You also have several post-restoration options:

  • Restore backup on - Choose the node to restore the backup on.
  • Tmp Dir - Directory will be used on the local ClusterControl server as temporary storage during backup preparation. It must be as big as the estimated MySQL data directory.
  • Bootstrap cluster from the restored node - Since this is a new cluster, we are going to toggle this ON so ClusterControl will bootstrap the cluster automatically after the restoration succeeds.
  • Make a copy of the datadir before restoring the backup - If the restored data is corrupted or not what you expected it to be, you will have a backup of the previous MySQL data directory. Since this is a new cluster, we are going to ignore this one.

Percona Xtrabackup restoration will cause the cluster to be stopped. ClusterControl will:

  1. Stop all nodes in the cluster.
  2. Restore the backup on the selected node.
  3. Bootstrap the selected node.

To see the restoration progress, go to Activity -> Jobs -> Restore Backup and click on the “Full Job Details” button. You should see something like this:

One important thing that you need to do is to monitor the output of the MySQL error log on the target node (192.168.55.151) during the restoration process. After the restoration completes and during the bootstrapping process, you should see the following lines starting to appear:

Version: '10.1.22-MariaDB' socket: '/var/lib/mysql/mysql.sock' port: 3306 MariaDB Server
2017-04-07 18:03:51 140608191986432 [Warning] Access denied for user 'cmon'@'192.168.55.150' (using password: YES)
2017-04-07 18:03:51 140608191986432 [Warning] Access denied for user 'cmon'@'192.168.55.150' (using password: YES)
2017-04-07 18:03:51 140608191986432 [Warning] Access denied for user 'cmon'@'192.168.55.150' (using password: YES)
2017-04-07 18:03:52 140608191986432 [Warning] Access denied for user 'cmon'@'192.168.55.150' (using password: YES)
2017-04-07 18:03:53 140608191986432 [Warning] Access denied for user 'cmon'@'192.168.55.150' (using password: YES)
2017-04-07 18:03:54 140608191986432 [Warning] Access denied for user 'cmon'@'192.168.55.150' (using password: YES)
2017-04-07 18:03:55 140608191986432 [Warning] Access denied for user 'cmon'@'192.168.55.150' (using password: YES)

Don’t panic. This is expected behaviour, because this backup set doesn’t contain the new ClusterControl server’s cmon credentials; the restore has brought back the old cmon user instead. What you need to do is to re-grant the cmon user on the server by running the following statement on this DB node:

GRANT ALL PRIVILEGES ON *.* to cmon@'192.168.55.150' IDENTIFIED BY 'mynewCMONpassw0rd' WITH GRANT OPTION;
FLUSH PRIVILEGES;

ClusterControl then would be able to connect to the bootstrapped node and determine the node and backup state. If everything is OK, you should see something like this:

At this point, the target node is bootstrapped and running. We can start the remaining nodes under Nodes -> choose node -> Start Node and check the “Perform an Initial Start” checkbox:

The restoration is now complete and you can expect Performance -> DB Growth to report the updated size of our newly restored data set:

Happy restoring!

by ashraf at April 11, 2017 09:59 AM

April 10, 2017

Peter Zaitsev

ProxySQL Rules: Do I Have Too Many?

In this blog post we are going to take a closer look at ProxySQL rules. How do they work, and how big is the performance impact of having many rules?

I would like to say thank you to René, who was willing to answer all my questions during my tests.

Overview

ProxySQL is heavily based on query rules. We can set up ProxySQL without rules, based only on host groups, but if we want read/write splitting or sharding (or anything else) we need rules.

ProxySQL knows the SQL protocol and language, so we can easily create rules based on username, schema name and even on the query itself. We can write regular expressions that match the query digest. Let me show you an example:

insert into mysql_query_rules (username,destination_hostgroup,active,retries,match_digest) values('Testuser',601,1,3,'^SELECT');

This rule matches all the queries starting with “SELECT”, and sends them to host group 601.

After version 1.3.1, the default regex engine is RE2. Starting with version 1.4, the default regex engine will be PCRE.

I would like to highlight three options that can have a bigger impact on your rules than you think: flagIN, flagOUT and apply.

With regards to the manual:

. . .these allow us to create “chains of rules” that get applied one after the other. An input flag value is set to 0, and only rules with flagIN=0 are considered at the beginning. When a matching rule is found for a specific query, flagOUT is evaluated and if NOT NULL the query will be flagged with the specified flag in flagOUT. If flagOUT differs from flagIN, the query will exit the current chain and enters a new chain of rules having flagIN as the new input flag. If flagOUT matches flagIN, the query will be re-evaluated again against the first rule with said flagIN. This happens until there are no more matching rules, or apply is set to 1 (which means this is the last rule to be applied)

You might not be sure what this means, but I will show you later.
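
In the meantime, here is a contrived two-rule chain just to illustrate the mechanics (the column names are ProxySQL's, while the regexes and host groups are made up):

-- queries matching "orders" get flagged with 100 and continue in chain 100
INSERT INTO mysql_query_rules (rule_id,active,match_digest,flagOUT,apply) VALUES (10,1,'orders',100,0);
-- within chain 100, SELECTs go to host group 601 and the evaluation stops there
INSERT INTO mysql_query_rules (rule_id,active,flagIN,match_digest,destination_hostgroup,apply) VALUES (20,1,100,'^SELECT',601,1);
LOAD MYSQL QUERY RULES TO RUNTIME;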

As you can see, adding a rule is easy and we can add hundreds of rules. But is there any performance impact?

Test Case

We can write rules based on any part of the query (for example, “userid” or some “sharding key”). In these tests I wrote the rules based on table names because I can easily generate tables with “sysbench”, and run queries against these tables.

I created 1000 tables using sysbench, and I am going to test them with a direct MySQL connection, ProxySQL without rules, with ten rules and with 100 rules.

Time to do some tests to see if adding 100 or more rules has any effect on the performance.

I used two c4.4xlarge instances with SSDs, and I am going to share the steps so anybody can repeat my test and share/compare the results. NodeA is running the MySQL 5.7.17 server, and NodeB is running ProxySQL 1.3.4 and sysbench. During the test I increased the sysbench threads in the following steps: 1, 2, 4, 8, 12, 16, 20, 24.

I tried to use the simplest possible ProxySQL configuration:

INSERT INTO mysql_servers (hostname,hostgroup_id,port,weight,max_replication_lag) VALUES ('10.10.10.243',600,3306,1000,0);
INSERT INTO mysql_replication_hostgroups VALUES (600,'','');
LOAD MYSQL SERVERS TO RUNTIME; SAVE MYSQL SERVERS TO DISK;
LOAD MYSQL QUERY RULES TO RUNTIME;SAVE MYSQL QUERY RULES TO DISK;
insert into mysql_users (username,password,active,default_hostgroup,default_schema) values ('testuser_rw','Testpass1.',1,600,'test');
LOAD MYSQL USERS TO RUNTIME;SAVE MYSQL USERS TO DISK; 

Only one server, one host group. I tried to measure the impact the rules had, so in all the test I sent the queries to the same host group. I only changed the rules (and some ProxySQL settings, as I will explain later).

As I mentioned, I am going to filter based on table names. Here are the 100 rules that I used:

insert into mysql_query_rules (username,destination_hostgroup,active,retries,match_digest) values('testuser_rw',600,1,3,'(from|into|update|into table) sbtest1b');
insert into mysql_query_rules (username,destination_hostgroup,active,retries,match_digest) values('testuser_rw',600,1,3,'(from|into|update|into table) sbtest2b');
...
insert into mysql_query_rules (username,destination_hostgroup,active,retries,match_digest) values('testuser_rw',600,1,3,'(from|into|update|into table) sbtest100b');

First Test

First I ran tests with a direct MySQL connection, ProxySQL without rules, ProxySQL with ten rules and ProxySQL with 100 rules.

ProxySQL rules

ProxySQL itself has an impact on the performance, but there is a big difference between 10 and 100 rules. So adding more and more rules can have a negative effect on the performance.

That’s all? Can we do anything to speed things up? I used the default ProxySQL settings. Let’s have a look what can we tune.

Increasing the Number of Threads

Let’s go step by step. First we can increase the thread number inside ProxySQL (the default is 4). We will increase it to 8:

UPDATE global_variables SET variable_value='8' WHERE variable_name='mysql-threads';
SAVE MYSQL VARIABLES TO DISK;

ProxySQL has to be restarted after this change.

ProxySQL rules

With this simple change, we can improve the performance. As we can see, the difference gets larger and larger as we increase the number of sysbench threads.

Compiling

By compiling our own package, we can gain some extra performance. It is not clear why, so we opened a ticket for further investigation:

ProxySQL rules

I removed some of the columns because the graph got too busy.

ProxySQL 1.4

In ProxySQL 1.4 (which is not GA yet), we can change between the regex engines. However, even using the same engine (RE2) is faster in 1.4:

ProxySQL rules

Apply

As I mentioned, ProxySQL has a few important parameters like “apply”. With apply, if the query matches a rule, ProxySQL won’t check the remaining rules. In an ideal world, if you have 100 rules and 100 queries in random order, each matching only one rule, you only have to check 50 rules on average.

The new rules:

insert into mysql_query_rules (username,destination_hostgroup,active,retries,match_digest,apply) values('testuser_rw',600,1,3,'(from|into|update|into table) sbtest1b',1);

ProxySQL rules

As you can see, it didn’t help at all. But why? Because in this test we have 1000 tables, and we are running queries on all of the tables. This means 90% of the queries have to check all the rules anyway. Let’s run a test with 100 tables to see if “apply” helps or not:

ProxySQL rules

As we can see, with 100 tables we get much better performance. But of course this is not a valid solution, because we can’t just drop tables, “userids” or “sharding keys”. In the next post I will show you how to use “apply” in a more effective way.
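
A practical way to check whether your rules actually match (and whether “apply” short-circuits the evaluation where you expect it to) is the hits counter exposed on the ProxySQL admin interface; something along these lines should work, although the numbers are obviously workload-dependent:

-- on the ProxySQL admin interface; rule_id matches the one in mysql_query_rules
SELECT rule_id, hits FROM stats_mysql_query_rules ORDER BY hits DESC;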

Conclusion

So far, ProxySQL 1.4 with the PCRE engine and eight threads gives us the best performance with 100 rules and 1000 tables. As we can see, both the number of rules and the query distribution matter. Both impact the performance. In my next blog post, I will show you how you can add some logic into your rules so that, even if you have more rules, you will get better performance.

by Tibor Korocz at April 10, 2017 11:52 PM

InnoDB Page Merging and Page Splitting

Page Merging and Page Splitting

If you met one of the (few) MySQL consultants around the globe and asked him/her to review your queries and/or schemas, I am sure that he/she would tell you something regarding the importance of good primary key(s) design. Especially in the case of InnoDB, I’m sure they would start explaining index merges and page splits to you. These two notions are closely related to performance, and you should take this relationship into consideration when designing any index (not just PKs).

That may sound like mumbo jumbo to you, and you may be right. This is not easy stuff, especially when talking about internals. This is not something you deal with on a regular basis, and often you don’t want to deal with it at all.

But sometimes it’s a necessity. If so, this article is for you.

In this article, I want to shed some light in explaining some of the most unclear, behind the scenes operations in InnoDB: page index creation, page merging and page splitting.

In InnoDB all data is an index. You’ve probably heard that as well, right? But what exactly does that mean?

File-Table Components

Let’s say you have MySQL installed, the latest 5.7 version (Percona Server for MySQL, right? 😉 ), and you have a table named wmills in the schema windmills. In the data directory (normally /var/lib/mysql/) you will see that it contains:

data/
  windmills/
      wmills.ibd
      wmills.frm

This is because the parameter innodb_file_per_table is set to 1 since MySQL 5.6. With that setting, each table in your schema is represented by one file (or many files if the table is partitioned).

What is important here is that the physical container is a file named wmills.ibd. This file is broken up into N segments. Each segment is associated with an index.

While a file’s dimensions do not shrink with row-deletions, a segment itself can grow or shrink in relation to a sub-element named extent. An extent can only exist inside a segment and has a fixed dimension of 1MB (in the case of default page size). A page is a sub-element of an extent and has a default size of 16KB.

Given that, an extent can contain a maximum of 64 pages. A page can contain two to N rows. The number of rows a page can contain is related to the size of the row, as defined by your table schema. There is a rule within InnoDB that says, at minimum, two rows must fit into a page. Therefore, we have a row-size limit of 8000 bytes.

If you think this sounds like Matryoshka dolls, you are right! An image might help:

InnoDB uses B-trees to organize your data inside pages across extents, within segments.

Roots, Branches, and Leaves

Each page (leaf) contains 2-N rows(s) organized by the primary key. The tree has special pages to manage the different branch(es). These are known as internal nodes (INodes).

This image is just an example, and is not indicative of the real-world output below.

Let’s see the details:

ROOT NODE #3: 4 records, 68 bytes
 NODE POINTER RECORD ≥ (id=2) → #197
 INTERNAL NODE #197: 464 records, 7888 bytes
 NODE POINTER RECORD ≥ (id=2) → #5
 LEAF NODE #5: 57 records, 7524 bytes
 RECORD: (id=2) → (uuid="884e471c-0e82-11e7-8bf6-08002734ed50", millid=139, kwatts_s=1956, date="2017-05-01", location="For beauty's pattern to succeeding men.Yet do thy", active=1, time="2017-03-21 22:05:45", strrecordtype="Wit")

Below is the table structure:

CREATE TABLE `wmills` (
  `id` bigint(11) NOT NULL AUTO_INCREMENT,
  `uuid` char(36) COLLATE utf8_bin NOT NULL,
  `millid` smallint(6) NOT NULL,
  `kwatts_s` int(11) NOT NULL,
  `date` date NOT NULL,
  `location` varchar(50) COLLATE utf8_bin DEFAULT NULL,
  `active` tinyint(2) NOT NULL DEFAULT '1',
  `time` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
  `strrecordtype` char(3) COLLATE utf8_bin NOT NULL,
  PRIMARY KEY (`id`),
  KEY `IDX_millid` (`millid`)
) ENGINE=InnoDB;

All styles of B-trees have a point of entry known as the root node. We’ve identified that here as page #3. The root page contains information such as index ID, number of INodes, etc. INode pages contain information about the pages themselves, their value ranges, etc. Finally, we have the leaf nodes, which is where we can find our data. In this example, we can see that leaf node #5 has 57 records for a total of 7524 bytes. Below that line is a record, and you can see the row data.

The concept here is that while you organize your data in tables and rows, InnoDB organizes it in branches, pages, and records. It is very important to keep in mind that InnoDB does not work on a single row basis. InnoDB always operates on pages. Once a page is loaded, it will then scan the page for the requested row/record.

Is that clear up to now? Good. Let’s continue.

Page Internals

A page can be empty or fully filled (100%). The row-records will be organized by PK. For example, if your table is using an AUTO_INCREMENT, you will have the sequence ID = 1, 2, 3, 4, etc.

A page also has another important attribute: MERGE_THRESHOLD. The default value of this parameter is 50% of the page, and it plays a very important role in InnoDB merge activity:

While you insert data, the page is filled up sequentially if the incoming record can be accommodated inside the page.

When a page is full, the next record will be inserted into the NEXT page:

Given the nature of B-trees, the structure is browsable not only top-down following the branches, but also horizontally across the leaf nodes. This is because each leaf node page has a pointer to the page that contains the NEXT record value in the sequence.

For example, Page #5 has a reference to the next page, Page #6. Page #6 has references backward to the previous page (Page #5) and a forward to the next page (Page #7).

This mechanism of a linked list allows for fast, in-order scans (i.e., Range Scans). As mentioned before, this is what happens when you are inserting and have a PK based on AUTO_INCREMENT. But what happens if I start to delete values?

Page Merging

When you delete a record, the record is not physically deleted. Instead, InnoDB flags the record as deleted and the space it used becomes reclaimable.

When a page has received enough deletes to match the MERGE_THRESHOLD (50% of the page size by default), InnoDB starts to look to the closest pages (NEXT and PREVIOUS) to see if there is any chance to optimize the space utilization by merging the two pages.

In this example, Page #6 is utilizing less than half of its space. Page #5 received many deletes and is also now less than 50% used. From InnoDB’s perspective, they are mergeable:

The merge operation results in Page #5 containing its previous data plus the data from Page #6. Page #6 becomes an empty page, usable for new data.

Page Merging and Page Splitting

The same process also happens when we update a record and the size of the new record brings the page below the threshold.

The rule is: merges happen on delete and update operations involving closely linked pages. If a merge operation is successful, the index_page_merge_successful metric in INFORMATION_SCHEMA.INNODB_METRICS is incremented.
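
Since MySQL 5.7, the threshold is also tunable per table or per index through a comment; a couple of sketches using the example table above (the value 40 is arbitrary, pick one that matches your workload):

-- lower the merge threshold for the whole table
ALTER TABLE wmills COMMENT='MERGE_THRESHOLD=40';
-- or only for one index
ALTER TABLE wmills DROP KEY IDX_millid, ADD KEY IDX_millid (millid) COMMENT 'MERGE_THRESHOLD=40';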

Page Splits

As mentioned above, a page can be filled up to 100%. When this happens, the next page takes new records.

But what if we have the following situation?

Page Merging and Page Splitting

Page #10 doesn’t have enough space to accommodate the new (or updated) record. Following the next page logic, the record should go on Page #11. However:

Page Merging and Page Splitting

Page #11 is also full, and data cannot be inserted out of order. So what can be done?

Remember the linked list we spoke about? At this moment Page #10 has Prev=9 and Next=11.  

What InnoDB will do is (simplifying):

  1. Create a new page
  2. Identify where the original page (Page #10) can be split (at the record level)
  3. Move records
  4. Redefine the page relationships

Page Merging and Page Splitting

A new Page #12 is created:

Page Merging and Page Splitting

Page #11 stays as it is. The thing that changes is the relationship between the pages:

  • Page #10 will have Prev=9 and Next=12
  • Page #12 Prev=10 and Next=11
  • Page #11 Prev=12 and Next=13

The path of the B-tree still sees consistency since it is following a logical organization. However, physically the page is located out of order, and in most cases in a different extent.

As a rule we can say: page splits happen on insert or update operations, and cause page dislocation (in many cases onto different extents).

InnoDB tracks the number of page splits in INFORMATION_SCHEMA.INNODB_METRICS. Look for index_page_splits and index_page_reorg_attempts/successful metrics.
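
A quick way to look at these counters (they may need to be enabled first, depending on your configuration):

SET GLOBAL innodb_monitor_enable='index_page%';
SELECT NAME, COUNT FROM INFORMATION_SCHEMA.INNODB_METRICS WHERE NAME LIKE 'index_page%';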

Once the split page is created, the only way to move back is to have the created page drop below the merge threshold. When that happens, InnoDB moves the data from the split page with a merge operation.

The other way is to reorganize the data by running OPTIMIZE TABLE. This can be a very heavy and long process, but often it is the only way to recover from a situation where too many pages are located in sparse extents.

Another aspect to keep in mind is that during merge and split operations, InnoDB acquires an x-latch on the index tree. On a busy system, this can easily become a source of concern, as it can cause index latch contention. When a write touches only a single page and no merge or split is needed, this is called an “optimistic” update in InnoDB, and the latch is only taken in S mode. Merges and splits are called “pessimistic” updates, and take the latch in X mode.

My Primary Key

A good Primary Key (PK) is not only important for retrieving data, but also for correctly distributing the data inside the extents while writing (which is also relevant in the case of split and merge operations).

I tested three different Primary Key definitions. In the first case, I have a simple auto-increment. In the second, my PK is based on an ID (1-200 range) plus an auto-increment value. In the third, I have the same ID (1-200 range) but associated with a UUID.

When inserting, InnoDB must add pages. This is read as a SPLIT operation:

Page Merging and Page Splitting

The behavior is quite different depending on the kind of Primary Key I use.

The first two cases will have more “compact” data distribution. This means they will also have better space utilization, while the semi-random nature of the UUID will cause a significant “sparse” page distribution (causing a higher number of pages and related split operations).

In the case of merges, the number of attempts to merge is even more different by PK type.

Page Merging and Page Splitting

On Insert-Update-Delete operations, auto-increment has fewer page merge attempts and a lower success ratio (9.45%) than the other two types. The PK with UUID (on the other side of the spectrum) has a higher number of merge attempts, but at the same time also a significantly higher success ratio, at 22.34%, given that the “sparse” distribution left many pages partially empty.

The PK values with similar numbers also come from a secondary index.

Conclusion

MySQL/InnoDB constantly performs these operations, and you have very limited visibility of them. But they can bite you, and bite hard, especially if you are using spindle storage vs. SSD (which have different issues, by the way).

The sad story is there is also very little we can do to optimize this on the server side using parameters or some other magic. But the good news is there is A LOT that can be done at design time.

Use a proper Primary Key and design secondary indexes keeping in mind that you shouldn’t abuse them. Plan proper maintenance windows for the tables that you know will have very high levels of inserts/deletes/updates.

This is an important point to keep in mind. In InnoDB you cannot have fragmented records, but you can have a nightmare at the page-extent level. Ignoring table maintenance will cause more work at the IO level, memory and InnoDB buffer pool.

You must rebuild some tables at regular intervals. Use whatever tricks it requires, including partitioning and external tools (pt-osc). Do not let a table become gigantic and fully fragmented.

Wasting disk space? Need to load three pages instead of one to retrieve the record set you need? Each search causes significantly more reads?
That’s your fault; there is no excuse for being sloppy!

Happy MySQL to everyone!

Acknowledgments

Laurynas Biveinis: who had the time and patience to explain some internals to me.

Jeremy Cole: for his project InnoDB_ruby (that I use constantly).

by Marco Tusa at April 10, 2017 07:08 PM

Federico Razzoli

Prepared Statements in MariaDB 10.2

Prepared statements are particularly useful in stored procedures. In fact, they are the only way to execute dynamic SQL, which means that, for example, the list of ORDER BY columns is in a variable. You can do this by composing an SQL string, as you do in other languages.

However, in MySQL and up to MariaDB 10.1, there are some annoying limitations:

  • Prepared statements exist at a connection level. Even if you declare them in a stored procedure, they are global. If you are writing a stored procedure which should be usable in many contexts (maybe it’s part of an open source library), there is a risk that you overwrite existing prepared statements. There is no way to get a list of existing prepared statements.
  • A prepared statement is prepared from a literal string or a user variable. A literal string cannot contain variables. A user variable exists at a session level, so again, there is a risk of overwriting something, especially if you always call the variable @sql, because a lot of people tend to do so.
  • Prepared statements are verbose. You need to run 3 different statements (PREPARE, EXECUTE, DEALLOCATE PREPARE) just to execute a dynamic SQL statement.

First, let me say that the first problem is not solved. I hope it will be solved in a future version, so it will be easier to distribute stored procedures libraries. After all, MariaDB 10.3 will have several improvements to stored procedures.

The second problem is solved in 10.2. You can pass any SQL expression (or, at least, all expressions I tested) to PREPARE. Including a local variable, which means that there will be no conflicts with variables declared by the user. For example:

DECLARE v_query TEXT DEFAULT 'SELECT COUNT(DISTINCT user) FROM mysql.user';
PREPARE stmt FROM v_query;
EXECUTE stmt;
DEALLOCATE PREPARE stmt;

But this example is verbose. It demonstrates what I stated above: you need at least 3 statements just to execute a dynamic query. But MariaDB 10.2 solves this problem introducing EXECUTE IMMEDIATE, which is basically a shortcut:

DECLARE v_query TEXT DEFAULT 'SELECT COUNT(DISTINCT user) FROM mysql.user';
EXECUTE IMMEDIATE v_query;

As you can see, we don’t need PREPARE or DEALLOCATE PREPARE if we use EXECUTE IMMEDIATE. This is nice to have in stored procedures, but is not useful outside of procedures, because the prepared statement is always reprepared, even if we executed it before.
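
To tie this back to the dynamic ORDER BY case from the beginning, here is a minimal sketch of a procedure (the procedure name and the column list are made up, and the caller is trusted to pass a valid column name):

DELIMITER //
CREATE PROCEDURE list_users(IN p_order_col VARCHAR(64))
BEGIN
  DECLARE v_query TEXT;
  -- compose the dynamic SQL in a local variable; no session-level @variables are involved
  SET v_query = CONCAT('SELECT user, host FROM mysql.user ORDER BY ', p_order_col);
  EXECUTE IMMEDIATE v_query;
END //
DELIMITER ;

You would then call it, for example, with CALL list_users('host');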

Enjoy!

 


by Federico at April 10, 2017 12:58 PM

Jean-Jerome Schmidt

Updated - How to Bootstrap MySQL or MariaDB Galera Cluster

Unlike standard MySQL server and MySQL Cluster, the way to start a MySQL/MariaDB Galera Cluster is a bit different. Galera requires you to start a node in a cluster as a reference point, before the remaining nodes are able to join and form the cluster. This process is known as cluster bootstrap. Bootstrapping is an initial step to introduce a database node as primary component, before others see it as a reference point to sync up data.

How does it work?

When Galera starts with the bootstrap command on a node, that particular node will reach Primary state (check the value of wsrep_cluster_status). The remaining nodes will just require a normal start command and they will automatically look for existing Primary Component (PC) in the cluster and join to form a cluster. Data synchronization then happens through either incremental state transfer (IST) or snapshot state transfer (SST) between the joiner and the donor.

So basically, you should only bootstrap the cluster if you want to start a new cluster or when no other node in the cluster is in PRIMARY state. Care should be taken when choosing the action to take, or else you might end up with split clusters or loss of data.

The following example scenarios illustrate when to bootstrap a three-node cluster based on node state (wsrep_local_state_comment) and cluster state (wsrep_cluster_status):

Galera State Bootstrap Flow
  1. Restart the INITIALIZED node.
  1. Restart the INITIALIZED node.
  2. Once done, start the new node.
  1. Bootstrap the most advanced node using “pc.bootstrap=1”.
  2. Restart the remaining nodes, one node at a time.
  1. Start the new node.
  1. Start the new node, one node at a time.
  1. Bootstrap any node.
  2. Start the remaining nodes, one node at a time.

How to start Galera cluster?

The 3 Galera vendors use different bootstrapping commands (based on the software’s latest version). On the first node, run:

  • MySQL Galera Cluster (Codership):

    $ service mysql bootstrap # sysvinit
    $ galera_new_cluster # systemd
    $ mysqld_safe --wsrep-new-cluster # command line
  • Percona XtraDB Cluster (Percona):

    $ service mysql bootstrap-pxc # sysvinit
    $ systemctl start mysql@bootstrap.service # systemd
  • MariaDB Galera Cluster (MariaDB):

    $ service mysql bootstrap # sysvinit
    $ service mysql start --wsrep-new-cluster # sysvinit
    $ galera_new_cluster # systemd
    $ mysqld_safe --wsrep-new-cluster # command line

The above command is just a wrapper; what it actually does is start the MySQL instance on that node with an empty gcomm:// as the wsrep_cluster_address variable. You can also manually define the variables inside my.cnf and run the standard start/restart command. However, do not forget to change wsrep_cluster_address back again so that it contains the addresses of all nodes after the start, as in the sketch below.
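
A minimal sketch of the relevant my.cnf lines on every node once the cluster is running (the node addresses are made up):

[mysqld]
wsrep_cluster_address=gcomm://192.168.55.111,192.168.55.112,192.168.55.113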

When the first node is live, run the following command on the subsequent nodes:

$ service mysql start
$ systemctl start mysql

The new node connects to the cluster members as defined by the wsrep_cluster_address parameter. It will now automatically retrieve the cluster map and connect to the rest of the nodes and form a cluster.

Warning: Never bootstrap when you want to reconnect a node to an existing cluster, and NEVER run bootstrap on more than one node.

Safe-to-Bootstrap Flag

Galera, starting with version 3.19, comes with a new flag called “safe_to_bootstrap” inside grastate.dat. This flag facilitates the decision and prevents unsafe choices by keeping track of the order in which nodes are shut down. The node that was shut down last will be marked as “Safe-to-Bootstrap”. All the other nodes will be marked as unsafe to bootstrap from.

Looking at the content of grastate.dat (by default under the MySQL datadir), you should notice the flag on the last line:

# GALERA saved state
version: 2.1
uuid:    8bcf4a34-aedb-14e5-bcc3-d3e36277729f
seqno:   2575
safe_to_bootstrap: 0

When bootstrapping the new cluster, Galera will refuse to start the first node that was marked as unsafe to bootstrap from. You will see the following message in the logs:

“It may not be safe to bootstrap the cluster from this node. It was not the last one to leave the cluster and may not contain all the updates.

To force cluster bootstrap with this node, edit the grastate.dat file manually and set safe_to_bootstrap to 1 .”

In case of unclean shutdown or hard crash, all nodes will have “safe_to_bootstrap: 0”, so we have to consult the InnoDB storage engine to determine which node has committed the last transaction in the cluster. This can be achieved by starting mysqld with the “--wsrep-recover” variable on each of the nodes, which produces an output like this:

$ mysqld --wsrep-recover
...
2016-11-18 01:42:15 36311 [Note] InnoDB: Database was not shutdown normally!
2016-11-18 01:42:15 36311 [Note] InnoDB: Starting crash recovery.
...
2016-11-18 01:42:16 36311 [Note] WSREP: Recovered position: 8bcf4a34-aedb-14e5-bcc3-d3e36277729f:114428
...

The number after the UUID string on the "Recovered position" line is the one to look for. Pick the node that has the highest number and edit its grastate.dat to set “safe_to_bootstrap: 1”, as shown in the example below:

# GALERA saved state
version: 2.1
uuid:    8bcf4a34-aedb-14e5-bcc3-d3e36277729f
seqno:   -1
safe_to_bootstrap: 1

You can then perform the standard bootstrap command on the chosen node.

What if the nodes have diverged?

In certain circumstances, nodes can diverge from each other. The state of all nodes might turn into Non-Primary due to a network split between nodes, a cluster crash, or if Galera hits an exception when determining the Primary Component. You will then need to select a node and promote it to be a Primary Component.

To determine which node needs to be bootstrapped, compare the wsrep_last_committed value on all DB nodes:

node1> SHOW STATUS LIKE 'wsrep_%';
+----------------------+-------------+
| Variable_name        | Value       |
+----------------------+-------------+
| wsrep_last_committed | 10032       |
...
| wsrep_cluster_status | non-Primary |
+----------------------+-------------+
node2> SHOW STATUS LIKE 'wsrep_%';
+----------------------+-------------+
| Variable_name        | Value       |
+----------------------+-------------+
| wsrep_last_committed | 10348       |
...
| wsrep_cluster_status | non-Primary |
+----------------------+-------------+
node3> SHOW STATUS LIKE 'wsrep_%';
+----------------------+-------------+
| Variable_name        | Value       |
+----------------------+-------------+
| wsrep_last_committed |   997       |
...
| wsrep_cluster_status | non-Primary |
+----------------------+-------------+

From above outputs, node2 has the most up-to-date data. In this case, all Galera nodes are already started, so you don’t necessarily need to bootstrap the cluster again. We just need to promote node2 to be a Primary Component:

node2> SET GLOBAL wsrep_provider_options="pc.bootstrap=1";

The remaining nodes will then reconnect to the Primary Component (node2) and resync their data based on this node.


If you are using ClusterControl (try it for free), you can determine the wsrep_last_committed and wsrep_cluster_status directly from the ClusterControl > Overview page:

Or from ClusterControl > Performance > DB Status page:

by ashraf at April 10, 2017 12:00 PM

April 09, 2017

Valeriy Kravchuk

Accessing Oracle tables via MariaDB CONNECT engine and ODBC

In my previous working environment it was typical to consider MariaDB Server as just a set of "random" storage engines mixed together, sometimes for unclear reasons and with questionable extra value. But even at that time I was already impressed by some features of one of the engines supported only by MariaDB, CONNECT. The PIVOT table type specifically was of interest in a couple of customer cases I had to work on, but I quickly found out that the engine does not work with Oracle's MySQL or Percona Server, and thus I had to forget about it for some time.

Recently, while working at MariaDB, I got several more chances to deal with some new CONNECT use cases (and some related problems, which ended up being mostly minor or caused by ignorance). This engine can be (and is, widely) used to access tables from virtually any other database management system or data source, as long as there is an ODBC or JDBC driver for it. I had to set up a testing environment to reproduce some of the problems noted while accessing remote tables in Oracle RDBMS, and eventually decided to document the setup steps, mistakes and findings for myself and my readers.

I think the topic may require a series of posts, and in this first one I plan to concentrate on creating a basic testing environment to access Oracle RDBMS via the ODBC driver and CONNECT engine on my main testing box running Fedora 25, and on resolving the simplest access issues, like those I described in MDEV-12355 (which ended up as not a bug, thanks to quick help and clarifications from the engine's creator, Olivier Bertrand).

For pure testing purposes I had not planned to spend time installing and setting up Oracle RDBMS on that Fedora 25 box directly (later I tried and failed), but decided to rely on some existing Docker image. So, I started with installing Docker and dependencies (see this article for some hints, if needed):
[openxs@fc23 ~]$ sudo dnf install docker
...
Installed:
  container-selinux.noarch 2:2.10-1.fc25
  docker.x86_64 2:1.12.6-6.gitae7d637.fc25
  docker-common.x86_64 2:1.12.6-6.gitae7d637.fc25
  oci-register-machine.x86_64 0-2.7.gitbb20b00.fc25
  oci-systemd-hook.x86_64 1:0.1.6-1.gitfe22236.fc25
  skopeo-containers.x86_64 0.1.17-1.dev.git2b3af4a.fc25

Complete!

[openxs@fc23 ~]$ sudo systemctl start docker
I am going to skip details on images that did not work well (not a topic for this blog). Eventually I ended up working with https://hub.docker.com/r/sath89/oracle-12c/:
[openxs@fc23 server]$ sudo docker run -d -p 8080:8080 -p 1521:1521 -e ORACLE_ALLOW_REMOTE=true sath89/oracle-12c:latest
4515bc4a6fb4804ac8d714115de12e47a9e8f9baabb2300cde672aedb4843c2b
[openxs@fc23 server]$ sudo docker ps
CONTAINER ID        IMAGE                      COMMAND             CREATED             STATUS              PORTS                                            NAMES
4515bc4a6fb4        sath89/oracle-12c:latest   "/entrypoint.sh "   14 seconds ago      Up 8 seconds        0.0.0.0:1521->1521/tcp, 0.0.0.0:8080->8080/tcp   adoring_turing

[openxs@fc23 server]$ sudo docker logs -f 4515bc4a6fb4
ls: cannot access /u01/app/oracle/oradata: No such file or directory
Database not initialized. Initializing database.
Starting tnslsnr
Copying database files
1% complete
3% complete
11% complete
18% complete
26% complete
37% complete
Creating and starting Oracle instance
40% complete
45% complete
50% complete
55% complete
56% complete
60% complete
62% complete
Completing Database Creation
66% complete
70% complete
73% complete
85% complete
96% complete
100% complete
Look at the log file "/u01/app/oracle/cfgtoollogs/dbca/xe/xe.log" for further details.
Configuring Apex console
Database initialized. Please visit http://#containeer:8080/em http://#containeer:8080/apex for extra configuration if needed
Starting web management console
PL/SQL procedure successfully completed.
Starting import from '/docker-entrypoint-initdb.d':
found file /docker-entrypoint-initdb.d//docker-entrypoint-initdb.d/*
[IMPORT] /entrypoint.sh: ignoring /docker-entrypoint-initdb.d/*
Import finished
Database ready to use. Enjoy! ;)
^C
This image initializes a new database by default (unless you mount a volume from the host; then the database is initialized only the first time, if it is not there already), so you have to wait (maybe for many minutes) and check the logs before trying to access the database. Note that the container's port 1521 is mapped to the host's port 1521 (and my Fedora 25 IP address, 192.168.1.85, is then used later below to access the "remote" Oracle server on this port).

So, my "Oracle server" is ready (probably). Now, my Fedora 25 host will work as a client, so I had downloaded and installed basic Oracle client software RPMs (including ODBC driver and SQL*Plus) from OTN:
[openxs@fc23 maria10.1]$ rpm -q -a | grep oracle
oracle-instantclient11.2-sqlplus-11.2.0.4.0-1.x86_64
oracle-xe-11.2.0-1.0.x86_64
oracle-instantclient11.2-basic-11.2.0.4.0-1.x86_64
oracle-instantclient11.2-odbc-11.2.0.4.0-1.x86_64
I also installed unixODBC (including -devel, as I built MariaDB from source and one needs unixODBC-devel to have the CONNECT engine built properly), as the fine manual states at the beginning:
[openxs@fc23 maria10.1]$ rpm -q -a | grep unixODBC
unixODBC-2.3.4-3.fc25.x86_64
unixODBC-devel-2.3.4-3.fc25.x86_64
One has to set up tnsnames.ora for the Oracle client to work with the (remote) instance, and /etc/odbcinst.ini + /etc/odbc.ini to inform unixODBC about the known drivers and system data sources (this discussion had some useful hints):
[openxs@fc23 maria10.1]$ cat /etc/oracle/tnsnames.ora
XE =
  (DESCRIPTION =
    (ADDRESS_LIST =
      (ADDRESS =
        (PROTOCOL = TCP)
        (Host = 192.168.1.85)
        (Port = 1521)
      )
    )
    (CONNECT_DATA = (SID = XE))
  )

[openxs@fc23 maria10.1]$ cat /etc/odbcinst.ini
...

[OracleODBC]
Description = Oracle ODBC driver for Oracle 11g
Driver64 = /usr/lib/oracle/11.2/client64/lib/libsqora.so.11.1
FileUsage = 1
Driver Logging = 7

[openxs@fc23 maria10.1]$ cat /etc/odbc.ini
[oracle]
Driver = OracleODBC
DSN = OracleODBC
ServerName = XE
UserID = system
Password = oracle
Surely, I tried to access my Oracle server in the Docker container via the sqlplus command first:
[openxs@fc23 server]$ /usr/lib/oracle/11.2/client64/bin/sqlplus system/oracle@localhost:1521/xe.oracle.docker
/usr/lib/oracle/11.2/client64/bin/sqlplus: error while loading shared libraries: libsqlplus.so: cannot open shared object file: No such file or directory
[openxs@fc23 server]$ ldd /usr/lib/oracle/11.2/client64/bin/sqlplus
        linux-vdso.so.1 (0x00007ffe8813b000)
        libsqlplus.so => not found
        libclntsh.so.11.1 => not found
        libnnz11.so => not found
        libdl.so.2 => /lib64/libdl.so.2 (0x00007f0aadc93000)
        libm.so.6 => /lib64/libm.so.6 (0x00007f0aad98a000)
        libpthread.so.0 => /lib64/libpthread.so.0 (0x00007f0aad76a000)
        libnsl.so.1 => /lib64/libnsl.so.1 (0x00007f0aad551000)
        libc.so.6 => /lib64/libc.so.6 (0x00007f0aad18b000)
        /lib64/ld-linux-x86-64.so.2 (0x0000563b14bfb000)
This is what happens when one does not care to read all the details and forgets to set some environment variables and inform the dynamic loader about the libraries' location. So, I did the following to resolve the problem:
[openxs@fc23 server]$ export ORACLE_HOME=/usr/lib/oracle/11.2/client64
[openxs@fc23 server]$ export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$ORACLE_HOME/lib
[openxs@fc23 server]$ export PATH=$PATH:$ORACLE_HOME/bin
[openxs@fc23 server]$ sudo ldconfig
[openxs@fc23 server]$ ldd /usr/lib/oracle/11.2/client64/bin/sqlplus
        linux-vdso.so.1 (0x00007ffd31b50000)
        libsqlplus.so => /usr/lib/oracle/11.2/client64/lib/libsqlplus.so (0x00007f88f680f000)
        libclntsh.so.11.1 => /usr/lib/oracle/11.2/client64/lib/libclntsh.so.11.1 (0x00007f88f3ea0000)
        libnnz11.so => /usr/lib/oracle/11.2/client64/lib/libnnz11.so (0x00007f88f3ad3000)
        libdl.so.2 => /lib64/libdl.so.2 (0x00007f88f38b2000)
        libm.so.6 => /lib64/libm.so.6 (0x00007f88f35a9000)
        libpthread.so.0 => /lib64/libpthread.so.0 (0x00007f88f3389000)
        libnsl.so.1 => /lib64/libnsl.so.1 (0x00007f88f3170000)
        libc.so.6 => /lib64/libc.so.6 (0x00007f88f2daa000)
        libaio.so.1 => /lib64/libaio.so.1 (0x00007f88f2ba8000)
        /lib64/ld-linux-x86-64.so.2 (0x000056123639a000)
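
Note that the exports above only apply to the current shell session. To make the Instant Client libraries permanently visible to the dynamic loader (so that a plain sudo ldconfig is actually useful), one option is an ld.so.conf.d entry; this is just a sketch using the same client path as above, and the file name is my own choice:

# a sketch: register the Instant Client library directory system-wide, then refresh the loader cache
[openxs@fc23 server]$ echo /usr/lib/oracle/11.2/client64/lib | sudo tee /etc/ld.so.conf.d/oracle-instantclient.conf
[openxs@fc23 server]$ sudo ldconfig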

[openxs@fc23 server]$ sqlplus system/oracle@localhost:1521/xe.oracle.docker    
SQL*Plus: Release 11.2.0.4.0 Production on Fri Mar 24 11:00:21 2017

Copyright (c) 1982, 2013, Oracle.  All rights reserved.


Connected to:
Oracle Database 12c Standard Edition Release 12.1.0.2.0 - 64bit Production

SQL> select count(*) from scott.emp;

  COUNT(*)
----------
        14

SQL> create table scott.t1(c1 number(30));

Table created.

SQL> insert into scott.t1 values (12345);

1 row created.

SQL> desc scott.t1;
 Name                                      Null?    Type
 ----------------------------------------- -------- ----------------------------
 C1                                                 NUMBER(30)

SQL>

SQL> exit
Disconnected from Oracle Database 12c Standard Edition Release 12.1.0.2.0 - 64bit Production
So, the SQL*Plus client (that I still miss sometimes, after using it maybe only 10 times over the last 12 years) works. Let's try to access Oracle via ODBC and isql now:
[openxs@fc23 server]$ isql -v oracle
[08004][unixODBC][Oracle][ODBC][Ora]ORA-12154: TNS:could not resolve the connect identifier specified

[ISQL]ERROR: Could not SQLConnect
Surely, one more environment variable is not set, the one pointing out the location of the tnsnames.ora file:
[openxs@fc23 server]$ export TNS_ADMIN=/etc/oracle
[openxs@fc23 server]$ isql -v oracle
+---------------------------------------+
| Connected!                            |
|                                       |
| sql-statement                         |
| help [tablename]                      |
| quit                                  |
|                                       |
+---------------------------------------+
SQL> select * from scott.t1;
+---------------------------------+
| C1                              |
+---------------------------------+
| 12345                           |
+---------------------------------+
SQLRowCount returns -1
1 rows fetched
SQL> select count(*) from scott.emp;
+-----------------------------------------+
| COUNT(*)                                |
+-----------------------------------------+
| 14                                      |
+-----------------------------------------+
SQLRowCount returns -1
1 rows fetched
SQL> quit
[openxs@fc23 server]$

It looks like now we should be able to use the ODBC table type of MariaDB's CONNECT storage engine (which was already loaded with INSTALL SONAME 'ha_connect.so'; check the details here) to access a remote Oracle table:
MariaDB [test]> create table oracle_t1 engine=connect table_type=ODBC tabname='t1' dbschema='scott' connection='dsn=oracle';
ERROR 1105 (HY000): Cannot get columns from t1
MariaDB [test]> show warnings\G
*************************** 1. row ***************************
  Level: Error
   Code: 1105
Message: Cannot get columns from t1
*************************** 2. row ***************************
  Level: Error
   Code: 1030
Message: Got error 122 "Internal (unspecified) error in handler" from storage engine CONNECT
2 rows in set (0.00 sec)
Do not be like me and do not report bugs when you see these messages! Go read the fine manual that explicitly mentions Oracle as being case sensitive and converting identifiers to upper case unless they are quoted.

As soon as you use proper case in tabname (and dbname) settings, everything works as expected, at least for my primitive use case:
MariaDB [test]> create table t1_oracle engine=connect table_type=ODBC tabname='t1' dbname='scott' connection='dsn=oracle';
ERROR 1105 (HY000): Cannot get columns from t1
MariaDB [test]> create table t1_oracle engine=connect table_type=ODBC tabname='T1' dbname='scott' connection='dsn=oracle';
ERROR 1105 (HY000): Cannot get columns from T1
MariaDB [test]> create table t1_oracle engine=connect table_type=ODBC tabname='T1' dbname='SCOTT' connection='dsn=oracle';
Query OK, 0 rows affected (1.29 sec)

MariaDB [test]> select * from t1_oracle;
+-------+
| C1    |
+-------+
| 12345 |
+-------+
1 row in set (0.07 sec)

MariaDB [test]> create table t2_oracle engine=connect table_type=ODBC tabname='SCOTT.T1' connection='dsn=oracle';
Query OK, 0 rows affected (3.54 sec)

MariaDB [test]> select * from t2_oracle;
+-------+
| C1    |
+-------+
| 12345 |
+-------+
1 row in set (0.06 sec)
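
If automatic column discovery still fails for some reason, CONNECT also lets you declare the columns yourself instead of relying on discovery. A minimal sketch for the same table follows; the table name t3_oracle and the MariaDB-side column type are my own choices here (DECIMAL(30) to roughly match Oracle's NUMBER(30)):

MariaDB [test]> create table t3_oracle (c1 decimal(30)) engine=connect table_type=ODBC tabname='SCOTT.T1' connection='dsn=oracle';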
To summarize, with some initial setup effort (that should include reading the official CONNECT manual) it is easy to access Oracle tables from MariaDB server via an ODBC driver. It may be very useful if you plan to migrate or use some data from Oracle RDBMS.

As a side note, Docker really rocks sometimes. I spent maybe 15+ years installing Oracle RDBMS (starting from version 7.0.16 or so) on all kinds of OSes dozens of times, and I do not miss this experience and do not mind having it up and running with one simple command any time (rarely) I need it.

by Valeriy Kravchuk (noreply@blogger.com) at April 09, 2017 04:11 PM

April 08, 2017

MariaDB AB

Meet eperi at M|17


By Elmar Eperiesi-Beck, CEO, eperi GmbH

On behalf of the eperi team, we are very much looking forward to being part of MariaDB’s first annual M|17 user conference. We see this as a perfect opportunity to introduce our Cloud Data Protection (CDP) solution, the eperi Gateway, to our peers within the MariaDB community.

A little background: our eperi Gateway offers data encryption for sensitive data in SaaS, IaaS, databases (like MariaDB), and files – all without affecting end users or changing existing IT environments. We built the eperi Gateway to ensure that those attempting unauthorized access to these applications will receive only unreadable data, while authorized employees enjoy a seamless experience.

Our longtime partnership with MariaDB is rooted in a mutual appreciation for the power of open source solutions – it’s the natural strengths of open source that enable our CDP solutions to be transparent, reliable, and trustworthy. We believe in achieving security through transparency, and publish the source code for our product’s security framework publicly. We’ve made it reviewable, by anyone, via our website.

We’re very proud to say that eperi has cooperated closely with MariaDB for several years now. This fruitful partnership has featured eperi’s deep contributions to the MariaDB encryption project, and to our development of the eperi data encryption plugin for MariaDB. Working with MariaDB, we’ve made tremendous steps toward securing MariaDB users and setting an example for data processing applications. With GDPR and several other strict data protection regulations on the horizon, enterprises need transparent data security more than ever before. We’re excited to continue our work with MariaDB, utilizing our open source approach to provide the transparency users prefer when they encrypt their data.

Attendees of M|17 interested in securing data in cloud applications should meet us at our kiosk at the conference to experience a live demonstration of the eperi Gateway

To learn more about eperi at M|17, please visit us here.

 


by guest at April 08, 2017 03:15 PM

Colin Charles

Speaking in April 2017

It's been a while since I've blogged (will have to catch up soon), but here are a few appearances:

  • How we use MySQL today – April 10 2017 – New York MySQL meetup. I am almost certain this will be very interesting with the diversity of speakers and topics.
  • Percona Live 2017 – April 24-27 2017 – Santa Clara, California. This is going to be huge, as it's expanded beyond just MySQL to include MongoDB, PostgreSQL, and other open source databases. It might even be the conference with the largest time series track out there. Use code COLIN30 for the best discount at registration.

I will also be in attendance at the MariaDB Developer’s (Un)Conference, and M|17 that follows.

by Colin Charles at April 08, 2017 01:55 PM

MariaDB Foundation

MariaDB.org page views increased by 77%

It is over a year since the overhauled MariaDB.org was published. We immediately got a lot of positive feedback, and now, looking at the statistics a year later, we are glad to report that the number of visits has grown by 28% and the number of page views has grown a whopping 77% from 1.1 million […]

The post MariaDB.org page views increased by 77% appeared first on MariaDB.org.

by Otto Kekäläinen at April 08, 2017 11:10 AM

April 07, 2017

Peter Zaitsev

Non-Deterministic Order for SELECT with LIMIT

Non-Deterministic Order

In this blog, we'll look at how queries in systems with parallel processing can return rows in a non-deterministic order (and how to fix it).

Short story:

Do not rely on the order of your rows if your query does not use ORDER BY. Even with ORDER BY, rows with the same values can be sorted differently. To fix this issue, always add ORDER BY ... ID when you have LIMIT N.

Long story:

While playing with MariaDB ColumnStore and Yandex ClickHouse, I came across a very simple case: in both of them, the simple query (which I used for testing) select * from <table> where ... limit 10 returns results in a non-deterministic order.

This is totally expected. SELECT * from <table> WHERE ... LIMIT 10 means "give me any ten rows, and as there is no order they can be anything that matches the WHERE condition." What we used to get in vanilla MySQL + InnoDB, however, is different: SELECT * from <table> WHERE ... LIMIT 10 gives us the rows sorted by primary key. Even with MyISAM in MySQL, if the data doesn't change, the results are repeatable:

mysql> select * from City where CountryCode = 'USA' limit 10;
+------+--------------+-------------+--------------+------------+
| ID   | Name         | CountryCode | District     | Population |
+------+--------------+-------------+--------------+------------+
| 3793 | New York     | USA         | New York     |    8008278 |
| 3794 | Los Angeles  | USA         | California   |    3694820 |
| 3795 | Chicago      | USA         | Illinois     |    2896016 |
| 3796 | Houston      | USA         | Texas        |    1953631 |
| 3797 | Philadelphia | USA         | Pennsylvania |    1517550 |
| 3798 | Phoenix      | USA         | Arizona      |    1321045 |
| 3799 | San Diego    | USA         | California   |    1223400 |
| 3800 | Dallas       | USA         | Texas        |    1188580 |
| 3801 | San Antonio  | USA         | Texas        |    1144646 |
| 3802 | Detroit      | USA         | Michigan     |     951270 |
+------+--------------+-------------+--------------+------------+
10 rows in set (0.01 sec)
mysql> select * from City where CountryCode = 'USA' limit 10;
+------+--------------+-------------+--------------+------------+
| ID   | Name         | CountryCode | District     | Population |
+------+--------------+-------------+--------------+------------+
| 3793 | New York     | USA         | New York     |    8008278 |
| 3794 | Los Angeles  | USA         | California   |    3694820 |
| 3795 | Chicago      | USA         | Illinois     |    2896016 |
| 3796 | Houston      | USA         | Texas        |    1953631 |
| 3797 | Philadelphia | USA         | Pennsylvania |    1517550 |
| 3798 | Phoenix      | USA         | Arizona      |    1321045 |
| 3799 | San Diego    | USA         | California   |    1223400 |
| 3800 | Dallas       | USA         | Texas        |    1188580 |
| 3801 | San Antonio  | USA         | Texas        |    1144646 |
| 3802 | Detroit      | USA         | Michigan     |     951270 |
+------+--------------+-------------+--------------+------------+
10 rows in set (0.00 sec)

The results are ordered by ID here. In most cases, when the data doesn’t change and the query is the same, the order of results will be deterministic: open the file, read ten lines from the beginning, close the file. (When using indexes it can be different if different indexes are selected. For the same query, the database will probably select the same index if the data is static.)

But this is still not guaranteed. Here’s why: imagine we now introduce parallelism, split our table into ten pieces and run ten threads. Each will work on its own piece. Then, unless we specifically wait on each thread to finish and order the results, it will give us a random order of results. Let’s simulate this in a bash script:

for y in {2000..2010}
do
  sql="select YearD, count(*), sum(ArrDelayMinutes) from ontime where yeard=$y and carrier='DL' limit 1"
  mysql -Nb ontime -e "$sql" &
done
wait

The script's purpose is to perform the aggregation faster by taking advantage of multiple CPU cores on the server in parallel. It opens one connection to MySQL per year (eleven in total for 2000-2010) and returns results as they arrive:

$ ./parallel_test.sh
2009    428007  5003632
2007    475889  5915443
2008    451931  5839658
2006    506086  6219275
2003    660617  5917398
2004    687638  8384465
2002    728758  7381821
2005    658302  8143431
2010    732973  9169167
2001    835236  8339276
2000    908029  11105058
$ ./parallel_test.sh
2009    428007  5003632
2008    451931  5839658
2007    475889  5915443
2006    506086  6219275
2005    658302  8143431
2003    660617  5917398
2004    687638  8384465
2002    728758  7381821
2010    732973  9169167
2001    835236  8339276
2000    908029  11105058

In this case, the faster queries arrive first and are on top, with the slower on the bottom. If the network was involved (think about different nodes in a cluster connected via a network), then the response time from each node can be much more random due to non-deterministic network latency.

In the case of MariaDB ColumnStore or Yandex ClickHouse, where scans are performed in parallel, the order of the results can also be non-deterministic. An example for ClickHouse:

:) select * from wikistat where project = 'en' limit 1;
SELECT *
FROM wikistat
WHERE project = 'en'
LIMIT 1
┌───────date─┬────────────────time─┬─project─┬─subproject─┬─path─────┬─hits─┬──size─┐
│ 2008-07-11 │ 2008-07-11 14:00:00 │ en      │            │ Retainer │   14 │ 96857 │
└────────────┴─────────────────────┴─────────┴────────────┴──────────┴──────┴───────┘
1 rows in set. Elapsed: 0.031 sec. Processed 2.03 million rows, 41.40 MB (65.44 million rows/s., 1.33 GB/s.)
:) select * from wikistat where project = 'en' limit 1;
SELECT *
FROM wikistat
WHERE project = 'en'
LIMIT 1
┌───────date─┬────────────────time─┬─project─┬─subproject─┬─path─────────┬─hits─┬───size─┐
│ 2008-12-15 │ 2008-12-15 14:00:00 │ en      │            │ Graeme_Obree │   18 │ 354504 │
└────────────┴─────────────────────┴─────────┴────────────┴──────────────┴──────┴────────┘
1 rows in set. Elapsed: 0.023 sec. Processed 1.90 million rows, 68.19 MB (84.22 million rows/s., 3.02 GB/s.)

An example for ColumnStore:

MariaDB [wikistat]> select * from wikistat limit 1
date: 2008-01-18
time: 2008-01-18 06:00:00
project: en
subproject: NULL
path: Doctor_Who:_Original_Television_Soundtrack
hits: 2
size: 2
1 row in set (1.63 sec)
MariaDB [wikistat]> select * from wikistat limit 1
date: 2008-01-31
time: 2008-01-31 10:00:00
project: de
subproject: NULL
path: Haramaki
hits: 1
size: 1
1 row in set (1.58 sec)

In another case (bug#72076) we use ORDER BY, but the rows being sorted are the same. MySQL 5.7 contains the "ORDER BY" + LIMIT optimization:

If multiple rows have identical values in the ORDER BY columns, the
server is free to return those rows in any order, and may do so
differently depending on the overall execution plan. In other words,
the sort order of those rows is nondeterministic with respect to
the nonordered columns.
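
To illustrate with the same City table used above (a sketch; District is not a unique column, so rows sharing a District value may legitimately come back in either order):

-- rows that share a District value may come back in any order
select ID, Name, District from City where CountryCode = 'USA' order by District limit 10;
-- adding the primary key as a tiebreaker makes the result deterministic
select ID, Name, District from City where CountryCode = 'USA' order by District, ID limit 10;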

Conclusion

In systems that involve parallel processing, queries like select * from table where ... limit N can return rows in a random order (even if the data doesn't change between the calls). This is due to the async nature of the parallel calls: whoever serves results faster wins. In MySQL, you can run select * from table limit 1 three times and get the same data in the same order (especially if the table data doesn't change), but the response time will be slightly different. In a massively parallel system, the difference in the response times can cause the rows to be ordered differently.

To fix: always add ORDER BY ... ID when you have LIMIT N.
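
For the ClickHouse and ColumnStore examples above, the wikistat table has no surrogate ID column, so a practical equivalent is to order by a combination of columns that identifies a row well enough; assuming here that date, time and path together are selective enough:

select * from wikistat where project = 'en' order by date, time, path limit 1;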

by Alexander Rubin at April 07, 2017 07:26 PM

Percona Server for MongoDB 3.4.3-1.3 is Now Available

Percona Server for MongoDB

Percona announces the release of Percona Server for MongoDB 3.4.3-1.3 on April 6, 2017. Download the latest version from the Percona web site or the Percona Software Repositories.

Percona Server for MongoDB 3.4.3-1.3

Percona Server for MongoDB 3.4.3-1.3 is an enhanced, open source, fully compatible, highly-scalable, zero-maintenance downtime database supporting the MongoDB v3.4 protocol and drivers. It extends MongoDB with the Percona Memory Engine and MongoRocks storage engines, as well as several enterprise-grade features.

Percona Server for MongoDB requires no changes to MongoDB applications or code.

This release is based on MongoDB 3.4.3 and includes the following additional changes:

  • #PSMDB-123: Fixed Hot Backup to create proper subdirectories in the destination directory.
  • #PSMDB-126: Added index and collection names to duplicate key error message.

Percona Server for MongoDB 3.4.3-1.3 release notes are available in the official documentation.

by Alexey Zhebel at April 07, 2017 04:57 PM

April 06, 2017

Peter Zaitsev

Percona Live Featured Session with Ilya Kosmodemiansky: Linux IO internals for Database Administrators

Percona Live Featured Session

Welcome to another post in the series of Percona Live featured session blogs! In these blogs, we'll highlight some of the session speakers that will be at this year's Percona Live conference. We'll also discuss how these sessions can help you improve your database environment. Make sure to read to the end to get a special Percona Live 2017 registration bonus!

In this Percona Live featured session, we'll meet Ilya Kosmodemiansky, CEO and Consultant of Data Egret. His session is Linux IO Internals for Database Administrators. Input/output performance problems have been an everyday agenda item for DBAs since the beginning of databases. Data volume grows rapidly, and you need to get your data quickly from the disk – and more importantly to the disk. This talk covers how Linux IO works, how database pages travel from the disk level to the database shared memory and back, and what mechanisms exist to help control the exchange.

NOTE: As of this interview, Ilya’s company has rebranded from PostgreSQL Consulting to Data Egret. You can read about this change here.

I had a chance to speak with Ilya about Linux IO Internals for Database Administrators:

Percona: How did you get into database technology? What do you love about it?

Ilya: I am actually a biologist by training, so my first programming experience was in basic bioinformatics rather than general programming. That was a long time ago. Then I started trying myself in pure informatics and found it really exciting. I then worked as an Oracle, DB2 and Postgres DBA. Today, my main focus is Postgres.

Dealing with open source technology is very different, and I enjoy the sense of community it brings.

Databases are the bread and butter of any business, and are crucial to its success. I find it exciting to be able to support the different types of businesses we are working with, and really enjoy solving problems and troubleshooting complex issues. This is what makes my day-to-day really enjoyable and keeps me on my toes.

Percona: Your talk is called Linux IO internals for Database Administrators. Why are Linux IO internals important for DBAs?

Ilya: Databases are really a part of a larger ecosystem; they heavily rely on the operating system's internal mechanisms, hardware, etc. To be an expert DBA and have full control over your database, you should have a deeper understanding of how this system works. This knowledge also helps you to tackle different situations and avoid problems where possible.

After you reach a certain level of database optimization, you need to scrutinize your system on a deeper level. This is the only way you can ensure its optimal function.

Percona: What value does understanding how the IO internals work add to their ability to do their jobs?

Ilya: I would say it’s similar to driving vs. troubleshooting a car. To drive a car, you only need to know how you change gears, adjust the mirrors, add fuel and have a good driving technique. But if your car breaks down you need to understand what happens under the hood, and how its different components work on a deeper level. It not only makes you a better and more confident driver, but it also helps you get the best out of your car.

The same thing is true with databases. If you know how they really work, you will be able to better optimize their performance. As a DBA, you are in a sense an F1 mechanic. You need to know how they work to be able to do a good job, efficiently and fast.

Percona: What do you want attendees to take away from your session? Why should they attend?

Ilya: I would like my audience to really start to see the bigger picture, while at the same time not forgetting the importance of detail. It's always easy to rely on a checklist, and an average DBA always looks for one. My talk is going to disappoint them: there is no ultimate checklist that finds all the possible failures and fixes. You really need to have a deeper understanding of database internals. Only that knowledge allows you to quickly make the right decision in critical situations. Having this knowledge will also allow you to be the judge of what to optimize and what to improve, so that you can get the most out of your system.

Percona: What are you most looking forward to at Percona Live 2017?

Ilya: This is going to be my third conference, and it always attracts fantastic speakers and a great audience. I am looking forward to learning a lot about broader technologies I would normally have no chance to look at, making new friends and hanging out with old ones. Being part of the open source community is like having an extended family, and I find events such as Percona Live contribute to its strength by bringing together different communities.

For the first time, we will have a PostgreSQL community booth at Percona Live this year. I think it's a fantastic opportunity that will allow the two communities to get together and provide a fertile ground for new discussions, collaborations and mutual technology improvements.

For more information on Linux IO internals, or PostgreSQL in general, see Ilya and Data Egret’s various social handles:

Register for Percona Live Data Performance Conference 2017, and see Ilya present his session on Linux IO Internals for Database Administrators. Use the code FeaturedTalk and receive $100 off the current registration price!

Percona Live Data Performance Conference 2017 is the premier open source event for the data performance ecosystem. It is the place to be for the open source community, as well as businesses that thrive in the MySQL, NoSQL, cloud, big data and Internet of Things (IoT) marketplaces. Attendees include DBAs, sysadmins, developers, architects, CTOs, CEOs, and vendors from around the world.

The Percona Live Data Performance Conference will be April 24-27, 2017 at the Hyatt Regency Santa Clara & The Santa Clara Convention Center.

by Dave Avery at April 06, 2017 08:57 PM

Dealing with MySQL Error Code 1215: “Cannot add foreign key constraint”

MySQL error code 1215

In this blog, we'll look at how to resolve MySQL error code 1215: "Cannot add foreign key constraint".

Our Support customers often come to us with things like “My database deployment fails with error 1215”, “Am trying to create a foreign key and can’t get it working” or “Why am I unable to create a constraint?” To be honest, the error message doesn’t help much. You just get the following line:

ERROR 1215 (HY000): Cannot add foreign key constraint

But MySQL never tells you exactly WHY it failed. There’s actually a multitude of reasons this can happen. This blog post is a compendium of the most common reasons why you can get ERROR 1215, how to diagnose your case to find which one is affecting you and potential solutions for adding the foreign key.
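
Before walking the list, one quick check is often worth doing: right after the failure, InnoDB usually records a more detailed explanation in the LATEST FOREIGN KEY ERROR section of SHOW ENGINE INNODB STATUS (output abbreviated below), which can point you straight at the relevant item:

mysql> SHOW ENGINE INNODB STATUS\G
...
------------------------
LATEST FOREIGN KEY ERROR
------------------------
... (the detailed reason for the last failed foreign key operation is printed here) ...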

(Note: be careful when applying the proposed solutions, as many involve ALTERing the parent table, and that can block the table for a long time depending on your table size, MySQL version and the specific ALTER operation being applied; in many cases using pt-online-schema-change will likely be a good idea.)

So, onto the list:

1) The table or index the constraint refers to does not exist yet (usual when loading dumps).

How to diagnose: Run SHOW TABLES or SHOW CREATE TABLE for each of the parent tables. If you get error 1146 for any of them, it means the tables are being created in the wrong order.
How to fix: Run the missing CREATE TABLE and try again, or temporarily disable foreign key checks. This is especially needed during backup restores where circular references might exist. Simply run:

SET @OLD_FOREIGN_KEY_CHECKS=@@FOREIGN_KEY_CHECKS;
SET FOREIGN_KEY_CHECKS=0;
SOURCE /backups/mydump.sql; -- restore your backup within THIS session
SET FOREIGN_KEY_CHECKS=@OLD_FOREIGN_KEY_CHECKS;

Example:

mysql> CREATE TABLE child (
    ->   id INT(10) NOT NULL PRIMARY KEY,
    ->   parent_id INT(10),
    ->   FOREIGN KEY (parent_id) REFERENCES `parent`(`id`)
    -> ) ENGINE INNODB;
ERROR 1215 (HY000): Cannot add foreign key constraint
# We check for the parent table and it is not there.
mysql> SHOW TABLES LIKE 'par%';
Empty set (0.00 sec)
# We go ahead and create the parent table (we'll use the same parent table structure for all other examples in this blog post):
mysql> CREATE TABLE parent (
    ->   id INT(10) NOT NULL PRIMARY KEY,
    ->   column_1 INT(10) NOT NULL,
    ->   column_2 INT(10) NOT NULL,
    ->   column_3 INT(10) NOT NULL,
    ->   column_4 CHAR(10) CHARACTER SET utf8 COLLATE utf8_bin,
    ->   KEY column_2_column_3_idx (column_2, column_3),
    ->   KEY column_4_idx (column_4)
    -> ) ENGINE INNODB;
Query OK, 0 rows affected (0.00 sec)
# And now we re-attempt to create the child table
mysql> CREATE TABLE child (
    ->   id INT(10) NOT NULL PRIMARY KEY,
    ->   parent_id INT(10),
    ->   FOREIGN KEY (parent_id) REFERENCES `parent`(`id`)
    -> ) ENGINE INNODB;
Query OK, 0 rows affected (0.01 sec)

2) The table or index in the constraint references misuses quotes.

How to diagnose: Inspect each FOREIGN KEY declaration and make sure you either have no quotes around object qualifiers, or that you have quotes around the table and a SEPARATE pair of quotes around the column name.
How to fix: Either don’t quote anything, or quote the table and the column separately.
Example:

# wrong; single pair of backticks wraps both table and column
ALTER TABLE child  ADD FOREIGN KEY (parent_id) REFERENCES `parent(id)`;
# correct; one pair for each part
ALTER TABLE child ADD FOREIGN KEY (parent_id) REFERENCES `parent`(`id`);
# also correct; no backticks anywhere
ALTER TABLE child ADD FOREIGN KEY (parent_id) REFERENCES parent(id);
# also correct; backticks on either object (in case it’s a keyword)
ALTER TABLE child ADD FOREIGN KEY (parent_id) REFERENCES parent(`id`);

3) The local key, foreign table or column in the constraint references has a typo:

How to diagnose: Run SHOW TABLES and SHOW COLUMNS and compare strings with those in your REFERENCES declaration.
How to fix: Fix the typo once you find it.
Example:

# wrong; Parent table name is ‘parent’
ALTER TABLE child ADD FOREIGN KEY (parent_id) REFERENCES pariente(id);
# correct
ALTER TABLE child ADD FOREIGN KEY (parent_id) REFERENCES parent(id);

4) The column the constraint refers to is not of the same type or width as the foreign column:

How to diagnose: Use SHOW CREATE TABLE parent to check that the local column and the referenced column both have same data type and width.
How to fix: Edit your DDL statement such that the column definition in the child table matches that of the parent table.
Example:

# wrong; id column in parent is INT(10)
CREATE TABLE child (
  id INT(10) NOT NULL PRIMARY KEY,
  parent_id BIGINT(10) NOT NULL,
  FOREIGN KEY (parent_id) REFERENCES `parent`(`id`)
) ENGINE INNODB;
# correct; id column matches definition of parent table
CREATE TABLE child (
  id INT(10) NOT NULL PRIMARY KEY,
  parent_id INT(10) NOT NULL,
  FOREIGN KEY (parent_id) REFERENCES `parent`(`id`)
) ENGINE INNODB;

5) The foreign object is not a KEY of any kind

How to diagnose: Use SHOW CREATE TABLE parent to check whether the column that the REFERENCES part points to is indexed in any way (for this failure, it is not).
How to fix: Make the column a KEY, UNIQUE KEY or PRIMARY KEY on the parent.
Example:

# wrong; column_1 is not indexed in our example table
CREATE TABLE child (
  id INT(10) NOT NULL PRIMARY KEY,
  parent_column_1 INT(10),
  FOREIGN KEY (parent_column_1) REFERENCES `parent`(`column_1`)
) ENGINE INNODB;
# correct; we first add an index on the referenced column
ALTER TABLE parent ADD INDEX column_1_idx(column_1);
# and then re-attempt creation of the child table
CREATE TABLE child (
  id INT(10) NOT NULL PRIMARY KEY,
  parent_column_1 INT(10),
  FOREIGN KEY (parent_column_1) REFERENCES `parent`(`column_1`)
) ENGINE INNODB;

6) The foreign key is a multi-column PK or UK, where the referenced column is not the leftmost one

How to diagnose: Do a SHOW CREATE TABLE parent to check if the REFERENCES part points to a column that is present in some multi-column index(es), but is not the leftmost one in its definition.
How to fix: Add an index on the parent table where the referenced column is the leftmost (or only) column.
Example:

# wrong; column_3 only appears as the second part of an index on parent table
CREATE TABLE child (
  id INT(10) NOT NULL PRIMARY KEY,
  parent_column_3 INT(10),
  FOREIGN KEY (parent_column_3) REFERENCES `parent`(`column_3`)
) ENGINE INNODB;
# correct; create a new index for the referenced column
ALTER TABLE parent ADD INDEX column_3_idx (column_3);
# then re-attempt creation of child
CREATE TABLE child (
  id INT(10) NOT NULL PRIMARY KEY,
  parent_column_3 INT(10),
  FOREIGN KEY (parent_column_3) REFERENCES `parent`(`column_3`)
) ENGINE INNODB;

7) Different charsets/collations among the two table/columns

How to diagnose: Run SHOW CREATE TABLE parent and compare that the child column (and table) CHARACTER SET and COLLATE parts match those of the parent table.
How to fix: Modify the child table DDL so that it matches the character set and collation of the parent table/column (or ALTER the parent table to match the child's wanted definition).
Example:

# wrong; the parent table uses utf8/utf8_bin for charset/collation
CREATE TABLE child (
  id INT(10) NOT NULL PRIMARY KEY,
  parent_column_4 CHAR(10) CHARACTER SET utf8 COLLATE utf8_unicode_ci,
  FOREIGN KEY (parent_column_4) REFERENCES `parent`(`column_4`)
) ENGINE INNODB;
# correct; edited DDL so COLLATE matches parent definition
CREATE TABLE child (
  id INT(10) NOT NULL PRIMARY KEY,
  parent_column_4 CHAR(10) CHARACTER SET utf8 COLLATE utf8_bin,
  FOREIGN KEY (parent_column_4) REFERENCES `parent`(`column_4`)
) ENGINE INNODB;

8) The parent table is not using InnoDB

How to diagnose: Run SHOW CREATE TABLE parent and verify if ENGINE=INNODB or not.
How to fix: ALTER the parent table to change the engine to InnoDB.
Example:

# wrong; the parent table in this example is MyISAM:
CREATE TABLE parent (
  id INT(10) NOT NULL PRIMARY KEY
) ENGINE MyISAM;
# correct: we modify the parent’s engine
ALTER TABLE parent ENGINE=INNODB;

9) Using syntax shorthands to reference the foreign key

How to diagnose: Check if the REFERENCES part only mentions the table name. As explained by ex-colleague Bill Karwin in http://stackoverflow.com/questions/41045234/mysql-error-1215-cannot-add-foreign-key-constraint, MySQL doesn’t support this shortcut (even though this is valid SQL).
How to fix: Edit the child table DDL so that it specifies both the table and the column.
Example:

# wrong; only parent table name is specified in REFERENCES
CREATE TABLE child (
  id INT(10) NOT NULL PRIMARY KEY,
  column_2 INT(10) NOT NULL,
  FOREIGN KEY (column_2) REFERENCES parent
) ENGINE INNODB;
# correct; both the table and column are in the REFERENCES definition
CREATE TABLE child (
  id INT(10) NOT NULL PRIMARY KEY,
  column_2 INT(10) NOT NULL,
  FOREIGN KEY (column_2) REFERENCES parent(column_2)
) ENGINE INNODB;

10) The parent table is partitioned

How to diagnose: Run SHOW CREATE TABLE parent and find out if it’s partitioned or not.
How to fix: Removing the partitioning (i.e., merging all partitions back into a single table) is the only way to get it working.
Example:

# wrong: the parent table we see below is using PARTITIONs
CREATE TABLE parent (
  id INT(10) NOT NULL PRIMARY KEY
) ENGINE INNODB
PARTITION BY HASH(id)
PARTITIONS 6;
#correct: ALTER parent table to remove partitioning
ALTER TABLE parent REMOVE PARTITIONING;

11) Referenced column is a generated virtual column (this is only possible with 5.7 and newer)

How to diagnose: Run SHOW CREATE TABLE parent and verify that the referenced column is not a virtual column.
How to fix: CREATE or ALTER the parent table so that the column will be stored and not generated.
Example:

# wrong; this parent table has a generated virtual column
CREATE TABLE parent (
  id INT(10) NOT NULL PRIMARY KEY,
  column_1 INT(10) NOT NULL,
  column_2 INT(10) NOT NULL,
  column_virt INT(10) AS (column_1 + column_2) NOT NULL,
  KEY column_virt_idx (column_virt)
) ENGINE INNODB;
# correct: make the column STORED so it can be used as a foreign key
ALTER TABLE parent DROP COLUMN column_virt, ADD COLUMN column_virt INT(10) AS (column_1 + column_2) STORED NOT NULL;
# And now the child table can be created pointing to column_virt
CREATE TABLE child (
  id INT(10) NOT NULL PRIMARY KEY,
  parent_virt INT(10) NOT NULL,
  FOREIGN KEY (parent_virt) REFERENCES parent(column_virt)
) ENGINE INNODB;

12) Using SET DEFAULT for a constraint action

How to diagnose: Check your child table DDL and see if any of your constraint actions (ON DELETE, ON UPDATE) try to use SET DEFAULT
How to fix: Remove or modify actions that use SET DEFAULT from the child table CREATE or ALTER statement.
Example:

# wrong; the constraint action uses SET DEFAULT
CREATE TABLE child (
  id INT(10) NOT NULL PRIMARY KEY,
  parent_id INT(10) NOT NULL,
  FOREIGN KEY (parent_id) REFERENCES parent(id) ON UPDATE SET DEFAULT
) ENGINE INNODB;
# correct; there's no equivalent to SET DEFAULT, so removing it or picking another action is the corrective measure
CREATE TABLE child (
  id INT(10) NOT NULL PRIMARY KEY,
  parent_id INT(10) NOT NULL,
  FOREIGN KEY (parent_id) REFERENCES parent(id)
) ENGINE INNODB;

I realize many of the solutions are not what you might desire, but these are limitations in MySQL that must be overcome on the application side for the time being. I do hope the list above gets shorter by the time 8.0 is released!

If you know other ways MySQL can fail with ERROR 1215, let us know in the comments!

More information regarding Foreign Key restrictions can be found here: https://dev.mysql.com/doc/refman/5.7/en/innodb-foreign-key-constraints.html.

by marcos.albe at April 06, 2017 06:11 PM

Shlomi Noach

"MySQL High Availability tools" followup, the missing piece: orchestrator

I read with interest MySQL High Availability tools - Comparing MHA, MRM and ClusterControl by SeveralNines. I thought there was a missing piece in the comparison: orchestrator, and that as a result the comparison was missing scope and context.

I'd like to add my thoughts on topics addressed in the post. I'm by no means an expert on MHA, MRM or ClusterControl, and will mostly focus on how orchestrator tackles high availability issues raised in the post.

What this is

This is to add insights on the complexity of failovers. Over the duration of three years, I repeatedly think I've seen it all, and then get hit by yet another crazy scenario. Doing the right thing automatically is difficult.

In this post, I'm not trying to convince you to use orchestrator (though I'd be happy if you did). To be very clear, I'm not claiming it is better than any other tool. As always, each tool has pros and cons.

This post does not claim other tools are not good. Nor that orchestrator has all the answers. At the end of the day, pick the solution that works best for you. I'm happy to use a solution that reliably solves 99% of the cases as opposed to an unreliable solution that claims to solve 99.99% of the cases.

Quick background

orchestrator is actively maintained by GitHub. It manages automated failovers at GitHub. It manages automated failovers at Booking.com, one of the largest MySQL setups on this planet. It manages automated failovers as part of Vitess. These are some names I'm free to disclose, and browsing the issues shows a few more users running failovers in production. Otherwise, it is used for topology management and visualization in a large number of companies such as Square, Etsy, Sendgrid, Godaddy and more.

Let's now follow one-by-one the observations on the SeveralNines post.

Flapping

orchestrator supports an anti-flapping mechanism. Once an automated failover kicks off, no additional automated failover will run on the same cluster for the duration of a pre-configured RecoveryPeriodBlockSeconds. We, for example, set that time to 1 hour.
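
In the orchestrator configuration file this is a single setting; a minimal sketch matching the one-hour block mentioned above:

{
  "RecoveryPeriodBlockSeconds": 3600
}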

However, orchestrator also supports acknowledgements. A human (or a bot, for that matter, accessing the API or just running the command line) can acknowledge a failover. Once a failover is acknowledged, the block is removed. The next incident requiring a failover is free to proceed.

Moreover, a human is always allowed to forcibly invoke a failover (e.g. via orchestrator -c graceful-master-takeover or  orchestrator -c force-master-takeover). In such case, any blocking is ignored and orchestrator immediately kicks in the failover sequence.

Lost transactions

orchestrator does not pull binlog data from the failed master and only works with the data available on the replicas. There is a potential for data loss.

Note: There was actually some work on an MHA-like syncing of relay logs, and in fact most of it is available right now in orchestrator, syncing relay logs via remote SSH and without agents. See https://github.com/github/orchestrator/issues/45 for some pointers. We looked into this for a couple of months but saw some dangers and issues, such as non-atomic relay log entries in RBR and others. We chose to put this on hold, and I advise against using this functionality in orchestrator.

Lost transactions, notes on semi-sync

Vitess has contributed a semi-sync failover mechanism (discussion). So you are using semi-sync? That's wonderful. You've failed over and have greatly reduced the amount of lost data. What happens with your new setup? Will your recovery re-apply semi-sync? Vitess's contribution does just that and makes sure a new replica takes the semi-sync role.

Lost transactions, notes on "most up to date replica"

There is this fallacy. It has been proven to be a fallacy many times in production that I've witnessed. I'm sure this happened somewhere else in the universe that I haven't witnessed.

The fallacy says: "when master fails, we will promote the most up-to-date replica". The most up-to-date may well be the wrong choice. You may wish to skip and lose the most up-to-date replica. The assumption is a fallacy, because many times our production environment is not sterile.

Consider:

  • You're running 5.6 and wish to experiment/upgrade to 5.7. You will likely upgrade a replica or two, measure their replication capacity, their query latency, etc. You now have a non-sterile topology. When the master fails, you must not promote your 5.7 replica because you will not be able to (or wouldn't want to take the chance) replicate from that 5.7 onto the rest of your fleet, which is 5.6.
    • We have been looking into the 5.7 upgrade for a few months now. Most careful places I know likewise take months to run such an upgrade. These are months where your topology is not sterile.

So you'd rather lose the most up-to-date replica than lose the other 10 replicas you have.

  • Likewise, you run with STATEMENT based replication. You wish to experiment with ROW based replication. Your topology is not sterile. You must not promote your ROW replica, because the rest of the replicas (assuming log_slave_updates) would not be able to replicate.

orchestrator understands all the above and makes the right call. It would promote the replica that would ensure the survival of your fleet, and prefers to lose data.

  • You have a special replica with replication filters. More often than desired, this happens. Perhaps you're splitting out a new sub-cluster, functionally partitioning away some tables. Or you're running some exporter to Hadoop and only filter specific events. Or... You must not promote that replica.

orchestrator would not promote that replica.

Let's look at something crazy now:

  • Getting back to the 5.7 upgrade, you have now upgraded 8 out of your 10 replicas to 5.7. You now do want to promote the 5.7 replica. It's a matter of how many replicas you'd lose if you promoted _this one_ as opposed to _that one_.

orchestrator makes that calculation. Hey, the same applies for STATEMENT and ROW based replicas. Or maybe you have both 5.7 and RBR experiments (tsk tsk tsk, but life is challenging) at the same time. orchestrator will still pick the replica whose promotion will keep the majority of your fleet intact.

  • The most up-to-date replica is on a different data center.

orchestrator can, but prefers not to promote it. But then, it also supports a 2-step promotion. If possible, it would first promote that most up-to-date replica from the other DC, then let local DC replicas catch up, then reverse replication and place a local replica on top, making the master be on the local DC again.

I cannot stress enough how useful and important this is.

This conveniently leads us to...

Roles

Fallacy: using configuration to white-list or black-list servers.

You'd expect to overcome the above 5.7 or ROW etc. issues by carefully marking your servers as blacklisted.

We've been there. This works on a setup with 10 servers. I claim this doesn't scale.

Again, I wish to be clear: if this works for you, great! Ignore the rest of this section. I suggest that as you grow, this becomes more and more of a difficult problem. At a scale of 100 servers, it's a serious pain. Examples:

  • You set global binlog_format='ROW'. Do you then immediately follow up to reconfigure your HA service to blacklist your server?
  • You provision a new box. Do you need to go and reconfigure your HA service, adding the box as white-listed?
  • You need to run maintenance on a server. Do you rush to reconfigure HA service?

You can invest in automating the reconfiguration of servers; I've been there as well, to some extent. This is a difficult task on its own.

  • You'd use service discovery to assign tags to hosts
  • And then puppet would distribute configuration.
    • How fast would it do that?
  • I'm not aware and am happy to learn if any of the mentioned solutions can be reconfigured by consul. You need to do the job yourself.

Also consider how flexible your tools are: suppose you reconfigure; how easy is it for your tools to reload and pick up the new config?

  • orchestrator does its best, and reloads most configuration live, but even orchestrator has some "read-only" variables.

How easy is it to restart your HA services? Can you restart them in a rolling fashion, such that at any given point you have HA up and running?

  • This is well supported by orchestrator

orchestrator recognizes those crazy 5.7/ROW/replication-filter topologies by magic. Well, not magic. It just observes the state of the topologies. It learns the state of the topologies, so that at the time of a crash it has all the info. Its logic is based on that info. It knows and understands replication rules. It computes a good promotion path, taking data centers and configurations into consideration.

Initially, orchestrator started with blacklists in configuration. They're still supported. But today it's more about labels.

  • One of your replicas serves as a backup server. It runs extra tasks, or is configured differently (no log_slave_updates? Different buffer pool settings?), and is not good to serve as a master. You should not promote that replica.

Instead of saying "this server is a backup server and should never be promoted" in configuration (and that's possible to do!), orchestrator lets you dynamically announce that this server should not be promoted. Such that maybe 10 minutes ago this wasn't a backup server, but now it is. You can advertise that fact to orchestrator. We do that via:

orchestrator -c register-candidate -i {::fqdn} --promotion-rule=${promotion_rule}

where ${promotion_rule} is candidate, neutral or must_not. We run this from cron, every couple of minutes. How we choose the right rule comes from our service discovery. orchestrator is always up to date on role changes (within a couple of minutes at worst), and urgent role changes get propagated immediately.
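
As a rough sketch of such a cron entry (the two-minute schedule and the neutral rule here are placeholders; in practice the rule comes out of service discovery as described above):

# a sketch: periodically re-advertise this host's promotion rule to orchestrator
*/2 * * * * orchestrator -c register-candidate -i $(hostname -f) --promotion-rule=neutral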

Also, each host can self-declare its role, so that orchestrator picks the role up on discovery, as per DetectPromotionRuleQuery. Vitess is known to use this approach, and they contributed the code.
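
As an illustration only, such a self-declaration is configured as a query orchestrator runs against each instance; the DetectPromotionRuleQuery setting name comes from orchestrator, while the meta schema and table below are hypothetical:

{
  "DetectPromotionRuleQuery": "select promotion_rule from meta.instance_role"
}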

Network partitioning

First, about general failure detection.

orchestrator uses a holistic approach to detecting failures, and instead of justifying that it is a really good approach, I will testify that it is. orchestrator's accuracy in recognizing a failover scenario is very high. Booking.com has X,000 MySQL servers in production. At that scale, servers and networks break enough that there's always something breaking. Quoting (with permission) Simon J. Mudd from Booking.com:

We have at least one failure a day which it handles automatically. that’s really neat. people don’t care about dead masters any more.

orchestrator looks not only at the master, but at its replicas. If orchestrator can't see the master, but the replicas are happily running and replicating, then it's just orchestrator who can't see the master. But if orchestrator doesn't see the master, and all the replicas are broken, then there's a failover scenario.

With ClusterControl, there seems to be danger of false positives: "...therefore it can take an action if there are network issues between the master and the ClusterControl host."

As for fencing:

MHA uses multiple nodes for detecting the failure. Especially if these nodes run on different DCs, this gives MHA a better view when fencing is involved.

Last September we gathered at an improvised BoF session on Percona Live Amsterdam. We were looking at observing a failure scenario where fencing is involved.

We mostly converged on having multiple observers, such as three observers in three different data centers, a quorum of which would decide whether there's a failure scenario or not.

The problem is difficult. If the master is seen by 2 nodes, but all of its replicas are broken, does that make a failure scenario or not? What if a couple replicas are happy but ten others are not?

orchestrator has it on the roadmap to run a quorum based failover decision. My estimation is that it will do the right thing most times and that it would be very difficult to push towards 99.99%.

Integration

orchestrator provides HTTP API as well as a command line interface. We use those at GitHub, for example, to integrate orchestrator into our chatops. We command orchestrator via chat and get information via chat.

Or we and others have automated, scheduled jobs, that use the orchestrator command line to rearrange topologies.

There is no direct integration between orchestrator and other tools. Recently there have been requests to integrate orchestrator with ProxySQL. I see multiple use cases for that. Plug: at the upcoming Percona Live conference in Santa Clara, René Cannaò and I will co-present a BoF on potential ProxySQL-orchestrator integration. It will be an open discussion. Please come and share your thoughts! (The talk hasn't been scheduled yet, I will update this post with the link once it's scheduled.)

Addressing the issue of read_only as an indicator to master/replica roles, please see the discussion on https://github.com/sysown/proxysql/issues/789. Hint: this isn't trivial and many times not reliable. It can work in some cases.

Conclusion

Is it time already? I have much to add; but let's stay focus on the SeveralNines blog. Addressing the comparison chart at the "Conclusion" section:

Replication support

orchestrator supports failovers on:

  • Oracle GTID
  • MariaDB GTID
  • Pseudo-GTID
  • Semi-sync
  • Binlog servers

GTID support is ongoing (recent example). Traditionally orchestrator is very focused on Pseudo-GTID.

I should add: Pseudo-GTID is a thing. It runs, and it runs well. Pseudo-GTID provides almost all of the Oracle GTID advantages without actually using GTID, and without GTID limitations. The obvious disclaimer is that Pseudo-GTID has its own limitations, too. But I will leave the Pseudo-GTID preaching for another time, and just note that GitHub and Booking.com both run automated failovers based on Pseudo-GTID.

Flapping

Time based blocking per cluster, acknowledgements supported; human overrides permitted.

Lost transactions

No checking for transactions on master

Network Partitioning

Excellent false positive detection. Seems like other tools like MRM and ClusterControl are following up by adopting the orchestrator approach, and I'm happy for that.

Roles

State-based automatic detection and decision making; also dynamic roles via advertising; also support for self-declaring roles; also support for configuration-based blacklists

Integration

HTTP API, command line interfaces. No direct integration with other products. Looking into ProxySQL with no promises held.

Further

If you use non-GTID replication, MHA is the only option for you

That is incorrect. orchestrator is an excellent choice in my biased opinion. GitHub and Booking.com both run orchestrator automated failovers without using GTIDs.

Only ClusterControl ... is flexible enough to handle both types of GTID under one tool

That is incorrect. orchestrator supports both types of GTID. I'm further working towards better and better supports of Oracle GTID.

...this could be very useful if you have a mixed environment while you still would like to use one single tool to ensure high availability of your replication setup

Just to give you an impression: orchestrator works on topologies mixed with Oracle and MariaDB servers, normal replication and binlog servers, STATEMENT and ROW based replication, three major versions in the same cluster (as I recall I've seen that; I don't run this today), and mixed combination of the above. Truly.

Finally

Use whatever tool works best for you.

Oh, and I forgot!

Please consider attending my talk, Practical Orchestrator, where I will share practical advice and a walkthrough of orchestrator setup.

by shlomi at April 06, 2017 12:56 PM

April 05, 2017

MariaDB Foundation

MariaDB 10.2.5 RC now available

The MariaDB project is pleased to announce the immediate availability of MariaDB 10.2.5 Release Candidate (RC). See the release notes and changelog for details. Download MariaDB 10.2.5 Release Notes Changelog What is MariaDB 10.2? MariaDB APT and YUM Repository Configuration Generator Thanks, and enjoy MariaDB!

The post MariaDB 10.2.5 RC now available appeared first on MariaDB.org.

by Daniel Bartholomew at April 05, 2017 07:54 PM

Peter Zaitsev

Evaluation of PMP Profiling Tools

PMP Profiling Tools

In this blog post, we’ll look at some of the available PMP profiling tools.

While debugging or analyzing issues with Percona Server for MySQL, we often need a quick understanding of what’s happening on the server. Percona experts frequently use the pt-pmp tool from Percona Toolkit (inspired by http://poormansprofiler.org).

The pt-pmp tool collects application stack traces using GDB and then post-processes them. From this you get a condensed, ordered list of the stack traces. The list helps you understand where the application spent most of the time: either running something or waiting for something.
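
A typical invocation looks something like the following (a sketch; pt-pmp attaches to a running mysqld, and the exact option names should be double-checked against your Percona Toolkit version):

# take 2 samples of the running mysqld, 10 seconds apart, and print the aggregated stack traces
pt-pmp --pid $(pidof mysqld) --iterations 2 --interval 10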

Getting a profile with pt-pmp is handy, but it has a cost: it's quite intrusive. In order to get stack traces, GDB has to attach to each thread of your application, which results in interruptions. Under high loads, these stops can be quite significant (up to 15-30-60 secs). This means that the pt-pmp approach is not really usable in production.

Below I’ll describe how to reduce GDB overhead, and also what other tools can be used instead of GDB to get stack traces.

  • GDB
    By default, the symbol resolution process in GDB is very slow. As a result, getting stack traces with GDB is quite intrusive (especially under high loads). There are two options available that can help notably reduce GDB tracing overhead:

      1. Use the readnever patch. RHEL and other distros based on it include GDB with the readnever patch applied. This patch allows you to avoid unnecessary symbol resolving with the --readnever option. As a result you get up to 10 times better speed.
      2. Use gdb_index. This feature was added to address the symbol resolving issue by creating and embedding a special index into the binaries. This index is quite compact: I've created and embedded gdb_index for the Percona Server binary (it increases the size by around 7-8MB). The addition of the gdb_index speeds up obtaining stack traces/resolving symbols two to three times.

    # to check if index already exists:
      readelf -S mysqld | grep gdb_index
    # to generate index:
      gdb -batch mysqld -ex "save gdb-index /tmp" -ex "quit"
    # to embed index:
      objcopy --add-section .gdb_index=/tmp/mysqld.gdb-index --set-section-flags .gdb_index=readonly mysqld mysqld

  • eu-stack (elfutils)
    The eu-stack tool from the elfutils package prints the stack for each thread in a process or core file. Symbol resolving is also not very optimized in eu-stack. By default, if you run it under load it will take even more time than GDB. But eu-stack allows you to skip resolving completely, so it can get stack frames quickly and then resolve them later without any impact on the workload.
  • Quickstack
    Quickstack is a tool from Facebook that gets stack traces with minimal overhead.

Now let's compare all the above profilers. We will measure the amount of time each needs to collect all the stack traces from Percona Server for MySQL under a high load (sysbench OLTP_RW with 512 threads).

The results show that eu-stack (without resolving) got all stack traces in less than a second, and that Quickstack and GDB (with the readnever patch) got very close results. For other profilers, the time was around two to five times higher. This is quite unacceptable for profiling (especially in production).

There is one more note regarding the pt-pmp tool. The current version only supports GDB as the profiler. However, there is a development version of this tool that supports GDB, Quickstack, eu-stack and eu-stack with offline symbol resolving. It also allows you to look at stack traces for specific threads (tids). So, for instance, in the case of Percona Server for MySQL, we can analyze just the purge, cleaner or IO threads.

Below are the command lines used in testing:

# gdb & gdb+gdb_index
  time gdb  -ex "set pagination 0" -ex "thread apply all bt" -batch -p `pidof mysqld` > /dev/null
# gdb+readnever
  time gdb --readnever -ex "set pagination 0" -ex "thread apply all bt" -batch -p `pidof mysqld` > /dev/null
# eu-stack
  time eu-stack -s -m -p `pidof mysqld` > /dev/null
# eu-stack without resolving
  time eu-stack -q -p `pidof mysqld` > /dev/null
# quickstack - 1 sample
  time quickstack  -c 1 -p `pidof mysqld` > /dev/null
# quickstack - 1000 samples
  time quickstack  -c 1000 -p `pidof mysqld` > /dev/null

by Alexey Stroganov at April 05, 2017 07:12 PM

Jean-Jerome Schmidt

Meet the Severalnines team at M|17 and Percona Live this month

This month we’re excited to be participating in two open source database conferences, namely the MariaDB User Conference in New York, M|17, and the Percona Live Conference (formerly known as the MySQL User Conference) in Santa Clara.

And we’re looking forward to presenting, exhibiting and mingling there with fellow open source database enthusiasts.

We’ll have talks at both conferences, and a booth at Percona Live, so do come visit us at booth 411 in the Percona Live Exhibition Hall. We’ll be there to answer any questions you may have on open source database management and automation. And we’re planning to make it a fun place to hang out.

Of course, we look forward to catching up with you as well in New York, if you’re attending the MariaDB conference. See below who our speakers are at both conferences.

These are our talks and speakers at M|17 and Percona Live this month:

Talk @ M|17

Step-By-Step: Clustering with Galera and Docker Swarm

Wednesday, April 12 - 2:10 pm - 3:00 pm

Ashraf Sharif, Senior Support Engineer

Clustering is an important feature in container technology, as multiple nodes are required to provide redundancy and failover in case of outage. Docker Swarm is an orchestration tool that allows administrators to manage a cluster of Docker nodes as one single virtual system. MariaDB Cluster, however, has its own clustering model based on Galera.

In this talk, we’ll look at how to deploy MariaDB Cluster on Docker Swarm with a multi-host environment, by “homogeneousing” the MariaDB image to achieve high availability and scalability in a fully automated way. We will touch upon service controls, multi-host networking, persistent storage, scaling, fault tolerance, service discovery, and load distribution.

Talks @ Percona Live

Become a MongoDB DBA: monitoring essentials

Tuesday, April 25 - 11:30 AM - 12:20 PM @ Ballroom G

Art van Scheppingen, Senior Support Engineer

To operate MongoDB efficiently, you need to have insight into database performance. And with that in mind, we’ll dive into monitoring in this talk. MongoDB offers many metrics through various status overviews and commands, but which ones really matter to you? How do you trend and alert on them? What is the meaning behind the metrics?

We’ll discuss the most important ones and describe them in ordinary plain MySQL DBA language. Finally we’ll have a look at the (open source) tools available for MongoDB monitoring and trending and compare them.

MySQL Load Balancers - MaxScale, ProxySQL, HAProxy, MySQL Router & nginx - a close up look

26 April - 11:10 AM - 12:00 PM @ Ballroom D

Krzysztof Książek, Senior Support Engineer

Load balancing MySQL connections and queries using HAProxy has been popular in the past years. Recently however, we have seen the arrival of MaxScale, MySQL Router, ProxySQL and now also Nginx as a reverse proxy. For which use cases do you use them and how well do they integrate in your environment?

This session aims to give a solid grounding in load balancer technologies for MySQL and MariaDB.

We will review the wide variety of open-source options available: from application connectors (php-mysqlnd, jdbc), TCP reverse proxies (HAproxy, Keepalived, Nginx) and SQL-aware load balancers (MaxScale, ProxySQL, MySQL Router), and look at what considerations you should make when assessing their suitability for your environment.

MySQL (NDB) Cluster Best Practices (Die Hard VIII)

26 April - 3:30 PM - 4:20 PM @ Room 210

Johan Andersson, CTO

MySQL Cluster is a write-scalable, real-time, ACID-compliant transactional database, designed to deliver 99.999% availability. It provides shared-nothing clustering and auto-sharding for MySQL, accessed via SQL and NoSQL interfaces. It is designed to provide high availability and high throughput with low latency, while allowing for near linear scalability. MySQL Cluster is implemented through the NDB or NDBCLUSTER storage engine for MySQL.

In this session we will talk about:

  • Core architecture and Design principles of NDB Cluster
  • APIs for data access (SQL and NoSQL interfaces)
  • Important configuration parameters
  • Best practices: indexing and schema

We will also compare performance between MySQL Cluster 7.5 and Galera (MySQL 5.6/5.7), and how to best make use of the feature set of MySQL Cluster 7.5.

So … Happy clustering, and see you in New York, Santa Clara, or maybe both!

by jj at April 05, 2017 06:31 PM

MariaDB AB

MyRocks available in MariaDB 10.2 Release Candidate 2


It's about two months since the release of the first release candidate of MariaDB 10.2. Release Candidates are nice, especially because this is the point where large numbers of users start testing a product. We're seeing this happen again this time, and we have experienced a lot of interest in MariaDB Server 10.2 in general over the last two months.

Now, we’re releasing MariaDB Server 10.2.5, which is Release Candidate number 2. Although this is a Release Candidate, there is one significant new piece in it, namely MyRocks. MyRocks is a storage engine developed by Facebook using RocksDB as the backend. Some of the benefits of MyRocks are that it provides great compression, it’s write efficient and data loading is fast. The MyRocks site has more information about the benefits and you can learn more about MyRocks in MariaDB on our blog.

Since MyRocks was originally developed by Facebook, the focus was to make it especially suitable for their needs, i.e. their workloads and platforms. Under the lead of our developer Sergey Petrunia, MyRocks has now been included in MariaDB. MyRocks in MariaDB is constantly being merged with upstream MyRocks from Facebook.

The goal was to make MyRocks available as a storage engine in MariaDB Server for every MariaDB user to consume on whatever platform they use (with some exceptions for 32-bit platforms and older Linuxes). With MyRocks now included in RC2 of 10.2 that goal has been accomplished. MyRocks is available in the MariaDB Server packages for most Linux distributions and is available for Windows users too.

Although it’s in its early days for MyRocks, it’s very much a functioning storage engine, with which amazing results can be achieved. We encourage you to try it, use it and report any problems you might have with it. But please remember that this is the first public version of MyRocks in MariaDB, which means it's suitable for serious testing, etc. but not production usage.

Test out MariaDB 10.2.5 Release Candidate 2! We want to hear about your experience with it. Although MyRocks is the newest feature in 10.2, there are some other significant features to test out in MariaDB Server 10.2 as well.

Get MariaDB Server 10.2.5 here!



by rasmusjohansson at April 05, 2017 05:13 PM

Peter Zaitsev

Percona Server for MySQL 5.7.17-13 is Now Available

Percona Server for MySQL 5.7.17-13

Percona announces the GA release of Percona Server for MySQL 5.7.17-13 on April 5, 2017. Download the latest version from the Percona web site or the Percona Software Repositories. You can also run Docker containers from the images in the Docker Hub repository.

Based on MySQL 5.7.17, including all the bug fixes in it, Percona Server for MySQL 5.7.17-13 is the current GA release in the Percona Server for MySQL 5.7 series. Percona provides completely open-source and free software. Find release details in the 5.7.17-13 milestone at Launchpad.

Bugs Fixed:
  • MyRocks storage engine detection implemented in mysqldump in Percona Server 5.6.17-12 was using a deprecated INFORMATION_SCHEMA.SESSION_VARIABLES table, causing mysqldump failures on servers running with the show_compatibility_56 variable set to OFF. Bug fixed #1676401.

The release notes for Percona Server for MySQL 5.7.17-13 are available in the online documentation. Please report any bugs on the launchpad bug tracker.

by Hrvoje Matijakovic at April 05, 2017 03:53 PM

Valeriy Kravchuk

My First Steps with MariaDB 10.2 and RocksDB Storage Engine

Last year I started to explore MyRocks, that is, RocksDB used as a storage engine with MySQL. So far I had to use Facebook's MySQL 5.6 to do this. I could try to use some specific development branches of MariaDB (or maybe even Percona Server) for this, but I preferred to wait until the engine is included into a main branch by the company I work for. Recently this happened, and now you can get RocksDB loaded and working in main MariaDB 10.2 branch. In this blog post I am going to explain how to build it from source and do some basic checks.

I was updating my MariaDB local repository on my Ubuntu 14.04 netbook, with the 10.2 branch already checked out (do git checkout 10.2 if you see another branch marked with *; 10.1 is used by default):
openxs@ao756:~/git/server$ git branch
  10.0
  10.1
* 10.2
  bb-10.2-marko
and noted the following at the end of output produced by git pull:
...
 create mode 100644 storage/rocksdb/unittest/test_properties_collector.cc
 create mode 100644 storage/rocksdb/ut0counter.h
So, I realized that what I had considered just talk, plans and rumors had really happened - we have RocksDB in the main branch of MariaDB! I immediately proceeded with the following commands to get the submodule for the engine up to date:
openxs@ao756:~/git/server$ git submodule init
Submodule 'storage/rocksdb/rocksdb' (https://github.com/facebook/rocksdb.git) registered for path 'storage/rocksdb/rocksdb'
openxs@ao756:~/git/server$ git submodule update
Submodule path 'libmariadb': checked out 'd1387356292fb840c7736aeb8f449310c3139087'
Cloning into 'storage/rocksdb/rocksdb'...
remote: Counting objects: 49559, done.
remote: Compressing objects: 100% (70/70), done.
remote: Total 49559 (delta 31), reused 1 (delta 1), pack-reused 49485
Receiving objects: 100% (49559/49559), 97.77 MiB | 3.45 MiB/s, done.
Resolving deltas: 100% (36642/36642), done.
Checking connectivity... done.
Submodule path 'storage/rocksdb/rocksdb': checked out 'ba4c77bd6b16ea493c555561ed2e59bdc4c15fc0'
openxs@ao756:~/git/server$ git log -1
commit 0d34dd7cfb700b91f11c59d189d70142ed652615
...
Then I applied my usual cmake command line and build commands:
openxs@ao756:~/dbs/maria10.2$ fc -l
2001     cmake . -DCMAKE_BUILD_TYPE=RelWithDebInfo -DWITH_SSL=system -DWITH_ZLIB=bundled -DMYSQL_MAINTAINER_MODE=0 -DENABLED_LOCAL_INFILE=1 -DWITH_JEMALLOC=system -DWITH_INNODB_DISALLOW_WRITES=ON -DCMAKE_INSTALL_PREFIX=/home/openxs/dbs/maria10.2
...
2003     time make -j 2
2004     make install && make clean
2005     cd
2006     cd dbs/maria10.2
2007     bin/mysqld_safe --no-defaults --port=3307 --socket=/tmp/mariadb.sock --rocksdb &
The last command above was my lame attempt to add RocksDB support in the same way it is done in MySQL 5.6 from Facebook. The option is not recognized; instead you just have to start the server as usual and install the plugin:
install soname 'ha_rocksdb.so';
Then you'll see a lot of new rows in the output of SHOW PLUGINS:
...
| ROCKSDB                       | ACTIVE   | STORAGE ENGINE     | ha_rocksdb.so | GPL     |
| ROCKSDB_CFSTATS               | ACTIVE   | INFORMATION SCHEMA | ha_rocksdb.so | GPL     |
| ROCKSDB_DBSTATS               | ACTIVE   | INFORMATION SCHEMA | ha_rocksdb.so | GPL     |
| ROCKSDB_PERF_CONTEXT          | ACTIVE   | INFORMATION SCHEMA | ha_rocksdb.so | GPL     |
| ROCKSDB_PERF_CONTEXT_GLOBAL   | ACTIVE   | INFORMATION SCHEMA | ha_rocksdb.so | GPL     |
| ROCKSDB_CF_OPTIONS            | ACTIVE   | INFORMATION SCHEMA | ha_rocksdb.so | GPL     |
| ROCKSDB_COMPACTION_STATS      | ACTIVE   | INFORMATION SCHEMA | ha_rocksdb.so | GPL     |
| ROCKSDB_GLOBAL_INFO           | ACTIVE   | INFORMATION SCHEMA | ha_rocksdb.so | GPL     |
| ROCKSDB_DDL                   | ACTIVE   | INFORMATION SCHEMA | ha_rocksdb.so | GPL     |
| ROCKSDB_INDEX_FILE_MAP        | ACTIVE   | INFORMATION SCHEMA | ha_rocksdb.so | GPL     |
| ROCKSDB_LOCKS                 | ACTIVE   | INFORMATION SCHEMA | ha_rocksdb.so | GPL     |
| ROCKSDB_TRX                   | ACTIVE   | INFORMATION SCHEMA | ha_rocksdb.so | GPL     |
+-------------------------------+----------+--------------------+---------------+---------+
64 rows in set (0.00 sec)

MariaDB [test]> show engines;
+--------------------+---------+----------------------------------------------------------------------------------+--------------+------+------------+
| Engine             | Support | Comment                                                                          | Transactions | XA   | Savepoints |
+--------------------+---------+----------------------------------------------------------------------------------+--------------+------+------------+
| ROCKSDB            | YES     | RocksDB storage engine                                                           | YES          | YES  | YES        |
| CSV                | YES     | CSV storage engine                                                               | NO           | NO   | NO         |
| MyISAM             | YES     | MyISAM storage engine                                                            | NO           | NO   | NO         |
| MEMORY             | YES     | Hash based, stored in memory, useful for temporary tables                        | NO           | NO   | NO         |
| MRG_MyISAM         | YES     | Collection of identical MyISAM tables                                            | NO           | NO   | NO         |
| CONNECT            | YES     | Management of External Data (SQL/MED), including many file formats               | NO           | NO   | NO         |
| SEQUENCE           | YES     | Generated tables filled with sequential values                                   | YES          | NO   | YES        |
| Aria               | YES     | Crash-safe tables with MyISAM heritage                                           | NO           | NO   | NO         |
| InnoDB             | DEFAULT | Supports transactions, row-level locking, foreign keys and encryption for tables | YES          | YES  | YES        |
| PERFORMANCE_SCHEMA | YES     | Performance Schema                                                               | NO           | NO   | NO         |
+--------------------+---------+----------------------------------------------------------------------------------+--------------+------+------------+
10 rows in set (0.00 sec)
Now I can create ROCKSDB tables and work with them:
MariaDB [test]> create table tmariarocks(id int primary key, c1 int) engine=rocksdb;
Query OK, 0 rows affected (0.14 sec)

MariaDB [test]> insert into tmariarocks values(1,1);
Query OK, 1 row affected (0.04 sec)

MariaDB [test]> select version(), t.* from tmariarocks t;
+----------------+----+------+
| version()      | id | c1   |
+----------------+----+------+
| 10.2.5-MariaDB |  1 |    1 |
+----------------+----+------+
1 row in set (0.00 sec)

MariaDB [test]> show create table tmariarocks\G
*************************** 1. row ***************************
       Table: tmariarocks
Create Table: CREATE TABLE `tmariarocks` (
  `id` int(11) NOT NULL,
  `c1` int(11) DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE=ROCKSDB DEFAULT CHARSET=latin1
1 row in set (0.00 sec)

MariaDB [test]> select engine, count(*) from information_schema.tables group by engine;
+--------------------+----------+
| engine             | count(*) |
+--------------------+----------+
| Aria               |       11 |
| CSV                |        2 |
| InnoDB             |        8 |
| MEMORY             |       74 |
| MyISAM             |       25 |
| PERFORMANCE_SCHEMA |       52 |
| ROCKSDB            |        1 |
+--------------------+----------+
7 rows in set (0.03 sec)
Moreover, I can still work with InnoDB tables and even mix them with ROCKSDB ones in the same transaction (at least to some extent):
MariaDB [test]> create table t1i(id int primary key) engine=InnoDB;
Query OK, 0 rows affected (0.24 sec)

MariaDB [test]> create table t1r(id int primary key) engine=ROCKSDB;
Query OK, 0 rows affected (0.13 sec)

MariaDB [test]> insert into t1i values (1), (2), (3);
Query OK, 3 rows affected (0.05 sec)
Records: 3  Duplicates: 0  Warnings: 0

MariaDB [test]> insert into t1r select * from t1i;
Query OK, 3 rows affected (0.04 sec)
Records: 3  Duplicates: 0  Warnings: 0

MariaDB [test]> start transaction;
Query OK, 0 rows affected (0.00 sec)

MariaDB [test]> insert into t1i values(5);
Query OK, 1 row affected (0.00 sec)

MariaDB [test]> insert into t1r values(6);
Query OK, 1 row affected (0.00 sec)

MariaDB [test]> commit;
Query OK, 0 rows affected (0.14 sec)

MariaDB [test]> select * from t1i;
+----+
| id |
+----+
|  1 |
|  2 |
|  3 |
|  5 |
+----+
4 rows in set (0.00 sec)

MariaDB [test]> select * from t1r;
+----+
| id |
+----+
|  1 |
|  2 |
|  3 |
|  6 |
+----+
4 rows in set (0.00 sec)
In the error log I see:
openxs@ao756:~/dbs/maria10.2$ tail data/ao756.err
2017-04-05 13:59:51 140157875193600 [Note]   cf=default
2017-04-05 13:59:51 140157875193600 [Note]     write_buffer_size=67108864
2017-04-05 13:59:51 140157875193600 [Note]     target_file_size_base=67108864
2017-04-05 13:59:52 140157875193600 [Note] RocksDB: creating a column family __system__
2017-04-05 13:59:52 140157875193600 [Note]     write_buffer_size=67108864
2017-04-05 13:59:52 140157875193600 [Note]     target_file_size_base=67108864
2017-04-05 13:59:52 140157875193600 [Note] RocksDB: Table_store: loaded DDL data for 0 tables
2017-04-05 13:59:52 140157875193600 [Note] RocksDB: global statistics using get_sched_indexer_t indexer
2017-04-05 13:59:52 140157875193600 [Note] RocksDB instance opened
2017-04-05 14:03:13 140157875193600 [ERROR] Invalid (old?) table or database name '.rocksdb'
So, the integration is not 100% clean and it is probably still at an alpha stage, but now it is easy to build, migrate, mix, test, benchmark and compare InnoDB from MySQL 5.7 with the latest RocksDB, all on the same MariaDB 10.2 code base!

I think a really great job was done by Sergei Petrunia, my colleagues from MariaDB, Facebook engineers and other interested MyRocks community members. I'd like to thank them all for their hard work on MyRocks and making it available in MariaDB. Maria Rocks!

by Valeriy Kravchuk (noreply@blogger.com) at April 05, 2017 03:43 PM

Jean-Jerome Schmidt

MySQL High Availability tools - Comparing MHA, MRM and ClusterControl

We previously compared two high availability solutions for MySQL - MHA and MariaDB Replication Manager and looked into how they performed fail-over. In this blog post, we’ll see how ClusterControl stacks up against these solutions. Since MariaDB Replication Manager is under active development, we decided to take a look at the not yet released version 1.1.

Flapping

All solutions provide flapping detection. MHA, by default, executes a failover once. Even after you restart masterha_manager, it will still check if the last failover didn’t happen too recently. If yes (by default, if it happened in the last 8 hours), no new failover will happen. You need to explicitly change the timeout or set --ignore_last_failover flag.

MariaDB Replication Manager has less strict defaults - it will allow up to three failovers as long as each of them happens more than 10 seconds after the previous one. In our opinion this is a bit too flexible - if the first failover didn't solve the problem, it is unlikely that another attempt will give better results. Still, default settings are there to be changed, so you can configure MRM however you like.

ClusterControl uses a similar approach to MHA - only one failover is attempted. The next one can happen only after the master has been successfully detected as online (for example, ClusterControl recovery or manual intervention by the admin managed to promote one of the slaves to a master) or after a restart of the cmon process.

Lost transactions

MHA can work in two modes - GTID or non-GTID. Those modes differ in how missing transactions are handled. Traditional replication, actually, is handled in a better way - as long as the old master is reachable, MHA connects to it and attempts to recover missing transactions from its binary logs. If you use GTID mode, this does not happen, which may lead to more significant data loss if your slaves didn't manage to receive all relay logs - another very good reason to use semi-synchronous replication, which has you covered in this scenario.

MRM does not connect to the old master to get the logs. By default, it elects the most advanced slave and promotes it to master. Remaining slaves are slaved off this new master, making them as up to date as the new master. There is a potential for data loss, on par with MHA's GTID mode.

ClusterControl behaves similarly to MRM - it picks the most advanced slave as a master candidate and then, as long as it is safe (for example, there are no errant transactions), promotes it to become the new master. Remaining slaves get slaved off this new master. If ClusterControl detects errant transactions, it stops the failover and alerts the administrator that manual intervention is needed. It is also possible to configure ClusterControl to skip the errant transaction check and force the failover.

Network partitioning

For MHA, this has been taken care of by adding a second MHA Manager node, preferably in another section of your network. You can query it using secondary_check_script. It can be used to connect to another MHA node and execute masterha_check_repl to see how the cluster is seen from that node. This gives MHA a better view of the situation and topology, so it might not fail over when it is unnecessary.
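As a rough sketch of how that is usually wired up (the host names and config path are placeholders, not from this post), the secondary check goes into the MHA application configuration; the manager then asks one or more remote hosts whether they can still reach the master before deciding to fail over:

# /etc/masterha/app1.cnf (sketch)
[server default]
secondary_check_script=masterha_secondary_check -s monitor1 -s monitor2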

MRM implements another approach. It can be configured to use slaves, an external MaxScale proxy or scripts executed through the HTTP protocol on a custom port (like the scripts which govern HAProxy behavior) to build a full view of the topology, and then make an informed decision based on this.

ClusterControl, at this moment, does not perform any advanced checks regarding the availability of the master - it uses only its own view of the system, so it can take action if there are network issues between the master and the ClusterControl host. Having said that, we are aware this can be a serious limitation and there is work in progress to improve how ClusterControl detects a failed master - using slaves and proxies like MaxScale or ProxySQL to get a broader picture of the topology.

Roles

Within MHA you are able to apply roles to a specific host, so for instance ‘candidate_master’ and ‘no_master’ will help you determine which hosts are preferred to become master. A good example could be the data center topology: spread the candidate master nodes over multiple racks to ensure HA. Or perhaps you have a delayed slave that may never become the new master even if it is the last node remaining.

This last scenario is likely to happen with MariaDB Replication Manager, as it can't see the other nodes anymore and thus can't determine that this node is actually, for instance, 24 hours behind. MariaDB does not support the Delayed Slave command, but it is possible to use pt-slave-delay instead. There is a way to set the maximum slave delay allowed for MRM; however, MRM reads Seconds_Behind_Master from the slave status output. Since MRM is executed after the master is dead, this value will obviously be null.

At the beginning of the failover procedure, ClusterControl builds a list of slaves which can be promoted to master. Most of the time, it will contain all slaves in the topology but the user has some additional control over it. There are two variables you can set in the cmon configuration:

replication_failover_whitelist

and

replication_failover_blacklist

The whitelist contains a list of IP’s or hostnames of slaves which should be used as potential master candidates. If this variable is set, only those hosts will be considered. The second variable may contain a list of hosts which will never be considered as master candidate. You can use it to list slaves that are used for backups or analytical queries. If the hardware varies between slaves, you may want to put here the slaves which use slower hardware.

replication_failover_whitelist takes precedence: replication_failover_blacklist is ignored if replication_failover_whitelist is set.
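As a minimal sketch of how these could look in the cmon configuration (the file path and IP addresses are assumptions, not from this post):

# /etc/cmon.d/cmon_1.cnf (sketch)
# only these slaves may be promoted to master
replication_failover_whitelist=10.0.0.11,10.0.0.12
# never promote the backup/reporting slave
replication_failover_blacklist=10.0.0.21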

Integration

MHA is a standalone tool; it doesn't integrate well with other external software. It does, however, provide hooks (pre/post failover scripts) which can be used to do some integration - for instance, executing scripts to make changes in the configuration of an external tool. MHA also uses the read_only value to differentiate between master and slaves - this can also be used by external tools to drive topology changes. One example would be ProxySQL - MHA can work with this proxy using both pre/post failover scripts and read_only values, depending on the ProxySQL configuration. It's worth mentioning that, in GTID mode, MHA doesn't support MariaDB GTID - it only supports Oracle MySQL or Percona Server.

MRM integrates nicely with MaxScale - it can be used along with MaxScale in a couple of ways. It can be set up so that MaxScale does the work of monitoring the health of the nodes and executes MRM as needed to perform failovers. Another option is that MRM drives MaxScale - monitoring is done on MRM's side and MaxScale's configuration is updated as needed. MRM also sets read_only variables, which makes it compatible with other tools that understand those settings (like ProxySQL, for example). A direct integration with HAProxy is also available - MRM, if collocated, may modify the HAProxy configuration whenever the topology changes. On the cons side, MRM works only with MariaDB installations - it is not possible to use it with Oracle MySQL's version of GTID.

ClusterControl uses read_only variables to differentiate between master and slave nodes. This is enough to integrate with every kind of proxy which can be deployed from ClusterControl: ProxySQL, MaxScale and HAProxy. A failover executed by ClusterControl will be detected and handled by any of those proxies. ClusterControl also integrates with external tools regarding management. It provides access to the management console for MaxScale and, to some extent, for HAProxy. Advanced support for ProxySQL will be added shortly. Metrics are provided for HAProxy and ProxySQL. ClusterControl supports both Oracle GTID and MariaDB GTID.

Conclusion

If you are interested in details how MHA, MRM or ClusterControl handle failover, we’d like to encourage you to take a look at the blog posts listed below:

Below is a summary of the differences between the different HA solutions:

  • Replication support: MHA: non-GTID, Oracle GTID; MRM: MariaDB GTID; ClusterControl: Oracle GTID and MariaDB GTID
  • Flapping: MHA: One failover allowed; MRM: Defaults are less restrictive but can be modified; ClusterControl: One failover allowed until the master is detected as online again
  • Lost transactions: MHA: Very good handling for non-GTID, no checking for transactions on the master for GTID setups; MRM: No checking for transactions on the master; ClusterControl: No checking for transactions on the master
  • Network partitioning: MHA: No support built in, can be added through user-created scripts; MRM: Very good false positive detection using slaves, proxy or external scripts; ClusterControl: No support at this moment, work in progress to build false positive detection using proxy and slaves
  • Roles: MHA: Support for whitelist and blacklist of hosts to promote to master; MRM: No support; ClusterControl: Support for whitelist and blacklist of hosts to promote to master
  • Integration: MHA: Can be integrated with external tools using hooks; uses the read_only variable to identify master and slaves, which helps to integrate with other tools that understand this pattern; MRM: Close integration with MaxScale, integration with HAProxy also available; uses the read_only variable in the same way; ClusterControl: Can be integrated with external tools using hooks; uses the read_only variable in the same way

If we are talking about handling master failure, each of the solutions does its job well, and feature-wise they are mostly on par. There are some differences in almost every aspect that we compared but, in the end, each of them should handle most master failures pretty well. ClusterControl lacks more advanced network partitioning detection, but this will change soon. What is important to keep in mind is that these tools support different replication methods, and this alone can limit your options. If you use non-GTID replication, MHA is the only option for you. If you use GTID, MHA and MRM are restricted to, respectively, Oracle MySQL and MariaDB GTID setups. Only ClusterControl (you can test it for free) is flexible enough to handle both types of GTID under one tool - this could be very useful if you have a mixed environment while you still would like to use one single tool to ensure high availability of your replication setup.

by krzysztof at April 05, 2017 09:59 AM

Daniël van Eeden

Network attacks on MySQL, Part 5: Attack on SHA256 based passwords

The mysql_sha256_password plugin doesn't use the nonce system which is used for mysql_native_password, but instead forces the use of RSA or SSL.

This is how that works:

  1. The client connects
  2. The server changes authentication to sha256 password (or default?)
  3. The server sends the RSA public key.
  4. The client encrypts the password with the RSA public key and sends it to the server.
  5. The server decrypts the password with the private key and validates it.

The problem is that the client trusts the public key of the server. It is possible to use --server-public-key-path=file_name, but then you need to take care of secure public key distribution yourself.
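A minimal sketch of that mitigation (the host, user and file paths are placeholders): either pin a locally distributed copy of the server's RSA public key on the client, or skip RSA and require a verified TLS connection, so a man-in-the-middle cannot substitute its own key:

# connect using a pinned copy of the server's RSA public key
mysql --host=db1 --user=app --password \
      --server-public-key-path=/etc/mysql/keys/server_public_key.pem

# or require TLS with certificate verification instead of RSA
mysql --host=db1 --user=app --password \
      --ssl-mode=VERIFY_IDENTITY --ssl-ca=/etc/mysql/keys/ca.pem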

So if we put a proxy between the client and the server and have the proxy send its own public key... then we can decrypt the password, re-encrypt it with the real public key and send it to the server. Also, the decrypted password is the actual password, not a hash. So we then know the real password.

And if SSL is used, the RSA encryption is not done at all... but this can be a connection with an invalid certificate. Anything goes, as long as the connection uses SSL.

by Daniël van Eeden (noreply@blogger.com) at April 05, 2017 07:00 AM

April 04, 2017

Peter Zaitsev

Percona Live Webinar Thursday, April 6, 2017: Best Practices Migrating to Open Source Databases

Please join Percona’s CEO and Founder, Peter Zaitsev on April 6th, 2017 at 8:00 am PDT / 11:00 am EDT (UTC-7) as he presents Best Practices Migrating to Open Source Databases.

This is a high-level webinar that covers the history of enterprise open source database use. It addresses both the advantages companies see in using open source database technologies, as well as the fears and reservations they might have.

In this webinar, we will look at how to address such concerns to help get a migration commitment. We’ll cover picking the right project, selecting the right team to manage migration, and developing the right migration path to maximize migration success (from a technical and organizational standpoint).

Register for the webinar here.

Peter Zaitsev, Co-Founder and CEO, Percona

Peter Zaitsev co-founded Percona and assumed the role of CEO in 2006. As one of the foremost experts on MySQL strategy and optimization, Peter leveraged both his technical vision and entrepreneurial skills to grow Percona from a two-person shop to one of the most respected open source companies in the business. With over 150 professionals in 20 plus countries, Peter’s venture now serves over 3000 customers – including the “who’s who” of Internet giants, large enterprises and many exciting startups. Percona was named to the Inc. 5000 in 2013, 2014 and 2015.

Peter was an early employee at MySQL AB, eventually leading the company’s High-Performance Group. A serial entrepreneur, Peter co-founded his first startup while attending Moscow State University where he majored in Computer Science. Peter is a co-author of High-Performance MySQL: Optimization, Backups, and Replication, one of the most popular books on MySQL performance. Peter frequently speaks as an expert lecturer at MySQL and related conferences, and regularly posts on the Percona Data Performance Blog. Fortune and DZone tapped him as a contributor, and his recent ebook Practical MySQL Performance Optimization Volume 1 is one of percona.com’s most popular downloads.

by Dave Avery at April 04, 2017 04:45 PM

Percona Live Featured Session: Using SelectStar to Monitor and Tune Your Databases

Welcome to another post in the series of Percona Live featured session blogs! In these blogs, we’ll highlight some of the session speakers that will be at this year’s Percona Live conference. We’ll also discuss how these sessions can help you improve your database environment. Make sure to read to the end to get a special Percona Live 2017 registration bonus!

In this Percona Live featured session, we’ll meet the folks at SelectStar, a database monitoring and management tool company. SelectStar will be a sponsor at Percona Live this year.

I recently came across the SelectStar database monitoring product. There are a number of monitoring products on the market (with the evolution of various SaaS and on-premises solutions), but SelectStar piqued my interest for a few reasons. I had a chance to speak with Cameron Jones, Principal Product Manager at SelectStar about their tool:

Percona: What are the challenges that led to developing SelectStar?

Cameron: One of the challenges that we’ve found in the database monitoring and management sector comes from the dilution of the database market – and not in a bad way. Traditional, closed source database solutions continue to be used across the board (especially by large enterprises), but open source options like MySQL, MongoDB, PostgreSQL and Elasticsearch continue to gain traction as organizations seek solutions that meet their demand for agility and flexibility.

From a database monitoring perspective, this adds some challenges. Traditional solutions are focused on monitoring RDBMS and are really great at it, while newer solutions may only focus on one piece of the puzzle (NoSQL or cloud only, for example).

Percona: How does SelectStar compare to other monitoring and management tools?

Cameron: SelectStar covers a wide array of open and closed source database solutions and is easy to setup. This makes it ideal for enterprises that have a lot going on. Here is the matrix of supported products from our website:

Database types and the key metrics monitored by SelectStar:

  • Big Data (Hadoop, Cassandra): Ops Counters (Inserts, Queries, etc.), Network Traffic, Asserts, Locks, Memory Usage
  • Cloud (Amazon Aurora, Amazon Dynamo, Amazon RDS, Microsoft Azure): Queries, Memory Usage, Network, CPU Balance, IOPS
  • NoSQL (MongoDB): Ops Counters (Inserts, Queries, etc.), Network Traffic, Asserts, Locks, Memory Usage
  • Open Source (PostgreSQL, MongoDB, MySQL, MariaDB): Average Query Execution Time, Query Executions, Memory Usage, Wait Time
  • Traditional RDBMS (IBM DB2, MS SQL Server, Oracle): Average Query Execution Time, Query Executions, Memory Usage, Wait Time


In addition to monitoring key metrics for different database types, one of the key differences with SelectStar came from its comprehensive alerts and recommendations system.

The alerts and recommendations are designed to ensure you have an immediate understanding of key issues – and where they are coming from. MonYOG is great at this for MySQL, but falls short in other areas. With SelectStar, you can pinpoint the exact database instance that may be causing the issue, or go further up the chain and see if it’s an issue impacting several database instances at the host level.

Recommendations are often tied to alerts – if you have a red alert, there’s going to be a recommendation tied to it on how you can improve. However, the recommendations pop up even if your database is completely healthy – ensuring that you have visibility into how you can improve your configuration before you actually have an issue impacting performance.

With insight into key metrics, alerts and recommendations, you can fine tune your database performance. In addition, it gives you the opportunity to become more proactive with your database monitoring.

Percona: Is configuring SelectStar difficult?

Cameron: SelectStar is easy to set up – in fact, most customers are up and running in 20 minutes.

Simply head over to the website – selectstar.io – and log in. From there, you’ll be greeted by a welcome screen where you can easily click through and configure a database.

Percona Live Featured Session SelectStar

To configure a database, you select your type:

Percona Live Featured Session SelectStar

And from there, set up your collector by inputting some key information.

Percona Live Featured Session SelectStar

And that’s it! As soon as it’s configured, the collector will start gathering information and data is populated within 20 minutes.

Percona: How does SelectStar work?

Cameron: Using agentless collectors, SelectStar gathers data from both your on-premises and AWS platforms so that you can have insight into all of your database instances.

Percona Live Featured Session SelectStar

The collector is basically an independent machine within your infrastructure that pulls data from your database. It is lightweight, so it doesn’t impact performance. This is a different approach from all of the other monitoring tools.

Percona Live Featured Session

Router Metrics (Shown Above)

Percona Live Featured Session

Mongo relationship tree displaying router, databases, replica set, shards and nodes. (Shown Above)

Percona: Any final thoughts? What are you looking forward to at Percona Live?

Cameron: If you’re in the market for a new database monitoring solution, SelectStar is worth looking at because it covers a breadth of databases with the depth into key metrics, alerts and notifications that optimize performance across your databases. We have a free trial, so you have an easy option to try it. We’re looking forward to meeting with as much of the community as possible, getting feedback and hearing about people’s monitoring needs.

Register for Percona Live Data Performance Conference 2017, and meet the creators of SelectStar. You can find them at selectstar.io. Use the code FeaturedTalk and receive $100 off the current registration price!

Percona Live Data Performance Conference 2017 is the premier open source event for the data performance ecosystem. It is the place to be for the open source community, as well as businesses that thrive in the MySQL, NoSQL, cloud, big data and Internet of Things (IoT) marketplaces. Attendees include DBAs, sysadmins, developers, architects, CTOs, CEOs, and vendors from around the world.

The Percona Live Data Performance Conference will be April 24-27, 2017 at the Hyatt Regency Santa Clara & The Santa Clara Convention Center.

by Manjot Singh at April 04, 2017 04:18 PM

April 03, 2017

Peter Zaitsev

New MariaDB Dashboard in Percona Monitoring and Management Metrics Monitor

MariaDB

In honor of the upcoming MariaDB M17 conference in New York City on April 11-12, we have enhanced Percona Monitoring and Management (PMM) Metrics Monitor with a new MariaDB Dashboard and multiple new graphs!

The Percona Monitoring and Management MariaDB Dashboard builds on the efforts of the MariaDB development team to instrument the Aria Storage Engine Status Variables related to Aria Pagecache and Aria Transaction Log activity, the tracking of Index Condition Pushdown (ICP), InnoDB Online DDL when using ALTER TABLE ... ALGORITHM=INPLACE, InnoDB Deadlocks Detected, and finally InnoDB Defragmentation. This new dashboard is available in Percona Monitoring and Management release 1.1.2. Download it now using our docker, VirtualBox or Amazon AMI installation options!

Percona Monitoring and Management (PMM) is a free and open-source platform for managing and monitoring MySQL®, MariaDB® and MongoDB® performance. You can run PMM in your own environment for maximum security and reliability. It provides thorough time-based analysis for MySQL, MariaDB® and MongoDB servers to ensure that your data works as efficiently as possible.

Aria Pagecache Reads/Writes

MariaDB 5.1 introduced the Aria Storage Engine, which is MariaDB’s MyISAM replacement. Originally known as the Maria storage engine, it was renamed in late 2010 in order to avoid confusion with the overall MariaDB project name. The Aria Pagecache Status Variables graph plots the count of disk block reads and writes, which occur when the data isn’t already in the Aria Pagecache. We also plot the reads and writes from the Aria Pagecache, which count the reads/writes that did not incur a disk lookup (as the data was previously fetched and available from the Aria pagecache):

MariaDB - Aria Pagecache Reads/Writes

Aria Pagecache Blocks

Aria reads and writes to the pagecache in order to cache data in RAM and avoid or delay activity related to disk. Overall, this translates into faster database query response times:

  • Aria_pagecache_blocks_not_flushed: The number of dirty blocks in the Aria pagecache.
  • Aria_pagecache_blocks_unused: Free blocks in the Aria pagecache.
  • Aria_pagecache_blocks_used: Blocks used in the Aria pagecache.

Aria Pagecache Total Blocks is calculated using Aria system variables and the following formula: aria_pagecache_buffer_size / aria_block_size.

MariaDB - Aria Pagecache Blocks
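If you want to compute the same value manually on your own server, a quick sketch using the two system variables:

-- total number of blocks in the Aria pagecache
SELECT @@global.aria_pagecache_buffer_size / @@global.aria_block_size
       AS aria_pagecache_total_blocks;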

Aria Transaction Log Syncs

As Aria strives to be a fully ACID- and MVCC-compliant storage engine, an important factor is support for transactions. A transaction is the unit of work in a database that defines how to implement the four properties of Atomicity, Consistency, Isolation, and Durability (ACID). This graph tracks the rate at which Aria fsyncs the Aria Transaction Log to disk. You can think of this as the “write penalty” for running a transactional storage engine:

MariaDB - Aria Transaction Log Syncs

InnoDB Online DDL

MySQL 5.6 released the concept of an in-place DDL operation via ALTER TABLE ... ALGORITHM=INPLACE, which in some cases avoided performing a table copy and thus didn’t block INSERT/UPDATE/DELETE. MariaDB implemented three measures to track ongoing InnoDB Online DDL operations, which we plot via the following three status variables:

  • Innodb_onlineddl_pct_progress: Shows the progress of the in-place alter table. It might not be accurate, as in-place alter is highly dependent on the disk and buffer pool status
  • Innodb_onlineddl_rowlog_pct_used: Shows row log buffer usage in 5-digit integers (10000 means 100.00%)
  • Innodb_onlineddl_rowlog_rows: Number of rows stored in the row log buffer
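To watch these counters yourself while an in-place ALTER TABLE is running, a simple sketch from a second session:

SHOW GLOBAL STATUS LIKE 'Innodb_onlineddl%';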

MariaDB - InnoDB Online DDL

For more information, please see the MariaDB blog post Monitoring progress and temporal memory usage of Online DDL in InnoDB.

InnoDB Defragmentation

MariaDB merged the Facebook/Kakao defragmentation patch for defragmenting InnoDB tablespaces into their 10.1 release. Your MariaDB instance needs to have started with innodb_defragment=1 and your tables need to be in innodb_file_per_table=1 for this to work. We plot the following three status variables:

  • Innodb_defragment_compression_failures: Number of defragment re-compression failures
  • Innodb_defragment_failures: Number of defragment failures
  • Innodb_defragment_count: Number of defragment operations
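As a minimal sketch of how to exercise this (assuming the server was started with innodb_defragment=1 and innodb_file_per_table=1; the table name is a placeholder), OPTIMIZE TABLE triggers defragmentation and the counters above track it:

OPTIMIZE TABLE mydb.mytable;
SHOW GLOBAL STATUS LIKE 'Innodb_defragment%';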

MariaDB - InnoDB Defragmentation

Index Condition Pushdown

Oracle introduced this in MySQL 5.6. From the manual:

Index Condition Pushdown (ICP) is an optimization for the case where MySQL retrieves rows from a table using an index. Without ICP, the storage engine traverses the index to locate rows in the base table and returns them to the MySQL server which evaluates the WHERE condition for the rows. With ICP enabled, and if parts of the WHERE condition can be evaluated by using only columns from the index, the MySQL server pushes this part of the WHERE condition down to the storage engine. The storage engine then evaluates the pushed index condition by using the index entry and only if this is satisfied is the row read from the table. ICP can reduce the number of times the storage engine must access the base table and the number of times the MySQL server must access the storage engine.
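As a schematic illustration (the table and data are hypothetical, following the pattern used in the MySQL manual), ICP lets the engine evaluate the non-leftmost part of a composite index condition without reading the full row:

CREATE TABLE people (
  id       INT PRIMARY KEY,
  zipcode  VARCHAR(10),
  lastname VARCHAR(50),
  address  VARCHAR(100),
  KEY k_zip_name (zipcode, lastname)
);

EXPLAIN SELECT * FROM people
WHERE zipcode = '95054' AND lastname LIKE '%son%';
-- With ICP enabled, the Extra column shows "Using index condition":
-- the lastname filter is checked against the index entry inside the engine,
-- so fewer full rows are read from the base table.
-- For comparison, disable it with: SET optimizer_switch = 'index_condition_pushdown=off';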

Essentially, the closer that ICP Attempts are to ICP Matches, the better!

MariaDB - Index Condition Pushdown (ICP)

InnoDB Deadlocks Detected (MariaDB 10.1 Only)

Ever since MySQL implemented a transactional storage engine there have been deadlocks. Deadlocks are conditions where different transactions are unable to proceed because each holds a lock that the other needs. In MariaDB 10.1, there is a status variable that counts the occurrences of deadlocks since server startup. Previously, you had to instrument your application to get an accurate count of deadlocks, because otherwise you could miss occurrences if your polling interval wasn’t configured frequently enough (even using pt-deadlock-logger). Unfortunately, this status variable doesn’t appear to be present in the MariaDB 10.2.4 build I tested:
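The counter itself can be read directly on MariaDB 10.1, where the variable is present:

SHOW GLOBAL STATUS LIKE 'Innodb_deadlocks';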

MariaDB - InnoDB Deadlocks Detected

Again, please download Percona Monitoring and Management 1.1.2 to take advantage of the new MariaDB Dashboard and new graphs!  For installation instructions, see the Deployment Guide.

You can see the MariaDB Dashboard and new graphs in action at the PMM Demo site. If you feel the graphs need any tweaking or if I’ve missed anything, leave a note on the blog. You can also write me directly (I look forward to your comments): michael.coburn@percona.com.

To start: on the ICP graph, should we have a line that defines the percentage of successful ICP matches vs. attempts?

by Michael Coburn at April 03, 2017 09:56 PM

MariaDB Foundation

Making life prettier with gdb PrettyPrinting API

Anyone who has peeked inside a gdb manual knows that gdb has some kind of Python API. And anyone who has skimmed through has seen something called “Pretty Printing” that supposedly tells gdb how to print complex data structures in a nice and readable way. Well, at least I have seen that, but I’ve never […]

The post Making life prettier with gdb PrettyPrinting API appeared first on MariaDB.org.

by Sergei at April 03, 2017 06:08 PM

Peter Zaitsev

Percona Monitoring and Management 1.1.2 is Now Available

Percona Monitoring and Management

Percona announces the release of Percona Monitoring and Management 1.1.2 on April 3, 2017.

For installation instructions, see the Deployment Guide.

This release includes several new dashboards in Metrics Monitor, updated versions of software components used in PMM Server, and a number of small bug fixes.

Thank You to the Community!

We would like to mention some of the key contributors in this release, and thank the community for its continued support of PMM.

New Dashboards and Graphs

This release includes the following new dashboards:

MongoDB MMAPv1 Dashboard

MongoDB InMemory Dashboard

  • MariaDB dashboard includes three new graphs for the Aria storage engine. There will be a detailed blog post about monitoring possibilities with these new graphs:

Aria Pagecache reads and writes

Aria Pagecache Blocks

Aria Transaction Log Syncs

The new MariaDB dashboard also includes three new graphs for monitoring InnoDB within MariaDB. We are planning to move them into one of the existing InnoDB dashboards in the next PMM release:

  • The InnoDB Defragmentation graph shows how OPTIMIZE TABLE impacts defragmentation on tables when running MariaDB with innodb_file_per_table=1 and innodb_defragment=1.

InnoDB Defragmentation

  • The InnoDB Online DDL graph includes metrics related to online DDL operations when using ALTER TABLE ... ALGORITHM=INPLACE in MariaDB.

InnoDB Online DDL

  • The InnoDB Deadlocks Detected graph currently works only with MariaDB 10.1. We are planning to add support for MariaDB 10.2, Percona Server, and MySQL in the next PMM release.

InnoDB Deadlocks Detected

  • The Index Condition Pushdown graph shows how InnoDB leverages the Index Condition Pushdown (ICP) routines. Currently this graph works only with MariaDB, but we are planning to add support for Percona Server and MySQL in the next PMM release.

Index Condition Pushdown (ICP)

Updated Software

PMM is based on several third-party open-source software components. We ensure that PMM includes the latest versions of these components in every release, making it the most secure, stable and feature-rich database monitoring platform possible. Here are some highlights of changes in the latest releases:

  • Grafana 4.2 (from 4.1.1)
    • HipChat integration
    • Templating improvements
    • Alerting enhancements
  • Consul 0.7.5 (from 0.7.3)
    • Bug fix for serious server panic
  • Prometheus 1.5.2 (from 1.5.1)
    • Prometheus binaries are built with Go1.7.5
    • Fixed two panic conditions and one series corruption bug
  • Orchestrator 2.0.3 (from 2.0.1)
    • GTID improvements
    • Logging enhancements
    • Improved timing resolution and faster discoveries

Other Changes in PMM Server

  • Migrated the PMM Server docker container to use CentOS 7 as the base operating system.
  • Changed the entry point so that supervisor is PID 1.
  • PMM-633: Set the following default values in my.cnf:
    [mysqld]
    # Default MySQL Settings
    innodb_buffer_pool_size=128M
    innodb_log_file_size=5M
    innodb_flush_log_at_trx_commit=1
    innodb_file_per_table=1
    innodb_flush_method=O_DIRECT
    # Disable Query Cache by default
    query_cache_size=0
    query_cache_type=0
  • PMM-676: Added descriptions for graphs in Disk Performance and Galera dashboards.

Changes in PMM Client

  • Fixed pmm-admin remove --all to clear all saved credentials.
  • Several fixes to mongodb_exporter including PMM-629 and PMM-642.
  • PMM-504: Added ability to change the name of a client with running services:
    $ sudo pmm-admin config --client-name new_name --force

    WARNING: Some Metrics Monitor data may be lost when renaming a running client.

About Percona Monitoring and Management

Percona Monitoring and Management is an open-source platform for managing and monitoring MySQL and MongoDB performance. Percona developed it in collaboration with experts in the field of managed database services, support and consulting.

PMM is a free and open-source solution that you can run in your own environment for maximum security and reliability. It provides thorough time-based analysis for MySQL and MongoDB servers to ensure that your data works as efficiently as possible.

A live demo of PMM is available at pmmdemo.percona.com.

Please provide your feedback and questions on the PMM forum.

If you would like to report a bug or submit a feature request, use the PMM project in JIRA.

by Alexey Zhebel at April 03, 2017 05:12 PM

March 31, 2017

MariaDB Foundation

Extended maintenance period for MariaDB 5.5

As the maintenance policy of the MariaDB Foundation states, we are committed to maintaining each release for 5 years. As MariaDB 5.5 was announced for General Availability in 2012, the five year mark will soon be passed. However, since MariaDB 5.5 is widely used in many major Linux distributions in production use at the moment, […]

The post Extended maintenance period for MariaDB 5.5 appeared first on MariaDB.org.

by Otto Kekäläinen at March 31, 2017 07:23 PM

Jean-Jerome Schmidt

Webinar Replay and Q&A: Load balancing MySQL & MariaDB with ProxySQL & ClusterControl

Thanks to everyone who participated in our recent webinar on how to load balance MySQL and MariaDB with ClusterControl and ProxySQL!

This joint webinar with ProxySQL creator René Cannaò generated a lot of interest … and a lot of questions!

We covered topics such as ProxySQL concepts (with hostgroups, query rules, connection multiplexing and configuration management), went through a live demo of a ProxySQL setup in ClusterControl (try it free) and discussed upcoming ClusterControl features for ProxySQL.

These topics triggered a lot of related questions, to which you can find our answers below.

If you missed the webinar, would like to watch it again or browse through the slides, it is available for viewing online.

Watch the webinar replay

You can also join us for our follow-up webinar next week on Tuesday, April 4th 2017. We’re again joined by René and will be discussing High Availability in ProxySQL.

Sign up for the webinar on HA in ProxySQL

Webinar Questions & Answers

Q. Thank you for your presentation. I have a question about connection multiplexing: does ProxySQL ensure that all statements from start transaction to commit are sent through the same backend connection?

A. This is configurable.

A small preface first: at any time, each client’s session can have one or more backend connections associated with it. A backend connection is associated with a client when a query needs to be executed, and normally it returns immediately back to the connection pool. “Normally” means that there are circumstances when this doesn’t happen. For example, when a transaction starts, the connection is not returned to the connection pool until the transaction completes (either commits or rolls back). This means that all the queries that should be routed to the same hostgroup where the transaction is running are guaranteed to run in the same connection.

Nonetheless, by default, a transaction doesn’t disable query routing. That means that while a transaction is running on one connection to a specific hostgroup, and this connection is associated with only that client, if the client sends a query destined for another hostgroup, that query could be sent to a different connection.

Whether or not a query can be sent to a different connection based on query rules is configurable via the value of mysql_users.transaction_persistent:

  • 0 = queries for different hostgroup can be routed to different connections while a transaction is running;
  • 1 = query routing will be disabled while the transaction is running.

The behaviour is configurable because it depends on the application. Some applications require that all the queries are part of the same transaction; other applications don’t.
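
For reference, a minimal sketch of changing this setting through ProxySQL’s admin interface could look like the following (the username is hypothetical; table and column names are ProxySQL’s own):

-- Minimal sketch: enable transaction persistence for one application user ('app_user' is illustrative).
UPDATE mysql_users SET transaction_persistent=1 WHERE username='app_user';
LOAD MYSQL USERS TO RUNTIME;
SAVE MYSQL USERS TO DISK;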

Q. What is the best way to set up a ProxySQL cluster? The main concern here is configuration of the ProxySQL cascading throughout the cluster.

A. ProxySQL can be deployed in numerous ways.

One typical deployment pattern is to deploy a ProxySQL instance on every application host. The application would then connect to the proxy over a very low-latency connection via a Unix socket. If the number of application hosts increases, you can deploy a middle layer of 3-5 ProxySQL instances and configure all ProxySQL instances on the application servers to connect via this middle layer. Configuration management, typically, would be handled using Puppet/Chef/Ansible infrastructure orchestration tools. You can also easily use home-grown scripts, as ProxySQL’s admin interface is accessible via the MySQL command line and ProxySQL reconfiguration can be done by issuing a couple of SQL statements.
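
As a hedged illustration of that last point, reconfiguring ProxySQL from the admin interface can be as simple as the sketch below (the hostgroup ID and hostname are made up for the example):

-- Minimal sketch: register a new backend and activate the change via the ProxySQL admin interface.
INSERT INTO mysql_servers (hostgroup_id, hostname, port) VALUES (10, '10.0.0.105', 3306);
LOAD MYSQL SERVERS TO RUNTIME;
SAVE MYSQL SERVERS TO DISK;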

Q. How would you recommend to make the ProxySQL layer itself highly available?

A. There are numerous methods to achieve this.

One common method is to deploy a ProxySQL instance on every application host. The application would then connect to the proxy over a very low-latency connection via a Unix socket. In such a deployment there is no single point of failure, as every application host connects to the ProxySQL instance installed locally.

When you implement a middle-layer, you will also maintain HA as 3-5 ProxySQL nodes would be enough to make sure that at least some of them are available for local proxies from application hosts.

Another common method of deploying a highly available ProxySQL setup is to use tools like keepalived along with virtual IP. The application will connect to VIP and this IP will be moved from one ProxySQL instance to another if keepalived detects that something happened to the “main” ProxySQL.

Q. How can ProxySQL use the right hostgroup for each query?

A. ProxySQL routes queries to hostgroups based on query rules - it is up to the user to build a set of rules which makes sense in their environment.
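
To make that concrete, a minimal read/write-split sketch might look like this (the regular expressions and hostgroup IDs are illustrative assumptions, not a recommendation):

-- Minimal sketch: send SELECT ... FOR UPDATE to the writer hostgroup (10), other SELECTs to readers (20).
INSERT INTO mysql_query_rules (rule_id, active, match_digest, destination_hostgroup, apply)
VALUES (1, 1, '^SELECT.*FOR UPDATE', 10, 1);
INSERT INTO mysql_query_rules (rule_id, active, match_digest, destination_hostgroup, apply)
VALUES (2, 1, '^SELECT', 20, 1);
LOAD MYSQL QUERY RULES TO RUNTIME;
SAVE MYSQL QUERY RULES TO DISK;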

Q. Can you tell us more about query mirroring?

A. In general, the implementation of query mirroring in ProxySQL allows you to send traffic to two hostgroups.

Traffic sent to the “main” hostgroup is guaranteed to reach it (unless there are no hosts in that hostgroup); the mirror hostgroup, on the other hand, receives traffic on a “best effort” basis - the query should reach it, but that is not guaranteed.

This limits the usefulness of mirroring as a method to replicate data. It is still an amazing way to do load testing of new hardware or redesigned schema. Of course, mirroring reduces the maximal throughput of the proxy - queries have to be executed twice so the load is also twice as high. The load is not split between the two, but duplicated.
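
A hedged sketch of such a mirroring rule (hostgroup IDs are illustrative; 30 stands for a hypothetical test hostgroup):

-- Minimal sketch: mirror SELECT traffic from hostgroup 20 to a test hostgroup 30.
INSERT INTO mysql_query_rules (rule_id, active, match_digest, destination_hostgroup, mirror_hostgroup, apply)
VALUES (5, 1, '^SELECT', 20, 30, 1);
LOAD MYSQL QUERY RULES TO RUNTIME;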

Q. And what about query caching?

A. The query cache in ProxySQL is implemented as a simple key->value memory store with a Time To Live for every entry. What will be cached and for how long is decided at the query rules level. The user can define a query rule matching a particular query or a wider spectrum of them. To identify a query’s result set in the cache, ProxySQL uses the query hash along with information about the user and schema.

How do you set the TTL for a query? The simplest answer is: to the maximum replication lag which is acceptable for that query. If you are OK reading stale data from a slave which is lagging 10 seconds behind, you should be fine reading stale data from the cache with the TTL set to 10000 milliseconds.
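
Expressed as a hedged sketch against a hypothetical existing rule (cache_ttl is in milliseconds; rule_id 2 is illustrative):

-- Minimal sketch: cache result sets matched by rule 2 for 10 seconds.
UPDATE mysql_query_rules SET cache_ttl=10000 WHERE rule_id=2;
LOAD MYSQL QUERY RULES TO RUNTIME;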

Q. Connection limit to backends?

A. ProxySQL indeed implements a connection limit to backend servers. The maximum number of connections to any backend instance is defined in the mysql_servers table.

Because the same backend server can be present in multiple hostgroups, it is possible to define the maximum number of connections per server per hostgroup.

This is useful, for example, when you want a small pool of connections for specific long-running queries: they get queued without affecting the rest of the traffic destined for the same server.
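
For illustration only (the hostname, hostgroup IDs and limits are assumptions), capping the same backend differently in two hostgroups could look like:

-- Minimal sketch: the same backend limited to 50 connections in hostgroup 30, 500 in hostgroup 20.
UPDATE mysql_servers SET max_connections=50 WHERE hostname='10.0.0.103' AND hostgroup_id=30;
UPDATE mysql_servers SET max_connections=500 WHERE hostname='10.0.0.103' AND hostgroup_id=20;
LOAD MYSQL SERVERS TO RUNTIME;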

Q. Regarding the connection limit from the APP: are connections QUEUED?

A. If you reach the mysql-max_connections limit, further connections will be rejected with the error “Too many connections” (a sketch of adjusting this limit follows the list below).

It is important to remember that there is not a one-to-one mapping between application connections and backend connections.

That means that:

  • Access to the backends can be queued, but connections from the application are either accepted or rejected.
  • A large number of application connections can use a small number of backend connections.
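
As promised, a minimal sketch of raising the client-facing limit (the value is purely illustrative):

-- Minimal sketch: raise the number of client connections ProxySQL itself will accept.
UPDATE global_variables SET variable_value='4096' WHERE variable_name='mysql-max_connections';
LOAD MYSQL VARIABLES TO RUNTIME;
SAVE MYSQL VARIABLES TO DISK;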

Q. I haven’t heard of SHUN before: what does it mean?

A. SHUN means that the backend is temporarily marked as unavailable, but ProxySQL will attempt to connect to it again after mysql-shun_recovery_time_sec seconds.
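
That interval is a global variable you can tune from the admin interface; a hedged sketch (the value shown is arbitrary):

-- Minimal sketch: retry shunned backends after 10 seconds.
UPDATE global_variables SET variable_value='10' WHERE variable_name='mysql-shun_recovery_time_sec';
LOAD MYSQL VARIABLES TO RUNTIME;
SAVE MYSQL VARIABLES TO DISK;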

Q. Is query sharding available across slaves?

A. Depending on the meaning of sharding, ProxySQL can be used to perform sharding across slaves. For example, it is possible to send all traffic for a specific set of tables to a set of slaves (in a hostgroup). Splitting the slaves into multiple hostgroups and sharding queries across them accordingly can improve performance, as each slave won’t have to read from disk the data of tables for which it doesn’t process any queries.

Q. How do you sync the configuration of ProxySQL when you have many instances for H.A ?

A. Configuration management, typically, would be handled using Puppet/Chef/Ansible infrastructure orchestration tools. You can also easily use home-grown scripts as ProxySQL’s admin interface is accessible via MySQL command line and ProxySQL reconfiguration can be done by issuing a couple of SQL statements.

Q. How flexible or feasible is it to change the ProxySQL config online? E.g., if one database slave is down, how is that handled in such a scenario?

A. ProxySQL configuration can be changed at any time; it’s been designed with this level of flexibility in mind.

A ‘database down’ event can be handled in different ways; it depends on how ProxySQL is configured. If you rely on replication hostgroups to define writer and reader hostgroups (this is how ClusterControl deploys ProxySQL), ProxySQL will monitor the state of the read_only variable on hosts in both the reader and writer hostgroups and move hosts between them as needed.

If a master is promoted by external tools (like ClusterControl, for example), the read_only values will change, ProxySQL will detect the topology change and it will act accordingly. For a standard “slave down” scenario there is no action required from the management system - without any change in the read_only value, ProxySQL will just detect that the host is not available, stop sending queries to it, and re-execute on other members of the hostgroup those queries which didn’t complete on the dead slave.

If we are talking about a setup not using replication hostgroups, then it is up to the user and their scripts/tools to implement some sort of logic and reconfigure ProxySQL at runtime using the admin interface. A slave going down, though, most likely wouldn’t require any changes.
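
For completeness, a minimal sketch of how writer/reader replication hostgroups are declared (hostgroup IDs and the comment are illustrative):

-- Minimal sketch: hostgroup 10 holds writers, hostgroup 20 holds readers; membership follows read_only.
INSERT INTO mysql_replication_hostgroups (writer_hostgroup, reader_hostgroup, comment)
VALUES (10, 20, 'cluster1');
LOAD MYSQL SERVERS TO RUNTIME;
SAVE MYSQL SERVERS TO DISK;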

Q. Is it somehow possible to SELECT data from one host group into another host group?

A. No, at this point it is not possible to execute cross-hostgroup queries.

Q. What would be RAM/Disk requirements for logs , etc?

A. It basically depends on the number of log entries and how verbose the ProxySQL log is in your environment. Typically it’s negligible.

Q. Instead of installing ProxySQL on all application servers, could you put a ProxySQL cluster behind a standard load balancer?

A. We see no reason why not. You can put whatever you like in front of ProxySQL - an F5, another layer of software proxies - it is up to you. Please keep in mind, though, that every layer of proxies or load balancers adds latency to your network and, as a result, to your queries.

Q. Can you please comment on Reverse Proxy, whether it can be used in SQL or not?

A. ProxySQL is a Reverse Proxy. Contrary to a Forward Proxy (which acts as an intermediary that simply forwards requests), a Reverse Proxy processes clients’ requests and retrieves data from servers: clients send requests to ProxySQL, which understands each request, analyzes it, and decides what to do - rewrite, cache, block, re-execute on failure, etc.

Q. Does the user authentication layer work with non-local database accounts, e.g. with the pam modules available for proxying LDAP users to local users?

A. There is no direct support for LDAP integration but, as configuration management in ProxySQL is child’s play, it is really simple to put together a script which pulls the user details from LDAP and loads them into ProxySQL. You can use cron to sync it often. All ProxySQL needs is a username and a password hash in MySQL format - this is enough to add a user to ProxySQL.
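
Such a script would, per user, end up issuing something like the sketch below against the admin interface (the username, hash placeholder and hostgroup are assumptions, not real values):

-- Minimal sketch: add a user pulled from LDAP, supplying a mysql_native_password-style hash.
INSERT INTO mysql_users (username, password, default_hostgroup)
VALUES ('ldap_user', '*HASH_IN_MYSQL_NATIVE_PASSWORD_FORMAT', 10);
LOAD MYSQL USERS TO RUNTIME;
SAVE MYSQL USERS TO DISK;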

Q. It seems like the prescribed production deployment includes many proxies - are there any suggestions or upcoming work to address how to make configuration changes across all proxies in a consistent manner?

A. At this point it is recommended to leverage configuration management tools like Chef/Ansible/Puppet to manage ProxySQL’s configuration.

Watch the webinar replay

You can also join us for our follow-up webinar next week on Tuesday, April 4th 2017. We’re again joined by René and will be discussing High Availability in ProxySQL.

Sign up for the webinar on HA in ProxySQL

by krzysztof at March 31, 2017 09:59 AM

March 30, 2017

Peter Zaitsev

Performance Evaluation of SST Data Transfer: With Encryption (Part 2)

SST Data Transfer

In this blog post, we’ll look at the performance of SST data transfer using encryption.

In my previous post, we reviewed SST data transfer in an unsecured environment. Now let’s take a closer look at a setup with encrypted network connections between the donor and joiner nodes.

The base setup is the same as the previous time:

  • Database server: Percona XtraDB Cluster 5.7 on donor node
  • Database: sysbench database – 100 tables, 4M rows each (total ~122GB)
  • Network: donor/joiner hosts are connected with dedicated 10Gbit LAN
  • Hardware: donor/joiner hosts – boxes with 28 Cores+HT/RAM 256GB/Samsung SSD 850/Ubuntu 16.04

The setup details for the encryption aspects in our testing:

  • Cryptography libraries: openssl-1.0.2, openssl-1.1.0, libgcrypt-1.6.5(for xbstream encryption)
  • CPU hardware acceleration for AES – AES-NI: enabled/disabled
  • Ciphers suites: aes(default), aes128, aes256, chacha20(openssl-1.1.0)

Several notes regarding the above aspects:

  • Cryptography libraries. Currently, almost every Linux distribution ships openssl-1.0.2, the previous stable version of the OpenSSL library. The latest stable version (1.1.0) has various performance/scalability fixes and also supports new ciphers that may notably improve throughput. However, it’s problematic to upgrade from 1.0.2 to 1.1.0, or even just to find openssl-1.1.0 packages for existing distributions, because replacing OpenSSL triggers an update/upgrade of a significant number of packages. So in order to use openssl-1.1.0, you will most likely need to build it from source. The same applies to socat – it will require some effort to build socat against openssl-1.1.0.
  • AES-NI. The Advanced Encryption Standard Instruction Set (AES-NI) is an extension to the x86 instruction set from Intel and AMD. The purpose of AES-NI is to improve the performance of encryption and decryption operations that use the Advanced Encryption Standard (AES), like the AES128/AES256 ciphers. If your CPU supports AES-NI, there should be an option in the BIOS that allows you to enable/disable that feature. In Linux, you can check /proc/cpuinfo for the existence of the “aes” flag. If it’s present, then AES-NI is available and exposed to the OS. There is a way to check what acceleration ratio you can expect from it:
    # AES_NI disabled with OPENSSL_ia32cap
    OPENSSL_ia32cap="~0x200000200000000" openssl speed -elapsed -evp aes-128-gcm
    ...
    The 'numbers' are in 1000s of bytes per second processed.
    type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
    aes-128-gcm      57535.13k    65924.18k   164094.81k   175759.36k   178757.63k
    # AES_NI enabled
    openssl speed -elapsed -evp aes-128-gcm
    The 'numbers' are in 1000s of bytes per second processed.
    type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
    aes-128-gcm     254276.67k   620945.00k   826301.78k   906044.07k   923740.84k

    Our interest is in the very last column: 178MB/s (without AES-NI) vs. 923MB/s (with AES-NI).
  • Ciphers. In our testing for network encryption with socat+openssl 1.0.2/1.1.0, we used the following cipher suites:
    DEFAULT – if you don’t specify a cipher/cipher string for the OpenSSL connection, this suite will be used
    AES128 – suite with aes128 ciphers only
    AES256 – suite with aes256 ciphers only
    Additionally, for openssl-1.1.0, there is an extra cipher suite:
    CHACHA20 – cipher suites using the ChaCha20 algorithm
    In the case of xtrabackup, where internal encryption is based on libgcrypt, we use the AES128/AES256 ciphers from this library.
  • SST methods. Streaming database files from the donor to the joiner with the rsync protocol over an OpenSSL-encrypted connection:
    (donor) rsync | socat+ssl       socat+ssl| rsync(daemon mode) (joiner)

    The current approach of wsrep_sst_rsync.sh doesn’t allow you to use the rsync SST method with SSL. However, there is a project that tries to address the lack of SSL support for the rsync method. The idea is to create a secure connection with socat and then use that connection as a tunnel to connect rsync between the joiner and donor hosts. In my testing, I used a similar approach.

    Also note that in the chart below there are results for two variants of rsync: “rsync” (the current approach) and “rsync_improved” (the improved one). I explained the difference between them in my previous post.

  • Backup data on the donor side and stream it to the joiner in xbstream format over an OpenSSL encrypted connection

    (donor) xtrabackup| socat+ssl  socat+ssl | xbstream (joiner)

    In my testing for streaming over encrypted connections, I used the --parallel=4 option for xtrabackup. In my previous post, I showed that this is an important factor for getting the best time. There is also a way to pass the name of the cipher that socat will use for the OpenSSL connection to the wsrep_sst_xtrabackup-v2.sh script with the sockopt option. For instance:

    [sst]
    inno-backup-opts="--parallel=4"
    sockopt=",cipher=AES128"

  • Backup data on the donor side, encrypt it internally (with libgcrypt) and stream the data to the joiner in xbstream format, then decrypt the files on the joiner

    (donor) xtrabackup | socat   socat | xbstream ; xtrabackup decrypt (joiner)

    The xtrabackup tool can encrypt data while performing a backup. That encryption is based on the libgcrypt library, and it’s possible to use the AES128 or AES256 ciphers. For encryption, it’s necessary to generate a key and then provide it to xtrabackup so it can encrypt on the fly. You can specify the number of threads that will encrypt data, along with the chunk size, to tune the encryption process.

    The current version of xtrabackup supports an efficient way to read, compress and encrypt data in parallel, and then write/stream it. On the receiving side, however, we can’t decompress/decrypt the stream on the fly. First, the stream has to be received/written to disk with the xbstream tool, and only after that can you use xtrabackup with the --decrypt/--decompress modes to unpack the data. The inability to process data on the fly, and the need to save the stream to disk for later processing, has a notable impact on stream time from the donor to the joiner. We plan to fix that issue, so that encryption+compression+streaming of data with xtrabackup happens without the need to write the stream to disk on the receiver side.

    For my testing, in the case of xtrabackup with internal encryption, I didn’t use SSL encryption for socat.

Results (click on the image for an enlarged view):

SST Data Transfer

Observations:

  • Transferring data with rsync is very inefficient, and the improved version is 2-2.5 times faster. Also, you may note that in the case of “no-aes-ni”, the rsync_improved method has the best time for the default/aes128/aes256 ciphers. The reason is that we perform both data transfers in parallel (we spawn an rsync process for each file), as well as encryption/decryption (socat forks extra processes for each stream). This approach allows us to compensate for the absence of hardware acceleration by using several CPU cores. In all other cases, we only use one CPU for streaming of data and encryption/decryption.
  • xtrabackup (with hardware-optimized crc32) shows the best time in all cases, except for the default/aes128/aes256 ciphers in “no-aes-ni” mode (where rsync_improved showed the best time). However, I would like to remind you that SST with rsync is a blocking operation, and during the data transfer the donor node becomes READ-ONLY. xtrabackup, on the other hand, uses backup locks and allows any operations on the donor node during SST.
  • On the boxes without hardware acceleration (no-aes-ni mode), the chacha20 cipher allows you to perform data transfer 2-3 times faster. It’s a very good replacement for “aes” ciphers on such boxes. However, the problem with that cipher is that it is available only in openssl-1.1.0. In order to use it, you will need a custom build of OpenSSL and socat for many distros.
  • Regarding xtrabackup with internal encryption (xtrabackup_enc): reading/encrypting and streaming data is quite fast, especially with the latest libgcrypt library (1.7.x). The problem is decryption. As I’ve explained above, right now we need to get the stream and save encrypted data to storage first, and then perform the extra step of reading/decrypting and saving the data back. That extra part consumes 2/3 of the total time. Improving the xbstream tool to perform stream decryption/decompression on the fly would allow you to get very good results.

Testing Details

For the purposes of testing, I’ve created a script, “sst-bench.sh”, that covers all the methods used in this post. You can use it to measure all the above SST methods in your environment. In order to run the script, you have to adjust several environment variables at the beginning of the script: joiner ip, datadirs location on the joiner and donor hosts, etc. After that, put the script on the “donor” and “joiner” hosts and run it as follows:

#joiner_host>
sst_bench.sh --mode=joiner --sst-mode=<tar|xbackup|rsync> --cipher=<DEFAULT|AES128|AES256|CHACHA20> --ssl=<0|1> --aesni=<0|1>
#donor_host>
sst_bench.sh --mode=donor --sst-mode=<tar|xbackup|rsync|rsync_improved> --cipher=<DEFAULT|AES128|AES256|CHACHA20> --ssl=<0|1> --aesni=<0|1>

by Alexey Stroganov at March 30, 2017 09:08 PM

March 29, 2017

Peter Zaitsev

Performance Evaluation of SST Data Transfer: Without Encryption (Part 1)

SST Data Transfer

In this blog, we’ll look at evaluating the performance of an SST data transfer without encryption.

A State Snapshot Transfer (SST) operation is an important part of Percona XtraDB Cluster. It’s used to provision the joining node with all the necessary data. There are three methods of SST operation available: mysqldump, rsync, xtrabackup. The most advanced one – xtrabackup – is the default method for SST in Percona XtraDB Cluster.

We decided to evaluate the current state of xtrabackup, focusing on the process of transferring data between the donor and joiner nodes to find out if there is any room for improvements or optimizations.

Taking into account that the security of the network connections used for Percona XtraDB Cluster deployment is one of the most important factors that affects SST performance, we will evaluate SST operations in two setups: without network encryption, and in a secure environment.

In this post, we will take a look at the setup without network encryption.

Setup:

  • database server: Percona XtraDB Cluster 5.7 on the donor node
  • database: sysbench database – 100 tables, 4M rows each (total ~122GB)
  • network: donor/joiner hosts are connected with dedicated 10Gbit LAN
  • hardware: donor/joiner hosts – boxes with 28 Cores+HT/RAM 256GB/Samsung SSD 850/Ubuntu 16.04

In our test, we will measure the amount of time it takes to stream all necessary data from the donor to the joiner with the help of one of the SST methods.

Before testing, I measured read/write bandwidth limits of the attached SSD drives (with the help of sysbench/fileio): they are ~530-540MB/sec. That means that the best theoretical time to transfer all of our database files (122GB) is ~230sec.

Schematic view of SST methods:

  • Streaming DB files from the donor to joiner with tar
    (donor) tar | socat                         socat | tar (joiner)

    • tar is not really an SST method. It’s used here just to get some baseline numbers to understand how long it takes to transfer data without extra overhead.
  • Streaming DB files from the donor to joiner with rsync protocol
    (donor) rsync                               rsync(daemon mode) (joiner)

    • While working on the testing of the rsync SST method, I found that the current way of data streaming is quite inefficient: rsync parallelization is directory-based, not file-based. So if you have three directories – for instance sbtest (100 files/100GB), mysql (75 files/10MB), performance_schema (88 files/1M) – the rsync SST script will start three rsync processes, where each process handles its own directory. As a result, instead of a parallel transfer we end up with one stream that only streams the largest directory (sbtest). Replacing that approach with one that iterates over all files in the datadir and queues them to rsync workers allows us to speed up the data transfer 2-3 times. On the charts, ‘rsync’ is the current approach and ‘rsync_improved’ is the improved one.
  • Backup data on the donor side and stream it to the joiner in xbstream format
    (donor) xtrabackup | socat  socat | xbstream (joiner)

At the end of this post, you will find the command lines used for testing each SST method.

SST Data Transfer

Streaming of our database files with tar took a minimal amount of time, and it’s very close to the best possible time (~230sec). xtrabackup is slower (~2x), as is rsync (~3x).

From profiling xtrabackup, we can clearly see two things:

  1. IO utilization is quite low
  2. A notable amount of time was spent in crc32 computation

Issue 1
xtrabackup can process data in parallel; however, by default it does so with a single thread only. Our tests showed that increasing the number of parallel threads to 2/4 with the --parallel option improves IO utilization and reduces streaming time. One can pass this option to xtrabackup by adding the following to the [sst] section of my.cnf:

[sst]
inno-backup-opts="--parallel=4"

Issue 2
By default xtrabackup uses software-based crc32 functions from the libz library. Replacing this function with a hardware-optimized one allows a notable reduction in CPU usage and a speedup in data transfer. This fix will be included in the next release of xtrabackup.

SST Data Transfer

We ran more tests for xtrabackup with the parallel option and hardware optimized crc32, and got results that confirm our analysis. Streaming time for xtrabackup is now very close to baseline and storage limits.

Testing details

For the purposes of testing, I’ve created a script, “sst-bench.sh”, that covers all the methods used in this post. You can use it to measure all the above SST methods in your environment. In order to run the script, you have to adjust several environment variables at the beginning, such as joiner ip, datadirs location on the joiner and donor hosts, etc. After that, put the script on the “donor” and “joiner” hosts and run it as follows:

#joiner_host> sst_bench.sh --mode=joiner --sst-mode=<tar|xbackup|rsync>
#donor_host>  sst_bench.sh --mode=donor --sst-mode=<tar|xbackup|rsync|rsync_improved>

by Alexey Stroganov at March 29, 2017 09:10 PM

Webinar Thursday 3/30: MyRocks Troubleshooting

MyRocks Troubleshooting

Please join Percona’s Principal Technical Services Engineer Sveta Smirnova and Senior Software Engineer George Lorch, MariaDB’s Query Optimizer Developer Sergei Petrunia, and Facebook’s Database Engineer Yoshinori Matsunobu as they present MyRocks Troubleshooting on March 30, 2017 at 11:00 am PDT / 2:00 pm EDT (UTC-7).

MyRocks is an alternative storage engine designed for flash storage. It provides great write workload performance and space efficiency. Like any other powerful engine, it has its own specific configuration scenarios that require special troubleshooting solutions.

This webinar will discuss how to deal with:

  • Data corruption issues
  • Inconsistent data
  • Locks
  • Slow performance

We will use well-known instruments and tools, as well as MyRocks-specific tools, and demonstrate how they work with the MyRocks storage engine.

Register for this webinar here.

by Dave Avery at March 29, 2017 08:51 PM

Jean-Jerome Schmidt

MySQL Tutorial - Troubleshooting MySQL Replication Part 2

In the previous post, we discussed how to verify that MySQL Replication is in good shape. We also looked at some of the typical problems. In this post, we will have a look at some more issues that you might see when dealing with MySQL replication.

Missing or Duplicated Entries

This is something which should not happen, yet it happens very often - a situation in which an SQL statement executed on the master succeeds but the same statement executed on one of the slaves fails. The main reason is slave drift - something (usually errant transactions, but also other issues or bugs in the replication) causes the slave to differ from its master. For example, a row which existed on the master does not exist on a slave, so it cannot be deleted or updated there. How often this problem shows up depends mostly on your replication settings. In short, there are three ways in which MySQL stores binary log events. The first, “statement”, means that SQL is written in plain text, just as it was executed on the master. This setting has the highest tolerance of slave drift, but it’s also the one which cannot guarantee slave consistency - it’s hard to recommend it for production use. The second format, “row”, stores the resulting row changes instead of the query statement. For example, an event may look like below:

### UPDATE `test`.`tab`
### WHERE
###   @1=2
###   @2=5
### SET
###   @1=2
###   @2=4

This means that we are updating a row in the ‘tab’ table in the ‘test’ schema where the first column has a value of 2 and the second column has a value of 5. We set the first column to 2 (the value doesn’t change) and the second column to 4. As you can see, there’s not much room for interpretation - it’s precisely defined which row is used and how it’s changed. As a result, this format is great for slave consistency but, as you can imagine, it’s very vulnerable when it comes to data drift. Still, it is the recommended way of running MySQL replication.

Finally, the third one, “mixed”, works in such a way that events which are safe to write in the form of statements use the “statement” format, while those which could cause data drift use the “row” format.
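
If you want to check or switch which format a server uses, a minimal sketch could look like this (remember that a permanent change also belongs in my.cnf):

-- Minimal sketch: inspect and switch the binary log format (ROW is the recommended setting).
-- Note: SET GLOBAL only affects new sessions; persist the change in my.cnf as well.
SHOW GLOBAL VARIABLES LIKE 'binlog_format';
SET GLOBAL binlog_format = 'ROW';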

How do you detect them?

As usual, SHOW SLAVE STATUS will help us identify the problem.

               Last_SQL_Errno: 1032
               Last_SQL_Error: Could not execute Update_rows event on table test.tab; Can't find record in 'tab', Error_code: 1032; handler error HA_ERR_KEY_NOT_FOUND; the event's master log binlog.000021, end_log_pos 970
               Last_SQL_Errno: 1062
               Last_SQL_Error: Could not execute Write_rows event on table test.tab; Duplicate entry '3' for key 'PRIMARY', Error_code: 1062; handler error HA_ERR_FOUND_DUPP_KEY; the event's master log binlog.000021, end_log_pos 1229

As you can see, the errors are clear and self-explanatory (and they are basically identical between MySQL and MariaDB).

How do you fix the issue?

This is, unfortunately, the complex part. First of all, you need to identify the source of truth. Which host contains the correct data? Master or slave? Usually you’d assume it’s the master, but don’t assume it by default - investigate! It could be that after a failover, some part of the application still issued writes to the old master, which now acts as a slave. It could be that read_only hasn’t been set correctly on that host, or maybe the application uses a superuser to connect to the database (yes, we’ve seen this in production environments). In such a case, the slave could be the source of truth - at least to some extent.
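
As a quick, hedged illustration of that check (the exact variables you rely on will differ; super_read_only requires MySQL 5.7+), verifying and locking down a demoted master might look like:

-- Hedged sketch: verify whether the old master still accepts writes after a failover.
SHOW GLOBAL VARIABLES LIKE '%read_only%';
-- If it does and it should not, lock it down before investigating further
-- (super_read_only, available in MySQL 5.7+, also blocks users with the SUPER privilege).
SET GLOBAL super_read_only = ON;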

Depending on which data should stay and which should go, the best course of action would be to identify what’s needed to get replication back in sync. First of all, replication is broken so you need to attend to this. Log into the master and check the binary log event that caused replication to break.

           Retrieved_Gtid_Set: 5d1e2227-07c6-11e7-8123-080027495a77:1106672
            Executed_Gtid_Set: 5d1e2227-07c6-11e7-8123-080027495a77:1-1106671

As you can see, we miss one event: 5d1e2227-07c6-11e7-8123-080027495a77:1106672. Let’s check it in the master’s binary logs:

mysqlbinlog -v --include-gtids='5d1e2227-07c6-11e7-8123-080027495a77:1106672' /var/lib/mysql/binlog.000021
#170320 20:53:37 server id 1  end_log_pos 1066 CRC32 0xc582a367     GTID    last_committed=3    sequence_number=4
SET @@SESSION.GTID_NEXT= '5d1e2227-07c6-11e7-8123-080027495a77:1106672'/*!*/;
# at 1066
#170320 20:53:37 server id 1  end_log_pos 1138 CRC32 0x6f33754d     Query    thread_id=5285    exec_time=0    error_code=0
SET TIMESTAMP=1490043217/*!*/;
SET @@session.pseudo_thread_id=5285/*!*/;
SET @@session.foreign_key_checks=1, @@session.sql_auto_is_null=0, @@session.unique_checks=1, @@session.autocommit=1/*!*/;
SET @@session.sql_mode=1436549152/*!*/;
SET @@session.auto_increment_increment=1, @@session.auto_increment_offset=1/*!*/;
/*!\C utf8 *//*!*/;
SET @@session.character_set_client=33,@@session.collation_connection=33,@@session.collation_server=8/*!*/;
SET @@session.lc_time_names=0/*!*/;
SET @@session.collation_database=DEFAULT/*!*/;
BEGIN
/*!*/;
# at 1138
#170320 20:53:37 server id 1  end_log_pos 1185 CRC32 0xa00b1f59     Table_map: `test`.`tab` mapped to number 571
# at 1185
#170320 20:53:37 server id 1  end_log_pos 1229 CRC32 0x5597e50a     Write_rows: table id 571 flags: STMT_END_F

BINLOG '
UUHQWBMBAAAALwAAAKEEAAAAADsCAAAAAAEABHRlc3QAA3RhYgACAwMAAlkfC6A=
UUHQWB4BAAAALAAAAM0EAAAAADsCAAAAAAEAAgAC//wDAAAABwAAAArll1U=
'/*!*/;
### INSERT INTO `test`.`tab`
### SET
###   @1=3
###   @2=7
# at 1229
#170320 20:53:37 server id 1  end_log_pos 1260 CRC32 0xbbc3367c     Xid = 5224257
COMMIT/*!*/;

We can see it was an insert which sets the first column to 3 and the second to 7. Let’s verify what our table looks like now:

mysql> SELECT * FROM test.tab;
+----+------+
| id | b    |
+----+------+
|  1 |    2 |
|  2 |    4 |
|  3 |   10 |
+----+------+
3 rows in set (0.01 sec)

Now we have two options, depending on which data should prevail. If the correct data is on the master, we can simply delete the row with id=3 on the slave. Just make sure you disable binary logging to avoid introducing errant transactions. On the other hand, if we decide that the correct data is on the slave, we need to run a REPLACE command on the master to change the row with id=3 from its current content of (3, 7) to the correct content of (3, 10). On the slave, though, we will have to skip the current GTID (or, to be more precise, create an empty GTID event) to be able to restart replication.

Deleting a row on a slave is simple:

SET SESSION log_bin=0; DELETE FROM test.tab WHERE id=3; SET SESSION log_bin=1;
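
If, on the other hand, the slave is the source of truth, the master-side fix described above might look like the following sketch (table and values taken from this example):

-- Hedged sketch: on the master, make the row match the slave's content (3, 10).
-- This statement is written to the binary log, so it will also replicate out.
REPLACE INTO test.tab (id, b) VALUES (3, 10);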

Inserting an empty GTID is almost as simple:

mysql> SET @@SESSION.GTID_NEXT= '5d1e2227-07c6-11e7-8123-080027495a77:1106672';
Query OK, 0 rows affected (0.00 sec)
mysql> BEGIN;
Query OK, 0 rows affected (0.00 sec)
mysql> COMMIT;
Query OK, 0 rows affected (0.00 sec)
mysql> SET @@SESSION.GTID_NEXT=automatic;
Query OK, 0 rows affected (0.00 sec)

Another method of solving this particular issue (as long as we accept the master as the source of truth) is to use tools like pt-table-checksum and pt-table-sync to identify where the slave is not consistent with its master and what SQL has to be executed on the master to bring the slave back in sync. Unfortunately, this method is rather on the heavy side - a lot of load is added to the master, and a bunch of queries are written into the replication stream, which may affect lag on slaves and the general performance of the replication setup. This is especially true if there is a significant number of rows which need to be synced.

Finally, as always, you can rebuild your slave using data from the master - this way you can be sure that the slave will be refreshed with the freshest, up-to-date data. This is, actually, not necessarily a bad idea - when we are talking about a large number of rows to sync, using pt-table-checksum/pt-table-sync comes with significant overhead in replication performance, overall CPU and I/O load, and man-hours required.

ClusterControl allows you to rebuild a slave, using a fresh copy of the master data.

Consistency checks

As we mentioned in the previous chapter, consistency can become a serious issue and can cause lots of headaches for users running MySQL replication setups. Let’s see how you can verify that your MySQL slaves are in sync with the master and what you can do about it.

How to detect an inconsistent slave

Unfortunately, the typical way a user gets to know that a slave is inconsistent is by running into one of the issues we mentioned in the previous chapter. To avoid that, proactive monitoring of slave consistency is required. Let’s check how it can be done.

We are going to use a tool from Percona Toolkit: pt-table-checksum. It is designed to scan a replication cluster and identify any discrepancies.

We built a custom scenario using sysbench, and we introduced a bit of inconsistency on one of the slaves. Importantly (if you’d like to test it like we did), you need to apply the patch below to force pt-table-checksum to recognize the ‘sbtest’ schema as a non-system schema:

--- pt-table-checksum    2016-12-15 14:31:07.000000000 +0000
+++ pt-table-checksum-fix    2017-03-21 20:32:53.282254794 +0000
@@ -7614,7 +7614,7 @@

    my $filter = $self->{filters};

-   if ( $db =~ m/information_schema|performance_schema|lost\+found|percona|percona_schema|test/ ) {
+   if ( $db =~ m/information_schema|performance_schema|lost\+found|percona|percona_schema|^test/ ) {
       PTDEBUG && _d('Database', $db, 'is a system database, ignoring');
       return 0;
    }

First, we are going to execute pt-table-checksum in the following way:

master:~# ./pt-table-checksum  --max-lag=5 --user=sbtest --password=sbtest --no-check-binlog-format --databases='sbtest'
            TS ERRORS  DIFFS     ROWS  CHUNKS SKIPPED    TIME TABLE
03-21T20:33:30      0      0  1000000      15       0  27.103 sbtest.sbtest1
03-21T20:33:57      0      1  1000000      17       0  26.785 sbtest.sbtest2
03-21T20:34:26      0      0  1000000      15       0  28.503 sbtest.sbtest3
03-21T20:34:52      0      0  1000000      18       0  26.021 sbtest.sbtest4
03-21T20:35:34      0      0  1000000      17       0  42.730 sbtest.sbtest5
03-21T20:36:04      0      0  1000000      16       0  29.309 sbtest.sbtest6
03-21T20:36:42      0      0  1000000      15       0  38.071 sbtest.sbtest7
03-21T20:37:16      0      0  1000000      12       0  33.737 sbtest.sbtest8

A couple of important notes on how we invoked the tool. First of all, the user that we set has to exist on all slaves. If you want, you can also use ‘--slave-user’ to define another, less privileged user to access the slaves. Another thing worth explaining - we use row-based replication, which is not fully compatible with pt-table-checksum. With row-based replication, pt-table-checksum will change the binary log format at the session level to ‘statement’, as this is the only format it supports. The problem is that such a change will only work on the first level of slaves, which are directly connected to the master. If you have intermediate masters (so, more than one level of slaves), using pt-table-checksum may break the replication. This is why, by default, if the tool detects row-based replication, it exits and prints an error:

“Replica slave1 has binlog_format ROW which could cause pt-table-checksum to break replication. Please read "Replicas using row-based replication" in the LIMITATIONS section of the tool's documentation. If you understand the risks, specify --no-check-binlog-format to disable this check.”

We used only one level of slaves so it was safe to specify “--no-check-binlog-format” and move forward.

Finally, we set the maximum lag to 5 seconds. If this threshold is reached, pt-table-checksum will pause for the time needed to bring the lag back under the threshold.

As you could see from the output,

03-21T20:33:57      0      1  1000000      17       0  26.785 sbtest.sbtest2

an inconsistency has been detected on table sbtest.sbtest2.

By default, pt-table-checksum stores checksums in percona.checksums table. This data can be used for another tool from Percona Toolkit, pt-table-sync, to identify which parts of the table should be checked in detail to find exact difference in data.


How to fix inconsistent slave

As mentioned above, we will use pt-table-sync to do that. In our case we are going to use the data collected by pt-table-checksum, although it is also possible to point pt-table-sync at two hosts (the master and a slave) and have it compare all data on both hosts. That is a definitely more time- and resource-consuming process; therefore, as long as you already have data from pt-table-checksum, it’s much better to use it. This is how we executed it to test the output:

master:~# ./pt-table-sync --user=sbtest --password=sbtest --databases=sbtest --replicate percona.checksums h=master --print
REPLACE INTO `sbtest`.`sbtest2`(`id`, `k`, `c`, `pad`) VALUES ('1', '434041', '61753673565-14739672440-12887544709-74227036147-86382758284-62912436480-22536544941-50641666437-36404946534-73544093889', '23608763234-05826685838-82708573685-48410807053-00139962956') /*percona-toolkit src_db:sbtest src_tbl:sbtest2 src_dsn:h=10.0.0.101,p=...,u=sbtest dst_db:sbtest dst_tbl:sbtest2 dst_dsn:h=10.0.0.103,p=...,u=sbtest lock:1 transaction:1 changing_src:percona.checksums replicate:percona.checksums bidirectional:0 pid:25776 user:root host:vagrant-ubuntu-trusty-64*/;

As you can see, some SQL has been generated as a result. The --replicate option is important to note: with it we point pt-table-sync to the table generated by pt-table-checksum. We also point it to the master.

To verify whether the SQL makes sense, we used the --print option. Please note the generated SQL is valid only at the time it’s generated - you cannot really store it somewhere, review it and then execute it later. All you can do is verify that the SQL makes sense and, immediately afterwards, re-execute the tool with the --execute flag:

master:~# ./pt-table-sync --user=sbtest --password=sbtest --databases=sbtest --replicate percona.checksums h=10.0.0.101 --execute

This should bring the slave back in sync with the master. We can verify this with pt-table-checksum:

root@vagrant-ubuntu-trusty-64:~# ./pt-table-checksum  --max-lag=5 --user=sbtest --password=sbtest --no-check-binlog-format --databases='sbtest'
            TS ERRORS  DIFFS     ROWS  CHUNKS SKIPPED    TIME TABLE
03-21T21:36:04      0      0  1000000      13       0  23.749 sbtest.sbtest1
03-21T21:36:26      0      0  1000000       7       0  22.333 sbtest.sbtest2
03-21T21:36:51      0      0  1000000      10       0  24.780 sbtest.sbtest3
03-21T21:37:11      0      0  1000000      14       0  19.782 sbtest.sbtest4
03-21T21:37:42      0      0  1000000      15       0  30.954 sbtest.sbtest5
03-21T21:38:07      0      0  1000000      15       0  25.593 sbtest.sbtest6
03-21T21:38:27      0      0  1000000      16       0  19.339 sbtest.sbtest7
03-21T21:38:44      0      0  1000000      15       0  17.371 sbtest.sbtest8

As you can see, there are no diffs anymore in sbtest.sbtest2 table.

We hope you found this blog post informative and useful. Click here to learn more about MySQL Replication. If you have any questions or suggestions, feel free to reach us through comments below.

by krzysztof at March 29, 2017 08:13 AM