Planet MariaDB

December 12, 2018

Jean-Jerome Schmidt

Webinar Replay: How to Manage Replication Failover Processes for MySQL, MariaDB & PostgreSQL

If you’re looking to minimize downtime and meet your SLAs through an automated or semi-automated approach, then this webinar replay is for you:

A detailed overview of what failover processes may look like in MySQL, MariaDB and PostgreSQL replication setups.

Failover is the process of moving to a healthy standby component, during a failure or maintenance event, in order to preserve uptime. The quicker it can be done, the faster you can be back online.

However, failover can be tricky for transactional database systems as we strive to preserve data integrity - especially in asynchronous or semi-synchronous topologies.

There are risks associated: from diverging datasets to loss of data. Failing over due to incorrect reasoning, e.g., failed heartbeats in the case of network partitioning, can also cause significant harm.

In this webinar we cover the dangers related to the failover process, and discuss the tradeoffs between failover speed and data integrity. We also find out how to shield applications from database failures with the help of proxies.

And we will finally have a look at how ClusterControl manages the failover process, and how it can be configured for both assisted and automated failover.

Agenda

  • An introduction to failover - what, when, how
    • in MySQL / MariaDB
    • in PostgreSQL
  • To automate or not to automate
  • Understanding the failover process
  • Orchestrating failover across the whole HA stack
  • Difficult problems
    • Network partitioning
    • Missed heartbeats
    • Split brain
  • From assisted to fully automated failover with ClusterControl
    • Demo

Speaker

Krzysztof Książek, Senior Support Engineer at Severalnines, is a MySQL DBA with experience managing complex database environments for companies like Zendesk, Chegg, Pinterest and Flipboard.

by jj at December 12, 2018 02:21 PM

Peter Zaitsev

AWS Elastic Block Storage (EBS) – Can We Get It Truly Elastic?

At AWS re:Invent 2018 there were many great announcements of new AWS services and features, but one basic feature I’ve been waiting on for years is still nowhere to be found.

AWS Elastic Block Storage (EBS) is great and it’s got better through the years, adding different storage types and features like Provisioned IOPS. However, it still has the most basic inconvenient requirement – I have to decide in advance how much space I need to allocate, and pay for all of that allocated space whether I use it or not.

It would be so much better if AWS allowed true consumption model pricing with EBS, where you pay for the storage used, not the storage allocated. This is already the case for S3, RDS, and even EC2 instances (with the Unlimited option on T2/T3 instances), not to mention serverless-focused services.

For example, I would love to be able to create a 1TB EBS volume but only pay for 10GB of storage if I only use this amount of space.

Modern storage subsystems do a good job differentiating between the space available on the block device and what’s being used by user files and filesystem metadata. The space that’s not allocated any more can be TRIMmed. This is a basic requirement for working well on flash storage, and as modern EC2 instances already provision EBS storage as emulated NVMe devices, I would imagine Amazon could hook into such functionality to track space actually used.
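On Linux this reclamation can be triggered explicitly on a discard-capable device; for example (the mount point here is purely illustrative):

$ sudo fstrim -v /mnt/data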

For us at Percona this would make shipping applications on AWS Marketplace much more convenient. Right now, for Percona Monitoring and Management (PMM) we have to choose how much space to allocate to the EBS volume by default, picking between making it expensive to run because of a large, mostly unused EBS volume, or setting a very limited default capacity that requires user action to resize the EBS volume later. Consumption-based EBS pricing would solve this dilemma.

This problem seems to be well recognized and understood. For example, Pure Storage Cloud Block Storage (currently in beta) is expected to have such a feature.

I hope with its insane customer focus AWS will add this feature in the future, but currently we have to get by without it.


Image: Arnold Reinhold [CC BY-SA 2.5], via Wikimedia Commons

by Peter Zaitsev at December 12, 2018 12:28 PM

Percona XtraDB Cluster Operator Is Now Available as an Early Access Release

Percona XtraDB Cluster Operator

Percona announces the early access release of Percona XtraDB Cluster Operator.

Note: PerconaLabs and Percona-QA are open source GitHub repositories for unofficial scripts and tools created by Percona staff. These handy utilities can help you save time and effort.

Percona software builds located in the PerconaLabs and Percona-QA repositories are not officially released software, and also aren’t covered by Percona support or services agreements.

Percona XtraDB Cluster Operator simplifies the deployment and management of Percona XtraDB Cluster in a Kubernetes or OpenShift environment. Kubernetes and the Kubernetes-based OpenShift platform provide users with a distributed orchestration system that automates the deployment, management and scaling of containerized applications.

It extends the Kubernetes API with a new custom resource for deploying, configuring and managing the application through the whole life cycle. You can compare the Kubernetes Operator to a System Administrator who deploys the application and watches the Kubernetes events related to it, taking administrative/operational actions when needed.

The Percona XtraDB Cluster Operator on PerconaLabs is an early access release. It is not recommended for production environments. 

Percona XtraDB Cluster Operator can be installed on Kubernetes or OpenShift. While the operator does not support all the Percona XtraDB Cluster features in this early access release, instructions on how to install and configure it are already available along with the operator source code, hosted in our GitHub repository.
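As an illustrative sketch only (file names and the exact repository layout may differ in this early access release; see the repository’s README for the authoritative steps), the installation boils down to registering the custom resource definition and starting the operator:

$ git clone https://github.com/Percona-Lab/percona-xtradb-cluster-operator
$ cd percona-xtradb-cluster-operator
$ kubectl apply -f deploy/crd.yaml       # register the PerconaXtraDBCluster custom resource definition
$ kubectl apply -f deploy/rbac.yaml      # service account and role bindings for the operator
$ kubectl apply -f deploy/operator.yaml  # the operator deployment itself
$ kubectl apply -f deploy/cr.yaml        # an example PerconaXtraDBCluster custom resource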

The operator was developed with high availability in mind, so it will attempt to run ProxySQL and XtraDB Cluster instances on separate worker nodes if possible, deploying the database cluster on at least three member nodes.

Percona XtraDB Cluster is an open source, cost-effective and robust clustering solution for businesses that integrates Percona Server for MySQL with the Galera replication library to produce a highly-available and scalable MySQL® cluster complete with synchronous multi-master replication, zero data loss and automatic node provisioning using Percona XtraBackup.

Help us improve our software quality by reporting any bugs you encounter using our bug tracking system.

by Dmitriy Kostiuk at December 12, 2018 01:15 AM

December 11, 2018

Peter Zaitsev

Upcoming Webinar Wed 12/12: MySQL 8 for Developers

MySQL 8 for Developers

Please join Percona’s CEO Peter Zaitsev as he presents MySQL 8 for Developers on Wednesday, December 12th, 2018 at 11:00 AM PST (UTC-8) / 2:00 PM EST (UTC-5).

Register Now

There are many great new features in MySQL 8, but how exactly can they help your application? This session takes a practical look at MySQL 8 features, details which limitations of previous MySQL versions are overcome by MySQL 8, and discusses what you can do with MySQL 8 that you could not have done before.

Register for MySQL 8 for Developers to learn how MySQL’s new features can help your application and more.

by Peter Zaitsev at December 11, 2018 05:16 PM

Percona Server for MongoDB Operator Is Now Available as an Early Access Release

Percona Server for MongoDB Operator

Percona announces the early access release of Percona Server for MongoDB Operator.

Note: PerconaLabs and Percona-QA are open source GitHub repositories for unofficial scripts and tools created by Percona staff. These handy utilities can help you save time and effort.

Percona software builds located in the PerconaLabs and Percona-QA repositories are not officially released software, and also aren’t covered by Percona support or services agreements.

Percona Server for MongoDB Operator simplifies the deployment and management of Percona Server for MongoDB in a Kubernetes or OpenShift environment. Kubernetes and the Kubernetes-based OpenShift platform provide users with a distributed orchestration system that automates the deployment, management and scaling of containerized applications.

It extends the Kubernetes API with a new custom resource for deploying, configuring and managing the application through the whole life cycle. You can compare the Kubernetes Operator to a System Administrator who deploys the application and watches the Kubernetes events related to it, taking administrative/operational actions when needed.

The Percona Server for MongoDB Operator on PerconaLabs is an early access release. It is not recommended for production environments. 

Percona Server for MongoDB Operator can be installed on Kubernetes or OpenShift. While the operator does not support all the Percona Server for MongoDB features in this early access release, instructions on how to install and configure it are already available along with the operator source code, which is hosted in our GitHub repository.

The operator was developed with high availability in mind, so it will attempt to run MongoDB instances on separate worker nodes (if possible), and deploy the database cluster as a single replica set with at least three member nodes.


Percona Server for MongoDB extends MongoDB Community Edition functionality by including the Percona Memory Engine, as well as several enterprise-grade features. It requires no changes to MongoDB applications or code.

Help us improve our software quality by reporting any bugs you encounter using our bug tracking system.

by Dmitriy Kostiuk at December 11, 2018 09:06 AM

December 10, 2018

Peter Zaitsev

Percona XtraBackup 8.0.4 Is Now Available

Percona XtraBackup 8.0

Percona is glad to announce the release of Percona XtraBackup 8.0.4 on December 10, 2018. You can download it from our download site and apt and yum repositories.

Percona XtraBackup enables MySQL backups without blocking user queries, making it ideal for companies with large data sets and mission-critical applications that cannot tolerate long periods of downtime. Offered free as an open source solution, it drives down backup costs while providing unique features for MySQL backups.

This release of Percona XtraBackup is a General Availability release ready for use in a production environment.

Please note the following about this release:

  • The deprecated innobackupex has been removed. Use the xtrabackup command to back up your instances: $ xtrabackup --backup --target-dir=/data/backup (a full backup/prepare/restore cycle is sketched after this list)
  • When migrating from earlier database server versions, back up and restore using XtraBackup 2.4, and then run mysql_upgrade from MySQL 8.0.x
  • If using yum or apt repositories to install Percona XtraBackup 8.0.4, ensure that you have enabled the new tools repository. You can do this with the percona-release enable tools release command and then install the percona-xtrabackup-80 package.
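For reference, a minimal backup/prepare/restore cycle with the single xtrabackup binary looks like this (paths are illustrative, and the server must be stopped with an empty data directory before the copy-back step):

$ xtrabackup --backup --target-dir=/data/backup
$ xtrabackup --prepare --target-dir=/data/backup
$ xtrabackup --copy-back --target-dir=/data/backup --datadir=/var/lib/mysql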

All Percona software is open-source and free. We are grateful to the community for the invaluable contributions to Percona XtraBackup. We would especially like to highlight the input of Alexey Kopytov who has been actively offering improvements and submitting bug reports for Percona XtraBackup.

New Features

  • Percona XtraBackup 8.0.4 is based on MySQL 8.0.13 and fully supports Percona Server for MySQL 8.0 series and MySQL 8.0 series.

Bugs Fixed

  • PXB-1699: xtrabackup --prepare could fail on backups of MySQL 8.0.13 databases
  • PXB-1704: xtrabackup --prepare could hang while performing insert buffer merge
  • PXB-1668: When the --throttle option was used, the applied value was different from the one specified by the user (off by one error)
  • PXB-1679: PXB could crash when ALTER TABLE … TRUNCATE PARTITION command was run during a backup without locking DDL

by Borys Belinsky at December 10, 2018 07:05 PM

December 08, 2018

Valeriy Kravchuk

What May Cause MySQL ERROR 1213

Probably all of us MySQL users, DBAs and developers have seen error 1213 more than once, in one context or another:
mysql> select * from t1;
ERROR 1213 (40001): Deadlock found when trying to get lock; try restarting transaction
The first thing that comes to mind in this case is: "OK, we have an InnoDB deadlock, let's check the details", followed by a SHOW ENGINE INNODB STATUS check, like this:
mysql> show engine innodb status\G
*************************** 1. row ***************************
  Type: InnoDB
  Name:
Status:
=====================================
2018-12-08 17:41:11 0x7f2f8b8db700 INNODB MONITOR OUTPUT
=====================================
Per second averages calculated from the last 12 seconds
-----------------
BACKGROUND THREAD
-----------------
srv_master_thread loops: 59 srv_active, 0 srv_shutdown, 14824 srv_idle
srv_master_thread log flush and writes: 14882
----------
SEMAPHORES
----------
OS WAIT ARRAY INFO: reservation count 326
OS WAIT ARRAY INFO: signal count 200
RW-shared spins 0, rounds 396, OS waits 195
RW-excl spins 0, rounds 120, OS waits 4
RW-sx spins 0, rounds 0, OS waits 0
Spin rounds per wait: 396.00 RW-shared, 120.00 RW-excl, 0.00 RW-sx
------------
TRANSACTIONS
------------
Trx id counter 14960
Purge done for trx's n:o < 14954 undo n:o < 0 state: running but idle
History list length 28
LIST OF TRANSACTIONS FOR EACH SESSION:
---TRANSACTION 421316960193880, not started
0 lock struct(s), heap size 1136, 0 row lock(s)
---TRANSACTION 421316960192752, not started
0 lock struct(s), heap size 1136, 0 row lock(s)
--------
FILE I/O
--------
...
Now, what if you get output like the one above, without any LATEST DETECTED DEADLOCK section? I've seen people wondering how it is even possible, and trying to find some suspicious bug somewhere...

Do not be in a hurry - time to recall that there are actually at least 4 quite common reasons to get error 1213 in modern (5.5+) versions of MySQL, MariaDB and Friends:
  1. InnoDB deadlock happened
  2. Metadata deadlock happened
  3. If you are lucky enough to use Galera cluster, Galera conflict happened
  4. Deadlock happened in some other storage engine (for example, MyRocks)
I am not lucky enough to use MySQL's group replication yet, but I know that conflicts there are also possible. I am just not sure if error 1213 is also reported in that case. Feel free to check with a test case similar to the one I've used for Galera below.

I also suspect deadlocks with other engines are possible. As a bonus, I'll demonstrate a deadlock with MyRocks as well.

Let's reproduce the first three cases one by one and check how to get more information on them. In all cases it's enough to have at most two InnoDB tables with just two rows each:
mysql> show create table t1\G
*************************** 1. row ***************************
       Table: t1
Create Table: CREATE TABLE `t1` (
  `id` int(11) NOT NULL,
  `c1` int(11) DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
1 row in set (0.00 sec)

mysql> show create table t2\G
*************************** 1. row ***************************
       Table: t2
Create Table: CREATE TABLE `t2` (
  `id` int(11) NOT NULL,
  `c1` int(11) DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
1 row in set (0.00 sec)

mysql> select * from t1;
+----+------+
| id | c1   |
+----+------+
|  1 |    1 |
|  2 |    2 |
+----+------+
2 rows in set (0.00 sec)

mysql> select * from t2;
+----+------+
| id | c1   |
+----+------+
|  1 |    1 |
|  2 |    2 |
+----+------+
2 rows in set (0.00 sec)
We'll need two sessions, surely.

InnoDB Deadlock

With InnoDB and the tables above it's really easy to end up with a deadlock. In the first session execute the following:
mysql> start transaction;
Query OK, 0 rows affected (0.00 sec)

mysql> select * from t1 where id = 1 for update;
+----+------+
| id | c1   |
+----+------+
|  1 |    1 |
+----+------+
1 row in set (0.00 sec)
In the second session execute:
mysql> start transaction;
Query OK, 0 rows affected (0.00 sec)

mysql> select * from t1 where id = 2 for update;
+----+------+
| id | c1   |
+----+------+
|  2 |    2 |
+----+------+
1 row in set (0.02 sec)
Now in the first session try to access the row with id=2, asking for an incompatible lock:
mysql> select * from t1 where id = 2 for update;
This statement hangs waiting for a lock (for up to innodb_lock_wait_timeout seconds). Try to access the row with id=1, asking for an incompatible lock, in the second session, and you'll get the deadlock error:
mysql> select * from t1 where id = 1 for update;
ERROR 1213 (40001): Deadlock found when trying to get lock; try restarting transaction
At this moment the SELECT in the first transaction returns data:
+----+------+
| id | c1   |
+----+------+
|  2 |    2 |
+----+------+
1 row in set (5.84 sec)
It's that simple: one table and two rows are enough. We can get the details in the output of SHOW ENGINE INNODB STATUS:
...
------------------------
LATEST DETECTED DEADLOCK
------------------------
2018-12-08 18:32:59 0x7f2f8b8db700
*** (1) TRANSACTION:
TRANSACTION 15002, ACTIVE 202 sec starting index read
mysql tables in use 1, locked 1
LOCK WAIT 3 lock struct(s), heap size 1136, 2 row lock(s)
MySQL thread id 8, OS thread handle 139842181244672, query id 8545 localhost root statistics
select * from t1 where id = 2 for update
*** (1) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 94 page no 3 n bits 72 index PRIMARY of table `test`.`t1` trx id 15002 lock_mode X locks rec but not gap waiting
*** (2) TRANSACTION:
TRANSACTION 15003, ACTIVE 143 sec starting index read
mysql tables in use 1, locked 1
3 lock struct(s), heap size 1136, 2 row lock(s)
MySQL thread id 9, OS thread handle 139842181510912, query id 8546 localhost root statistics
select * from t1 where id = 1 for update
*** (2) HOLDS THE LOCK(S):
RECORD LOCKS space id 94 page no 3 n bits 72 index PRIMARY of table `test`.`t1` trx id 15003 lock_mode X locks rec but not gap
*** (2) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 94 page no 3 n bits 72 index PRIMARY of table `test`.`t1` trx id 15003 lock_mode X locks rec but not gap waiting
*** WE ROLL BACK TRANSACTION (2)
------------
TRANSACTIONS
------------
...
In the case above I've used Percona Server 5.7.24-26 (why not). Details of the output may vary depending on the version (and the bugs it has :). If you use MariaDB 5.5+, the special innodb_deadlocks status variable is also incremented in case of an InnoDB deadlock.
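On MariaDB that counter can be checked like this:

MariaDB [test]> show global status like 'innodb_deadlocks';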

Metadata Deadlock

Unlike with InnoDB deadlocks, chances that you've seen deadlocks with metadata locks involved are low. One may spend notable time trying to reproduce such a deadlock, but (as usual) a quick check of the MySQL bugs database helps to find an easy-to-reproduce case. I mean Bug #65890 - "Deadlock that is not a deadlock with transaction and lock tables".

So, let's try the following scenario with two sessions and our InnoDB tables, t1 and t2. In one session:
mysql> start transaction;
Query OK, 0 rows affected (0.00 sec)

mysql> select * from t2 for update;
+----+------+
| id | c1   |
+----+------+
|  1 |    1 |
|  2 |    2 |
+----+------+
2 rows in set (0.00 sec)
In another session:
mysql> lock tables t1 write, t2 write;
It hangs, waiting for up to lock_wait_timeout seconds. We can check what happens with metadata locks using the performance_schema.metadata_locks table (as we use MySQL or Percona Server 5.7+; more on the setup and the alternatives for MariaDB etc. here and there). In the first session:
mysql> select * from performance_schema.metadata_locks;
+-------------+--------------------+----------------+-----------------------+----------------------+---------------+-------------+--------+-----------------+----------------+
| OBJECT_TYPE | OBJECT_SCHEMA      | OBJECT_NAME    | OBJECT_INSTANCE_BEGIN | LOCK_TYPE            | LOCK_DURATION | LOCK_STATUS | SOURCE | OWNER_THREAD_ID | OWNER_EVENT_ID |
+-------------+--------------------+----------------+-----------------------+----------------------+---------------+-------------+--------+-----------------+----------------+
| TABLE       | test               | t2             |       139841686765904 | SHARED_WRITE         | TRANSACTION   | GRANTED     |        |              45 |           2850 |
| GLOBAL      | NULL               | NULL           |       139841688088672 | INTENTION_EXCLUSIVE  | STATEMENT     | GRANTED     |        |              46 |            205 |
| SCHEMA      | test               | NULL           |       139841688088912 | INTENTION_EXCLUSIVE  | TRANSACTION   | GRANTED     |        |              46 |            205 |
| TABLE       | test               | t1             |       139841688088992 | SHARED_NO_READ_WRITE | TRANSACTION   | GRANTED     |        |              46 |            207 |
| TABLE       | test               | t2             |       139841688089072 | SHARED_NO_READ_WRITE | TRANSACTION   | PENDING     |        |              46 |            208 |
| TABLE       | performance_schema | metadata_locks |       139841686219040 | SHARED_READ          | TRANSACTION   | GRANTED     |        |              45 |           3003 |
+-------------+--------------------+----------------+-----------------------+----------------------+---------------+-------------+--------+-----------------+----------------+
6 rows in set (0.00 sec)
As soon as we try this in the first session:
mysql> select * from t1;
ERROR 1213 (40001): Deadlock found when trying to get lock; try restarting transaction
we get the same deadlock error 1213, and LOCK TABLES in the second session completes. We can find nothing about this deadlock in the output of SHOW ENGINE INNODB STATUS (as shared at the beginning of this post). I am also not aware of any status variable to count metadata deadlocks.

You can find some useful information about metadata deadlocks in the manual.
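One related note: in MySQL and Percona Server 5.7 the mdl instrument that feeds the metadata_locks table is disabled by default, so if the table comes up empty you may first have to enable it at runtime:

mysql> update performance_schema.setup_instruments set enabled = 'YES', timed = 'YES' where name = 'wait/lock/metadata/sql/mdl';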

Galera Conflict

For simplicity I'll use MariaDB 10.1.x and a simple two-node setup on the same box, as I described here. I'll start the first node as a new cluster and create the tables for this test:
openxs@ao756:~/dbs/maria10.1$ bin/mysqld_safe --defaults-file=/home/openxs/galera/mynode1.cnf --wsrep-new-cluster &
[1] 13022
openxs@ao756:~/dbs/maria10.1$ 181208 20:40:52 mysqld_safe Logging to '/tmp/mysql-node1.err'.
181208 20:40:52 mysqld_safe Starting mysqld daemon with databases from /home/openxs/galera/node1

openxs@ao756:~/dbs/maria10.1$ bin/mysql  --socket=/tmp/mysql-node1.sock test
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Welcome to the MariaDB monitor.  Commands end with ; or \g.
Your MariaDB connection id is 4
Server version: 10.1.34-MariaDB Source distribution

Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

MariaDB [test]> drop table t1, t2;
ERROR 1051 (42S02): Unknown table 'test.t2'
MariaDB [test]> create table t1(id int, c1 int, primary key(id));
Query OK, 0 rows affected (0.29 sec)

MariaDB [test]> create table t2(id int, c1 int, primary key(id));
Query OK, 0 rows affected (0.22 sec)

MariaDB [test]> insert into t1 values (1,1), (2,2);
Query OK, 2 rows affected (0.07 sec)
Records: 2  Duplicates: 0  Warnings: 0

MariaDB [test]> insert into t2 values (1,1), (2,2);
Query OK, 2 rows affected (0.18 sec)
Records: 2  Duplicates: 0  Warnings: 0
Then I'll start the second node and make sure it joined the cluster and has the same data:
openxs@ao756:~/dbs/maria10.1$ bin/mysqld_safe --defaults-file=/home/openxs/galera/mynode2.cnf &
[2] 15110
openxs@ao756:~/dbs/maria10.1$ 181208 20:46:11 mysqld_safe Logging to '/tmp/mysql-node2.err'.
181208 20:46:11 mysqld_safe Starting mysqld daemon with databases from /home/openxs/galera/node2

openxs@ao756:~/dbs/maria10.1$ bin/mysql --socket=/tmp/mysql-node2.sock test
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Welcome to the MariaDB monitor.  Commands end with ; or \g.
Your MariaDB connection id is 4
Server version: 10.1.34-MariaDB Source distribution

Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
 
MariaDB [test]> show status like 'wsrep_cluster%';
+--------------------------+--------------------------------------+
| Variable_name            | Value                                |
+--------------------------+--------------------------------------+
| wsrep_cluster_conf_id    | 4                                    |
| wsrep_cluster_size       | 2                                    |
| wsrep_cluster_state_uuid | b1d227b1-0211-11e6-8ce0-3644ad2b03dc |
| wsrep_cluster_status     | Primary                              |
+--------------------------+--------------------------------------+
4 rows in set (0.04 sec)

MariaDB [test]> select * from t2;
+----+------+
| id | c1   |
+----+------+
|  1 |    1 |
|  2 |    2 |
+----+------+
2 rows in set (0.02 sec)
Now we are ready to try to provoke a Galera conflict. For this we have to try to update the same data in transactions on two different nodes. In one session connected to node1:
MariaDB [test]> select @@wsrep_node_name;
+-------------------+
| @@wsrep_node_name |
+-------------------+
| node1             |
+-------------------+
1 row in set (0.00 sec)

MariaDB [test]> start transaction;
Query OK, 0 rows affected (0.00 sec)

MariaDB [test]> update test.t1 set c1 = 5 where id=1;
Query OK, 1 row affected (0.00 sec)
Rows matched: 1  Changed: 1  Warnings: 0
In another session, connected to the other node:

MariaDB [test]> select @@wsrep_node_name;
+-------------------+
| @@wsrep_node_name |
+-------------------+
| node2             |
+-------------------+
1 row in set (0.00 sec)

MariaDB [test]> start transaction;
Query OK, 0 rows affected (0.00 sec)

MariaDB [test]> update test.t1 set c1 = 6 where id=1;
Query OK, 1 row affected (0.00 sec)
Rows matched: 1  Changed: 1  Warnings: 0
Now in the first session we can COMMIT successfully:
MariaDB [test]> commit;
Query OK, 0 rows affected (0.12 sec)
But if we try to COMMIT in the second:
MariaDB [test]> commit;
ERROR 1213 (40001): Deadlock found when trying to get lock; try restarting transaction
We get that same error 1213 about the deadlock. Surely you'll see nothing about this deadlock in the INNODB STATUS output, as it was NOT an InnoDB deadlock but a Galera conflict. Check these status variables on node2:
MariaDB [test]> show status like 'wsrep_local%';
+----------------------------+--------------------------------------+
| Variable_name              | Value                                |
+----------------------------+--------------------------------------+
| wsrep_local_bf_aborts      | 1                                    |
| wsrep_local_cached_downto  | 75                                   |
| wsrep_local_cert_failures  | 0                                    |
| wsrep_local_commits        | 0                                    |
| wsrep_local_index          | 0                                    |
| wsrep_local_recv_queue     | 0                                    |
| wsrep_local_recv_queue_avg | 0.000000                             |
| wsrep_local_recv_queue_max | 1                                    |
| wsrep_local_recv_queue_min | 0                                    |
| wsrep_local_replays        | 0                                    |
| wsrep_local_send_queue     | 0                                    |
| wsrep_local_send_queue_avg | 0.000000                             |
| wsrep_local_send_queue_max | 1                                    |
| wsrep_local_send_queue_min | 0                                    |
| wsrep_local_state          | 4                                    |
| wsrep_local_state_comment  | Synced                               |
| wsrep_local_state_uuid     | b1d227b1-0211-11e6-8ce0-3644ad2b03dc |
+----------------------------+--------------------------------------+
17 rows in set (0.01 sec)
If wsrep_local_bf_aborts > 0, you had conflicts, and a local transaction was rolled back to resolve them. We can see that the remote one wins; on node2:
MariaDB [test]> select * from t1;
+----+------+
| id | c1   |
+----+------+
|  1 |    5 |
|  2 |    2 |
+----+------+
2 rows in set (0.00 sec)
To summarize: in Galera "first commit wins", and the local transaction involved in a conflict is always the loser. You can get a lot of information about conflicts in the error log if you enable conflict logging through the wsrep_log_conflicts server variable and the cert.log_conflicts provider option. See this fine manual for details.
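On MariaDB both can be enabled at runtime, for example:

MariaDB [test]> set global wsrep_log_conflicts=ON;
MariaDB [test]> set global wsrep_provider_options="cert.log_conflicts=YES";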

MyRocks Deadlock

We can easily check how deadlocks are processed by MyRocks by just loading the plugin for the engine, converting the tables to MyRocks, and trying the same InnoDB scenario with the same Percona Server we used initially. But first, if you use Percona binaries, you have to install a separate package:
openxs@ao756:~$ dpkg -l | grep rocksdb
openxs@ao756:~$ sudo apt-get install percona-server-rocksdb-5.7
[sudo] password for openxs:
Reading package lists... Done
Building dependency tree
...
Unpacking percona-server-rocksdb-5.7 (5.7.24-26-1.trusty) ...
Setting up percona-server-rocksdb-5.7 (5.7.24-26-1.trusty) ...


 * This release of Percona Server is distributed with RocksDB storage engine.
 * Run the following script to enable the RocksDB storage engine in Percona Server:

        ps-admin --enable-rocksdb -u <mysql_admin_user> -p[mysql_admin_pass] [-S <socket>] [-h <host> -P <port>]
Percona's manual has a lot more details and relies on the separate ps-admin script, but basically you have to INSTALL the PLUGINs like this (check the script's code):
mysql> INSTALL PLUGIN ROCKSDB SONAME 'ha_rocksdb.so';
Query OK, 0 rows affected (0.86 sec)

mysql> INSTALL PLUGIN ROCKSDB_CFSTATS SONAME 'ha_rocksdb.so';
Query OK, 0 rows affected (0.06 sec)

mysql> INSTALL PLUGIN ROCKSDB_DBSTATS SONAME 'ha_rocksdb.so';
Query OK, 0 rows affected (0.08 sec)

mysql> INSTALL PLUGIN ROCKSDB_PERF_CONTEXT SONAME 'ha_rocksdb.so';
Query OK, 0 rows affected (0.05 sec)

mysql> INSTALL PLUGIN ROCKSDB_PERF_CONTEXT_GLOBAL SONAME 'ha_rocksdb.so';
Query OK, 0 rows affected (0.06 sec)

mysql> INSTALL PLUGIN ROCKSDB_CF_OPTIONS SONAME 'ha_rocksdb.so';
Query OK, 0 rows affected (0.05 sec)

mysql> INSTALL PLUGIN ROCKSDB_GLOBAL_INFO SONAME 'ha_rocksdb.so';
Query OK, 0 rows affected (0.05 sec)

mysql> INSTALL PLUGIN ROCKSDB_COMPACTION_STATS SONAME 'ha_rocksdb.so';
Query OK, 0 rows affected (0.05 sec)

mysql> INSTALL PLUGIN ROCKSDB_DDL SONAME 'ha_rocksdb.so';
Query OK, 0 rows affected (0.06 sec)

mysql> INSTALL PLUGIN ROCKSDB_INDEX_FILE_MAP SONAME 'ha_rocksdb.so';
Query OK, 0 rows affected (0.05 sec)

mysql> INSTALL PLUGIN ROCKSDB_LOCKS SONAME 'ha_rocksdb.so';
Query OK, 0 rows affected (0.05 sec)

mysql> INSTALL PLUGIN ROCKSDB_TRX SONAME 'ha_rocksdb.so';
Query OK, 0 rows affected (0.05 sec)

mysql> INSTALL PLUGIN ROCKSDB_DEADLOCK SONAME 'ha_rocksdb.so';
Query OK, 0 rows affected (0.06 sec)
Then check that the engine is there and convert the tables:
mysql> show engines;
+--------------------+---------+----------------------------------------------------------------------------+--------------+------+------------+
| Engine             | Support | Comment                                                                    | Transactions | XA   | Savepoints |
+--------------------+---------+----------------------------------------------------------------------------+--------------+------+------------+
| ROCKSDB            | YES     | RocksDB storage engine                                                     | YES          | YES  | YES        |
...

mysql> alter table t1 engine=rocksdb;
Query OK, 2 rows affected (0.64 sec)
Records: 2  Duplicates: 0  Warnings: 0

mysql> alter table t2 engine=rocksdb;
Query OK, 2 rows affected (0.58 sec)
Records: 2  Duplicates: 0  Warnings: 0
Now we are ready to try the same InnoDB scenario. Just note that the lock wait timeout for MyRocks is defined by rocksdb_lock_wait_timeout, which is small by default (1 second), so you have to increase it first. You also have to set rocksdb_deadlock_detect to ON (it's OFF by default):
mysql> set global rocksdb_lock_wait_timeout=50;
Query OK, 0 rows affected (0.00 sec)

mysql> set global rocksdb_deadlock_detect=ON;
Query OK, 0 rows affected (0.00 sec)

mysql> \r
Connection id:    14
Current database: test

mysql> start transaction;
Query OK, 0 rows affected (0.02 sec)

mysql> select * from t1 where id = 1 for update;
+----+------+
| id | c1   |
+----+------+
|  1 |    1 |
+----+------+
1 row in set (0.00 sec)
Then in the second session:
mysql> start transaction;
Query OK, 0 rows affected (0.00 sec)

mysql> select * from t1 where id = 2 for update;
+----+------+
| id | c1   |
+----+------+
|  2 |    2 |
+----+------+
1 row in set (0.00 sec)
In the first:
mysql> select * from t1 where id = 2 for update;
and in the second we get the deadlock error:
mysql> select * from t1 where id = 1 for update;
ERROR 1213 (40001): Deadlock found when trying to get lock; try restarting transaction
mysql> show global status like '%deadlock%';
+----------------------------+-------+
| Variable_name              | Value |
+----------------------------+-------+
| rocksdb_row_lock_deadlocks | 1     |
+----------------------------+-------+
1 row in set (0.00 sec)
Note that MyRocks has a status variable to count deadlocks. Also note that Percona Server still does NOT seem to support the SHOW ENGINE ROCKSDB TRANSACTION STATUS statement available upstream:
mysql> show engine rocksdb transaction status\G
ERROR 1064 (42000): You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'transaction status' at line 1
I was not able to find a bug about this (sorry if I missed it), so I just reported a new task in Percona's JIRA: PS-5114 - "Add support for SHOW ENGINE ROCKSDB TRANSACTION STATUS".
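In the meantime, since we installed the ROCKSDB_DEADLOCK plugin above, details about the latest detected deadlocks should also be available from the corresponding information_schema table (the exact set of columns depends on the MyRocks version):

mysql> select * from information_schema.rocksdb_deadlock;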

That's probably more than enough for a single blog post (that is mostly NOT about bugs). One day I'll refresh my knowledge of MyRocks etc. and maybe write more about deadlock troubleshooting there.


To summarize: do not be surprised if, after you get MySQL error 1213, you see no information about a recent InnoDB deadlock - there are at least three more reasons for this error to be reported, as explained above. You should know your configuration and use several other commands and sources of information to pinpoint what exactly happened and why.

by Valeriy Kravchuk (noreply@blogger.com) at December 08, 2018 07:08 PM

December 07, 2018

Peter Zaitsev

MySQL 8 and The FRM Drop… How To Recover Table DDL

MySQL 8 frm drop recover ddl

… or what I should keep in mind in case of disaster

MySQL 8 frm drop recover ddl

Retrieving, and maintaining in SQL format, the definition of all tables in a database is a best practice that we all should adopt. Keeping that under version control is another best practice worth following.

While doing that may seem redundant, it can become a life saver in several situations: from the need to review what has historically changed in a table, to knowing who changed what and why, to when you need to recover your data and your beloved MySQL instance won’t start…

But let’s be honest: only a few do the right thing, and even fewer keep that information up to date. Given that’s the case, what can we do when we need to discover or recover a table structure?

From the beginning, MySQL has used some external files to describe its internal structure.

For instance, if I have a schema named windmills and a table named wmillAUTOINC1, on the file system I will see this:

-rw-r-----. 1 mysql mysql     8838 Mar 14 2018 wmillAUTOINC1.frm
-rw-r-----. 1 mysql mysql   131072 Mar 14 2018 wmillAUTOINC1.ibd

The ibd file contains the data, while the frm file contains the structure information.

Putting aside ANY discussion about whether this is safe, whether it’s transactional, and more… when we’ve experienced some major crash and data corruption, this approach has been helpful. Being able to read from the frm file was the easiest way to get the information we needed.
Simple tools like DBSake made the task quite trivial, and allowed us to script table definitions when we needed to run long, complex, tedious data recoveries:

[root@master1 windmills]# /opt/tools/dbsake frmdump wmillAUTOINC1.frm
--
-- Table structure for table `wmillAUTOINC1`
-- Created with MySQL Version 5.7.20
--
CREATE TABLE `wmillAUTOINC1` (
  `id` bigint(11) NOT NULL AUTO_INCREMENT,
  `uuid` char(36) COLLATE utf8_bin NOT NULL,
  `millid` smallint(6) NOT NULL,
  `kwatts_s` int(11) NOT NULL,
  `date` date NOT NULL,
  `location` varchar(50) COLLATE utf8_bin NOT NULL,
  `active` tinyint(2) NOT NULL DEFAULT '1',
  `time` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
  `strrecordtype` char(3) COLLATE utf8_bin NOT NULL,
  PRIMARY KEY (`id`),
  KEY `IDX_millid` (`millid`,`active`),
  KEY `IDX_active` (`id`,`active`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_bin ROW_FORMAT=DYNAMIC;

Of course, if the frm file was also corrupt, then we could try to get the information from the ibdata dictionary. If that was corrupted too (trust me, I’ve seen all of these situations)… well, a last resort was hoping the customer had a recent table definition stored somewhere, but as mentioned before, we are not so diligent, are we?

Now, though, in MySQL 8 we do not have FRM files; they were dropped. Even more interesting is that we do not have the same dictionary: most of the things that we knew have changed, including the dictionary location. So what can be done?

Well, Oracle has moved the FRM information—and more—to what is called Serialized Dictionary Information (SDI). The SDI is written INSIDE the ibd file and represents a redundant copy of the information contained in the data dictionary.

The SDI is updated/modified by DDL operations on tables that reside in that tablespace. That is: if you normally have one file per table, then that file will contain ONLY the SDI for that table; but if you have multiple tables in a tablespace, the SDI information will refer to ALL of them.

To extract this information from the IBD files, Oracle provides a utility called ibd2sdi. This application parses the SDI information and reports a JSON file that can be easily manipulated to extract and build the table definition.

One exception is represented by partitioned tables. The SDI information is contained ONLY in the first partition, and if you drop that partition, it is moved to the next one. I will show that later.

But let’s see how it works. In the next examples I will look for the table’s name, attributes, and datatypes, starting from the dictionary tables.

To obtain the info I will do this:

/opt/mysql_templates/mysql-8P/bin/./ibd2sdi   /opt/mysql_instances/master8/data/mysql.ibd |jq  '.[]?|.[]?|.dd_object?|("------------------------------------"?,"TABLE NAME = ",.name?,"****",(.columns?|.[]?|(.name?,.column_type_utf8?)))'

The result will be something like:

"------------------------------------"
"TABLE NAME = "
"tables"
"****"
"id"
"bigint(20) unsigned"
"schema_id"
"bigint(20) unsigned"
"name"
"varchar(64)"
"type"
"enum('BASE TABLE','VIEW','SYSTEM VIEW')"
"engine"
"varchar(64)"
"mysql_version_id"
"int(10) unsigned"
"row_format"
"enum('Fixed','Dynamic','Compressed','Redundant','Compact','Paged')"
"collation_id"
"bigint(20) unsigned"
"comment"
"varchar(2048)"
<snip>
"------------------------------------"
"TABLE NAME = "
"tablespaces"
"****"
"id"
"bigint(20) unsigned"
"name"
"varchar(259)"
"options"
"mediumtext"
"se_private_data"
"mediumtext"
"comment"
"varchar(2048)"
"engine"
"varchar(64)"
"DB_TRX_ID"
""
"DB_ROLL_PTR"
""

I cut the output for brevity, but if you run the above command yourself you’ll be able to see that this retrieves the information for ALL the tables residing in the IBD.

The other thing I hope you noticed is that I am NOT parsing ibdata, but mysql.ibd. Why? Because the dictionary was moved out from ibdata and is now in mysql.ibd.

Look what happens if I try to parse ibdata:

[root@master1 ~]# /opt/mysql_templates/mysql-8P/bin/./ibd2sdi   /opt/mysql_instances/master8/data/ibdata1 |jq '.'
[INFO] ibd2sdi: SDI is empty.

Be very careful here to not mess up your mysql.ibd file.

Now what can I do to get information about my wmillAUTOINC1 table in MySQL8?

That is quite simple:

/opt/mysql_templates/mysql-8P/bin/./ibd2sdi   /opt/mysql_instances/master8/data/windmills/wmillAUTOINC.ibd |jq '.'
[
  "ibd2sdi",
  {
    "type": 1,
    "id": 1068,
    "object": {
      "mysqld_version_id": 80013,
      "dd_version": 80013,
      "sdi_version": 1,
      "dd_object_type": "Table",
      "dd_object": {
        "name": "wmillAUTOINC",
        "mysql_version_id": 80011,
        "created": 20180925095853,
        "last_altered": 20180925095853,
        "hidden": 1,
        "options": "avg_row_length=0;key_block_size=0;keys_disabled=0;pack_record=1;row_type=2;stats_auto_recalc=0;stats_sample_pages=0;",
        "columns": [
          {
            "name": "id",
            "type": 9,
            "is_nullable": false,
            "is_zerofill": false,
            "is_unsigned": false,
            "is_auto_increment": true,
            "is_virtual": false,
            "hidden": 1,
            "ordinal_position": 1,
            "char_length": 11,
            "numeric_precision": 19,
            "numeric_scale": 0,
            "numeric_scale_null": false,
            "datetime_precision": 0,
            "datetime_precision_null": 1,
            "has_no_default": false,
            "default_value_null": false,
            "srs_id_null": true,
            "srs_id": 0,
            "default_value": "AAAAAAAAAAA=",
            "default_value_utf8_null": true,
            "default_value_utf8": "",
            "default_option": "",
            "update_option": "",
            "comment": "",
            "generation_expression": "",
            "generation_expression_utf8": "",
            "options": "interval_count=0;",
            "se_private_data": "table_id=1838;",
            "column_key": 2,
            "column_type_utf8": "bigint(11)",
            "elements": [],
            "collation_id": 83,
            "is_explicit_collation": false
          },
<SNIP>
        "indexes": [
          {
            "name": "PRIMARY",
            "hidden": false,
            "is_generated": false,
            "ordinal_position": 1,
            "comment": "",
            "options": "flags=0;",
            "se_private_data": "id=2261;root=4;space_id=775;table_id=1838;trx_id=6585972;",
            "type": 1,
            "algorithm": 2,
            "is_algorithm_explicit": false,
            "is_visible": true,
            "engine": "InnoDB",
<Snip>
        ],
        "foreign_keys": [],
        "partitions": [],
        "collation_id": 83
      }
    }
  },
  {
    "type": 2,
    "id": 780,
    "object": {
      "mysqld_version_id": 80011,
      "dd_version": 80011,
      "sdi_version": 1,
      "dd_object_type": "Tablespace",
      "dd_object": {
        "name": "windmills/wmillAUTOINC",
        "comment": "",
        "options": "",
        "se_private_data": "flags=16417;id=775;server_version=80011;space_version=1;",
        "engine": "InnoDB",
        "files": [
          {
            "ordinal_position": 1,
            "filename": "./windmills/wmillAUTOINC.ibd",
            "se_private_data": "id=775;"
          }
        ]
      }
    }
  }
]

The JSON will contain:

  • A section describing the DB object at high level
  • Array of columns and related information
  • Array of indexes
  • Partition information (not here but in the next example)
  • Table space information

That is a lot more detail compared to what we had in the FRM, and it is quite relevant and interesting information as well.

Once you have extracted the SDI, any JSON parsing tool or script can generate the SQL DDL from it.
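As a minimal sketch of that idea (assuming the one-table-per-file layout used above; hidden engine columns such as DB_TRX_ID will also show up and need filtering), jq alone can already produce a rough column list:

/opt/mysql_templates/mysql-8P/bin/./ibd2sdi /opt/mysql_instances/master8/data/windmills/wmillAUTOINC.ibd | jq -r '.[]? | .object? | select(.dd_object_type? == "Table") | .dd_object | .name, (.columns[] | "  \(.name) \(.column_type_utf8)")'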

I mentioned partitions, so let’s look at them a bit more, given they can be tricky.

As mentioned, the SDI information is present ONLY in the first partition. All other partitions hold ONLY the tablespace information. Given that, the first thing to do is to identify which partition is the first one… OR simply try to access all partitions, and when you are able to get the details, extract them.

The process is the same:

[root@master1 ~]# /opt/mysql_templates/mysql-8P/bin/./ibd2sdi   /opt/mysql_instances/master8/data/windmills/wmillAUTOINCPART#P#PT20170301.ibd |jq '.'
[
  "ibd2sdi",
  {
    "type": 1,
    "id": 1460,
    "object": {
      "mysqld_version_id": 80013,
      "dd_version": 80013,
      "sdi_version": 1,
      "dd_object_type": "Table",
      "dd_object": {
        "name": "wmillAUTOINCPART",
        "mysql_version_id": 80013,
        "created": 20181125110300,
        "last_altered": 20181125110300,
        "hidden": 1,
        "options": "avg_row_length=0;key_block_size=0;keys_disabled=0;pack_record=1;row_type=2;stats_auto_recalc=0;stats_sample_pages=0;",
        "columns": [<snip>
    	  "schema_ref": "windmills",
        "se_private_id": 18446744073709552000,
        "engine": "InnoDB",
        "last_checked_for_upgrade_version_id": 80013,
        "comment": "",
        "se_private_data": "autoinc=31080;version=2;",
        "row_format": 2,
        "partition_type": 7,
        "partition_expression": "to_days(`date`)",
        "partition_expression_utf8": "to_days(`date`)",
        "default_partitioning": 1,
        "subpartition_type": 0,
        "subpartition_expression": "",
        "subpartition_expression_utf8": "",
        "default_subpartitioning": 0,
       ],
<snip>
        "foreign_keys": [],
        "partitions": [
          {
            "name": "PT20170301",
            "parent_partition_id": 18446744073709552000,
            "number": 0,
            "se_private_id": 1847,
            "description_utf8": "736754",
            "engine": "InnoDB",
            "comment": "",
            "options": "",
            "se_private_data": "autoinc=0;version=0;",
            "values": [
              {
                "max_value": false,
                "null_value": false,
                "list_num": 0,
                "column_num": 0,
                "value_utf8": "736754"
              }
            ],

The difference, as you can see, is that the section related to partitions and sub partitions will be filled with all the details you might need to recreate the partitions.

We will have:

  • Partition type
  • Partition expression
  • Partition values
  • …more

Same for sub partitions.

Now again see what happens if I parse the second partition:

[root@master1 ~]# /opt/mysql_templates/mysql-8P/bin/./ibd2sdi   /opt/mysql_instances/master8/data/windmills/wmillAUTOINCPART#P#PT20170401.ibd |jq '.'
[
  "ibd2sdi",
  {
    "type": 2,
    "id": 790,
    "object": {
      "mysqld_version_id": 80011,
      "dd_version": 80011,
      "sdi_version": 1,
      "dd_object_type": "Tablespace",
      "dd_object": {
        "name": "windmills/wmillAUTOINCPART#P#PT20170401",
        "comment": "",
        "options": "",
        "se_private_data": "flags=16417;id=785;server_version=80011;space_version=1;",
        "engine": "InnoDB",
        "files": [
          {
            "ordinal_position": 1,
            "filename": "./windmills/wmillAUTOINCPART#P#PT20170401.ibd",
            "se_private_data": "id=785;"
          }
        ]
      }
    }
  }
]

I will get only the information about the tablespace, not the table.

As promised let me show you now what happens if I delete the first partition, and the second partition becomes the first:

(root@localhost) [windmills]>alter table wmillAUTOINCPART drop partition PT20170301;
Query OK, 0 rows affected (1.84 sec)
Records: 0  Duplicates: 0  Warnings: 0
[root@master1 ~]# /opt/mysql_templates/mysql-8P/bin/./ibd2sdi   /opt/mysql_instances/master8/data/windmills/wmillAUTOINCPART#P#PT20170401.ibd |jq '.'|more
[
  "ibd2sdi",
  {
    "type": 1,
    "id": 1461,
    "object": {
      "mysqld_version_id": 80013,
      "dd_version": 80013,
      "sdi_version": 1,
      "dd_object_type": "Table",
      "dd_object": {
        "name": "wmillAUTOINCPART",
        "mysql_version_id": 80013,
        "created": 20181129130834,
        "last_altered": 20181129130834,
        "hidden": 1,
        "options": "avg_row_length=0;key_block_size=0;keys_disabled=0;pack_record=1;row_type=2;stats_auto_recalc=0;stats_sample_pages=0;",
        "columns": [
          {
            "name": "id",
            "type": 9,
            "is_nullable": false,
            "is_zerofill": false,
            "is_unsigned": false,
            "is_auto_increment": true,
            "is_virtual": false,
            "hidden": 1,
            "ordinal_position": 1,

As I mentioned before, each DDL updates the SDI, and here we go: I now have all the information on what is NOW the FIRST partition. Please note the value of the "created" attribute, comparing the first time I queried the other partition with what I get now:

/opt/mysql_instances/master8/data/windmills/wmillAUTOINCPART#P#PT20170301.ibd
       "created": 20181125110300,
/opt/mysql_instances/master8/data/windmills/wmillAUTOINCPART#P#PT20170401.ibd
       "created": 20181129130834,

To be clear, the second "created" value, the one on PT20170401, is from NOW, when I dropped the other partition (PT20170301).

Conclusions

In the end, this solution is definitely more powerful than the FRM files. It will allow us to parse the file and identify the table definition more easily, providing us with much more detail and information.

The problems will arise if and when the IBD file becomes corrupt.

As for the manual: "For InnoDB, an SDI record requires a single index page, which is 16KB in size by default. However, SDI data is compressed to reduce the storage footprint."

This means that I have one page per table, if I associate one record with one table. Which in turn means that in case of IBD corruption I should (likely) be able to read those pages. Unless I have bad (very bad) luck.

I still wonder how the size of an IBD file affects SDI retrieval, but given I have not tried it yet, I will have to let you know.

As an aside, I am working on a script to facilitate the generation of the SQL; it’s not yet ready, but you can find it here

One last note, but keep this in mind! It is stated in the manual, in a hidden place and in small letters:
DDL operations take longer due to writing to storage, undo logs, and redo logs instead of .frm files.

References

https://stedolan.github.io/jq/

https://dev.mysql.com/doc/refman/8.0/en/ibd2sdi.html

https://dev.mysql.com/doc/refman/8.0/en/serialized-dictionary-information.html

https://dev.mysql.com/doc/refman/8.0/en/data-dictionary-limitations.html


Photo by chuttersnap on Unsplash

by Marco Tusa at December 07, 2018 01:31 PM

December 06, 2018

Jean-Jerome Schmidt

MySQL & MariaDB Query Caching with ProxySQL & ClusterControl

Queries have to be cached in every heavily loaded database; there is simply no way for a database to handle all traffic with reasonable performance otherwise. There are various mechanisms in which a query cache can be implemented, starting from the MySQL query cache, which used to work just fine for mostly read-only, low-concurrency workloads but has no place in highly concurrent workloads (to the extent that Oracle removed it in MySQL 8.0), to external key-value stores like Redis, Memcached or Couchbase.

The main problem with using an external dedicated data store (as we would not recommend the MySQL query cache to anyone) is that it is yet another datastore to manage: yet another environment to maintain, with scaling issues to handle, bugs to debug and so on.

So why not kill two birds with one stone by leveraging your proxy? The assumption here is that you are using a proxy in your production environment, as it helps load balance queries across instances and masks the underlying database topology by providing a simple endpoint to applications. ProxySQL is a great tool for the job, as it can additionally function as a caching layer. In this blog post, we’ll show you how to cache queries in ProxySQL using ClusterControl.

How Does the Query Cache Work in ProxySQL?

First of all, a bit of background. ProxySQL manages traffic through query rules, and it can accomplish query caching using the same mechanism. ProxySQL stores cached queries in a memory structure. Cached data is evicted using a time-to-live (TTL) setting. The TTL can be defined for each query rule individually, so it is up to the user to decide whether to define query rules for each individual query, with distinct TTLs, or to just create a couple of rules which will match the majority of the traffic.

There are two configuration settings that define how the query cache should be used. First, mysql-query_cache_size_MB defines a soft limit on the query cache size. It is not a hard limit, so ProxySQL may use slightly more memory than that, but it is enough to keep the memory utilization under control. The second setting you can tweak is mysql-query_cache_stores_empty_result, which defines whether an empty result set is cached or not.
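Both are regular ProxySQL variables, so outside of ClusterControl they can also be set on the ProxySQL admin interface, along these lines:

UPDATE global_variables SET variable_value='256' WHERE variable_name='mysql-query_cache_size_MB';
UPDATE global_variables SET variable_value='true' WHERE variable_name='mysql-query_cache_stores_empty_result';
LOAD MYSQL VARIABLES TO RUNTIME;
SAVE MYSQL VARIABLES TO DISK;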

The ProxySQL query cache is designed as a key-value store. The value is the result set of a query, and the key is composed of concatenated values like user, schema and query text. A hash is then created off that string, and that hash is used as the key.

Setting up ProxySQL as a Query Cache Using ClusterControl

As the initial setup, we have a replication cluster of one master and one slave. We also have a single ProxySQL.

This is by no means a production-grade setup, as we would have to implement some sort of high availability for the proxy layer (for example, by deploying more than one ProxySQL instance and then keepalived with a floating virtual IP on top of them), but it will be more than enough for our tests.

First, we are going to verify the ProxySQL configuration to make sure query cache settings are what we want them to be.

256 MB of query cache should be about right, and we also want to cache the empty result sets - sometimes a query which returns no data still has to do a lot of work to verify there’s nothing to return.

Next step is to create query rules which will match the queries you want to cache. There are two ways to do that in ClusterControl.

Manually Adding Query Rules

The first way requires a bit more manual work. Using ClusterControl you can easily create any query rule you want, including query rules that do the caching. First, let’s take a look at the list of rules:

At this point, we have a set of query rules to perform the read/write split. The first rule has an ID of 100. Our new query rule has to be processed before that one, so we will use a lower rule ID. Let’s create a query rule which will do the caching of queries similar to this one:

SELECT DISTINCT c FROM sbtest8 WHERE id BETWEEN 5041 AND 5140 ORDER BY c

There are three ways of matching the query: Digest, Match Digest and Match Pattern. Let’s talk a bit about them here. First, Match Digest. We can set here a regular expression that will match a generalized query string that represents some query type. For example, for our query:

SELECT DISTINCT c FROM sbtest8 WHERE id BETWEEN 5041 AND 5140 ORDER BY c

The generic representation will be:

SELECT DISTINCT c FROM sbtest8 WHERE id BETWEEN ? AND ? ORDER BY c

As you can see, it stripped the arguments of the WHERE clause, therefore all queries of this type are represented as a single string. This option is quite nice to use because it matches a whole query type and, what’s even more important, it’s stripped of any whitespace. This makes it so much easier to write a regular expression, as you don’t have to account for weird line breaks, whitespace at the beginning or end of the string, and so on.

Digest is basically a hash that ProxySQL calculates over the Match Digest form.

Finally, Match Pattern matches against full query text, as it was sent by the client. In our case, the query will have a form of:

SELECT DISTINCT c FROM sbtest8 WHERE id BETWEEN 5041 AND 5140 ORDER BY c

We are going to use Match Digest as we want all of those queries to be covered by the query rule. If we wanted to cache just that particular query, a good option would be to use Match Pattern.

The regular expression that we use is:

SELECT DISTINCT c FROM sbtest[0-9]+ WHERE id BETWEEN \? AND \? ORDER BY c

We are matching literally the exact generalized query string, with one exception: we know that this query hits multiple tables, so we extended the regular expression to match all of them.
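
For reference, the rule created here corresponds roughly to the following statements on the ProxySQL admin interface; a sketch, where rule_id 50 and the 30-second TTL (cache_ttl is expressed in milliseconds) are illustrative choices, not recommendations:

INSERT INTO mysql_query_rules (rule_id, active, match_digest, cache_ttl, apply)
VALUES (50, 1, 'SELECT DISTINCT c FROM sbtest[0-9]+ WHERE id BETWEEN \? AND \? ORDER BY c', 30000, 1);
LOAD MYSQL QUERY RULES TO RUNTIME;
SAVE MYSQL QUERY RULES TO DISK;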

Once this is done, we can see if the query rule is in effect or not.

We can see that ‘Hits’ are increasing which means that our query rule is being used. Next, we’ll look at another way to create a query rule.

Using ClusterControl to Create Query Rules

ProxySQL has a useful functionality of collecting statistics of the queries it routed. You can track data like execution time, how many times a given query was executed and so on. This data is also present in ClusterControl:

Even better, if you hover over a given query type, you can create a query rule related to it, and easily cache that particular query type.

As you can see, some of the data like Rule ID, Cache TTL or Schema Name are already filled in. ClusterControl will also fill in data based on which matching mechanism you decided to use. We can easily use either the hash of a given query type, or Match Digest or Match Pattern if we would like to fine-tune the regular expression (for example, doing the same as we did earlier and extending the regular expression to match all the tables in the sbtest schema).

This is all you need to easily create query cache rules in ProxySQL. Download ClusterControl to try it today.

by krzysztof at December 06, 2018 10:58 AM

December 05, 2018

Oli Sennhauser

UNDO logs in InnoDB system tablespace ibdata1

We sometimes see at customer sites very big InnoDB system tablespace files (ibdata1), although innodb_file_per_table = 1 is set.

So we want to know what else is stored in the InnoDB system tablespace file ibdata1, to see what we can do about this unexpected growth.

First let us check the size of the ibdata1 file:

# ll ibdata1 
-rw-rw---- 1 mysql mysql 109064486912 Dez  5 19:10 ibdata1

The InnoDB system tablespace is about 101.6 Gibyte in size. This is exactly 6'656'768 InnoDB blocks of 16 kibyte block size.

So next we want to analyse the InnoDB system tablespace ibdata1 file. For this we can use the tool innochecksum:

# innochecksum --page-type-summary ibdata1 
Error: Unable to lock file:: ibdata1
fcntl: Resource temporarily unavailable

But... the innochecksum tool throws an error. It seems it is not allowed to analyse the InnoDB system tablespace while the database is running. So let us stop the database first and try again. Now we get a useful output:

# innochecksum --page-type-summary ibdata1 
File::ibdata1
================PAGE TYPE SUMMARY==============
#PAGE_COUNT     PAGE_TYPE
===============================================
  349391        Index page                        5.25%
 6076813        Undo log page                    91.29%
   18349        Inode page                        0.28%
  174659        Insert buffer free list page      2.62%
   36639        Freshly allocated page            0.55%
     405        Insert buffer bitmap              0.01%
      98        System page
       1        Transaction system page
       1        File Space Header
     404        Extent descriptor page            0.01%
       0        BLOB page
       8        Compressed BLOB page
       0        Other type of page
-------------------------------------------------------
 6656768        Pages total                     100.00%
===============================================
Additional information:
Undo page type: 3428 insert, 6073385 update, 0 other
Undo page state: 1 active, 67 cached, 249 to_free, 1581634 to_purge, 0 prepared, 4494862 other

So we can see that about 91% (about 92 Gibyte) of the InnoDB system tablespace ibdata1 blocks are used by InnoDB UNDO log pages. To prevent ibdata1 from growing like this, you have to create a database instance with separate InnoDB UNDO tablespaces: Undo Tablespaces.
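
As a sketch, for MySQL 5.7 this means initializing a new instance with something like the following in my.cnf and migrating the data over (the values are illustrative, not recommendations; innodb_undo_tablespaces cannot be changed after the instance has been initialized):

[mysqld]
innodb_undo_tablespaces  = 2
innodb_undo_log_truncate = ON
innodb_max_undo_log_size = 1G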


by Shinguz at December 05, 2018 08:55 PM

Peter Zaitsev

Nondeterministic Functions in MySQL (i.e. rand) Can Surprise You


Working on a test case with sysbench, I encountered this:

mysql> select * from sbtest1 where id = round(rand()*10000, 0);
+------+--------+-------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------+
| id   | k      | c                                                                                                                       | pad                                                         |
+------+--------+-------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------+
|  179 | 499871 | 09833083632-34593445843-98203182724-77632394229-31240034691-22855093589-98577647071-95962909368-34814236148-76937610370 | 62233363025-41327474153-95482195752-11204169522-13131828192 |
| 1606 | 502031 | 81212399253-12831141664-41940957498-63947990218-16408477860-15124776228-42269003436-07293216458-45216889819-75452278174 | 25423822623-32136209218-60113604068-17409951653-00581045257 |
+------+--------+-------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------+
2 rows in set (0.30 sec)

I was really surprised. First, and most important, id is a primary key and the rand() function should produce just one value. How come it returns two rows? Second, why is the response time 0.30 sec? That seems really high for a primary key lookup.

Looking further:

CREATE TABLE `sbtest1` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `k` int(11) NOT NULL DEFAULT '0',
  `c` char(120) NOT NULL DEFAULT '',
  `pad` char(60) NOT NULL DEFAULT '',
  PRIMARY KEY (`id`),
  KEY `k_1` (`k`)
) ENGINE=InnoDB AUTO_INCREMENT=1000001 DEFAULT CHARSET=latin1
mysql> explain select * from sbtest1 where id = round(rand()*10000, 0);
+----+-------------+---------+------------+------+---------------+------+---------+------+--------+----------+-------------+
| id | select_type | table   | partitions | type | possible_keys | key  | key_len | ref  | rows   | filtered | Extra       |
+----+-------------+---------+------------+------+---------------+------+---------+------+--------+----------+-------------+
|  1 | SIMPLE      | sbtest1 | NULL       | ALL  | NULL          | NULL | NULL    | NULL | 986400 |    10.00 | Using where |
+----+-------------+---------+------------+------+---------------+------+---------+------+--------+----------+-------------+

So it is a primary key, but MySQL does not use an index, and it returns two rows. Is this a bug?

Deterministic vs nondeterministic functions

Turned out it is not a bug at all. It is pretty logical behavior from MySQL, but it is not what we would expect. First, why a full table scan? Well, rand() is a nondeterministic function. That means we do not know what it will return ahead of time, and actually that is exactly the purpose of rand() – to return a random value. In this case, it is only logical to evaluate the function for each row, each time, and compare the results, i.e. in our case:

  1. Read row 1, get the value of id, evaluate the value of RAND(), compare
  2. Proceed using the same algorithm with the remaining rows.

In other words, as the value of rand() is not known (not evaluated) beforehand, we can’t use an index.

And in this case – the rand() function – we have another interesting consequence. For larger tables with an auto_increment primary key, the probability that the rand() value matches some auto_increment value is higher, so we can get multiple rows back. In fact, if we read the whole table from the beginning and keep comparing the auto_inc sequence with “the roll of the dice”, we can get many rows back.

That behavior is totally counter-intuitive. Nevertheless, to me, it’s also the only correct behavior.

We expect to have the rand() function evaluated before running the query.  This can actually be achieved by assigning rand() to a variable:

mysql> set @id=round(rand()*10000, 0); select @id; select * from sbtest1 where id = @id;
Query OK, 0 rows affected (0.00 sec)
+------+
| @id  |
+------+
| 6068 |
+------+
1 row in set (0.00 sec)
+------+--------+-------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------+
| id   | k      | c                                                                                                                       | pad                                                         |
+------+--------+-------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------+
| 6068 | 502782 | 84971025350-12845068791-61736600622-38249796467-85706778555-74134284808-24438972515-17848828748-86869270666-01547789681 | 17507194006-70651503059-23792945260-94159543806-65683812344 |
+------+--------+-------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------+
1 row in set (0.00 sec)
mysql> explain select * from sbtest1 where id = @id;
+----+-------------+---------+------------+-------+---------------+---------+---------+-------+------+----------+-------+
| id | select_type | table   | partitions | type  | possible_keys | key     | key_len | ref   | rows | filtered | Extra |
+----+-------------+---------+------------+-------+---------------+---------+---------+-------+------+----------+-------+
|  1 | SIMPLE      | sbtest1 | NULL       | const | PRIMARY       | PRIMARY | 4       | const |    1 |   100.00 | NULL  |
+----+-------------+---------+------------+-------+---------------+---------+---------+-------+------+----------+-------+
1 row in set, 1 warning (0.01 sec)

This would meet our expectations.

There are (at least) two bug reports filed, with very interesting discussion:

  1. rand() used in scalar functions returns multiple rows
  2. SELECT on PK with ROUND(RAND()) give wrong errors

Other databases

I wanted to see how it works in other SQL databases. In PostgreSQL, the behavior is exactly the same as MySQL:

postgres=# select * from t2 where id = cast(random()*10000 as int);
  id  |    c
------+---------
 4093 | asdasda
 9378 | asdasda
(2 rows)
postgres=# select * from t2 where id = cast(random()*10000 as int);
  id  |    c
------+---------
 5988 | asdasda
 6674 | asdasda
(2 rows)
postgres=# explain select * from t2 where id = cast(random()*10000 as int);
                             QUERY PLAN
--------------------------------------------------------------------
 Seq Scan on t2  (cost=0.00..159837.60 rows=1 width=12)
   Filter: (id = ((random() * '10000'::double precision))::integer)
(2 rows)

And SQLite seems different, evaluating the random() function beforehand:

sqlite> select * from t2 where id = cast(abs(CAST(random() AS REAL))/92233720368547 as int);
16239|asdsadasdsa
sqlite> select * from t2 where id = cast(abs(CAST(random() AS REAL))/92233720368547 as int);
32910|asdsadasdsa
sqlite> select * from t2 where id = cast(abs(CAST(random() AS REAL))/92233720368547 as int);
58658|asdsadasdsa
sqlite> explain select * from t2 where id = cast(abs(CAST(random() AS REAL))/92233720368547 as int);
addr  opcode         p1    p2    p3    p4             p5  comment
----  -------------  ----  ----  ----  -------------  --  -------------
0     Init           0     12    0                    00  Start at 12
1     OpenRead       0     30182  0     2              00  root=30182 iDb=0; t2
2     Function0      0     0     3     random(0)      00  r[3]=func(r[0])
3     Cast           3     69    0                    00  affinity(r[3])
4     Function0      0     3     2     abs(1)         01  r[2]=func(r[3])
5     Divide         4     2     1                    00  r[1]=r[2]/r[4]
6     Cast           1     68    0                    00  affinity(r[1])
7     SeekRowid      0     11    1                    00  intkey=r[1]; pk
8     Copy           1     5     0                    00  r[5]=r[1]
9     Column         0     1     6                    00  r[6]=t2.c
10    ResultRow      5     2     0                    00  output=r[5..6]
11    Halt           0     0     0                    00
12    Transaction    0     0     2     0              01  usesStmtJournal=0
13    Int64          0     4     0     92233720368547  00 r[4]=92233720368547
14    Goto           0     1     0                    00

Conclusion

Be careful when using MySQL nondeterministic functions in a “where” condition – rand() is the most interesting example – as their behavior may surprise you. Many people believe this to be a bug that should be fixed. Let me know in the comments: do you think it is a bug or not (and why)? I would also be interested to know how it works in other, non-opensource databases (Microsoft SQL Server, Oracle, etc.)

PS: Finally, I’ve got a “clever” idea – what if I “trick” MySQL by using the deterministic keyword…

MySQL stored functions: deterministic vs not deterministic

So, I wanted to see how it works with MySQL stored functions if they are declared with the DETERMINISTIC and NOT DETERMINISTIC keywords. First, I wanted to “trick” MySQL by declaring the stored function DETERMINISTIC while using rand() inside. To be clear, this is not what you really want to do!

DELIMITER $$
CREATE FUNCTION myrand() RETURNS INT
    DETERMINISTIC
BEGIN
 RETURN round(rand()*10000, 0);
END$$
DELIMITER ;

From the MySQL manual about stored routines we can read:

Assessment of the nature of a routine is based on the “honesty” of the creator: MySQL does not check that a routine declared DETERMINISTIC is free of statements that produce nondeterministic results. However, misdeclaring a routine might affect results or affect performance. Declaring a nondeterministic routine as DETERMINISTIC might lead to unexpected results by causing the optimizer to make incorrect execution plan choices. Declaring a deterministic routine as NONDETERMINISTIC might diminish performance by causing available optimizations not to be used.

The result is interesting:

mysql> select myrand();
+----------+
| myrand() |
+----------+
|     4202 |
+----------+
1 row in set (0.00 sec)
mysql> select myrand();
+----------+
| myrand() |
+----------+
|     7548 |
+----------+
1 row in set (0.00 sec)
mysql> explain select * from t2 where id = myrand()\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: NULL
   partitions: NULL
         type: NULL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: NULL
     filtered: NULL
        Extra: Impossible WHERE noticed after reading const tables
1 row in set, 1 warning (0.00 sec)
mysql> show warnings;
+-------+------+--------------------------------------------------------------------------------+
| Level | Code | Message                                                                        |
+-------+------+--------------------------------------------------------------------------------+
| Note  | 1003 | /* select#1 */ select '2745' AS `id`,'asasdas' AS `c` from `test`.`t2` where 0 |
+-------+------+--------------------------------------------------------------------------------+
1 row in set (0.00 sec)
mysql> select * from t2 where id = 4202;
+------+---------+
| id   | c       |
+------+---------+
| 4202 | asasdas |
+------+---------+
1 row in set (0.00 sec)
mysql> select * from t2 where id = 2745;
+------+---------+
| id   | c       |
+------+---------+
| 2745 | asasdas |
+------+---------+
1 row in set (0.00 sec)

So the MySQL optimizer detected the problem (somehow).

If I use the NOT DETERMINISTIC keyword, then MySQL works the same as when using the rand() function:

DELIMITER $$
CREATE FUNCTION myrand2() RETURNS INT
   NOT DETERMINISTIC
BEGIN
 RETURN round(rand()*10000, 0);
END$$
DELIMITER ;
mysql> explain select * from t2 where id = myrand2()\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: t2
   partitions: NULL
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 262208
     filtered: 10.00
        Extra: Using where
1 row in set, 1 warning (0.00 sec)

 


Photo by dylan nolte on Unsplash

by Alexander Rubin at December 05, 2018 06:44 PM

Jean-Jerome Schmidt

MySQL on Docker: Multiple Delayed Replication Slaves for Disaster Recovery with Low RTO

Delayed replication allows a replication slave to deliberately lag behind the master by at least a specified amount of time. Before executing an event, the slave will first wait, if necessary, until the given time has passed since the event was created on the master. The result is that the slave will reflect the state of the master some time back in the past. This feature is supported since MySQL 5.6 and MariaDB 10.2.3. It can come in handy in case of accidental data deletion, and should be part of your disaster recovery plan.

The problem when setting up a delayed replication slave is deciding how much delay to use. Too little, and you risk the bad query reaching your delayed slave before you can react, which defeats the point of having the delayed slave. Too much, and it may take hours for the delayed slave to catch up to where the master was at the time of the error.

Luckily, process isolation is one of Docker’s strengths, and running multiple MySQL instances is pretty convenient with it. It allows us to have multiple delayed slaves within a single physical host to improve our recovery time and save hardware resources. If you think a 15-minute delay is too short, we can have another instance with a 1-hour or 6-hour delay for an even older snapshot of our database.

In this blog post, we are going to deploy multiple MySQL delayed slaves on one single physical host with Docker, and show some recovery scenarios. The following diagram illustrates our final architecture that we want to build:

Our architecture consists of an already deployed 2-node MySQL Replication running on physical servers (blue), and we would like to set up another three MySQL slaves (green) with the following behaviour:

  • 15 minutes delay
  • 1 hour delay
  • 6 hours delay

Take note that we are going to have 3 copies of the exact same data on the same physical server, so make sure the Docker host has sufficient disk space allocated beforehand.

MySQL Master Preparation

Firstly, login to the master server and create the replication user:

mysql> GRANT REPLICATION SLAVE ON *.* TO rpl_user@'%' IDENTIFIED BY 'YlgSH6bLLy';

Then, create a PITR-compatible backup on the master:

$ mysqldump -uroot -p --flush-privileges --hex-blob --opt --master-data=1 --single-transaction --skip-lock-tables --triggers --routines --events --all-databases | gzip -6 -c > mysqldump_complete.sql.gz

If you are using ClusterControl, you can make a PITR-compatible backup easily. Go to Backups -> Create Backup and pick "Complete PITR-compatible" under the "Dump Type" dropdown:

Finally, transfer this backup to the Docker host:

$ scp mysqldump_complete.sql.gz root@192.168.55.200:~

This backup file will be used by the MySQL slave containers during the slave bootstrapping process, as shown in the next section.

Delayed Slave Deployment

Prepare our Docker container directories. Create 3 directories (mysql.conf.d, datadir and sql) for every MySQL container that we are going to launch (you can use a loop to simplify the commands below, as shown right after them):

$ mkdir -p /storage/mysql-slave-15m/mysql.conf.d
$ mkdir -p /storage/mysql-slave-15m/datadir
$ mkdir -p /storage/mysql-slave-15m/sql
$ mkdir -p /storage/mysql-slave-1h/mysql.conf.d
$ mkdir -p /storage/mysql-slave-1h/datadir
$ mkdir -p /storage/mysql-slave-1h/sql
$ mkdir -p /storage/mysql-slave-6h/mysql.conf.d
$ mkdir -p /storage/mysql-slave-6h/datadir
$ mkdir -p /storage/mysql-slave-6h/sql
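
For example, the same directories can be created with a single bash loop:

$ for t in 15m 1h 6h; do mkdir -p /storage/mysql-slave-${t}/{mysql.conf.d,datadir,sql}; done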

The "mysql.conf.d" directory will store our custom MySQL configuration file and is mapped into the container under /etc/mysql/mysql.conf.d. The "datadir" directory is where we want Docker to store the MySQL data directory, which maps to /var/lib/mysql of the container, and the "sql" directory stores our SQL files - backup files in .sql or .sql.gz format to stage the slave before replicating, plus .sql files to automate the replication configuration and startup.

15-minute Delayed Slave

Prepare the MySQL configuration file for our 15-minute delayed slave:

$ vim /storage/mysql-slave-15m/mysql.conf.d/my.cnf

And add the following lines:

[mysqld]
server_id=10015
binlog_format=ROW
log_bin=binlog
log_slave_updates=1
gtid_mode=ON
enforce_gtid_consistency=1
relay_log=relay-bin
expire_logs_days=7
read_only=ON

** The server-id value we used for this slave is 10015.

Next, under /storage/mysql-slave-15m/sql directory, create two SQL files, one to RESET MASTER (1reset_master.sql) and another one to establish the replication link using CHANGE MASTER statement (3setup_slave.sql).

Create a text file 1reset_master.sql and add the following line:

RESET MASTER;

Create a text file 3setup_slave.sql and add the following lines:

CHANGE MASTER TO MASTER_HOST = '192.168.55.171', MASTER_USER = 'rpl_user', MASTER_PASSWORD = 'YlgSH6bLLy', MASTER_AUTO_POSITION = 1, MASTER_DELAY=900;
START SLAVE;

MASTER_DELAY=900 is equal to 15 minutes (in seconds). Then copy the backup file taken from our master (that has been transferred onto our Docker host) to the "sql" directory and rename it to 2mysqldump_complete.sql.gz:

$ cp ~/mysqldump_complete.sql.gz /storage/mysql-slave-15m/sql/2mysqldump_complete.sql.gz

The final look of our "sql" directory should be something like this:

$ pwd
/storage/mysql-slave-15m/sql
$ ls -1
1reset_master.sql
2mysqldump_complete.sql.gz
3setup_slave.sql

Take note that we prefix the SQL filename with an integer to determine the execution order when Docker initializes the MySQL container.

Once everything is in place, run the MySQL container for our 15-minute delayed slave:

$ docker run -d \
--name mysql-slave-15m \
-e MYSQL_ROOT_PASSWORD=password \
--mount type=bind,source=/storage/mysql-slave-15m/datadir,target=/var/lib/mysql \
--mount type=bind,source=/storage/mysql-slave-15m/mysql.conf.d,target=/etc/mysql/mysql.conf.d \
--mount type=bind,source=/storage/mysql-slave-15m/sql,target=/docker-entrypoint-initdb.d \
mysql:5.7

** The MYSQL_ROOT_PASSWORD value must be the same as the MySQL root password on the master.

The following lines are what we are looking for to verify if MySQL is running correctly and connected as a slave to our master (192.168.55.171):

$ docker logs -f mysql-slave-15m
...
2018-12-04T04:05:24.890244Z 0 [Note] mysqld: ready for connections.
Version: '5.7.24-log'  socket: '/var/run/mysqld/mysqld.sock'  port: 3306  MySQL Community Server (GPL)
2018-12-04T04:05:25.010032Z 2 [Note] Slave I/O thread for channel '': connected to master 'rpl_user@192.168.55.171:3306',replication started in log 'FIRST' at position 4

You can then verify the replication status with the following statement:

$ docker exec -it mysql-slave-15m mysql -uroot -p -e 'show slave status\G'
...
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
                    SQL_Delay: 900
                Auto_Position: 1
...

At this point, our 15-minute delayed slave container is replicating correctly and our architecture is looking something like this:

1-hour Delayed Slave

Prepare the MySQL configuration file for our 1-hour delayed slave:

$ vim /storage/mysql-slave-1h/mysql.conf.d/my.cnf

And add the following lines:

[mysqld]
server_id=10060
binlog_format=ROW
log_bin=binlog
log_slave_updates=1
gtid_mode=ON
enforce_gtid_consistency=1
relay_log=relay-bin
expire_logs_days=7
read_only=ON

** The server-id value we used for this slave is 10060.

Next, under /storage/mysql-slave-1h/sql directory, create two SQL files, one to RESET MASTER (1reset_master.sql) and another one to establish the replication link using CHANGE MASTER statement (3setup_slave.sql).

Create a text file 1reset_master.sql and add the following line:

RESET MASTER;

Create a text file 3setup_slave.sql and add the following lines:

CHANGE MASTER TO MASTER_HOST = '192.168.55.171', MASTER_USER = 'rpl_user', MASTER_PASSWORD = 'YlgSH6bLLy', MASTER_AUTO_POSITION = 1, MASTER_DELAY=3600;
START SLAVE;

MASTER_DELAY=3600 is equal to 1 hour (in seconds). Then copy the backup file taken from our master (that has been transferred onto our Docker host) to the "sql" directory and rename it to 2mysqldump_complete.sql.gz:

$ cp ~/mysqldump_complete.sql.gz /storage/mysql-slave-1h/sql/2mysqldump_complete.sql.gz

The final look of our "sql" directory should be something like this:

$ pwd
/storage/mysql-slave-1h/sql
$ ls -1
1reset_master.sql
2mysqldump_complete.sql.gz
3setup_slave.sql

Take note that we prefix the SQL filename with an integer to determine the execution order when Docker initializes the MySQL container.

Once everything is in place, run the MySQL container for our 1-hour delayed slave:

$ docker run -d \
--name mysql-slave-1h \
-e MYSQL_ROOT_PASSWORD=password \
--mount type=bind,source=/storage/mysql-slave-1h/datadir,target=/var/lib/mysql \
--mount type=bind,source=/storage/mysql-slave-1h/mysql.conf.d,target=/etc/mysql/mysql.conf.d \
--mount type=bind,source=/storage/mysql-slave-1h/sql,target=/docker-entrypoint-initdb.d \
mysql:5.7

** The MYSQL_ROOT_PASSWORD value must be the same as the MySQL root password on the master.

The following lines are what we are looking for to verify if MySQL is running correctly and connected as a slave to our master (192.168.55.171):

$ docker logs -f mysql-slave-1h
...
2018-12-04T04:05:24.890244Z 0 [Note] mysqld: ready for connections.
Version: '5.7.24-log'  socket: '/var/run/mysqld/mysqld.sock'  port: 3306  MySQL Community Server (GPL)
2018-12-04T04:05:25.010032Z 2 [Note] Slave I/O thread for channel '': connected to master 'rpl_user@192.168.55.171:3306',replication started in log 'FIRST' at position 4

You can then verify the replication status with the following statement:

$ docker exec -it mysql-slave-1h mysql -uroot -p -e 'show slave status\G'
...
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
                    SQL_Delay: 3600
                Auto_Position: 1
...

At this point, our 15-minute and 1-hour MySQL delayed slave containers are replicating from the master and our architecture is looking something like this:

6-hour Delayed Slave

Prepare the MySQL configuration file for our 6-hour delayed slave:

$ vim /storage/mysql-slave-6h/mysql.conf.d/my.cnf

And add the following lines:

[mysqld]
server_id=10006
binlog_format=ROW
log_bin=binlog
log_slave_updates=1
gtid_mode=ON
enforce_gtid_consistency=1
relay_log=relay-bin
expire_logs_days=7
read_only=ON

** The server-id value we used for this slave is 10006.

Next, under /storage/mysql-slave-6h/sql directory, create two SQL files, one to RESET MASTER (1reset_master.sql) and another one to establish the replication link using CHANGE MASTER statement (3setup_slave.sql).

Create a text file 1reset_master.sql and add the following line:

RESET MASTER;

Create a text file 3setup_slave.sql and add the following lines:

CHANGE MASTER TO MASTER_HOST = '192.168.55.171', MASTER_USER = 'rpl_user', MASTER_PASSWORD = 'YlgSH6bLLy', MASTER_AUTO_POSITION = 1, MASTER_DELAY=21600;
START SLAVE;

MASTER_DELAY=21600 is equal to 6 hours (in seconds). Then copy the backup file taken from our master (that has been transferred onto our Docker host) to the "sql" directory and rename it to 2mysqldump_complete.sql.gz:

$ cp ~/mysqldump_complete.sql.gz /storage/mysql-slave-6h/sql/2mysqldump_complete.sql.gz

The final look of our "sql" directory should be something like this:

$ pwd
/storage/mysql-slave-6h/sql
$ ls -1
1reset_master.sql
2mysqldump_complete.sql.gz
3setup_slave.sql

Take note that we prefix the SQL filename with an integer to determine the execution order when Docker initializes the MySQL container.

Once everything is in place, run the MySQL container for our 6-hour delayed slave:

$ docker run -d \
--name mysql-slave-6h \
-e MYSQL_ROOT_PASSWORD=password \
--mount type=bind,source=/storage/mysql-slave-6h/datadir,target=/var/lib/mysql \
--mount type=bind,source=/storage/mysql-slave-6h/mysql.conf.d,target=/etc/mysql/mysql.conf.d \
--mount type=bind,source=/storage/mysql-slave-6h/sql,target=/docker-entrypoint-initdb.d \
mysql:5.7

** The MYSQL_ROOT_PASSWORD value must be the same as the MySQL root password on the master.

The following lines are what we are looking for to verify if MySQL is running correctly and connected as a slave to our master (192.168.55.171):

$ docker logs -f mysql-slave-6h
...
2018-12-04T04:05:24.890244Z 0 [Note] mysqld: ready for connections.
Version: '5.7.24-log'  socket: '/var/run/mysqld/mysqld.sock'  port: 3306  MySQL Community Server (GPL)
2018-12-04T04:05:25.010032Z 2 [Note] Slave I/O thread for channel '': connected to master 'rpl_user@192.168.55.171:3306',replication started in log 'FIRST' at position 4

You can then verify the replication status with the following statement:

$ docker exec -it mysql-slave-6h mysql -uroot -p -e 'show slave status\G'
...
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
                    SQL_Delay: 21600
                Auto_Position: 1
...

At this point, our 15-minute, 1-hour and 6-hour delayed slave containers are replicating correctly and our architecture is looking something like this:

Disaster Recovery Scenario

Let's say a user has accidentally dropped a wrong column from a big table. Consider that the following statements were executed on the master:

mysql> USE shop;
mysql> ALTER TABLE settings DROP COLUMN status;

If you are lucky enough to realize it immediately, you could use the 15-minute delayed slave to catch up to the moment before the disaster happened and promote it to become master, or export the missing data and restore it on the master.

Firstly, we have to find the binary log position before the disaster happened. Grab the time now() on the master:

mysql> SELECT now();
+---------------------+
| now()               |
+---------------------+
| 2018-12-04 14:55:41 |
+---------------------+

Then, get the active binary log file on the master:

mysql> SHOW MASTER STATUS;
+---------------+----------+--------------+------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| File          | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set                                                                                                                                                                     |
+---------------+----------+--------------+------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| binlog.000004 | 20260658 |              |                  | 1560665e-ed2b-11e8-93fa-000c29b7f985:1-12031,
1b235f7a-d37b-11e8-9c3e-000c29bafe8f:1-62519,
1d8dc60a-e817-11e8-82ff-000c29bafe8f:1-326575,
791748b3-d37a-11e8-b03a-000c29b7f985:1-374 |
+---------------+----------+--------------+------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

Using the same date format, extract the information that we want from the binary log, binlog.000004. We estimate the start time to read from the binlog around 20 minutes ago (2018-12-04 14:35:00) and filter the output to show 25 lines before the "drop column" statement:

$ mysqlbinlog --start-datetime="2018-12-04 14:35:00" --stop-datetime="2018-12-04 14:55:41" /var/lib/mysql/binlog.000004 | grep -i -B 25 "drop column"
'/*!*/;
# at 19379172
#181204 14:54:45 server id 1  end_log_pos 19379232 CRC32 0x0716e7a2     Table_map: `shop`.`settings` mapped to number 766
# at 19379232
#181204 14:54:45 server id 1  end_log_pos 19379460 CRC32 0xa6187edd     Write_rows: table id 766 flags: STMT_END_F

BINLOG '
tSQGXBMBAAAAPAAAACC0JwEAAP4CAAAAAAEABnNidGVzdAAHc2J0ZXN0MgAFAwP+/gME/nj+PBCi
5xYH
tSQGXB4BAAAA5AAAAAS1JwEAAP4CAAAAAAEAAgAF/+AYwwAAysYAAHc0ODYyMjI0NjI5OC0zNDE2
OTY3MjY5OS02MDQ1NTQwOTY1Ny01MjY2MDQ0MDcwOC05NDA0NzQzOTUwMS00OTA2MTAxNzgwNC05
OTIyMzM3NzEwOS05NzIwMzc5NTA4OC0yODAzOTU2NjQ2MC0zNzY0ODg3MTYzOTswMTM0MjAwNTcw
Ni02Mjk1ODMzMzExNi00NzQ1MjMxODA1OS0zODk4MDQwMjk5MS03OTc4MTA3OTkwNQEAAADdfhim
'/*!*/;
# at 19379460
#181204 14:54:45 server id 1  end_log_pos 19379491 CRC32 0x71f00e63     Xid = 622405
COMMIT/*!*/;
# at 19379491
#181204 14:54:46 server id 1  end_log_pos 19379556 CRC32 0x62b78c9e     GTID    last_committed=11507    sequence_number=11508   rbr_only=no
SET @@SESSION.GTID_NEXT= '1560665e-ed2b-11e8-93fa-000c29b7f985:11508'/*!*/;
# at 19379556
#181204 14:54:46 server id 1  end_log_pos 19379672 CRC32 0xc222542a     Query   thread_id=3162  exec_time=1     error_code=0
SET TIMESTAMP=1543906486/*!*/;
/*!\C utf8 *//*!*/;
SET @@session.character_set_client=33,@@session.collation_connection=33,@@session.collation_server=8/*!*/;
ALTER TABLE settings DROP COLUMN status

In the bottom few lines of the mysqlbinlog output, you have the erroneous command, executed at position 19379556. The position we should restore to is one step before that, position 19379491. This is the binlog position up to which we want our delayed slave to replicate.

Then, on the chosen delayed slave, stop the slave and start it again with the fixed end position that we figured out above:

$ docker exec -it mysql-slave-15m mysql -uroot -p
mysql> STOP SLAVE;
mysql> START SLAVE UNTIL MASTER_LOG_FILE = 'binlog.000004', MASTER_LOG_POS = 19379491;

Monitor the replication status and wait until Exec_Master_Log_Pos is equal to the Until_Log_Pos value. This could take some time. Once caught up, you should see the following:

$ docker exec -it mysql-slave-15m mysql -uroot -p -e 'SHOW SLAVE STATUS\G'
... 
          Exec_Master_Log_Pos: 19379491
              Relay_Log_Space: 50552186
              Until_Condition: Master
               Until_Log_File: binlog.000004
                Until_Log_Pos: 19379491
...

Finally, verify that the data we were looking for is there (the column "status" still exists):

mysql> DESCRIBE shop.settings;
+--------+------------------+------+-----+---------+----------------+
| Field  | Type             | Null | Key | Default | Extra          |
+--------+------------------+------+-----+---------+----------------+
| id     | int(10) unsigned | NO   | PRI | NULL    | auto_increment |
| sid    | int(10) unsigned | NO   | MUL | 0       |                |
| param  | varchar(100)     | NO   |     |         |                |
| value  | varchar(255)     | NO   |     |         |                |
| status | int(11)          | YES  |     | 1       |                |
+--------+------------------+------+-----+---------+----------------+

Then export the table from our slave container and transfer it to the master server:

$ docker exec -it mysql-slave-15m mysqldump -uroot -ppassword --single-transaction shop settings > shop_settings.sql

Drop the problematic table and restore it back on the master:

$ mysql -uroot -p -e 'DROP TABLE shop.settings'
$ mysql -uroot -p shop < shop_settings.sql

We have now recovered our table back to its original state before the disastrous event. To summarize, delayed replication can be used for several purposes:

  • To protect against user mistakes on the master. A DBA can roll back a delayed slave to the time just before the disaster.
  • To test how the system behaves when there is a lag. For example, in an application, a lag might be caused by a heavy load on the slave. However, it can be difficult to generate this load level. Delayed replication can simulate the lag without having to simulate the load. It can also be used to debug conditions related to a lagging slave.
  • To inspect what the database looked like in the past, without having to reload a backup. For example, if the delay is one week and the DBA needs to see what the database looked like before the last few days' worth of development, the delayed slave can be inspected.

Final Thoughts

With Docker, running multiple MySQL instances on the same physical host can be done efficiently. You may use Docker orchestration tools like Docker Compose and Swarm to simplify the multi-container deployment, as opposed to the steps shown in this blog post.
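
As a sketch of what that could look like, one of the slaves above could be described like this in a docker-compose.yml (the service definition simply mirrors the docker run flags we used earlier; names and paths follow our examples):

version: '3'
services:
  mysql-slave-15m:
    image: mysql:5.7
    environment:
      MYSQL_ROOT_PASSWORD: password
    volumes:
      - /storage/mysql-slave-15m/datadir:/var/lib/mysql
      - /storage/mysql-slave-15m/mysql.conf.d:/etc/mysql/mysql.conf.d
      - /storage/mysql-slave-15m/sql:/docker-entrypoint-initdb.d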

by ashraf at December 05, 2018 09:50 AM

December 04, 2018

Peter Zaitsev

Percona Server for MySQL 5.7.24-26 Is Now Available


Percona announces the release of Percona Server for MySQL 5.7.24-26 on December 4, 2018 (downloads are available here and from the Percona Software Repositories). This release merges changes of MySQL 5.7.24, including all the bug fixes in it. Percona Server for MySQL 5.7.24-26 is now the current GA release in the 5.7 series. All of Percona’s software is open-source and free.

This release includes fixes to the following upstream CVEs (Common Vulnerabilities and Exposures): CVE-2016-9843, CVE-2018-3155, CVE-2018-3143, CVE-2018-3156, CVE-2018-3251, CVE-2018-3133, CVE-2018-3144, CVE-2018-3185, CVE-2018-3247, CVE-2018-3187, CVE-2018-3174, CVE-2018-3171. For more information, see Oracle Critical Patch Update Advisory – October 2018.

Improvements

  • PS-4790: Improve user statistics accuracy

Bugs Fixed

  • Slave replication could break if upstream bug #74145 (FLUSH LOGS improperly disables the logging if the log file cannot be accessed) occurred in master. Bug fixed PS-1017 (Upstream #83232).
  • Setting the tokudb_last_lock_timeout variable via the command line could cause the server to stop working when the actual timeout took place. Bug fixed PS-4943.
  • Dropping a TokuDB table with non-alphanumeric characters could lead to a crash. Bug fixed PS-4979.
  • When using the MyRocks storage engine, the server could crash after running ALTER TABLE DROP INDEX on a slave. Bug fixed PS-4744.
  • The audit log could be corrupted when the audit_log_rotations variable was changed at runtime. Bug fixed PS-4950.

Other Bugs Fixed

  • PS-4781: sql_yacc.yy uses SQLCOM_SELECT instead of SQLCOM_SHOW_XXXX_STATS
  • PS-4881: Add LLVM/clang 7 to Travis-CI
  • PS-4825: Backport MTR fixes from 8.0
  • PS-4998: Valgrind: compilation fails with: writing to ‘struct buf_buddy_free_t’ with no trivial copy-assignment
  • PS-4980: Valgrind: Syscall param write(buf) points to uninitialised byte(s): Event_encrypter::encrypt_and_write()
  • PS-4982: Valgrind: Syscall param io_submit(PWRITE) points to uninitialised byte(s): buf_dblwr_write_block_to_datafile()
  • PS-4983: Valgrind: Syscall param io_submit(PWRITE) points to uninitialised byte(s): buf_flush_write_block_low()
  • PS-4951: Many libc-related Valgrind errors on CentOS7
  • PS-5012: Valgrind: misused UNIV_MEM_ALLOC after ut_zalloc_nokey
  • PS-4908: UBSan and valgrind errors with encrypted temporary files
  • PS-4532: Replace obsolete HAVE_purify with HAVE_VALGRIND in ha_rocksdb.cc
  • PS-4955: Backport mysqld fixes for valgrind warnings from 8.0
  • PS-4529: MTR: index_merge_rocksdb2 inadvertently tests InnoDB instead of MyRocks
  • PS-5056: handle_fatal_signal (sig=11) in ha_tokudb::write_row
  • PS-5084: innodb_buffer_pool_size is an uninitialized variable
  • PS-4836: Missing PFS signed variable aggregation
  • PS-5033: rocksdb.show_engine: Result content mismatch
  • PS-5034: rocksdb.rocksdb: Result content mismatch
  • PS-5035: rocksdb.show_table_status: 1051: Unknown table ‘db_new’

Find the release notes for Percona Server for MySQL 5.7.24-26 in our online documentation. Report bugs in the Jira bug tracker.

 

by Borys Belinsky at December 04, 2018 05:54 PM

MongoDB 4.0: Using ACID Multi-Document Transactions


MongoDB 4.0 is out, and there are a lot of new features and improvements. In this article we’re going to focus on the major feature which is, undoubtedly, the support for multi-document ACID transactions. This novelty for a NoSQL database could be seen as a way to get closer to the relational world. Well, it’s not that, or maybe not just that. It’s a way to add to the document-based model a new, important, and often requested feature to address a wider range of use cases. The document model and its flexibility should remain the best way to start building an application on MongoDB. At this stage, transactions should be used in specific cases, when you absolutely need them: for example, when your application requires strict data consistency and atomicity. Transactions incur a greater performance cost than single-document writes, so the denormalized data model will continue to be optimal in many cases, and this helps to minimize the need for transactions.

Single writes are atomic by design: as long as you are able to embed documents in your collections you absolutely don’t need to use a transaction. Even so, transaction support is a very good and interesting feature that you can rely on in MongoDB from now on.

MongoDB 4.0 provides fully ACID transactions support but remember:

  • multi-document transactions are available for replica set deployments only
    • you can use transactions even on a standalone server but you need to configure it as a replica set (with just one node)
  • multi-document transactions are not available for sharded clusters
    • hopefully transactions will be available from version 4.2
  • multi-document transactions are available for the WiredTiger storage engine only

ACID transactions in MongoDB 4.0

ACID properties are well known in the world of relational databases, but let’s recap what the acronym means.

  • Atomicity: a group of commands inside the transaction must follow the “all or nothing” paradigm. If only one of the commands fails for any reason, the complete transaction fails as well.
  • Consistency: if a transaction successfully executes, it will take the database from one state that is consistent to another state that is also consistent.
  • Isolation: multiple transactions can run at the same time in the system. Isolation guarantees that each transaction is not able to view partial results of the others. Executing multiple transactions in parallel must have the same results as running them sequentially.
  • Durability: it guarantees that a transaction that has committed will remain persistent, even in the case of a system failure

Limitations of transactions

The support for transactions introduced some limitations:

  • a collection MUST exist in order to use transactions
  • a collection cannot be created or dropped inside a transaction
  • an index cannot be created or dropped inside a transaction
  • non-CRUD operations are not permitted inside a transaction (for example, administrative commands like createUser are not permitted)
  • a transaction cannot read or write in config, admin, and local databases
  • a transaction cannot write to system.* collections
  • the size of a transaction is limited to 16MB
    • a single oplog entry is generated during the commit: the writes inside the transaction don’t have single oplog entries as in regular queries
    • the limitation is a consequence of the 16MB maximum size of any BSON document in the oplog
    • in case of larger transactions, you should consider splitting these into smaller transactions
  • by default a transaction that executes for longer than 60 seconds will automatically expire
    • you can change this using the configuration parameter transactionLifetimeLimitSeconds (see the example after this list)
    • transactions rely on WiredTiger snapshot capability, and having a long running transaction can result in high pressure on WiredTiger’s cache to maintain snapshots, and lead to the retention of a lot of unflushed operations in memory
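
For example, raising the limit to two minutes on a running mongod (the value here is only illustrative):

db.adminCommand( { setParameter: 1, transactionLifetimeLimitSeconds: 120 } )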

Sessions

Sessions were introduced in version 3.6, for example to support retryable writes, and they are essential for transactions as well. In fact, any transaction is associated with an open session: prior to starting a transaction, a session must be created, and a transaction cannot run outside a session.

At any given time you may have multiple running sessions in the system, but each session may run only a single transaction at a time. You can run transactions in parallel according to how many open sessions you have.

Three new commands were introduced for creating, committing, and aborting transactions:

  • session.startTransaction()
    • starts a new transaction in the current session
  • session.commitTransaction()
    • consistently and durably saves the changes made by the operations in the transaction
  • session.abortTransaction()
    • the transaction ends without saving any of the changes made by the operations in the transaction

Note: in the following examples, we use two different connections to create two sessions. We do this for the sake of simplicity, but remember that you can create multiple sessions even inside a single connection, assigning each session to a different variable.

Our first transaction

To test our first transaction, if you don’t have a replica set already configured, let’s start a standalone server like this:

#> mongod --dbpath /data/db --logpath /data/mongo.log --fork --replSet foo
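
If this server has never been part of a replica set, initialize it first from the mongo shell so that it becomes PRIMARY (a single-node initiation with default settings):

> rs.initiate()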

Create a new collection, and insert some data.

foo:PRIMARY> use percona
switched to db percona
foo:PRIMARY> db.createCollection('people')
{
   "ok" : 1,
   "operationTime" : Timestamp(1538483120, 1),
   "$clusterTime" : {
      "clusterTime" : Timestamp(1538483120, 1),
      "signature" : {
         "hash" : BinData(0,"AAAAAAAAAAAAAAAAAAAAAAAAAAA="),
         "keyId" : NumberLong(0)
       }
    }
}
foo:PRIMARY> db.people.insert([{_id:1, name:"Corrado"},{_id:2, name:"Peter"},{_id:3,name:"Heidi"}])

Create a session

foo:PRIMARY> session = db.getMongo().startSession()
session { "id" : UUID("dcfa7de5-527d-4b1c-a890-53c9a355920d") }

Start a transaction and insert some new documents

foo:PRIMARY> session.startTransaction()
foo:PRIMARY> session.getDatabase("percona").people.insert([{_id: 4 , name : "George"},{_id: 5, name: "Tom"}])
WriteResult({ "nInserted" : 2 })

Now read the collection from inside and outside the session and see what happens

foo:PRIMARY> session.getDatabase("percona").people.find()
{ "_id" : 1, "name" : "Corrado" }
{ "_id" : 2, "name" : "Peter" }
{ "_id" : 3, "name" : "Heidi" }
{ "_id" : 4, "name" : "George" }
{ "_id" : 5, "name" : "Tom" }
foo:PRIMARY> db.people.find()
{ "_id" : 1, "name" : "Corrado" }
{ "_id" : 2, "name" : "Peter" }
{ "_id" : 3, "name" : "Heidi" }

As you might notice, since the transaction is not yet committed, you can see the modifications only from inside the session. You cannot see any of the modifications outside of the session, even in the same connection. If you try to open a new connection to the database, then you will not be able to see any of the modifications either.

Now, commit the transaction and see that you can now read the same data both inside and outside the session, as well as from any other connection.

foo:PRIMARY> session.commitTransaction()
foo:PRIMARY> session.getDatabase("percona").people.find()
{ "_id" : 1, "name" : "Corrado" }
{ "_id" : 2, "name" : "Peter" }
{ "_id" : 3, "name" : "Heidi" }
{ "_id" : 4, "name" : "George" }
{ "_id" : 5, "name" : "Tom" }
foo:PRIMARY> db.people.find()
{ "_id" : 1, "name" : "Corrado" }
{ "_id" : 2, "name" : "Peter" }
{ "_id" : 3, "name" : "Heidi" }
{ "_id" : 4, "name" : "George" }
{ "_id" : 5, "name" : "Tom" }

When the transaction is committed, all the data are written consistently and durably in the database, just like any typical write. So, writing to the journal file and to the oplog takes place in the same way as it does for any single write that’s not inside a transaction. As long as the transaction is open, any modification is stored in memory.

Isolation test

Let’s test now the isolation between two concurrent transactions.

Open the first connection, create a session and start a transaction:

//Connection #1
foo:PRIMARY> var session1 = db.getMongo().startSession()
foo:PRIMARY> session1.startTransaction()

do the same on the second connection:

//Connection #2
foo:PRIMARY> var session2 = db.getMongo().startSession()
foo:PRIMARY> session2.startTransaction()

On connection #1, update Heidi’s document by adding the gender field:

//Connection #1
foo:PRIMARY> session1.getDatabase("percona").people.update({_id:3},{$set:{ gender: "F" }})
WriteResult({ "nMatched" : 1, "nUpserted" : 0, "nModified" : 1 })
foo:PRIMARY> session1.getDatabase("percona").people.find()
{ "_id" : 1, "name" : "Corrado" }
{ "_id" : 2, "name" : "Peter" }
{ "_id" : 3, "name" : "Heidi", "gender" : "F" }
{ "_id" : 4, "name" : "George" }
{ "_id" : 5, "name" : "Tom" }

Update the same collection on connection #2 to add the same gender field to all the males:

//Connection #2
foo:PRIMARY> session2.getDatabase("percona").people.update({_id:{$in:[1,2,4,5]}},{$set:{ gender: "M" }},{multi:"true"})
WriteResult({ "nMatched" : 4, "nUpserted" : 0, "nModified" : 4 })
foo:PRIMARY> session2.getDatabase("percona").people.find()
{ "_id" : 1, "name" : "Corrado", "gender" : "M" }
{ "_id" : 2, "name" : "Peter", "gender" : "M" }
{ "_id" : 3, "name" : "Heidi" }
{ "_id" : 4, "name" : "George", "gender" : "M" }
{ "_id" : 5, "name" : "Tom", "gender" : "M" }

The two transactions are isolated, each one can see only the ongoing modifications that it has made itself.

Commit the transaction in connection #1:

//Connection #1
foo:PRIMARY> session1.commitTransaction()
foo:PRIMARY> session1.getDatabase("percona").people.find()
{ "_id" : 1, "name" : "Corrado" }
{ "_id" : 2, "name" : "Peter" }
{ "_id" : 3, "name" : "Heidi", "gender" : "F" }
{ "_id" : 4, "name" : "George" }
{ "_id" : 5, "name" : "Tom" }

In the connection #2 read the collection:

//Connection #2
foo:PRIMARY> session2.getDatabase("percona").people.find()
{ "_id" : 1, "name" : "Corrado", "gender" : "M" }
{ "_id" : 2, "name" : "Peter", "gender" : "M"  }
{ "_id" : 3, "name" : "Heidi" }
{ "_id" : 4, "name" : "George", "gender" : "M"  }
{ "_id" : 5, "name" : "Tom", "gender" : "M"  }

As you can see the second transaction still sees its own modifications, and cannot see the already committed updates of the other transaction. This kind of isolation works the same as the “REPEATABLE READ” level of MySQL and other relational databases.

Now commit the transaction in connection #2 and see the new values of the collection:

//Connection #2
foo:PRIMARY> session2.commitTransaction()
foo:PRIMARY> session2.getDatabase("percona").people.find()
{ "_id" : 1, "name" : "Corrado", "gender" : "M" }
{ "_id" : 2, "name" : "Peter", "gender" : "M" }
{ "_id" : 3, "name" : "Heidi", "gender" : "F" }
{ "_id" : 4, "name" : "George", "gender" : "M" }
{ "_id" : 5, "name" : "Tom", "gender" : "M" }

Conflicts

When two (or more) concurrent transactions modify the same documents, we may have a conflict. MongoDB can detect a conflict immediately, even while the transactions are not yet committed. The first transaction to acquire the lock on a document will continue; the second one will receive the conflict error message and fail. The failed transaction can then be retried later.

Let’s see an example.

Create a new transaction in connection #1 to update Heidi’s document. We want to change the name to Luise.

//Connection #1
foo:PRIMARY> session.startTransaction()
foo:PRIMARY> session.getDatabase("percona").people.update({name:"Heidi"},{$set:{name:"Luise"}})
WriteResult({ "nMatched" : 1, "nUpserted" : 0, "nModified" : 1 })

Let’s try to modify the same document in a concurrent transaction in connection #2. Modify the name from Heidi to Marie in this case.

//Connection #2
foo:PRIMARY> session.startTransaction()
foo:PRIMARY> session.getDatabase("percona").people.update({name:"Heidi"},{$set:{name:"Marie"}})
WriteCommandError({
    "errorLabels" : [
       "TransientTransactionError"
    ],
    "operationTime" : Timestamp(1538495683, 1),
    "ok" : 0,
    "errmsg" : "WriteConflict",
    "code" : 112,
    "codeName" : "WriteConflict",
    "$clusterTime" : {
       "clusterTime" : Timestamp(1538495683, 1),
       "signature" : {
            "hash" : BinData(0,"AAAAAAAAAAAAAAAAAAAAAAAAAAA="),
            "keyId" : NumberLong(0)
       }
     }
})

We received an error and the transaction failed. We can retry it later.
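
A minimal retry sketch for the mongo shell, along the lines of the pattern suggested in the MongoDB documentation (the function name and structure here are illustrative, not an official API):

function runWithRetry(session, txnFunc) {
    while (true) {
        try {
            session.startTransaction();
            txnFunc(session);
            session.commitTransaction();
            break;                          // transaction committed
        } catch (error) {
            session.abortTransaction();
            // retry only transient errors, such as the WriteConflict above
            if (error.hasOwnProperty("errorLabels") &&
                error.errorLabels.includes("TransientTransactionError")) {
                continue;
            }
            throw error;                    // any other error is fatal
        }
    }
}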

Other details

  • the individual writes inside the transaction are not retry-able even if retryWrites is set to true
  • each commit operation is a retry-able write operation regardless of whether retryWrites is set to true. The drivers retry the commit a single time in case of an error.
  • Read Concern supports snapshot, local and majority values
  • Write Concern can be set at the transaction level. The individual operations inside the transaction ignore the write concern. Write concern is evaluated during the commit
  • Read Preference supports only primary value

Conclusions

Transaction support in MongoDB 4.0 is a very interesting new feature, but it isn’t fully mature yet; there are strong limitations at this stage: a transaction cannot be larger than 16MB, you cannot use it on sharded clusters, and so on. If you absolutely need a transaction in your application, use it. But don’t use transactions only because they are cool, since in some cases a proper data model based on embedding documents in collections and denormalizing your data could be the best solution. MongoDB isn’t by its nature a relational database; as long as you are able to model your data keeping in mind that it’s a NoSQL database, you should avoid using transactions. In specific cases, or if you already have a database with strong “informal relations” between the collections that you cannot change, then you could choose to rely on transactions.

Image modified from original photo: by Annie Spratt on Unsplash

by Corrado Pandiani at December 04, 2018 01:13 PM

December 03, 2018

Peter Zaitsev

Percona Live 2019 Call for Papers is Now Open!


Announcing the opening of the Percona Live 2019 Open Source Database Conference call for papers. It will be open from now until January 20, 2019. The Percona Live Open Source Database Conference 2019 takes place May 28-30 in Austin, Texas.

Our theme this year is CONNECT. ACCELERATE. INNOVATE.

As a speaker at Percona Live, you’ll have the opportunity to CONNECT with your peers—open source database experts and enthusiasts who share your commitment to improving knowledge and exchanging ideas. ACCELERATE your projects and career by presenting at the premier open source database event, a great way to build your personal and company brands. And influence the evolution of the open source software movement by demonstrating how you INNOVATE!

Community initiatives remain core to the open source ethos, and we are proud of the contribution we make with Percona Live in showcasing thought leading practices in the open source database world.

With a nod to innovation, this year we are introducing a business track to benefit those business leaders who are exploring the use of open source and are interested in learning more about its costs and benefits.

Speaking Opportunities

The Percona Live Open Source Database Conference 2019 Call for Papers is open until January 20, 2019. We invite you to submit your speaking proposal for breakout, tutorial or lightning talk sessions. Classes and talks are invited for Foundation (either entry-level or of general interest to all), Core (intermediate), and Masterclass (advanced) levels.

  • Breakout Session. Broadly cover a technology area using specific examples. Sessions should be either 25 minutes or 50 minutes in length (including Q&A).
  • Tutorial Session. Present a technical session that aims for a level between a training class and a conference breakout session. We encourage attendees to bring and use laptops for working on detailed and hands-on presentations. Tutorials will be three or six hours in length (including Q&A).
  • Lightning Talk. Give a five-minute presentation focusing on one key point that interests the open source community: technical, lighthearted or entertaining talks on new ideas, a successful project, a cautionary story, a quick tip or demonstration.

If your proposal is selected for breakout or tutorial sessions, you will receive a complimentary full conference pass.

Topics and Themes

We want proposals that cover the many aspects of application development using all open source databases, as well as new and interesting ways to monitor and manage database environments. Did you just embrace open source databases this year? What are the technical and business values of moving to or using open source databases? How did you convince your company to make the move? Was there tangible ROI?

Best practices and current trends, including design, application development, performance optimization, HA and clustering, cloud, containers and new technologies – what's holding your focus? Share your case studies, experiences and technical knowledge with an engaged audience of open source peers.

In the submission entry, indicate which of these themes your proposal best fits: tutorial, business needs, case studies/use cases, operations, or development. Also include which track(s) from the list below would be best suited to your talk.

Tracks

The conference committee is looking for proposals that cover the many aspects of using, deploying and managing open source databases, including:

  • MySQL. Do you have an opinion on what is new and exciting in MySQL? With the release of MySQL 8.0, are you using the latest features? How and why? Are they helping you solve any business issues, or making deployment of applications and websites easier, faster or more efficient? Did the new release influence you to change to MySQL? What do you see as the biggest impact of the MySQL 8.0 release? Do you use MySQL in conjunction with other databases in your environment?
  • MariaDB. Talks highlighting MariaDB and MariaDB compatible databases and related tools. Discuss the latest features, how to optimize performance, and demonstrate the best practices you’ve adopted from real production use cases and applications.
  • PostgreSQL. Why do you use PostgreSQL as opposed to other SQL options? Have you done a comparison or benchmark of PostgreSQL vs. other types of databases related to your applications? Why, and what were the results? How does PostgreSQL help you with application performance or deployment? How do you use PostgreSQL in conjunction with other databases in your environment?
  • MongoDB. Has the 4.0 release improved your experience in application development or time-to-market? How are the new features making your database environment better? What is it about MongoDB 4.0 that excites you? What are your experiences with Atlas? Have you moved to it, and has it lived up to its promises? Do you use MongoDB in conjunction with other databases in your environment?
  • Polyglot Persistence. How are you using multiple open source databases together? What tools and technologies are helping you to get them interacting efficiently? In what ways are multiple databases working together helping to solve critical business issues? What are the best practices you’ve discovered in your production environments?
  • Observability and Monitoring. How are you designing your database-powered applications for observability? What monitoring tools and methods are providing you with the best application and database insights for running your business? How are you using tools to troubleshoot issues and bottlenecks? How are you observing your production environment in order to understand the critical aspects of your deployments? 
  • Kubernetes. How are you running open source databases on Kubernetes, OpenShift and other container platforms? What software are you using to facilitate their use? What best practices and processes are making containers a vital part of your business strategy? 
  • Automation and AI. How are you using automation to run databases at scale? Are you using automation to create self-running, self-healing, and self-tuning databases? Is machine learning and artificial intelligence (AI) helping you create a new generation of database automation?
  • Migration to Open Source Databases. How are you migrating to open source databases? Are you migrating on-premises or to the cloud? What are the tools and strategies you’ve used that have been successful, and what have you learned during and after the migration? Do you have real-world migration stories that illustrate how best to migrate?
  • Database Security and Compliance. All of us have experienced security and compliance challenges. From new legislation like GDPR, PCI and HIPAA, exploited software bugs, or new threats such as ransomware attacks, when is enough “enough”? What are your best practices for preventing incursions? How do you maintain compliance as you move to the cloud? Are you finding that security and compliance requirements are preventing your ability to be agile?
  • Other Open Source Databases. There are many, many great open source database software and solutions we can learn about. Submit other open source database talk ideas – we welcome talks for both established database technologies as well as the emerging new ones that no one has yet heard about (but should).
  • Business and Enterprise. Has your company seen big improvements in ROI from using Open Source Databases? Are there efficiency levels or interesting case studies you want to share? How did you convince your company to move to Open Source?

How to Respond to the Call for Papers

For information on how to submit your proposal, visit our call for papers page.

Sponsorship

If you would like to obtain a sponsor pack for Percona Live Open Source Database Conference 2019, you will find more information including a prospectus on our sponsorship page. You are welcome to contact me, Bronwyn Campbell, directly.

by Bronwyn Campbell at December 03, 2018 05:43 PM

December 02, 2018

Valeriy Kravchuk

Fun with Bugs #74 - On MySQL Bug Reports I am Subscribed to, Part XI

Of all the talks I submitted, the Committee of the FOSDEM 2019 MySQL, MariaDB & Friends Devroom picked the one on how to create a useful MySQL bug report, so I have no option but to continue writing about MySQL bugs, as long as the MySQL Community wants, and even prefers, to listen and read about them... That's what I do, with pleasure.

Today I'll continue my series of posts about community bug reports I am subscribed to with the review of bugs reported since October 1, 2018, starting from the oldest and skipping those MySQL 8 regression ones I've already commented on:
  • Bug #92609 - "Upgrade to 8.0.12 fails". This bug reported by Frederic Steinfels is about upgrade from MySQL 5.7 that leads to crash. Nice workaround was found:
    "The work around is delete all .TRG file (or move them to out side of mysql data folder) then update. After success we can re-create the trigger."
    The bug is really fixed in MySQL 8.0.14 according to the last comment, but for some reason it is still "Verified". Probably it will be closed when MySQL 8.0.14 is released.
  • Bug #92631 - "importing dump from mysqldump --all-databases breaks SYS schema due to routines". This bug affecting MySQL 5.7 (and not 8.0) was reported by Shane Bester. Workaround is actually documented in the manual - add sys schema explicitly while dumping and dump it separately, then re-create.
  • Bug #92661 - "SELECT on key partitioned enum reading all partitions instead of 1". Interesting corner case found by Frederic Steinfels. MariaDB 10.3.7 also seems affected:
    MariaDB [test]> explain partitions select id from product where outdated='0';
    +------+-------------+---------+------------+------+---------------+----------+---------+-------+------+--------------------------+
    | id   | select_type | table   | partitions | type | possible_keys | key      | key_len | ref   | rows | Extra                    |
    +------+-------------+---------+------------+------+---------------+----------+---------+-------+------+--------------------------+
    |    1 | SIMPLE      | product | p0,p1      | ref  | outdated      | outdated | 1       | const |    2 | Using where; Using index |
    +------+-------------+---------+------------+------+---------------+----------+---------+-------+------+--------------------------+
    1 row in set (0.002 sec)

    MariaDB [test]> explain partitions select id from product where outdated='1';
    +------+-------------+---------+------------+------+---------------+----------+---------+-------+------+--------------------------+
    | id   | select_type | table   | partitions | type | possible_keys | key      | key_len | ref   | rows | Extra                    |
    +------+-------------+---------+------------+------+---------------+----------+---------+-------+------+--------------------------+
    |    1 | SIMPLE      | product | p0,p1      | ref  | outdated      | outdated | 1       | const | 2048 | Using where; Using index |
    +------+-------------+---------+------------+------+---------------+----------+---------+-------+------+--------------------------+
    1 row in set (0.002 sec)
  • Bug #92809 - "Inconsistent ResultSet for different Execution Plans". The full test case is not public, and it took a lot of arguing until this bug (reported by Juan Arruti) was finally "Verified". Judging by the workaround, setting optimizer_switch='materialization=off', this feature of the MySQL optimizer is still problematic.
  • Bug #92850 - "Bad select+order by+limit performance in 5.7". As Sveta Smirnova demonstrated, there are still cases when FORCE INDEX hints are really needed to help the optimizer use a proper plan, even in the somewhat obvious case of a single table and a single proper index... What a surprise!
  • Bug #92882 - "MTS not replication crash-safe with GTID and all the right parameters." As  Jean-François Gagné proved, at least statement-based multi-threaded replication with GTIDs is not safe in case of OS crash. Good that there is an easy enough workaround: stop slave; reset slave; start slave; See also his Bug #93081 - "Please implement a better relay log recovery." that refers to several more known problems with relay log recovery.
  • Bug #92949 - "add auto_increment column as PK cause RBR replication inconsistent". This probably should never happen in production (adding a primary key while the data is being actively changed concurrently), but still this is a nice corner case reported by Fungo Wang.
  • Bug #92996 - "ANALYZE TABLE still locks tables 10 years later". Domas Mituzas is trying hard to escalate this well-known problem of blocked queries when ANALYZE TABLE is executed at the wrong time (while a long-running query against the table is in progress). The problem was resolved this year in Percona Server, see this blog post. See also my MDEV-15101 (fix is planned for version 10.4 in MariaDB).
  • Bug #93033 - "Missing info on partitioned tables in I_S.INNODB_COLUMNS after upgrade to 8.0". Yet another regression in MySQL 8 vs 5.7 was reported by Alexey Kopytov.
  • Bug #93049 - "ORDER BY pk, otherdata should just use PK". There is no reason to use filesort, as Domas Mituzas kindly noted. MariaDB is also, unfortunately, affected.
  • Bug #93083 - "InnoDB: Failing assertion: srv_read_only_mode || trx->in_depth > 0". This Severity 6 bug (only debug binaries are directly affected) was reported by Ramesh Sivaraman from Percona QA. At least it was verified instantly (I've subscribed to double check what happens to bugs with low severity levels).
I'll stop for now. A more detailed review of the remaining bugs reported in November is coming soon.

Sheep are everywhere in East Sussex and bugs are everywhere in MySQL. Not that many, but still they are visible.
To summarize my conclusions from this list:
  1. Sometimes it takes too much effort to force proper bug report processing. I have written more about this here.
  2. Having materialization=on in optimizer_switch in MySQL 5.7+ may cause wrong results. Take care to double check.
  3. There are still cases of single-table SELECTs where the optimizer could do a much better job. FORCE INDEX helps, sometimes.
  4. Multi-threaded statement-based replication in MySQL 5.6 and 5.7 is not crash safe, even with GTIDs, relay_log_info_repository=TABLE and relay_log_recovery=ON. A lot of improvements in relay log recovery are needed.
  5. Oracle engineers still care to document workarounds in active public bug reports. This is great!
  6. Percona still fixes some really important and annoying bugs way before other vendors.
  7. MySQL 8 is different in many small details, so regressions are to be expected.

by Valeriy Kravchuk (noreply@blogger.com) at December 02, 2018 04:30 PM

November 30, 2018

Peter Zaitsev

PostgreSQL Streaming Physical Replication With Slots


PostgreSQL streaming physical replication with slots simplifies setup and maintenance procedures. Usually, you should estimate disk usage for the Write Ahead Log (WAL), provide an appropriate limit on the number of segments, and set up a WAL archive procedure. In this article, you will see how to use replication with slots and understand what problems it could solve.

Introduction

PostgreSQL physical replication is based on WAL. The Write Ahead Log contains all database changes, saved in 16MB segment files. Normally postgres tries to keep segments between checkpoints, so with default settings just 1GB of WAL segment files is available.

Replication requires all WAL files created after backup and up until the current time. Previously, it was necessary to keep a huge archive directory (usually mounted by NFS to all slave servers). The slots feature introduced in 9.4 allows Postgres to track the latest segment downloaded by a slave server. Now, PostgreSQL can keep all segments on disk, even without archiving, if a slave is seriously behind its master due to downtime or networking issues. The drawback: the disk space could be consumed infinitely in the case of configuration error. Before continuing, if you need a better understanding of physical replication and streaming replication, I recommend you read “Streaming Replication with PostgreSQL“.
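As a quick aside, you can see at any time which slots exist on the master and whether a consumer is currently attached by querying the pg_replication_slots view; a minimal sketch:

postgres=# -- active = f with a non-NULL restart_lsn means WAL segments are being retained
postgres=# SELECT slot_name, slot_type, active, restart_lsn FROM pg_replication_slots;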

Create a sandbox with two PostgreSQL servers

To set up replication, you need at least two PostgreSQL servers. I'm using pgc (the BigSQL command line tool) to set up both servers on the same host. It's easy to install on Linux, Windows, and OS X, and it provides the ability to download and run any version of PostgreSQL on your staging server or even on your laptop.

python -c "$(curl -fsSL https://s3.amazonaws.com/pgcentral/install.py)"
mv bigsql master
cp -r master slave
$ cd master
master$ ./pgc install pg10
master$ ./pgc start pg10
$ cd ../slave
slave$ ./pgc install pg10
slave$ ./pgc start pg10

First of all you should allow the replication user to connect:

master$ echo "host replication replicator 127.0.0.1/32 md5" >> ./data/pg10/pg_hba.conf

If you are running master and slave on different servers, please replace 127.0.0.1 with the slave’s address.

Next pgc creates a shell environment file with PATH and all the other variables required for PostgreSQL:

master$ source ./pg10/pg10.env

Allow connections from the remote host, and create a replication user and slot on master:

master$ psql
postgres=# CREATE USER replicator WITH REPLICATION ENCRYPTED PASSWORD 'replicator';
CREATE ROLE
postgres=# ALTER SYSTEM SET listen_addresses TO '*';
ALTER SYSTEM
postgres=# SELECT pg_create_physical_replication_slot('slot1');
pg_create_physical_replication_slot
-------------------------------------
(slot1,)

To apply system variables changes and hba.conf, restart the Postgres server:

master$ ./pgc stop ; ./pgc start
pg10 stopping
pg10 starting on port 5432

Test table

Create a table with lots of padding on the master:

master$ psql
psql (10.6)
Type "help" for help.
postgres=# CREATE TABLE t(id INT, pad CHAR(200));
postgres=# CREATE INDEX t_id ON t (id);
postgres=# INSERT INTO t SELECT generate_series(1,1000000) AS id, md5((random()*1000000)::text) AS pad;

Filling WAL with random data

To see the benefits of slots, we should fill the WAL with some data by running transactions. Repeat the update statement below to generate a huge amount of WAL data:

UPDATE t SET pad = md5((random()*1000000)::text);

Checking the current WAL size

You can check total size for all WAL segments from the shell or from psql:

master$ du -sh data/pg10/pg_wal
17M data/pg10/pg_wal
master$ source ./pg10/pg10.env
master$ psql
postgres=# \! du -sh data/pg10/pg_wal
17M data/pg10/pg_wal

Check maximum WAL size without slots activated

Before replication configuration, we can fill the WAL with random data and find that after 1.1G, the data/pg10/pg_wal directory size does not increase regardless of the number of update queries.

postgres=# UPDATE t SET pad = md5((random()*1000000)::text); -- repeat 4 times
postgres=# \! du -sh data/pg10/pg_wal
1.1G data/pg10/pg_wal
postgres=# UPDATE t SET pad = md5((random()*1000000)::text);
postgres=# \! du -sh data/pg10/pg_wal
1.1G data/pg10/pg_wal

Backup master from the slave server

Next, let’s make a backup for our slot1:

slave$ source ./pg10/pg10.env
slave$ ./pgc stop pg10
slave$ rm -rf data/pg10/*
# If you are running master and slave on different servers, replace 127.0.0.1 with master's IP address.
slave$ PGPASSWORD=replicator pg_basebackup -S slot1 -h 127.0.0.1 -U replicator -p 5432 -D $PGDATA -Fp -P -Xs -Rv

Unfortunately pg_basebackup hangs with: initiating base backup, waiting for checkpoint to complete.
We can wait for the next checkpoint, or force a checkpoint on the master. A checkpoint happens every checkpoint_timeout seconds, which is set to five minutes by default.

Forcing checkpoint on master:

master$ psql
postgres=# CHECKPOINT;

The backup continues on the slave side:

pg_basebackup: checkpoint completed
pg_basebackup: write-ahead log start point: 0/92000148 on timeline 1
pg_basebackup: starting background WAL receiver
1073986/1073986 kB (100%), 1/1 tablespace
pg_basebackup: write-ahead log end point: 0/927FDDE8
pg_basebackup: waiting for background process to finish streaming ...
pg_basebackup: base backup completed

The backup copies settings from the master, including its TCP port value. I’m running both master and slave on the same host, so I should change the port in the slave .conf file:

slave$ vim data/pg10/postgresql.conf
# old value port = 5432
port = 5433

Now we can return to the master and run some queries:

slave$ cd ../master
master$ source pg10/pg10.env
master$ psql
postgres=# UPDATE t SET pad = md5((random()*1000000)::text);
UPDATE t SET pad = md5((random()*1000000)::text);

After running these queries, the WAL size is now 1.4G, already bigger than the 1.1G cap we saw before! Repeat the update query three more times and the WAL grows to 2.8GB:

master$ du -sh data/pg10/pg_wal
2.8G data/pg10/pg_wal

Certainly, the WAL could grow infinitely until the whole disk is consumed.
How do we find out the reason for this?

postgres=# SELECT redo_lsn, slot_name,restart_lsn,
round((redo_lsn-restart_lsn) / 1024 / 1024 / 1024, 2) AS GB_behind
FROM pg_control_checkpoint(), pg_replication_slots;
redo_lsn    | slot_name | restart_lsn | gb_behind
------------+-----------+-------------+-----------
1/2A400630  | slot1     |  0/92000000 | 2.38

We have one slot that is 2.38GB behind the master.

Let’s repeat the update and check again. The gap has increased:

postgres=# SELECT redo_lsn, slot_name,restart_lsn,
round((redo_lsn-restart_lsn) / 1024 / 1024 / 1024, 2) AS GB_behind
FROM pg_control_checkpoint(), pg_replication_slots;
redo_lsn    | slot_name | restart_lsn | gb_behind
------------+-----------+-------------+-----------
1/8D400238  |     slot1 | 0/92000000  | 3.93

Wait, though: we have already used slot1 for backup! Let’s start the slave:

master$ cd ../slave
slave$ ./pgc start pg10

Replication started without any additional change to recovery.conf:

slave$ cat data/pg10/recovery.conf
standby_mode = 'on'
primary_conninfo = 'user=replicator password=replicator passfile=''/home/pguser/.pgpass'' host=127.0.0.1 port=5432 sslmode=prefer sslcompression=1 krbsrvname=postgres target_session_attrs=any'
primary_slot_name = 'slot1'

The pg_basebackup -R option instructs the backup to write a recovery.conf file with all the required options, including primary_slot_name.

WAL size, all slots connected

The gap shrank within a few seconds of the slave starting:

postgres=# SELECT redo_lsn, slot_name,restart_lsn,
round((redo_lsn-restart_lsn) / 1024 / 1024 / 1024, 2) AS GB_behind
FROM pg_control_checkpoint(), pg_replication_slots;
redo_lsn    | slot_name | restart_lsn | gb_behind
------------+-----------+-------------+-----------
 1/8D400238 |     slot1 |  0/9A000000 | 3.80

And a few minutes later:

postgres=# SELECT redo_lsn, slot_name,restart_lsn,
round((redo_lsn-restart_lsn) / 1024 / 1024 / 1024, 2) AS GB_behind
FROM pg_control_checkpoint(), pg_replication_slots;
redo_lsn    | slot_name | restart_lsn | gb_behind
------------+-----------+-------------+-----------
 1/9E5DECE0 |     slot1 |  1/9EB17250 | -0.01
postgres=# \! du -sh data/pg10/pg_wal
1.3G data/pg10/pg_wal

Slave server maintenance

Let’s simulate slave server maintenance with ./pgc stop pg10 executed on the slave. We’ll push some data onto the master again (execute the UPDATE query 4 times).

Now, “slot1” is again 2.36GB behind.

Removing unused slots

By now, you might have realized that the problematic slot is not in use. In such cases, you can drop it to allow the retained segments to be recycled:

master$ psql
postgres=# SELECT pg_drop_replication_slot('slot1');

Finally the disk space is released:

master$ du -sh data/pg10/pg_wal
1.1G data/pg10/pg_wal

Important system variables

  • archive_mode – not required for streaming replication with slots
  • wal_level – replica by default
  • max_wal_senders – 10 by default; a minimum of three is needed for one slave, plus two for each additional slave
  • wal_keep_segments – 32 by default; not important, because PostgreSQL will keep all segments required by a slot
  • archive_command – not important for streaming replication with slots
  • listen_addresses – the only option that must be changed, to allow remote slaves to connect (see the sketch after this list)
  • hot_standby – on by default; important to enable reads on the slave
  • max_replication_slots – 10 by default (see https://www.postgresql.org/docs/10/static/runtime-config-replication.html)
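A minimal sketch of adjusting the settings above with ALTER SYSTEM (note that both of these require a server restart to take effect):

postgres=# ALTER SYSTEM SET listen_addresses TO '*';
postgres=# ALTER SYSTEM SET max_replication_slots TO 10;
postgres=# -- after the restart, verify the value in effect:
postgres=# SHOW max_replication_slots;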

Summary

  • Physical replication setup is really easy with slots. By default in pg10, all settings are already prepared for replication setup.
  • Be careful with orphaned slots. PostgreSQL will not remove WAL segments for inactive slots with initialized restart_lsn.
  • Check pg_replication_slots restart_lsn value and compare it with current redo_lsn.
  • Avoid long downtime for slave servers with slots configured.
  • Please use meaningful names for slots, as that will simplify debugging.


by Nickolay Ihalainen at November 30, 2018 05:05 PM

Jean-Jerome Schmidt

Database Backup Encryption - Best Practices

Offsite backup storage should be a critical part of any organisation’s disaster recovery plan. The ability to store data in a separate physical location, where it could survive a catastrophic event which destroys all the data in your primary data center, ensures your data's survival and your organisation's continuity. A cloud storage service is quite a good method to store offsite backups. Whether you are using a cloud provider or just copying data to an external data center, backup encryption is a must in such cases. In one of our previous blogs, we discussed several methods of encrypting your backups. Today we will focus on some best practices around backup encryption.

Ensure that your secrets are safe

To encrypt and decrypt your data you have to use some sort of a password or a key. Depending on the encryption method (symmetric or asymmetric), it can be one secret for both encryption and decryption, or it can be a public key for encryption and a private key for decryption. Whichever it is, you should keep those secrets safe. If you happen to use asymmetric encryption, you should focus on the private key, the one you will use for decrypting backups.

You can store keys in a key management system or a vault - there are numerous options on the market to pick from, like Amazon’s KMS or Hashicorp’s Vault. Even if you decide not to use those solutions, you still should apply generic security practices, like ensuring that only the correct users can access your keys and passwords. You should also prepare your backup scripts in a way that does not expose keys or passwords in the list of running processes. Ideally, put them in a file instead of passing them as an argument to a command.

Consider asymmetric encryption

The main difference between symmetric and asymmetric encryption is that symmetric encryption uses a single key or password for both encryption and decryption. This requires higher security standards on both ends of the process. You have to make sure that the host on which you encrypt the data is very secure, as a leak of the symmetric encryption key will allow access to all of your encrypted backups.

On the other hand, if you use asymmetric encryption, you have two keys: the public key for encrypting the data and the private key for decryption. This makes things so much easier - you don’t really have to care about the public key. Even if it were compromised, it would still not allow any kind of access to the data from backups. You have to focus on the security of the private key only. That is easier - you are most likely encrypting backups on a daily basis (if not more frequently), while restores happen from time to time, making it feasible to store the private key in a more secure location (even on a dedicated physical device). Below is a very quick example of how you can use gpg to generate a key pair and use it to encrypt data.

First, you have to generate the keys:

root@vagrant:~# gpg --gen-key
gpg (GnuPG) 1.4.20; Copyright (C) 2015 Free Software Foundation, Inc.
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

gpg: directory `/root/.gnupg' created
gpg: new configuration file `/root/.gnupg/gpg.conf' created
gpg: WARNING: options in `/root/.gnupg/gpg.conf' are not yet active during this run
gpg: keyring `/root/.gnupg/secring.gpg' created
gpg: keyring `/root/.gnupg/pubring.gpg' created
Please select what kind of key you want:
   (1) RSA and RSA (default)
   (2) DSA and Elgamal
   (3) DSA (sign only)
   (4) RSA (sign only)
Your selection?
RSA keys may be between 1024 and 4096 bits long.
What keysize do you want? (2048) 4096
Requested keysize is 4096 bits
Please specify how long the key should be valid.
         0 = key does not expire
      <n>  = key expires in n days
      <n>w = key expires in n weeks
      <n>m = key expires in n months
      <n>y = key expires in n years
Key is valid for? (0)
Key does not expire at all
Is this correct? (y/N) y

You need a user ID to identify your key; the software constructs the user ID
from the Real Name, Comment and Email Address in this form:
    "Heinrich Heine (Der Dichter) <heinrichh@duesseldorf.de>"

Real name: Krzysztof Ksiazek
Email address: my@backups.cc
Comment: Backup key
You selected this USER-ID:
    "Krzysztof Ksiazek (Backup key) <my@backups.cc>"

Change (N)ame, (C)omment, (E)mail or (O)kay/(Q)uit? o
You need a Passphrase to protect your secret key.

This created both public and private keys. Next, you want to export your public key to use for encrypting the data:

gpg --armor --export my@backups.cc > mybackupkey.asc

Next, you can use it to encrypt your backup.

root@vagrant:~# xtrabackup  --backup --stream=xbstream  | gzip | gpg -e --armor -r my@backups.cc -o /backup/pgp_encrypted.backup

Finally, an example of how you can use your private key (in this case it’s stored in the local key ring) to decrypt your backups:

root@vagrant:/backup# gpg -d /backup/pgp_encrypted.backup | gunzip | xbstream -x
encryption: using gcrypt 1.6.5

You need a passphrase to unlock the secret key for
user: "Krzysztof Ksiazek (Backup key) <my@backups.cc>"
4096-bit RSA key, ID E047CD69, created 2018-11-19 (main key ID BC341551)

gpg: gpg-agent is not available in this session
gpg: encrypted with 4096-bit RSA key, ID E047CD69, created 2018-11-19
      "Krzysztof Ksiazek (Backup key) <my@backups.cc>"

Rotate your encryption keys

No matter what kind of encryption you implemented, symmetric or asymmetric, you have to think about key rotation. First of all, it is very important to have a mechanism in place to rotate the keys. This might be useful in case of a security breach, where you would have to quickly change the keys that you use for backup encryption and decryption. Of course, in case of a security breach, you need to consider what is going to happen with the old backups which were encrypted using the compromised keys. They have been compromised, but they may still be useful and required to meet your Recovery Point Objective. There are a couple of options, including re-encrypting them or moving them to a non-compromised location.

Speed up the encryption process by parallelizing it

If you have an option to implement parallelization of the encryption process, consider it. Encryption performance mostly depends on CPU power, so allowing more CPU cores to work in parallel to encrypt the file should result in much shorter encryption times. Some encryption tools offer such an option. One of them is xtrabackup, which has an option to use embedded encryption and parallelize the process.

What you are looking for is either the “--encrypt-key” or “--encrypt-key-file” option, which enables embedded encryption. While doing that, you can also define “--encrypt-threads” and “--encrypt-chunk-size”. The former defines how many threads should be used for encryption; the latter increases the working buffer for encryption.

Of course, this is just one of the solutions you can implement. You can achieve this using shell tools. An example below:

root@vagrant:~# files=2 ; mariabackup --user=root --backup --pass=pass --stream=xbstream  |split -b 60M - backup ; ls backup* |  parallel -j ${files} --workdir "$(pwd)" 'echo "encrypting {}" ; openssl  enc -aes-256-cbc -salt -in "{}" -k mypass > "111{}"'

This is by no means a perfect solution, as you have to know in advance how big, more or less, the backup will be in order to split it into a predefined number of files matching the parallelization level you want to achieve (if you want to use 2 CPU cores, you should have two files; if you want to use 4 cores, 4 files; etc). It also requires disk space that is twice the size of the backup: it first generates multiple files using split, and then encryption creates another set of encrypted files. On the other hand, if your data set size is acceptable and you would like to improve encryption performance, that’s an option you can consider. To decrypt the backup, you will have to decrypt each of the individual files and then use ‘cat’ to join them together.

Test your backups

No matter how you are going to implement backup encryption, you have to test it. First of all, all backups have to be tested, encrypted or not. Backups may not be complete, or may suffer from some type of corruption. You cannot be sure that your backup can be restored until you actually perform the restore. That’s why regular backup verification is a must. Encryption adds more complexity to the backup process. Issues may show up at encryption time; again, bugs or glitches may corrupt the encrypted files. Once encrypted, the question then is whether it is possible to decrypt the backup and restore it.

You should have a restore test process in place. Ideally, the restore test would be executed after each backup. As a minimum, you should test your backups a couple of times per year. Definitely you have to test them as soon as a change in the backup process has been introduced. Have you added compression to the backup? Did you change the encryption method? Did you rotate the encryption key? All of those actions may have some impact on your ability to actually restore your backup. Therefore you should make sure you test the whole process after every change.

ClusterControl can automate the verification process, both on demand or scheduled after every backup.

To verify an existing backup, you just need to pick the one from the list, click on “Restore” option and then go through the restore wizard. First, you need to verify which backup you want to restore.

Then, on the next step, you should pick the restore and verify option.

You need to pass some information about the host on which you want to test the restore. It has to be accessible via SSH from the ClusterControl instance. You may decide to keep the restore test server up and running (and then dump some partial data from it if you wanted to go for a partial restore) or shut it down.

The final step is all about verifying if you made the correct choices. If yes, you can start the backup verification job.

If the verification completed successfully, you will see that the backup is marked as verified on the list of the backups.

If you want to automate this process, it is also possible with ClusterControl. When scheduling the backup you can enable backup verification:

This adds another step in the backup scheduling wizard.

Here you again have to define the host which you want to use for backup restore tests, decide if you want to install the software on it (or maybe you already have it done), if you want to keep the restore server up and whether you want to test the backup immediately after it is completed or maybe you want to wait a bit.

by krzysztof at November 30, 2018 10:29 AM

November 29, 2018

Oli Sennhauser

MariaDB indexing of NULL values

In a recent MariaDB DBA advanced training class the question came up whether MariaDB can make use of an index when searching for NULL values... And to be honest I was not sure any more. So instead of reading boring documentation I did some little tests:

Search for NULL

First I started with a little test data set. Some of you might already know it:

CREATE TABLE null_test (
  id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY
, data VARCHAR(32) DEFAULT NULL
, ts TIMESTAMP DEFAULT CURRENT_TIMESTAMP() 
);

INSERT INTO null_test VALUES (NULL, 'Some data to show if null works', NULL);
INSERT INTO null_test SELECT NULL, 'Some data to show if null works', NULL FROM null_test;
... up to 1 million rows

Then I modified the data according to my needs to see if the MariaDB Optimizer can make use of the index:

-- Set 0.1% of the rows to NULL
UPDATE null_test SET data = NULL WHERE ID % 1000 = 0;

ALTER TABLE null_test ADD INDEX (data);

ANALYZE TABLE null_test;

and finally I run the test (MariaDB 10.3.11):

EXPLAIN EXTENDED
SELECT * FROM null_test WHERE data IS NULL;
+------+-------------+-----------+------+---------------+------+---------+-------+------+----------+-----------------------+
| id   | select_type | table     | type | possible_keys | key  | key_len | ref   | rows | filtered | Extra                 |
+------+-------------+-----------+------+---------------+------+---------+-------+------+----------+-----------------------+
|    1 | SIMPLE      | null_test | ref  | data          | data | 35      | const | 1047 |   100.00 | Using index condition |
+------+-------------+-----------+------+---------------+------+---------+-------+------+----------+-----------------------+

We can clearly see that the MariaDB Optimizer considers and uses the index, and its estimate of about 1047 rows is quite accurate.

Unfortunately the optimizer chooses the completely wrong strategy (3 times slower) for the opposite query:

EXPLAIN EXTENDED
SELECT * FROM null_test WHERE data = 'Some data to show if null works';
+------+-------------+-----------+------+---------------+------+---------+-------+--------+----------+-----------------------+
| id   | select_type | table     | type | possible_keys | key  | key_len | ref   | rows   | filtered | Extra                 |
+------+-------------+-----------+------+---------------+------+---------+-------+--------+----------+-----------------------+
|    1 | SIMPLE      | null_test | ref  | data          | data | 35      | const | 522351 |   100.00 | Using index condition |
+------+-------------+-----------+------+---------------+------+---------+-------+--------+----------+-----------------------+
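If you hit this in practice, one possible workaround (a sketch, not a general fix) is to tell the optimizer to skip the low-selectivity index for such a query, for example with IGNORE INDEX, so it falls back to a full table scan instead of reading half a million index entries:

EXPLAIN EXTENDED
SELECT * FROM null_test IGNORE INDEX (data) WHERE data = 'Some data to show if null works';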

Search for NOT NULL

Now let us try to test the opposite problem:

CREATE TABLE anti_null_test (
  id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY
, data VARCHAR(32) DEFAULT NULL
, ts TIMESTAMP DEFAULT CURRENT_TIMESTAMP()
);

INSERT INTO anti_null_test VALUES (NULL, 'Some data to show if null works', NULL);
INSERT INTO anti_null_test SELECT NULL, 'Some data to show if null works', NULL FROM anti_null_test;
... up to 1 million rows

Then I modified the data as well but this time in the opposite direction:

-- Set 99.9% of the rows to NULL
UPDATE anti_null_test SET data = NULL WHERE ID % 1000 != 0;

ALTER TABLE anti_null_test ADD INDEX (data);

ANALYZE TABLE anti_null_test;

and then we have to test again the query:

EXPLAIN EXTENDED
SELECT * FROM anti_null_test WHERE data IS NOT NULL;
+------+-------------+----------------+-------+---------------+------+---------+------+------+----------+-----------------------+
| id   | select_type | table          | type  | possible_keys | key  | key_len | ref  | rows | filtered | Extra                 |
+------+-------------+----------------+-------+---------------+------+---------+------+------+----------+-----------------------+
|    1 | SIMPLE      | anti_null_test | range | data          | data | 35      | NULL | 1047 |   100.00 | Using index condition |
+------+-------------+----------------+-------+---------------+------+---------+------+------+----------+-----------------------+

Also in this case the MariaDB Optimizer considers and uses the index and produces a quite fast Query Execution Plan.

Also in this case the optimizer chooses the wrong plan for the opposite query:

EXPLAIN EXTENDED
SELECT * FROM anti_null_test WHERE data IS NULL;
+------+-------------+----------------+------+---------------+------+---------+-------+--------+----------+-----------------------+
| id   | select_type | table          | type | possible_keys | key  | key_len | ref   | rows   | filtered | Extra                 |
+------+-------------+----------------+------+---------------+------+---------+-------+--------+----------+-----------------------+
|    1 | SIMPLE      | anti_null_test | ref  | data          | data | 35      | const | 523506 |   100.00 | Using index condition |
+------+-------------+----------------+------+---------------+------+---------+-------+--------+----------+-----------------------+

by Shinguz at November 29, 2018 07:10 PM

Peter Zaitsev

Percona Server for MySQL 5.6.42-84.2 Is Now Available


Percona announces the release of Percona Server 5.6.42-84.2 on November 29, 2018 (Downloads are available here and from the Percona Software Repositories).

Based on MySQL 5.6.42, including all the bug fixes in it, Percona Server 5.6.42-84.2 is the current GA release in the Percona Server 5.6 series. All of Percona‘s software is open-source and free.

Improvements

  • PS-4790: Improve user statistics accuracy

Bugs Fixed

  • Slave replication could break if upstream bug #74145 (FLUSH LOGS improperly disables the logging if the log file cannot be accessed) occurred in master. Bug fixed PS-1017 (Upstream #83232).
  • The binary log could be corrupted when the disk partition used for temporary files (tmpdir system variable) had little free space. Bug fixed PS-1107 (Upstream #72457).
  • PURGE CHANGED_PAGE_BITMAPS did not work when the innodb_data_home_dir system variable was used. Bug fixed PS-4723.
  • Setting the tokudb_last_lock_timeout variable via the command line could cause the server to stop working when the actual timeout took place. Bug fixed PS-4943.
  • Dropping TokuDB table with non-alphanumeric characters could lead to a crash. Bug fixed PS-4979.

Other bugs fixed

  • PS-4781: sql_yacc.yy uses SQLCOM_SELECT instead of SQLCOM_SHOW_XXXX_STATS
  • PS-4529: MTR: index_merge_rocksdb2 inadvertently tests InnoDB instead of MyRocks
  • PS-4746: Revert our fix for PS-3851 (Percona Ver 5.6.39-83.1 Failing assertion: sym_node->table != NULL)
  • PS-4773: Percona Server sources can’t be compiled without server
  • PS-4785: Setting version_suffix to NULL leads to handle_fatal_signal (sig=11) in Sys_var_version::global_value_ptr
  • PS-4813: Using flush_caches leads to SELinux denial errors
  • PS-4881: Add LLVM/clang 7 to Travis-CI

Find the release notes for Percona Server for MySQL 5.6.42-84.2 in our online documentation. Report bugs in the Jira bug tracker.

 

by Borys Belinsky at November 29, 2018 05:21 PM

Jean-Jerome Schmidt

Cloud Backup Options for MySQL & MariaDB Databases

The principal objective of backing up your data is, of course, the ability to roll back and access your archives in case of hardware failure. To do business today, you need the certainty of knowing that in the case of disaster, your data will be protected and accessible. You would need to store your backups offsite, in case your datacenter goes down in flames.

Data protection remains a challenge for small and medium-sized businesses. They prefer to archive their company’s data using direct-attached storage, with the majority of firms planning to make offsite backup copies. This local storage approach can lead to one of the most severe dilemmas the modern company can face: loss of data in case of disaster.

Many factors come into play when deciding whether to allow a business-critical database to be transferred offsite, and when choosing a suitable vendor to do so. Traditional methods, like writing to tape and shipping to a remote location, can be a complicated process that requires special hardware, adequately trained staff and procedures to ensure that backups are regularly produced, protected, and that the information contained in them is verified for integrity. Small businesses usually have small IT budgets. Often they cannot afford a secondary datacenter, even if they have a dedicated data center. But it is nevertheless still important to keep a copy of your backup files offsite. Disasters like hurricanes, floods, fire or theft can destroy your servers and storage. Keeping backed-up data in a separate data center ensures data is safe, no matter what is going on in your primary datacenter. Cloud storage is a great way of addressing this problem.
With the cloud backup approach, there are a number of factors to consider. Some of the questions you have are:

  • Is backed-up data secured at rest in the external data center?
  • Is transfer to or from the external data center through the public internet network safe?
  • Is there an effect on RTO (Recovery Time Objective)?
  • Is the backup and recovery process easy enough for our IT staff?
  • Are there any changes required to existing processes?
  • Are the 3rd party backup tools needed?
  • What are the additional costs in terms of required software or data transfer?
  • What are the storage costs?

Backup features when doing a backup to the cloud

If your MySQL server or backup destination is located in an exposed infrastructure like a public cloud, hosting provider, or connected through an untrusted WAN network, you need to think about additional actions in your backup policy. There are a few different ways to perform database backups for MySQL, and depending on the type of backup, recovery time, size, and infrastructure options will vary. Since many of the cloud storage solutions are simply storage with different API front ends, any backup solution can be implemented with a bit of scripting. So what are the options we have to make the process smooth and secure?

Encryption

It is always a good idea to enforce encryption to enhance the security of backup data. A simple use case to implement encryption is where you want to push the backup to an offsite backup storage located in the public cloud.

When creating an encrypted backup, one thing to have in mind is that it usually takes more time to recover. The backup has to be decrypted before any recovery activities. With a big dataset, this could introduce some delays to the RTO.

On the other hand, if you are using a private key for encryption, make sure to store the key in a safe place. If the private key is missing, the backup will be useless and unrecoverable. If the key is stolen, all created backups that use the same key are compromised, as they are no longer secured. You can use the popular GnuPG or OpenSSL to generate the private or public keys.
To perform mysqldump encryption using GnuPG, generate a private key and follow the wizard accordingly:

$ gpg --gen-key

Create a plain mysqldump backup as usual:

$ mysqldump --routines --events --triggers --single-transaction db1 | gzip > db1.tar.gz

Encrypt the dump file and remove the older plain backup:

$ gpg --encrypt -r ‘admin@email.com’ db1.tar.gz
$ rm -f db1.tar.gz

GnuPG will automatically append the .gpg extension to the encrypted file. To decrypt,
simply run the gpg command with the --decrypt flag:

$ gpg --output db1.tar.gz --decrypt db1.tar.gz.gpg

To create an encrypted mysqldump using OpenSSL, one has to generate a private key and a public key:

$ openssl req -x509 -nodes -newkey rsa:2048 -keyout dump.priv.pem -out dump.pub.pem

This private key (dump.priv.pem) must be kept in a safe place for future decryption. For mysqldump, an encrypted backup can be created by piping the content to openssl, for example

mysqldump --routines --events --triggers --single-transaction database | openssl smime -encrypt -binary -text -aes256 -out database.sql.enc -outform DER dump.pub.pem

To decrypt, simply use the private key (dump.priv.pem) alongside the -decrypt flag:

openssl smime -decrypt -in database.sql.enc -binary -inform DER -inkey dump.priv.pem -out database.sql

Percona XtraBackup can be used to encrypt or decrypt local or streaming backups with the xbstream option to add another layer of protection to the backups. Encryption is done with the libgcrypt library. Both the --encrypt-key option and the --encrypt-key-file option can be used to specify the encryption key. Encryption keys can be generated with commands like:

$ openssl rand -base64 24
bWuYY6FxIPp3Vg5EDWAxoXlmEFqxUqz1

This value then can be used as the encryption key. Example of the innobackupex command using the --encrypt-key:

$ innobackupex --encrypt=AES256 --encrypt-key=”bWuYY6FxIPp3Vg5EDWAxoXlmEFqxUqz1” /storage/backups/encrypted

The output of the above OpenSSL command can also be redirected to a file and can be treated as a key file:

openssl rand -base64 24 > /etc/keys/pxb.key

Use it with the --encrypt-key-file option instead:

innobackupex --encrypt=AES256 --encrypt-key-file=/etc/keys/pxb.key /storage/backups/encrypted

To decrypt, simply use the --decrypt option with appropriate --encrypt-key or --encrypt-key-file:

$ innobackupex --decrypt=AES256 --encrypt-key=”bWuYY6FxIPp3Vg5EDWAxoXlmEFqxUqz1”
/storage/backups/encrypted/2018-11-18_11-10-09/

For more information about MySQL and MariaDB encryption, please check our another blog post.

Compression

Within the database cloud backup world, compression is one of your best friends. It can not only save storage space, but it can also significantly reduce the time required to download/upload data.
There are lots of compression tools available out there, namely gzip, bzip2, zip, rar, and 7z.
Normally, mysqldump has the best compression rates, as it is a flat text file. Depending on the compression tool and ratio, a compressed mysqldump can be up to 6 times smaller than the original backup size. To compress the backup, you can pipe the mysqldump output to a compression tool and redirect it to a destination file. You can also skip several things, like comments, the lock tables statement (if InnoDB), GTID purged and triggers:

mysqldump --single-transaction --skip-comments --skip-triggers --skip-lock-tables --set-gtid-purged OFF --all-databases | gzip > /storage/backups/all-databases.sql.gz

With Percona Xtrabackup, you can use the streaming mode (innobackupex), which sends the backup to STDOUT in special tar or xbstream format instead of copying files to the backup directory. Having a compressed backup could save you up to 50% of the original backup size, depending on the dataset. Append the --compress option in the backup command. By using the xbstream in streaming backups, you can speed up the compression process by using the --compress-threads option. This option specifies the number of threads created by xtrabackup for parallel data compression. The default value for this option is 1. To use this feature, add the option to a local backup. An example backup with compression:

innobackupex --stream=xbstream --compress --compress-threads=4 > /storage/backups/backup.xbstream

Before applying logs during the preparation stage, compressed files will need to be decompressed using xbstream:

$ xbstream -x < /storage/backups/backup.xbstream

Then, use qpress to extract each file ending with .qp in its respective directory before running the --apply-log command to prepare the MySQL data.

Limit network throughput

A great option for cloud backups is to limit the network streaming bandwidth (Mb/s) when doing a backup. You can achieve that with the pv tool. The pv utility comes with the data modifier option -L RATE, --rate-limit RATE, which limits the transfer to a maximum of RATE bytes per second. The example below will restrict it to 2MB/s.

$ pv -q -L 2m

In the example below, you can see xtrabackup with network throttling (pv), parallel compression (pigz) and encryption (openssl):

( /usr/bin/innobackupex --defaults-file=/etc/mysql/my.cnf --galera-info --parallel 4 --stream=xbstream --no-timestamp . | pv -q -L 2m | pigz -9 - | openssl enc -aes-256-cbc -pass file:/var/tmp/cmon-008688-19992-72450efc3b6e9e4f.tmp > /home/ubuntu/backups/BACKUP-3445/backup-full-2018-11-28_213540.xbstream.gz.aes256 ) 2>&1

Transfer backup to Cloud

Now that your backup is compressed and encrypted, it is ready for transfer.

Google cloud

The gsutil command line tool is used to manage, monitor and use your storage buckets on Google Cloud Storage. If you already installed the gcloud util, you already have the gsutil installed. Otherwise, follow the instructions for your Linux distribution from here.

To install the gcloud CLI you can follow below procedure:

curl https://sdk.cloud.google.com | bash

Restart your shell:

exec -l $SHELL

Run gcloud init to initialize the gcloud environment:

gcloud init

With the gsutil command line tool installed and authenticated, create a regional storage bucket named mysql-backups-storage in your current project.

gsutil mb -c regional -l europe-west1 gs://mysql-backups-storage/
Creating gs://mysql-backups-storage/

Amazon S3

If you are not using RDS to host your databases, it is very probable that you are doing your own backups. Amazon’s AWS platform, S3 (Amazon Simple Storage Service), is a data storage service that can be used to store database backups or other business critical files. Whether it’s an Amazon EC2 instance or your on-prem environment, you can use the service to secure your data.

While backups can be uploaded through the web interface, the dedicated s3 command line interface can be used to do it from the command line and through backup automation scripts. If backups are to be kept for a very long time and recovery time isn’t a concern, backups can be transferred to the Amazon Glacier service, providing much cheaper long-term storage. Files (Amazon objects) are logically stored in a huge flat container named a bucket. S3 presents a REST interface to its internals. You can use this API to perform CRUD operations on buckets and objects, as well as to change permissions and configurations on both.

The primary distribution method for the AWS CLI on Linux, Windows, and macOS is pip, a package manager for Python. Instruction can be found here.

aws s3 cp severalnines.sql s3://severalnines-bucket/mysql_backups

By default, S3 provides eleven 9s of object durability. That means an expected annual loss rate of 0.000000001% per object: if you store 1,000,000,000 (1 billion) objects, you can expect to lose on average about one object every 100 years. The way S3 achieves that impressive number of 9s is by replicating each object automatically across multiple Availability Zones, which we’ll talk about in another post. Amazon has regional datacenters all around the world.

Microsoft Azure Storage

Microsoft’s public cloud platform, Azure, has storage options with their command line interface. Information can be found here. The open-source, cross-platform Azure CLI provides a set of commands for working with the Azure platform. It gives much of the functionality seen in the Azure portal, including rich data access.

The installation of Azure CLI is fairly simple, you can find instructions here. Below you can find how to transfer your backup to Microsoft storage.

az storage blob upload --container-name severalnines --file severalnines.sql --name severalnines_backup

Hybrid Storage for MySQL and MariaDB backups

With the growing public and private cloud storage industry, we have a new category called hybrid storage. This technology allows files to be stored locally, with changes automatically synced to a remote copy in the cloud. Such an approach arises from the need to have recent backups stored locally for fast restore (lower RTO), as well as from business continuity objectives.
An important aspect of efficient resource usage is having separate backup retention periods. Data that is stored locally, on redundant disk drives, would be kept for a shorter period, while cloud backup storage would be held for a longer time. Many times the requirement for longer backup retention comes from legal obligations for different industries (like telecoms having to store connection metadata). Cloud providers like Google Cloud Services, Microsoft Azure and Amazon S3 each offer virtually unlimited storage, decreasing local space needs. It allows you to retain your backup files longer, for as long as you would like, and not have concerns around local disk space.

ClusterControl backup management - hybrid storage

When scheduling backup with ClusterControl, each of the backup methods are configurable with a set of options on how you want the backup to be executed. The most important for the hybrid cloud storage would be:

  • Network throttling
  • Encryption with the build in key management
  • Compression
  • Retention period for the local backups
  • Retention period for the cloud backups
ClusterControl dual backup retention
ClusterControl advanced backup features for cloud: parallel compression, network bandwidth limit, encryption, etc.

Conclusion

The cloud has changed the data backup industry. Because of its affordable price point, smaller businesses have an offsite solution that backs up all of their data.

Your company can take advantage of cloud scalability and pay-as-you-go pricing for growing storage needs. You can design a backup strategy to provide both local copies in the datacenter for immediate restoration, and a seamless gateway to cloud storage services from AWS, Google and Azure.

Advanced TLS and AES 256-bit encryption and compression features support secure backups that take up significantly less space in the cloud.

by Bart Oles at November 29, 2018 03:26 PM

Peter Zaitsev

MySQL High Availability: Stale Reads and How to Fix Them


Continuing on the series of blog posts about MySQL High Availability, today we will talk about stale reads and how to overcome this issue.

The Problem

A stale read is a read operation that fetches an incorrect value from a source that has not synchronized an update operation to the value (source: Wiktionary).

A practical scenario is when your application writes data with INSERT or UPDATE to your master/writer node, and has to read it immediately after. If this particular read is served from another server in the replication/cluster topology, the data is either not there yet (in case of an INSERT) or still has the old value (in case of an UPDATE).

If your application or part of your application is sensitive to stale reads, then this is something to consider when implementing HA/load balancing.

How NOT to fix stale reads

While working with customers, we have seen a few incorrect attempts to fix the issue:

SELECT SLEEP(X)

The most common incorrect approach that we see in Percona support is when customers add a sleep between the write and the read. This may work in some cases, but it’s not 100% reliable for all scenarios, and it can add latency when there is no need.

Suppose that by the time you query your slave the data has already been applied, yet you configured your transaction to start with SELECT SLEEP(1). In this case, you just added 1000ms of latency when there was no need for it.

Another example could be when the slave is lagging behind by more than whatever you configured as the parameter of the sleep command. In this case, you would have to write a loop that keeps retrying the sleep until the slave has received the data: potentially this could take several seconds.

Reference: SELECT SLEEP.

Semisync replication

By default, MySQL replication is asynchronous, and this is exactly what causes the stale read. However, MySQL distributes a plugin that can make the replication semi-synchronous. We have seen customers enable it hoping the stale reads problem will go away. In fact, that is not the case. The semi-synchronous plugin only ensures that at least one slave has received the event (the IO thread has streamed the binlog event to the relay log), but the event is still applied asynchronously. In other words, stale reads are still a problem with semi-sync replication.

Reference: Semisync replication.

How to PROPERLY fix stale reads

There are several ways to fix/overcome this situation, and each one has its pros and cons:

1) MASTER_POS_WAIT

This consists of executing a SHOW MASTER STATUS right after your write, getting the binlog file and position, connecting to a slave, and executing the MASTER_POS_WAIT function, passing the binlog file and position as parameters. The function blocks until the slave has applied up to that position. You can optionally pass a timeout to exit the function in case it is exceeded.
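A minimal sketch of that flow; the binlog coordinates below are placeholders:

-- On the master, immediately after the write:
SHOW MASTER STATUS;  -- note the File and Position columns, e.g. mysql-bin.000042 / 1337
-- On the slave, before the read (5 second timeout):
SELECT MASTER_POS_WAIT('mysql-bin.000042', 1337, 5);
-- Returns the number of events waited for, -1 on timeout, NULL on error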

Pros:

  • Works on all MySQL versions
  • No prerequisites

Cons:

  • Requires an application code rewrite.
  • It’s a blocking operation, and can add significant latency to queries in cases where a slave/node is too far behind.

Reference: MASTER_POS_WAIT.

2) WAIT_UNTIL_SQL_THREAD_AFTER_GTIDS

Requires GTID: this is similar to the previous approach, but in this case we need to track the executed GTID from the master (also available in SHOW MASTER STATUS).
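A sketch of the equivalent GTID-based flow; the GTID set below is a placeholder:

-- On the master, immediately after the write:
SHOW MASTER STATUS;  -- note the Executed_Gtid_Set column
-- On the slave, before the read (5 second timeout):
SELECT WAIT_UNTIL_SQL_THREAD_AFTER_GTIDS('3E11FA47-71CA-11E1-9E33-C80AA9429562:1-5', 5);
-- Returns the number of events waited for, or -1 on timeout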

Pros:

  • Works on all MySQL versions that support GTID.

Cons:

  • Requires an application code rewrite.
  • It’s a blocking operation, and can add significant latency to queries in cases where a slave/node is too far behind.
  • As it requires GTID, it only works on versions from 5.6 onwards.

Reference: WAIT_UNTIL_SQL_THREAD_AFTER_GTIDS

3) Querying slave_relay_log_info

This consists of enabling relay_log_info_repository=TABLE and sync_relay_log_info=1 on the slave, and using an approach similar to option 1. After the write, execute SHOW MASTER STATUS, connect to the slave, and query mysql.slave_relay_log_info, passing the binlog name and position, to verify whether the slave is already applying a position at or after the one you got from SHOW MASTER STATUS.
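A rough sketch of the checks involved, assuming the two slave settings are already in place:

-- One-time slave configuration (my.cnf):
--   relay_log_info_repository = TABLE
--   sync_relay_log_info      = 1
-- On the master, immediately after the write:
SHOW MASTER STATUS;  -- note File and Position
-- On the slave, a non-blocking check of replication progress:
SELECT Master_log_name, Master_log_pos FROM mysql.slave_relay_log_info;
-- If the coordinates are behind the master's, try another slave (or the master)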

Pros:

  • This is not a blocking operation.
  • In cases where the slave is missing the position you require, you can try to connect to another slave and repeat the process. There is even the option of falling back to the master if none of the slaves has the said position.

Cons:

  • Requires an application code rewrite.
  • In cases of checking multiple slaves, this can add significant latency.

Reference: slave_relay_log_info.

4) wsrep-sync-wait

Requires Galera/Percona XtraDB Cluster: this consists of setting a global/session variable to enforce consistency. It blocks the execution of subsequent queries until the node has applied all write-sets from its applier queue. It can be configured to trigger on multiple commands, such as SELECT, INSERT, and so on.
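A minimal sketch of per-session usage; the table and column names are hypothetical:

-- Enforce causality checks for READ statements on this session only
SET SESSION wsrep_sync_wait = 1;  -- bitmask; 1 covers READ statements such as SELECT
SELECT balance FROM accounts WHERE id = 1;  -- blocks until the node has caught up
SET SESSION wsrep_sync_wait = 0;  -- restore the default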

Pros:

  • Easy to implement. Built-in as a SESSION variable.

Cons:

  • Requires an application code rewrite in the event that you want to implement the solution on a per-session basis.
  • It’s a blocking operation, and can add significant latency to queries if a slave/node is too far behind.

Reference: wsrep-sync-wait

5) ProxySQL 2.0 GTID consistent reads

Requires MySQL 5.7 and GTID: MySQL 5.7 returns the GTID generated by a commit as part of the OK packet. ProxySQL, with the help of binlog readers installed on the MySQL servers, can keep track of which GTIDs each slave has already applied. With this information, plus the GTID received in the OK packet at the moment of the write, ProxySQL decides whether it will route a subsequent read to one of the slaves/read nodes or have the master/write node serve the read.
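For illustration only, the ProxySQL 2.0 admin-side setup is roughly along these lines, assuming the proxysql_binlog_reader daemon is running on each MySQL server and exposing GTID state on port 3307 (the port value is an assumption):

-- On the ProxySQL admin interface
UPDATE mysql_servers SET gtid_port = 3307 WHERE port = 3306;
LOAD MYSQL SERVERS TO RUNTIME;
SAVE MYSQL SERVERS TO DISK;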

Pros:

  • Transparent to the application – no code changes are required.
  • Adds minimal latency.

Cons:

  • This is still a new feature of ProxySQL 2.0, which is not yet GA.

Reference: GTID consistent reads.

Conclusions

Undesirable issues can arise from adding HA and distributing load across multiple servers. Stale reads can impact applications that are sensitive to them. We have demonstrated various approaches you can use to overcome the problem.


Photo by Tim Evans on Unsplash

by Marcelo Altmann at November 29, 2018 02:51 PM

November 28, 2018

Peter Zaitsev

What Happens If You Set innodb_open_files Higher Than open_files_limit?

The settings of MySQL configuration variables have a fundamental impact on the performance of your database system. Sometimes it can be a little tricky to predict how changing one variable can affect others, especially in cases like the one I’ll describe in this post, where the outcome is not very intuitive. So here, we’ll look at what happens when you set innodb_open_files higher than the open_files_limit.

We can set the maximum number of open files in our MySQL configuration file using:

open_files_limit=10000

If this isn’t set, then the default – which is 5,000 in MySQL 5.7 – should be used.

See Sveta’s excellent blog post for an explanation of how to change the open file limit. Under systemd, open_files_limit is capped by LimitNOFILE; if LimitNOFILE is set to infinity, CentOS 7 resolves it to 65536 (though much higher values are possible if specified manually):

[root@centos7-pxc57-3 ~]# grep open_files_limit /etc/my.cnf
open_files_limit=10000
[root@centos7-pxc57-3 ~]# grep LimitNOFILE /lib/systemd/system/mysqld.service.d/limit_nofile.conf
LimitNOFILE=infinity
[root@centos7-pxc57-3 ~]# mysql -e "SELECT @@open_files_limit"
+--------------------+
| @@open_files_limit |
+--------------------+
| 65536              |
+--------------------+
[root@centos7-pxc57-3 ~]# perl -pi -e 's/LimitNOFILE=infinity/LimitNOFILE=20000/' /lib/systemd/system/mysqld.service.d/limit_nofile.conf && systemctl daemon-reload && systemctl restart mysqld
[root@centos7-pxc57-3 ~]# mysql -e "SELECT @@open_files_limit"
+--------------------+
| @@open_files_limit |
+--------------------+
| 20000              |
+--------------------+
[root@centos7-pxc57-3 ~]# perl -pi -e 's/LimitNOFILE=20000/LimitNOFILE=5000/' /lib/systemd/system/mysqld.service.d/limit_nofile.conf && systemctl daemon-reload && systemctl restart mysqld
[root@centos7-pxc57-3 ~]# mysql -e "SELECT @@open_files_limit"
+--------------------+
| @@open_files_limit |
+--------------------+
| 5000               |
+--------------------+

As you can see above, MySQL cannot set the value of open_files_limit higher than the system is configured to allow, and open_files_limit will default back to the maximum if it’s set too high.

That seems pretty straightforward, but what isn’t quite as obvious is how that affects innodb_open_files. The innodb_open_files value configures how many .ibd files MySQL can keep open at any one time.

As this obviously requires files to be open, it should be no higher than the open_files_limit (and should be lower). If we try to set it higher as per this example, MySQL will print a warning in the log file:

[root@centos7-pxc57-3 ~]# grep innodb_open_files /var/log/mysqld.log
2018-09-21T08:31:06.002120Z 0 [Warning] InnoDB: innodb_open_files should not be greater than the open_files_limit.

What the warning doesn’t state is that the value is being lowered. Not to the maximum value allowed though:

[root@centos7-pxc57-3 ~]# mysql -e "SELECT @@innodb_open_files"
+---------------------+
| @@innodb_open_files |
+---------------------+
| 2000                |
+---------------------+

2000? Why 2000?

It’s because if we set innodb_open_files too high, it reverts back to the default value, which per the documentation is:

300 if innodb_file_per_table is not enabled, and the higher of 300 and table_open_cache otherwise. Before 5.6.6, the default value is 300.

And table_open_cache? Well, that defaults to 400 for versions of MySQL up to 5.6.7, and to 2000 from 5.6.8 onwards.

Note that table_open_cache is another setting completely. innodb_open_files controls the number of InnoDB files (.ibd) the server can keep open at once; whilst table_open_cache controls the number of table definition (.frm) files the server can have open at once.

 

Photo by Logan Kirschner from Pexels

by James Lawrie at November 28, 2018 01:46 PM

November 27, 2018

Peter Zaitsev

Setup Compatible OpenLDAP Server for MongoDB and MySQL

By the end of this article, you should have a Percona Server for MongoDB and a Percona Server for MySQL instance able to authenticate against an OpenLDAP backend. While this is mostly aimed at testing scenarios, it can easily be extended for production by following OpenLDAP production best practices, i.e. attending to security and high availability.

The first step is to install OpenLDAP via the slapd package in Ubuntu.

sudo apt update
sudo apt install slapd ldap-utils

During installation, it will ask you for a few things listed below:

  • DNS Domain Name: ldap.local
  • Organization Name: Percona
  • Administrator password: percona

All these values are arbitrary, you can choose whatever suits your organization—especially the password.

Once slapd is running, we can create our logical groups and actual users on the LDAP server. To keep things simple, we use LDIF files instead of GUIs. Our first file, perconadba.ldif, contains our perconadba group definition. Take note of the root name part dc=ldap,dc=local: it is simply the broken-down value of our DNS Domain Name from the installation of slapd.

dn: ou=perconadba,dc=ldap,dc=local
objectClass: organizationalUnit
ou: perconadba

We can add this definition into LDAP with the command shown below. With the -W option, it will prompt you for a password.

ldapadd -x -W -D "cn=admin,dc=ldap,dc=local" -f perconadba.ldif

The next step is to create our user in LDAP. This user will be looked up by both MongoDB and MySQL during authentication to verify its password. Our LDIF file (percona.ldif) would look like this:

dn: uid=percona,ou=perconadba,dc=ldap,dc=local
objectClass: top
objectClass: account
objectClass: posixAccount
objectClass: shadowAccount
cn: percona
uid: percona
uidNumber: 1100
gidNumber: 100
homeDirectory: /home/percona
loginShell: /bin/bash
gecos: percona
userPassword: {crypt}x
shadowLastChange: -1
shadowMax: -1
shadowWarning: -1

The -1 values for the shadow* fields are important: we set them to negative to mean the password shadow does not expire. If these are set to zero (0), then MySQL will not be able to authenticate, since PAM will complain that the password has expired and needs to be changed.

We can then add this user into LDAP; again, the command below will ask for the admin password we entered during slapd’s installation.

ldapadd -x -W -D "cn=admin,dc=ldap,dc=local" -f percona.ldif

To verify, we can search for the user we just entered using the command below. Notice we used the -w parameter to specify the admin password inline.

ldapsearch -x -D 'cn=admin,dc=ldap,dc=local' -w percona \
	-b 'ou=perconadba,dc=ldap,dc=local' '(uid=percona)'

The last step in setting up our LDAP user properly is to give it a valid password. The -s parameter below is the actual password we will set for this user.

ldappasswd -s percona -D "cn=admin,dc=ldap,dc=local" -w percona \
	-x "uid=percona,ou=perconadba,dc=ldap,dc=local"

At this point, you should have a generic LDAP server that works for both MongoDB and MySQL.

PAM Configuration for MySQL

To make this work for MySQL with PAM authentication, take note of the following configuration files. Instructions on setting up PAM for MySQL are plentiful on this blog, so I will just specify the Ubuntu Bionic-specific configuration files needed to make it work.

/etc/nslcd.conf

The only important difference in this configuration, compared to Jaime’s post for example, is the values for filter. If you are using Windows Active Directory, the map values are also important (the posixAccount objectClass has been deprecated in recent releases of Windows Active Directory).

uid nslcd
gid nslcd
uri ldap://localhost/
base ou=perconadba,dc=ldap,dc=local
filter passwd (&(objectClass=account)(objectClass=posixAccount))
filter group (&(objectClass=shadowAccount)(objectClass=account))
map    passwd uid           uid
map    passwd uidNumber     uidNumber
map    passwd gidNumber     gidNumber
map    passwd homeDirectory "/home/$uid"
map    passwd gecos         uid
map    passwd loginShell    "/bin/bash"
map    group gidNumber      gidNumber
binddn cn=admin,dc=ldap,dc=local
bindpw percona
tls_cacertfile /etc/ssl/certs/ca-certificates.crt

/etc/nsswitch.conf

Also, in nsswitch.conf, make sure that passwd, group and shadow do LDAP lookups.

...
passwd:         compat systemd ldap
group:          compat systemd ldap
shadow:         compat systemd ldap
gshadow:        files ldap
...
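If everything is wired correctly, the LDAP user should now resolve like a local account; this quick check assumes the nslcd service is running:

# The expected output is built from our LDIF values
getent passwd percona
# percona:x:1100:100:percona:/home/percona:/bin/bash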

SASL for MongoDB

Adamo’s excellent post on MongoDB LDAP Authentication has all the details on configuring MongoDB itself. To complement it, if you use this LDAP test setup, you need to take note of the following configuration files and their specific differences.

/etc/mongod.conf

In the mongod.conf configuration file, I explicitly added the saslauthd socket path.

security:
  authorization: enabled
setParameter:
  saslauthdPath: /var/run/saslauthd/mux
  authenticationMechanisms: PLAIN,SCRAM-SHA-1

/etc/saslauthd.conf

For the saslauthd daemon, the configuration has no real differences; just note that I used values matching the LDAP setup above. Specifically, ldap_filter and ldap_search_base are the key options here: they are combined during an LDAP search to come up with the percona user’s account information.

ldap_servers: ldap://localhost:389/
ldap_search_base: ou=perconadba,dc=ldap,dc=local
ldap_filter: (uid=%u)
# Optional: specify a user to perform ldap queries
ldap_bind_dn: CN=admin,DC=ldap,DC=local
# Optional: specify ldap user's password
ldap_password: percona
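As a quick sanity check, assuming the sasl2-bin package is installed and saslauthd is running with the ldap mechanism, you can test authentication directly:

# Should print "0: OK 'Success.'" if saslauthd can verify the user
testsaslauthd -u percona -p percona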

Enterprise quality features should not be complex and expensive. Tell us about your experience with our software and external authentication in the comments below!

by Jervin Real at November 27, 2018 01:48 PM

Jean-Jerome Schmidt

Database Backups - Comparing MariaDB Mariabackup and Percona Xtrabackup

Your database server stores some of your enterprise’s most valuable information. Guaranteeing reliable database backups to prevent data loss in the event of an accident or hardware failure is a critical requirement.

Whether it is a 24x7, highly loaded server or a low-transaction-volume environment, you will need backups to be a seamless procedure that does not disrupt the performance of the server in a production environment.

In this blog, we are going to review two of the most used tools to accomplish this task, namely Percona XtraBackup and Mariabackup. We will review the similarities and differences between the two of them, and also how to use them.

What is Percona XtraBackup?

Percona XtraBackup is an open source tool for performing backups of MariaDB, MySQL and Percona Server databases. It performs online non-blocking (for the supported engines*), tightly compressed and secure full backups on transactional systems so that applications remain fully available for the duration of the backup window.

By using this tool you can:

  • Create hot InnoDB backups, that complete quickly and reliably, without pausing your database or adding load to the server
  • Make incremental backups
  • Move tables between MySQL servers on-line
  • Create new MySQL replication slaves easily
  • Stream compressed MySQL backups to another server
  • Save on disk space and network bandwidth

What is Mariabackup?

Mariabackup is an open source tool provided by MariaDB for performing physical online backups. It is a fork of Percona XtraBackup designed to work with encrypted and compressed tables, and is the recommended backup method for MariaDB databases.

MariaDB Server 10.1 introduced MariaDB Compression and Data-at-Rest Encryption, but the existing backup solutions did not support full backup capability for these features. So MariaDB decided to extend XtraBackup (version 2.3.8) and named this solution Mariabackup.

Differences Between Percona XtraBackup and Mariabackup

As we noted earlier, Mariabackup is the recommended backup tool for MariaDB, and the main difference from XtraBackup is that it works with encrypted and compressed tables.

Anyway, if for any particular reason you want to use XtraBackup for your MariaDB database, there are some points to take into account depending on the MariaDB server version you have:

  • MariaDB 10.1: With uncompressed and unencrypted MariaDB data, you can use XtraBackup. If encryption or compression is used, or when innodb_page_size is set to some value other than 16K it will not work.
  • MariaDB 10.2: You might also want to try to use XtraBackup, but be aware that problems are likely due to the MySQL 5.7 undo log format incompatibility bug that was fixed in MariaDB 10.2.2. Due to this bug, backups prepared with XtraBackup may fail to recover some transactions. Only if you run the server with the setting innodb_undo_logs=1 is this not a problem.
  • MariaDB 10.3 and later: This case is more simple. XtraBackup is not compatible.

Also, there are some limitations to take into account when using Mariabackup:

  • MyRocks: Starting with MariaDB 10.2.16 and MariaDB 10.3.8, Mariabackup will backup MyRocks Storage Engine data. Partial backup of MyRocks data is currently not supported. Incremental backup will store a full copy of MyRocks data.
  • Export file functionality: Before MariaDB 10.2.9, Mariabackup did not support the --export functionality (it creates an export file to export data from the database). You can work around this limitation by preparing the backup as usual (without the --export flag), then starting the server and executing FLUSH TABLES FOR EXPORT.
  • Log files: Mariabackup versions up to 10.2.8 do not create empty log files and rely on the --copy-back action executed by the user (which deletes old InnoDB log files, if any). If the user doesn't use --copy-back, or doesn't make sure that the data directory is empty before restoring, the backups created with these versions may well become inconsistent/corrupted (because of the presence of leftover InnoDB logs).
  • Gcrypt: Backup tool based encryption (gcrypt) is not supported on Mariabackup.
  • Innobackupex option: No symlink to innobackupex (use the --innobackupex parameter instead). The innobackupex tool patches and provides additional features over the innobackup tool for backing up InnoDB and MyISAM tables.
  • Compact option: --compact option is not supported.
  • Rebuild indexes option: --rebuild_indexes option is not supported.
  • Tar for backup files: Support for --stream=tar was removed in Mariabackup 10.1.24 (the --stream option streams backup files to stdout).

Last but not least, Mariabackup can be installed on Windows.

Backup Process
Restore Process

How to - Percona XtraBackup and Mariabackup

Let's see how we can install and use them.

Installation

You have different methods to install both XtraBackup and Mariabackup. Let's try the installation from repositories.

XtraBackup Installation

On Debian/Ubuntu
$ wget https://repo.percona.com/apt/percona-release_0.1-6.$(lsb_release -sc)_all.deb
$ sudo dpkg -i percona-release_0.1-6.$(lsb_release -sc)_all.deb
$ sudo apt-get update
$ sudo apt-get install percona-xtrabackup-24
On RedHat/CentOS
$ sudo yum install http://www.percona.com/downloads/percona-release/redhat/0.1-6/percona-release-0.1-6.noarch.rpm
$ sudo yum install percona-xtrabackup-24

Mariabackup Installation

On Debian / Ubuntu

Mariabackup is part of MariaDB Server starting with MariaDB 10.1.23.

$ sudo apt-get install software-properties-common
$ sudo apt-key adv --recv-keys --keyserver hkp://keyserver.ubuntu.com:80 0xF1656F24C74CD1D8
$ sudo add-apt-repository 'deb [arch=amd64,arm64,ppc64el] http://nyc2.mirrors.digitalocean.com/mariadb/repo/10.1/ubuntu bionic main'
$ sudo apt-get update
$ sudo apt-get install mariadb-server-10.1
On CentOS / RedHat
$ sudo vi /etc/yum.repos.d/MariaDB.repo
[mariadb]
name=MariaDB
baseurl=http://yum.mariadb.org/10.1/centos7-amd64
gpgkey=https://yum.mariadb.org/RPM-GPG-KEY-MariaDB
gpgcheck=1
$ sudo yum install MariaDB-backup

Configuration

Both XtraBackup and Mariabackup read the [mysqld] and [xtrabackup] sections of any MySQL configuration file, in that order. In this way, they can read MySQL parameters such as datadir or the InnoDB parameters.

We can override parameters from the [mysqld] section by setting their values in [xtrabackup]. As mentioned before, the sections are read in order, so whatever we have in [xtrabackup] takes priority.

[mysqld]
datadir=/data/datadir
[xtrabackup]
target_dir=/backups/

The minimum privileges required for a user to take full backups are RELOAD, LOCK TABLES, PROCESS and REPLICATION CLIENT:

mysql> CREATE USER 'backupuser'@'localhost' IDENTIFIED BY 'Password';
mysql> GRANT RELOAD, LOCK TABLES, PROCESS, REPLICATION CLIENT ON *.* TO 'backupuser'@'localhost';

And then, you can add this user in the MySQL configuration files:

[xtrabackup]
user=backupuser
password=Password

Also, you can use Xtrabackup or Mariabackup to perform the State Snapshot Transfers when you are using a Percona XtraDB Cluster or MariaDB Galera Cluster. You must set the wsrep_sst_method and the wsrep_sst_auth variables in the configuration files:

[mysqld]
wsrep_sst_method=xtrabackup-v2
wsrep_sst_auth=backupuser:Password

Or

[mysqld]
wsrep_sst_method=mariabackup
wsrep_sst_auth=backupuser:Password

Usage

Since Mariabackup is based on XtraBackup, it can be used similarly.

Now let's see an example using both methods to create, prepare and restore a full backup.

Creating a backup

To create a new backup with XtraBackup or Mariabackup, you need to add the --backup and --target-dir options to the command line:

$ xtrabackup --backup --target-dir=/backups/

Or

$ mariabackup --backup --target-dir=/backups/

The target dir, where the backup will be stored, can be specified in the MySQL configuration files. The backup process will not overwrite existing files; if a file already exists, the backup will fail.

Transaction log of lsn (9755450) to (9755467) was copied.
181122 23:02:44 completed OK!

If everything went fine, the last line that you see should be "completed OK!". You can cancel the backup at any time, as it doesn't modify the database's content.

[root@MySQL1 ~]# ls -l /backups/
total 102448
-rw-r----- 1 root root       488 Nov 22 23:02 backup-my.cnf
-rw-r----- 1 root root       482 Nov 22 23:02 ib_buffer_pool
-rw-r----- 1 root root 104857600 Nov 22 23:02 ibdata1
drwxr-x--- 2 root root      4096 Nov 22 23:02 mysql
drwxr-x--- 2 root root      4096 Nov 22 23:02 performance_schema
drwxr-x--- 2 root root      4096 Nov 22 23:02 sakila
drwxr-x--- 2 root root     12288 Nov 22 23:02 sys
-rw-r----- 1 root root        64 Nov 22 23:02 xtrabackup_binlog_info
-rw-r----- 1 root root       113 Nov 22 23:02 xtrabackup_checkpoints
-rw-r----- 1 root root       533 Nov 22 23:02 xtrabackup_info
-rw-r----- 1 root root      2560 Nov 22 23:02 xtrabackup_logfile

This should be the content of your backup. It could change depending on your databases.

Preparing a backup

Once you have created your backup with XtraBackup or Mariabackup, you need to prepare it to be restored. Data files aren't consistent until they’ve been prepared, because they were copied at different points in time while the backup was running. If you try to restore the backup and start your database, it will detect corruption and crash itself to prevent you from running on inconsistent data.

To prepare the backup, you need to run the xtrabackup or mariabackup command with the --prepare option and specify the target dir where the backup is stored.

$ xtrabackup --prepare --target-dir=/backups/

Or

$ mariabackup --prepare --target-dir=/backups/
InnoDB: Shutdown completed; log sequence number 9757224
181122 23:05:29 completed OK!

The last line that you see should be "Shutdown completed; log sequence number xxxxxxx" and "completed OK!" if everything went fine. It's not recommended to cancel the prepare process because it may cause data file corruption and the backup will become unusable.

If you want to use this backup as the base for an incremental backup later, you should use the --apply-log-only option when preparing it; otherwise you will not be able to apply the incremental afterwards, as shown in the sketch below.
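A minimal sketch of that incremental workflow; the paths are placeholders:

$ # Full backup as the base, then an incremental on top of it
$ mariabackup --backup --target-dir=/backups/full/
$ mariabackup --backup --target-dir=/backups/inc1/ --incremental-basedir=/backups/full/
$ # Prepare the base with --apply-log-only, then apply the incremental
$ mariabackup --prepare --apply-log-only --target-dir=/backups/full/
$ mariabackup --prepare --target-dir=/backups/full/ --incremental-dir=/backups/inc1/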

Restoring a Backup

After preparing the backup, you can use the restore option with the parameters --copy-back or --move-back, to copy or move the backup to the datadir. If you don't have enough disk space, you probably should use the move option. Also, we need to specify the target dir where the backup is stored. Keep in mind that the datadir must be empty and the database service should be down before restoring the backup.

$ xtrabackup --copy-back --target-dir=/backups/

Or

$ mariabackup --copy-back --target-dir=/backups/

It will first copy/move the MyISAM tables and indexes, then the InnoDB tables and indexes, and finally the log files. It preserves the files’ attributes when copying them, but you may have to change the files’ ownership to mysql before starting the database server, as they will be owned by the user who created the backup.

$ sudo chown -R mysql:mysql /var/lib/mysql

There are several parameters to use with Xtrabackup and Mariabackup. You can check these parameters here for XtraBackup, and here for Mariabackup.


Managing Your Backups on ClusterControl

As we saw above, running a backup is not rocket science. It can also be scheduled with cron (but beware of silent failures!). However, a script that regularly creates backups is not a backup management solution. You need a way to report on your backups and alert on failures. Moreover, configuring backups in your environment and seeing them complete without error does not mean everything is good. You might have heard of Schrödinger’s backup, which states that the condition of any backup is unknown until a restore is attempted. The worst thing that can happen is a disaster, only to find out the backups are wrong for some reason: you try to restore the files that were backed up, and it does not restore what you think you backed up, or it does not restore at all! Then there are things like moving backup files offsite, e.g. to external cloud storage, for disaster recovery. Encryption and handling of keys are important for security. Retention of local as well as external/archived backups also needs to be managed.

Let's see how ClusterControl can help.

If you want to use the ClusterControl Backup feature, go to ClusterControl -> Select Cluster -> Backup, and there you can see your current backups, create or schedule a new one.

ClusterControl Backup Section

Using the create or schedule option, we can choose either the XtraBackup or the Mariabackup method. In the same section, we can choose the server from which to take the backup, enable partial backup, choose where to store the backup, and decide whether to upload the backup to the cloud (AWS, Azure or Google Cloud).

ClusterControl Create Backup 1

Then, we can select backup parameters like compression, encryption and retention.

ClusterControl Create Backup 2

And these should be the commands that ClusterControl will run for you:

[16:37:58]: 192.168.100.120: Launching ( LC_ALL=C /usr/bin/innobackupex --defaults-file=/etc/my.cnf --galera-info --parallel 1 --stream=xbstream --no-timestamp . | gzip -6 - > /root/backups/BACKUP-13/backup-full-2018-11-14_193757.xbstream.gz ) 2>&1.

Or

[16:29:57]: 192.168.100.131: Launching ( LC_ALL=C /usr/bin/mariabackup --defaults-file=/etc/my.cnf --backup --galera-info --parallel 1 --stream=xbstream --no-timestamp | gzip -6 - > /root/backups/BACKUP-11/backup-full-2018-11-14_192957.xbstream.gz ) 2>&1.

This command could be different depending on which parameters you chose.

As we have seen, ClusterControl is a good friend if we want to use XtraBackup or Mariabackup. We can run complex backup commands in an easy way by selecting options from the ClusterControl UI. ClusterControl supports both full and incremental backups, so we can configure our entire backup strategy from a friendly UI.

Conclusion

When backing up a MariaDB server, it is recommended to use the Mariabackup tool. However, if for some reason you prefer to use XtraBackup, you still can. But you need to keep in mind the restrictions that apply, as we have noted in this blog. And finally, we discussed how a script to backup a database is not the same thing as a backup management solution, and had a quick look at ClusterControl.

Let us know if we missed any differences between the XtraBackup and MariaBackup.


*The non-blocking backups are supported for InnoDB, XtraDB, and HailDB storage engines. The following storage engines can be backed up by briefly pausing writes at the end of the backup: MyISAM, Merge, and Archive, including partitioned tables, triggers, and database options.

by Sebastian Insausti at November 27, 2018 10:39 AM

November 26, 2018

Peter Zaitsev

Upcoming Webinar Thurs 11/29: Improve MongoDB Performance with Proper Queries

Please join Percona’s Sr. Technical Operations Architect, Tim Vaillancourt, as he presents Improve MongoDB Performance with Proper Queries on Thursday, November 29th, 2018, at 12:30 PM PST (UTC-8) / 3:30 PM EST (UTC-5).

Register Now

There are many different ways you can use queries in MongoDB to find the data you need. However, knowing which queries are slowing down your performance can be a challenge. In this webinar we’ll discuss the following:

  • Performing ad hoc queries on the database using the find or findOne functions and a query document.
  • How to query for ranges, set inclusion, inequalities, and more using $-conditionals.
  • How to use and sort queries that return a database cursor, which lazily returns batches of documents as you need them.
  • What pitfalls you can encounter when performing the many available meta operations on a cursor, including skipping a certain number of results, and limiting the number of results returned.
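For a flavor of what these look like in practice, here is a rough mongo shell sketch; the collection and field names are hypothetical:

// Ad hoc lookup of a single document
db.orders.findOne({ status: "pending" });
// Range query using $-conditionals
db.orders.find({ qty: { $gte: 10, $lt: 100 } });
// Set inclusion, a sorted cursor, and cursor meta operations
db.orders.find({ region: { $in: ["us", "eu"] } })
         .sort({ created: -1 })
         .skip(20)
         .limit(10);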

By the end of this webinar you will have a better understanding of which queries impact performance. Moreover, you’ll understand how to leverage open source tools to monitor queries.

Register for this webinar to learn how to improve MongoDB performance with the proper queries.

by Tim Vaillancourt at November 26, 2018 03:31 PM

Percona at AWS Re:Invent 2018!

AWS re:Invent

Come see Percona at AWS re:Invent from November 26-30, 2018 in booth 1605 in The Venetian Hotel Expo Hall.

Percona is a Bronze sponsor of AWS re:Invent in 2018 and will be there for the whole show! Drop by booth 1605 in The Venetian Expo Hall to discuss how Percona’s unbiased open source database experts can help you with your cloud database and DBaaS deployments!

Our CEO, Peter Zaitsev, will be presenting a keynote called MySQL High Availability and Disaster Recovery at AWS re:Invent!

  • When: 27 November at 1:45 PM – 2:45 PM
  • Where: Bellagio Hotel, Level 1, Gauguin 2

Check out our case study with Passportal on how Percona’s DBaaS expertise helped guarantee uptime for their AWS RDS environment.

Percona has a lot of great content on how we can help improve your AWS open source database deployment. Check out some of our resources below:

Blogs:

Case Studies:

White Papers:

Webinars:

Datasheets:

See you at the show!

by Dave Avery at November 26, 2018 01:03 PM

See Percona CEO Peter Zaitsev’s Keynote at AWS re:Invent: MySQL High Availability and Disaster Recovery

Join Percona CEO Peter Zaitsev at AWS re:Invent as he presents MySQL High Availability and Disaster Recovery on Tuesday, November 27, 2018, in the Bellagio Resort, Level 1, Gauguin 2, at 1:45 PM.

In this hour-long session, Peter describes the differences between high availability (HA) and disaster recovery (DR), and then moves through scenarios detailing how each is handled manually and in Amazon RDS.

He’ll review the pros and cons of managing HA and DR in the traditional database environment as well in the cloud. Having full control of these areas is daunting, and Amazon RDS makes meeting these needs easier and more efficient.

Regardless of which path you choose, it is necessary that you monitor your environment, so Peter wraps up with a discussion of metrics you should regularly review to keep your environment working correctly and performing optimally.

Peter Zaitsev co-founded Percona and assumed the role of CEO in 2006. As one of the foremost experts on MySQL strategy and optimization, Peter leveraged both his technical vision and entrepreneurial skills to grow Percona to one of the most respected open source companies in the business. Peter is a co-author of High-Performance MySQL: Optimization, Backups, and Replication, one of the most popular books on MySQL performance.

You can also stop by and see Percona at AWS re:Invent in booth 1605 in The Venetian Hotel Expo Hall.

by Rick Golba at November 26, 2018 03:07 AM

November 24, 2018

Valeriy Kravchuk

Fun with Bugs #73 - On MySQL Bug Reports I am Subscribed to, Part X

It's time to continue my review of MySQL bug reports that I have recently found interesting for one reason or another. I did not get any notable reaction from Oracle engineers to my previous post about recent regression bugs in MySQL 8.0.13, so probably that topic is not really hot. In this boring post I'll just review some bugs I've subscribed to since August that are still not closed, starting from the oldest.

Let me start with a couple of bug reports that remain "Open":
  • Bug #91959 - "UBSAN: signed integer overflow in lock_update_trx_age". It's really unusual to see a bug reported by Shane Bester himself remain "Open" for months. As he noted, it's strange to see age defined as a 32-bit value here when age_updated, two lines later, is 64-bit. On the other hand, from comments in the code it seems the age value is supposed to stay small, and if it grows too much that is a problem.
  • Bug #92964 - "Slave performance degrades over time". As description says:
    "When mysql slave is running gtid based asynchronous multi threaded replication, performance degrades over time in significant way when session_track_gtids is set to OWN_GTID; "
    The bug is assigned, but there are no public comments, while the way to reproduce it seems to be clearly defined.
Same as the weather at Seven Sisters Cliffs back on September 14, 2018, some recent MySQL bug reports do not add confidence to those few visitors who explore them...

Now let me continue with some "Verified" bugs:
  • Bug #91981 - "Inconsistent user@host definitions for definer/grantor/grantee columns". It seems consistency is not a high priority these days.
  • Bug #92020 - "Introduce new SQL mode rejecting queries with results depending on query plan". I already mentioned this report while reviewing recent MySQL optimizer bugs. I have nothing to add to my conclusion there:
    "So, current MySQL considers different results when different execution plans are used normal!"
  • Bug #92032 - "Restrict usage of session foreign_key_checks". It's a great request from Federico Razzoli and I am happy to see it verified as a bug, fast.
  • Bug #92093 - "Replication crash safety needs relay_log_recovery even with GTID." Recently, Jean-François Gagné has been paying special attention to MySQL replication crash safety in his bug reports, talks and blog posts. This specific report ended up as a request for clearer documentation that should not provoke a (false) sense of safety. See also his Bug #92109 - "Please make replication crash safe with GTID and less durable setting (bis)." for the request to improve MySQL in this regard one day.
  • Bug #92131 - "ASan: Direct leak of 272 byte(s) in main.mysqlpump_partial_bkp MTR test case". This kind of report makes me nervous. It makes me think that there is still no regular testing of ASan-enabled builds at Oracle. Luckily for us, Percona engineers (like Yura Sorokin) do this for Oracle.
  • Bug #92133 - "DICT_SYS mutex contention causes complete stall when running with 40 mill tables". The proper data dictionary in MySQL 8.0 was supposed to solve some performance problems for instances with a lot of tables. As this perfect bug report from Alexander Rubin shows, this is not yet the case. We can still push a MySQL instance over its limits, and a single instance hosting thousands of different small databases is hardly usable.
  • Bug #92209 - "AVG(YEAR(datetime_field)) makes an error result because of overflow". Perfect analysis from Pin Lin. All recent MySQL (and MariaDB) versions are affected.
  • Bug #92252 - "Mysql generates gtid gaps because of option slave-skip-errors". Yet another bug report from Pin Lin.
  • Bug #92364 - "events_transactions_summary_global_by_event_name not working as expected". This bug was reported by Przemyslaw Malkowski from Percona. We all know Performance Schema is near perfect, but it turned out that "...performance_schema.events_transactions_summary_global_by_event_name behavior for instrumenting transactions seems completely broken.".
  • Bug #92398 - "point in time recovery using mysqlbinlog broken with temporary table -> errors". Let me quote Shane Bester:
    "Running a multi-threaded workload involving temporary and non-temporary tables leads to binary log playback giving errors."
  • Bug #92421 - "Queries with views and operations over local variables don't use indexes". This is a kind of regression in MySQL 5.7. MySQL 8 allows you to work around the problem properly using the ROW_NUMBER() function.
  • Bug #92540 - "Comparison between DATE and DECIMAL does not round nanoseconds". This bug was reported by Alexander Barkov from MariaDB. MariaDB 10.3.x is not affected based on my tests.
To summarize:
  1. It seems ASan-enabled builds are not tested by Oracle engineers on a regular basis.
  2. Percona engineers help a lot with regular MySQL QA and testing for various extreme use cases.
  3. There is a lot to do to make GTID-based replication crash-safe and working fast in common use cases.
More bug reviews are coming soon, so stay tuned.



by Valeriy Kravchuk (noreply@blogger.com) at November 24, 2018 08:27 PM

November 23, 2018

Peter Zaitsev

Percona Server for MongoDB 3.4.18-2.16 Is Now Available

Percona announces the release of Percona Server for MongoDB 3.4.18-2.16 on November 23, 2018. Download the latest version from the Percona website or the Percona Software Repositories.

Percona Server for MongoDB 3.4 is an enhanced, open source, and highly-scalable database that is a fully-compatible, drop-in replacement for MongoDB 3.4 Community Edition. It supports MongoDB 3.4 protocols and drivers.

Percona Server for MongoDB extends MongoDB Community Edition functionality by including the Percona Memory Engine and MongoRocks storage engines, as well as several enterprise-grade features:

Percona Server for MongoDB requires no changes to MongoDB applications or code.

This release is based on MongoDB 3.4.18. The following improvements have been made on top of the upstream fixes.

Improvements

  • #247: Now, AuditLog formats are listed in the output of mongod --help and mongos --help commands

Other improvements:

  • #238: audit_drop_database.js test occasionally fails with MMAPv1

 

by Borys Belinsky at November 23, 2018 06:28 PM

Compression options in MySQL (part 1)

Over the last year, I have been pursuing a part-time hobby project exploring ways to squeeze as much data as possible into MySQL. As you will see, there are quite a few different ways. Of course, things like compression ratio matter a lot but other items, like the performance of inserts, selects and updates, along with the total amount of bytes written, are also important. When you start combining all the possibilities, you end up with a large set of compression options and, of course, I am surely missing a ton. This project has been a great learning opportunity and I hope you’ll enjoy reading about my results. Given the volume of results, I’ll have to write a series of posts. This post is the first of the series. I also have to mention that some of my work overlaps work done by one of my colleagues, Yura Sorokin, in a presentation he did in Dublin.

The compression options

  • InnoDB page size in {16k, 32k, 64k} (as references)
  • InnoDB barracuda page compression, block_size in {8k, 4k}
  • InnoDB Transparent page compression with punch holes, page size in {16k, 32k, 64k} * compression algo in {LZ4, Zlib}
  • MyISAM, MyISAM Packed, MyISAM on ZFS with recordsize in {16k, 32k}
  • InnoDB on ZFS, ZFS compression algo in {LZ4, Zlib}, ZFS record size in {16k, 32k, 64k, 128k}, InnoDB page size in {16k, 32k, 64k}
  • TokuDB, TokuDB compression algo in {ZLIB, LZMA, QUICKLZ, SNAPPY}
  • TokuDB on ZFS, TokuDB compression algo set to None, ZFS compression Zlib, ZFS record size in {16k, 32k, 64k, 128k}
  • MyRocks, compression algo in {None, ZSTD}
  • MyRocks on ZFS, MyRocks compression algo set to None, ZFS compression Zlib, ZFS record size in {16k, 32k, 64k, 128k}

In many interesting cases, the ZFS experiments have been conducted with and without a SLOG.
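For reference, one of the tested ZFS combinations could be created roughly like this; the pool name and mount point are assumptions:

# LZ4 compression with a 16k record size, matching InnoDB's default page size
zfs create -o recordsize=16k -o compression=lz4 -o mountpoint=/var/lib/mysql tank/mysql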

The test datasets

In order to cover these solutions efficiently, I used a lot of automation and I restricted myself to two datasets. My first dataset consists of nearly 1 billion rows from the Wikipedia access stats. The table schema is:

CREATE TABLE `wiki_pagecounts` (
  `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `day` date NOT NULL,
  `hour` tinyint(4) NOT NULL,
  `project` varchar(30) NOT NULL,
  `title` text NOT NULL,
  `request` int(10) unsigned NOT NULL,
  `size` int(10) unsigned NOT NULL,
  PRIMARY KEY (`id`),
  KEY `idx_time` (`day`,`hour`)
);

and here’s a typical row:

mysql> select * from wiki_pagecounts where id = 16\G
*************************** 1. row ***************************
     id: 16
    day: 2016-01-01
   hour: 0
project: aa
  title: 'File:Wiktionary-logo-en.png'
request: 1
   size: 10752
1 row in set (0.00 sec)

The average length of the title column is above 70 characters and it often contains HTML escape sequences for UTF-8 characters. The actual column content is not really important, but it is not random data. Loading this dataset into plain InnoDB results in a data file of about 113GB.

The second dataset is from the defunct Percona cloud tool project and is named “o1543”. Instead of a large number of rows, it has only 77M rows, but this time the table has 134 columns, mostly floats or bigints. The table definition is:

CREATE TABLE `query_class_metrics` (
   `day` date NOT NULL,
   `query_class_id` int(10) unsigned NOT NULL,
   `instance_id` int(10) unsigned NOT NULL,
   `start_ts` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
   `end_ts` timestamp NOT NULL DEFAULT '1970-01-01 00:00:01',
   `query_count` bigint(20) unsigned NOT NULL,
   `lrq_count` bigint(20) unsigned NOT NULL DEFAULT '0',
   ...
   `Sort_scan_sum` bigint(20) unsigned DEFAULT NULL,
   `No_index_used_sum` bigint(20) unsigned DEFAULT NULL,
   `No_good_index_used_sum` bigint(20) unsigned DEFAULT NULL,
   PRIMARY KEY (`start_ts`,`instance_id`,`query_class_id`),
   KEY `start_ts` (`instance_id`)
 );

When loaded in plain InnoDB, the resulting data file size is slightly above 87GB.

The test queries

The first test query is, of course, the inserts used to load the datasets. These are multi-insert statements in primary key order, generated by the mysqldump utility.

The second test query is a large range select. For the Wikipedia dataset I used:

select `day`, `hour`, max(request), sum(request), sum(size)
from wikipedia_pagecounts.wiki_pagecounts
where day = '2016-01-05'
group by `day`,`hour`;

while for the o1543 dataset, I used:

select query_class_id, sum(Query_time_sum) as Totat_time,sum(query_count), sum(Rows_examined_sum),
  sum(InnoDB_pages_distinct_sum)
from o1543.query_class_metrics
where start_ts between '2014-10-01 00:00:00' and '2015-06-30 00:00:00'
group by query_class_id
order by Totat_time desc limit 10;

In both cases, a significant amount of data needs to be processed. Finally, I tested random access with the updates. I generated 20k distinct single row updates in random primary key order for the Wikipedia dataset like:

update wikipedia_pagecounts.wiki_pagecounts set request = request + 1 where id = 377748793;

For the o1543 dataset, I used the following update statement:

update o1543.query_class_metrics set Errors_sum = Errors_sum + 1
where query_class_id = 472 and start_ts between '2014-10-01 00:00:00' and '2014-12-31 23:59:59';

which ends up updating very close to 20k rows, well spaced in terms of primary key values.

The metrics recorded

In order to compare the compression options, I recorded key metrics.

  • Time: that’s simply the execution time of the queries. It is very difficult to minimally tune a server for all the different engines. Some also rely on the OS file cache. Take the execution time as a rough performance indicator which could be modified substantially through targeted tuning.
  • Amount of data read and written by the MySQL process, as reported by /proc/$(pidof mysqld)/io.
  • Amount of data written to the actual physical device where the data is stored, from /sys/block/$datadevice/stat. That matters a lot for flash devices that have a finite endurance. The amount of data written to the storage is the main motivation of Facebook with MyRocks.
  • The actual size of the final datasets

Even with these simple metrics, you’ll see there is quite a lot to discuss and learn.

The procedure

For the benchmarks, I used a LXC virtual machine. My main goal was to simulate a dataset much larger than the available memory. I tried to limit the MySQL buffers to 128MB but in some cases, like with MyRocks, that was pretty unfair and it impacted the performance results. Basically, the procedure was:

  1. Start mysqld (no buffer pool load)
  2. Sync + drop cache
  3. Capture: du -hs the datadir
  4. Capture: cat /proc/$(pidof mysqld)/io
  5. Capture: cat /sys/block/vdb/stat
  6. Capture: show global variables and show global status;
  7. Run the test query
  8. Wait for 30 minutes for any flushing or maintenance to complete
  9. Capture: du -hs the datadir
  10. Capture: cat /proc/$(pidof mysqld)/io
  11. Capture: cat /sys/block/vdb/stat
  12. Capture: show global variables and show global status;
  13. Stop mysqld

As much as possible, I automated the whole procedure. On many occasions, I ran multiple runs of the same benchmark to validate unexpected behaviors.
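As a rough illustration, the capture part of that automation might look like the sketch below; the datadir path is an assumption, while the vdb device name matches the post:

#!/bin/bash
# Sketch of the capture steps from the procedure above
DATADIR=/var/lib/mysql
sync && echo 3 > /proc/sys/vm/drop_caches          # sync + drop cache
du -hs "$DATADIR"                                  # dataset size
cat /proc/$(pidof mysqld)/io                       # per-process I/O counters
cat /sys/block/vdb/stat                            # device-level write stats
mysql -e 'SHOW GLOBAL VARIABLES; SHOW GLOBAL STATUS;'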

First results: Traditional storage options

Inserting the data

In this first post of the series, I’ll report on the traditional “built-in” options, which are InnoDB with Barracuda compression (IBC) and MyISAM with packing. I debated a bit whether to include MyISAM in this post, since tables become read-only once packed, but still, I have personally implemented solutions using MyISAM packed a few times.

On the first figure (above), we have the final table sizes in GB for the different options we are considering in this first round of results. Over all the post series, we’ll use the plain InnoDB results as a reference. The Wikipedia dataset has a final size in InnoDB of 112.6GB while the o1543 dataset is slightly smaller, at 87.4GB.

The MyISAM sizes are smaller by 10 to 15% which is expected since InnoDB is page based, with page and row headers, and it doesn’t fully pack its pages. The InnoDB dataset size could have been up to twice as large if the data was inserted in random order of the primary key.

Adding Barracuda page compression with an 8KB block size (InnoDBCmp8k), both datasets shrink by close to 50%. Pushing to a block size of 4KB (InnoDBCmp4k), the Wikipedia dataset clearly doesn’t compress that much but, the o1543 dataset is very compressible, by more than 75%. Looking at the MyISAM Packed results, the o1543 dataset compressed to only 8.8GB, a mere 11% of the original MyISAM size. That means the o1543 could have done well with IBC using block size of 2KB. Such a compression ratio is really exceptional. I’ve rarely encountered a ratio that favorable in a production database.

With IBC, we know the block size is too small for the compressibility of the dataset when the final size no longer follows the ratio of compressed block size over original block size. For example, the Wikipedia dataset started at 112.6GB, and a quarter (4KB/16KB) of this number is much smaller than the 46.9GB size obtained with InnoDB compression and a 4k block size. You’ll also see a large number of compression failures in the innodb_cmp table of the information schema.

When compression fails with IBC, InnoDB splits the page in two and recompresses each half. That process adds an overhead which can be observed in the insertion time figure. While the insertion time of the Wikipedia dataset for InnoDBCmp4k explodes to 2.6 times the uncompressed insertion time, the o1543 dataset takes only 30% more time. I suggest you do not take the times here too formally; the test environment I used was not fully isolated. View these times as trends.

The amount of data written during the inserts, shown on the above figure, has bugged me for a while. Let’s consider, for example, the amount of writes needed to insert all the rows of the Wikipedia dataset in plain InnoDB. The total is 503GB for a dataset of 113GB. From the MySQL status variables, I have 114GB written to the data files (innodb_pages_written * 16kb), 114GB written to the double write buffer (Inndb_dblwr_pages_written * 16kb) and 160GB written to the InnoDB log files (innodb_os_log_written). If you sum all these values, you have about 388GB, a value short by about… 114GB, too close to the size of the dataset to be an accident. What else is written?

After some research, I finally found it! When a datafile is extended, InnoDB first writes zeros to allocate the space on disk. All the tablespaces have to go through that initial allocation phase so here are the missing 114GB.

Back to the data written during the inserts figure, look at the number of writes required for the Wikipedia dataset when using InnoDBCmp4k. Any idea why it is higher? What if I tell you the double write buffer only writes 16KB pages (actually, it depends on innodb_page_size)? So, when the pages are compressed, they are padded with zeros when written to the double write buffer. We remember that, when compressing to 4KB, we had many compression misses so we ended up with many more pages to write. Actually, Domas Mituzas filed a bug back in 2013 about this. Also, by default (innodb_log_compressed_pages), the compressed and uncompressed versions of the pages are written to the InnoDB log files. Actually, only the writes to the tablespace are reduced by IBC.

MyISAM, since it is not a transactional engine, cheats here. The only overhead is the writes to the b-trees of the index. So, at the expense of durability, MyISAM writes much less data in this simple benchmark.

Range selects

So great, we have now inserted a lot of rows in our tables. How fast can we access these rows? The following figure presents the times to perform large range scans on the datasets. In InnoDB, the times are “fairly” stable. For the Wikipedia dataset, InnoDB compression improves the select performance, the time to decompress a page is apparently shorter than the time to read a full page. MyISAM without compression is the fastest solution but, once again, the benchmark offers the most favorable conditions to MyISAM. MyISAMPacked doesn’t fare as well for the large range select, likely too much data must be decompressed.

20k updates

Moving on to the updates, the time required to perform 20k updates by a single thread is shown in the figure above. For both datasets, InnoDB and InnoDBCmp8k show similar times. There is a divergence with the Wikipedia dataset stored on InnoDBCmp4k: the execution time is 57% larger, essentially caused by a large increase in the number of pages read. MyISAM is extremely efficient dealing with the updates of the o1543 dataset, since the record size is fixed and the update process is single-threaded.


Finally, let’s examine the number of bytes written per update, as shown in the figure above. I was naively expecting about two pages written per update statement: one for the double write buffer and one for the tablespace file. The Wikipedia dataset shows more like three pages written per update, while the o1543 dataset fits rather well with what I was expecting. I had to look at the file_summary_by_instance table of the Performance Schema and the innodb_metrics table to understand. Actually, my updates to the Wikipedia dataset are single row updates executed in autocommit mode, while the updates to the o1543 dataset come from a single statement updating 20k rows. When you do multiple small transactions, you end up writing much more to the undo log and to the system tablespace. The worst case is when the updates are in separate transactions and a long time is allowed for MySQL to flush the dirty pages.

Here are the writes associated with 30 updates, in autocommit mode, 20s apart:

mysql> select NAME, COUNT_RESET from innodb_metrics where name like '%writt%' and count_reset > 0 ;
+---------------------------------+-------------+
| NAME                            | COUNT_RESET |
+---------------------------------+-------------+
| buffer_pages_written            |         120 |
| buffer_data_written             |     2075136 |
| buffer_page_written_index_leaf  |          30 |
| buffer_page_written_undo_log    |          30 |
| buffer_page_written_system_page |          30 |
| buffer_page_written_trx_system  |          30 |
| os_log_bytes_written            |       77312 |
| innodb_dblwr_pages_written      |         120 |
+---------------------------------+-------------+
8 rows in set (0.01 sec)

The index leaf write is where the row is stored in the tablespace; 30 matches the number of rows updated. Each update has dirtied one leaf page, as expected. The undo log is used to store the previous version of the row for rollback, in the ibdata1 file. I wrongly assumed these undo entries would not actually be written to disk, and would only live in the buffer pool and be purged before they needed to be flushed to disk. I don’t understand clearly enough what is written to the system page and trx system to attempt a clear explanation for these. The sum of pages to write is 120, four per update, but you need to multiply by two because of the double write buffer. So, in this worst case scenario, a simple single row update may cause up to eight pages to be written to disk.

Grouping the updates in a single transaction basically removes the pages written to the system_page and trx_system as these are per transaction.

Here is the result for the same 30 updates, sent at a 20s interval, but this time in a single transaction:

mysql> select NAME, COUNT_RESET from innodb_metrics where name like '%writt%' and count_reset > 0 ;
+---------------------------------+-------------+
| NAME                            | COUNT_RESET |
+---------------------------------+-------------+
| buffer_pages_written            |          63 |
| buffer_data_written             |     1124352 |
| buffer_page_written_index_leaf  |          30 |
| buffer_page_written_undo_log    |          31 |
| buffer_page_written_system_page |           1 |
| buffer_page_written_trx_system  |           1 |
| os_log_bytes_written            |       60928 |
| innodb_dblwr_pages_written      |          63 |
+---------------------------------+-------------+

The write load, in terms of the number of pages written, is cut by half, to four per update. The most favorable case will be a single transaction with no sleep in between.

For 30 updates in a single transaction with no sleep, the results are:

mysql> select NAME, COUNT_RESET from innodb_metrics where name like '%writt%' and count_reset > 0 ;
+---------------------------------+-------------+
| NAME                            | COUNT_RESET |
+---------------------------------+-------------+
| buffer_pages_written            |          33 |
| buffer_data_written             |      546304 |
| buffer_page_written_index_leaf  |          30 |
| buffer_page_written_undo_log    |           1 |
| buffer_page_written_system_page |           1 |
| buffer_page_written_trx_system  |           1 |
| os_log_bytes_written            |        4608 |
| innodb_dblwr_pages_written      |          33 |
+---------------------------------+-------------+
8 rows in set (0.00 sec)

Now, the undo log is flushed only once and we are down to approximately two page writes per update. This is what I was originally expecting. The other results fall well into place if you keep in mind that only the index_leaf writes are compressed. The InnoDBCmp4k results for the Wikipedia dataset are higher, essentially because the run took much more time and thus more page flushing occurred.

What did we learn?

Everything can be a pretext to explore and learn. Just to summarize, what have we learned in this post?

  • The InnoDB log file logs both the compressed and uncompressed versions of a page by default (see innodb_log_compressed_pages)
  • The double write buffer only writes full pages; compressed pages are zero padded
  • With InnoDB, the total amount of data written to disk during the inserts is more than 5 times the final size. Compression worsens the ratio.
  • A single row update causes from two up to eight pages to be written to disk

Not bad in terms of collateral learning…

Next?

In this post, we reviewed the traditional data compression solutions available with MySQL. In future posts, we’ll start looking at the alternatives. In the next one, I will evaluate InnoDB Transparent page compression with punch hole, a feature available since MySQL 5.7.

by Yves Trudeau at November 23, 2018 03:18 PM

November 22, 2018

Peter Zaitsev

Caveats With pt-table-checksum Using Row-Based Replication, and Replication Filters

As per the documentation, pt-table-checksum is a tool to perform online replication consistency checks by executing checksum queries on the master, which produces different results on replicas that are inconsistent with the master.

The master and each slave insert checksums into the percona.checksums table, and these are later compared for differences. It’s fairly obvious that the checksums need to be determined independently on each node, and so these inserts must be replicated as STATEMENT and not ROW. Otherwise, the slaves would just insert the same checksum as the master and not calculate it independently.
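After a run completes, differences can be reported per table by querying the checksum table on each replica; a query along these lines (adapted from the tool's documentation) does the job:

SELECT db, tbl, SUM(this_cnt) AS total_rows, COUNT(*) AS chunks
FROM percona.checksums
WHERE (
  master_cnt <> this_cnt
  OR master_crc <> this_crc
  OR ISNULL(master_crc) <> ISNULL(this_crc))
GROUP BY db, tbl;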

The tool only requires binlog_format=STATEMENT for its own session. It sets this itself on the master, but will error if this isn’t already set on each slave node. The reason for the error is that the statement to change the session’s binlog_format will not be replicated. So if a slave has binlog_format=ROW then the slave itself will execute the checksum correctly, but the results will be written as a ROW. Any slaves of this slave in the chain will just insert the same result. See bug 899415.

This is only a problem if we have chained replication, and the error can be skipped with --no-check-binlog-format so for simple replication setups with ROW or MIXED replication we can still use the tool. If we do not have a slave-of-slave style chained topology, then there’s no need to worry about this.
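As a quick sanity check before a run, you might verify the format on each slave and, for a simple topology, skip the check explicitly (the host names here are placeholders):

mysql -h slave1.example.com -e "SHOW GLOBAL VARIABLES LIKE 'binlog_format'"
pt-table-checksum --no-check-binlog-format h=master.example.com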

However, there is one caveat to be aware of if you’re using replication filters: when a slave isn’t replicating a particular database due to binlog-ignore-db, this setting behaves differently with ROW based replication (RBR) vs. STATEMENT based.

Per the documentation, with RBR, binlog-ignore-db=testing will cause all updates to testing.* to be skipped. With STATEMENT-based replication it will cause all updates after USE testing; to be ignored (regardless of where the updates were being written to).

pt-table-checksum operates in the following way:

use `testing`/*!*/;
SET TIMESTAMP=1541583280/*!*/;
REPLACE INTO `percona`.`checksums` (db, tbl, chunk, chunk_index, lower_boundary, upper_boundary, this_cnt, this_crc) SELECT 'testing', 'testing', '1', NULL, NULL, NULL, COUNT(*) AS cnt, COALESCE(LOWER(CONV(BIT_XOR(CAST(CRC32(CONCAT_WS('#', `id`, CONCAT(ISNULL(`id`)))) AS UNSIGNED)), 10, 16)), 0) AS crc FROM `testing`.`testing` /*checksum table*/

Due to the use testing statement, the slave will then skip these statements with no errors, and simply not write into percona.checksums.

As per the documentation:

The tool monitors replicas continually. If any replica falls too far behind in replication, pt-table-checksum pauses to allow it to catch up. If any replica has an error, or replication stops, pt-table-checksum pauses and waits.

In this case, you will see the tool continually wait, with the following debug output:

# pt_table_checksum:12398 10967 SELECT MAX(chunk) FROM `percona`.`checksums` WHERE db='testing' AND tbl='testing' AND master_crc IS NOT NULL
# pt_table_checksum:12416 10967 Getting last checksum on slave1
# pt_table_checksum:12419 10967 slave1 max chunk: undef
# pt_table_checksum:12472 10967 Sleep 0.25 waiting for chunks
# pt_table_checksum:12416 10967 Getting last checksum on slave1
# pt_table_checksum:12419 10967 slave1 max chunk: undef
# pt_table_checksum:12472 10967 Sleep 0.5 waiting for chunks
# pt_table_checksum:12416 10967 Getting last checksum on slave1
# pt_table_checksum:12419 10967 slave1 max chunk: undef

We don’t recommend using the tool with replication filters in place, but if --no-check-replication-filters is specified you should be aware of the differences in how different binlog formats handle these filters.

One workaround would be to replace

binlog-ignore-db=testing

with the following, which will just ignore writes to that database:

binlog-wild-ignore-table=testing.%

More resources

You can read more about pt-table-checksum in this blog post MySQL Replication Primer with pt-table-checksum and pt-table-sync

The latest version of Percona Toolkit can be downloaded from our website. All Percona software is open source and free.

Photo by Black ice from Pexels

 

by James Lawrie at November 22, 2018 03:20 PM

November 21, 2018

Peter Zaitsev

Percona Server for MySQL 5.7.23-25 Is Now Available

Percona announces the release of Percona Server for MySQL 5.7.23-25 on November 21, 2018 (downloads are available here and from the Percona Software Repositories). This release merges changes of MySQL 5.7.23, including all the bug fixes in it. Percona Server for MySQL 5.7.23-25 is now the current GA release in the 5.7 series. All of Percona’s software is open-source and free.

This release fixes a critical bug in a RocksDB submodule.

Bugs Fixed

  • #5049: Severe memory leak regression in the RocksDB Block Cache

Find the release notes for Percona Server for MySQL 5.7.23-25 in our online documentation. Report bugs in the Jira bug tracker.

by Borys Belinsky at November 21, 2018 09:41 PM

Percona Server for MySQL 5.5.62-38.14 Is Now Available

Percona announces the release of Percona Server for MySQL 5.5.62-38.14 on November 21, 2018 (downloads are available here and from the Percona Software Repositories). This release merges changes of MySQL 5.5.62, including all the bug fixes in it. Percona Server for MySQL 5.5.62-38.14 is now the current GA release in the 5.5 series. All of Percona’s software is open-source and free.

Note that Percona Server for MySQL 5.5.62-38.14 is the last release of the 5.5 series. This series goes EOL on December 1st, 2018.

Improvements

  • #4790: The accuracy of user statistics has been improved

Bugs Fixed

  • The binary log could be corrupted when the disk partition used for temporary files (tmpdir system variable) had little free space. Bug fixed #1107
  • PURGE CHANGED_PAGE_BITMAPS did not work when the innodb_data_home_dir system variable was used. Bug fixed #4723

Other Bugs Fixed

  • #4773: Percona Server sources can’t be compiled without server.
  • #4781: sql_yacc.yy uses SQLCOM_SELECT instead of SQLCOM_SHOW_XXXX_STATS

Find the release notes for Percona Server for MySQL 5.5.62-38.14 in our online documentation. Report bugs in the Jira bug tracker.

by Borys Belinsky at November 21, 2018 05:57 PM

Identifying Unused Indexes in MongoDB

As with MySQL, having too many indexes on a MongoDB collection not only affects overall write performance, but disk and memory resources as well. While MongoDB scales predictably well for both reads and writes, maintaining a healthy schema design should always remain a core characteristic of a good application stack.

Aside from knowing when to add an index to improve query performance, and how to modify indexes to satisfy changing query complexities, we also need to know how to identify unused indexes and cut their unnecessary overhead.

First of all, you can already identify access operation counters from each collection using the $indexStats aggregation stage (the indexStats command before 3.0). This command provides two important pieces of information: the ops counter value, and since, which is when the ops counter started counting. It is reset when the mongod instance is restarted.

m34:PRIMARY> db.downloads.aggregate( [ { $indexStats: { } } ] ).pretty()
{
	"name" : "_id_",
	"key" : {
		"_id" : 1
	},
	"host" : "mongodb:27018",
	"accesses" : {
		"ops" : NumberLong(0),
		"since" : ISODate("2018-11-10T15:53:31.429Z")
	}
}
{
	"name" : "h_id_1",
	"key" : {
		"h_id" : 1
	},
	"host" : "mongodb:27018",
	"accesses" : {
		"ops" : NumberLong(0),
		"since" : ISODate("2018-11-10T15:54:57.634Z")
	}
}

From this information, if the ops counter is zero for any index, then we can assume it has not been used either since the index was added or since the server was restarted, with a few exceptions. An index might be unique and not used at all (a uniqueness check on INSERT does not increment the ops counter). The documentation also indicates that index stats counter does not get updated by TTL indexes expiration or chunk split and migration operations.
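For example, a small sketch that surfaces candidates by filtering the stats directly in the pipeline (run per collection):

db.downloads.aggregate([
  { $indexStats: {} },
  { $match: { "accesses.ops": 0 } },
  { $project: { name: 1, "accesses.since": 1 } }
])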

Be aware of occasional index use

One golden rule, however, is that this type of observation is subjective – before you decide to drop the index, make sure that the counter has been collecting for a considerable amount of time. Dropping an index that is only used once a month for some heavy reporting can be problematic.
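Once the counters have collected long enough and you are confident an index is truly unused, dropping it is a single command; for the h_id_1 index shown earlier:

db.downloads.dropIndex("h_id_1")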

The same information from $indexStats can also be made available to PMM. By default, the mongo_exporter does not include this information, but it can be enabled as an additional collection parameter.

sudo pmm-admin add mongodb:metrics --uri 127.0.0.1:27018 -- -collect.indexusage

Once enabled, we can create a custom graph for this information from any PMM dashboard, as shown below. As mentioned above, any index that has zero values will not have been used for the current time range in the graph. One minor issue with the collector is that each metric does not come with the database and collection information. Consequently, we cannot filter to the collection level yet; we have an improvement request open for that.

MongoDB index usage dashboard report from percona monitoring and management

An alternative view of this information in Grafana PMM is available from the Time Series to Aggregation table panel, shown below. One advantage of having these metrics in PMM is that the data survives an instance restart. Of course, to be useful for identifying unused indexes, the retention period has to match or exceed your complete application “cycle” period.

MongoDB index usage stats from PMM

In a MongoDB replica set, you can delegate data-bearing member nodes to different roles, perhaps with tags and priorities, and you can also have nodes with different sets of indexes. Being able to identify the sets of indexes needed at the node level allows you to optimize replication, queries, and resource usage.

More Resources

We have an introductory series of posts on MongoDB indexes available on this blog. Read Part 1 here.

You can download Percona Server for MongoDB – all Percona software is open source and free.

by Jervin Real at November 21, 2018 03:09 PM

Jean-Jerome Schmidt

How to Encrypt Your MySQL & MariaDB Backups

We usually take care of things we value, whether it is an expensive smartphone or the company’s servers. Data is one of the most important assets of the organisation, and although we do not see it, it has to be carefully protected. We implement data at rest encryption to encrypt database files or whole volumes which contain our data. We implement data in transit encryption using SSL to make sure no one can sniff and collect data sent across networks. Backups are no different. No matter if this is a full backup or incremental, it will store at least a part of your data. As such, backups have to be encrypted too. In this blog post, we will look into some options you may have when it comes to encrypting backups. First though, let’s look at how you can encrypt your backups and what could be use cases for those methods.

How to encrypt your backup?

Encrypt local file

First of all, you can easily encrypt existing files. Let’s imagine that you have a backup process storing all your backups on a backup server. At some point you decided it’s high time to implement offsite backup storage for disaster recovery. You can use S3 or similar infrastructure from other cloud providers for that. Of course, you don’t want to upload unencrypted backups anywhere outside of your trusted network, therefore it is critical that you implement backup encryption one way or the other. A very simple method, not requiring any changes in your existing backup scripts, would be to create a script which will take a backup file, encrypt it and upload it to S3. One of the methods you can use to encrypt the data is openssl:

openssl enc -aes-256-cbc -salt -in backup_file.tar.gz -out backup_file.tar.gz.enc -k yoursecretpassword

This will create a new, encrypted file, ‘backup_file.tar.gz.enc’ in the current directory. You can always decrypt it later by running:

openssl aes-256-cbc -d -in backup_file.tar.gz.enc -out backup_file.tar.gz -k yoursecretpassword

This method is very simple, but it has some drawbacks. The biggest one is the disk space requirement. When encrypting like we described above, you have to keep both the unencrypted and the encrypted file, and as a result you require disk space of twice the size of the backup file. Of course, depending on your requirements, this might be something you want - keeping non-encrypted files locally will improve recovery speed - after all, decrypting the data will also take some time.
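Putting it together, a minimal sketch of such a script might look like this (the bucket name and password are placeholders, and a configured AWS CLI is assumed):

openssl enc -aes-256-cbc -salt -in backup_file.tar.gz -out backup_file.tar.gz.enc -k yoursecretpassword
aws s3 cp backup_file.tar.gz.enc s3://my-backup-bucket/
rm backup_file.tar.gz.enc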

Encrypt backup on the fly

To avoid the need of storing both encrypted and unencrypted data, you may want to implement the encryption at the earlier stage of the backup process. We can do that through pipes. Pipes are, in short, a way of sending the data from one command to another. This makes it possible to create a chain of commands that processes data. You can generate the data, then compress it and encrypt. An example of such chain might be:

mysqldump --all-databases --single-transaction --triggers --routines | gzip | openssl  enc -aes-256-cbc -k mypass > backup.xb.enc

You can also use this method with xtrabackup or mariabackup. In fact, this is the example from MariaDB documentation:

mariabackup --user=root --backup --stream=xbstream  | openssl  enc -aes-256-cbc -k mypass > backup.xb.enc

If you want, you can even try to upload the data as part of the process:

mysqldump --all-databases --single-transaction --triggers --routines | gzip | openssl  enc -aes-256-cbc -k mysecretpassword | tee -a mysqldump.gz.enc | nc 10.0.0.102 9991

As a result, you will see a local file ‘mysqldump.gz.enc’ and a copy of the data will be piped to a program which will do something with it. We used ‘nc’, which streamed the data to another host on which the following was executed:

nc -l 9991 > backup.gz.enc

In this example we used mysqldump and nc but you can use xtrabackup or mariabackup and some tool which will upload the stream to Amazon S3, Google Cloud Storage or some other cloud provider. Any program or script which accepts data on stdin can be used instead of ‘nc’.
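To restore from such a stream-encrypted dump, you simply reverse the chain, for example:

openssl enc -aes-256-cbc -d -in backup.gz.enc -k mysecretpassword | gunzip > backup.sql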

Use embedded encryption

In some of the cases, a backup tool has embedded support for encryption. An example here is xtrabackup, which can internally encrypt the file. Unfortunately, mariabackup, even though it is a fork of xtrabackup, does not support this feature.

Before we can use it, we have to create a key which will be used to encrypt the data. It can be done by running the following command:

root@vagrant:~# openssl rand -base64 24
HnliYiaRo7NUvc1dbtBMvt4rt1Fhunjb

Xtrabackup can accept the key in plain text format (using the --encrypt-key option) or it can read it from a file (using the --encrypt-key-file option). The latter is safer, as passing passwords and keys as plain text to commands results in storing them in the bash history. You can also see them clearly in the list of running processes, which is quite bad:

root      1130  0.0  0.6  65508  4988 ?        Ss   00:46   0:00 /usr/sbin/sshd -D
root     13826  0.0  0.8  93100  6648 ?        Ss   01:26   0:00  \_ sshd: root@notty
root     25363  0.0  0.8  92796  6700 ?        Ss   08:54   0:00  \_ sshd: vagrant [priv]
vagrant  25393  0.0  0.6  93072  4936 ?        S    08:54   0:01  |   \_ sshd: vagrant@pts/1
vagrant  25394  0.0  0.4  21196  3488 pts/1    Ss   08:54   0:00  |       \_ -bash
root     25402  0.0  0.4  52700  3568 pts/1    S    08:54   0:00  |           \_ sudo su -
root     25403  0.0  0.4  52284  3264 pts/1    S    08:54   0:00  |               \_ su -
root     25404  0.0  0.4  21196  3536 pts/1    S    08:54   0:00  |                   \_ -su
root     26686  6.0  4.0 570008 30980 pts/1    Sl+  09:48   0:00  |                       \_ innobackupex --encrypt=AES256 --encrypt-key=TzIZ7g+WzLt0PXWf8WDPf/sjIt7UzCKw /backup/

Ideally, you will use the key stored in a file but then there’s a small gotcha you have to be aware of.

root@vagrant:~# openssl rand -base64 24 > encrypt.key
root@vagrant:~# innobackupex --encrypt=AES256 --encrypt-key-file=/root/encrypt.key /backup/
.
.
.
xtrabackup: using O_DIRECT
InnoDB: Number of pools: 1
encryption: unable to set libgcrypt cipher key - User defined source 1 : Invalid key length
encrypt: failed to create worker threads.
Error: failed to initialize datasink.

You may wonder what the problem is. It’ll become clear when we open the encrypt.key file in a hexadecimal editor like hexedit:

00000000   6D 6B 2B 4B  66 69 55 4E  5A 49 48 77  39 42 36 72  68 70 39 79  6A 56 44 72  47 61 79 45  6F 75 6D 70  0A                                     mk+KfiUNZIHw9B6rhp9yjVDrGayEoump.

You can now notice the last character, encoded as ‘0A’. This is the line feed character, and it is taken into consideration while evaluating the encryption key. Once we remove it (or avoid writing it in the first place, as shown below), we can finally run the backup.
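One way to avoid the trailing newline in the first place is to let command substitution strip it, since printf '%s' does not add one back:

root@vagrant:~# printf '%s' "$(openssl rand -base64 24)" > encrypt.key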

root@vagrant:~# innobackupex --encrypt=AES256 --encrypt-key-file=/root/encrypt.key /backup/
xtrabackup: recognized server arguments: --datadir=/var/lib/mysql --innodb_buffer_pool_size=185M --innodb_flush_log_at_trx_commit=2 --innodb_file_per_table=1 --innodb_data_file_path=ibdata1:100M:autoextend --innodb_read_io_threads=4 --innodb_write_io_threads=4 --innodb_doublewrite=1 --innodb_log_file_size=64M --innodb_log_buffer_size=16M --innodb_log_files_in_group=2 --innodb_flush_method=O_DIRECT --server-id=1
xtrabackup: recognized client arguments: --datadir=/var/lib/mysql --innodb_buffer_pool_size=185M --innodb_flush_log_at_trx_commit=2 --innodb_file_per_table=1 --innodb_data_file_path=ibdata1:100M:autoextend --innodb_read_io_threads=4 --innodb_write_io_threads=4 --innodb_doublewrite=1 --innodb_log_file_size=64M --innodb_log_buffer_size=16M --innodb_log_files_in_group=2 --innodb_flush_method=O_DIRECT --server-id=1 --databases-exclude=lost+found --ssl-mode=DISABLED
encryption: using gcrypt 1.6.5
181116 10:11:25 innobackupex: Starting the backup operation

IMPORTANT: Please check that the backup run completes successfully.
           At the end of a successful backup run innobackupex
           prints "completed OK!".

181116 10:11:25  version_check Connecting to MySQL server with DSN 'dbi:mysql:;mysql_read_default_group=xtrabackup;mysql_socket=/var/lib/mysql/mysql.sock' as 'backupuser'  (using password: YES).
181116 10:11:25  version_check Connected to MySQL server
181116 10:11:25  version_check Executing a version check against the server...
181116 10:11:25  version_check Done.
181116 10:11:25 Connecting to MySQL server host: localhost, user: backupuser, password: set, port: not set, socket: /var/lib/mysql/mysql.sock
Using server version 5.7.23-23-57
innobackupex version 2.4.12 based on MySQL server 5.7.19 Linux (x86_64) (revision id: 170eb8c)
xtrabackup: uses posix_fadvise().
xtrabackup: cd to /var/lib/mysql
xtrabackup: open files limit requested 0, set to 1024
xtrabackup: using the following InnoDB configuration:
xtrabackup:   innodb_data_home_dir = .
xtrabackup:   innodb_data_file_path = ibdata1:100M:autoextend
xtrabackup:   innodb_log_group_home_dir = ./
xtrabackup:   innodb_log_files_in_group = 2
xtrabackup:   innodb_log_file_size = 67108864
xtrabackup: using O_DIRECT
InnoDB: Number of pools: 1
181116 10:11:25 >> log scanned up to (2597648)
xtrabackup: Generating a list of tablespaces
InnoDB: Allocated tablespace ID 19 for mysql/server_cost, old maximum was 0
181116 10:11:25 [01] Encrypting ./ibdata1 to /backup/2018-11-16_10-11-25/ibdata1.xbcrypt
181116 10:11:26 >> log scanned up to (2597648)
181116 10:11:27 >> log scanned up to (2597648)
181116 10:11:28 [01]        ...done
181116 10:11:28 [01] Encrypting ./mysql/server_cost.ibd to /backup/2018-11-16_10-11-25/mysql/server_cost.ibd.xbcrypt
181116 10:11:28 [01]        ...done
181116 10:11:28 [01] Encrypting ./mysql/help_category.ibd to /backup/2018-11-16_10-11-25/mysql/help_category.ibd.xbcrypt
181116 10:11:28 [01]        ...done
181116 10:11:28 [01] Encrypting ./mysql/slave_master_info.ibd to /backup/2018-11-16_10-11-25/mysql/slave_master_info.ibd.xbcrypt
181116 10:11:28 [01]        ...done

As a result we will end up with a backup directory full of encrypted files:

root@vagrant:~# ls -alh /backup/2018-11-16_10-11-25/
total 101M
drwxr-x---  5 root root 4.0K Nov 16 10:11 .
drwxr-xr-x 16 root root 4.0K Nov 16 10:11 ..
-rw-r-----  1 root root  580 Nov 16 10:11 backup-my.cnf.xbcrypt
-rw-r-----  1 root root  515 Nov 16 10:11 ib_buffer_pool.xbcrypt
-rw-r-----  1 root root 101M Nov 16 10:11 ibdata1.xbcrypt
drwxr-x---  2 root root 4.0K Nov 16 10:11 mysql
drwxr-x---  2 root root  12K Nov 16 10:11 performance_schema
drwxr-x---  2 root root  12K Nov 16 10:11 sys
-rw-r-----  1 root root  113 Nov 16 10:11 xtrabackup_checkpoints
-rw-r-----  1 root root  525 Nov 16 10:11 xtrabackup_info.xbcrypt
-rw-r-----  1 root root 2.7K Nov 16 10:11 xtrabackup_logfile.xbcrypt

Xtrabackup has some other variables which can be used to tune encryption performance:

  • --encrypt-threads allows for parallelization of the encryption process
  • --encrypt-chunk-size defines the working buffer for the encryption process (both are combined in the example below)
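For example, both can be combined in a single run (the values here are just illustrative):

root@vagrant:~# innobackupex --encrypt=AES256 --encrypt-key-file=/root/encrypt.key --encrypt-threads=4 --encrypt-chunk-size=256K /backup/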

Should you need to decrypt the files, you can use innobackupex with the --decrypt option for that:

root@vagrant:~# innobackupex --decrypt=AES256 --encrypt-key-file=/root/encrypt.key /backup/2018-11-16_10-11-25/

As xtrabackup does not clean up encrypted files, you may want to remove them using the following one-liner:

for i in `find /backup/2018-11-16_10-11-25/ -iname "*\.xbcrypt"`; do rm $i ; done

Backup encryption in ClusterControl

With ClusterControl, encrypted backups are just one click away. All backup methods (mysqldump, xtrabackup or mariabackup) support encryption. You can either create a backup ad hoc or prepare a schedule for your backups.

In our example, we picked a full xtrabackup backup, and we will store it on the controller instance.

On the next page we enabled the encryption. As stated, ClusterControl will automatically create an encryption key for us. That is it: when you click the “Create Backup” button, a process will be started.

The new backup is visible in the backup list. It is marked as encrypted (the lock icon).

We hope that this blog post gives you some insights into how to make sure your backups are properly encrypted.

by krzysztof at November 21, 2018 10:58 AM

November 20, 2018

Peter Zaitsev

How CVE-2018-19039 Affects Percona Monitoring and Management

Grafana Labs has released an important security update, and as you’re aware PMM uses Grafana internally. You’re probably curious whether this issue affects you. CVE-2018-19039 “File Exfiltration vulnerability Security fix” covers a recently discovered security flaw that allows any Grafana user with Editor or Admin permissions to have read access to the filesystem, performed with the same privileges as the Grafana process has.

We have good news: if you’re running PMM 1.10.0 or later (released April 2018), you’re not affected by this security issue.

The reason you’re not affected is an interesting one. CVE-2018-19039 relates to Grafana component PhantomJS, which Percona omitted when we changed how we build the version of Grafana embedded in Percona Monitoring and Management. We became aware of this via bug PMM-2837 when we discovered images do not render.

We fixed this image rendering issue, and applied the required security update, in 1.17. This ensures PMM is not vulnerable to CVE-2018-19039.

Users of PMM who are running release 1.1.0 (February 2017) through 1.9.1 (April 2018) are advised to upgrade ASAP.  If you cannot immediately upgrade, we advise that you take two steps:

  1. Convert all Grafana users to Viewer role
  2. Remove all Dashboards that contain text panels

How to Get PMM Server

PMM is available for installation using three methods:

by Dmitriy Kostiuk at November 20, 2018 07:48 PM

Percona Monitoring and Management (PMM) 1.17.0 Is Now Available

Percona Monitoring and Management 1.17.0 (PMM) is a free and open-source platform for managing and monitoring MySQL, MongoDB, and PostgreSQL performance. You can run Percona Monitoring and Management 1.17.0 in your own environment for maximum security and reliability. It provides thorough time-based analysis for MySQL®, MongoDB®, and PostgreSQL® servers to ensure that your data works as efficiently as possible.

Although we patched a bug related to a Grafana CVE announcement in this release, previous releases since 1.10 (April 2018) were not vulnerable, and we encourage you to see our Grafana CVE blog post for further details.

In this release, we made six improvements and fixed 11 bugs.

Dashboard Improvements

We updated five Dashboards with improved Tooltips – if you haven’t seen this before, hover your mouse over the icon in the top left of most graph elements, and you’ll see a new box appear. This box provides a brief description of what the graph displays, along with links to related documentation resources so you can learn more. We hope you find the content useful!

The Dashboards we updated are:

  1. MySQL Amazon Aurora Metrics
  2. MySQL MyISAM/Aria Metrics
  3. MySQL Replication
  4. Prometheus Exporters Overview
  5. Trends

We hope you enjoy this release, and we welcome your feedback via the Percona PMM Forums!

New Features and Improvements

Fixed Bugs

  • PMM-3257: Grafana Security patch for CVE-2018-19039
  • PMM-3252: Update button in 1.16 is not visible when a newer version exists
  • PMM-3209: Special symbols in username or password prevent the addition of Remote Instances
  • PMM-2837: Image Rendering Does not work due to absent Phantom.JS binary
  • PMM-2428: Remove Host=All on dashboards where this variable does not apply
  • PMM-2294: No changes in zoomed out Cluster Size Graph if the node was absent for a short time
  • PMM-2289: SST Time Graph based on the wrong formula
  • PMM-2192: Memory leak in ProxySQL_Exporter when ProxySQL is down
  • PMM-2158: MongoDB “Query Efficiency – Document” arithmetic appears to be incorrectly calculated
  • PMM-1837: System Info shows duplicate hosts
  • PMM-1805: Available Downtime before SST Required doesn’t seem to be accurate – thanks to Yves Trudeau for help

How to get PMM Server

PMM is available for installation using three methods:

Help us improve our software quality by reporting any Percona Monitoring and Management bugs you encounter using our bug tracking system.

by Dmitriy Kostiuk at November 20, 2018 07:47 PM

Serge Frezefond

Using Terraform to provision a managed MariaDB server in AWS

How to rapidly provision a MariaDB in the cloud? Various options are available. A very effective approach is to provision MariaDB with Terraform. Terraform is a powerful tool to deploy infrastructure as code. Terraform is developed by Hashicorp, which started their business with the very successful Vagrant deployment tool. Terraform allows you to describe ...continue reading "Using Terraform to provision a managed MariaDB server in AWS"

by Serge at November 20, 2018 06:00 PM

MariaDB Foundation

MariaDB 10.3.11, and MariaDB Connector/C 3.0.7, Connector/ODBC 3.0.7 and Connector/Node.js 2.0.1 now available

The MariaDB Foundation is pleased to announce the availability of MariaDB 10.3.11, the latest stable release in the MariaDB 10.3 series, as well as MariaDB Connector/C 3.0.7 and MariaDB Connector/ODBC 3.0.7, both stable releases, and MariaDB Connector/Node.js 2.0.1, the first beta release of the new 100% JavaScript non-blocking MariaDB client for Node.js, compatible with Node.js […]

The post MariaDB 10.3.11, and MariaDB Connector/C 3.0.7, Connector/ODBC 3.0.7 and Connector/Node.js 2.0.1 now available appeared first on MariaDB.org.

by Ian Gilfillan at November 20, 2018 02:43 PM

Peter Zaitsev

Installing and Configuring JIT in PostgreSQL 11

Just-in-time (JIT) compilation of SQL statements is one of the highlighted features in PostgreSQL 11. There is great excitement in the community because of the many claims of up to a 30% jump in performance. Not all queries and workloads get the benefit of JIT compilation, so you may want to test your workload against this new feature.

However, it is important to have a general understanding of what it does and where we can expect the performance gains. Installing PostgreSQL 11 with the new JIT compilation feature requires a few extra steps and packages. Taking the time and effort to figure out how to do this shouldn’t be a reason to shy away from trying these cutting-edge features and testing a workload against the JIT feature. This blog post is for those who want to try it.

What is JIT and What it does in PostgreSQL

Normal SQL execution in any DBMS software is similar to what an interpreted language does with source code: no machine code gets generated out of your SQL statement. But we all know how dramatic the performance gains from JIT compilation and execution of the generated machine code can be. We saw the magic the Google V8 engine did for the JavaScript language. The quest for doing a similar thing with SQL statements has been around for quite some time. But it is a challenging task.

It is challenging because we don’t have the source code (SQL statement) ready within the PostgreSQL server. The source code that needs to undergo JIT needs to come from client connections, and there could be expressions/functions with different numbers of arguments, dealing with tables with different numbers and types of columns.

Generally, a computer program won’t get modified at this level while it is running, so branch predictions are possible. The unpredictable and dynamic nature of SQL statements coming from client connections and hitting the database from time to time gives no scope for doing prediction or compilation in advance. That means the JIT compiler should kick in every time the database gets an SQL statement. For this reason, PostgreSQL needs the help of compiler infrastructure like LLVM continuously available behind the scenes. Even though there were a couple of other options, the main developer of this feature (Andres Freund) had a strong reason why LLVM was the right choice.

In PostgreSQL 11, the JIT feature currently does the following:

  1. Accelerating expression evaluation: expressions in WHERE clauses, target lists, aggregates and projections
  2. Tuple deforming: converting the on-disk image to the corresponding memory representation
  3. In-lining: bodies of small custom functions, operators and user-defined data types are inlined into the expressions using them
  4. Using compiler optimizations provided by LLVM to prepare optimized machine code

In this blog, we are going to see how to install PostgreSQL with JIT. Just like regular PostgreSQL installations, we have two options:

  1. Get PostgreSQL from the packages in the PGDG repository
  2. Build PostgreSQL from source

Option 1. Install from PGDG repository.

Compiling from source requires us to install all compilers and tools. We might want to avoid this for various reasons. Installing packages from a PGDG repository is straightforward. On production systems or in a container, you might want to install only the bare minimum of required packages; additional packages you don’t really use are always a security concern. Distributions like Ubuntu provide more recent versions of libraries and tool-sets in their default repos. However, distributions like CentOS / RHEL are quite conservative: their priority is stability and proven servers rather than cutting-edge features. So this section of the post is mostly relevant for CentOS 7/RHEL 7.

Here are the steps for a bare minimum installation of PostgreSQL with the JIT feature on CentOS 7.

Step 1. Install PGDG repo and Install PostgreSQL server package.

This is usually the bare minimum installation if we don’t need the JIT feature.

sudo yum install https://download.postgresql.org/pub/repos/yum/11/redhat/rhel-7-x86_64/pgdg-centos11-11-2.noarch.rpm
sudo yum install postgresql11-server

At this stage, we can initialize the data directory and start the service if we don’t need JIT:

sudo /usr/pgsql-11/bin/postgresql*-setup initdb
sudo systemctl start postgresql-11

Step 2. Install EPEL repository

sudo yum install epel-release

Step 3. Install package for PostgreSQL with llvmjit

sudo yum install postgresql11-llvmjit

Since we have already added the EPEL repository, the dependency can now be resolved by YUM, which pulls and installs the necessary packages from EPEL. The installation message lists the necessary packages:

...
Installing:
postgresql11-llvmjit      x86_64     11.1-1PGDG.rhel7     pgdg11    9.0 M
Installing for dependencies:
llvm5.0                   x86_64     5.0.1-7.el7          epel      2.6 M
llvm5.0-libs              x86_64     5.0.1-7.el7          epel      13 M
...

As we can see, two packages, llvm5.0 and llvm5.0-libs, get installed.

Note for Ubuntu users:

As we already mentioned, repositories of recent versions of Ubuntu contain recent versions of LLVM libraries. For example, the Ubuntu 16.04 LTS repo contains libllvm6.0 by default. Moreover, the PostgreSQL server package is not divided into a separate package for JIT-related files, so a default installation of PostgreSQL 11 gets you the JIT feature as well.

wget --quiet -O - https://www.postgresql.org/media/keys/ACCC4CF8.asc | sudo apt-key add -
sudo sh -c 'echo "deb http://apt.postgresql.org/pub/repos/apt/ $(lsb_release -cs)-pgdg main" > /etc/apt/sources.list.d/pgdg.list'
sudo apt install postgresql-11

Option 2. Building from Source

The primary means of distributing PostgreSQL is the source code. Building a minimal PostgreSQL instance requires just a C compiler. But building with the JIT option requires a few more things. One of the challenges you can run into is different errors during the build process due to older versions of LLVM and Clang present on the system.

Step 1. Download PostgreSQL source tarball and unpack

Tarballs are available in the repository. We can grab and unpack the latest:

curl -LO https://ftp.postgresql.org/pub/source/v11.0/postgresql-11.0.tar.bz2
tar -xvf postgresql-11.0.tar.bz2

Step 2.  Get SCL Repository and Install toolset

The latest versions of LLVM, Clang and GCC are available in SCL. We can get everything in one go:

sudo yum install centos-release-scl
sudo yum install llvm-toolset-7 llvm-toolset-7-llvm-devel.x86_64

Now you can either set or edit your PATH to include all the new tools. I prefer to put that into my profile file:

PATH=/opt/rh/devtoolset-7/root/usr/bin/:/opt/rh/llvm-toolset-7/root/usr/bin/:$PATH

Alternatively, we can open a shell with SCL enabled:

scl enable devtoolset-7 llvm-toolset-7 bash

We should attempt to compile the source from a shell with all these paths set.

Step 3. Install Additional libraries/tools

Based on the configuration options you want, this list may change. Consider this as a sample for demonstration purposes:

sudo yum install  readline-devel zlib-devel libxml2-devel openssl-devel

Step 4. Configure with the --with-llvm option and make

Now we should be able to configure and make with our preferred options. The JIT feature will be available if the --with-llvm option is specified. For this demonstration, I am using an installation directory under my home (/home/postgres/pg11):

./configure --prefix=/home/postgres/pg11 --with-openssl --with-libxml --with-zlib --with-llvm
make
make install

Enabling JIT

You may observe that there is a new directory under PostgreSQL’s lib folder named bitcode, which contains lots of files with the .bc extension. These are pre-generated bitcode files for LLVM, facilitating features like in-lining.

By default, the JIT feature is disabled in PostgreSQL 11. If you want to test it, you may have to enable the jit parameter:

postgres=# ALTER SYSTEM SET jit=on;
ALTER SYSTEM
postgres=# select pg_reload_conf();
 pg_reload_conf
----------------
 t
(1 row)
postgres=# show jit;
 jit
-----
 on
(1 row)

By default, most simple queries won’t use JIT because of the cost: JIT kicks in only when the estimated query cost is high. In case we want to test whether JIT is properly configured, we can lie to PostgreSQL that the cost is very low by adjusting the jit_above_cost parameter value. However, we should keep in mind that we are then accepting negative performance gains. Let me show a quick example:

postgres=# SET jit_above_cost=5;
SET
postgres=# create table t1 (id int);
CREATE TABLE
postgres=# insert into t1 (SELECT (random()*100)::int FROM generate_series(1,800000) as g);
INSERT 0 800000
postgres=# analyze t1;
ANALYZE
postgres=# explain select sum(id) from t1;
                                     QUERY PLAN
-------------------------------------------------------------------------------------
 Finalize Aggregate  (cost=8706.88..8706.89 rows=1 width=8)
   ->  Gather  (cost=8706.67..8706.88 rows=2 width=8)
         Workers Planned: 2
         ->  Partial Aggregate  (cost=7706.67..7706.68 rows=1 width=8)
               ->  Parallel Seq Scan on t1  (cost=0.00..6873.33 rows=333333 width=4)
 JIT:
   Functions: 6
   Options: Inlining false, Optimization false, Expressions true, Deforming true
(8 rows)

As we can see in the above example, a separate JIT section comes up in the explain plan.

We expect JIT compilation to make a difference in complex analytical queries, because the overhead of JIT compilation gets compensated only if the code runs long enough. Here is a simple aggregate query for demonstration. (I know this is not a complex query, and not the perfect example for demonstrating the JIT feature):

postgres=# EXPLAIN ANALYZE SELECT COMPANY_ID,
      SUM(SHARES) TOT_SHARES,
      SUM(SHARES* RATE) TOT_INVEST,
      MIN(SHARES* RATE) MIN_TRADE,
      MAX(SHARES* RATE) MAX_TRADE,
      SUM(SHARES* RATE * 0.002) BROKERAGE
FROM TRADING
GROUP BY COMPANY_ID;
                                                                       QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------
 Finalize GroupAggregate  (cost=757298.72..758741.91 rows=5005 width=138) (actual time=16992.290..17011.395 rows=5000 loops=1)
   Group Key: company_id
   ->  Gather Merge  (cost=757298.72..758466.64 rows=10010 width=138) (actual time=16992.270..16996.919 rows=15000 loops=1)
         Workers Planned: 2
         Workers Launched: 2
         ->  Sort  (cost=756298.70..756311.21 rows=5005 width=138) (actual time=16983.900..16984.356 rows=5000 loops=3)
               Sort Key: company_id
               Sort Method: quicksort  Memory: 1521kB
               Worker 0:  Sort Method: quicksort  Memory: 1521kB
               Worker 1:  Sort Method: quicksort  Memory: 1521kB
               ->  Partial HashAggregate  (cost=755916.09..755991.16 rows=5005 width=138) (actual time=16975.997..16981.354 rows=5000 loops=3)
                     Group Key: company_id
                     ->  Parallel Seq Scan on trading  (cost=0.00..287163.65 rows=12500065 width=12) (actual time=0.032..1075.833 rows=10000000 loops=3)
 Planning Time: 0.073 ms
 Execution Time: 17013.116 ms
(15 rows)

We can switch on the JIT parameter at the session level and retry the same query:

postgres=# SET JIT=ON;
SET
postgres=# EXPLAIN ANALYZE SELECT COMPANY_ID,
      SUM(SHARES) TOT_SHARES,
      SUM(SHARES* RATE) TOT_INVEST,
      MIN(SHARES* RATE) MIN_TRADE,
      MAX(SHARES* RATE) MAX_TRADE,
      SUM(SHARES* RATE * 0.002) BROKERAGE
FROM TRADING
GROUP BY COMPANY_ID;
                                                                       QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------
 Finalize GroupAggregate  (cost=757298.72..758741.91 rows=5005 width=138) (actual time=15672.809..15690.901 rows=5000 loops=1)
   Group Key: company_id
   ->  Gather Merge  (cost=757298.72..758466.64 rows=10010 width=138) (actual time=15672.781..15678.736 rows=15000 loops=1)
         Workers Planned: 2
         Workers Launched: 2
         ->  Sort  (cost=756298.70..756311.21 rows=5005 width=138) (actual time=15661.144..15661.638 rows=5000 loops=3)
               Sort Key: company_id
               Sort Method: quicksort  Memory: 1521kB
               Worker 0:  Sort Method: quicksort  Memory: 1521kB
               Worker 1:  Sort Method: quicksort  Memory: 1521kB
               ->  Partial HashAggregate  (cost=755916.09..755991.16 rows=5005 width=138) (actual time=15653.390..15658.581 rows=5000 loops=3)
                     Group Key: company_id
                     ->  Parallel Seq Scan on trading  (cost=0.00..287163.65 rows=12500065 width=12) (actual time=0.039..1084.820 rows=10000000 loops=3)
 Planning Time: 0.072 ms
 JIT:
   Functions: 28
   Options: Inlining true, Optimization true, Expressions true, Deforming true
   Timing: Generation 5.844 ms, Inlining 137.226 ms, Optimization 201.152 ms, Emission 125.022 ms, Total 469.244 ms
 Execution Time: 15696.092 ms
(19 rows)

Here we see a 7.7% improvement in performance. I executed this several times and found that the performance gain is consistently 7-8% for this simple query (which takes 15 seconds to execute). The gains are higher for queries with more calculations/expressions.

Summary

It is fairly simple to install and configure JIT with PostgreSQL, as demonstrated above. One point we would like to highlight is that installing JIT packages and enabling the JIT feature can be done online, while the database is up and running, because all JIT-related parameters are dynamic in nature. Parameter changes can be loaded with a SIGHUP signal or by running SELECT pg_reload_conf() as a superuser. If it is not helping our workload, we can turn it off anytime. Nothing stops you from trying it in a non-production environment. We might not see a gain in small and simple queries that take less time to execute, because the overhead of the JIT compilation can exceed that of simply executing the SQL statement. But we should expect a good gain in OLAP workloads with complex queries that run for a longer duration.
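For instance, switching it back off uses the same mechanism, either per session or instance-wide (a small sketch):

postgres=# SET jit = off;              -- current session only
postgres=# ALTER SYSTEM SET jit = off; -- instance-wide, then reload
postgres=# SELECT pg_reload_conf();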

by Jobin Augustine at November 20, 2018 01:49 AM

November 19, 2018

Peter Zaitsev

Migrating to Amazon Aurora: Design for Flexibility

In this Checklist for Success series, we will discuss reducing unknowns when hosting in the cloud, using and migrating to Amazon Aurora. These tips might also apply to other database as a service (DBaaS) offerings.

Previous blogs in the migrating to Amazon Aurora series:

The whole premise of a database as a service offering is that you do not need to worry about operating the service; you just need to use it. But all DBaaS offerings have limitations as well as strengths. You should not get too comfortable with all the niceties of such services. You need to remain flexible and ultimately design to prevent database failure.

Have a Grasp of Your Cluster Behavior

Disable lab mode. You should not depend on this setting to take advantage of new features. It can break and leave you in a bad position. If you rely on this feature, and designed your application around it, you might find yourself working around the same problem if, for example, you are running the same queries on a non-Aurora deployment. This is not to say that you shouldn’t take advantage of all Aurora features. Lab mode, however, is “lab mode” and should not be enabled on a production environment.

Use a separate parameter group per cluster to keep your configuration changes isolated. In some cases, you might have a group of clusters that operate the same workload, but this should be rare, and sharing a group also prohibits you from making rolling changes against each cluster.

Some might point out that syncing the parameter groups in this situation might be difficult. It really isn’t, and you don’t need any complicated tools to do it. For example, you can use pt-config-diff to regularly inspect the differences between the runtime config on each cluster and identify or resolve differences.
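A sketch of such a check, with placeholder endpoints and credentials, could be as simple as:

pt-config-diff h=aurora-cluster-1.example.com,u=admin,p=secret h=aurora-cluster-2.example.com,u=admin,p=secret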

While it is ideal that your clusters are always up to date, it can be intrusive to let them upgrade on their own, especially if your workload is not dependent on high/low traffic periods. I recommend having more control over the upgrade process, and this excellent community blog post from Renato on how to do just that is worth a read.

Don’t Put All Your Eggs in One Basket

On another note, Aurora can hold up to 64TB of data. Yes, that’s big. It might not be a problem, and some of you might even be excited about this potential. But when you think about it, do you really want to store that amount of data in a single basket? What if you need to analyze this data for a particular time period? Is the cost worth it? Surely at some point you will need to transport that data somewhere.

We’ve seen problems even at sizes of less than 2TB. If you need to rebuild an asynchronous replica, for example, it takes a while. You have to be well ahead in capacity planning to ensure that you add new read-replicas when needed. This can be a challenge when you are put on the spot: a burst of traffic might already be over before the replica provisioning is complete, no matter how fast Aurora replica provisioning is.

Another challenge with datasets that are too big is when you have large tables. Schema changes become increasingly difficult in these situations, especially when such tables are subject to highly concurrent reads and writes. Recall that in the blog Migrating to Amazon Aurora: Optimize for Binary Log Replication we recommend setting binlog_format to ROW to be able to use tools like gh-ost in these types of situations.

High Availability On Your Terms

One limitation with Aurora cluster instances is that there is no easy way of taking a misbehaving read-replica out of rotation. Sure, you can delete the read replica. That leads to transient errors in the application, however, and impacts performance due to the time needed to provision a replacement to cover the workload.

Similarly, a misbehaving query can easily spoil the whole cluster, even if that query is spread out evenly to the read-replicas. Depending on how quickly you can disable the query, it might result in losing some business in the process. It would be nice if you could blackhole, rewrite or redirect such queries on demand so as to isolate the impact (or even fix it immediately).

Lastly, certain situations require that you restart the cluster. However, doing so could violate your uptime SLA. These situations can occur when you need to apply a non-dynamic cluster parameter, or you need to perform a cluster upgrade.

You can avoid most of these problems by not solely relying on Aurora’s own implementation of high availability. I say this because they are continuously improving this process. For now, however, you can use tools like ProxySQL to redirect traffic both in-cluster and between clusters replicating asynchronously. Percona has existing blog posts on this topic: Leveraging ProxySQL with AWS Aurora to Improve Performance, Or How ProxySQL Out-performs Native Aurora Cluster Endpoints and How to Implement ProxySQL with AWS Aurora.

Meanwhile, we’d like to hear your success stories in migrating to Amazon Aurora in the comments below!

Don’t forget to come by and see us at AWS re:Invent, November 26-30, 2018 in booth 1605! Percona CEO Peter Zaitsev will deliver a keynote on MySQL High Availability & Disaster Recovery, Tuesday, November 27 at 1:45 PM – 2:45 PM in the Bellagio Hotel, Level 1, Gauguin 2

by Jervin Real at November 19, 2018 04:55 PM

November 18, 2018

Valeriy Kravchuk

Fun with Bugs #72 - On MySQL Bug Reports I am Subscribed to, Part IX

I've subscribed to more than 60 new bug reports since my previous post in this series. It means that I'd need 4-5 posts to cover all new subscriptions and reasons behind them. I still plan to write about most of the bug reports I was interested in recently, but for this post I decided to pick up only MySQL 8.0 regression bugs and pay special attention to those that could be handled better or faster by Oracle engineers, as well as those handled perfectly.

The initial reason for this special attention was Bug #93085 - "Stall when concurrently execute create/alter user with flush privilege", which caused a lot of interesting Twitter discussions. It took some time, comments (in the bug report and in social media) and pressure from the MySQL Community (including yours truly) to get it accepted as a real (regression!) bug to work on and "Verified". Unfortunately, too often recently I see more time spent on arguing that something is not a bug, can not be reproduced, or is an example of improper usage of some MySQL feature, etc., instead of simply checking how things worked before MySQL 8.0 and how this changed, for the worse.

Another example of an "interesting" approach to bugs in MySQL 8.0 is Bug #93102 - "I_S queries don't return column names in the same case as requested.". It's indeed a duplicate of the old and well known Bug #84456 - "column names in metadata appearing as uppercase when selecting from I_S", reported at an early 8.0 development stage by Shane Bester from Oracle and by a community user (see Bug #85947). Still, it was decided NOT to fix it and to tell users to rely on a workaround, while this breaks application compatibility and is a regression.

Take a look at Bug #92998 - "After finishing the backup with mysqldump, mysql crashes and restarts" also. It ended up in "Unsupported" status, with some statements that "Dumping and restoring data between different 8.0 releases is not supported". This can be classified as a regression by itself. What I miss is a link to the manual saying it's not supported (I was not able to find it in 5 minutes) or any explanation of the crash and restart: supported or not, running mysqldump should NOT cause server restarts in the general case. I think this bug report could have ended up in many statuses, but of them all "Unsupported" is hardly correct.

This photo of mine is far from ideal and can be criticized from different points of view, but there is no point in arguing with the fact that it shows clouds in the sky. I wish the fact that MySQL 8.0 GA releases still have regression bugs were accepted with less arguing and more attention.

Now let me continue with a list of recently reported regression bugs in MySQL 8.0 that were handled mostly properly:

  • Bug #93215 - "Case statement use in conditional with sub_part index regression in 8.0". MySQL versions < 8.0 (and MariaDB 10.3.7) also work as expected. The bug was verified fast, but it still misses an explicit "regression" tag.
  • Bug #93214 - "LIMIT is applied before HAVING when you have a subquery". The bug was "Verified" quickly, but I still miss the exact 8.0.x version(s) affected and the results of checking with older versions. I strongly suspect it's a regression, as MariaDB 10.3.7 provides the expected result:
MariaDB [test]> CREATE TABLE test (id INT PRIMARY KEY, value INT);
Query OK, 0 rows affected (0.510 sec)
MariaDB [test]> INSERT INTO test VALUES (1, 99), (2,98), (3, 97);
Query OK, 3 rows affected (0.057 sec)
Records: 3  Duplicates: 0  Warnings: 0
MariaDB [test]> SELECT t1.id, (SELECT t2.value FROM test t2 WHERE t1.id = t2.id) AS sub_value FROM test t1 HAVING sub_value = 99 ORDER BY value LIMIT 1;
+----+-----------+
| id | sub_value |
+----+-----------+
|  1 |        99 |
+----+-----------+
1 row in set (0.116 sec)
  • Bug #93170 - "undo truncation in 8.0.13 is not crash safe". The bug was quickly verified (after all, it's a failure of the existing innodb_undo.truncate_recover MTR test case), but has not got a "regression" tag. I am still not sure how it was missed during regular testing and ended up in the MySQL 8.0.13 release.
  • Bug #93147 - "Upgrade to 8.0.13 from 8.0.11 fails". In pre-8.0 releases there was no strict need to update to every intermediate minor version, so this is also a regression of a kind for any production DBA.
  • Bug #92979 - "MySQL 8.0 performance degradation on INSERT with foreign_key_checks=0". This is a verified performance regression compared to MySQL 5.7, but the "regression" tag is still missing.

To summarize, there are some regressions noted by community users recently in MySQL 8.0 GA releases. Some of them were demonstrated with simple test cases, so it's strange they were not noted by Oracle's QA. What's worse, it seems some Oracle engineers are not ready to accept the fact that the best ever MySQL 8.0 GA release they worked on may get some things done incorrectly and worse than before, so they seem to waste time on useless discussions that everything is OK, works as expected and nothing can be done differently. I also see some processed and verified bug reports without a detailed check for regressions presented to users, or even with the "regression" tag NOT added when needed.

I hope this is not going to become a new trend. I wish all community bug reports and features of MySQL get as much attention and detailed study from Oracle employees as (far from perfect) JSON support in MariaDB...

by Valeriy Kravchuk (noreply@blogger.com) at November 18, 2018 05:51 PM

            November 17, 2018

            MariaDB Foundation

            2019 Developers Unconference, New York

            February in New York City is again MariaDB time, and the first MariaDB Developers Unconference of 2019 will take place on Saturday 23 and Sunday 24 February, with Hudson River Trading as kind hosts. The event is free to attend and you can join for the entire weekend, or as little time as you wish. […]

            The post 2019 Developers Unconference, New York appeared first on MariaDB.org.

            by Ian Gilfillan at November 17, 2018 12:35 PM

            Peter Zaitsev

            Percona on the AWS Database Blog

Percona’s very own Michael Benshoof posted an article called Supercharge your Amazon RDS for MySQL deployment with ProxySQL and Percona Monitoring and Management on the AWS Database Blog.

            In this article, Michael looks at how to maximize the performance of your cloud database Amazon RDS deployment. He shows that you can enhance performance by using ProxySQL to handle load balancing and Percona Monitoring and Management to look at performance metrics.

            Percona is an Advanced AWS Partner and provides support and managed cloud database services for AWS RDS and Amazon Aurora.

Check the article out!

            by Dave Avery at November 17, 2018 12:06 AM

            November 16, 2018

            Peter Zaitsev

            Newly-Released PostgreSQL Minor Versions: Time to Update!


            In this blog post we’ll look at what the newly-released PostgreSQL minor versions contain. You probably want to update your current versions and use these updates.

You might already have seen that the PostgreSQL community released updates for all supported PostgreSQL versions on November 8, 2018. PostgreSQL releases minor versions with several bug fixes and feature enhancements each quarter. An important point to note is that PostgreSQL 9.3 got its final minor version release (9.3.25) this quarter, and is no longer supported.

We always recommend that you keep your PostgreSQL databases updated to the latest minor versions. Applying a minor release requires a restart after installing the new binaries. The following is the sequence of steps you should follow to upgrade to the latest minor versions:

1. Shut down the PostgreSQL database server
2. Install the updated binaries
3. Restart your PostgreSQL database server

Most of the time, you can choose to update the minor versions in a rolling fashion in a master-slave (replication) setup: perform the update on one server after another, not all at once. Rolling updates avoid downtime for both reads and writes simultaneously. However, we recommend that you shut down, update and restart them all at once while you are performing the updates.

            One of the most important fixes is a security fix: CVE-2018-16850. The bug allowed an attacker with CREATE privileges on some non-temporary schema or TRIGGER privileges on some table to create a malicious trigger that, when dumped and restored using pg_dump/pg_restore, would result in additional SQL statements being executed. This applies to PostgreSQL 10 and 11 versions.

            Before proceeding further, let’s look at the list of minor versions released this quarter.

            • PostgreSQL 11.1
            • PostgreSQL 10.6
            • PostgreSQL 9.6.11
            • PostgreSQL 9.5.15
            • PostgreSQL 9.4.20
            • PostgreSQL 9.3.25

            Now, let us look into the benefits you should see by updating your Postgres versions with the latest minor versions.

            PostgreSQL 11.1

PostgreSQL 11.0 was released on October 18, 2018. You might want to look at our blog post with our first take on PostgreSQL 11. With the new minor release PostgreSQL 11.1, we get some interesting functionality and fixes just 21 days after the previous release, as seen here. The following is a small list of fixes that you might find interesting:

            • Ability to create child indexes of partition tables in another tablespace.
            • NULL handling in parallel hashed multi-batch left joins.
• Fix to the strictness logic that incorrectly ignored rows whose ORDER BY values were null.
• Disable the recheck_on_update index option, forcing it off for all indexes, as the feature is not ready yet and requires more work.
• Prevent creation of a partition in a trigger attached to its parent table.
• Disallow the pg_read_all_stats role from executing pg_stat_statements_reset(), since that role is only meant for monitoring. You must run ALTER EXTENSION pg_stat_statements UPDATE … to put this into effect (see the sketch just after this list).
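A minimal sketch of applying that last item on a server already running the 11.1 binaries; the monitor_admin role is a hypothetical example, not part of the release:

-- Update the extension so the tightened pg_stat_statements_reset()
-- permissions take effect:
ALTER EXTENSION pg_stat_statements UPDATE;

-- If a trusted monitoring role still needs to reset statistics,
-- re-grant execute rights explicitly (monitor_admin is hypothetical):
GRANT EXECUTE ON FUNCTION pg_stat_statements_reset() TO monitor_admin;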

            PostgreSQL 10.6

Some fixes were applied to both PostgreSQL 11.1 and PostgreSQL 10.6. You can find the PostgreSQL 10.6 release details here. Some of the fixes applied to PostgreSQL 10.6 are shared with other supported PostgreSQL versions, as highlighted below:

• Disallow the pg_read_all_stats role from executing pg_stat_statements_reset(), since that role is only meant for monitoring. (PostgreSQL 11.1 and 10.6)
            • Avoid pushing sub-SELECTs containing window functions, LIMIT, or OFFSET to parallel workers to avoid the behavior of different workers getting different answers due to row-ordering variations. (PostgreSQL 9.6 and 10.6)
            • Fixed the WAL file recycling logic that might make a standby fail to remove the WAL files that need to be removed. (PostgreSQL 9.5.15, 9.6.11 and 10.6)
            • Handling of commit-timestamp tracking during recovery has been fixed to avoid recovery failures while trying to fetch the commit timestamp for a transaction that did not record it. (PostgreSQL 9.5.15, 9.6.11 and 10.6)
            • Applied the fix that ensures that the background workers are stopped properly when the postmaster receives a fast-shutdown request before completing the database startup. (PostgreSQL 9.5.15, 9.6.11 and 10.6)
            • Fixed unnecessary failures or slow connections when multiple target host names are used in libpq so that the DNS lookup happens one at a time but not all at once. (PostgreSQL 10.6)

The following is a list of some common fixes applied to PostgreSQL 9.6.11, PostgreSQL 9.5.15, PostgreSQL 9.4.20 and PostgreSQL 9.3.25:

            • Avoid O(N^2) slowdown in regular expression match/split functions on long strings.
            • Fix mis-execution of SubPlans when the outer query is being scanned backward.
            • Fix failure of UPDATE/DELETE … WHERE CURRENT OF … after rewinding the referenced cursor.
            • Fix EvalPlanQual to avoid crashes or wrong answers in concurrent updates, when the code contains an uncorrelated sub-SELECT inside a CASE construct.
            • Memory leak in repeated SP-GiST index scans has been fixed.
            • Ensure that hot standby processes use the correct WAL consistency point to prevent possible misbehavior after reaching a consistent database state during WAL replay.
            • Fix possible inconsistency in pg_dump’s sorting of dissimilar object names.
• Ensure that pg_restore will schema-qualify the table name when emitting DISABLE/ENABLE TRIGGER commands when a restore is run using a restrictive search_path.
            • Fix pg_upgrade to handle event triggers in extensions correctly.
            • Fix pg_upgrade’s cluster state check to work correctly on a standby server.
            • Fix build problems on macOS 10.14

            Now that you understand the added fixes to existing PostgreSQL versions, we recommend that you test and update your PostgreSQL databases with the new minor versions (if you haven’t already).

If you are currently running your databases on PostgreSQL 9.3.x or earlier, we recommend that you prepare a plan to upgrade your PostgreSQL databases to a supported version ASAP. Please subscribe to our blog posts so that you learn about the various options for upgrading your PostgreSQL databases to a supported major version.

            by Avinash Vallarapu at November 16, 2018 05:25 PM

            Migrating to Amazon Aurora: Optimize for Binary Log Replication

In this Checklist for Success series, we discuss reducing unknowns when hosting in the cloud using and migrating to Amazon Aurora. These tips might also apply to other database as a service (DBaaS) offerings.

            In our previous article, we discussed the importance of continuous query performance analysis, especially in Amazon Aurora where there is less diagnostic visibility compared to running on EC2 or on-premise. Aside from uptime though, we need a lot more from our data, and we definitely cannot isolate it in Aurora.

Next on our checklist is that, at one point or another, you will need to use asynchronous replication. Amazon Aurora has an excellent reputation for absorbing intense write workloads, which means that in many cases where you need an asynchronous replica, that replica can have trouble catching up.

            Different Clusters for Different Workloads

Critical workloads and datasets cannot rely on a single copy of their data. With Amazon Aurora, predictable performance means avoiding mixing workloads within your production cluster. While read-heavy workloads might fit easily into read replicas, reporting or analytics workloads might not be a good fit to execute on your main cluster, where read-what-you-write profiles are normally found. You can delegate these to a separate asynchronous replica, either as a separate Amazon Aurora cluster or as one that runs on an EC2 instance (as an example). This is also true if, say, your analytics or reporting workloads generate a significant amount of disk IOPS.

Amazon Aurora bills IO operations per million (roughly $0.20 per million requests at the time of writing, so a reporting job issuing 500 million reads a month adds about $100 to the bill). You might save some money running disk-heavy analytics operations on a replica running on an i3 instance with local NVMe, for example. Similarly, running an async replica on an EC2 instance or on-premise allows you to take your own independent backups, or simply adds an extra level of redundancy.
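Setting up such an external replica follows standard MySQL binlog replication; a hedged sketch, assuming binary logging is already enabled on the Aurora cluster (the endpoint, user and log coordinates below are hypothetical placeholders):

-- On the external EC2/on-premise replica, point it at the Aurora writer endpoint:
CHANGE MASTER TO
  MASTER_HOST = 'mycluster.cluster-xxxx.us-east-1.rds.amazonaws.com',
  MASTER_USER = 'repl_user',
  MASTER_PASSWORD = '...',
  MASTER_LOG_FILE = 'mysql-bin-changelog.000002',  -- from SHOW MASTER STATUS on the source
  MASTER_LOG_POS = 154;
START SLAVE;
SHOW SLAVE STATUS\G  -- verify Seconds_Behind_Master trends toward zero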

            Multi-Threaded Replication

It is a known fact that MySQL asynchronous replication performance is subject to some limitations. The biggest one is that, by default, it is single-threaded. MySQL 5.6 introduced multi-threaded replication with one database per thread. This did not help the majority of use cases, as workloads vary per database and therefore create an imbalance. With MySQL 5.7 (Aurora 2.0), there have been additional improvements, such as an alternative algorithm for parallelizing thread execution that depends on certain behaviors regarding how acting primary servers write binary log entries.

With that said, certain multi-threaded replication variables (such as transaction_write_set_extraction) require that the binlog format is set to ROW. This might sound counter-intuitive, because the ROW binlog format can actually increase the replication workload. While the ROW format reduces the ambiguity from potentially non-deterministic statements that could cause a replica to drift and become inconsistent, critical operations (schema changes) and optimizations (MTS) require that you use the ROW binlog format.

It should be apparent by now that the only reasonable path forward to improving asynchronous replication is via the multi-threaded approach. Along with that, there is the need for the ROW binlog format. Any design effort should always include this fact if async replication lag is considered a risk. For the basics, configuration options like slave_compressed_protocol and binlog_row_image can reduce network churn. In the deep end, reducing dataset hotspots, ensuring tables have PRIMARY KEYs and embracing multi-threaded optimization can also go a long way.
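A hedged sketch of what these knobs look like on a MySQL 5.7-compatible replica; the values are illustrative starting points, not recommendations, and on RDS/Aurora you would set them through parameter groups rather than SET GLOBAL:

-- Parallelize the apply stage by commit group instead of by database
-- (the SQL thread must be stopped to change slave_parallel_type):
STOP SLAVE SQL_THREAD;
SET GLOBAL slave_parallel_type = 'LOGICAL_CLOCK';
SET GLOBAL slave_parallel_workers = 8;        -- tune to cores and workload
SET GLOBAL slave_preserve_commit_order = ON;  -- needs log_slave_updates
-- Reduce network churn between primary and replica:
SET GLOBAL slave_compressed_protocol = ON;
SET GLOBAL binlog_row_image = 'MINIMAL';      -- set on the primary, with ROW format
START SLAVE SQL_THREAD;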

            Additional Benefits

While running certain read-heavy queries on an async replica or ensuring you have access to physical datafiles are common use cases, being able to switch to another location (region) or simply to another cluster might also be necessary in some instances.

            • Adding or dropping a column/index on a large table can be achieved with either pt-online-schema-change or gh-ost. But for cases where time is a constraint, applying schema changes on an asynchronous cluster and switching to that sometimes pays for itself.
            • Configuration changes or upgrades that require a cluster restart can take seconds or even minutes. Wouldn’t it be nice if you already had a cluster with all these changes ready to take over at a moment’s notice? Or even fail back to the original if there was an unforeseen issue?

            Stay “tuned” for part three.

            Meanwhile, we’d like to hear your success stories in Amazon Aurora in the comments below!

            Don’t forget to come by and see us at AWS re:Invent, November 26-30, 2018 in booth 1605! Percona CEO Peter Zaitsev will deliver a keynote on MySQL High Availability & Disaster Recovery, Tuesday, November 27 at 1:45 PM – 2:45 PM in the Bellagio Hotel, Level 1, Gauguin 2

            by Jervin Real at November 16, 2018 04:25 PM

            How Not to do MySQL High Availability: Geographic Node Distribution with Galera-Based Replication Misuse


            Let’s talk about MySQL high availability (HA) and synchronous replication once more.

            It’s part of a longer series on some high availability reference architecture solutions over geographically distributed areas.

            Part 1: Reference Architecture(s) for High Availability Solutions in Geographic Distributed Scenarios: Why Should I Care?

            Part 2: MySQL High Availability On-Premises: A Geographically Distributed Scenario

            The Problem

A question I often get from customers is: How do I achieve high availability if I need to spread my data across different, distant locations? Can I use Percona XtraDB Cluster?

Percona XtraDB Cluster (PXC), mariadb-cluster or MySQL-Galera are very stable and well-known solutions to improve MySQL high availability using an approach based on a multi-master, data-centric, synchronous data replication model. This means that each data node composing the cluster MUST see the same data at a given moment in time.

            Information/transactions must be stored and visible synchronously on all the nodes at a given time. This is defined as a tightly coupled database cluster. This level of consistency comes with a price, which is that nodes must physically reside close to each other and cannot be geographically diverse.

This is by design (it is true of all synchronous replication mechanisms), and it has had to be clarified over and over throughout the years. Despite that, we still see installations that span geographic locations, including AWS Regions.

We still see some solutions breaking the golden rule of proximity, and trying to break the rules of physics as well. The problem/mistake is no different for solutions on-premises or in the cloud (whatever the cloud provider).

Recently I had to design a couple of customer solutions based on remote geographic locations. In both cases, the customer was misled by an incorrect understanding of how the synchronous solution works, and by a lack of understanding of the network layer. I decided I needed to cover this topic again, as I have done previously in Galera geographic replication and Effective way to check network connection in a geographically distributed environment.

What Happens When I Put Things on the Network?

            Well, let’s start with the basics.

            While light travels at 300 million meters per second, the propagation of the electric fields or electric signaling is slower than that.

The real speed depends on the medium used to transmit the signal, but it can be said that it normally spans from 0% to 99% of light-speed.

            This means that in optimal conditions the signal travels at approximately 299.72Km per millisecond, in good/mid condition about half that at 149.86Km per millisecond, and in bad conditions it could be 3Km per millisecond or less.

            To help you understand, the distance between Rome (Italy) and Mountain View (California) is about 10,062Km. At light-speed it will take 33.54ms. In good conditions (90% of light-speed) the signal will take 37.26ms to reach Mountain View, and in less optimal conditions it can easily double to 74.53 ms.
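As a rough sanity check, the one-way time follows directly from distance over effective propagation speed; here 0.9 is the assumed fraction of light-speed for good conditions:

t = d / (k × c) = 10062 km / (0.9 × 299.72 km/ms) ≈ 37.3 ms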

Keep in mind this is the electric field propagation speed: optimal conditions with no interruption, re-routing and so on. Reality will bring all kinds of interruptions, repeaters and routing.

            All the physics above works as a baseline. On top of this, each human construct adds functionalities, flexibility and (unfortunately) overhead – leading to longer times and slower speeds.

The final speed will be different from the simple propagation of the electric fields. It will include the transmission time of complex signaling using the ICMP protocol, or even higher delays with the use of a very complex protocol like TCP/IP, which includes handshaking, packet rerouting, re-sending and so on. On top of that, when sending things over the internet, we need to realize that it is very improbable we will be the only user sending data over that physical channel. As such, whatever we have "on the road" will need to face bandwidth limitations, traffic congestion and so on.

I have described the difference between protocols (ICMP vs. TCP/IP) here, clarifying how the TCP/IP scenario is very different from using other protocols like ICMP, or from the theoretical approach.

            What it all means is that we cannot trust the theoretical performance. We must move to a more empirical approach. But we must understand the right empirical approach or we will be misled.

            An Example

            I recently worked on a case where a customer had two data centers (DC) at a distance of approximately 400Km, connected with “fiber channel”. Server1 and Server2 were hosted in the same DC, while Server3 was in the secondary DC.

Their ping to Server3, with the default packet size, was ~3ms. Not bad at all, right?

            We decided to perform some serious tests, running multiple sets of tests with netperf for many days collecting data. We also used the data to perform additional fine tuning on the TCP/IP layer AND at the network provider.

The results produced a common (for me) scenario (not so common for them):

[Graph: average transactions/sec vs. dataset dimension, before (red) and after (yellow) optimization]

The red line is the first set of tests BEFORE we optimized. The yellow line is the results after we optimized.

The above graph reports the average number of transactions/sec we could run against different dataset dimensions, changing the destination server. The full round trip was calculated.

            It is interesting to note that while the absolute numbers were better in the second (yellow) tests, this was true only for a limited dataset dimension. The larger the dataset, the higher the impact. This makes sense if you understand how the TCP/IP stack works (the article I mentioned above explains it).

But what surprised them were the numbers. Keeping aside the extreme cases and focusing instead on the intermediate ones, we saw that shifting from a 48K dataset dimension to 512K hugely dropped the performance. The number of executed transactions dropped from 2299 to 219 on Server2 (same DC), and from 1472 to 167 on Server3 (different DC).

Also, note that Server3 managed ~35% fewer transactions than Server2 from the start, given the latency. Latency moved from a more-than-decent 2.61ms to 27.39ms for Server2, and from 4.27ms to 37.25ms for Server3.

            37ms latency is not very high. If that had been the top limit, it would have worked.

            But it was not.

In the presence of the optimized channel, with fiber and so on, when the tests were hitting heavy traffic the congestion was such that it compromised the data transmitted. It hit a latency >200ms for Server3. Note those were spikes, but if you are running a tightly coupled database cluster, those events can become failures in applying the data and can create a lot of instability.

Let me recap the situation for Server3 for a second:

            We had two datacenters.

• The connection between the two was fiber
• Distance ~400km, but we MUST consider the round trip, because in real communication we have not only the send but also the receive packets
            • Theoretical time at light-speed =2.66ms (2 ways)
            • Ping = 3.10ms (signal traveling at ~80% of the light speed) as if the signal had traveled ~930Km (full roundtrip 800 Km)
            • TCP/IP best at 48K = 4.27ms (~62% light speed) as if the signal had traveled ~1,281km
            • TCP/IP best at 512K =37.25ms (~2.6% light speed) as if the signal had traveled ~11,175km

Given the above, we have anywhere from ~20%-40% to ~97% loss from the theoretical transmission rate. Keep in mind that when moving from a simple signal to heavier and more concurrent transmissions, we also have to deal with bandwidth limitations. This adds additional cost. All in only 400km of distance.

This is not all. Within the 400km we were also dealing with data congestion, and in some cases the tests failed to provide the level of accuracy we required due to transmission failures and too many packet retries.

For comparison, consider Server2, which is in the same DC as Server1. Let's see:

• Ping = 0.027ms, as if the signal had traveled ~11km at light-speed
• TCP/IP best at 48K = 2.61ms, as if it had traveled ~783km
• TCP/IP best at 512K = 27.39ms, as if it had traveled ~8,217km
            • We had performance loss, but the congestion issue and accuracy failures did not happen.

            You might say, “But this is just a single case, Marco, you cannot generalize from this behavior!”

You would be right IF that were true (but it is not).

The fact is, I have done this level of testing many times and in many different environments, on-premises and in the cloud. Actually, in the cloud (AWS), I saw even more instability. The behavior stays the same. Please test it yourself (it is not difficult to use netperf). Just do the right tests with RTT and multiple requests (see the sample test at the end of the article).

Anyhow, what I know from the tests is that when working INSIDE a DC, even with some significant overhead due to the TCP/IP stack (and maybe wrong cabling), I do not encounter the same congestion or bandwidth limits I face when dealing with an external DC.

This allows me to have more predictable behavior and to tune the cluster accordingly. That tuning is something I cannot do to cover the transmission to Server3, because of unpredictable packet behavior and spikes. >200ms is too high and can cause delivery failures.

If we apply this knowledge to the virtually-synchronous replication we have with Galera (Percona XtraDB Cluster), we can see that we are hitting the problems well explained in Jay's article Is Synchronous Replication right for your app?. There, he explains Callaghan's Law: [In a Galera cluster] a given row can't be modified more than once per RTT. For example, with a 100ms RTT between the most distant nodes, a single hot row can be modified at most ~10 times per second, no matter how fast the hardware is.

On top of that, when talking about geographically disperse solutions, TCP/IP magnifies the effect at the writeset transmission/latency level. This causes nodes NOT residing on the same physically contiguous network to delay all the certification-commit phases by some amount of time X.

When X is predictable, it may correspond to anywhere between 80% and 3% of light speed for the given distance. But you can't predict the transmission time of a set of data split into several datagrams and then sent over the internet using TCP/IP. So we cannot use the X range as a trustworthy measure.

The effect is unpredictable delay, and this is read as a network issue by Galera. The node can be evicted from the cluster. Which is exactly what happens, and what we experience when dealing with some "BAD" unspecified network issue. This means that whenever we need to use a solution based on a tightly coupled database cluster (like PXC), we cannot locate our nodes at a distance where the largest RTT exceeds the shortest commit period we desire.

If our application must apply the data within a maximum of 200ms in one of its functions, and our min RTT is 2ms but our max RTT is 250ms, we cannot use this solution, period. To be clear, locating a node in another geographic location, and as such using the internet to transmit/receive data, is by default a NO GO given the unpredictability of that network link.

I doubt that nowadays many applications can wait an unpredictable period to commit their data. The only case where having a geographically distributed node is acceptable is if you accept commits happening in undefined periods of time and with possible failures.

            What Is the Right Thing To Do?

The right solution is easier than the wrong one, and there are already tools in place to make it work efficiently. Say you need to define your HA solution between the East and West Coast, or between Paris and Frankfurt. First of all, identify the real capacity of your network in each DC. Then build a tightly coupled database cluster in location A and another tightly coupled database cluster in location B. Finally, link them using ASYNCHRONOUS replication.

            Finally, use a tool like Replication Manager for Percona XtraDB Cluster to automatically manage asynchronous replication failover between nodes. On top of all of that use a tool like ProxySQL to manage the application requests.

            The full architecture is described here.
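As a minimal sketch, the asynchronous link between the two clusters is plain MySQL replication; the host name, user and the use of GTID auto-positioning below are assumptions for this example, not prescriptions:

-- On the designated replica node in cluster B, point it at a node in cluster A:
CHANGE MASTER TO
  MASTER_HOST = 'pxc-a-writer.example.com',
  MASTER_USER = 'repl',
  MASTER_PASSWORD = '...',
  MASTER_AUTO_POSITION = 1;  -- assumes GTID is enabled on both clusters
START SLAVE;
-- log_slave_updates (with the binlog enabled) must be ON so the replicated
-- transactions propagate to the other nodes of cluster B via Galera.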

            Conclusions

The myth of using ANY solution based on a tightly coupled database cluster across distributed geographic locations is just that: a myth. It is conceptually wrong and practically dangerous. It MIGHT work when you set it up, it MIGHT work when you test it, it MIGHT even work for some time in production.

By definition, it will break, and it will break when it is least convenient. It will break at an unpredictable moment, but for a predictable reason: you did the wrong thing by following a myth.

            Whenever you need to distribute your data over different geographic locations, and you cannot rely on a single physical channel (fiber) to connect the two locations, use asynchronous replication, period!

            References

            https://github.com/y-trudeau/Mysql-tools/tree/master/PXC

            http://www.tusacentral.net/joomla/index.php/mysql-blogs/164-effective-way-to-check-the-network-connection-when-in-need-of-a-geographic-distribution-replication-.html

            https://www.percona.com/blog/2013/05/14/is-synchronous-replication-right-for-your-app/

            Sample test

            #!/bin/bash
            test_log=/tmp/results_$(date +'%Y-%m-%d_%H_%M_%S').txt
            exec 9>>"$test_log"
            exec 2>&9
            exec 1>&9
            echo "$(date +'%Y-%m-%d_%H_%M_%S')" >&9
            for ip in 11 12 13; do
              echo "  ==== Processing server 10.0.0.$ip === "
              for size in 1 48 512 1024 4096;do
                echo " --- PING ---"
                ping -M do -c 5  10.0.0.$ip -s $size
                echo "  ---- Record Size $size ---- "
                netperf -H 10.0.0.$ip -4 -p 3307 -I 95,10 -i 3,3 -j -a 4096 -A 4096  -P 1 -v 2 -l 20 -t TCP_RR -- -b 5 -r ${size}K,48K -s 1M -S 1M
                echo "  ---- ================= ---- ";
              done
               echo "  ==== ----------------- === ";
            done

             

            by Marco Tusa at November 16, 2018 12:46 AM

            MySQL High Availability On-Premises: A Geographically Distributed Scenario


            In this article, we’ll look at an example of an on-premises, geographically distributed MySQL high availability solution. It’s part of a longer series on some high availability reference architecture solutions over geographically distributed areas.

            Part 1: Reference Architecture(s) for High Availability Solutions in Geographic Distributed Scenarios: Why Should I Care?

Percona consulting’s main aim is to identify simple solutions to complex problems. We try to focus on identifying the right tool, a more efficient solution, and what can be done to make our customers’ lives easier. We believe in doing the work once, doing it well, and having more time afterward for other aspects of life.

            In our journey, we often receive requests for help – some simple, some complicated.  

            Scenario

The company “ACME Inc.” is moving its whole business from a monolithic application to a distributed application split into services. Each service deals with requests independently of the others. Some services follow the tightly-bounded transactional model, and others work/answer asynchronously. Each service can access the data storage layer independently.

            In this context, ACME Inc. identified the need to distribute the application services over wide geographic regions, focusing on each region achieving scale independently.

            The identified regions are:

            • North America
            • Europe
            • China

            ACME Inc. is also aware of the fact that different legislation acts on each region. As such, each region requires independent information handling about sales policies, sales campaigns, customers, orders, billing and localized catalogs, but will share the global catalog and some historical aggregated data. While most of the application services will work feeding and reading local distributed caches, the basic data related to the catalog, sales and billing is based on an RDBMS.

Historical data is instead migrated to a “Big Data” platform, and aggregated data is elaborated and pushed to a DWH solution at HQ. The application components are developed using multiple programming languages, depending on the service.

The RDBMS identified by ACME Inc., in collaboration with the local authorities, was MySQL-oriented. Other solutions considered included:

            • PostgreSQL
            • Oracle DB
            • MS SQL server

            We excluded closed-source RDBMSs given that some countries imposed a specific audit plugin. This plugin was only available for the mentioned platforms. The cost of parallel development and subsequent maintenance in case of RDBMS diversification was too high. As such all the regions must use the same major RDBMS component.

We excluded PostgreSQL given that, compared to it, MySQL adoption and utilization cases were higher, and MySQL had a well-defined code producer. Finally, the Business Continuity team of ACME Inc. had defined an ITSC (Information Technology Service Continuity) plan that defined the RPO (Recovery Point Objective), the RTO (Recovery Time Objective) and system redundancy.

That’s it. To fulfill the ITSC plan, each region must have its critical systems redundantly replicated in the same region, but not in close proximity.

            Talking About the Components

            This is a not-so-uncommon scenario, and it also presents a lot of complexity if you try to address it with one solution. But let’s analyze it and see how we can simplify the approach while still meeting the needs and requirements of ACME Inc.

            When using MySQL-based solutions, the answer to “what should we use?” is use what best fits your business needs. The “nines” availability reference table for the MySQL world (most RDBMSs) can be summarized below:

90.000% (36 days)    MySQL Replication
99.900% (8 hours)    Linux Heartbeat with DRBD (Obsolete DRBD)
99.900% (8 hours)    RHCS with Shared Storage (Active/Passive)
99.990% (52 minutes) MHA/Orchestrator with at least three nodes
99.990% (52 minutes) DRBD and Replication (Obsolete DRBD)
99.995% (26 minutes) Multi-Master (Galera replication) 3 node minimum
99.999% (5 minutes)  MySQL Cluster

An expert will tell you that it doesn't always make sense to go for the most "nines" in the list. This is because each solution comes with a tradeoff: the more high availability (HA) you get, the higher the complexity of the solution and of managing it.

            For instance, the approach used in MySQL Cluster (NDB) makes this solution not suitable for generic utilization. It requires proper analysis of the application needs, data utilization and archiving before being selected. It also requires in-depth knowledge to properly manage the cluster, as it is more complex than other similar solutions.

This indirectly makes a solution based on MySQL+Galera replication the best choice among those with the highest HA levels, since it stays close to default, generalized MySQL utilization.

This is why MySQL+Galera replication has become, over the last six years, the most used solution for platforms looking for very high HA without the need to diverge from the standard MySQL/InnoDB approach. You can read more about Galera replication at: http://galeracluster.com/products/

            Read more about Percona XtraDB Cluster.

There are several distributions implementing Galera replication, including Percona XtraDB Cluster, MariaDB Cluster/Server and Codership's Galera Cluster for MySQL.*

            *Note that MariaDB Cluster/Server and all related solutions coming from MariaDB have significantly diverged from the MySQL mainstream. This often means that once migrated to MariaDB; your database will not be compatible with other MySQL solutions. In short, you are locked-in to MariaDB. It is recommended that you carefully evaluate the move to MariaDB before making that move.

            Choosing the Components

            RDBMS

Our advice is to use Percona XtraDB Cluster (PXC), because at the moment it is one of the most flexible, reliable and compatible solutions. PXC is composed of three main components: Percona Server for MySQL, the Galera replication library (wsrep API) and Percona XtraBackup (used by default for node provisioning).

The cluster is normally composed of three nodes or more. Each node can be used as a Master, but the preferred and recommended way is to use one node as a Writer and the others as Readers.

Application-wise, accessing the right node can be challenging, since this means you need to be aware of which node is the writer and which are the readers, and be able to shift from one to the other if necessary.
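As a quick, illustrative health check, the Galera status counters can be queried on any PXC node; the expected values below assume a healthy three-node cluster:

SHOW GLOBAL STATUS LIKE 'wsrep_cluster_size';         -- expect 3
SHOW GLOBAL STATUS LIKE 'wsrep_cluster_status';       -- expect 'Primary'
SHOW GLOBAL STATUS LIKE 'wsrep_local_state_comment';  -- expect 'Synced'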

            Proxy

            To simplify this process, it helps to have an additional component that works as a “proxy” connecting the application layer to the desired node(s). The most popular solutions are:

            • HAProxy
            • ProxySQL

There are several important differences between the two. But in summary, ProxySQL is a Layer 7 proxy and is MySQL protocol aware. So, while HAProxy just passes the connection along as a forward proxy (Layer 4), ProxySQL is aware of what is going through it and acts as a reverse proxy.

With ProxySQL it is possible to decide, based on several parameters, where to send traffic (read/write split and more), what must be stopped, or whether to rewrite an incoming SQL command. A lot of information is available on the ProxySQL website https://github.com/sysown/proxysql/wiki and on the Percona Database Performance Blog.
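As an illustration, a minimal read/write split in the ProxySQL admin interface looks like the following; the hostgroup numbers (10 for the writer, 20 for the readers) are assumptions for this sketch:

-- Send SELECT ... FOR UPDATE to the writer, all other SELECTs to the readers;
-- anything matching no rule goes to the default (writer) hostgroup.
INSERT INTO mysql_query_rules (rule_id, active, match_digest, destination_hostgroup, apply)
VALUES (1, 1, '^SELECT.*FOR UPDATE', 10, 1),
       (2, 1, '^SELECT', 20, 1);
LOAD MYSQL QUERY RULES TO RUNTIME;
SAVE MYSQL QUERY RULES TO DISK;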

            Backup/Restore

No RDBMS platform is safe without a well-tested procedure for backup and recovery. The Percona XtraDB Cluster package distribution comes with Percona XtraBackup as the default method for node provisioning. A good backup and restore (B/R) policy starts from ACME's ITSC plan: have full and incremental backups that cover the RPO, and a good recovery procedure that keeps the recovery time within the RTO whenever possible.

There are several tools that allow you to plan and execute backup/restore procedures, some coming from vendors and others open source and community-oriented. In keeping with the fully open source and community-oriented approach, we in consulting normally suggest using: https://github.com/dotmanila/pyxbackup.

            Pyxbackup is a wrapper around XtraBackup that helps simplify the B/R operations, including the preparation of a full and incremental set. This helps significantly reduce the recovery time.  

            Disaster Recovery

Another very important aspect of the ITSC plan is the capacity of the system to survive major disasters. The disaster recovery (DR) site must be able to act as the main production environment. Therefore, it must be designed and scaled with the same resources as the main production site. It must be geographically separated, normally by hundreds of kilometers or more. It must be completely independent of the main site. And it must be as much as possible in sync with the main production site.

            While the first three “musts” are easy to understand, the fourth one is often the object of misunderstanding.

The concept of being as much in sync with the production site as possible creates confusion when designing HA solutions that involve Galera replication. The most common misunderstanding is the misuse of the Galera replication layer: mainly, conceptual confusion between a tightly coupled database cluster and a loosely coupled database cluster.

            Any solution based on Galera replication is a tightly coupled database cluster, because the whole idea is to be data-centric, synchronously distributed and consistent. The price is that this solution cannot be geographically distributed.

Solutions like standard MySQL replication are instead loosely coupled database clusters, designed to be asynchronous. The nodes they connect are completely independent in processing/applying transactions, and the solution fits perfectly into ANY geographically distributed replication scenario. The price is that data on the receiving side might not be up to date with the source at a specific instant.

The point is that for the DR site the ONLY valid solution is the asynchronous link (loosely coupled database cluster), because by design and requirement the two sites must be separated by a significant number of kilometers. For a better understanding of why synchronous replication cannot work in a geographically distributed scenario, see "Misuse of Geographic Node distribution with Galera-based replication".

In our scenario, the use of Percona XtraDB Cluster helps to create a more robust asynchronous solution. This is because each tightly coupled database cluster, whether source or destination, will be seen by the other tightly coupled database cluster as a single entity.

            What it means is that we can shift from one node to another inside the two clusters, still confident we will have the same data available and the same asynchronous stream passing from one source to the other.

To ensure this procedure is fully automated, we add the last block to our architecture: Replication Manager for Percona XtraDB Cluster (https://github.com/y-trudeau/Mysql-tools/tree/master/PXC). RMfP is another open source tool that simplifies and automates failover inside each PXC cluster, so that our asynchronous solution doesn't suffer if the node currently acting as Master fails.

            How to Link the Components

            Summarizing all the different components of our solution:

            • Application stack
              • Load balancer
              • Application nodes by service
              • Distributed caching
              • Data access service
            • Database stack
              • Data proxy (ProxySQL)
              • RDBMS (Percona XtraDB Cluster)
              • Backup/Restore
                • Xtrabackup
                • Pyxbackup
                • Custom scripts
              • DR
                • Replication Manager for Percona XtraDB Cluster
            • Monitoring
  • PMM (not covered here; see <link> for detailed information)

[Diagram: two sites, each hosting a full application stack with ProxySQL and a PXC cluster, linked by asynchronous replication]

In the solution above, we have two locations separated by several kilometers. On top of them, the load balancer(s)/DNS resolution redirects the incoming traffic to the active site. Each site hosts a full application stack, and applications connect to the local ProxySQL.

ProxySQL has read/write split enabled to optimize platform utilization, and is configured to shift writes from one PXC node to another in case of node failure. Asynchronous replication connects the two locations and transmits data from master to slave.

            Note that with this solution, it is possible to have multiple geographically distributed sites.

Backups are taken at each site independently, and recovery tests are performed. RMfP oversees and modifies the replication channels in the case of a node failure.

            Finally, Percona Monitoring and Management (PMM) is in place to perform in-depth monitoring of the whole database platform.

            Conclusions

We always look for the most efficient, manageable, user-friendly combination of products, because we believe in providing and supporting the community with simple but efficient solutions. What we have presented here is the most robust and stable high availability solution in the MySQL space (except for MySQL NDB, which we excluded earlier).

It is conceptualized to provide maximum service continuity, with limited bonding between the platforms/sites. It is also a well-tested solution that has been adopted and adapted in many different scenarios where performance and real HA are a must. I have preferred to keep this discussion at a high level, given that the details of the implementation have already been discussed elsewhere (see the reference section for more reading).

            Still, Percona XtraDB Cluster (as any other solution implementing Galera replication) might not fit the final use. Given that, it is important to understand where it does and doesn’t fit. This article is a good summary with examples: Is Synchronous Replication right for your app?.

            Check out the next article on How Not to do MySQL High Availability.

            References

            https://www.percona.com/blog/2016/06/07/choosing-mysql-high-availability-solutions/

            https://dev.mysql.com/doc/mysql-ha-scalability/en/ha-overview.html

            https://www.percona.com/blog/2014/11/17/typical-misconceptions-on-galera-for-mysql/

            http://galeracluster.com/documentation-webpages/limitations.html

            http://tusacentral.net/joomla/index.php/mysql-blogs/170-geographic-replication-and-quorum-calculation-in-mysqlgalera.html

            http://tusacentral.net/joomla/index.php/mysql-blogs/167-geographic-replication-with-mysql-and-galera.html

            http://tusacentral.net/joomla/index.php/mysql-blogs/164-effective-way-to-check-the-network-connection-when-in-need-of-a-geographic-distribution-replication-.html

            http://tusacentral.net/joomla/index.php/mysql-blogs/183-proxysql-percona-cluster-galera-integration.html

            https://github.com/sysown/proxysql/wiki

             

            by Marco Tusa at November 16, 2018 12:46 AM

            Reference Architecture(s) for High Availability Solutions in Geographic Distributed Scenarios: Why Should I Care?


            In this series of blog posts, I’m going to look at some high availability reference architecture solutions over geographically distributed areas.

            The Problem

            Nowadays, when businesses plan a new service or application, it is very common for them to worry about ensuring a very high level of availability. 

            It doesn’t matter if we are talking about an online shop, online banking or the internal services of a large organization. We know users are going to expect access to services 24x7x365. They also expect to access data consistently and instantaneously. If we fail to meet their expectations, then they move to another provider and we lose money. Simple as that.

            The other important aspect of providing online services and applications is that the amount of data produced, analyzed and stored is growing every day. We’ve moved from the few gigabytes of yesterday to terabytes today. Who knows what number of petabytes we need tomorrow?

What was once covered with a single LAMP stack today can require dozens of Ls, As, different letters instead of P (like J, R, Py, G) and M. Our beloved MySQL, which used to be "enough" to cover our needs 12 years ago, no longer fits all the needs of many modern applications.

            It is very common to have an application using different types of  “storage” at different levels and in different aspects of their activities. We can use a key-value store to cache inflight operations, and a relational full ACID database for the “valuable” core data (the kind of data that must be consistent and durable). Large data gets stored in an eventually consistent columns store mechanism, and long-term data in some “big data” approach. 

On top of all this are reporting mechanisms that collect elements from each data store to provide the required, comprehensive picture of the data. The situation is very diversified and complex, and the number of possible variables is high. The ways we can combine them are so vast that nowadays developers have practically no limits, and often come up with creative solutions.

This is where we as architects can help: we can clarify how each tool can be used for the right job. We at Percona strongly believe that we must seek simplicity in the complexity and embrace the KISS approach. This starts with the initial identification of the right tool for the job.

Let’s start by looking at a few good practices, illustrated in the following examples:

            • It is not a good idea to use key-value storage if you need to define the relationship between entities and rules between them.
            • Avoid using an eventually consistent storage when you have to save monetary information about customer payments.
            • It’s not a best practice to use a relational database to store HTML caching, page-tracking information, or game info in real time.

            Use the right tool for the right job. Some tools scale writes better and keep an eventually consistent approach. Some others are designed to store an unbelievable amount of data, but cannot handle relations. As a result, they might take a long time when processing a typical OLTP request – if they can at all. Each tool has a different design and goal, each one scales differently, and each one has its way of handling and improving availability.

It is a crucial part of the architectural phase of your project not to mix the cards. Keep things clean and build the right architecture for each component. Then combine them in a way that harmonizes in the final result. We should optimize each block when solving a complex issue with simple answers.

            How far are we from the old LAMP single stack? Ages. It is like turning your head and looking at our ancestors building the first tents. Tents are still a valid solution if you want to go camping. But only for fun, not for everyday life.

            There is too often confusion around what a relational database should do and how it should do it. A relational database should not replace every other component of the wide architecture, and vice versa. They must coexist and work together with other options. Each one should maximize its characteristics and minimize its limitations.

In this series, we will focus on RDBMSs, and we will present a few possible reference architectures for the relational database layer. I will illustrate solutions that improve service availability, keeping the focus on the tool's design and on the relational data approach with respect to the ACID paradigm.

            This means employing the simple rules of:

            • Atomicity -> All operations, part of the same transaction, are concluded successfully or not applied at all.
            • Consistency -> Any data written must be valid/validated against the defined rules and combination thereof.
            • Isolation -> Guarantees that all transactions will occur in isolation. No transaction affects any other transaction.
            • Durability -> Durability means that, once a transaction is committed, it will remain in the system even if there’s a system crash immediately following the transaction. Transaction changes must be stored permanently.

            We will discuss the solution involving the most common open source RDBMSs, covering on-premises and in the cloud:

            • MySQL
            • PostgreSQL
            • MongoDB

The scenario will be common to all solutions, but the way we implement each solution will answer different needs. The first example is MySQL high availability on-premises: MySQL High Availability On-Premises.

            by Marco Tusa at November 16, 2018 12:44 AM

            November 15, 2018

            Jean-Jerome Schmidt

            Webinar Replay: Backup Management for MySQL, MariaDB, PostgreSQL & MongoDB with ClusterControl

            Thanks to everyone who participated in this week’s webinar on ‘Backup Management with ClusterControl’. The replay is now available to watch online as well as the slide deck.

            If you feel frustrated by traditional, labour-intensive backup and archive practices for your MySQL, MariaDB, MongoDB and PostgreSQL databases … then this session is for you!

            What if you could have one backup management solution for all your business data? What if you could ensure integrity of all your backups? And what if you could leverage the competitive pricing and almost limitless capacity of cloud-based backup while meeting cost, manageability, and compliance requirements from the business?

            ClusterControl’s centralized backup management for open source databases provides you with hot backups of large datasets, point in time recovery in a couple of clicks, at-rest and in-transit data encryption, data integrity via automatic restore verification, cloud backups (AWS, Google and Azure) for Disaster Recovery, retention policies to ensure compliance, and automated alerts and reporting.

            Whether you are looking at rebuilding your existing backup infrastructure, or updating it, this webinar provides the necessary insights and details on how to go about that.

            Agenda

            • Backup and recovery management of local or remote databases
              • Logical or physical backups
              • Full or Incremental backups
              • Position or time-based Point in Time Recovery (for MySQL and PostgreSQL)
              • Upload to the cloud (Amazon S3, Google Cloud Storage, Azure Storage)
              • Encryption of backup data
              • Compression of backup data
            • One centralized backup system for your open source databases (Demo)
              • Schedule, manage and operate backups
              • Define backup policies, retention, history
              • Validation - Automatic restore verification
              • Backup reporting

            Speaker

            Bartlomiej Oles is a MySQL and Oracle DBA, with over 15 years experience in managing highly available production systems at IBM, Nordea Bank, Acxiom, Lufthansa, and other Fortune 500 companies. In the past five years, his focus has been on building and applying automation tools to manage multi-datacenter database environments.

            by jj at November 15, 2018 03:46 PM

            November 14, 2018

            Jean-Jerome Schmidt

            Percona Live Frankfurt 2018 - Event Recap & Our Sessions

            Severalnines was pleased to yet again sponsor Percona Live Europe which was held this year in Frankfurt, Germany. Thanks to the Percona Team for having us and the great organisation.

            At the Conference

            Severalnines team members flew in from around the world to show off the latest edition of ClusterControl in the exhibit hall and present five sessions (see below).

            On our Twitter feed we live tweeted both of the keynote sessions to help keep those who weren’t able to attend up-to-speed on the latest happenings in the open source database world.

            Our Sessions

            Members of the Severalnines Team presented five sessions in total at the event about MySQL, MariaDB & MongoDB, each of which showcased ClusterControl and how it delivers on those topics.

            Disaster Recovery Planning for MySQL & MariaDB

            Presented by: Bart Oles - Severalnines AB

            Session Details: Organizations need an appropriate disaster recovery plan to mitigate the impact of downtime. But how much should a business invest? Designing a highly available system comes at a cost, and not all businesses and indeed not all applications need five 9's availability. We will explain fundamental disaster recovery concepts and walk you through the relevant options from the MySQL & MariaDB ecosystem to meet different tiers of disaster recovery requirements, and demonstrate how to automate an appropriate disaster recovery plan.

            MariaDB Performance Tuning Crash Course

            Presented by: Krzysztof Ksiazek - Severalnines AB

            Session Details: So, you are a developer or sysadmin and showed some abilities in dealing with databases issues. And now, you have been elected to the role of DBA. And as you start managing the databases, you wonder

            • How do I tune them to make best use of the hardware?
            • How do I optimize the Operating System?
            • How do I best configure MySQL or MariaDB for a specific database workload?

If you're asking yourself these questions when it comes to optimally running your MySQL or MariaDB databases, then this talk is for you!

            We will discuss some of the settings that are most often tweaked and which can bring you significant improvement in the performance of your MySQL or MariaDB database. We will also cover some of the variables which are frequently modified even though they should not.

            Performance tuning is not easy, especially if you're not an experienced DBA, but you can go a surprisingly long way with a few basic guidelines.

            Performance Tuning Cheat Sheet for MongoDB

            Presented by: Bart Oles - Severalnines AB

            Session Details: Database performance affects organizational performance, and we tend to look for quick fixes when under stress. But how can we better understand our database workload and factors that may cause harm to it? What are the limitations in MongoDB that could potentially impact cluster performance?

            In this talk, we will show you how to identify the factors that limit database performance. We will start with the free MongoDB Cloud monitoring tools. Then we will move on to log files and queries. To be able to achieve optimal use of hardware resources, we will take a look into kernel optimization and other crucial OS settings. Finally, we will look into how to examine performance of MongoDB replication.

            Advanced MySql Data-at-Rest Encryption in Percona Server

            Presented by: Iwo Panowicz - Percona & Bart Oles - Severalnines AB

Session Details: The purpose of the talk is to present the data-at-rest encryption implementation in Percona Server for MySQL, and the differences between Oracle's MySQL and MariaDB implementations.

• How is it implemented?
            • What is encrypted:
            • Tablespaces?
            • General tablespace?
            • Double write buffer/parallel double write buffer?
            • Temporary tablespaces? (KEY BLOCKS)
            • Binlogs?
            • Slow/general/error logs?
            • MyISAM? MyRocks? X?
            • Performance overhead.
            • Backups?
            • Transportable tablespaces. Transfer key.
            • Plugins
            • Keyrings in general
            • Key rotation?
            • General-Purpose Keyring Key-Management Functions
            • Keyring_file
• Is it useful? How to make it profitable?
            • Keyring Vault
            • How does it work?
            • How to make a transition from keyring_file

            Polyglot Persistence Utilizing Open Source Databases as a Swiss Pocket Knife

            Presented by: Art Van Scheppingen - vidaXL & Bart Oles - Severalnines AB

Session Details: Over the past few years, VidaXL has become a European market leader in the online retail of slow moving consumer goods. When a company has achieved over 50% year-over-year growth for the past 9 years, there is hardly enough time to overhaul existing systems. This means existing systems will be stretched to the maximum of their capabilities, and often additional performance will be gained by utilizing a large variety of datastores. Polyglot persistence reigns in rapidly growing environments, and the traditional one-size-fits-all strategy of monoglots is over. VidaXL has a broad landscape of datastores, ranging from traditional SQL data stores like MySQL and PostgreSQL, alongside load balancing technologies such as ProxySQL, to document stores like MongoDB and search engines such as SOLR and Elasticsearch.

            by fwlymburner at November 14, 2018 12:45 PM

            November 13, 2018

            Peter Zaitsev

            ProxySQL 1.4.12 and Updated proxysql-admin Tool


            ProxySQL 1.4.12, released by ProxySQL, is now available for download in the Percona Repository along with an updated version of Percona’s proxysql-admin tool.

            ProxySQL is a high-performance proxy, currently for MySQL and its forks (like Percona Server for MySQL and MariaDB). It acts as an intermediary for client requests seeking resources from the database. René Cannaò created ProxySQL for DBAs as a means of solving complex replication topology issues.

            The ProxySQL 1.4.12 source and binary packages available at https://percona.com/downloads/proxysql include ProxySQL Admin – a tool, developed by Percona to configure Percona XtraDB Cluster nodes into ProxySQL. Docker images for release 1.4.12 are available as well: https://hub.docker.com/r/percona/proxysql/. You can download the original ProxySQL from https://github.com/sysown/proxysql/releases. GitHub hosts the documentation in the wiki format.
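Since ProxySQL's admin interface speaks the MySQL protocol, you can verify what proxysql-admin configured with any MySQL client; a minimal sketch, assuming the default admin port 6032:

-- Run against the ProxySQL admin interface (port 6032 by default):
SELECT hostgroup_id, hostname, port, status
FROM mysql_servers;

The mysql_servers table lists every backend node along with the hostgroup it was assigned to.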

            Improvements

            • #68: Scripts are now compatible with Percona XtraDB Cluster (PXC) hosts using IPv6
• #107: With include-slaves, slaves are no longer moved into the write hostgroup even if the whole cluster goes down. A new option, --use-slave-as-writer, specifies whether or not the slave is added to the write hostgroup.

            Bugs Fixed

            • #110: In some cases, pattern cluster hostname did not work with proxysql-admin.
            • #104: proxysql-admin testsuite bug fixes.
            • #113: proxysql_galera_checker assumed that parameters were given in the long format
            • #114: In some cases, ProxySQL could not be started
            • #115: proxysql_node_monitor could fail with more than one command in the scheduler
            • #116: In some cases, the scheduler was reloading servers on every run
            • #117: The --syncusers option did not work when enabling cluster
            • #125: The function check_is_galera_checker_running was not preventing multiple instances of the script from running

Other bugs fixed: #112, #120

            ProxySQL is available under Open Source license GPLv3.

            by Borys Belinsky at November 13, 2018 06:54 PM

            MariaDB Foundation

            MariaDB 10.2.19 now available

            The MariaDB Foundation is pleased to announce the availability of MariaDB 10.2.19, the latest stable release in the MariaDB 10.2 series. See the release notes and changelogs for details. Download MariaDB 10.2.19 Release Notes Changelog What is MariaDB 10.2? MariaDB APT and YUM Repository Configuration Generator Contributors to MariaDB 10.2.19 Alexander Barkov (MariaDB Corporation) Alexey […]

            The post MariaDB 10.2.19 now available appeared first on MariaDB.org.

            by Ian Gilfillan at November 13, 2018 06:26 PM

            November 12, 2018

            Peter Zaitsev

            Percona Server for MySQL 5.7.23-24 Is Now Available

Percona announces the release of Percona Server for MySQL 5.7.23-24 on November 12, 2018 (downloads are available here and from the Percona Software Repositories). This release merges changes of MySQL 5.7.23, including all the bug fixes in it. Percona Server for MySQL 5.7.23-24 is now the current GA release in the 5.7 series. All of Percona’s software is open-source and free.

This release introduces InnoDB encryption improvements and merges upstream MyRocks changes. Also, we’ve improved the usage of column families in MyRocks. The InnoDB encryption improvements are of Alpha quality, and we don’t recommend using them in production.

            New Features

            Bugs Fixed

            • #4723: PURGE CHANGED_PAGE_BITMAPS did not work when innodb_data_home_dir was used
            • #4937: rocksdb_update_cf_options was ignored when specified in my.cnf or on the command line
            • #1107: The binlog could be corrupted when tmpdir got full
            • #4834: The encrypted system tablespace could have an empty uuid

            Other bugs fixed

• #4106: “Assertion `log.getting_synced’ failed in rocksdb::DBImpl::MarkLogsSynced(uint64_t, bool, const rocksdb::Status&)”
            • #4930: “main.percona_log_slow_innodb: Result content mismatch”
            • #4811: “5.7 Merge and fixup for old DB-937 introduces possible regression”
            • #4705: “crash on snapshot size check in RocksDB”

            Find the release notes for Percona Server for MySQL 5.7.23-24 in our online documentation. Report bugs in the Jira bug tracker.

            by Borys Belinsky at November 12, 2018 06:52 PM

            MariaDB Foundation

            Running MariaDB in a Docker container

            Virtualisation has been a very popular technique for both development and production systems for many years. It allows multiple software environments to run on the same physical machine. Containerisation takes this idea even further. It allows you to segment your software environment down to the level of individual software packages. This means you can install […]

            The post Running MariaDB in a Docker container appeared first on MariaDB.org.

            by Jonathan Oxer at November 12, 2018 10:48 AM

            November 09, 2018

            MariaDB Foundation

            First MariaDB 10.4 alpha release

            The MariaDB Foundation is pleased to announce the availability of MariaDB 10.4.0, the first alpha release in the new MariaDB 10.4 series. See the release notes and changelogs for details. Download MariaDB 10.4.0 Release Notes Changelog What is MariaDB 10.4? MariaDB APT and YUM Repository Configuration Generator Contributors to MariaDB 10.4.0 Aleksey Midenkov (Tempesta) Alexander […]

            The post First MariaDB 10.4 alpha release appeared first on MariaDB.org.

            by Ian Gilfillan at November 09, 2018 05:57 PM

            November 08, 2018

            Peter Zaitsev

            Oracle Recognizes Percona Fixes in MySQL 8.0


            MySQL 8.0 Code Contributions (Shutterstock)

            An Oracle engineer thanked two Percona engineers by name, along with engineers from Facebook and elsewhere, for their recent MySQL 8.0 source code contributions. Oracle incorporated their work into its latest MySQL production release (8.0.13).

Percona’s Zsolt Parragi authored a patch for a rare replication bug that left locked mutexes in production builds following debug crashes (bug #89421). Yura Sorokin authored a patch to fix wrong file I/O statistics in the MySQL Performance Schema (bug #90264). Percona CTO Vadim Tkachenko cited both patches as examples of Percona’s continuing contributions to the open source community. This has been one of Percona’s core ideals since the company’s founding in 2006.

In the past three years alone, Percona has reported over 600 bugs in the MySQL server. For most of these bug reports, Percona provided Oracle engineers with reproducible test cases, detailed stack traces, and other information appropriate for analyzing and fixing the bug. During that same period, Oracle accepted at least 20 patches authored by Percona engineers into its MySQL code base.

Over its 12-year history, Percona engineers have created numerous open source projects that have won widespread community adoption. These include Percona Server for MySQL, an enhanced version of the flagship MySQL database; Percona XtraDB Cluster, a high availability database solution; Percona Server for MongoDB®, an enhanced fork of the MongoDB® database; Percona XtraBackup, a database backup tool; Percona Toolkit, a suite of utilities for database administrators; and the most recent, Percona Monitoring and Management (PMM), a GUI tool providing visibility into database performance.

            by Tom Basil at November 08, 2018 09:21 PM

            November 07, 2018

            Peter Zaitsev

            Percona Live Europe 2018: What’s Up for Wednesday

            Percona Live Europe Open Source Database Conference PLE 2018

            Welcome to Wednesday at Percona Live Europe 2018! Today is the final day! Check out all of the excellent sessions to attend.

            Please see the important updates below.

            Download the conference App

            If you haven’t already downloaded the app, go to the app store and download the official Percona Live App! You can view the schedule, be alerted for any important updates, create your own personalized schedule, rate the talks and interact with fellow attendees.

            For Apple: Download here
            For Android: Download here

            Rate the talks!

We encourage all attendees to rate the talks they attended. Please take a few moments to do so on the Percona Live App.

            Registration and Badge Pick Up

            Registration is open from 8 am.

            AWS Cloud Track

Join the featured cloud track today, where AWS will be presenting A Deep Dive on Amazon Aurora, Zero to Serverless in 60 Seconds, and Top 10 Mistakes When Migrating From Oracle to PostgreSQL, to name a few! These sessions will run in Wallstreet 2!


            Keynotes

            Keynotes begin promptly at 9:15 am. Please be seated and ready! Arrive early to secure your spot! Keynotes will take place in Dow Jones next to the expo area.

            Expo Opening Hours

            Have you visited the expo area yet? The expo will be open from 8:00 am to 4:30 pm today.

            Conference Slides

            Conference slides and presentations will be available to view after the conference and will be located on the Percona Live Europe website.

            Breaks and Lunch

Coffee Breaks: The morning break is 10:50 am – 11:20 am and the afternoon break 4:10 pm – 4:30 pm (Conference Floor Foyer)
Lunch: 1:10 pm – 2:10 pm. Lunch will be served on the conference floor, and in the Showroom and Gaia restaurant on the lobby level.

            With Thanks to Our Sponsors!

            Percona Live Europe 2018 Sponsors
            We hope you have enjoyed the conference!

            Save the Date!

            Percona Live 2019 will happen in Austin, Texas. Save the dates in your diary for May 28-30, 2019!

            The conference will take place just after Memorial Day at The Hyatt Regency, Austin on the shores of Lady Bird Lake. This is also an ideal central location for those who wish to extend their stay and explore what Austin has to offer! Call for papers, ticket sales and sponsorship opportunities will be announced soon, so stay tuned!

            by Bronwyn Campbell at November 07, 2018 05:14 AM

            November 06, 2018

            MariaDB Foundation

            MariaDB Galera Cluster 10.0.37 now available

            The MariaDB Foundation is pleased to announce the availability of MariaDB Galera Cluster 10.0.37, the latest stable release in the MariaDB Galera Cluster 10.0 series. See the release notes and changelogs for details. Download MariaDB Galera Cluster 10.0.37 Release Notes Changelog What is MariaDB Galera Cluster? Contributors to MariaDB Galera Cluster 10.0.37 Alexander Barkov (MariaDB […]

            The post MariaDB Galera Cluster 10.0.37 now available appeared first on MariaDB.org.

            by Ian Gilfillan at November 06, 2018 03:36 PM

            Peter Zaitsev

            Welcome to Percona Live Europe 2018 Tuesday Keynotes and Sessions!

            Percona Live Europe Open Source Database Conference PLE 2018

            Hello, open source database enthusiasts at Percona Live Europe 2018! There is a lot to see and do today, and we’ve got some of the highlights listed below.

            On Facebook? Go here for some pics that captured the action on Percona Live Europe 2018 Tutorials day (Monday, Nov. 5, 2018). 

            Download the Conference App

We apologize for the confusion with the app yesterday, but we can assure you the schedule and timings have been updated! If you haven’t already downloaded the app, go to the app store and download the official Percona Live App! You can view the schedule, be alerted for any important updates, create your own personalized schedule, rate the talks and interact with fellow attendees.

            For Apple: Download here
            For Android: Download here

Registration and Badge Pick Up

            Registration is open from 8 am. The registration desk is located at the top of the stairs on the first floor of the Radisson Blu Hotel. 

            Keynotes

            Keynotes begin promptly at 9:15 am. Please be seated and ready! Arrive early to secure your spot! Keynotes will take place in Dow Jones next to the expo area. 

            Community Networking Reception

            Join the Open Source community on Tuesday evening at Chicago Meatpackers (Riverside), Frankfurt!

            This is a great opportunity to socialize and network with Percona Live Attendees and Other Open Source Enthusiasts who’d like to come along too!

            This is not a ticketed event or an official event of Percona Live Europe, simply an open invitation with a place to congregate for food and drinks! An A La Carte food menu and cash bar will be available.

Expo Opening Hours

            The expo will be open from 8:00 am to 4:30 pm today. 

            Breaks & Lunch

Coffee Breaks: Sponsored by Facebook! The morning break is 10:50 am – 11:20 am and the afternoon break 4:10 pm – 4:30 pm (Conference Floor Foyer)
Lunch: 1:10 pm – 2:10 pm. Lunch will be served on the conference floor, and in the Showroom and Gaia restaurant on the lobby level.

            With thanks to our Sponsors!

            Percona Live Europe 2018 Sponsors

            Enjoy the conference!

            by Bronwyn Campbell at November 06, 2018 08:21 AM

            November 05, 2018

            Peter Zaitsev

            How to Quickly Add a Node to an InnoDB Cluster or Group Replication


            Quickly Add a Node to an InnoDB Cluster or Group Replication (Shutterstock)

            In this blog, we’ll look at how to quickly add a node to an InnoDB Cluster or Group Replication using Percona XtraBackup.

Adding nodes to a Group Replication cluster can be easy (documented here), but it only works if the existing nodes have retained all the binary logs since the creation of the cluster. Obviously, this is possible if you create a new cluster from scratch. The nodes rotate old logs after some time, however. Technically, if the gtid_purged set is non-empty, it means you will need another method to add a new node to the cluster. You also need a different method if data becomes inconsistent across cluster nodes for any reason. For example, you might hit something similar to this bug, or fall prey to human error.
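A quick way to check whether the existing nodes still hold the complete binary log history is to compare the GTID sets on a prospective donor; a minimal sketch (any non-empty gtid_purged means some transactions are no longer available in the binary logs, so the method below is needed):

SELECT @@GLOBAL.gtid_executed AS executed, @@GLOBAL.gtid_purged AS purged;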

            Hot Backup to the Rescue

The quick and simple method I’ll present here requires the Percona XtraBackup tool to be installed, as well as some additional small tools for convenience. I tested my example on CentOS 7, but it works similarly on other Linux distributions. First of all, you will need the Percona repository installed:

            # yum install http://www.percona.com/downloads/percona-release/redhat/0.1-6/percona-release-0.1-6.noarch.rpm -y -q

Then, install Percona XtraBackup and the additional tools. You might need to enable the EPEL repo for the additional tools, and the experimental Percona repo for XtraBackup 8.0, which works with MySQL 8.0. (Note: XtraBackup 8.0 was not yet GA when writing this article, and we do NOT recommend or advise that you install XtraBackup 8.0 into a production environment until it is GA). For MySQL 5.7, XtraBackup 2.4 from the regular repo is what you are looking for:

            # grep -A3 percona-experimental-\$basearch /etc/yum.repos.d/percona-release.repo
            [percona-experimental-$basearch]
            name = Percona-Experimental YUM repository - $basearch
            baseurl = http://repo.percona.com/experimental/$releasever/RPMS/$basearch
            enabled = 1

            # yum install pv pigz nmap-ncat percona-xtrabackup-80 -q
            ==============================================================================================================================================
             Package                             Arch                 Version                             Repository                                 Size
            ==============================================================================================================================================
            Installing:
             nmap-ncat                           x86_64               2:6.40-13.el7                       base                                      205 k
             percona-xtrabackup-80               x86_64               8.0.1-2.alpha2.el7                  percona-experimental-x86_64                13 M
             pigz                                x86_64               2.3.4-1.el7                         epel                                       81 k
             pv                                  x86_64               1.4.6-1.el7                         epel                                       47 k
            Installing for dependencies:
             perl-DBD-MySQL                      x86_64               4.023-6.el7                         base                                      140 k
            Transaction Summary
            ==============================================================================================================================================
            Install  4 Packages (+1 Dependent package)
            Is this ok [y/d/N]: y
            #

You need to do it on both the source and destination nodes. Now, my existing cluster node (I will call it the donor), gr01, looks like this:

            gr01 > select * from performance_schema.replication_group_members\G
            *************************** 1. row ***************************
              CHANNEL_NAME: group_replication_applier
                 MEMBER_ID: 76df8268-c95e-11e8-b55d-525400cae48b
               MEMBER_HOST: gr01
               MEMBER_PORT: 3306
              MEMBER_STATE: ONLINE
               MEMBER_ROLE: PRIMARY
            MEMBER_VERSION: 8.0.13
            1 row in set (0.00 sec)
            gr01 > show global variables like 'gtid%';
            +----------------------------------+-----------------------------------------------+
            | Variable_name                    | Value                                         |
            +----------------------------------+-----------------------------------------------+
            | gtid_executed                    | aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa:1-302662 |
            | gtid_executed_compression_period | 1000                                          |
            | gtid_mode                        | ON                                            |
            | gtid_owned                       |                                               |
            | gtid_purged                      | aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa:1-295538 |
            +----------------------------------+-----------------------------------------------+
            5 rows in set (0.01 sec)

The new node candidate (I will call it the joiner), gr02, has no data but the same MySQL version installed. It also has the required settings in place, like the existing node address in group_replication_group_seeds, etc. The next step is to stop the MySQL service on the joiner (if already running), and wipe out its datadir:

            [root@gr02 ~]# rm -fr /var/lib/mysql/*

and start the “listener” process, which waits to receive the data snapshot (remember to open the TCP port if you have a firewall):

            [root@gr02 ~]# nc -l -p 4444 |pv| unpigz -c | xbstream -x -C /var/lib/mysql

            Then, start the backup job on the donor:

            [root@gr01 ~]# xtrabackup --user=root --password=*** --backup --parallel=4 --stream=xbstream --target-dir=./ 2> backup.log |pv|pigz -c --fast| nc -w 2 192.168.56.98 4444
            240MiB 0:00:02 [81.4MiB/s] [ <=>

            On the joiner side, we will see:

            [root@gr02 ~]# nc -l -p 4444 |pv| unpigz -c | xbstream -x -C /var/lib/mysql
            21.2MiB 0:03:30 [ 103kiB/s] [ <=> ]
            [root@gr02 ~]# du -hs /var/lib/mysql
            241M /var/lib/mysql

BTW, if you noticed the difference in transfer rate between the two, please note that on the donor side I put |pv| before the compressor, while on the joiner it is before the decompressor. This way, I can monitor the compression ratio at the same time!

The next step will be to prepare the backup on the joiner:

            [root@gr02 ~]# xtrabackup --use-memory=1G --prepare --target-dir=/var/lib/mysql 2>prepare.log
            [root@gr02 ~]# tail -1 prepare.log
            181019 19:18:56 completed OK!

and fix the file ownership:

            [root@gr02 ~]# chown -R mysql:mysql /var/lib/mysql

Now we should verify the GTID position information and restart the joiner (I have group_replication_start_on_boot=off in my.cnf):

            [root@gr02 ~]# cat /var/lib/mysql/xtrabackup_binlog_info
            binlog.000023 893 aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa:1-302662
            [root@gr02 ~]# systemctl restart mysqld

            Now, let’s check if the position reported by the node is consistent with the above:

            gr02 > select @@GLOBAL.gtid_executed;
            +-----------------------------------------------+
            | @@GLOBAL.gtid_executed                        |
            +-----------------------------------------------+
            | aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa:1-302660 |
            +-----------------------------------------------+
            1 row in set (0.00 sec)

            No, it is not. We have to correct it:

            gr02 > reset master; set global gtid_purged="aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa:1-302662";
            Query OK, 0 rows affected (0.05 sec)
            Query OK, 0 rows affected (0.00 sec)

            Finally, start the replication:

            gr02 > START GROUP_REPLICATION;
            Query OK, 0 rows affected (3.91 sec)

            Let’s check the cluster status again:

            gr01 > select * from performance_schema.replication_group_members\G
            *************************** 1. row ***************************
            CHANNEL_NAME: group_replication_applier
            MEMBER_ID: 76df8268-c95e-11e8-b55d-525400cae48b
            MEMBER_HOST: gr01
            MEMBER_PORT: 3306
            MEMBER_STATE: ONLINE
            MEMBER_ROLE: PRIMARY
            MEMBER_VERSION: 8.0.13
            *************************** 2. row ***************************
            CHANNEL_NAME: group_replication_applier
            MEMBER_ID: a60a4124-d3d4-11e8-8ef2-525400cae48b
            MEMBER_HOST: gr02
            MEMBER_PORT: 3306
            MEMBER_STATE: ONLINE
            MEMBER_ROLE: SECONDARY
            MEMBER_VERSION: 8.0.13
            2 rows in set (0.00 sec)
            gr01 > select * from performance_schema.replication_group_member_stats\G
            *************************** 1. row ***************************
                                          CHANNEL_NAME: group_replication_applier
                                               VIEW_ID: 15399708149765074:4
                                             MEMBER_ID: 76df8268-c95e-11e8-b55d-525400cae48b
                           COUNT_TRANSACTIONS_IN_QUEUE: 0
                            COUNT_TRANSACTIONS_CHECKED: 3
                              COUNT_CONFLICTS_DETECTED: 0
                    COUNT_TRANSACTIONS_ROWS_VALIDATING: 0
                    TRANSACTIONS_COMMITTED_ALL_MEMBERS: aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa:1-302666
                        LAST_CONFLICT_FREE_TRANSACTION: aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa:302665
            COUNT_TRANSACTIONS_REMOTE_IN_APPLIER_QUEUE: 0
                     COUNT_TRANSACTIONS_REMOTE_APPLIED: 2
                     COUNT_TRANSACTIONS_LOCAL_PROPOSED: 3
                     COUNT_TRANSACTIONS_LOCAL_ROLLBACK: 0
            *************************** 2. row ***************************
                                          CHANNEL_NAME: group_replication_applier
                                               VIEW_ID: 15399708149765074:4
                                             MEMBER_ID: a60a4124-d3d4-11e8-8ef2-525400cae48b
                           COUNT_TRANSACTIONS_IN_QUEUE: 0
                            COUNT_TRANSACTIONS_CHECKED: 0
                              COUNT_CONFLICTS_DETECTED: 0
                    COUNT_TRANSACTIONS_ROWS_VALIDATING: 0
                    TRANSACTIONS_COMMITTED_ALL_MEMBERS: aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa:1-302666
                        LAST_CONFLICT_FREE_TRANSACTION:
            COUNT_TRANSACTIONS_REMOTE_IN_APPLIER_QUEUE: 0
                     COUNT_TRANSACTIONS_REMOTE_APPLIED: 0
                     COUNT_TRANSACTIONS_LOCAL_PROPOSED: 0
                     COUNT_TRANSACTIONS_LOCAL_ROLLBACK: 0
            2 rows in set (0.00 sec)

OK, our cluster is consistent! The new node joined successfully as a secondary. We can proceed to add more nodes!

            by Przemysław Malkowski at November 05, 2018 07:46 PM

            November 04, 2018

            Valeriy Kravchuk

            On New Severity Levels for MySQL Bugs

Four weeks ago, while working on a blog post about the half-baked XA transactions feature of the MySQL server, I noted that Oracle has added new severity levels for MySQL bug reports. Previously we had 5 levels:

            • S1 (Critical) - mostly for all kinds of crashes, DoS attack vectors, data corruptions etc
            • S2 (Serious) - mostly for wrong results bugs, broken replication etc
            • S3 (Non-critical) - all kinds of minor but annoying bugs, from unexpected results in some corner cases to misleading or wrong error messages, inefficient or unclear code etc
            • S4 (Feature requests) - anything that should work or be implemented based on common sense, but is not documented in the manual and was not required by the original specification or implementation of some feature.
• S5 (Performance) - everything works as expected and documented, but the resulting performance is bad or less than expected. Something does not scale well, doesn't return results fast enough in some cases, or could be made faster on some specific platform using different code or a different library. This severity level was probably also added in the Oracle era; at least it was not there in 2005 when I started to work on MySQL bugs.

The informal descriptions above are mine and may be incorrect or different from the definitions Oracle engineers currently use. I tried to search for Oracle definitions that apply to MySQL, but was not able to find anything immediately useful (any help with a public URL is appreciated).

In general, severity is defined as the degree of impact a bug has on the operation or use of some software, so lower severity implies less impact on common MySQL operations. One may also expect that bugs with higher severity are fixed first (have higher internal priority). It may not be that simple (and was not during my days at MySQL, when many more inputs were taken into account while setting the priority for a bug fix), but it's a valid assumption for any community member.

By default, when searching for bugs, you get all bugs of severity levels S1, S2, S3 and S5. You have to take specific care to get feature requests in search results while using the bugs database search interface.

            If you try to search bugs today, you'll see two more severity levels added, S6 (Debug Builds) and S7 (Test Cases):

            Now we have 7 Severity levels for MySQL bug reports
The S6 severity level seems to be used for assertion failures and other bugs that affect only debug builds and cannot be reproduced literally with non-debug binaries. The S7 severity level is probably used for bug reports about failing MTR test cases, where the failure does NOT show a regression in the MySQL software, but rather some non-determinism, platform dependency, timing assumption, or other defect of the test case itself.

By default, bug reports with these severity levels are NOT included in search results (they are not considered "Production Bugs"), so one has to take care to see them. This, as well as the common-sense assumption that lower severity eventually means lower priority for the fix, caused some concerns. It would be great for somebody from Oracle to explain the intended use of, and reasons for introducing, these severity levels with more text than a single tweet, to clarify possible FUD people may have. If applied formally, these new severity values may lead to low priority for quite important problems. Most debug assertions are in the code for a really good reason, as many weird things (up to crashes and data corruption) may happen in non-debug binaries somewhere later in cases where a debug-only assertion fails.

I was surprised to find out that at the moment we have 67 active S6 bug reports and 32 active S7 bug reports. The latter list obviously includes reports that should not be S7, like Bug #92274 - "Question of mysql 8.0 sysbench oltp_point_select test", which is obviously about a performance regression noted in MySQL 8 (vs MySQL 5.7) by the bug reporter.

Any comments from Oracle colleagues on the reasons for introducing the new severity levels, their formal definitions, and their impact on the processing of community bug reports are greatly appreciated.

            by Valeriy Kravchuk (noreply@blogger.com) at November 04, 2018 04:05 PM

            November 02, 2018

            MariaDB Foundation

            MariaDB 10.1.37 now available

            The MariaDB Foundation is pleased to announce the availability of MariaDB 10.1.37, the latest stable release in the MariaDB 10.1 series. See the release notes and changelogs for details. Download MariaDB 10.1.37 Release Notes Changelog What is MariaDB 10.1? MariaDB APT and YUM Repository Configuration Generator Contributors to MariaDB 10.1.37 Alexander Barkov (MariaDB Corporation) Alexey […]

            The post MariaDB 10.1.37 now available appeared first on MariaDB.org.

            by Ian Gilfillan at November 02, 2018 04:37 PM

            Peter Zaitsev

            Maintenance Windows in the Cloud

Recently, I’ve been working with a customer to evaluate the different cloud solutions for MySQL. In this post I am going to focus on maintenance windows and requirements, and what the different cloud platforms offer.

            Why is this important at all?

            Maintenance windows are required so that the cloud provider can do the necessary updates, patches, and changes to our setup. But there are many questions like:

            • Is this going to impact our production traffic?
            • Is this going to cause any downtime?
            • How long does it take?
            • Any way to avoid it?

Let’s discuss the three most popular cloud providers: AWS, Google, and Microsoft. Each of these has a MySQL-based database service where we can compare the maintenance settings.

            AWS

When you create an instance, you can define your maintenance window. This is a 30-minute block during which AWS can update and restart your instances, but it might take more time; AWS does not guarantee that the update will be done within those 30 minutes. There are two different types of updates: Required and Available.

            If you defer a required update, you receive a notice from Amazon RDS indicating when the update will be performed. Other updates are marked as available, and these you can defer indefinitely.

It is even possible to disable auto-upgrade for minor versions, in which case you can decide when you want to do the maintenance.

AWS separates OS updates and database engine updates.

            OS Updates

These require some downtime, but you can minimise it by using Multi-AZ deployments. First, the secondary instance is updated. Then AWS does a failover and updates the primary instance as well. This means a short outage during the failover.

            DB Engine Updates

            For DB maintenance, the updates are applied to both instances (primary and secondary) at the same time. That will cause some downtime.

            More information: https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/USER_UpgradeDBInstance.Maintenance.html#USER_UpgradeDBInstance.Maintenance.Multi-AZ

            Google CloudSQL

With CloudSQL, you have to define an hour-long maintenance window, for example 01:00–02:00, and within that hour they can restart the instances at any time. It is not guaranteed that the update will be done within that hour. The primary and the secondary have the same maintenance window. The read replicas do not have any maintenance window; they can be stopped at any time.

CloudSQL does not differentiate between OS and DB engine updates, or between required and available upgrades. Because the failover replica has the same maintenance window, any upgrade might cause a database outage in that time frame.

            More information: https://cloud.google.com/sql/docs/mysql/instance-settings

            Microsoft Azure

Azure provides a service called Azure Database for MySQL. I read the documentation and did some research trying to find anything regarding maintenance windows, but I did not find anything.

I spun up an instance in Azure to see if there are any relevant settings available, but I did not find anything, so at this point I do not know how Azure does OS or DB maintenance, or how that impacts production traffic.

If someone knows where I can find this information in the documentation, please let me know.

            Conclusion

                                      AWS        CloudSQL   Azure
Maintenance Window                    30m        1h         Unknown
Maintenance Window for Read Replicas  No         No         Unknown
Separate OS and DB updates            Yes        No         Unknown
Outage during update                  Possible   Possible   Unknown
Postpone an update                    Possible   No         Unknown
Different priority for updates        Yes        No         Unknown

             

While I do not intend to favor or promote any of the providers, for this specific question AWS offers the most options and controls for how we want to deal with maintenance.


            Photo by Caitlin Oriel on Unsplash

            by Tibor Korocz at November 02, 2018 01:02 PM

            Jean-Jerome Schmidt

            Effective Monitoring of MySQL With SCUMM Dashboards - Part 3

In our previous blogs we discussed the MySQL-related dashboards, highlighting the things a DBA can benefit from by studying the graphs, especially when performing daily routines such as diagnostics, metric reporting, and capacity planning. In this blog, we will discuss the InnoDB Metrics and MySQL Performance Schema dashboards, which are very important, especially for monitoring InnoDB transactions, disk/CPU/memory I/O, optimizing your queries, or performance tuning of the server.

This blog only touches upon the deep topic of performance: InnoDB alone would require extensive coverage if we tackled its internals, and the Performance Schema is also extensive, as it covers the kernel and core parts of MySQL and its storage engines.

            Let’s begin walking through the graphs.

            MySQL InnoDB Metrics

This dashboard is great for any MySQL DBA or ops person, as it offers a very good view into the InnoDB storage engine. There are certain graphs here that a user has to enable explicitly, because the required variables are not always set in the MySQL configuration.

            • Innodb Checkpoint Age

According to the manual, checkpointing is defined as follows: “As changes are made to data pages that are cached in the buffer pool, those changes are written to the data files sometime later, a process known as flushing. The checkpoint is a record of the latest changes (represented by an LSN value) that have been successfully written to the data files”. This graph is useful when you would like to determine how your server is performing when checkpointing data to your disk. It can be a good reference if your transaction log (redo log or ib_logfile0) is too large. This graph is also a good indicator of whether you need to adjust variables such as innodb_log_file_size, innodb_log_buffer_size, innodb_max_dirty_pages_pct, or innodb_adaptive_flushing_method. The closer the checkpoint age is to the max checkpoint age, the more filled the logs are, and InnoDB will be doing more I/O in order to maintain some free space in the logs. The checkpointing mechanism differs in subtle details between Percona XtraDB-based flavours, MariaDB, and Oracle’s version; you can also find differences in its implementation between MySQL versions.

            • InnoDB Transactions

Whenever there’s a large transaction ongoing on your MySQL server, this graph is a good reference. It counts the transactions that were created at a specific time, and the history length (which is actually the history list length found in SHOW ENGINE INNODB STATUS) is the number of pages in the undo log. The trends you’ll see here are a good resource to check whether, for example, purge is delayed due to a very high insert rate when reloading data, or due to a long-running transaction, or whether purge simply can’t keep up due to high disk I/O on the volume where your $DATADIR resides.
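To cross-check this graph against the server itself, the history list length is also exposed as an InnoDB metric; a minimal sketch, assuming the trx_rseg_history_len counter is enabled (it is collected by default):

SELECT NAME, COUNT
FROM information_schema.INNODB_METRICS
WHERE NAME = 'trx_rseg_history_len';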

            • Innodb Row Operations

For certain DBA tasks, you might want to determine the number of rows deleted, inserted, read, and updated. This graph is what you can use to check that.

            • Innodb Row Lock Time

This graph is a good resource to look at when you notice that your application is encountering lots of occurrences of “Lock wait timeout exceeded; try restarting transaction”. It can also help you determine whether you have queries that handle locks badly, and it is a good reference when optimizing queries that involve locking of rows. If the wait time is too high, you need to check the slow query log or run pt-query-digest and see which suspect queries are causing that bloat in the graph.
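As a quick server-side cross-check of what the graph shows, the row lock counters are also available as status variables; a minimal sketch:

SHOW GLOBAL STATUS LIKE 'Innodb_row_lock%';

This returns Innodb_row_lock_time, Innodb_row_lock_time_avg, Innodb_row_lock_time_max and Innodb_row_lock_waits, which map directly onto the trends plotted here.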

            • InnoDB I/O

Whenever you want to determine the amount of InnoDB data reads, disk flushes, writes, and log writes, this graph has what you need to look at. You can use it to determine whether your InnoDB variables are well tuned to handle your specific requirements. For example, if you have a battery-backed write cache but are not gaining much of its optimum performance, you can rely on this graph to determine whether your fsync() calls are higher than expected. In that case, changing the variable innodb_flush_method to O_DSYNC could resolve the issue.

            • InnoDB Log File Usage Hourly

              This graph shows only the number of bytes written to the InnoDB redo log files and the growth of your InnoDB log files based on the 24-hour time range of the current date.

            • InnoDB Logging Performance

This graph is closely related to the InnoDB Log File Usage Hourly graph. Use it whenever you need to determine how large your innodb_log_file_size needs to be. You can determine the number of bytes written to the InnoDB redo log files and how efficiently your MySQL flushes data from memory to disk. Whenever your redo log space fills up too quickly, it indicates that you have to increase innodb_log_file_size, and this graph will tell you that. However, to dig deeper into how much innodb_log_file_size you need, it might make more sense to check the LSN (Log Sequence Number) in SHOW ENGINE INNODB STATUS. Percona has a good blog post related to this which is a good source to look at.
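A common back-of-the-envelope check outside the dashboard is to sample how many bytes are written to the redo log per minute; a minimal sketch:

-- Bytes written to the redo log since startup; sample twice, 60 seconds
-- apart, and the delta gives the redo write rate per minute.
SHOW GLOBAL STATUS LIKE 'Innodb_os_log_written';
SELECT SLEEP(60);
SHOW GLOBAL STATUS LIKE 'Innodb_os_log_written';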

            • InnoDB Deadlocks

In situations where your application clients are often experiencing deadlocks, or you have to look at how many deadlocks your MySQL server is experiencing, this graph serves the purpose. Deadlocks usually indicate poor SQL or transaction design, which leads to transactions racing each other for the same locks.

            • Index Condition Pushdown

A little word of caution when looking at this graph. First, you have to make sure that your MySQL global variable innodb_monitor_enable includes the module_icp counters. Otherwise, you’ll see “No Data Points” in the graph; enabling it is sketched below:
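Enabling the counters is a one-liner; a minimal sketch (module_icp turns on the icp_* counters exposed through INNODB_METRICS):

SET GLOBAL innodb_monitor_enable = 'module_icp';
-- Verify that the ICP counters are now collecting:
SELECT NAME, STATUS
FROM information_schema.INNODB_METRICS
WHERE NAME LIKE 'icp%';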

The graph’s purpose, if it has data points like those in my sample outputs, is to provide a DBA with an overview of how well queries are benefiting from Index Condition Pushdown, or ICP for short. ICP is a great feature in MySQL that optimizes your queries. Instead of MySQL reading the full rows matching your WHERE clause upon retrieval, it pushes additional checks down to the secondary index. This adds more granularity and saves time, since otherwise the engine would have to read the full table rows rather than filtering on the index alone. This avoids reading the full rows corresponding to the index tuples that match your secondary indexes.

Let me elaborate a bit on this graph. Let’s say I have a table:

              mysql> show create table a\G
              *************************** 1. row ***************************
                     Table: a
              Create Table: CREATE TABLE `a` (
                `id` int(11) NOT NULL,
                `age` int(11) NOT NULL,
                KEY `id` (`id`)
              ) ENGINE=InnoDB DEFAULT CHARSET=latin1
              1 row in set (0.00 sec)

And it has some small data:

              mysql> select * from a;
              +----+-----+
              | id | age |
              +----+-----+
              |  1 |   1 |
              |  2 |   1 |
              |  3 |   1 |
              |  3 |  41 |
              |  4 |  41 |
              |  5 |   4 |
              |  4 |   4 |
              |  4 |   4 |
              +----+-----+
              8 rows in set (0.00 sec)

When ICP is enabled, the result is more efficient:

              mysql> explain extended select * from a where id>2 and id<4 and age=41;
              +----+-------------+-------+------------+-------+---------------+------+---------+------+------+----------+------------------------------------+
              | id | select_type | table | partitions | type  | possible_keys | key  | key_len | ref  | rows | filtered | Extra                              |
              +----+-------------+-------+------------+-------+---------------+------+---------+------+------+----------+------------------------------------+
              |  1 | SIMPLE      | a     | NULL       | range | id            | id   | 4       | NULL |    2 |    12.50 | Using index condition; Using where |
              +----+-------------+-------+------------+-------+---------------+------+---------+------+------+----------+------------------------------------+
              1 row in set, 2 warnings (0.00 sec)

than without ICP:

              mysql> set optimizer_switch='index_condition_pushdown=off';
              Query OK, 0 rows affected (0.00 sec)
              
              mysql> explain extended select * from a where id>2 and id<4 and age=41;
              +----+-------------+-------+------------+-------+---------------+------+---------+------+------+----------+-------------+
              | id | select_type | table | partitions | type  | possible_keys | key  | key_len | ref  | rows | filtered | Extra       |
              +----+-------------+-------+------------+-------+---------------+------+---------+------+------+----------+-------------+
              |  1 | SIMPLE      | a     | NULL       | range | id            | id   | 4       | NULL |    2 |    12.50 | Using where |
              +----+-------------+-------+------------+-------+---------------+------+---------+------+------+----------+-------------+
              1 row in set, 2 warnings (0.00 sec)

              This is a simple example of ICP, and how this graph can benefit a DBA.

            • InnoDB Buffer Pool Content

When working with MySQL and the InnoDB engine, the buffer pool variables (innodb_buffer_pool*) are among the most common values you have to tune to optimize MySQL performance. Speaking specifically of the buffer pool content, this graph displays the trends for dirty pages against the total buffer pool content. The total buffer pool content includes clean pages as well as dirty pages. For determining how efficiently your MySQL is handling the buffer pool, this graph serves its purpose.

            • InnoDB Buffer Pool Pages

This graph is helpful when you want to check how efficiently MySQL is using your InnoDB buffer pool. For instance, if your daily traffic doesn’t fill up the assigned innodb_buffer_pool_size, this could indicate that certain parts of an application aren’t useful, or that you have set innodb_buffer_pool_size too high, in which case it might be good to lower the value and reclaim memory.

            • InnoDB Buffer Pool I/O

Use this graph when you have to check the number of pages created and written on InnoDB tables, or page reads into the InnoDB buffer pool by operations on InnoDB tables.

            • InnoDB Buffer Pool Requests

When you want to determine how efficiently your queries are accessing the InnoDB buffer pool, this graph serves the purpose. It shows trends for how your MySQL server performs when the InnoDB engine has to frequently access the disk (an indication that the buffer pool has not warmed up yet), and how frequently buffer pool requests handle reads and writes.

            • InnoDB Read-Ahead

When you have the variable innodb_random_read_ahead set to ON, add this graph as a valuable trend to look at as part of your DBA routine. It shows the trends in how your MySQL InnoDB storage engine manages the buffer pool through the read-ahead background thread, how it manages pages that are subsequently evicted without having been accessed by queries, and how InnoDB initiates random read-ahead when a query scans a large portion of a table in random order.

            • InnoDB Change Buffer

When you have Percona Server 5.7 running, this graph is useful for monitoring how well InnoDB has allocated change buffering. These changes include inserts, updates, and deletes, as specified by the innodb_change_buffering variable. Change buffering helps speed up queries by avoiding the substantial random access I/O that would be required to read in secondary index pages from disk.

            • InnoDB Change Buffer Activity

This is related to the InnoDB Change Buffer graph, but dissects the information into more viable data points. These provide more information for monitoring how InnoDB handles change buffering. This is useful in the particular DBA task of determining whether your innodb_change_buffer_max_size is set too high, since the change buffer shares memory with the InnoDB buffer pool, reducing the memory available to cache data pages. You might have to consider disabling change buffering if the working set almost fits in the buffer pool, or if your tables have relatively few secondary indexes. Remember that change buffering does not impose extra overhead, because it only applies to pages that are not in the buffer pool. This graph is also useful for determining how useful merges are when you benchmark your application for particular scenarios. Say you have bulk inserts: you can set innodb_change_buffering=inserts and determine whether the values set for your buffer pool and innodb_change_buffer_max_size impact disk I/O, especially during recovery or slow shutdown (necessary if you want to do a failover with a low downtime requirement). This graph can also serve to evaluate such scenarios, since merging of the change buffer may take several hours when there are numerous secondary indexes to update and many affected rows. During this time, disk I/O is increased, which can cause a significant slowdown for disk-bound queries. A sketch of these settings follows.
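As a sketch of the bulk-insert scenario above (both variables are dynamic; the “INSERT BUFFER AND ADAPTIVE HASH INDEX” section of SHOW ENGINE INNODB STATUS reports the change buffer size and merge activity):

-- Buffer only inserts, and keep the change buffer at the default
-- 25% of the buffer pool:
SET GLOBAL innodb_change_buffering = 'inserts';
SET GLOBAL innodb_change_buffer_max_size = 25;
SHOW ENGINE INNODB STATUS\G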

            MySQL Performance Schema

The MySQL Performance Schema is a complicated topic. It’s a long and hard one, but I’m going to discuss only information that is specific to the graphs we have in SCUMM. There are certain variables as well that you must consider, and ensure they are set properly. Ensure that you have the variables innodb_monitor_enable = all and userstat = 1 set so that you see data points in your graphs; a sketch follows below. As a note, when I use the word “event” here, it does not mean that it is related to the MySQL Event Scheduler. I’m talking about specific events such as MySQL parsing a query, or MySQL reading or writing to a relay/binary log file.
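A minimal sketch of those prerequisites (note that userstat is specific to Percona Server and MariaDB; it does not exist in upstream MySQL):

SET GLOBAL innodb_monitor_enable = 'all';
SET GLOBAL userstat = 1; -- Percona Server / MariaDB only

To make these settings survive a restart, put the same values in the [mysqld] section of my.cnf.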

            Let’s proceed with the graphs then.

            • Performance Schema File IO (Events)

This graph fetches data points related to any events that occurred in MySQL which might have been instrumented to create multiple instances of the instrumented object (e.g., binary log reads or InnoDB data file reads). Each row summarizes events for a given event name. For example, if there is an instrument for a mutex that is created for each connection, then there are as many instances of this instrumented event as there are connections, and the summary row for the instrument summarizes over all these instances. You can check the Performance Schema Summary Tables section of the MySQL manual for more info.

            • Performance Schema File IO (Load)

This graph is the same as the “Performance Schema File IO (Events)” graph, except that it is instrumented based on the load.

            • Performance Schema File IO (Bytes)

This graph is the same as the “Performance Schema File IO (Events)” graph, except that it is instrumented based on the size in bytes: for example, how many bytes were involved when MySQL triggered the wait/io/file/innodb/innodb_data_file event.

            • Performance Schema Waits (Events)

This graph shows the data for all waits spent on a specific event. You can check the Wait Event Summary Tables in the manual for more info.

            • Performance Schema Waits (Load)

              Same as the “Performance Schema Waits (Events)” graph but this time it shows the trends for the load.

            • Index Access Operations (Load)

This graph is an aggregation of all the table index I/O wait events grouped by the index(es) of a table, as generated by the wait/io/table/sql/handler instrument. You can check the MySQL manual on the Performance Schema table table_io_waits_summary_by_index_usage for more info; sample queries against these summary tables are sketched after this list.

            • Table Access Operations (Load)

Same as the “Index Access Operations (Load)” graph, this is an aggregation of all table I/O wait events grouped by table, as generated by the wait/io/table/sql/handler instrument. This is very useful to DBAs: for example, you might want to trace how long it takes to access (fetch) or modify (insert, delete, update) a specific table. You can check the MySQL manual on the Performance Schema table table_io_waits_summary_by_table for more info.

            • Performance Schema SQL & External Locks (Events)

This graph is an aggregation (a count of how many times each occurred) of all table lock wait events, as generated by the wait/lock/table/sql/handler instrument, grouped by table. The SQL locks in this graph are the internal locks: read normal, read with shared locks, read high priority, read no insert, write allow write, write concurrent insert, write delayed, write low priority, and write normal. The external locks are read external and write external. In any DBA task, this is very useful if you have to trace and investigate locks on a particular table, regardless of their type. You can check the table table_lock_waits_summary_by_table for more info.

            • Performance Schema SQL and External Locks (Seconds)

Same as the “Performance Schema SQL & External Locks (Events)” graph, but specified in seconds. If you want to look at your table locks based on the number of seconds they were held, then this graph is a good resource.
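Outside of SCUMM, you can query the same Performance Schema summary tables directly; a minimal sketch against the tables referenced above (column names as documented in the MySQL manual):

-- Indexes with the most I/O wait time:
SELECT OBJECT_SCHEMA, OBJECT_NAME, INDEX_NAME, COUNT_STAR, SUM_TIMER_WAIT
FROM performance_schema.table_io_waits_summary_by_index_usage
ORDER BY SUM_TIMER_WAIT DESC LIMIT 10;

-- Tables with the most lock wait time:
SELECT OBJECT_SCHEMA, OBJECT_NAME, COUNT_STAR, SUM_TIMER_WAIT
FROM performance_schema.table_lock_waits_summary_by_table
ORDER BY SUM_TIMER_WAIT DESC LIMIT 10;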

            Conclusion

The InnoDB Metrics and the MySQL Performance Schema are some of the most in-depth and complicated parts of the MySQL domain, especially when there is no visualization to assist the interpretation. Manual tracing and investigation can take a lot of time and hard work. SCUMM dashboards offer a very efficient and feasible way to handle these tasks and lower the extra load of any routine DBA task.

            In this blog, you learnt how to use the dashboards for InnoDB and Performance Schema to improve database performance. These dashboards can make you more efficient at analyzing performance.

            by Paul Namuag at November 02, 2018 08:25 AM

            Peter Zaitsev

            Percona Monitoring and Management (PMM) 1.16.0 Is Now Available

            Percona Monitoring and Management

            PMM (Percona Monitoring and Management) is a free and open-source platform for managing and monitoring MySQL, MongoDB, and PostgreSQL performance. You can run PMM in your own environment for maximum security and reliability. It provides thorough time-based analysis for MySQL® and MongoDB® servers to ensure that your data works as efficiently as possible.

            Percona Monitoring and Management

While much of the team is working on longer-term projects, we were able to provide the following features:

            • MySQL and PostgreSQL support for all cloud DBaaS providers – Use PMM Server to gather Metrics and Queries from remote instances!
            • Query Analytics + Metric Series – See Database activity alongside queries
            • Collect local metrics using node_exporter + textfile collector

            We addressed 11 new features and improvements, and fixed 21 bugs.

            MySQL and PostgreSQL support for all cloud DBaaS providers

You’re now able to connect PMM Server to your MySQL and PostgreSQL instances, whether they run in a cloud DBaaS environment or you simply want database metrics without the OS metrics. This can help you get up and running with PMM using minimal configuration and zero client installation. However, be aware of the limitations: no host-level dashboards will be populated for these nodes, since we don’t attempt to connect to the provider’s API, nor are we granted access to the instance in order to deploy an exporter.

            How to use

            Using the PMM Add Instance screen, you can now add instances from any cloud provider (AWS RDS and Aurora, Google Cloud SQL for MySQL, Azure Database for MySQL) and benefit from the same dashboards that you are already accustomed to. You’ll be able to collect Metrics and Queries from MySQL, and Metrics from PostgreSQL.  You can add remote instances by selecting the PMM Add Instance item in a PMM group of the system menu:

            https://github.com/percona/pmm/blob/679471210d476a5e98d26a632318f1680cfd98a2/doc/source/.res/graphics/png/metrics-monitor.menu.pmm1.png?raw=true

            where you will then have the opportunity to add a Remote MySQL or Remote PostgreSQL instance:

            You’ll add the instance by supplying just the Hostname, database Username and Password (and optional Port and Name):

[Image: metrics-monitor.add-remote-mysql-instance.png]

Also new in this release is the ability to display the nodes you’ve added on the RDS and Remote Instances screen:

[Image: metrics-monitor.add-rds-or-remote-instance1.png]

            Server activity metrics in the PMM Query Analytics dashboard

            The Query Analytics dashboard now shows a summary of the selected host and database activity metrics in addition to the top ten queries listed in a summary table.  This brings a view of System Activity (CPU, Disk, and Network) and Database Server Activity (Connections, Queries per Second, and Threads Running) to help you better pinpoint query pileups and other bottlenecks:

            https://raw.githubusercontent.com/percona/pmm/86e4215a58e788a8ec7cb1ebe679e1593c484078/doc/source/.res/graphics/png/query-analytics.png

            Extending metrics with node_exporter textfile collector

While PMM provides an excellent solution for system monitoring, sometimes you may need a metric that’s not present in the default list of node_exporter metrics. There is a simple method to extend the list of available metrics without modifying the node_exporter code: the textfile collector. We’ve enabled this collector by default, and it is deployed as part of linux:metrics in PMM Client.

The default directory for reading text files with the metrics is /usr/local/percona/pmm-client/textfile-collector, and the exporter reads files from it with the .prom extension. By default it contains an example file, example.prom, whose contents are commented out so that it can be used as a template.
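The files use the standard Prometheus exposition format. As a minimal sketch, a hand-written .prom file with a single hypothetical gauge metric could look like this:

# HELP node_custom_example A hypothetical custom metric (name and value are placeholders).
# TYPE node_custom_example gauge
node_custom_example 42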

            You are responsible for running a cronjob or other regular process to generate the metric series data and write it to this directory.

            Example – collecting docker container information

This example shows how to collect the number of running and stopped docker containers on a host. It uses a crontab task, set with the following lines in the cron configuration file (e.g. in /etc/crontab):

*/1 * * * *     root   docker ps -a -q | wc -l | xargs echo node_docker_containers_total > /usr/local/percona/pmm-client/textfile-collector/docker_all.prom
*/1 * * * *     root   docker ps -q | wc -l | xargs echo node_docker_containers_running_total > /usr/local/percona/pmm-client/textfile-collector/docker_running.prom

The output of these commands is placed into the docker_all.prom and docker_running.prom files, which the exporter reads. This creates two new metric series, named node_docker_containers_total and node_docker_containers_running_total, which we can then plot on a graph:

            pmm 1.16

            New Features and Improvements

            • PMM-3195 Remove the light bulb
            • PMM-3194 Change link for “Where do I get the security credentials for my Amazon RDS DB instance?”
            • PMM-3189 Include Remote MySQL & PostgreSQL instance logs into PMM Server logs.zip system
            • PMM-3166 Convert status integers to strings on ProxySQL Overview Dashboard – Thanks,  Iwo Panowicz for  https://github.com/percona/grafana-dashboards/pull/239
            • PMM-3133 Include Metric Series on Query Analytics Dashboard
            • PMM-3078 Generate warning “how to troubleshoot postgresql:metrics” after failed pmm-admin add postgresql execution
            • PMM-3061 Provide Ability to Monitor Remote MySQL and PostgreSQL Instances
            • PMM-2888 Enable Textfile Collector by Default in node_exporter
            • PMM-2880 Use consistent favicon (Percona logo) across all distribution methods
            • PMM-2306 Configure EBS disk resize utility to run from crontab in PMM Server
            • PMM-1358 Improve Tooltips on Disk Space Dashboard – thanks, Corrado Pandiani for texts

            Fixed Bugs

            • PMM-3202 Cannot add remote PostgreSQL to monitoring without specified dbname
            • PMM-3186 Strange “Quick ranges” tag appears when you hover over documentation links on PMM Add Instance screen
            • PMM-3182 Some sections for MongoDB are collapsed by default
            • PMM-3171 Remote RDS instance cannot be deleted
            • PMM-3159 Problem with enabling RDS instance
            • PMM-3127 “Expand all” button affects JSON in all queries instead of the selected one
            • PMM-3126 Last check displays locale format of the date
            • PMM-3097 Update home dashboard to support PostgreSQL nodes in Environment Overview
            • PMM-3091 postgres_exporter typo
            • PMM-3090 TLS handshake error in PostgreSQL metric
            • PMM-3088 It’s possible to downgrade PMM from Home dashboard
            • PMM-3072 Copy to clipboard is not visible for JSON in case of long queries
            • PMM-3038 Error adding MySQL queries when options for mysqld_exporters are used
            • PMM-3028 Mark points are hidden if an annotation isn’t added in advance
            • PMM-3027 Number of vCPUs for RDS is displayed incorrectly – report and proposal from Janos Ruszo
            • PMM-2762 Page refresh makes Search condition lost and shows all queries
• PMM-2483 LVM in the PMM Server AMI is poorly configured/documented – reported by Olivier Mignault and a lot of people involved. Special thanks to Chris Schneider for checking the fix options
            • PMM-2003 Delete all info related to external exporters on pmm-admin list output

            How to get PMM Server

PMM is available for installation using three methods:

• As a Docker image
• As a virtual appliance (OVA) for VirtualBox and other hypervisors
• As an Amazon Machine Image (AMI) on the AWS Marketplace

            Help us improve our software quality by reporting any Percona Monitoring and Management bugs you encounter using our bug tracking system.

            by Dmitriy Kostiuk at November 02, 2018 01:26 AM

            November 01, 2018

            Peter Zaitsev

            WiredTiger Encryption at Rest with Percona Server for MongoDB

            wired-tiger-encryption

Encryption has become an important function in the database industry, as most companies take extra care to keep their data safe. It is important to keep the data safe on disk as well as when it is moving over the network, restricting any unauthorized access. These two types of protection are known as encryption at REST, for the data in storage, and encryption in TRANSPORT, for the data moving over the network.

In upstream MongoDB software, data encryption at rest is available, but only in the Enterprise version. Those using the Community version who want encryption at rest have to rely on disk-level or filesystem-level encryption (like LUKS or DM-crypt) to achieve the same effect. This solves the encryption problem, but adds the complexity of implementing and maintaining an extra set of operations. We have also seen customers run into trouble after implementing storage-level encryption, due to bugs in the encryption software.

            Now the good NEWS!

Percona Server for MongoDB now provides WiredTiger encryption at rest with Percona Server for MongoDB 3.6.8-2.0 in BETA, and it is free to use. This feature encrypts only the MongoDB data, rather than the full storage. More importantly, it requires very few steps and is easy to enable when starting the database. It is currently available only for the WiredTiger engine, and encrypts the data using local key management via a keyfile. We expect future releases to support third-party key management and vaults.

            How to implement encryption:

            The example below shows how to implement WiredTiger encryption at rest in Percona Server for MongoDB:

            Add the encryption options below into mongod.conf:

            [root@app ~]# grep security -A2 /etc/mongod.conf
            security:
              enableEncryption: true
              encryptionKeyFile: /data/key/mongodb.key

By default, Percona Server for MongoDB uses the AES256-CBC cipher mode. If you want to use AES256-GCM instead, set the encryptionCipherMode parameter. The two modes work differently: roughly speaking, CBC is faster while GCM is safer. I found some interesting discussion and benchmarks here and here.

            encryptionCipherMode: AES256-GCM

            Create your key with openssl as below:

            [root@app ~]# mkdir /data/key
            [root@app ~]# openssl rand -base64 32 > /data/key/mongodb.key
            [root@app ~]# chmod 600 /data/key/mongodb.key

            Now start Percona Server for MongoDB:

            [root@app ~]# systemctl start mongod
            [root@app ~]#

            How to confirm that you have enabled encryption at rest in Percona Server for MongoDB:

To check whether you have enabled encryption successfully in the database, run the command below:

            > db.serverCmdLineOpts().parsed.security
            { "enableEncryption" : true, "encryptionKeyFile" : "/data/key/mongodb.key" }

            Search for the string “percona_encryption_extension_init” in your log file:

            [root@app ~]# grep -i "percona_encryption_extension_init" /var/log/mongo/mongod.log
            2018-10-30T10:32:40.895+0000 I STORAGE [initandlisten] wiredtiger_open config: create,cache_size=256M,session_max=20000,eviction=(threads_min=4,threads_max=4),config_base=false,statistics=(fast),cache_cursors=false,compatibility=(release="3.0",require_max="3.0"),log=(enabled=true,archive=true,path=journal,compressor=snappy),file_manager=(close_idle_time=100000),statistics_log=(wait=0),verbose=(recovery_progress),encryption=(name=percona,keyid="/default"),extensions=[local=(entry=percona_encryption_extension_init,early_load=true,config=(cipher=AES256-CBC)),],cache_size=256M

I hope this helped show how to encrypt your MongoDB data with the Percona Server for MongoDB 3.6.8-2.0 package. We will let you know when future versions add support for third-party key management and vaults!


            Photo by Wayne Chan on Unsplash

            by Vinodh Krishnaswamy at November 01, 2018 07:01 PM

            MariaDB Foundation

            MariaDB 10.0.37 now available

            The MariaDB Foundation is pleased to announce the availability of MariaDB 10.0.37, the latest stable release in the MariaDB 10.0 series. See the release notes and changelogs for details. Download MariaDB 10.0.37 Release Notes Changelog What is MariaDB 10.0? MariaDB APT and YUM Repository Configuration Generator Contributors to MariaDB 10.0.37 Alexander Barkov (MariaDB Corporation) Anel […]

            The post MariaDB 10.0.37 now available appeared first on MariaDB.org.

            by Ian Gilfillan at November 01, 2018 03:19 PM

            Peter Zaitsev

            How To Best Use Percona Server Column Compression With Dictionary

            Database Compression

Very often, database performance is affected by the inability to cache all the required data in memory. Disk IO, even when using the fastest devices, takes much more time than a memory access. With MySQL/InnoDB, the main memory cache is the InnoDB buffer pool. There are many strategies we can try to fit as much data as possible in the buffer pool, and one of them is data compression.

            With regular MySQL, to compress InnoDB data you can either use “Barracuda page compression” or “transparent page compression with punch holes”. The use of the ZFS filesystem is another possibility, but it is external to MySQL and doesn’t help with caching. All these solutions are transparent, but often they also have performance and management implications. If you are using Percona Server for MySQL, you have yet another option, “column compression with dictionary“. This feature is certainly not receiving the attention it merits. I think it is really cool—let me show you why.

We all know what compression means, who has not zipped a file before attaching it to an email? Compression removes redundancy from a file. What about the dictionary? A compression dictionary is a way to seed the compressor with expected patterns, in order to improve the compression ratio. Because you can specify a dictionary, the usefulness of Percona Server for MySQL’s column compression feature is greatly increased. In the following sections, we’ll review the impact of a good dictionary, and devise a way to create a good one without any guessing.

            A simple use case

            A compression algorithm needs a minimal amount of data in order to achieve a reasonable compression ratio. Typically, if the object is below a few hundred bytes, there is rarely enough data to have repetitive patterns and when the compression header is added, the compressed data can end up larger than the original.

            mysql> select length('Hi!'), length(compress('Hi!'));
            +---------------+-------------------------+
            | length('Hi!') | length(compress('Hi!')) |
            +---------------+-------------------------+
            |             3 |                      15 |
            +---------------+-------------------------+
            1 row in set (0.02 sec)

Compressing a string of three bytes results in a binary object of 15 bytes. That’s counterproductive.

            In order to illustrate the potential of the dictionary, I used this dataset:

            http://skeeto.s3.amazonaws.com/share/JEOPARDY_QUESTIONS1.json.gz

            It is a set of 100k Jeopardy questions written in JSON. To load the data in MySQL, I created the following table:

            mysql> show create table TestColCompression\G
            *************************** 1. row ***************************
            Table: TestColCompression
            Create Table: CREATE TABLE `TestColCompression` (
            `id` int(11) NOT NULL AUTO_INCREMENT,
            `question` text NOT NULL,
            PRIMARY KEY (`id`)
            ) ENGINE=InnoDB AUTO_INCREMENT=79977 DEFAULT CHARSET=latin1
            1 row in set (0.00 sec)

            Then, I did some formatting to create insert statements:

zcat JEOPARDY_QUESTIONS1.json.gz | perl -p -e 's/\[\{/\{/g' | perl -p -e 's/\}, \{/\}\n\{/g' | perl -p -e "s/'/''/g" | \
  (while read line; do echo "insert into TestColCompression (question) values ('$line');"; done )

            And I executed the inserts. About 20% of the rows had some formatting issues but nevertheless, I ended up with close to 80k rows:

            mysql> show table status\G
            *************************** 1. row ***************************
            Name: TestColCompression
            Engine: InnoDB
            Version: 10
            Row_format: Dynamic
            Rows: 78110
            Avg_row_length: 316
            Data_length: 24690688
            Max_data_length: 0
            Index_length: 0
            Data_free: 4194304
            Auto_increment: 79977
            Create_time: 2018-10-26 15:16:41
            Update_time: 2018-10-26 15:40:34
            Check_time: NULL
            Collation: latin1_swedish_ci
            Checksum: NULL
            Create_options:
            Comment:
            1 row in set (0.00 sec)

            The average row length is 316 bytes for a total data size of 23.55MB. The question JSON objects are large enough to matter, but barely large enough for compression. Here are the first five rows:

            mysql> select question from TestColCompression limit 5\G
            *************************** 1. row ***************************
            question: {"category": "HISTORY", "air_date": "2004-12-31", "question": "'For the last 8 years of his life, Galileo was under house arrest for espousing this man's theory'", "value": "$200", "answer": "Copernicus", "round": "Jeopardy!", "show_number": "4680"}
            *************************** 2. row ***************************
            question: {"category": "ESPN's TOP 10 ALL-TIME ATHLETES", "air_date": "2004-12-31", "question": "'No. 2: 1912 Olympian; football star at Carlisle Indian School; 6 MLB seasons with the Reds, Giants & Braves'", "value": "$200", "answer": "Jim Thorpe", "round": "Jeopardy!", "show_number": "4680"}
            *************************** 3. row ***************************
            question: {"category": "EVERYBODY TALKS ABOUT IT...", "air_date": "2004-12-31", "question": "'The city of Yuma in this state has a record average of 4,055 hours of sunshine each year'", "value": "$200", "answer": "Arizona", "round": "Jeopardy!", "show_number": "4680"}
            *************************** 4. row ***************************
            question: {"category": "OLD FOLKS IN THEIR 30s", "air_date": "2009-05-08", "question": "'The district of conservative rep. Patrick McHenry in this state includes Mooresville, a home of NASCAR'", "value": "$800", "answer": "North Carolina", "round": "Jeopardy!", "show_number": "5690"}
            *************************** 5. row ***************************
            question: {"category": "MOVIES & TV", "air_date": "2009-05-08", "question": "'Tim Robbins played a public TV newsman in "Anchorman: The Legend of" him'", "value": "$800", "answer": "Ron Burgundy", "round": "Jeopardy!", "show_number": "5690"}

Let’s begin with straight column compression, without specifying a dictionary:

            mysql> alter table TestColCompression modify question text COLUMN_FORMAT COMPRESSED;
            Query OK, 79976 rows affected (4.25 sec)
            Records: 79976 Duplicates: 0 Warnings: 0
            mysql> analyze table TestColCompression;
            +----------------------------+---------+----------+----------+
            | Table | Op | Msg_type | Msg_text |
            +----------------------------+---------+----------+----------+
            | colcomp.TestColCompression | analyze | status | OK |
            +----------------------------+---------+----------+----------+
            mysql> show table status\G
            *************************** 1. row ***************************
            Name: TestColCompression
            Engine: InnoDB
            Version: 10
            Row_format: Dynamic
            Rows: 78995
            Avg_row_length: 259
            Data_length: 20496384
            Max_data_length: 0
            Index_length: 0
            Data_free: 4194304
            Auto_increment: 79977
            Create_time: 2018-10-26 15:47:56
            Update_time: 2018-10-26 15:47:56
            Check_time: NULL
            Collation: latin1_swedish_ci
            Checksum: NULL
            Create_options:
            Comment:
            1 row in set (0.00 sec)

As expected, the data didn’t compress much. The compression ratio is 0.82 (259/316, based on the average row length), or a space saving of 18%. Since the JSON headers are always the same, and are present in all questions, we should at a minimum use them for the dictionary. Trying a minimal dictionary made of the headers gives:

            mysql> SET @dictionary_data = 'category' 'air_date' 'question' 'value' 'answer' 'round' 'show_number' ;
            Query OK, 0 rows affected (0.01 sec)
            mysql> CREATE COMPRESSION_DICTIONARY simple_dictionary (@dictionary_data);
            Query OK, 0 rows affected (0.00 sec)
            mysql> alter table TestColCompression modify question text COLUMN_FORMAT COMPRESSED WITH COMPRESSION_DICTIONARY simple_dictionary;
            Query OK, 79976 rows affected (4.72 sec)
            Records: 79976 Duplicates: 0 Warnings: 0
            mysql> analyze table TestColCompression;
            +----------------------------+---------+----------+----------+
            | Table | Op | Msg_type | Msg_text |
            +----------------------------+---------+----------+----------+
            | colcomp.TestColCompression | analyze | status | OK |
            +----------------------------+---------+----------+----------+
            1 row in set (0.00 sec)
            mysql> show table status\G
            *************************** 1. row ***************************
            Name: TestColCompression
            Engine: InnoDB
            Version: 10
            Row_format: Dynamic
            Rows: 78786
            Avg_row_length: 246
            Data_length: 19447808
            Max_data_length: 0
            Index_length: 0
            Data_free: 4194304
            Auto_increment: 79977
            Create_time: 2018-10-26 17:58:17
            Update_time: 2018-10-26 17:58:17
            Check_time: NULL
            Collation: latin1_swedish_ci
            Checksum: NULL
            Create_options:
            Comment:
            1 row in set (0.00 sec)

That’s a little progress: we now have a compression ratio of 0.79. Obviously we could do more, but without a tool we’d have to guess. A compressor like zlib builds a dictionary as part of its compression effort, so could we use that? Yes, but only if we can generate it correctly and access the result. That’s not readily available with the common compressors I know of. Fortunately, someone else had the same issue and wrote a compressor able to save its dictionary. Please let me introduce femtozip.

            Femtozip to the rescue

The tool, by itself, has no magic algorithm; it is based on zlib, from what I can understand from the code. Anyway, we won’t compress anything with it, we’ll use it to generate a good dictionary. To create a dictionary, the tool looks at a set of files and tries to find patterns between them. Using a single big file defeats the purpose. So, I generated one file per question with:

            mkdir questions
            cd questions
            l=1; mysql -u blog -pblog colcomp -e 'select question from TestColCompression' | (while read line; do echo $line > ${l}; let l=l+1; done)

Then, I used the following command to generate a 1024-byte dictionary using all the files starting with “1”:

            ../femtozip/cpp/fzip/src/fzip --model ../questions_1s.mod --build --dictonly --maxdict 1024 1*
            Building dictionary...

            In about 10s the job was done. I tried with all the 80k files and… I had to kill the process after thirty minutes. Anyway, there are 11111 files starting with “1”, a very decent sample. Our generated dictionary looks like:

            cat ../questions_1s.mod
            ", "air_date", "round": "Double Jeopardy!", "show_number": " of this for 00", "answer": "the 0", "question": "'e", "round": "Jeopardy!", "show_number": "r", "round": "{"cate gory": "S", "air_date": "1998-s", "round": "Double Jeopardy!", "show_number": " of the ", "air_date": "2008-{"category": "THE {"category": "As", "round": "Jeopardy!", "show_number": "4", "question": "'Jeopardy!", "show_number": "2'", "value": "$1000", "answer": "7", "question": "'The ", "question": "'A'", "value": "$600", "answer": "9", "questi on": "'In ", "question": "'This 3", "question": "'2", "question": "'e'", "value": "$", "round": "Double Jeopardy!", "show_number": "4", "round": "Jeopardy!", "show_number": "4"'", "value": "$S", "air_date": "199", "round": "Double Jeopardy!", "show_number": "5s'", "value": "$", "round": "Double Jeopardy!", "show_number": "3", "round": "Jeopardy !", "show_number": "3", "round": "Jeopardy!", "show_number": "5'", "value": "$200", "answer": "'", "value": "$800", "answer": "'", "value": "$400", "answer": "

            With some formatting, I was able to create a dictionary with the above data:

            mysql> SET @dictionary_data = '", "air_date", "round": "Double Jeopardy!", "show_number": " of this for 00", "answer": "the 0", "question": "''e", "round": "Jeopardy!", "show_number": "r", "round": "{"category": "S", "air_date": "1998-s", "round": "Double Jeopardy!", "show_number": " of the ", "air_date": "2008-{"category": "THE {"category": "As", "round": "Jeopardy!", "show_number": "4", "question": "''Jeopardy!", "show_number": "2''", "value": "$1000", "answer": "7", "question": "''The ", "question": "''A''", "value": "$600", "answer": "9", "question": "''In ", "question": "''This 3", "question": "''2", "question": "''e''", "value": "$", "round": "Double Jeopardy!", "show_number": "4", "round": "Jeopardy!", "show_number": "4"''", "value": "$S", "air_date": "199", "round": "Double Jeopardy!", "show_number": "5s''", "value": "$", "round": "Double Jeopardy!", "show_number": "3", "round": "Jeopardy!", "show_number": "3", "round": "Jeopardy!", "show_number": "5''", "value": "$200", "answer": "''", "value": "$800", "answer": "''", "value": "$400", "answer": "' ;
            Query OK, 0 rows affected (0.00 sec)
            mysql> CREATE COMPRESSION_DICTIONARY femtozip_dictionary (@dictionary_data);
            Query OK, 0 rows affected (0.00 sec)
            And then, I altered the table to use the new dictionary:

            mysql> alter table TestColCompression modify question text COLUMN_FORMAT COMPRESSED WITH COMPRESSION_DICTIONARY femtozip_dictionary;
            Query OK, 79976 rows affected (4.05 sec)
            Records: 79976 Duplicates: 0 Warnings: 0
            mysql> analyze table TestColCompression;
            +----------------------------+---------+----------+----------+
            | Table | Op | Msg_type | Msg_text |
            +----------------------------+---------+----------+----------+
            | colcomp.TestColCompression | analyze | status | OK |
            +----------------------------+---------+----------+----------+
            1 row in set (0.00 sec)
            mysql> show table status\G
            *************************** 1. row ***************************
            Name: TestColCompression
            Engine: InnoDB
            Version: 10
            Row_format: Dynamic
            Rows: 79861
            Avg_row_length: 190
            Data_length: 15220736
            Max_data_length: 0
            Index_length: 0
            Data_free: 4194304
            Auto_increment: 79977
            Create_time: 2018-10-26 17:56:09
            Update_time: 2018-10-26 17:56:09
            Check_time: NULL
            Collation: latin1_swedish_ci
            Checksum: NULL
            Create_options:
            Comment:
            1 row in set (0.00 sec)

That’s interesting: we are now achieving a ratio of 0.61, a significant improvement. I pushed my luck and tried with a 2048-byte dictionary. It further reduced the ratio to 0.57, but that was about the best I got; larger dictionaries didn’t lower the ratio below 0.57. Zlib supports dictionaries of up to 32KB.

            So, to recap:

            • column compression without dictionary, ratio of 0.82
            • column compression with simple dictionary, ratio of 0.79
            • column compression with a 1k dictionary from femtozip, ratio of 0.61
            • column compression with a 2k dictionary from femtozip, ratio of 0.57

The above example stores a JSON document in a text column. MySQL 5.7 includes a JSON datatype which behaves a bit differently regarding the dictionary: delimiting characters like ‘{}’ are removed in the on-disk representation of a JSON column. If you have TBs of data in similar tables, you should really consider column compression and a systematic way of determining the dictionary with femtozip. In addition to improving the compression, it is likely to be the least performance-impacting solution. Wouldn’t it be interesting to generate a dictionary from existing data with a command like this one?

            CREATE COMPRESSION_DICTIONARY_FROM_DATA A_good_dictionary (2048, select questions from TestColCompression limit 10000);

where the dictionary creation process would implicitly include steps similar to the ones I performed with femtozip.

            by Yves Trudeau at November 01, 2018 02:04 PM

            October 31, 2018

            Peter Zaitsev

            Percona Server for MongoDB 3.6.8-2.0 Is Now Available

            Percona Server for MongoDB

            Percona Server for MongoDB

            Percona announces the release of Percona Server for MongoDB 3.6.8-2.0 on October 31, 2018. Download the latest version from the Percona website or the Percona Software Repositories.

            Percona Server for MongoDB is an enhanced, open source, and highly-scalable database that is a fully-compatible, drop-in replacement for MongoDB 3.6 Community Edition. It supports MongoDB 3.6 protocols and drivers.

Percona Server for MongoDB extends Community Edition functionality by including the Percona Memory Engine storage engine, as well as several enterprise-grade features. It also includes the MongoRocks storage engine, which is now deprecated. Percona Server for MongoDB requires no changes to MongoDB applications or code.

            This release introduces data at rest encryption for the WiredTiger storage engine. Data at rest encryption for WiredTiger in Percona Server for MongoDB is compatible with the upstream implementation. In this release of Percona Server for MongoDB, this feature is of BETA quality and should not be used in a production environment.

            Note that Percona Server for MongoDB 3.6.8-2.0 is based on  MongoDB 3.6.8 which is distributed under the GNU AGPLv3 license. Subsequent releases of Percona Server for MongoDB will change its license to SSPL when we move to the SSPL codebase released by MongoDB. For more information, see Percona Statement on MongoDB Community Server License Change.

This release also contains a fix for bug PSMDB-238.

            Known Issues

            • PSMDB-233: When starting Percona Server for MongoDB 3.6 with WiredTiger encryption options but using a different storage engine, the server starts normally and produces no warnings that these options are ignored.
• PSMDB-239: WiredTiger encryption is not disabled when using the Percona Memory Engine storage engine.
            • PSMDB-245: KeyDB’s WiredTiger logs are not properly rotated without restarting the server.

            The Percona Server for MongoDB 3.6.8-2.0 release notes are available in the official documentation.

            by Borys Belinsky at October 31, 2018 05:25 PM

            MariaDB Foundation

            MariaDB Galera Cluster 5.5.62 now available

            The MariaDB Foundation is pleased to announce the availability of MariaDB 5.5.62, the latest stable release in the MariaDB Galera Cluster 5.5 series. See the release notes and changelogs for details. Download MariaDB Galera Cluster 5.5.62 Release Notes Changelog What is MariaDB Galera Cluster? Contributors to MariaDB Galera Cluster 5.5.62 Alexander Barkov (MariaDB Corporation) Daniel […]

            The post MariaDB Galera Cluster 5.5.62 now available appeared first on MariaDB.org.

            by Ian Gilfillan at October 31, 2018 03:06 PM

            Peter Zaitsev

            Percona Server for MySQL 8.0 Delivers Increased Reliability, Performance and Security

            Percona Server for MySQL 5.7

Percona released a Release Candidate (RC) version of Percona Server for MySQL 8.0, the company’s free, enhanced, drop-in replacement for MySQL Community Edition. Percona Server for MySQL 8.0 includes all the features of MySQL Community Edition 8.0, along with enterprise-class features from Percona that make it ideal for enterprise production environments. The latest release offers increased reliability, performance and security.

Percona Server for MySQL 8.0 General Availability (GA) will be available later this year. You can learn how to install the release candidate software here. Please note that this release candidate is neither intended nor recommended for production environments.

            MySQL databases remain a pillar of enterprise data centers. But as the amount of data collected continues to soar, and as the types of databases deployed for specific applications continue to expand, organizations require a powerful and cost-effective MySQL solution for their production environments. Percona meets this need with its mature, proven open source alternative to MySQL Community Edition. Percona also backs MySQL 8.0 with the support and consulting services enterprises need to achieve optimal performance and maximize the value they obtain from the software – all with lower cost and complexity.

            With more than 4,550,000 downloads, Percona Server for MySQL offers self-tuning algorithms and support for extremely high-performance hardware, delivering excellent performance and reliability for production environments. Percona Server for MySQL is trusted by thousands of enterprises to provide better performance. The following capabilities are unique to Percona Server for MySQL 8.0:

            • Greater scalability and availability, enhanced backups, and increased visibility to improve performance, reliability and usability
            • Parallel doublewrite functionality for greatly improved write performance, resulting in increased speed
• An additional write-optimized storage engine, MyRocks, which takes advantage of modern hardware and database technology to reduce storage requirements and maintenance costs, and to increase ROI in both on-premises and cloud-based applications, delivered with a MySQL-compatible interface.
            • Enhanced encryption functionality – including integration with Hashicorp Vault to simplify the management of encryption keys – for increased security
            • Advanced PAM-based authentication, audit logging, and threadpool scalability – enterprise-grade features available in Percona Server for MySQL without a commercial license

            Percona Server for MySQL 8.0 also contains all the new features introduced in MySQL Community Edition 8.0, including:

            • Greater Reliability – A new transactional data dictionary makes recovery from failure easier, providing users with a higher level of comfort that data will not be lost. The transactional data dictionary is now crash-safe and centralized. In addition, support for atomic Data Definition Language (DDL) statements ensures that all operations associated with a DDL transaction are committed or rejected as a unit.
• Enhanced Performance – New functions and expressions (along with Percona’s parallel doublewrite buffer) improve overall performance, allowing users to see their data more quickly. Enhanced JSON functionality improves document storage and query capabilities, and the addition of Window functions provides greater flexibility for aggregation queries. Common Table Expressions (CTE) enable improved query syntax for complex queries (see the short example after this list).
            • Increased Security – The ability to collect a typical series of permission grant statements for a user into a defined role and then apply that role to a user in MySQL makes the database environment more secure from outside attack by allowing administrators to better manage access permissions for authorized users. SQL Roles also enable administrators to more easily and efficiently determine a set of privileges for multiple associated users, saving time and reducing errors.
            • Expanded Queries – Percona Server for MySQL 8.0 provides support for spatial data types and indexes, as well as for Spatial Reference System (SRS), the industry-standard method for geospatial lookup.
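To give a flavor of the new query syntax, here is a sketch combining a CTE with a window function, both valid in MySQL 8.0 and Percona Server for MySQL 8.0; the orders table and its columns are hypothetical:

WITH order_totals AS (
    SELECT customer_id, SUM(amount) AS total_spent
    FROM orders                    -- hypothetical table
    GROUP BY customer_id
)
SELECT customer_id,
       total_spent,
       RANK() OVER (ORDER BY total_spent DESC) AS spending_rank
FROM order_totals;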

            Learn more about the Percona Server for MySQL 8.0 RC release here.

            by Tyler Duzan at October 31, 2018 07:57 AM

            Percona Server for MongoDB 4.0 Is an Ideal Solution for Even More Production Use Cases

            Percona Server for MongoDB

Percona announces the planned release of Percona Server for MongoDB 4.0, the latest version of the company’s free, enhanced, drop-in replacement for MongoDB Community Edition. Percona Server for MongoDB 4.0 includes all the features of MongoDB Community Edition 4.0, along with enterprise-class features from Percona that make it ideal for enterprise production environments.

            Percona Server for MongoDB 4.0 will be generally available later this year.

            Organizations are increasingly leveraging multiple open source databases to meet their diverse application requirements and improve the customer experience. Percona supports these organizations with a range of enhanced, production-ready open source databases, enterprise-grade support, and consulting services.

            With more than 385,000 downloads, Percona Server for MongoDB provides all the cost and agility benefits of free, proven open source software, along with greater security, reliability and flexibility. With Percona Server for MongoDB, an increasing number of organizations can confidently run the document-based NoSQL MongoDB database to support their product catalogs, online shopping carts, Internet of Things (IoT) applications, mobile/social apps and more. Percona also backs Percona Server for MongoDB 4.0 and MongoDB Community Edition with the support and consulting services enterprises need to achieve optimal performance and maximize the value they obtain from the software – all with lower cost and complexity.

            Percona Kubernetes Operator for MongoDB enables easier deployment of Percona Server for MongoDB environments – including standalone, replica set, or sharded cluster – inside Kubernetes and OpenShift platforms, without the need to move to an Enterprise Server. Management and backup are provided through the Operator working with Percona Monitoring and Management and hot backup.

            Percona Server for MongoDB 4.0 contains all of the features in MongoDB Community Edition 4.0, including important capabilities that greatly expand its use cases:

            • Support for multi-document ACID transactions – ensuring accurate updates to all of the documents involved in a transaction, and moving the complexity of achieving these updates from the application to the database.
            • Support for SCRAM-SHA-256 authentication – making the production environment more secure and less vulnerable to external attack.
            • New type conversion and string operators – providing greater flexibility when performing aggregations.

            Percona Server for MongoDB 4.0 also offers essential enterprise features for free, including:

            • Encrypted WiredTiger storage engine (“data at rest encryption”) with local key management. Integration with key management systems will be available in future releases of Percona Server for MongoDB.
            • SASL Authentication plugin for enabling authentication through OpenLDAP or Active Directory.
            • Open-source auditing for visibility into user and process actions in the database, with the ability to redact sensitive information (such as usernames and IP addresses) from log files.
            • Hot backups to protect against data loss in the event of a crash or disaster – backup activity does not impact performance.
            • Percona Memory Engine, a 100 percent open source in-memory storage engine designed for Percona Server for MongoDB, which is ideal for in-memory computing and other applications demanding very low-latency workloads.
            • Integration with Percona Toolkit and Percona Monitoring and Management for query performance analytics and troubleshooting.
            • Enhanced query profiling.

            by Vadim Tkachenko at October 31, 2018 07:55 AM

            Percona XtraBackup 8.0-3-rc1 Is Available

            Percona XtraBackup 8.0

Percona is glad to announce the release candidate of Percona XtraBackup 8.0-3-rc1 on October 31, 2018. You can download it from our download site and from our apt and yum repositories.

This is a Release Candidate quality release and it is not intended for production. If you want a high quality, Generally Available release, use the current stable version (the most recent stable version at the time of writing is 2.4.12 in the 2.4 series).

This release supports backing up and restoring MySQL 8.0 and Percona Server for MySQL 8.0.

            Things to Note

            • innobackupex was previously deprecated and has been removed
            • Due to the new MySQL redo log and data dictionary formats the Percona XtraBackup 8.0.x versions will only be compatible with MySQL 8.0.x and the upcoming Percona Server for MySQL 8.0.x
• For experimental migrations from earlier database server versions, you will need to back up and restore using XtraBackup 2.4, and then use mysql_upgrade from MySQL 8.0.x

            Installation

            As this is a release candidate, installation is performed by enabling the testing repository and installing the software via your package manager. For Debian based distributions see apt installation instructions, for RPM based distributions see yum installation instructions. Note that in both cases after installing the current percona-release package, you’ll need to enable the testing repository in order to install Percona XtraBackup 8.0.3-rc1.

            Improvements

            • PXB-1655:  The --lock-ddl option is supported when backing up MySQL 8

            Bugs Fixed

            • PXB-1678:  Incremental backup prepare run with the --apply-log-only option could roll back uncommitted transactions.
            • PXB-1672:  The MTS slave without GTID could be backed up when the --safe-slave-backup option was applied.

            by Borys Belinsky at October 31, 2018 07:55 AM

            Release Candidate for Percona Server 8.0.12-2rc1 Is Available

            Percona Server for MySQL 5.7

            Following the alpha release announced earlier, Percona announces the release candidate of Percona Server for MySQL 8.0.12-2rc1 on October 31, 2018. Download the latest version from the Percona website or from the Percona Software Repositories.

            This release is based on MySQL 8.0.12 and includes all the bug fixes in it. It is a Release Candidate quality release and it is not intended for production. If you want a high quality, Generally Available release, use the current Stable version (the most recent stable release at the time of writing in the 5.7 series is 5.7.23-23).

            Percona provides completely open-source and free software.

            Installation

            As this is a release candidate, installation is performed by enabling the testing repository and installing the software via your package manager.  For Debian based distributions see apt installation instructions, for RPM based distributions see yum installation instructions.  Note that in both cases after installing the current percona-release package, you’ll need to enable the testing repository in order to install Percona Server for MySQL 8.0.12-2rc1.  For manual installations you can download from the testing repository directly through our website.

            New Features

            • #4550: Native Partitioning support for MyRocks storage engine
            • #3911: Native Partitioning support for TokuDB storage engine
            • #4946: Add an option to prevent implicit creation of column family in MyRocks
            • #4839: Better default configuration for MyRocks and TokuDB
            • InnoDB changed page tracking has been rewritten to account for redo logging changes in MySQL 8.0.11.  This fixes fast incremental backups for PS 8.0
            • #4434: TokuDB ROW_FORMAT clause has been removed, compression may be set by using the session variable tokudb_row_format instead.

            Improvements

            • Several packaging changes to bring Percona packages more in line with upstream, including split repositories. As you’ll note from our instructions above we now ship a tool with our release packages to help manage this.

            Bugs Fixed

            • #4785: Setting version_suffix to NULL could lead to handle_fatal_signal (sig=11) in Sys_var_version::global_value_ptr
            • #4788: Setting log_slow_verbosity and enabling the slow_query_log could lead to a server crash
            • #4947: Any index comment generated a new column family in MyRocks
            • #1107: Binlog could be corrupted when tmpdir got full
            • #1549: Server side prepared statements lead to a potential off-by-second timestamp on slaves
            • #4937: rocksdb_update_cf_options was useless when specified in my.cnf or on command line.
            • #4705: The server could crash on snapshot size check in RocksDB
            • #4791: SQL injection on slave due to non-quoting in binlogged ROLLBACK TO SAVEPOINT
            • #4953: rocksdb.truncate_table3 was unstable

            Other bugs fixed:

            • #4811: 5.7 Merge and fixup for old DB-937 introduces possible regression
            • #4885: Using ALTER … ROW_FORMAT=TOKUDB_QUICKLZ leads to InnoDB: Assertion failure: ha_innodb.cc:12198:m_form->s->row_type == m_create_info->row_type
            • Numerous testsuite failures/crashes

            Upcoming Features

            by Borys Belinsky at October 31, 2018 07:50 AM

            October 30, 2018

            Peter Zaitsev

            20+ MongoDB Alternatives You Should Know About

            alternatives to MongoDB

As MongoDB® has changed its license from AGPL to SSPL, many are concerned by this change, and by how sudden it has been. Will SSPL be protective enough for MongoDB, or will the next change be a move to an altogether proprietary license? According to our poll, many are going to explore MongoDB alternatives. This blog post provides a brief outline of technologies to consider.

            Open Source Data Stores

• PostgreSQL is the darling of the open source database community. Especially if your concern is the license, PostgreSQL’s permissive license is hard to beat. PostgreSQL has powerful JSON support, and there are many success stories of migrating from MongoDB to PostgreSQL.
            • Citus While PostgreSQL is a powerful database, and you can store terabytes of data on a single cluster, at a larger scale you will need sharding. If so, consider the Citus PostgreSQL extension, or the DBaaS offering from the same guys.
            • TimescaleDB  If on the other hand you are storing  time series data in MongoDB, then TimescaleDB might be a good fit.
            • ToroDB If you would love to use PostgreSQL but need MongoDB wire protocol compatibility, take a look at ToroDB. While it can’t serve as a full drop-in replacement for MongoDB server just yet, the developer told me that with some work it is possible.
            • CockroachDB While not based on the PostgreSQL codebase, CockroachDB is PostgreSQL wire protocol compatible and it is natively distributed, so you will not need to do manual sharding.
• MySQL® is another feasible replacement. MySQL 5.7 and MySQL 8 have great support for JSON, and it continues to get better with every maintenance release. You can consider MySQL Cluster for medium-sized sharded environments, as well as MariaDB and Percona Server for MySQL.
• MySQL DocStore is a CRUD interface for JSON data stored in MySQL, and while it is not the same as MongoDB’s query language, it is much easier to transition to than SQL.
            • Vitess Would you love to use MySQL but can’t stand manual sharding? Vitess is a powerful sharding engine for MySQL which will allow you to grow to great scale while using proven MySQL as a backend.
            • TiDB is another take on MySQL compatible sharding. This NewSQL engine is MySQL wire protocol compatible but underneath is a distributed database designed from the ground up.
            • CouchDB is a document database which speaks JSON natively.
            • CouchBase is another database engine to consider. While being a document based database, CouchBase offers the N1QL language which has SQL look and feel.
• ArangoDB is a multi-model database, which can be used as a document store.
            • Elastic While not a perfect choice for every MongoDB workload, for workloads where document data is searched and analyzed ElasticSearch can be a great alternative.
• Redis is another contender for some MongoDB workloads. Often used as a cache in front of MongoDB, it can also be used as a JSON store through extensions. While such extensions from RedisLabs are no longer open source, the GoodForm project provides open source alternatives.
• ClickHouse may be a great contender for moving analytical workloads from MongoDB. Much faster, and with JSON support and Nested Data Structures, it can be a great choice for storing and analyzing document data.
            • Cassandra does not have a document data model, but it has proven to be extremely successful for building scalable distributed clusters. If this is your main use case for MongoDB, then you should consider Cassandra.
            • ScyllaDB is a protocol compatible Cassandra alternative which claims to offer much higher per node performance.
            • HBase is another option worth considering, especially if you already have a Hadoop/HDFS infrastructure.

            Public Cloud Document Stores

            Most major cloud providers offer some variant of a native document database for you to consider.

            • Microsoft Azure Cosmos DB is an interesting engine that provides multiple NoSQL APIs, including for MongoDB and Cassandra.
            • Amazon DynamoDB supports key value and document based APIs. While not offering MongoDB compatibility, DynamoDB has been around for a long time, and is the most battle tested of the public cloud database offerings.
            • Google Cloud DataStore  – Google Cloud offers a number of data storage options for you to consider, and Cloud DataStore offers a data model and query language that is the most similar to MongoDB.

            If you’re not ready for a major migration effort, there is one more solution for you – Percona Server for MongoDB.  Based on MongoDB Community, and enhanced by Percona with Enterprise Features, Percona Server for MongoDB offers 100% compatibility. As we wrote in a previous post, we commit to shipping a supported AGPL version until the situation around SSPL is clearly resolved.

            Want help on deciding what is the best option for you, or with migration heavy lifting? Percona Professional Services can help!

Have an idea for another feasible MongoDB alternative? Please comment, and I will consider adding it to the list!


            Image by JOSHUA COLEMAN on Unsplash

            by Peter Zaitsev at October 30, 2018 02:52 PM

            PostgreSQL locking, part 3: lightweight locks

            LWLocks lightweight locks postgres

PostgreSQL lightweight locks, or LWLocks, control access to shared memory. PostgreSQL uses a multi-process architecture and must allow only consistent reads and writes to shared memory structures. LWLocks have two levels of locking: shared and exclusive. It’s also possible to release all acquired LWLocks at once to simplify cleanup. Other databases often call primitives similar to LWLocks “latches”. Because LWLocks are an implementation detail, application developers shouldn’t pay much attention to this kind of locking.

            This is the third and final part of a series on PostgreSQL locking, related to latches protecting internal database structures. Here are the previous parts: Row-level locks and table-level locks.

            Instrumentation

Starting with PostgreSQL 9.6, LWLock activity can be investigated through the pg_stat_activity system view. This can be useful under high CPU utilization, and there are system settings that help with contention on specific lightweight locks.
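For example, a quick way to see which lightweight locks sessions are currently waiting on is a query like the following; the LIKE pattern covers both the LWLock wait event type used in PostgreSQL 10 and later, and the LWLockNamed/LWLockTranche types used in 9.6:

SELECT wait_event_type, wait_event, count(*) AS waiters
FROM pg_stat_activity
WHERE wait_event_type LIKE 'LWLock%'
GROUP BY wait_event_type, wait_event
ORDER BY waiters DESC;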

Before PostgreSQL 9.5, the LWLock implementation used spinlocks and was a bottleneck. This was fixed in 9.5 by moving to an atomic state variable.

            Potential heavy contention places

• WALInsertLock: protects WAL buffers. You can increase the number of WAL buffers to get a slight improvement. Incidentally, synchronous_commit=off increases pressure on the lock even more, but that’s not a bad thing. full_page_writes=off reduces contention, but it’s generally not recommended.
• WALWriteLock: acquired by PostgreSQL processes while WAL records are flushed to disk or during a WAL segment switch. synchronous_commit=off removes the wait for the disk flush; full_page_writes=off reduces the amount of data to flush.
• LockMgrLock: appears in top waits during read-only workloads. It latches relations regardless of their size. It’s not a single lock, but at least 16 partitions. Thus it’s important to use multiple tables during benchmarks and avoid the single-table anti-pattern in production.
            • ProcArrayLock: Protects the ProcArray structure. Before PostgreSQL 9.0, every transaction acquired this lock exclusively before commit.
            • CLogControlLock: protects CLogControl structure, if it shows on the top of pg_stat_activity, you should check the location of $PGDATA/pg_clog—it should be on a buffered file system.
• SInvalidReadLock: protects the sinval array. Readers use a shared lock; SICleanupQueue, and other array-wide updates, require an exclusive lock. It shows at the top of pg_stat_activity when the shared buffer pool is under stress. Using a higher number of shared_buffers helps to reduce contention (see the configuration sketch after this list).
• BufMappingLocks: protect regions of buffers. There are 128 partitions (16 before 9.5) covering the whole buffer cache.
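To make the tuning knobs from this list concrete, here is a sketch of the related postgresql.conf settings; the values are illustrative only and must be sized, and their risks weighed, for your own workload:

# postgresql.conf -- illustrative values only
shared_buffers = 8GB        # a larger pool can reduce SInvalidReadLock and BufMappingLocks pressure
wal_buffers = 64MB          # may slightly relieve WALInsertLock contention
synchronous_commit = off    # removes the disk-flush wait on WALWriteLock (recent commits may be lost on a crash)
# full_page_writes = off    # reduces WAL volume, but generally not recommended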

            Spinlocks

The lowest level of locking is spinlocks, implemented with CPU-specific instructions. PostgreSQL tries to change an atomic variable value in a loop: if the value is changed from zero to one, the process has obtained the spinlock. If a spinlock can’t be acquired immediately, the process increases its wait delay exponentially. There is no monitoring of spinlocks, and it’s not possible to release all acquired spinlocks at once. Due to the single state change, a spinlock is also always an exclusive lock. To simplify porting PostgreSQL to exotic CPU and OS variants, PostgreSQL can fall back to OS semaphores for its spinlock implementation, although this is significantly slower than the native CPU instruction version.

            Summary

            • Use pg_stat_activity to find which queries or LWLocks are causing lock waits
• Use recent releases of PostgreSQL, as developers keep working on performance improvements and on reducing locking contention on hot mutexes.


            by Nickolay Ihalainen at October 30, 2018 01:31 PM

            Jean-Jerome Schmidt

            SQL Firewalling Made Easy with ClusterControl & ProxySQL

Reading the title of this blog post may raise some questions. SQL firewall - what is that? What does it do? Why would I need something like that in the first place? Well, the ability to block certain queries can come in handy in quite a few situations. When using ProxySQL in front of your database servers, the proxy can inspect all SQL statements being sent. ProxySQL has a sophisticated rules engine and can match queries that are to be allowed, blocked, rewritten on the fly, or routed to a specific database server. Let’s go through some examples.

            You have a dedicated slave which is used by developers to test their queries against production data. You want to make sure the developers can only connect to that particular host and execute only SELECT queries.

In another case, let’s say that you have encountered one too many accidents with people running schema changes, and you would like to limit which users can execute ALTER statements.

            Finally, let’s think about a paranoid approach in which users are allowed to execute just a pre-defined whitelisted set of queries.

In our environment we have a replication setup with a master and two slaves.

In front of our databases, we have three ProxySQL nodes with Keepalived managing a virtual IP. We also have a ProxySQL cluster configured (as explained in this previous blog), so we don’t have to make configuration or query rule changes three times, once on each ProxySQL node. For the query rules, a simple read-write split is set up:

            Let’s take a look at how ProxySQL, with its extensive query rules mechanism, can help us to achieve our goals in all those three cases.

            Locking user access to a single hostgroup

A dedicated slave available to developers is not an uncommon practice. As long as your developers are allowed to access production data (and if they are not, e.g., for compliance reasons, the data masking explained in our ProxySQL tutorial may help), this lets them test and optimize queries on a real-world data set. It can also help to verify data before executing some of the schema changes. For example: is my column really unique before I add a unique index?

With ProxySQL it is fairly easy to restrict access. For starters, let’s assume that hostgroup 30 contains the slave we want developers to access.

We need a user which will be used by the developers to access that slave. If you already have it in ProxySQL, that’s fine. If not, you may either need to import it into ProxySQL (if it is created in MySQL but not in ProxySQL) or create it in both locations (if you’ll be creating a new user). Let’s go with the latter option, creating a new user.

            Let’s create a new user with limited privileges on both MySQL and ProxySQL. We will use it in query rules to identify traffic coming from the developers.
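The blog performs these steps through the ClusterControl UI; for reference, here is a minimal command-line sketch of the same idea, on the MySQL server and the ProxySQL admin interface respectively (the dev_test name, the password and the sbtest schema are assumptions for illustration):

-- on MySQL: a user limited to reads
CREATE USER 'dev_test'@'%' IDENTIFIED BY 'dev_pass';
GRANT SELECT ON sbtest.* TO 'dev_test'@'%';

-- on the ProxySQL admin interface: the same credentials, defaulting to hostgroup 30
INSERT INTO mysql_users (username, password, default_hostgroup)
VALUES ('dev_test', 'dev_pass', 30);
LOAD MYSQL USERS TO RUNTIME;
SAVE MYSQL USERS TO DISK;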

In this query rule we are going to redirect all of the queries executed by the dev_test user to hostgroup 30. We want this rule to be active and to be the final one evaluated, therefore we enabled ‘Apply’. We also configured the rule ID to be smaller than the ID of the first existing rule, as we want these queries to be handled outside of the regular read/write split setup.
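On the ProxySQL admin interface, such a rule maps to roughly the following sketch (rule_id 5 is an assumption; it simply has to be lower than the IDs of the read/write split rules):

-- route everything from dev_test to the dedicated slave in hostgroup 30
INSERT INTO mysql_query_rules (rule_id, active, username, destination_hostgroup, apply)
VALUES (5, 1, 'dev_test', 30, 1);
LOAD MYSQL QUERY RULES TO RUNTIME;
SAVE MYSQL QUERY RULES TO DISK;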

As you can see, we used a username, but there are also other options.

If you can predict which development hosts will send traffic to the database (for example, if developers have to use a specific proxy before they can reach the database), you can also use the “Client Address” option to match queries executed by that single host and redirect them to the correct hostgroup.

            Disallowing user from executing certain queries

Now, let’s consider a case where we want to limit execution of particular commands to a given user. This can be handy to ensure that only the right people can run performance-impacting queries like schema changes. ALTER will be the query we use in this example. For starters, let’s add a new user which will be allowed to run schema changes. We will call it ‘admin_user’. Next, we need to create the required query rules.

We will create a query rule which uses the ‘.*ALTER TABLE.*’ regular expression to match the queries. This query rule should be executed before the other, read/write split rules, so we assigned it a rule ID of 20. We also define an error message that will be returned to the client whenever this query rule is triggered. Once done, we proceed to another query rule.

Here we use the same regular expression to catch the query, but we don’t define any error text (which means the query will not return an error). We also define which user is allowed to execute it (admin_user in our case). We make sure this rule is checked before the previous one, so we assigned it a lower rule ID of 19.
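As a sketch, this pair of rules on the ProxySQL admin interface would look roughly like the following (the rule IDs and error text follow the example above):

-- rule 19: admin_user may run ALTER TABLE (no error_msg, so the query goes through)
INSERT INTO mysql_query_rules (rule_id, active, username, match_pattern, apply)
VALUES (19, 1, 'admin_user', '.*ALTER TABLE.*', 1);

-- rule 20: everyone else gets an error instead
INSERT INTO mysql_query_rules (rule_id, active, match_pattern, error_msg, apply)
VALUES (20, 1, '.*ALTER TABLE.*', 'You are not allowed to execute ALTER', 1);

LOAD MYSQL QUERY RULES TO RUNTIME;
SAVE MYSQL QUERY RULES TO DISK;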

            Once these two query rules are in place, we can test how they work. Let’s try to log in as an application user and run an ALTER TABLE query:

            root@vagrant:~# mysql -P6033 -usbtest -ppass -h10.0.0.111
            mysql: [Warning] Using a password on the command line interface can be insecure.
            Welcome to the MySQL monitor.  Commands end with ; or \g.
            Your MySQL connection id is 43160
            Server version: 5.5.30 (ProxySQL)
            
            Copyright (c) 2000, 2018, Oracle and/or its affiliates. All rights reserved.
            
            Oracle is a registered trademark of Oracle Corporation and/or its
            affiliates. Other names may be trademarks of their respective
            owners.
            
            Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
            
            mysql> use sbtest;
            Reading table information for completion of table and column names
            You can turn off this feature to get a quicker startup with -A
            
            Database changed
            mysql> alter table sbtest1 add index (pad);
            ERROR 1148 (42000): You are not allowed to execute ALTER
            mysql> ^DBye

            As expected, we couldn’t execute this query and we received an error message. Let’s now try to connect using our ‘admin_user’:

            root@vagrant:~# mysql -P6033 -uadmin_user -ppass -h10.0.0.111
            mysql: [Warning] Using a password on the command line interface can be insecure.
            Welcome to the MySQL monitor.  Commands end with ; or \g.
            Your MySQL connection id is 43180
            Server version: 5.5.30 (ProxySQL)
            
            Copyright (c) 2000, 2018, Oracle and/or its affiliates. All rights reserved.
            
            Oracle is a registered trademark of Oracle Corporation and/or its
            affiliates. Other names may be trademarks of their respective
            owners.
            
            Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
            
            mysql> use sbtest;
            Reading table information for completion of table and column names
            You can turn off this feature to get a quicker startup with -A
            
            Database changed
            mysql> alter table sbtest1 add index (pad);
            Query OK, 0 rows affected (0.99 sec)
            Records: 0  Duplicates: 0  Warnings: 0

            We managed to execute the ALTER as we logged in using ‘admin_user’. This is a very simple way of ensuring that only appointed people can run schema changes on your databases.

            Creating a whitelist of allowed queries

Finally, let’s consider a tightly locked-down environment where only predefined queries can be executed. ProxySQL can easily be utilized to implement such a setup.

First of all, we need to remove all existing query rules. Then we create a catch-all query rule which blocks all queries:
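A sketch of such a catch-all rule on the ProxySQL admin interface (rule_id 100 and the error text are assumptions; the ID just has to be higher than that of every whitelist rule):

-- block anything that no earlier rule has matched
INSERT INTO mysql_query_rules (rule_id, active, match_pattern, error_msg, apply)
VALUES (100, 1, '.*', 'Query not allowed', 1);
LOAD MYSQL QUERY RULES TO RUNTIME;
SAVE MYSQL QUERY RULES TO DISK;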

All that’s left to do is to create query rules for the queries which are allowed. You can create one rule per query, or use regular expressions if, for example, SELECTs are always fine to run. The only things you have to remember are that each rule’s ID has to be smaller than the ID of the catch-all rule, and that every allowed query must eventually hit a rule with ‘Apply’ enabled. An example follows below.
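For instance, a whitelist rule that lets all SELECT statements through and routes them to the read hostgroup could look like this sketch (rule_id 50 and hostgroup 20 are assumptions for illustration):

-- allow SELECTs ahead of the catch-all rule
INSERT INTO mysql_query_rules (rule_id, active, match_pattern, destination_hostgroup, apply)
VALUES (50, 1, '^SELECT .*', 20, 1);
LOAD MYSQL QUERY RULES TO RUNTIME;
SAVE MYSQL QUERY RULES TO DISK;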

            We hope that this blog post gave you some insight into how you can utilize ClusterControl and ProxySQL to improve security and ensure compliance of your databases.

            by krzysztof at October 30, 2018 08:14 AM