Planet MariaDB

February 19, 2019

Oli Sennhauser

MySQL Enterprise Backup Support Matrix

MySQL Enterprise Backup (MEB) is somewhat limited in its support of older MySQL versions, so you should consider the following release matrix:

MEB/MySQL   Supported   5.5     5.6     5.7     8.0
3.11.x      NO          x       x
3.12.x      YES         x       x
4.0.x       NO                          x
4.1.x       YES                         x
8.0.x       YES                                 8.0.x*

* MySQL Enterprise Backup 8.0.15 only supports MySQL 8.0.15. For earlier versions of MySQL 8.0, use the MySQL Enterprise Backup version with the same version number as the server.

MySQL Enterprise Backup is available for download from the My Oracle Support (MOS) website. This release will be available on Oracle eDelivery (OSDC) after the next upload cycle. MySQL Enterprise Backup is a commercial extension to the MySQL family of products.

As an open-source alternative, Percona XtraBackup for MySQL databases is available.

Compatibility with MySQL Versions: 3.11, 3.12, 4.0, 4.1, 8.0.

MySQL Enterprise Backup User's Guide: 3.11, 3.12, 4.0, 4.1, 8.0.

by Shinguz at February 19, 2019 06:13 PM

Peter Zaitsev

Percona Server for MongoDB 3.4.19-2.17 Is Now Available

Percona Server for MongoDB


Percona announces the release of Percona Server for MongoDB 3.4.19-2.17 on February 19, 2019. Download the latest version from the Percona website or the Percona Software Repositories.

Percona Server for MongoDB 3.4 is an enhanced, open source, and highly-scalable database that is a fully-compatible, drop-in replacement for MongoDB 3.4 Community Edition. It supports MongoDB 3.4 protocols and drivers.

Percona Server for MongoDB extends MongoDB Community Edition functionality by including the Percona Memory Engine and MongoRocks storage engines, as well as several enterprise-grade features.

Percona Server for MongoDB requires no changes to MongoDB applications or code. This release is based on MongoDB 3.4.19.

In this release, Percona Server for MongoDB supports the ngram full-text search engine. Thanks to Sunguck Lee (@SunguckLee) for this contribution. To enable the ngram full-text search engine, create an index passing ngram to the default_language parameter:

mongo > db.collection.createIndex({name:"text"}, {default_language: "ngram"})

New Features

  • PSMDB-250: The ngram full-text search engine has been added to Percona Server for MongoDB. Thanks to Sunguck Lee (@SunguckLee) for this contribution.

Bugs Fixed

  • PSMDB-272: mongos could crash when running the createBackup command.

Other bugs fixed: PSMDB-247.

The Percona Server for MongoDB 3.4.19-2.17 release notes are available in the official documentation.

by Borys Belinsky at February 19, 2019 01:43 PM

How Network Bandwidth Affects MySQL Performance


The network is a major part of a database infrastructure. However, performance benchmarks are often done on a local machine, where client and server are co-located – I am guilty of this myself. This is done to simplify the setup and to exclude one more variable (the networking part), but with this we also miss seeing how the network affects performance.

The network is even more important for clustering products like Percona XtraDB Cluster and MySQL Group Replication. Also, we are working on our Percona XtraDB Cluster Operator for Kubernetes and OpenShift, where network performance is critical for overall performance.

In this post, I will look into networking setups. These are simple and trivial, but are a building block towards understanding networking effects for more complex setups.

Setup

I will use two bare-metal servers, connected via a dedicated 10Gb network. I will emulate a 1Gb network by changing the network interface speed with the following command:

ethtool -s eth1 speed 1000 duplex full autoneg off

network test topology

I will run a simple benchmark:

sysbench oltp_read_only --mysql-ssl=on --mysql-host=172.16.0.1 --tables=20 --table-size=10000000 --mysql-user=sbtest --mysql-password=sbtest --threads=$i --time=300 --report-interval=1 --rand-type=pareto

This is run with the number of threads varied from 1 to 2048. All data fits into memory – innodb_buffer_pool_size is big enough – so the workload is CPU-intensive in memory: there is no IO overhead.

Operating System: Ubuntu 16.04

Benchmark N1. Network bandwidth

In the first experiment I will compare 1Gb network vs 10Gb network.

1gb vs 10gb network

threads/throughput 1Gb network 10Gb network
1 326.13 394.4
4 1143.36 1544.73
16 2400.19 5647.73
32 2665.61 10256.11
64 2838.47 15762.59
96 2865.22 17626.77
128 2867.46 18525.91
256 2867.47 18529.4
512 2867.27 17901.67
1024 2865.4 16953.76
2048 2761.78 16393.84

 

Obviously the 1Gb network performance is a bottleneck here, and we can improve our results significantly if we move to the 10Gb network.

To confirm that the 1Gb network is the bottleneck, we can check the network traffic chart in PMM:

network traffic in PMM

We can see we achieved 116MiB/sec (or 928Mb/sec)  in throughput, which is very close to the network bandwidth.

But what can we do if our network infrastructure is limited to 1Gb?

Benchmark N2. Protocol compression

There is a feature in the MySQL protocol whereby you can enable compression for the network exchange between client and server. In sysbench, this is enabled with:

--mysql-compression=on

Let’s see how it will affect our results.

1gb network with compression protocol

threads/throughput 1Gb network 1Gb with compression protocol
1 326.13 198.33
4 1143.36 771.59
16 2400.19 2714
32 2665.61 3939.73
64 2838.47 4454.87
96 2865.22 4770.83
128 2867.46 5030.78
256 2867.47 5134.57
512 2867.27 5133.94
1024 2865.4 5129.24
2048 2761.78 5100.46

 

Here is an interesting result. When we use all available network bandwidth, the protocol compression actually helps to improve the result.

10g network with compression protocol

threads/throughput 10Gb 10Gb with compression
1 394.4 216.25
4 1544.73 857.93
16 5647.73 3202.2
32 10256.11 5855.03
64 15762.59 8973.23
96 17626.77 9682.44
128 18525.91 10006.91
256 18529.4 9899.97
512 17901.67 9612.34
1024 16953.76 9270.27
2048 16393.84 9123.84

 

But this is not the case with the 10Gb network. The CPU resources needed for compression/decompression are a limiting factor, and with compression the throughput actually reaches only half of what we have without compression.

Now let’s talk about protocol encryption, and how using SSL affects our results.

Benchmark N3. Network encryption

1gb network and 1gb with SSL

threads/throughput 1Gb network 1Gb SSL
1 326.13 295.19
4 1143.36 1070
16 2400.19 2351.81
32 2665.61 2630.53
64 2838.47 2822.34
96 2865.22 2837.04
128 2867.46 2837.21
256 2867.47 2837.12
512 2867.27 2836.28
1024 2865.4 1830.11
2048 2761.78 1019.23

10gb network and 10gb with SSL

threads/throughput 10Gb 10Gb SSL
1 394.4 359.8
4 1544.73 1417.93
16 5647.73 5235.1
32 10256.11 9131.34
64 15762.59 8248.6
96 17626.77 7801.6
128 18525.91 7107.31
256 18529.4 4726.5
512 17901.67 3067.55
1024 16953.76 1812.83
2048 16393.84 1013.22

 

For the 1Gb network, SSL encryption shows some penalty – about 10% for a single thread – but otherwise we hit the bandwidth limit again. We also see some scalability hit at a high number of threads, which is more visible in the 10Gb network case.

With 10Gb, the SSL protocol does not scale after 32 threads. Actually, it appears to be a scalability problem in OpenSSL 1.0, which MySQL currently uses.

In our experiments, we saw that OpenSSL 1.1.1 provides much better scalability, but you need a special build of MySQL compiled from source and linked against OpenSSL 1.1.1 to achieve this. I don’t show those results here, as we do not have production binaries.

Conclusions

  1. Network performance and utilization will affect the general application throughput.
  2. Check if you are hitting network bandwidth limits.
  3. Protocol compression can improve the results if you are limited by network bandwidth, but it can also make things worse if you are not.
  4. SSL encryption has some penalty (~10%) with a low number of threads, but it does not scale for high-concurrency workloads.

by Vadim Tkachenko at February 19, 2019 11:52 AM

February 18, 2019

Peter Zaitsev

Percona Server for MySQL 5.7.25-28 Is Now Available


Percona is glad to announce the release of Percona Server 5.7.25-28 on February 18, 2019. Downloads are available here and from the Percona Software Repositories.

This release is based on MySQL 5.7.25 and includes all the bug fixes in it. Percona Server 5.7.25-28 is now the current GA (Generally Available) release in the 5.7 series.

All software developed by Percona is open-source and free.

In this release, Percona Server introduces the variable binlog_skip_flush_commands. This variable controls whether or not FLUSH commands are written to the binary log. Setting this variable to ON can help avoid problems in replication. For more information, refer to our documentation.
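As a rough sketch of how you might check and enable it at runtime (assuming the variable can be set dynamically; it can also be placed in my.cnf):

-- Check the current value of the new variable
SHOW GLOBAL VARIABLES LIKE 'binlog_skip_flush_commands';
-- Stop writing FLUSH commands to the binary log on this server
SET GLOBAL binlog_skip_flush_commands = ON;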

Note

If you’re currently using Percona Server 5.7, Percona recommends upgrading to this version of 5.7 prior to upgrading to Percona Server 8.0.

Bugs fixed

  • FLUSH commands written to the binary log could cause errors in case of replication. Bug fixed #1827 (upstream #88720).
  • Running LOCK TABLES FOR BACKUP followed by STOP SLAVE SQL_THREAD could block replication preventing it from being restarted normally. Bug fixed #4758.
  • The ACCESS_DENIED field of the information_schema.user_statistics table was not updated correctly. Bug fixed #3956.
  • MySQL could report that the maximum number of connections was exceeded with too many connections being in the CLOSE_WAIT state. Bug fixed #4716 (upstream #92108).
  • Wrong query results could be received in semi-join sub queries with materialization-scan that allowed inner tables of different semi-join nests to interleave. Bug fixed #4907 (upstream bug #92809).
  • In some cases, the server using the MyRocks storage engine could crash when TTL (Time to Live) was defined on a table. Bug fixed #4911.
  • Running a SELECT statement with the ORDER BY and LIMIT clauses could result in less than optimal performance. Bug fixed #4949 (upstream #92850).
  • There was a typo in mysqld_safe.sh: trottling was replaced with throttling. Bug fixed #240. Thanks to Michael Coburn for the patch.
  • MyRocks could crash while running START TRANSACTION WITH CONSISTENT SNAPSHOT if other transactions were in specific states. Bug fixed #4705.
  • In some cases, mysqld could crash when inserting data into a database the name of which contained special characters (CVE-2018-20324). Bug fixed #5158.
  • MyRocks incorrectly processed transactions in which multiple statements had to be rolled back. Bug fixed #5219.
  • In some cases, the MyRocks storage engine could crash without triggering the crash recovery. Bug fixed #5366.
  • When bootstrapped with undo or redo log encryption enabled on a very fast storage, the server could fail to start. Bug fixed #4958.

Other bugs fixed: #2455, #4791, #4855, #4996, #5268.

This release also contains fixes for the following CVE issues: CVE-2019-2534, CVE-2019-2529, CVE-2019-2482, CVE-2019-2434.

Find the release notes for Percona Server for MySQL 5.7.25-28 in our online documentation. Report bugs in the Jira bug tracker.

 

by Borys Belinsky at February 18, 2019 04:38 PM

Percona Server for MongoDB 4.0.5-2 Is Now Available

Percona Server for MongoDB


Percona announces the release of Percona Server for MongoDB 4.0.5-2 on February 18, 2019. Download the latest version from the Percona website or the Percona Software Repositories.

Percona Server for MongoDB is an enhanced, open source, and highly-scalable database that is a fully-compatible, drop-in replacement for MongoDB 4.0 Community Edition. It supports MongoDB 4.0 protocols and drivers.

Percona Server for MongoDB extends Community Edition functionality by including the Percona Memory Engine storage engine, as well as several enterprise-grade features. It also includes MongoRocks storage engine (which is now deprecated). Percona Server for MongoDB requires no changes to MongoDB applications or code.

This release includes all the features of MongoDB 4.0 Community Edition.

Note that the MMAPv1 storage engine is deprecated in MongoDB 4.0 Community Edition.

In Percona Server for MongoDB 4.0.5-2, data at rest encryption becomes GA. The data at rest encryption feature now covers the temporary files used for external sorting and the rollback files. You can decrypt and examine the contents of the rollback files using the new perconadecrypt command line tool.

In this release, Percona Server for MongoDB supports the ngram full-text search engine. Thanks to Sunguck Lee (@SunguckLee) for this contribution. To enable the ngram full-text search engine, create an index passing ngram to the default_language parameter:

mongo > db.collection.createIndex({name:"text"}, {default_language: "ngram"})

New Features

  • PSMDB-276: The perconadecrypt tool is now available for decrypting the encrypted rollback files.
  • PSMDB-250: The ngram full-text search engine has been added to Percona Server for MongoDB. Thanks to Sunguck Lee (@SunguckLee) for this contribution.

Bugs Fixed

  • PSMDB-234: It was possible to use an encryption key file whose owner was not the owner of the mongod process.
  • PSMDB-273: When using data at rest encryption, temporary files for external sorting and rollback files were not encrypted.
  • PSMDB-257: MongoDB could not be started with a group-readable key file owned by root.
  • PSMDB-272: mongos could crash when running the createBackup command.

Other bugs fixed: PSMDB-247.

The Percona Server for MongoDB 4.0.5-2 release notes are available in the official documentation.

by Borys Belinsky at February 18, 2019 04:13 PM

Jean-Jerome Schmidt

How to Migrate from Oracle to MySQL / Percona Server

Migrating from Oracle to MySQL/Percona Server is not a trivial task, although it is getting easier, especially with the arrival of MySQL 8.0 and Percona's announcement of Percona Server for MySQL 8.0 GA. Aside from planning your migration from Oracle to Percona Server, you must ensure that you understand the purpose and functionality behind choosing Percona Server specifically.

This blog will focus on migrating from Oracle to Percona Server as the specific target database of choice. There's a page on the Oracle website, SQL Developer Supplementary Information for MySQL Migrations, which can be used as a reference for the planned migration. This blog will not cover the overall migration process, as it is a long one, but it will hopefully provide enough background information to serve as a guide for your migration.

Since Percona Server is a fork of MySQL, almost all features that come with MySQL are present in Percona Server, so any reference to MySQL here applies to Percona Server as well. We previously blogged about migrating Oracle Database to PostgreSQL. I’ll reiterate the reasons why one would consider migrating from Oracle to an open-source RDBMS such as PostgreSQL or Percona Server/MySQL/MariaDB.

  1. Cost: As you may know, Oracle licensing is very expensive, and there are additional costs for some features like partitioning and high availability. So overall it's very expensive.
  2. Flexible open source licensing and easy availability from public cloud providers like AWS.
  3. Benefit from open source add-ons to improve performance.

Planning and Development Strategy

Migration from Oracle to Percona Server 8.0 can be a pain, since there are a lot of key factors that need to be considered and addressed. For example, Oracle can run on a Windows Server machine, but Percona Server does not support Windows. Although you can compile it for Windows, Percona itself does not offer any support for Windows. You must also identify your database architecture requirements, since Percona Server is not designed for OLAP (Online Analytical Processing) or data-warehousing applications. Percona Server/MySQL RDBMS are a perfect fit for OLTP (Online Transaction Processing).

Identify the key aspects of your database architecture. For example, if your current Oracle architecture implements MAA (Maximum Availability Architecture) with Data Guard + Oracle RAC (Real Application Cluster), you should determine its equivalent in Percona Server. There's no straight answer for this within MySQL/Percona Server. However, you can choose between synchronous replication, asynchronous replication, or Group Replication (note that Percona XtraDB Cluster is still currently on version 5.7.x). Then, there are multiple alternatives that you can implement for your own high-availability solution: for example (to name a few), a Corosync/Pacemaker/DRBD/Linux stack, MHA (MySQL High Availability), a Keepalived/HAProxy/ProxySQL stack, or you can plainly rely on ClusterControl, which supports Keepalived, HAProxy, ProxySQL, Garbd, and MaxScale for your high-availability solutions.

On the other side, a question you also have to consider as part of the plan is: "How will Percona provide support, who will help us when Percona Server itself encounters a bug, and how high is the urgency when we need help?" One more thing to consider is budget, if the purpose of migrating from an enterprise database to an open-source RDBMS is cost-cutting.

There are different options, from migration planning to the things you need to do as part of your development strategy. Such options include engaging with experts in the MySQL/Percona Server field, and that includes us here at Severalnines. There are lots of MySQL consulting firms that can help you through this, since migration from Oracle to MySQL requires a lot of expertise and know-how in the MySQL Server area. This should not be limited to the database: it should cover expertise in scalability, redundancy, backups, high availability, security, monitoring/observability, recovery, and engaging on mission-critical systems. Overall, they should have an understanding of your architecture without exposing the confidentiality of your data.

Assessment or Preliminary Check

Backing up your data, including configuration or setup files, kernel tunings, and automation scripts, must not be forgotten. It's an obvious task, but before you migrate, always secure everything first, especially when moving to a different platform.

You must also assess whether your applications follow up-to-date software engineering conventions, and ensure that they are platform agnostic. These practices can be to your benefit, especially when moving to a different database platform, such as Percona Server for MySQL.

Take note that the operating system requirements of Percona Server can be a show-stopper if your application and database run on a Windows Server and the application is Windows dependent; this could mean a lot of work! Always remember that Percona Server is on a different platform: perfection might not be guaranteed, but you can get close enough.

Lastly, make sure that the targeted hardware works with Percona Server's requirements, or at least that it is bug-free (see here). You may consider stress testing with Percona Server first before reliably moving off your Oracle Database.

What You Should Know

It is worth noting that in Percona Server / MySQL, you can create multiple databases, whereas Oracle does not offer that same functionality.

In MySQL, a schema is physically synonymous with a database. You can substitute the keyword SCHEMA for DATABASE in MySQL SQL syntax, for example using CREATE SCHEMA instead of CREATE DATABASE, whereas Oracle makes a distinction between the two. In Oracle, a schema represents only a part of a database: the tables and other objects owned by a single user. Normally, there is a one-to-one relationship between the instance and the database.
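For example, a minimal illustration (with hypothetical database names) of SCHEMA being a synonym for DATABASE in MySQL:

-- SCHEMA is just a synonym for DATABASE in MySQL
CREATE DATABASE sales_db;
CREATE SCHEMA reporting_db;
SHOW DATABASES;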

For example, in an equivalent replication setup in Oracle (e.g. Real Application Clusters or RAC), you have multiple instances accessing a single database. This lets you start Oracle on multiple servers, all accessing the same data. In MySQL, however, you can allow access to multiple databases from your multiple instances, and you can even filter which databases/schemas you replicate to a MySQL node.

Referencing one of our previous blogs, the same principle applies when it comes to converting your database with the tools available on the internet.

There is no tool that can convert an Oracle database into Percona Server / MySQL with 100% fidelity; some of it will be manual work.

Check out the following sections for things that you must be aware of when it comes to migration and verifying the logical SQL result.

Data Type Mapping

MySQL / Percona Server has a number of data types that are almost the same as Oracle's, but the set is not as rich in comparison. However, since version 5.7.8, MySQL supports a native JSON data type.

Below is its data-type equivalent representation (tabular representation is taken from here):

No.  Oracle data type and description  MySQL data type
1 BFILE Pointer to binary file, ⇐ 4G VARCHAR(255)
2 BINARY_FLOAT 32-bit floating-point number FLOAT
3 BINARY_DOUBLE 64-bit floating-point number DOUBLE
4 BLOB Binary large object, ⇐ 4G LONGBLOB
5 CHAR(n), CHARACTER(n) Fixed-length string, 1 ⇐ n ⇐ 255 CHAR(n), CHARACTER(n)
6 CHAR(n), CHARACTER(n) Fixed-length string, 256 ⇐ n ⇐ 2000 VARCHAR(n)
7 CLOB Character large object, ⇐ 4G LONGTEXT
8 DATE Date and time DATETIME
9 DECIMAL(p,s), DEC(p,s) Fixed-point number DECIMAL(p,s), DEC(p,s)
10 DOUBLE PRECISION Floating-point number DOUBLE PRECISION
11 FLOAT(p) Floating-point number DOUBLE
12 INTEGER, INT 38 digits integer INT DECIMAL(38)
13 INTERVAL YEAR(p) TO MONTH Date interval VARCHAR(30)
14 INTERVAL DAY(p) TO SECOND(s) Day and time interval VARCHAR(30)
15 LONG Character data, ⇐ 2G LONGTEXT
16 LONG RAW Binary data, ⇐ 2G LONGBLOB
17 NCHAR(n) Fixed-length UTF-8 string, 1 ⇐ n ⇐ 255 NCHAR(n)
18 NCHAR(n) Fixed-length UTF-8 string, 256 ⇐ n ⇐ 2000 NVARCHAR(n)
19 NCHAR VARYING(n) Varying-length UTF-8 string, 1 ⇐ n ⇐ 4000 NCHAR VARYING(n)
20 NCLOB Variable-length Unicode string, ⇐ 4G NVARCHAR(max)
21 NUMBER(p,0), NUMBER(p) 8-bit integer, 1 <= p < 3 TINYINT (0 to 255)
16-bit integer, 3 <= p < 5 SMALLINT
32-bit integer, 5 <= p < 9 INT
64-bit integer, 9 <= p < 19 BIGINT
Fixed-point number, 19 <= p <= 38 DECIMAL(p)
22 NUMBER(p,s) Fixed-point number, s > 0 DECIMAL(p,s)
23 NUMBER, NUMBER(*) Floating-point number DOUBLE
24 NUMERIC(p,s) Fixed-point number NUMERIC(p,s)
25 NVARCHAR2(n) Variable-length UTF-8 string, 1 ⇐ n ⇐ 4000 NVARCHAR(n)
26 RAW(n) Variable-length binary string, 1 ⇐ n ⇐ 255 BINARY(n)
27 RAW(n) Variable-length binary string, 256 ⇐ n ⇐ 2000 VARBINARY(n)
28 REAL Floating-point number DOUBLE
29 ROWID Physical row address CHAR(10)
30 SMALLINT 38 digits integer DECIMAL(38)
31 TIMESTAMP(p) Date and time with fraction DATETIME(p)
32 TIMESTAMP(p) WITH TIME ZONE Date and time with fraction and time zone DATETIME(p)
33 UROWID(n) Logical row addresses, 1 ⇐ n ⇐ 4000 VARCHAR(n)
34 VARCHAR(n) Variable-length string, 1 ⇐ n ⇐ 4000 VARCHAR(n)
35 VARCHAR2(n) Variable-length string, 1 ⇐ n ⇐ 4000 VARCHAR(n)
36 XMLTYPE XML data LONGTEXT

Data type attributes and options:

Oracle MySQL
BYTE and CHAR column size semantics Size is always in characters
 

Transactions

Percona Server uses XtraDB (an enhanced version of InnoDB) as its primary storage engine for handling transactional data; although various storage engines can be an alternative choice for handling transactions such as TokuDB (deprecated) and MyRocks storage engines.

Whilst there are advantages and benefits to using or exploring MyRocks, XtraDB is the more robust, de facto storage engine that Percona Server uses, and it is enabled by default, so we'll use this storage engine as the basis for migration with regard to transactions.

By default, Percona Server / MySQL has the autocommit variable set to ON, which means that you have to explicitly start transactions in order to take advantage of ROLLBACK for discarding changes, or of SAVEPOINT.

It's basically the same concept that Oracle uses in terms of commit, rollbacks and savepoints.

For explicit transactions, this means that you have to use the START TRANSACTION/BEGIN; <SQL STATEMENTS>; COMMIT; syntax.

Alternatively, if you disable autocommit, you have to issue an explicit COMMIT every time for statements that change your data.
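Below is a minimal sketch of an explicit transaction with a savepoint, using a hypothetical CUSTOMERS table (the same structure as in the DML examples later in this post):

START TRANSACTION;
INSERT INTO CUSTOMERS (customer_id, customer_name, city) VALUES (4000, 'Test Customer', 'Davao City');
SAVEPOINT after_insert;
UPDATE CUSTOMERS SET city = 'Cebu City' WHERE customer_id = 4000;
ROLLBACK TO SAVEPOINT after_insert; -- undo the UPDATE, keep the INSERT
COMMIT;                             -- make the INSERT permanent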

Dual Table

MySQL provides a dummy table named DUAL for compatibility with databases coming from Oracle.

This matches Oracle's usage of DUAL, so any existing statements in your application that use DUAL might require no changes upon migration to Percona Server.

In Oracle, the FROM clause is mandatory for every SELECT statement, so the Oracle database uses the DUAL table for SELECT statements where a table name is not required.

In MySQL, the FROM clause is not mandatory, so the DUAL table is not necessary. However, the DUAL table does not work exactly the same as it does in Oracle, but for simple SELECTs in Percona Server, this is fine.

See the following example below:

In Oracle,

SQL> DESC DUAL;
 Name                                      Null?    Type
 ----------------------------------------- -------- ----------------------------
 DUMMY                                              VARCHAR2(1)

SQL> SELECT CURRENT_TIMESTAMP FROM DUAL;
CURRENT_TIMESTAMP
---------------------------------------------------------------------------
16-FEB-19 04.16.18.910331 AM +08:00

But in MySQL:

mysql> DESC DUAL;
ERROR 1064 (42000): You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'DUAL' at line 1
mysql> SELECT CURRENT_TIMESTAMP FROM DUAL;
+---------------------+
| CURRENT_TIMESTAMP   |
+---------------------+
| 2019-02-15 20:20:28 |
+---------------------+
1 row in set (0.00 sec)

Note: the DESC DUAL syntax does not work in MySQL, and the results differ as well, since CURRENT_TIMESTAMP (which uses the TIMESTAMP data type) in MySQL does not include the time zone.

SYSDATE

Oracle's SYSDATE function is almost the same in MySQL.

In MySQL, SYSDATE() returns the date and time, and it is a function that requires parentheses with no arguments. To demonstrate this, below is Oracle and MySQL using SYSDATE.

In Oracle, plain SYSDATE just returns the date of the day without the time. To get both the date and time, use TO_CHAR to convert the datetime into the desired format; in MySQL, you might not need this, as SYSDATE() returns both.

See example below.

In Oracle:

SQL> SELECT TO_CHAR (SYSDATE, 'MM-DD-YYYY HH24:MI:SS') "NOW" FROM DUAL;
NOW
-------------------
02-16-2019 04:39:00

SQL> SELECT SYSDATE FROM DUAL;

SYSDATE
---------
16-FEB-19

But in MySQL:

mysql> SELECT SYSDATE() FROM DUAL;
+---------------------+
| SYSDATE()           |
+---------------------+
| 2019-02-15 20:37:36 |
+---------------------+
1 row in set (0.00 sec)

If you want to format the date, MySQL has a DATE_FORMAT() function.

You can check the MySQL Date and Time documentation for more info.
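For example, a quick sketch paralleling the Oracle TO_CHAR example above (the format string is just an illustration):

mysql> SELECT DATE_FORMAT(SYSDATE(), '%m-%d-%Y %H:%i:%s') AS "NOW";

This returns the current date and time as a string formatted like the Oracle output shown earlier (e.g. 02-16-2019 04:39:00).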


TO_DATE

Oracle's TO_DATE equivalent in MySQL is the STR_TO_DATE() function.

It's almost identical to the one in Oracle, except that in Oracle TO_DATE returns the DATE data type, while in MySQL STR_TO_DATE() returns the DATETIME data type.

Oracle:

SQL> SELECT TO_DATE ('20190218121212','yyyymmddhh24miss') as "NOW" FROM DUAL; 
NOW
-------------------------
18-FEB-19

MySQL:

mysql> SELECT STR_TO_DATE('2019-02-18 12:12:12','%Y-%m-%d %H:%i:%s') as "NOW" FROM DUAL;
+---------------------+
| NOW                 |
+---------------------+
| 2019-02-18 12:12:12 |
+---------------------+
1 row in set (0.00 sec)

SYNONYM

MySQL has no support for, or equivalent of, Oracle's SYNONYM.

A possible alternative in MySQL is to use a VIEW.
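For a local object, a simple view can act as a synonym-like alias. A minimal sketch, assuming a schema hr with a table employees:

CREATE VIEW emp_table AS SELECT * FROM hr.employees;
SELECT * FROM emp_table LIMIT 10;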

However, a SYNONYM can also be used to create an alias of a remote table,

e.g.

CREATE PUBLIC SYNONYM emp_table FOR hr.employees@remote.us.oracle.com

In MySQL, you can take advantage of the FEDERATED storage engine for this.

e.g.

CREATE TABLE hr_employees (
    id     INT(20) NOT NULL AUTO_INCREMENT,
    name   VARCHAR(32) NOT NULL DEFAULT '',
    other  INT(20) NOT NULL DEFAULT '0',
    PRIMARY KEY  (id),
    INDEX name (name),
    INDEX other_key (other)
)
ENGINE=FEDERATED
DEFAULT CHARSET=utf8mb4
CONNECTION='mysql://fed_user@remote_host:9306/federated/test_table';

Or you can simplify the process with the CREATE SERVER syntax, which makes it easier to create a table acting as your SYNONYM for accessing a remote table. See the documentation for more info on this.
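As a rough sketch of that approach (reusing the hypothetical connection details from the FEDERATED example above), the server definition is created once and then referenced in the CONNECTION string:

CREATE SERVER fedlink
  FOREIGN DATA WRAPPER mysql
  OPTIONS (USER 'fed_user', HOST 'remote_host', PORT 9306, DATABASE 'federated');

CREATE TABLE hr_employees (
    id     INT(20) NOT NULL AUTO_INCREMENT,
    name   VARCHAR(32) NOT NULL DEFAULT '',
    other  INT(20) NOT NULL DEFAULT '0',
    PRIMARY KEY  (id)
)
ENGINE=FEDERATED
DEFAULT CHARSET=utf8mb4
CONNECTION='fedlink/test_table';  -- server name / remote table name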

Behaviour of Empty String and NULL

Take note that in Percona Server / MySQL, an empty string is not NULL, whereas Oracle treats an empty string as a null value.

In Oracle:

SQL> SELECT CASE WHEN '' IS NULL THEN 'Yes' ELSE 'No' END AS "Null Eval" FROM dual;
Nul
---
Yes

In MySQL:

mysql> SELECT CASE WHEN '' IS NULL THEN 'Yes' ELSE 'No' END AS "Null Eval" FROM dual;
+-----------+
| Null Eval |
+-----------+
| No        |
+-----------+
1 row in set (0.00 sec)

Sequences

In MySQL, there's no exact equivalent of Oracle's SEQUENCE.

Although there are some posts that simulate this functionality, you might be able to get the most recently generated key using LAST_INSERT_ID(), as long as your table's clustered index (the PRIMARY KEY) is defined with AUTO_INCREMENT.
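A minimal sketch of that pattern, with a hypothetical orders table:

CREATE TABLE orders (
    order_id    INT NOT NULL AUTO_INCREMENT,
    customer_id INT NOT NULL,
    PRIMARY KEY (order_id)
);

INSERT INTO orders (customer_id) VALUES (1000);
SELECT LAST_INSERT_ID(); -- returns the order_id generated by this session's last insert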

Character String Functions

MySQL / Percona Server has a good number of string functions, but not as many built-in helper functions as Oracle.

It would take too long to discuss them here one by one, but you can check the documentation from MySQL and compare it against Oracle's string functions.

DML Statements

INSERT/UPDATE/DELETE statements from Oracle work the same way in MySQL.

However, Oracle's INSERT ALL/INSERT FIRST is not supported in MySQL.

Instead, you need to state your MySQL INSERT queries one by one.

e.g.

In Oracle:

SQL> INSERT ALL
  INTO CUSTOMERS (customer_id, customer_name, city) VALUES (1000, 'Jase Alagaban', 'Davao City')
  INTO CUSTOMERS (customer_id, customer_name, city) VALUES (2000, 'Maximus Aleksandre Namuag', 'Davao City')
SELECT * FROM dual;
2 rows created.

2 rows created.

But in MySQL, you have to run the insert one at a time:

mysql> INSERT INTO CUSTOMERS (customer_id, customer_name, city) VALUES (1000, 'Jase Alagaban', 'Davao City');
Query OK, 1 row affected (0.02 sec)
mysql> INSERT INTO CUSTOMERS (customer_id, customer_name, city) VALUES (2000, 'Maximus Aleksandre Namuag', 'Davao City');
Query OK, 1 row affected (0.00 sec)

INSERT ALL/INSERT FIRST has no equivalent in MySQL / Percona Server, and neither does Oracle's ability to make the insert conditional by adding a WHEN keyword to the syntax.

Hence, your alternative in this case is to use stored procedures.

Outer Joins "+" Symbol

Oracle's use of the + operator for left and right outer joins is not supported in MySQL, where the + operator is used only for arithmetic.

Hence, if you have the + operator in your existing Oracle SQL statements, you need to replace it with LEFT JOIN or RIGHT JOIN.
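A minimal sketch of the translation, using hypothetical employees and departments tables:

-- Oracle outer join with the (+) operator
SELECT e.name, d.dept_name
FROM employees e, departments d
WHERE e.dept_id = d.dept_id (+);

-- MySQL / Percona Server equivalent
SELECT e.name, d.dept_name
FROM employees e
LEFT JOIN departments d ON e.dept_id = d.dept_id;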

You might want to check the official documentation for "Outer Join Simplification" of MySQL.

START WITH..CONNECT BY

Oracle uses START WITH..CONNECT BY for hierarchical queries.

Starting with MySQL / Percona Server 8.0, there is support for generating hierarchical result sets using models such as adjacency lists or nested sets. This is done with Common Table Expressions (CTEs) in MySQL.

Similar to PostgreSQL, MySQL uses the WITH RECURSIVE syntax for hierarchical queries, so you translate the CONNECT BY statement into a WITH RECURSIVE statement.

The examples below show how this differs between Oracle and MySQL / Percona Server.

In Oracle:

SELECT cp.id, cp.title, CONCAT(c2.title, ' > ' || cp.title) as path
FROM category cp INNER JOIN category c2
  ON cp.parent_id = c2.id
WHERE cp.parent_id IS NOT NULL
START WITH cp.id >= 1
CONNECT BY NOCYCLE PRIOR c2.id=cp.parent_id; 

And in MySQL:

WITH RECURSIVE category_path (id, title, path) AS
(
  SELECT id, title, title as path
    FROM category
    WHERE parent_id IS NULL
  UNION ALL
  SELECT c.id, c.title, CONCAT(cp.path, ' > ', c.title)
    FROM category_path AS cp JOIN category AS c
      ON cp.id = c.parent_id
)
SELECT * FROM category_path
ORDER BY path;

PL/SQL in MySQL / Percona?

MySQL / Percona Server takes a different approach than Oracle's PL/SQL.

MySQL uses stored procedures and stored functions, which are similar to PL/SQL and use BEGIN..END syntax.

Oracle's PL/SQL is compiled before execution, when it is loaded into the server, while MySQL routines are compiled when invoked and stored in the cache.
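A minimal sketch of a MySQL stored procedure, reusing the CUSTOMERS table from the DML examples in this post:

DELIMITER //
CREATE PROCEDURE add_customer (IN p_id INT, IN p_name VARCHAR(64), IN p_city VARCHAR(64))
BEGIN
  INSERT INTO CUSTOMERS (customer_id, customer_name, city)
  VALUES (p_id, p_name, p_city);
END //
DELIMITER ;

CALL add_customer(3000, 'Test Customer', 'Davao City');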

You may want to check out this documentation as a reference guide on converting your PL/SQL to MySQL.

Migration Tools

I did some research into tools that could be a de facto standard for migration, but I couldn’t find a good answer.

I did find sqlines, though, and it looks simple but promising.

While I didn’t deep-dive into it, the website offers a handful of insights, which could help you on migrating from Oracle to MySQL/Percona Server. There are also paid tools such as this and this.

I've also searched through GitHub, but found nothing more appealing as a solution to the problem. If you're aiming to migrate from Oracle to Amazon, they offer the AWS Schema Conversion Tool, which supports migrating from Oracle to MySQL.

Overall, the reason why migration is not an easy thing to do is mainly because Oracle RDBMS is such a beast with lots of features that Percona Server / MySQL or MariaDB RDBMS still do not have.

Anyhow, if you find or know of any tools that you find helpful and beneficial for migrating from Oracle to MySQL / Percona Server, please leave a comment on this blog!

Testing

As part of your migration plan, testing is a vital task that plays a very important role and affects your decisions with regard to migration.

The tool dbdeployer (a port of MySQL Sandbox) is a very helpful tool that you can take advantage of. It makes it easy to try and test different approaches, and it saves you time compared to setting up the whole stack, if your purpose is to first try and test the RDBMS platform.

For testing your SQL stored routines (functions or procedures), triggers, and events, I suggest you use tools such as mytap or Google's Unit Testing Framework.

Percona also offers a number of tools that are available for download on their website. Check out Percona Toolkit here. You can cherry-pick the tools according to your needs, especially for testing and production-usage tasks.

Overall, the things that you need to keep in mind as guidelines when testing your MySQL Server are:

  • After your installation, you need to consider doing some tuning. Check out this Percona blog for help.
  • Do some benchmarks and stress-load testing for your configuration setup on your current node. Check out mysqlslap and sysbench, which can help you with this. Also check out our blog "How to Benchmark Performance of MySQL & MariaDB using SysBench".
  • Check that your DDL is correctly defined: data types, constraints, clustered and secondary indexes, and partitions, if you have any.
  • Check your DML, especially that the syntax is correct and that the data is saved as expected.
  • Check your stored routines, events, and triggers to ensure they run and return the expected results.
  • Verify that your queries are performant. I suggest you take advantage of open-source tools or try our ClusterControl product. It offers monitoring/observability, especially of your MySQL / Percona Server. You can use ClusterControl to monitor your queries and their query plans to make sure they are performant.

by Paul Namuag at February 18, 2019 03:17 PM

Peter Zaitsev

Deprecation of TLSv1.0 2019-02-28

end of Percona support for TLS1.0

Ahead of the PCI move to deprecate the use of ‘early TLS’, we’ve previously taken steps to disable TLSv1.0.

Unfortunately, at that time we encountered some issues which led us to roll back these changes. This was to allow users of operating systems that did not – yet – support TLSv1.1 or higher to download Percona packages over TLSv1.0.

Since then, we have been tracking our usage statistics for older operating systems that don’t support TLSv1.1 or higher at https://repo.percona.com. We now receive very few legitimate requests for these downloads.

Consequently,  we are ending support for TLSv1.0 on all Percona web properties.

While the packages will still be available for download from percona.com, we are unlikely to update them, as the operating systems concerned are end-of-life (e.g. RHEL5). Also, in the future you will need to download these packages from a client that supports TLSv1.1 or greater.

For example, EL5 will not receive an updated version of OpenSSL to support TLSv1.1 or greater. PCI has called for the deprecation of ‘early TLS’ standards, so you should upgrade any EL5 installations to EL6 or greater as soon as possible. As noted in this support policy update by Red Hat, EL5 stopped receiving support under Extended Update Support (EUS) in March 2015.

To continue to receive updates for your OS and for any Percona products that you use, you need to update to more recent versions of CentOS, Scientific Linux, and RedHat Enterprise Linux.


Photo by Kevin Noble on Unsplash

by David Busby at February 18, 2019 12:53 PM

February 16, 2019

Valeriy Kravchuk

Fun with Bugs #79 - On MySQL Bug Reports I am Subscribed to, Part XV

More than three weeks have passed since my previous review of public MySQL bug reports I am subscribed to, so it's time to present some of the bugs I considered interesting in January 2019.

As usual, I'll review them starting from the oldest and try to summarize my feelings about these bugs at the end of this post. Here they are:
  • Bug #93806 - "Document error about ON DUPLICATE KEY UPDATE". Years pass, but the fine MySQL manual still does not explain some cases of InnoDB locking properly. Xiaobin Lin found yet another case that it does not explain properly. Or, maybe, the manual is correct and the problem is in the implementation? MariaDB 10.3.7 shows the same behavior.
  • Bug #93827 - "dict_index_has_desc() is not efficient". Yet another bug report from Zhai Weixiang. I see 50 still active bug reports from him! Maybe Oracle should send some nice T-shirts to top N most productive bug reporters?
  • Bug #93845 - "Optimizer choose wrong index, sorting index instead of filtering index". Yet another bug report of a known class, this time from Daniele Renda. It's a good example of using optimizer trace to make a point. Note also that using ANALYZE ... UPDATE HISTOGRAMS does not help. As a side note, the implementation of optimizer trace for MariaDB is finally in progress and should be done for the upcoming 10.4. See MDEV-6111 for the details if you care.
  • Bug #93875 - "mysqldump per-table dump is slow since 5.7 on instances with many tables". This performance regression bug (that was "verified" without adding the regression tag) was reported by Nikolai Ikhalainen from Percona. This bug report is a nice example of using Docker to create easily repeatable test cases for bug reports.
  • Bug #93878 - "innodb_status_output fails to restore to old value". This great bug report from Yuhui Wang not only describes 3 cases when InnoDB status is printed to the error log automatically, but also shows that in one of these cases, when we cannot find a free block in the buffer pool in 20 loops, this printing is not stopped after the problem is resolved. He also provides a patch that resolves the problem. See also his nice Bug #94065 - "MySQL fails to startup when setting persist variable" with a detailed analysis of the problem.
  • Bug #93917 - "Wrong binlog entry for BLOB on a blackhole intermediary master". A nice corner case was found by Sveta Smirnova from Percona. With her 52 "Verified" bug reports at the moment, she also deserves a T-shirt from Oracle as one of the top bug reporters!
  • Bug #93922 - "UNION ALL very slow with SUM(0)". This weird bug was found and reported by Sergio Paternoster. He had to spend notable efforts to see this bug "Verified"...
  • Bug #93948 - "XID inconsistency on master-slave with CTAS". Krunal Bauskar from Percona noted this inconsistency in XID generation on slave vs master. Let's wait and check if it ends up as "Not a bug".
  • Bug #93957 - "slave_compressed_protocol doesn't work with semi-sync replication in MySQL-5.7". This bug report from Pavel Katiushyn also looks like a regression, as a similar bug was fixed in an older 5.7.x release. But I do not see any public comment with a verification attempt in either recent 5.7 or recent 8.0 (where the older bug also had to be fixed). So, the bug is "Verified", but the real impact and versions affected are not clear.
  • Bug #93963 - "Slow query log doesn't log a slow CREATE INDEX with admin statements enabled". This clear and properly tagged regression vs MySQL 5.7 was reported by Jeremy Smyth.
  • Bug #93986 - "Transactions in serializable mode are not actually serializable". I've subscribed to this bug report mostly for (expected) fun of reading further comments. It's still "Need feedback", but single comment so far is worth reading.
  • Bug #94121 - "Enable hardware CRC32 under Valgrind". Laurynas Biveinis from Percona also provided a patch for this 8-year-old problem.
  • Bug #94130 - "XA COMMIT may lead replication broken". Yet another proof that XA transactions implementation is broken in MySQL. This time from Phoenix Zhang and in semi-sync replication case.
This photo reminds me of the current state of MySQL bug processing in Oracle: it seems there is no clear and straightforward way to follow. Everything is fuzzy these days...

There are a few more bugs reported in January 2019 that I am watching, but their status is not yet clearly defined, so I decided to skip them in this review.

To summarize:
  1. Oracle engineers who process bugs still do not add the regression tag to many regression bugs. This is a shame, really. If I were their boss, I'd make this a policy and one of the important KPI values to monitor.
  2. In some cases bugs get verified immediately without any demonstrated attempt to show how the check was performed, while in other cases poor bug reporters have to fight hard to re-make their point and get a real check done. It seems that these days the good old approaches to bug verification are not followed strictly by some Oracle engineers.

by Valeriy Kravchuk (noreply@blogger.com) at February 16, 2019 08:24 PM

February 15, 2019

Peter Zaitsev

ClickHouse Performance Uint32 vs Uint64 vs Float32 vs Float64


While implementing ClickHouse for query execution statistics storage in Percona Monitoring and Management (PMM), we were faced with the question of which data type to use for the metrics we store. It came down to this: what is the difference in performance and space usage between UInt32, UInt64, Float32, and Float64 column types?

To test this, I created a test table with an abbreviated and simplified version of the main table in our ClickHouse Schema.

The “number of queries” is stored four times in four different columns to be able to benchmark queries referencing different columns.  We can do this with ClickHouse because it is a column store and it works only with columns referenced by the query. This method would not be appropriate for testing on MySQL, for example.

CREATE TABLE test
(
    digest String,
    db_server String,
    db_schema String,
    db_username String,
    client_host String,
    period_start DateTime,
    nq_UInt32 UInt32,
    nq_UInt64 UInt64,
    nq_Float32 Float32,
    nq_Float64 Float64
)
ENGINE = MergeTree
PARTITION BY toYYYYMM(period_start)
ORDER BY (digest, db_server, db_username, db_schema, client_host, period_start)
SETTINGS index_granularity = 8192

When testing ClickHouse performance you need to consider compression. Highly compressible data (for example just a bunch of zeroes) will compress very well and may be processed a lot faster than incompressible data. To take this into account we will do a test with three different data sets:

  • Very Compressible when “number of queries” is mostly 1
  • Somewhat Compressible when we use a range from 1 to 1000 and
  • Poorly Compressible when we use range from 1 to 1000000.

Since it’s unlikely that an application will use the full 32 bit range, we haven’t used it for this test.
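The post doesn't show the load queries, but as a rough sketch, the poorly compressible data set (values from 1 to 1000000) could be generated with ClickHouse's numbers() table function, hypothetical dimension values, and an illustrative row count along these lines:

INSERT INTO test
SELECT
    toString(rand() % 10000)          AS digest,
    'db1'                             AS db_server,
    'schema1'                         AS db_schema,
    'sbtest'                          AS db_username,
    'client1'                         AS client_host,
    now() - (rand() % 4000)           AS period_start,
    toUInt32(1 + (rand() % 1000000))  AS nq_UInt32,
    toUInt64(1 + (rand() % 1000000))  AS nq_UInt64,
    toFloat32(1 + (rand() % 1000000)) AS nq_Float32,
    toFloat64(1 + (rand() % 1000000)) AS nq_Float64
FROM numbers(100000000)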

Another factor which can impact ClickHouse performance is the number of “parts” the table has. After loading the data we ran OPTIMIZE TABLE FINAL to ensure only one part is there on the disk. Note: ClickHouse will gradually delete old files after the optimize command has completed. To avoid these operations interfering with benchmarks, I waited for about 15 minutes to ensure all unused data was removed from the disk.

The amount of memory on the system was enough to cache whole columns in all tests, so this is an in-memory test.

Here is how the table with only one part looks on disk:

root@d01e692c291f:/var/lib/clickhouse/data/pmm/test_lc# ls -la
total 28
drwxr-xr-x 4 clickhouse clickhouse 12288 Feb 10 20:39 .
drwxr-xr-x 8 clickhouse clickhouse 4096 Feb 10 22:38 ..
drwxr-xr-x 2 clickhouse clickhouse 4096 Feb 10 20:30 201902_1_372_4
drwxr-xr-x 2 clickhouse clickhouse 4096 Feb 10 19:38 detached
-rw-r--r-- 1 clickhouse clickhouse 1 Feb 10 19:38 format_version.txt

When you have only one part it makes it very easy to see the space different columns take:

root@d01e692c291f:/var/lib/clickhouse/data/pmm/test_lc/201902_1_372_4# ls -la
total 7950468
drwxr-xr-x 2 clickhouse clickhouse 4096 Feb 10 20:30 .
drwxr-xr-x 4 clickhouse clickhouse 12288 Feb 10 20:39 ..
-rw-r--r-- 1 clickhouse clickhouse 971 Feb 10 20:30 checksums.txt
-rw-r--r-- 1 clickhouse clickhouse 663703499 Feb 10 20:30 client_host.bin
-rw-r--r-- 1 clickhouse clickhouse 754848 Feb 10 20:30 client_host.mrk
-rw-r--r-- 1 clickhouse clickhouse 238 Feb 10 20:30 columns.txt
-rw-r--r-- 1 clickhouse clickhouse 9 Feb 10 20:30 count.txt
-rw-r--r-- 1 clickhouse clickhouse 228415690 Feb 10 20:30 db_schema.bin
-rw-r--r-- 1 clickhouse clickhouse 754848 Feb 10 20:30 db_schema.mrk
-rw-r--r-- 1 clickhouse clickhouse 6985801 Feb 10 20:30 db_server.bin
-rw-r--r-- 1 clickhouse clickhouse 754848 Feb 10 20:30 db_server.mrk
-rw-r--r-- 1 clickhouse clickhouse 19020651 Feb 10 20:30 db_username.bin
-rw-r--r-- 1 clickhouse clickhouse 754848 Feb 10 20:30 db_username.mrk
-rw-r--r-- 1 clickhouse clickhouse 28227119 Feb 10 20:30 digest.bin
-rw-r--r-- 1 clickhouse clickhouse 754848 Feb 10 20:30 digest.mrk
-rw-r--r-- 1 clickhouse clickhouse 8 Feb 10 20:30 minmax_period_start.idx
-rw-r--r-- 1 clickhouse clickhouse 1552547644 Feb 10 20:30 nq_Float32.bin
-rw-r--r-- 1 clickhouse clickhouse 754848 Feb 10 20:30 nq_Float32.mrk
-rw-r--r-- 1 clickhouse clickhouse 1893758221 Feb 10 20:30 nq_Float64.bin
-rw-r--r-- 1 clickhouse clickhouse 754848 Feb 10 20:30 nq_Float64.mrk
-rw-r--r-- 1 clickhouse clickhouse 1552524811 Feb 10 20:30 nq_UInt32.bin
-rw-r--r-- 1 clickhouse clickhouse 754848 Feb 10 20:30 nq_UInt32.mrk
-rw-r--r-- 1 clickhouse clickhouse 1784991726 Feb 10 20:30 nq_UInt64.bin
-rw-r--r-- 1 clickhouse clickhouse 754848 Feb 10 20:30 nq_UInt64.mrk
-rw-r--r-- 1 clickhouse clickhouse 4 Feb 10 20:30 partition.dat
-rw-r--r-- 1 clickhouse clickhouse 400961033 Feb 10 20:30 period_start.bin
-rw-r--r-- 1 clickhouse clickhouse 754848 Feb 10 20:30 period_start.mrk
-rw-r--r-- 1 clickhouse clickhouse 2486243 Feb 10 20:30 primary.idx

We can see there are two files for every column (plus some extras), and so, for example, the Float32 based “number of queries” metric store takes around 1.5GB.

You can also use the SQL queries to get this data from the ClickHouse system tables instead:

SELECT *
FROM system.columns
WHERE (database = 'pmm') AND (table = 'test') AND (name = 'nq_UInt32')
Row 1:
──────
database: pmm
table: test
name: nq_UInt32
type: UInt32
default_kind:
default_expression:
data_compressed_bytes: 7250570
data_uncompressed_bytes: 1545913232
marks_bytes: 754848
comment:
is_in_partition_key: 0
is_in_sorting_key: 0
is_in_primary_key: 0
is_in_sampling_key: 0
compression_codec:
1 rows in set. Elapsed: 0.002 sec.
SELECT *
FROM system.parts
WHERE (database = 'pmm') AND (table = 'test')
Row 1:
──────
partition: 201902
name: 201902_1_372_4
active: 1
marks: 47178
rows: 386478308
bytes_on_disk: 1401028031
data_compressed_bytes: 1390993287
data_uncompressed_bytes: 29642900064
marks_bytes: 7548480
modification_time: 2019-02-10 23:26:20
remove_time: 0000-00-00 00:00:00
refcount: 1
min_date: 0000-00-00
max_date: 0000-00-00
min_time: 2019-02-08 14:50:32
max_time: 2019-02-08 15:58:30
partition_id: 201902
min_block_number: 1
max_block_number: 372
level: 4
data_version: 1
primary_key_bytes_in_memory: 4373363
primary_key_bytes_in_memory_allocated: 6291456
database: pmm
table: test
engine: MergeTree
path: /var/lib/clickhouse/data/pmm/test/201902_1_372_4/
1 rows in set. Elapsed: 0.003 sec.

Now let’s look at the queries

We tested with two queries. One of them – we’ll call it Q1 – is a very trivial query, simply taking the sum across all column values. This query needs to access only one column to return results, so it is likely to be the most impacted by a change of data type:

SELECT sum(nq_UInt32)
FROM test

The second query – which we’ll call Q2 – is a typical ranking query which computes the number of queries per period and then shows periods with the highest amount of queries in them:

SELECT
    sum(nq_UInt32) AS cnt,
    period_start
FROM test
GROUP BY period_start
ORDER BY cnt DESC
LIMIT 10

This query needs to access two columns and do more complicated processing so we expect it to be less impacted by the change of data type.

Before we get to the results, I think it is worth drawing attention to the raw performance we’re getting. I did these tests on a DigitalOcean Droplet with just six virtual CPU cores, yet I still see numbers like these:

SELECT sum(nq_UInt32)
FROM test
┌─sum(nq_UInt32) ──┐
│     386638984    │
└──────────────────┘
1 rows in set. Elapsed: 0.205 sec. Processed 386.48 million rows, 1.55 GB (1.88 billion rows/s., 7.52 GB/s.)

Processing more than 300M rows/sec per core and more than 1GB/sec per core is very cool!

Query Performance

Results between different compression levels show similar differences between column types, so let’s focus on those with the least compression:

Q1 least compression

Q2 least compression

As you can see, the width of the data type (32 bit vs 64 bit) matters a lot more than the type (float vs integer). In some cases float may even perform faster than integer. This was the most unexpected result for me.

Another metric ClickHouse reports is the processing speed in GB/sec. We see a different picture here:

Q1 GB per second

64 bit data types have a higher processing speed than their 32 bit counterparts, but queries run slower as there is more raw data to process.

Compression

Let’s now take a closer look at compression.  For this test we use default LZ4 compression. ClickHouse has powerful support for Per Column Compression Codecs but testing them is outside of scope for this post.

So let’s look at size on disk for UInt32 Column:

On disk data size for UINT32

What you can see from these results is that when data is very compressible ClickHouse can compress it to almost nothing.  The compression ratio for our very compressible data set is about 200x (or 99.5% size reduction if you prefer this metric).

The compression ratio for somewhat compressible data is 1.4x. That’s not bad, but considering we are only storing the 1-1000 range in this column – which requires 10 bits out of 32 – I would hope for better compression. I guess LZ4 is not compressing such data very well.
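One way to compute these compression ratios directly, using the system.columns metadata shown earlier:

SELECT
    name,
    data_compressed_bytes,
    data_uncompressed_bytes,
    round(data_uncompressed_bytes / data_compressed_bytes, 2) AS compression_ratio
FROM system.columns
WHERE (database = 'pmm') AND (table = 'test') AND (name LIKE 'nq_%')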

Now let’s look at compression for a 64 bit integer column:

On disk data size for UINT64

We can see that while the size almost doubled for very compressible data, increases for our somewhat compressible data and poorly compressible data are not that large.  Somewhat compressible data now compresses 2.5x.

Now let’s take a look at Performance depending on data compressibility:

Q1 time for UINT32

Poorly compressible data which takes a larger space on disk is processed faster than somewhat compressible data? This did not make sense. I repeated the run a few times to make sure that the results were correct. When I looked at the compression ratio, though, it suddenly made sense to me.

Poorly compressible data for the UInt32 data type was not compressible by LZ4, so it seems the original data was stored, significantly speeding up the “decompression” process. With somewhat compressible data, compression worked, so real decompression needed to take place too. This makes things slower.

This is why we can only observe these results with UInt32 and Float32 data types.  UInt64 and Float64 show the more expected results:

Q1 time for UINT64

Summary

Here are my conclusions:

  • Even with “slower” data types, ClickHouse is very fast
  • Data type choice matters – but less than I expected
  • Width (32bit vs 64bit) impacts performance more than integer vs float data types
  • Storing a small range of values in a wider column type is likely to yield better compression, though with default compression it is not as good as theoretically possible
  • Compression is interesting. We get the best performance when data can be well compressed. Second best is when we do not have to spend a lot of time decompressing it, as long as it fits in memory.

by Peter Zaitsev at February 15, 2019 01:23 PM

Jean-Jerome Schmidt

How to Migrate MySQL from Amazon EC2 to your On-Prem Data Center Without Downtime

Since the concept of the cloud was born, there has been strong growth in the number of migrations to this environment. However, not all that glitters is gold.

As the demand grows, so does the costs. We can find ourselves in a situation where our monthly cloud expenses are very high and, in this case, it may make sense to migrate back to an on-prem environment.

The costs may not be the only reason. There might be security or compliance requirements, or we may need to have more control of our systems. Knowing what happens at a lower level can help us better optimize things.

AWS not only gives us the environment, it also provides us with monitoring and management tools to run our systems in the cloud. So, it can be really hard to migrate to an on-prem environment and recreate all these tools to manage our systems in the same way.

In this blog, we will see how we can migrate our systems from AWS to an on-prem datacenter, and how ClusterControl can help us in the process.

Concepts

First of all, let’s see some basic concepts about Amazon Cloud.

AWS

Amazon Web Services (AWS) is an Infrastructure as a Service platform, comprising a large number of independent and semi-independent services. The purpose of Infrastructure as a Service platform is to offer, on a commodity basis, services that previously required the purchase of capital-intensive infrastructure components such as high-end servers, network routers and switches, and for larger enterprises, even their own datacenters.

RDS

Amazon Relational Database Service (RDS) makes it easy to set up, operate, and scale a relational database in the cloud. It provides cost-efficient and resizable capacity while automating time-consuming administration tasks such as hardware provisioning, database setup, patching and backups.

Amazon RDS is available on several database instance types and provides you with six familiar database engines to choose from, including Amazon Aurora, PostgreSQL, MySQL, MariaDB, Oracle Database, and SQL Server.

EC2

Amazon Elastic Compute Cloud (EC2) is a service that provides secure, resizable compute capacity in the cloud. It is designed to make web-scale cloud computing easier for developers.

Amazon EC2’s simple web interface allows you to obtain and configure capacity with minimal friction. It provides you with complete control of your computing resources and lets you run on Amazon’s proven computing environment.

ClusterControl

ClusterControl is a comprehensive management system for open source databases that automates deployment and management functions, as well as health and performance monitoring. There are two versions: Community Edition or Enterprise Edition. ClusterControl supports deployment, management, monitoring and scaling for different database technologies on any environment.

Why Migrate?

As we mentioned at the beginning, the most common reasons to migrate from AWS to an on-premise environment are costs, security, compliance, or the need to run applications locally. In AWS, we don’t know what is happening under the hood of the infrastructure. We only know that all is working. In cases where you experience poor performance or other anomalies, the only solution is to get in contact with Amazon support.

Example Migration Scenario

In AWS we have two different products related to this blog: EC2 and RDS.

The main difference between them is that in EC2 you have SSH access to the server and have to manage the database yourself. RDS is a hosted database service, and you only have access to the database instance.

In RDS, as you don't have SSH access, you need to create a dump and import it into the new server, or you can configure replication and promote the slave to the new master. Both options are manual processes. You can also add a load balancer to improve this process. We covered this task in these blogs: Part 1 and Part 2.
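As a rough sketch of the dump-and-import option, assuming a hypothetical RDS endpoint, database name, and on-prem target host (all of these names are placeholders), it could look like this:

# Dump from the RDS instance; --single-transaction keeps InnoDB tables consistent during the dump
$ mysqldump -h mydb.xxxxxxxx.us-east-1.rds.amazonaws.com -u admin -p \
  --single-transaction --routines --triggers mydatabase > mydatabase.sql

# Import into the new on-prem server (the target database must already exist there)
$ mysql -h onprem-db1 -u root -p mydatabase < mydatabase.sql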

So, let's focus on the migration from EC2.

In our example, let’s see how to migrate MySQL from AWS EC2 to an on-prem datacenter. We will use a MySQL Replication environment, but these steps should work for other technologies like PostgreSQL.

We will assume that you have your main MySQL database running on an EC2 instance. In the on-prem datacenter, we assume you have ClusterControl installed, as well as a fresh database server to migrate to.

In the AWS console, you should have something like this in the EC2 instances section:

AWS EC2 Section
AWS EC2 Section

First, we'll import our current master running on EC2 into ClusterControl. For this import process, you must open port 3306 by editing the Security Group associated with the EC2 instance.

AWS Security Group
AWS Security Group

After this, within ClusterControl, go to the Import section.

ClusterControl Import Section 1
ClusterControl Import Section 1

There, you can choose the technology, MySQL Replication in our example, and specify the user, key or password, and port used to connect to the server over SSH. We also need a name for our new replication ‘cluster’.

ClusterControl Import Section 2
ClusterControl Import Section 2

After setting up the SSH access information, we must define some database information like the database user, version and basedir. Also, we can enable the ClusterControl Node AutoRecovery and Cluster AutoRecovery features for the new cluster.

Then, we need to add our server by using the IP address or hostname and press Import.

ClusterControl Import Section 3
ClusterControl Import Section 3
ClusterControl
Single Console for Your Entire Database Infrastructure
Find out what else is new in ClusterControl

We can monitor the status of the import of our setup from the ClusterControl activity monitor.

Once the task is finished, we can see our master in the main ClusterControl screen.

Make sure that you have enabled binary log generation on your current master database. If not, you can enable it from the Node Actions section in ClusterControl.
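A minimal sketch of how to verify this on the master (standard MySQL statements, nothing ClusterControl-specific):

-- Run on the current master; OFF or an empty result means binary logging is disabled
SHOW GLOBAL VARIABLES LIKE 'log_bin';
-- Shows the current binary log file and position once logging is enabled
SHOW MASTER STATUS;

If it is disabled, enabling it means setting log_bin (and a unique server_id) in the [mysqld] section of the configuration file and restarting the instance, or using the Node Actions option mentioned above.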

Now, we can add our future new master as a new replica from our current master database. For this, go to ClusterControl -> Select Cluster -> Cluster Actions -> Add Replication Slave.

ClusterControl Add Replication Slave
ClusterControl Add Replication Slave

Here, we need to add the hostname or IP address of the new slave server, and choose whether we want ClusterControl to install the software for us.

Make sure that you have connectivity from AWS to ports 3306 and 9999 on the on-prem server.
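A quick, hedged way to confirm the connectivity from the EC2 master, assuming nc is available and using a placeholder hostname for the on-prem server:

# Run from the EC2 instance; onprem-db1 is a placeholder
$ nc -zv onprem-db1 3306   # MySQL port used for replication traffic
$ nc -zv onprem-db1 9999   # streaming port used while staging the new slave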

The way ClusterControl stages the slave with data is to take a hot backup of the master, stream it to the slave, and restore it there. Once restored, the slave is connected to the master so it can catch up on events and get in sync. Note that, for large databases running with some load, you might want to avoid the extra load of this operation on the master. In that case, it is possible to build the slave first from an existing backup, and then connect the slave so it catches up with the master.

After this task, we should have something like this:

You can also verify the topology on the ClusterControl Topology section.

ClusterControl Topology View 1
ClusterControl Topology View 1

Then, we need to promote the slave to master (ClusterControl -> Select Cluster -> Node Actions -> Promote slave) and change the endpoint in your application.

To improve this topology, you can add a load balancer to manage the traffic from the application server to the database. With a load balancer in place, you don't need to change the endpoint in your application during the migration; the load balancer will switch the master transparently for your application.

ClusterControl Topology View 2
ClusterControl Topology View 2

There are many ways to perform this task, and you should be able to adapt this strategy, or a similar one, to your environment, depending on your infrastructure, security requirements, etc.

For security reasons, you should consider using a VPN between the AWS and the on-premise environment.

In the case of a multi-master topology like Galera Cluster, you only need to add the nodes that you want on-premise, but be careful with latency. You can, for example, use different Galera segments to decrease network usage.

Considerations

Some considerations to take into account when we want to leave AWS and start to use our own environment could be:

  • Monitoring: Don’t forget to use some monitoring system. You need to know what is happening in your system.
  • Disaster Recovery Strategy: You should consider some disaster recovery strategy. In general, you should have the information in three different places, for example, Master, Slave, and backup, each in different physical places.
  • High Availability: Nowadays, HA is a must in most production environments, so we need to think about the best HA solution depending on our infrastructure.
  • Scaling: We should be able to scale if it’s needed in the future or for some specific event.
  • Rollback: If you want to migrate from AWS to an on-premise environment, keep in mind that something could go wrong (as in any type of migration), so you should have some rollback plan.
  • If you are after some kind of hybrid environment, with instances running on AWS and on-prem, then ClusterControl can be a good fit for monitoring, managing availability, backups and scaling.
ClusterControl Overview
ClusterControl Overview

by Sebastian Insausti at February 15, 2019 07:26 AM

February 14, 2019

Peter Zaitsev

FOSDEM 2019 – Percona Presentations

FOSDEM Paintings

For those not familiar with it, FOSDEM is an amazing, free-entry, full-on celebration of open source that takes place in Brussels, Belgium every year. This year the event was held over the first weekend of February. Fringe events, such as the Pre-FOSDEM MySQL day hosted by Oracle MySQL, and the community dinner that follows, provide an opportunity to network.

In case you didn't make it to FOSDEM this year, here are links to Percona's presentations from the event. The organizers record and share online every talk from every dev room, a phenomenal achievement in itself. All credit to the volunteers who run this show.

Database Dev Room: Hugepages and databases presented by Fernando Laudares Camargos

 

MySQL, MariaDB and Friends Dev Room: MySQL Replication – Advanced Features presented by Peter Zaitsev

 

MySQL, MariaDB and Friends Dev Room: MySQL Performance Schema in 20 Minutes presented by Sveta Smirnova

 

Monitoring and Observability Dev Room: Using eBPF for Linux Performance Analyses by Peter Zaitsev

Percona enjoyed plenty of attention in the booth area, where we shared information about our open source, free-as-in-beer projects. We were in Brussels after all!

Evgeniy Patlan, Slava Sarzhan and Alexey Palazhchenko enjoying booth duty

Evgeniy Patlan, Slava Sarzhan and Alexey Palazhchenko enjoying booth duty

Percona Booth FOSDEM 2019

Alexey making sure everything is in order at the Percona booth

Sandra Dannenberg art

Passing artist and open source enthusiast Sandra Dannenberg took a liking to our Percona logos and painted her own versions. They're great, aren't they? FOSDEM is that kind of event… we're looking forward already to 2020!

by Lorraine Pocklington, Community Manager at February 14, 2019 11:33 AM

Jean-Jerome Schmidt

Monitoring Your Databases with MySQL Enterprise Monitor

How to Monitor MySQL Databases?

Operational visibility is a must in any production environment. It is crucial to be able to identify any issues as soon as possible, otherwise you may end up in serious trouble, as an undetected issue can cause serious service disruption or downtime. MySQL Enterprise Monitor is one of the oldest monitoring products for MySQL on the market, and is available as part of a commercial enterprise subscription agreement from Oracle. In this blog post we will take a look at MySQL Enterprise Monitor and the kind of insight it provides into MySQL.

Installation

First of all, MySQL Enterprise Monitor is part of MySQL Enterprise Edition, a commercial offering from Oracle. It comes in multiple packages for different operating systems. The installation on Windows 10 (the system we tested on) is pretty much straightforward: MySQL Enterprise Monitor is configured and some bundled services (MySQL, Tomcat) are installed. The tool can then be accessed via the browser.

Initial Configuration

First of all, you have to add hosts you would like to monitor.

You can either add single hosts or a batch of them. The dialog window looks the same except that when adding in bulk, you can pass a comma-separated list of servers.

We won't go into details, but in short you have to define from which host the MySQL instances should be monitored - typically it will be the host on which you installed MySQL Enterprise Monitor. You can also set up agents on your MySQL instances; in that case they will be able to collect data for the host as well, not only MySQL metrics. Then you need to define how to reach the monitored instance (IP address/hostname, user and password). MySQL Enterprise Monitor will then create additional users for tasks like monitoring, which does not require superuser privileges. If you want, you can also configure SSL communication if that's what the MySQL instance uses, define some timeouts, and decide whether a replication topology should be auto-detected or not.

What is also important to keep in mind is that MySQL Enterprise Monitor relies heavily on Performance Schema - make sure your databases have PS enabled, otherwise you will not benefit from a significant part of the features of MySQL Enterprise Monitor.
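A minimal sanity check on each monitored instance (performance_schema is ON by default since MySQL 5.6.6, and changing it requires a restart):

-- Run on the monitored MySQL instance
SHOW GLOBAL VARIABLES LIKE 'performance_schema';

If it reports OFF, set performance_schema = ON in the [mysqld] section of the configuration file and restart the instance.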

Monitoring

Once the monitored MySQL instances are configured, you can start to look at the collected data. The Overview section gives you a short summary of some of the most important metrics in MySQL. Data is aggregated, which makes it easier to find any unexpected patterns and then dig further into what happened.

The Events tab gives an overview of different issues or events reported by MySQL Enterprise Monitor and its advisors. You can click on any of the events and read what it is about, as well as any recommended steps to take:

In this particular case it seems like some queries are doing full table scans, and it is recommended to investigate further to pinpoint such queries and see if they can be optimized.

Another example: here we see that the table cache is not configured in an optimal way. You can see the explanation of the problem, advice and recommended actions to take based on this alert.

Metrics

In this tab we can see data for multiple MySQL metrics that are helpful to understand the state of the system.

Timeseries Graphs

The screenshots above are just an example; there are many more graphs to look at.

It is possible to apply filtering: you can define which graphs you would like to see and what time range should be shown. On top of that, you can just mark a part of the graph and either zoom into it or open the Query Analyzer with data from that particular time:

We will go through this functionality later, but in short, it allows you to analyze queries, see how their performance changed over time, and look at some example statements.

ClusterControl
Single Console for Your Entire Database Infrastructure
Find out what else is new in ClusterControl

Table Statistics

This tab gives us insight into table statistics: what the traffic looked like (rows fetched, inserted, updated, deleted) and what the latency looked like for all the row operations.

User Statistics

In this tab MySQL Enterprise Monitor presents data about users - statements executed, latency, table scans, I/O latency, connections, memory utilization. This data should give quite a good insight into which user is responsible for the load on the database. It can be very useful, especially in multi-user environments where there is no single main source of traffic.

Database File I/O

Database File I/O explains how the I/O load is distributed across the files in the database: the total number of I/O operations, their latency, and how many reads and writes were performed on a given file.

Memory Usage

Memory usage shows memory structures in MySQL, which helps to build a better picture of the memory utilization in the database. This data can come in handy in case of memory issues - it is easy to track where the growth is the biggest and, if needed, reduce the relevant settings. It can also help significantly in diagnosing potential memory leaks.

InnoDB Buffer Pool

This tab in MySQL Enterprise Monitor gives the user insight into the structure of the buffer pool utilization. Which tables are cached, how many dirty pages are there to flush?

Queries

It is extremely important for any MySQL user to understand the load that queries create. Which queries are the most problematic? How do they behave over time? Performance can be measured in multiple ways, but quite commonly predictable, stable performance is more important than top performance. As long as the response time is acceptable, users will prefer predictable results over a somewhat faster response (low latency) that can sometimes slow the server down significantly. That's why it is very valuable to see how a query behaves over time and pinpoint those whose behavior is not consistent.

MySQL Enterprise Monitor definitely delivers such data. On the list of queries, you can easily see how the latency changed over time. A flat line is good; spikes - not so much. Such a query may have to be investigated further. When you click on it, MySQL Enterprise Monitor will give you more data about it.

As you can see, there is some statistical data about the particular query type, and you can also see how the latency changed over time. At the bottom you can see some example statements over time and compare their execution times.

When you click on one of them, you will see the full query that was executed at that moment. This can be useful for queries whose performance differs depending on what arguments were used in the WHERE clause (for example, WHERE some_column = ‘some value’ where the values in that column are not distributed evenly across the rows).
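As a hedged illustration, assuming a hypothetical orders table with an index on status where most rows share a single value, the same statement can produce very different plans and costs depending on the literal used; this is exactly the kind of variance this view helps to spot:

-- Hypothetical table: most rows have status = 'done', very few have status = 'failed'
EXPLAIN SELECT * FROM orders WHERE status = 'failed'; -- few matching rows, cheap index lookup
EXPLAIN SELECT * FROM orders WHERE status = 'done';   -- most of the table matches, may become a full scan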

Replication

In a MySQL replication environment, lag is something you have to learn to deal with. What is important is to keep track of it - how badly are the slaves lagging? How often does it happen? With this information it is possible to try and pinpoint the issue and understand better which queries are causing it. Then you can try to implement improvements like, for example, multi-threaded replication, and track whether the changes improved the replication performance and reduced the lag to an acceptable level.

How is MySQL Enterprise Monitor Different from ClusterControl

As we stated, MySQL Enterprise Monitor is part of the paid MySQL Enterprise Edition. For users of MySQL Community Edition, MariaDB or Percona Server, MySQL Enterprise Edition is not available. ClusterControl provides access to monitoring of MySQL in its free Community version. In terms of server and query monitoring, there are many similarities.

ClusterControl gives you access to MySQL metrics collected and stored in the Prometheus time-series database. You can easily keep track of numerous metrics made available in ClusterControl.

ClusterControl also comes with a list of advisors, which can be used to keep track of the health and performance of the database. You can also easily create new advisors using the Developer Studio:

If you are interested in query performance, ClusterControl provides a Query Monitor for you - executed queries are collected and their performance is compared, making it easy for the user to pinpoint which queries use the most CPU on the database.

You can see statistical data on the queries - executions, rows sent and examined, execution time. You can also check the explain plan for a particular query type.

Monitoring Polyglot Persistence

One big difference is the ability to monitor all the main variants of the MySQL ecosystem (Oracle MySQL, MariaDB and Percona Server), different clustering technologies (NDB Cluster, Group Replication, asynchronous replication and Galera Cluster), load balancers/proxies (HAProxy, Keepalived, Maxscale, ProxySQL) as well as other open source databases (PostgreSQL and MongoDB).

Automation and Management

ClusterControl also provides functionality to deploy single instances or clusters on-prem or in the cloud (AWS, GCE and Azure), as well as features like backup management, automatic failover and recovery/repair, rolling upgrades, cluster management for replication or cluster setups, scaling, etc.

That’s all for today folks. If you have worked with MySQL Enterprise Monitor and would like to add something, please do so in the comments section.

by krzysztof at February 14, 2019 10:48 AM

February 13, 2019

Peter Zaitsev

plprofiler – Getting a Handy Tool for Profiling Your PL/pgSQL Code

plprofiler postgres performance tool

PostgreSQL is emerging as the standard destination for database migrations from proprietary databases. As a consequence, there is an increase in demand for database side code migration and associated performance troubleshooting. One might be able to trace the latency to a plsql function, but explaining what happens within a function could be a difficult question. Things get messier when you know the function call is taking time, but within that function there are calls to other functions as part of its body. It is a very challenging question to identify which line inside a function—or block of code—is causing the slowness. In order to answer such questions, we need to know how much time an execution spends on each line or block of code. The plprofiler project provides great tooling and extensions to address such questions.

Demonstration of plprofiler using an example

The plprofiler source contains a sample for testing plprofiler. This sample serves two purposes. It can be used for testing the configuration of plprofiler, and it is a great place to see how to do the profiling of a nested function call. Files related to this can be located inside the “examples” directory. Don’t worry—I’ll be running through the installation of plprofiler later in this article.

$ cd examples/

The example expects you to create a database with the name “pgbench_plprofiler”:

postgres=# CREATE DATABASE pgbench_plprofiler;
CREATE DATABASE

The project provides a shell script along with a source tree to test plprofiler functionality. So testing is just a matter of running the shell script.

$ ./prepdb.sh
dropping old tables...
....

Running session level profiling

This profiling uses session level local-data. By default the plprofiler extension collects runtime data in per-backend hashtables (in-memory). This data is only accessible in the current session, and is lost when the session ends or the hash tables are explicitly reset. plprofiler’s run command will execute the plsql code and capture the profile information.

This is illustrated by the example below:

$ plprofiler run --command "SELECT tpcb(1, 2, 3, -42)" -d pgbench_plprofiler --output tpcb-test1.html
SELECT tpcb(1, 2, 3, -42)
-- row1:
tpcb: -42
----
(1 rows)
SELECT 1 (0.073 seconds)

What happens during the above plprofiler command run can be summarized in three steps:

  1. A function call with four parameters “SELECT tpcb(1, 2, 3, -42)” is presented to the plprofiler tool for execution.
  2. plprofiler establishes a connection to PostgreSQL and executes the function
  3. The tool collects the profile information captured in the local-data hash tables and generates an HTML report “tpcb-test1.html”

Global profiling

As mentioned previously, this method is useful if we want to profile the function executions in other sessions or on the entire database. During global profiling, data is captured into a shared-data hash table which is accessible for all sessions in the database. The plprofiler extension periodically copies the local-data from the individual sessions into shared hash tables, to make the statistics available to other sessions. See the plprofiler monitor command, below, for details. This data still relies on the local database system catalog to resolve Oid values into object definitions.

In this example, the plprofiler tool will be running in monitor mode for a duration of 60 seconds. Every 10 seconds, the tool copies data from local-data to shared-data.

$ plprofiler monitor --interval=10 --duration=60 -d pgbench_plprofiler
monitoring for 60 seconds ...
done.

For testing purposes you can start executing a few functions at the same time.

Once the data is captured into shared-data, we can generate a report. For example:

$ plprofiler report --from-shared --title=MultipgMax --output=MultipgMax.html -d pgbench_plprofiler

The data in shared-data will be retained until it’s explicitly cleared using the plprofiler reset command:

$ plprofiler reset

If there is no profile data present in the shared hash tables, executing the report will result in an error message.

$ plprofiler report --from-shared --title=MultipgMax --output=MultipgMax.html
Traceback (most recent call last):
File "/usr/bin/plprofiler", line 11, in <module>
load_entry_point('plprofiler==4.dev0', 'console_scripts', 'plprofiler')()
File "/usr/lib/python2.7/site-packages/plprofiler-4.dev0-py2.7.egg/plprofiler/plprofiler_tool.py", line 67, in main
return report_command(sys.argv[2:])
File "/usr/lib/python2.7/site-packages/plprofiler-4.dev0-py2.7.egg/plprofiler/plprofiler_tool.py", line 493, in report_command
report_data = plp.get_shared_report_data(opt_name, opt_top, args)
File "/usr/lib/python2.7/site-packages/plprofiler-4.dev0-py2.7.egg/plprofiler/plprofiler.py", line 555, in get_shared_report_data
raise Exception("No profiling data found")
Exception: No profiling data found

Report on profile information

The HTML report generated by plprofiler is a self-contained HTML document and it gives detailed information about the PL/pgSQL function execution. There will be a clickable FlameGraph at the top of the report with details about functions in the profile. The plprofiler FlameGraph is based on the actual Wall-Clock time spent in the PL/pgSQL functions. By default, plprofiler provides details on the top ten functions, based on their self_time (total_time – children_time).

This section of the report is followed by tabular representation of function calls. For example:

This gives a lot of detailed information, such as execution counts and time spent on each line of code.

Binary Packages

Binary distributions of plprofiler are not common. However, the BigSQL project provides plprofiler packages as an easy-to-use bundle. Such ready-to-use packages are one of the reasons BigSQL remains one of the most developer-friendly PostgreSQL distributions. The first screen of the BigSQL package manager installation provided me with the information I was looking for:


It appears that there was a recent release of BigSQL packages, and plprofiler is an updated package within it.

Installation and configuration is made simple:

$ ./pgc install plprofiler-pg11
['plprofiler-pg11']
File is already downloaded.
Unpacking plprofiler-pg11-3.3-1-linux64.tar.bz2
install-plprofiler-pg11...
Updating postgresql.conf file:
old: #shared_preload_libraries = '' # (change requires restart)
new: shared_preload_libraries = 'plprofiler'

As we can see, even PostgreSQL parameters are updated to have plprofiler as a shared_preload_library. If I need to use plprofiler for investigating code, these binary packages from the BigSQL project are my first preference because everything is ready to use. Definitely, this is developer-friendly.

Creation of extension and configuring the plprofiler tool

At the database level, we should create the plprofiler extension to profile function execution. This step needs to be performed in both cases, whether we want global profiling, where shared_preload_libraries is set, or session-level profiling, where that is not required:

postgres=# create extension plprofiler;
CREATE EXTENSION

plprofiler is not just an extension, but comes with tooling to invoke profiling or to generate reports. These scripts are primarily coded in Python and use psycopg2 to connect to PostgreSQL. The Python code is located inside the “python-plprofiler” directory of the source tree. There are a few Python dependencies too, which will be resolved as part of the installation:

sudo yum install python-setuptools.noarch
sudo yum install python-psycopg2
cd python-plprofiler/
sudo python ./setup.py install

Building from source

If you already have a PostgreSQL instance running using binaries from the PGDG repository, or you want to get your hands dirty by building everything from source, then the installation needs a different approach. I have PostgreSQL 11 already running on the system. The first step is to get the corresponding development packages, which have all the header files and libraries needed to support a build from source. Obviously, this is the thorough way of getting plprofiler working.

$ sudo yum install postgresql11-devel

We need to have build tools, and since the core of plprofiler is C code, we have to install a C compiler and make utility.

$ sudo yum install gcc make

Preferably, we should build plprofiler using the same OS user that runs the PostgreSQL server, which is “postgres” in most environments. Please make sure that all PostgreSQL binaries are available in the path and that you are able to execute pg_config, which lists build-related information:

$ pg_config
BINDIR = /usr/pgsql-11/bin
..
INCLUDEDIR = /usr/pgsql-11/include
PKGINCLUDEDIR = /usr/pgsql-11/include
INCLUDEDIR-SERVER = /usr/pgsql-11/include/server
LIBDIR = /usr/pgsql-11/lib
PKGLIBDIR = /usr/pgsql-11/lib
LOCALEDIR = /usr/pgsql-11/share/locale
MANDIR = /usr/pgsql-11/share/man
SHAREDIR = /usr/pgsql-11/share
SYSCONFDIR = /etc/sysconfig/pgsql
PGXS = /usr/pgsql-11/lib/pgxs/src/makefiles/pgxs.mk
CONFIGURE = '--enable-rpath' '--prefix=/usr/pgsql-11' '--includedir=/usr/pgsql-11/include' '--mandir=/usr/pgsql-11/share/man' '--datadir=/usr/pgsql-11/share' '--with-icu' 'CLANG=/opt/rh/llvm-toolset-7/root/usr/bin/clang' 'LLVM_CONFIG=/usr/lib64/llvm5.0/bin/llvm-config' '--with-llvm' '--with-perl' '--with-python' '--with-tcl' '--with-tclconfig=/usr/lib64' '--with-openssl' '--with-pam' '--with-gssapi' '--with-includes=/usr/include' '--with-libraries=/usr/lib64' '--enable-nls' '--enable-dtrace' '--with-uuid=e2fs' '--with-libxml' '--with-libxslt' '--with-ldap' '--with-selinux' '--with-systemd' '--with-system-tzdata=/usr/share/zoneinfo' '--sysconfdir=/etc/sysconfig/pgsql' '--docdir=/usr/pgsql-11/doc' '--htmldir=/usr/pgsql-11/doc/html' 'CFLAGS=-O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches -m64 -mtune=generic' 'LDFLAGS=-Wl,--as-needed' 'PKG_CONFIG_PATH=:/usr/lib64/pkgconfig:/usr/share/pkgconfig'
CC = gcc
...
VERSION = PostgreSQL 11.1

Now we’re ready to get the source code and build it. You should be able to checkout the git repository for plprofiler.

$ git clone https://github.com/pgcentral/plprofiler.git
Cloning into 'plprofiler'...
...

Building against PostgreSQL 11 binaries from PGDG can be a bit complicated because of the JIT feature. The configuration flag --with-llvm will be enabled, so we may need to have LLVM present on the system, as detailed in my previous blog about JIT in PostgreSQL 11.

Once we’re ready, we can move to the plprofiler directory and build it:

$ cd plprofiler
$ make USE_PGXS=1
--- Output ----
gcc -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Wendif-labels -Wmissing-format-attribute -Wformat-security -fno-strict-aliasing -fwrapv -fexcess-precision=standard -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches -m64 -mtune=generic -fPIC -I. -I./ -I/usr/pgsql-11/include/server -I/usr/pgsql-11/include/internal -D_GNU_SOURCE -I/usr/include/libxml2 -I/usr/include -c -o plprofiler.o plprofiler.c
gcc -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Wendif-labels -Wmissing-format-attribute -Wformat-security -fno-strict-aliasing -fwrapv -fexcess-precision=standard -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches -m64 -mtune=generic -fPIC -shared -o plprofiler.so plprofiler.o -L/usr/pgsql-11/lib -Wl,--as-needed -L/usr/lib64/llvm5.0/lib -L/usr/lib64 -Wl,--as-needed -Wl,-rpath,'/usr/pgsql-11/lib',--enable-new-dtags
/opt/rh/llvm-toolset-7/root/usr/bin/clang -Wno-ignored-attributes -fno-strict-aliasing -fwrapv -O2 -I. -I./ -I/usr/pgsql-11/include/server -I/usr/pgsql-11/include/internal -D_GNU_SOURCE -I/usr/include/libxml2 -I/usr/include -flto=thin -emit-llvm -c -o plprofiler.bc plprofiler.c

Now we should be able to install this extension:

$ sudo make USE_PGXS=1 install
--- Output ----
/usr/bin/mkdir -p '/usr/pgsql-11/lib'
/usr/bin/mkdir -p '/usr/pgsql-11/share/extension'
/usr/bin/mkdir -p '/usr/pgsql-11/share/extension'
/usr/bin/install -c -m 755 plprofiler.so '/usr/pgsql-11/lib/plprofiler.so'
/usr/bin/install -c -m 644 .//plprofiler.control '/usr/pgsql-11/share/extension/'
/usr/bin/install -c -m 644 .//plprofiler--1.0--2.0.sql .//plprofiler--2.0--3.0.sql .//plprofiler--3.0.sql '/usr/pgsql-11/share/extension/'
/usr/bin/mkdir -p '/usr/pgsql-11/lib/bitcode/plprofiler'
/usr/bin/mkdir -p '/usr/pgsql-11/lib/bitcode'/plprofiler/
/usr/bin/install -c -m 644 plprofiler.bc '/usr/pgsql-11/lib/bitcode'/plprofiler/./

The above command expects all build tools to be in the proper path even with sudo.

Profiling external sessions

To profile a function executed by another session, or by all other sessions, we should load the libraries at the global level. In production environments, that will be the case. This can be done by adding the extension library to the shared_preload_libraries specification. You won’t need this if you only want to profile functions executed within your session. Session-level profiling is generally possible only in Dev/Test environments.

To enable global profiling, verify the current value of shared_preload_libraries and add plprofiler to the list:

postgres=# show shared_preload_libraries ;
shared_preload_libraries
--------------------------
(1 row)
postgres=# alter system set shared_preload_libraries = 'plprofiler';
ALTER SYSTEM
postgres=#

This change requires us to restart the PostgreSQL server

$ sudo systemctl restart postgresql-11

After the restart, it’s a good idea to verify the parameter change

postgres=# show shared_preload_libraries ;
shared_preload_libraries
--------------------------
plprofiler
(1 row)

From this point onwards, the steps are the same as those for the binary package setup discussed above.

Summary

plprofiler is a wonderful tool for developers. I keep seeing many users who are in real need of it. Hopefully this blog post will help those who have never tried it.

by Jobin Augustine at February 13, 2019 06:20 PM

Jean-Jerome Schmidt

Basic Administration Comparison Between Oracle, MSSQL, MySQL, PostgreSQL

The introduction of DevOps in organizations has changed the development process and also introduced some new challenges. In addition, developers and DevOps teams, along with their own chosen programming languages, also have their favorite database systems.

The product life cycle is getting shorter each year so developers want to be able to develop fast, using technologies they know best.

Having multiple RDBMS database backends means your organization will become more agile on the development side, but it also imposes additional knowledge on the operation teams.

Extending your infrastructure from one to many databases implies you have to also monitor, manage and scale them.

As every storage backend excels at different use cases, this also means you have to reinvent the wheel for every one of them.

Knowing the similarities and key differences will help you to immerse into different flavors of RDBMS.

In this article we will go through the following points:

  • A brief introduction to the platform
    • Oracle, MSSQL, MySQL , PostgreSQL
  • Platform support
  • Installation process
  • Database access
  • Backup process
  • Controlling query execution
  • Security
  • Replication options
  • Community support

A brief introduction to the platform

PostgreSQL is for many recognized as the world's most advanced open source database. It is a fully open source database system released under its own license, the PostgreSQL License, comparable to the MIT or BSD licenses. The PostgreSQL community is active and continuously improving existing and new features. As per the DB-engine popularity rank, PostgreSQL was the DBMS of the year 2017 and 2018. The DB-Engines popularity shows that the trend didn’t change over the years.

An interesting fact is that PostgreSQL didn’t support SQL until 1994. The QUEL language was used to query data from it. SQL support was added later on.

PostgreSQL has many advanced features that other enterprise database management systems offer, such as views, stored procedures, indexes, and triggers, in addition to the primary key, foreign key and atomicity features.

PostgreSQL can be extended by users by modifying existing features or adding new ones, and distributed freely as it is open source. It runs on major platforms such as UNIX, MacOS, Windows, and Linux. It supports video, text, audio and image data, and provides programming interfaces for different languages. The list of supported languages includes C/C++, Java, Python, Perl etc.

Oracle is one of the largest vendors of RDBMS (relational database management system) software in the IT world. Its flagship product, marketed by Oracle, is known as Oracle Database, Oracle DB, or simply Oracle.

Oracle Database is used by many companies in the IT industry for transaction processing, business analytics, business intelligence applications, etc.

Oracle has a long and very interesting history:

On 16th June 1977, Software Development Laboratories (SDL) was created in Santa Clara, California by Larry Ellison, Bob Miner, and Ed Oates. In 1977 Oracle took its name from a CIA project codename, and the first commercialized Oracle RDBMS was shown to the world in 1979.

Oracle Database is available in different editions such as Enterprise Edition, Standard Edition, Express Edition, and Oracle Lite. The biggest competitor for Oracle Database is Microsoft SQL Server.

Microsoft SQL Server is a very popular RDBMS with restrictive licensing and modest cost of ownership if the database is of significant size, or is used by a significant number of clients.

It's one of the three market-leading database technologies, along with Oracle Database and IBM's DB2.

It provides a very user-friendly interface and is easy to learn, which has resulted in a large installed user base.

Like other RDBMS software, Microsoft SQL Server is built on top of SQL, a standardized programming language that database administrators (DBAs) and other IT professionals use to manage databases and query the data they contain. SQL Server is tied to Transact-SQL (T-SQL), an implementation of SQL from Microsoft that adds a set of proprietary programming extensions to the standard language.

MySQL

MySQL is an Oracle-backed open source relational database management system based on SQL.

Originally conceived by the Swedish company MySQL AB, MySQL was acquired by Sun Microsystems in 2008 and then by Oracle when it bought Sun in 2010.

Developers can use MySQL under the GNU General Public License (GPL). The Enterprise version comes with support and additional features for security and high availability.

It's the second most popular database in the world according to the db-engines ranking, and probably the most widespread database backend on the planet, as it runs most of the internet services around the globe. MySQL runs on virtually all platforms, including Linux, UNIX, and Windows.

MySQL is an important component of an open source enterprise stack called LAMP.

LAMP is a web development platform that uses Linux as the operating system, Apache as the web server, MySQL as the relational database management system and PHP as the object-oriented scripting language.

Platform support

Oracle

The most popular version of Oracle DB, Oracle 12c, is a true enterprise RDBMS supported on a variety of operating systems and platforms. Oracle dominates the database world in part because it runs on dozens of platforms, everything from a mainframe, SPARC or Mac to Intel. The list includes the following OS and architecture combinations:

  • Linux on x86-64 (only Red Hat Enterprise Linux, Oracle Linux, and SUSE distributions are supported)
  • Microsoft Windows on x86-64
  • Oracle Solaris on SPARC and x86-64
  • IBM AIX on POWER Systems
  • Linux on IBM zEnterprise Systems
  • HP-UX on Itanium

MSSQL

Being a Microsoft product, SQL Server was designed to be very much compatible with the Windows OS. On November 16, 2016, Microsoft announced the beginning of a new story: SQL Server is now supported on Linux and Docker. Hell freezes over!

MySQL

MySQL runs smoothly on all major platforms, including Microsoft Windows, UNIX, Linux, Mac, etc.

PostgreSQL

In general, PostgreSQL can be expected to work on various (even exotic) CPU architectures and operating systems.

It includes CPU architectures like x86, x86_64, IA64, PowerPC, PowerPC 64, S/390, S/390x, Sparc, Sparc 64, Alpha, ARM, MIPS, MIPSEL, M68K, and PA-RISC. It is often possible to build on an unsupported CPU type by configuring with --disable-spinlocks, but performance will be poor.

PostgreSQL can be expected to work on the following operating systems: Linux (all recent distributions), Windows (Win2000 SP4 and later), FreeBSD, OpenBSD, NetBSD, Mac OS X, AIX, HP/UX, IRIX, Solaris, Tru64 Unix, and UnixWare.

Installation Process

Oracle

Of all four presented database systems, Oracle has the most complex system requirements, which come with a complex installation process. On both Windows and Linux based platforms, Oracle uses the dedicated Oracle Universal Installer (OUI) tool as the main installation method. OUI is used to install the Oracle Database software. OUI is a graphical user interface utility that enables you to:

  • View the Oracle software that is installed on your machine
  • Install new Oracle Database software
  • Delete Oracle software that is no longer required.

During the installation process, OUI will start the Oracle Database Configuration Assistant (DBCA) which can install a pre-created default database that contains example schemas or can guide you through the process of creating and configuring a customized database.
 

Oracle OUI - installation interface
Oracle OUI - installation interface

If you do not create a database during installation, you can invoke DBCA after you have installed the software, to create one or more databases.

MSSQL

Beginning with SQL Server 2016 (13.x), SQL Server is only available as a 64-bit application.

Installation happens via the Installation Wizard, a command prompt, or through the sysprep tool.

The Installation Wizard runs the SQL Server Installation Center. To create a new installation of SQL Server, select the Installation option on the left side, and then click New SQL Server stand-alone installation or add features to an existing installation.

The Linux based installation is very similar to the open source database installation method. It supports packaging for Debian and RedHat based systems. The steps consist of repository configuration, package installation and post-installation configuration, quite similar to MySQL. The whole process is greatly described in the following article.

MSSQL Installation Wizard
MSSQL Installation Wizard



MySQL

Oracle provides a set of binary distributions of MySQL. These include generic binary distributions in the form of compressed tar files (files with a .tar.gz extension) for a number of platforms, and binaries in platform-specific packages. On the Windows platform, the installation process is triggered by the standard installation wizard via GUI.

PostgreSQL

PostgreSQL is available in the majority of Linux distributions, so it's very likely you can install it through a simple yum or apt-get command. For an HA configuration, you can use the ClusterControl s9s tool or GUI. The s9s tools can help you to create a PostgreSQL cluster with just one single-line command:

$ s9s cluster \
--create \
--cluster-type=postgresql \
--nodes="192.168.0.91?master;192.168.0.92?slave;192.168.0.93?slave" \
--provider-version='11' \
--db-admin='postgres' \
--db-admin-passwd='s3cr3tP455' \
--os-user=root \
--os-key-file=/root/.ssh/id_rsa \
--cluster-name='PostgreSQL 11 Streaming Replication' \
--wait
Creating PostgreSQL Cluster
\ Job 259 RUNNING    [█▋        ]  15% Installing helper packages

For more information, check this blog.

Access to the database and DB creation

Oracle

Oracle separates the binary installation from database creation. Unlike other popular database systems, database creation involves many more steps.

The Database Configuration Assistant (DBCA) is the preferred way to create a database because it can do so in a much more automated way. DBCA can be launched by the Oracle Universal Installer (OUI), depending on the type of install that you select. You can also launch DBCA as a standalone tool at any time after the Oracle Database installation.

You can run DBCA in interactive mode or non-interactive/silent mode. Interactive mode provides a graphical interface and guided workflow for creating and configuring a database. Non-interactive/silent mode enables you to script the database creation. You can run DBCA in non-interactive/silent mode by specifying command-line arguments, a response file or both.

Oracle DBCA - database creation
Oracle DBCA - database creation

When a database is created you can access it with a dedicated client called sqlplus. SQL*Plus is a terminal client program with which you can access Oracle Database.

MSSQL

SQL Server Management Studio (SSMS) is the main tool for administering the Database Engine and writing Transact-SQL code. SSMS is available as a free download from the Microsoft Download Center. The latest version can be used with older versions of the Database Engine.

Management Studio is the preferred method to create a new database. To create a database in Microsoft SQL Server, connect to the computer where Microsoft SQL Server is installed using an administrator account.
Start Microsoft SQL Server Management Studio and choose the create database option. The wizard will walk you through the process. If you prefer the command line, this can be done with the CREATE DATABASE syntax.

MySQL

In order to access your MySQL database, use the mysql client. Database creation is as simple as CREATE DATABASE <name>.
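A minimal sketch with hypothetical names (the dedicated user and grant are optional but typical):

mysql > CREATE DATABASE appdb;
mysql > CREATE USER 'appuser'@'%' IDENTIFIED BY 'S3cr3tPassw0rd';
mysql > GRANT ALL PRIVILEGES ON appdb.* TO 'appuser'@'%';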

PostgreSQL

PostgreSQL database has the option for multiple ‘schemas’ which operate similarly to databases in MySQL.

Schemas contain the tables, indexes, etc., and can be accessed simultaneously by the same connection to the database that houses them. Access methods for PostgreSQL are defined in a file: pg_hba.conf. It can be located in various places. On Ubuntu 14.04 it is located in /etc/postgresql/9.3/main/pg_hba.conf, while on CentOS 7 it is located by default in /var/lib/pgsql/data/pg_hba.conf.
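Each line in pg_hba.conf maps a connection type, database, user and client address to an authentication method; a minimal sketch (the network range is an assumption):

# TYPE  DATABASE  USER      ADDRESS          METHOD
local   all       postgres                   peer
host    all       all       192.168.0.0/24   md5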

ClusterControl
Single Console for Your Entire Database Infrastructure
Find out what else is new in ClusterControl

Backup process

Oracle

Oracle has the most complex, dedicated built-in backup tool of all four servers described here; it’s called Recovery Manager (RMAN).

RMAN allows you to run sophisticated backup policies and selective restores. The same operations usually require a lot of manual steps in other RDBMS.

We can take backups in two ways:

  • disabling the database and copying physical files (so-called cold backup)
  • using RMAN and make a backup without disabling the database (hot backup)

To make a hot backup, set the database in ARCHIVELOG mode. This tells Oracle to keep copies of the redo log files as archive logs instead of overwriting them.
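Switching to ARCHIVELOG mode is done from SQL*Plus, connected as SYSDBA, with the instance mounted but not open; a sketch of the usual sequence:

SHUTDOWN IMMEDIATE;
STARTUP MOUNT;
ALTER DATABASE ARCHIVELOG;
ALTER DATABASE OPEN;
-- Verify the new log mode
ARCHIVE LOG LIST;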

MSSQL

In the MS SQL world, you can use the built-in T-SQL commands to back up and restore databases. There is no need for tools like mysqlhotcopy and mysqldump.

MS SQL Server offers three different online backup strategies:

  • Simple Recovery Model (ALTER DATABASE dbname SET RECOVERY SIMPLE)
  • Full Recovery Model (ALTER DATABASE dbname SET RECOVERY FULL)
  • Bulk-Logged Recovery Model (ALTER DATABASE dbname SET RECOVERY BULK_LOGGED)

The recommended model is full recovery if no data loss is acceptable. This mode is similar to MySQL with the binary log enabled. You can recover the database to any point in time, but you should regularly back up the transaction log as well as the database.

The bulk-logged model can be used for large bulk operations such as importing data or creating indexes on big tables. It's a rather less common way to run a database, especially in production. It does not support point-in-time recovery, so it is generally used as a temporary solution.

The Simple model is useful when the database is rarely updated or for testing and development purposes. In SIMPLE mode, the transaction log of the database is truncated each time a transaction is completed. In the other modes, the log is truncated via a CHECKPOINT statement or after a transaction log backup. If the database is damaged, only the most recent backup can be recovered and all changes since this backup are lost.
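With the full recovery model, a typical routine combines full database backups with regular transaction log backups; a hedged T-SQL sketch with hypothetical database and file names:

-- Full database backup
BACKUP DATABASE SalesDB TO DISK = 'C:\Backups\SalesDB_full.bak';
-- Transaction log backup; run regularly to allow point-in-time recovery
BACKUP LOG SalesDB TO DISK = 'C:\Backups\SalesDB_log.trn';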

MySQL

The two most popular backup utilities for MySQL and MariaDB are the mysqldump logical backup and the physical backup tools Percona XtraBackup and MariaBackup (a fork of Percona XtraBackup). The MySQL Enterprise edition also offers mysqlbackup, which is similar to the XtraBackup and MariaBackup hot backup tools.
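As a sketch, assuming credentials come from the default option files and using a hypothetical backup directory, the two approaches look roughly like this:

# Logical backup with mysqldump; --single-transaction keeps InnoDB tables consistent
$ mysqldump --single-transaction --routines --all-databases > full_dump.sql

# Physical hot backup with Percona XtraBackup (target directory is an assumption)
$ xtrabackup --backup --target-dir=/backups/full/
$ xtrabackup --prepare --target-dir=/backups/full/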

PostgreSQL

Most DBMSs provide some built-in backup tools. PostgreSQL has pg_dump and pg_dumpall out of the box. However, you may want to use some other tools for your production databases. More information can be found in the top backup tools for PostgreSQL article.
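A minimal sketch of the built-in tools, with hypothetical database names:

# Custom-format dump of a single database
$ pg_dump -Fc mydb > mydb.dump
# Dump of all databases plus global objects (roles, tablespaces)
$ pg_dumpall > all_databases.sql
# Restore the custom-format dump into an existing, empty database
$ pg_restore -d mydb_restored mydb.dump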

Controlling Query execution and concurrency support

Oracle

In Oracle, all the database objects are grouped by schemas. Schemas are collections of database objects, and all the database objects are shared among all schemas and users. Even though it is all shared, each user can be limited to certain schemas and tables via roles and permissions. This concept is quite similar to databases in MySQL.

MSSQL

MS SQL Server organizes all objects, such as tables, views, and procedures, by database name. Users are assigned to a login, which is granted access to the specific database and its objects. Also, in SQL Server each database has a private, unshared disk file on the server.

MySQL

MySQL only has MVCC support in InnoDB, a storage engine that is available in MySQL by default. It also provides ACID-compliant features like foreign key support and transaction handling. By default, each query is treated as a separate transaction, which is a different approach than in Oracle DB.

PostgreSQL

The Postgres engine performs concurrency control using a method called MVCC (Multiversion Concurrency Control). For every user connected to the database, Postgres gives a snapshot of the database at a particular instant. When the database must update an item, it adds the newer version and marks the old version as obsolete. This allows the database to avoid locking overhead, but it requires periodic cleanup to delete the old, outdated row versions.

Security

Oracle

Security features are great: the system provides multi-layered security, including controls to evaluate risks, prevent unauthorized data disclosure, detect and report on database activities, and enforce data access controls.

MSSQL

Security features are modest: the RDBMS offers fewer features than Oracle, but still many more than open source database systems.

MySQL

MySQL implements security based on Access Control Lists (ACLs) for all connections, queries, and other operations that a user may attempt to perform. There is also some support for SSL-encrypted connections between MySQL clients and servers.
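In practice, privileges are granted per user/host combination; a hedged sketch with hypothetical names, also requiring an encrypted connection:

CREATE USER 'app'@'10.0.0.%' IDENTIFIED BY 'S3cr3tPassw0rd' REQUIRE SSL;
GRANT SELECT, INSERT, UPDATE, DELETE ON appdb.* TO 'app'@'10.0.0.%';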

PostgreSQL

PostgreSQL has ROLES and inherited roles to set and maintain permissions. PostgreSQL has native SSL support for connections to encrypt client/server communications. It also has Row Level Security.
In addition to this, PostgreSQL comes with a built-in enhancement called SE-PostgreSQL which provides additional access controls based on SELinux security policy. More details here.

Community Support

Oracle

Oracle Database, similarly to MySQL, has a large community, mostly organized around https://community.oracle.com and passionate groups in many locations around the world, like for example https://poug.org/en/. The paid support gives you access to the support portal previously known as Metalink, now support.oracle.com.

MSSQL

Compared to other database systems, MSSQL probably has the least organized community groups, but they are still very active. Microsoft does a great job of promoting its products in universities. This gives young developers, devops engineers and DBAs easy access to the technology (free licenses) and any necessary materials.

MySQL

MySQL has a large community of contributors who, particularly following the acquisition by Oracle, focus mainly on maintaining existing features with some new features emerging occasionally. The advantage over other open source databases is a very strong external vendor eco-system. Companies like MariaDB and Percona not only offer great support but also contribute by adding enterprise features into their open source versions.

PostgreSQL

PostgreSQL has a very strong and active community. Its community improves existing features, while its innovative committers strive to ensure it remains the most advanced database with new features and security, closing the gap with the Oracle and MSSQL databases. PostgreSQL is known for having more features than other RDBMS on the market.

Replication options

Oracle

Oracle offers logical and physical replication through the built-in Oracle Data Guard. It is an enterprise feature.
Data Guard is a Ship Redo / Apply Redo technology; "redo" is the information needed to recover transactions.

A production database, referred to as the primary database, broadcasts redo to one or more replicas, referred to as standby databases. When an insert or update is made to a table, this change is captured by the log writer into an archive log, and replicated to the standby system.

Standby databases are in a continuous phase of recovery, verifying and applying redo to maintain synchronization with the primary database. A standby database will also automatically re-synchronize if it becomes temporarily disconnected from the primary database due to power outages, network problems, etc.

For more flexible replication options like multi-source or selective replication, you should consider an extra paid tool, Oracle GoldenGate.

MSSQL

Microsoft SQL Server provides the following types of replication for use in distributed applications:

  • Transactional replication
  • Merge replication
  • Snapshot replication

It can be greatly extended with Microsoft Integration Services, giving you an option to customize the replication flow out of the box.

PostgreSQL

PostgreSQL has several options available, each with its own pros and cons, depending on what is needed from replication. The built-in options are based on the Write Ahead Log: either log shipping, where WAL files are shipped to a standby server where they are read and replayed, or Streaming Replication, where a read-only standby server fetches transaction logs over a database connection and replays them. For a more sophisticated replication architecture, you would probably like to check Slony (master to multiple slaves) or Bucardo (multi-master).

MySQL

MySQL Replication is probably the most popular high availability solution for MySQL,
and widely used by top web services.

It is easy to set up but ongoing maintenance like software upgrades, schema changes, topology changes, failover and recovery have always been tricky.

MySQL replication does not require any third-party tools; both master-slave and multi-master setups can be done out of the box.

Recent versions of MySQL added multi-source replication and global transaction IDs (GTIDs), which make it even more reliable and easier to maintain.
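With GTIDs enabled on both servers (gtid_mode=ON and enforce_gtid_consistency=ON), attaching a replica no longer requires tracking binary log file names and positions; a hedged sketch with placeholder credentials:

CHANGE MASTER TO
  MASTER_HOST='master-host',
  MASTER_USER='repl',
  MASTER_PASSWORD='repl_password',
  MASTER_AUTO_POSITION=1;
START SLAVE;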

Conclusion

Proprietary databases like Oracle and MSSQL offer robust management systems and fine support. Among the long list of supported features, users get the reassuring feeling of access to enterprise support and paid knowledge systems.

On the other side, the license costs, a feature gap that is not that big, and the availability of enterprise plugins make the decision to shift to open source easier than ever.

Using predefined processes and automation can not only save you time but also protect you from common mistakes.

A management platform that systematically addresses all the different aspects of the database lifecycle will be more robust than patching together a number of point solutions.

by Bart Oles at February 13, 2019 10:48 AM

February 12, 2019

Peter Zaitsev

Debugging MariaDB Galera Cluster SST Problems – A Tale of a Funny Experience

MariaDB galera cluster starting time

Recently, I had to work on an emergency for a customer who was having a problem restarting a MariaDB Galera Cluster. After a failure in the cluster, they decided to restart the cluster entirely following the right path: bootstrapping the first node, and then adding the rest of the members, one by one. Everything went fine until one of the nodes rejected the request to join the cluster.

Given this problem, the customer asked us to help with the problematic node because none of the tests they did worked, and the same symptom repeated over and over: SST started, copied a few gigs of data, and then just hung (apparently) while the node remained out of the cluster.

Identifying the issue…

Once on board with the issue, I initially just checked that the cluster was trying an SST: given that the whole dataset was about 31GB, I decided to go directly to a clean solution: wipe the whole datadir and start afresh. No luck at all; the symptom was exactly the same no matter what I tried:

After reviewing the logs I noticed a few strange things. In the joiner:

2019-01-29 16:14:41 139996474869504 [Warning] WSREP: Failed to prepare for incremental state transfer: Local state UUID (00000000-0000-0000-0000-000000000000) does not match group state UUID (18153472-f958-11e8-ba63-fae8ac6c22f8): 1 (Operation not permitted)
at galera/src/replicator_str.cpp:prepare_for_IST():482. IST will be unavailable.
2019-01-29 16:14:41 139996262553344 [Note] WSREP: Member 3.0 (node1) requested state transfer from '*any*'. Selected 0.0 (node3)(SYNCED) as donor.
2019-01-29 16:14:41 139996262553344 [Note] WSREP: Shifting PRIMARY -> JOINER (TO: 4902465)
2019-01-29 16:14:41 139996474869504 [Note] WSREP: Requesting state transfer: success, donor: 0
2019-01-29 16:14:41 139996474869504 [Note] WSREP: GCache history reset: 00000000-0000-0000-0000-000000000000:0 -> 18153472-f958-11e8-ba63-fae8ac6c22f8:4902465
2019-01-29 16:14:42 139996270946048 [Note] WSREP: (9864c6ca, 'tcp://0.0.0.0:4567') connection to peer 9864c6ca with addr tcp://192.168.12.21:4567 timed out, no messages seen in PT3S
2019-01-29 16:14:42 139996270946048 [Note] WSREP: (9864c6ca, 'tcp://0.0.0.0:4567') turning message relay requesting off
2019-01-29 16:16:08 139996254160640 [ERROR] WSREP: Process was aborted.
2019-01-29 16:16:08 139996254160640 [ERROR] WSREP: Process completed with error: wsrep_sst_xtrabackup-v2 --role 'joiner' --address '192.168.12.21' --datadir '/var/lib/mysql/' --parent '8725' --binlog '/var/log/mysql/mariadb-bin' --binlog-index '/var/log/mysql/mariadb-bin.index': 2 (No such file or directory)

In the donor (the output has been obfuscated to avoid sharing private info, and the times deliberately do not match):

Jan 29 18:08:22 node3 -innobackupex-backup: 190129 18:08:22 >> log scanned up to (203524317205)
Jan 29 18:08:23 node3 -innobackupex-backup: 190129 18:08:23 >> log scanned up to (203524318337)
Jan 29 18:08:24 node3 -innobackupex-backup: 190129 18:08:24 >> log scanned up to (203524320436)
Jan 29 18:08:25 node3 -innobackupex-backup: 190129 18:08:25 >> log scanned up to (203524322720)
Jan 29 18:08:25 node3 nrpe[25546]: Error: Request packet type/version was invalid!
Jan 29 18:08:25 node3 nrpe[25546]: Client request was invalid, bailing out...
Jan 29 18:08:26 node3 -innobackupex-backup: 190129 18:08:26 >> log scanned up to (203524322720)
Jan 29 18:08:27 node3 -innobackupex-backup: 190129 18:08:27 >> log scanned up to (203524323538)
Jan 29 18:08:28 node3 -innobackupex-backup: 190129 18:08:28 >> log scanned up to (203524324667)
Jan 29 18:08:29 node3 -innobackupex-backup: 190129 18:08:29 >> log scanned up to (203524325358)
Jan 29 18:08:30 node3 -wsrep-sst-donor: 2019/01/29 18:08:30 socat[22843] E write(6, 0x1579220, 8192): Broken pipe
Jan 29 18:08:30 node3 -innobackupex-backup: innobackupex: Error writing file 'UNOPENED' (Errcode: 32 - Broken pipe)
Jan 29 18:08:30 node3 -innobackupex-backup: xb_stream_write_data() failed.
Jan 29 18:08:30 node3 -innobackupex-backup: innobackupex: Error writing file 'UNOPENED' (Errcode: 32 - Broken pipe)
Jan 29 18:08:30 node3 -innobackupex-backup: [01] xtrabackup: Error: xtrabackup_copy_datafile() failed.
Jan 29 18:08:30 node3 -innobackupex-backup: [01] xtrabackup: Error: failed to copy datafile.
Jan 29 18:08:30 node3 mysqld[27345]: 2019-01-29 18:08:30 140562136139520 [Warning] Aborted connection 422963 to db: 'unconnected' user: 'sstuser' host: 'localhost' (Got an error reading communication packets)
Jan 29 18:08:30 node3 -wsrep-sst-donor: innobackupex finished with error: 1. Check /var/lib/mysql//innobackup.backup.log
Jan 29 18:08:30 node3 -wsrep-sst-donor: Cleanup after exit with status:22

So the SST started correctly and then failed. I tried forcing different donors, checked firewall rules, etc. Nothing.

Additionally, I noticed that the process was starting over and over: while monitoring, the .sst folder was growing up to a certain size (something around 7GB) and then would start over. The logs kept showing the same messages, and the init script failed with an error, but the process kept running until I executed service mysql stop or sent kill -9 to all processes. It was getting stranger every minute.
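For reference, a simple way to watch that staging area grow on the joiner (assuming the default .sst staging directory created by the xtrabackup SST script under the datadir; adjust the path to your setup):

watch -n 5 du -sh /var/lib/mysql/.sst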

At this point I was totally lost, scratching my head looking for solutions. Stranger still, a manual SST using netcat worked perfectly! So it was definitely a problem with the init script. The systemd journal was not providing any insight…

And then…

MariaDB Cluster dies in the SST process after 90 seconds

Suddenly I noticed that the failure was happening roughly 90 seconds after the start. A short googling session later, this time with a more specific search, I found this page:
https://mariadb.com/kb/en/library/systemd/#ssts-and-systemd which explained my problem precisely.

The MariaDB init script changed its timeout from 900 seconds to 90, while MySQL Community and Percona Server keep this value at 15 minutes. It also seems that this change has caused some major issues with nodes crashing, as documented in MDEV-15607. The bug is reported as fixed, but we can still see timeout problems.

I observed that in case of failure, systemd was killing the mysqld process but not stopping the service. This results in an infinite SST loop that only stops when the service is killed or stopped via systemd command.

The fix was super easy: I just needed to create a systemd drop-in file that sets the timeout to a more useful value, as follows:

sudo tee /etc/systemd/system/mariadb.service.d/timeoutstartsec.conf <<EOF
[Service]
TimeoutStartSec=900
EOF
sudo systemctl daemon-reload

As you may notice, I set the timeout to 15 minutes, but I could set it to any value. That was it: the next SST would have plenty of time to finish. This change is well documented here
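To double-check that the override was picked up, you can ask systemd for the effective value; note that it reports the timeout in its own units:

systemctl show mariadb | grep -i timeoutstart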

On reflection…

One could argue about this change, and I’m still having some internal discussions about it. In my opinion, a 90-second timeout is too short for a Galera cluster; it is very likely that almost any cluster will exceed it during SST. Even for a regular MySQL server that suffers a crash with a high proportion of dirty pages or many operations to roll back, 90 seconds doesn’t seem to be a feasible time for crash recovery. Why the developers changed it to such a short timeout, I have no idea. Luckily, it is very easy to fix now that I know the reason.


Photo by Tim Gouw on Unsplash

by Francisco Bordenave at February 12, 2019 01:25 PM

February 11, 2019

MariaDB Foundation

MariaDB 10.2.22 now available

The MariaDB Foundation is pleased to announce the availability of MariaDB 10.2.22, the latest stable release in the MariaDB 10.2 series. See the release notes and changelogs for details. Download MariaDB 10.2.22 Release Notes Changelog What is MariaDB 10.2? MariaDB APT and YUM Repository Configuration Generator Contributors to MariaDB 10.2.22 Alexander Barkov (MariaDB Corporation) Alexander […]

The post MariaDB 10.2.22 now available appeared first on MariaDB.org.

by Ian Gilfillan at February 11, 2019 04:52 PM

Peter Zaitsev

Compression Options in MySQL (Part 2)

Swiss cheese File system

In one of my previous posts, I started a series on data compression options with MySQL. The first post focused on the more traditional compression options like InnoDB Barracuda page compression and MyISAM packing. With this second part, I’ll discuss a newer compression option: InnoDB transparent page compression with punch holes, available since MySQL 5.7. First, I’ll describe the transparent page compression method and how it works. Then I’ll present similar results as in the first post.

InnoDB transparent page compression

Before we can discuss transparent page compression, we must understand how InnoDB accesses its data pages. To access an InnoDB page, you need to know the tablespace (the file) and the offset of the page within the tablespace file. The offset is the tough part with data compression. If you just compress pages and concatenate them one after the other, the offsets will no longer be at known intervals. InnoDB Barracuda page compression solves the problem by asking the DBA to guess the compression ratio of the pages with the compressed block size setting. For example, you have to tell InnoDB to use a disk block size of 8KB if you think the compression ratio will be around 2. Transparent page compression uses another approach, sparse files.
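For reference, here is a hedged sketch of how the feature is enabled, assuming MySQL 5.7+ on a filesystem with hole-punching support such as xfs or ext4 (the table is just an example):

CREATE TABLE wiki_logs (
  id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  payload VARCHAR(255)
) ENGINE=InnoDB COMPRESSION='zlib';   -- 'lz4' and 'none' are also accepted

ALTER TABLE wiki_logs COMPRESSION='lz4';  -- applies only to pages written afterwards
OPTIMIZE TABLE wiki_logs;                 -- rebuild so existing pages are rewritten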

Sparse files 101

A sparse file is a file with holes in it. Even though a sparse file may be very large, if there are a lot of holes in it, it may end up using a small amount of storage. On almost every Linux system, the /var/log/lastlog file is sparse:

yves@ThinkPad-P51:/var/log$ ls -lah lastlog
-rw-rw-r-- 1 root utmp 18M jan 5 16:09 lastlog
yves@ThinkPad-P51:/var/log$ du -hs lastlog
56K lastlog

While the ls command reports an apparent size of 18MB, the du command tells us the file actually uses only 56KB. Most of the space in the file is actually unallocated. When you access a sparse file, the filesystem has to map the actual physical offsets in the file with the logical offsets seen by the application. A logical offset is no longer directly the number of bytes since the beginning of the file.

Now that we understand a bit about what sparse files are, let’s talk about the punch hole aspect. When you write something to disk, you can use the fallocate call to free up, or punch, part of it. The freed/punched portion is thus a hole in the file, and the filesystem can later reuse the hole to store something else. Let’s follow a simplified view of the steps required to write a transparently compressed InnoDB page.
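Before we do, the punch-hole primitive itself is easy to see in isolation. A minimal sketch, assuming util-linux fallocate and a filesystem with hole-punching support (xfs, ext4):

dd if=/dev/zero of=/tmp/demo.ibd bs=1K count=16        # a 16KB "page" full of data
fallocate --punch-hole --offset 8192 --length 8192 /tmp/demo.ibd
ls -l /tmp/demo.ibd   # apparent size is still 16KB
du -h /tmp/demo.ibd   # allocated size dropped to 8KB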

InnoDB using sparse files

Figure 1: InnoDB Transparent page compression

In figure 1, an in-memory 16KB InnoDB page with 14KB of data is going to be written to disk. As part of the write process, the data is compressed to 6KB and the page is written to the disk. Once written, InnoDB uses the fallocate call to release the 10KB of unused space. Since only full blocks are released, only 8KB is really freed. The remaining unreleased space (2KB) is just zeroed. The freed space will be reused, either by the same file or by another one. For simplicity, let’s assume the space is reused by the same InnoDB file.

Figure 2: File system layout

If there is no immediate reuse, a portion of the InnoDB file will look like the top file layout of figure 2. The pages (numbers) are still sequentially laid out, but there are holes in between. As the file system gets full, it will start to reuse the freed space, so eventually the file layout will look like the bottom one. If you notice, in the bottom layout the pages are no longer in sequential order. There are consequences to that: the notion of sequential disk access is gone. The most stunning example is a simple file copy on a spinning device. While copying a 1GB regular file may take only 30 seconds, the copy of a 1GB sparse file can take much longer, up to 30 minutes in the worst cases. The impact on physical backup tools, like Percona XtraBackup, is thus important. Normally physical backups are much faster than logical ones (e.g. mysqldump), but with sparse files that may no longer be true.

MySQL impacts

The use of sparse files also has consequences for the design of a MySQL database server. The added random operations increase the importance of using SSD/flash-based storage. Also, some settings must be considered from a different perspective (a my.cnf sketch follows the list):

  • innodb_flush_neighbors should be 0 since 1 is a cheat geared toward sequential operations
  • innodb_read_ahead_threshold is normally set to 56, meaning that when 56 pages of an extent have been scanned, the next extent is read ahead sequentially. To be really useful, the next extent should be read before the remaining 8 pages of the current extent are scanned. Since sequential operations are slower here, maybe this value should be lowered a little. The drawback is an increased possibility of read-aheads that are never used.
  • innodb_random_read_ahead is a wilder setting; it would be a good idea to experiment with it for your workload
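Put together, that my.cnf sketch could look like the following; these are hedged starting points to benchmark, not recommendations:

[mysqld]
innodb_flush_neighbors      = 0    # neighbor flushing only helps sequential I/O
innodb_read_ahead_threshold = 48   # default is 56; trigger linear read-ahead a bit earlier
innodb_random_read_ahead    = ON   # worth benchmarking for your workload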

There are likely to be other affected settings.

Review of the test procedure

Just to refresh memories, I am using two datasets for the basic benchmarks. The first, Wikipedia, consists of about 1B rows of Wikipedia access logs. It is moderately compressible. The second dataset, o1543, is from the defunct Percona cloud tool project. It has only 77M rows, but they are much wider, with 134 columns. The o1543 dataset is highly compressible.

On these two datasets, the following steps were executed:

  1. insert the rows: record time, final size and amount of data written
  2. large range select, record the time
  3. 20k updates: record the time and total bytes written

Results

Final sizes

Figure 3, Innodb transparent page compression final sizes

One of the most critical metrics with compression is the final dataset size, as shown in figure 3. The possibility to use larger InnoDB pages is a big thing with transparent page compression. Larger pages allow for more repetitive patterns to be present within a page, and that improves the compression ratio. Results using page sizes of 16KB, 32KB and 64KB are shown. The uncompressed results are used as references, transparent compression (TC) using Lz4 and Zlib are the actual compressed datasets. First, we see that larger page sizes barely affect the size of the uncompressed dataset (I16, I32 and I64). Since the datasets were inserted in primary key order, the only possible impact is the filling factor of the pages. When InnoDB fills a page in PK order, even when the innodb_fill_factor is set to 100, it always leaves 1KB free per 16KB. With an amount of free space that scales with the page size, the final size doesn’t change much.

The impacts of larger page sizes on the compression ratio are important. The most drastic example is with the o1543 dataset and Zlib compression. While with a 16KB page the compression ratio was already decent, at 3.65, it grows to an amazing 8.7 (I16/I64TCZlib) with pages of 64KB. Larger page sizes also have a positive impact on the compression ratio of the Wikipedia dataset. The original compression ratio with Zlib and 16KB pages is 2.4, and it grows to 3.4 with 64KB pages. Datasets compressed with Lz4 behave similarly to the Zlib ones, but the compression ratios are slightly lower.

Overall, the I64TCZlib result for the Wikipedia dataset is the most compressed form we have so far. For the o1543 dataset, the MyISAMPacked compressed size is still slightly smaller, but it is read-only.

Insertion time

Figure 4, InnoDB transparent page compression insert times

We normally expect compression to add an overhead, but here the insertion speed improves with larger page sizes (figure 4). The reason is likely that we are using spinning disks: spinning disks have high latency, so doing larger IO operations helps. The time overhead of transparent page compression hovers between 10 and 17%. That’s much less than the 60% overhead we observed for Barracuda table compression in the previous post for the Wikipedia dataset (InnoDBCmp8k/InnoDB). We can conclude that insert rates, when inserts are in PK order, are not much affected by transparent page compression. If you are mostly inserting data, it is a nice win.

Data written by inserts

Figure 5, total amount of data written during the inserts

The amount of data written is not much affected by the transparent compression and the larger page sizes (figure 5). That’s reasonable, as many of the writes are not compressed; only the final write to the tablespace is. Neither the writes to the doublewrite buffer, nor to the InnoDB log files, nor for the tablespace pre-allocation, are compressed. The differences we see are essentially the same as the ones for the final sizes. Only the uncompressed results do not fit that view, but these are rather small deviations.

Range selects

Figure 6, time to complete a long range scan

The range select benchmarks are really a means of testing the decompression overhead. As you notice in figure 6, the time variations are not large. For the Wikipedia dataset, the fastest range select is I64TCLz4, which completed in 788 seconds. That’s almost two minutes slower than the fastest result using InnoDB Barracuda compression (block_size=4KB). How can we explain such results? If the freed space is reused, transparent compression causes sequential operations to become random ones, and the time should increase. Without space reuse, the storage layer will merge many small reads into a sequential one and then discard the holes. Effectively, the disk will read the same amount of data, compressed or not; the only difference comes from decompression. Lz4 is extremely fast, while Zlib is slower.

Going back to the Wikipedia dataset, it took the exact same time, 830s, for I16, I16TCLz4 and I32TCLz4. That seems to indicate there was no space reuse. With the xfs_bmap tool on a TC compressed file, I listed the blocks used. Here is the command I used and the first lines of the output (with blocks of 512 bytes):

root@LabPS57kvm_1:/tmp# xfs_bmap /var/lib/mysql/test/query_class_metrics.ibd | more
/var/lib/mysql/test/query_class_metrics.ibd:
0: [0..31]: 1013168..1013199
1: [32..39]: 1014544..1014551
2: [40..63]: hole
3: [64..71]: 1016976..1016983
4: [72..95]: hole
5: [96..103]: 1017008..1017015
6: [104..127]: hole
7: [128..135]: 1016880..1016887
8: [136..159]: hole
9: [160..167]: 1016912..1016919
10: [168..191]: hole
...

We have the list:

  • 0..31: 16 KB tablespace header, apparently not compressed
  • 32..39: 4KB TC compressed page, 8 sectors of compressed data
  • 40..63: 12KB hole (24 sectors)
  • …and so on

So the layout indeed looks like the filesystem-with-no-reuse case (top layout) of figure 2. When InnoDB extends the tablespace, it of course proceeds by entire pages. The filesystem will try, as much as possible, to allocate contiguous blocks. Initially, the tablespace grows one page at a time, but rapidly it grows by extents of 64 pages. Space reuse will start only when there are no more contiguous areas large enough to satisfy the allocation requests. Until then, the filesystem still performs mostly sequential operations. The performance characteristics will thus change once the freed blocks start to be reused. On a smaller server, I continued to insert data well after the filesystem would have been full without the holes. The insertion rate fell by about half, but the read performance appeared unchanged.

The times of the range selects for the o1543 dataset are more predictable. In all cases, larger pages increase performance. That kind of makes sense: InnoDB needs fewer IOPS. With Lz4, InnoDB spends less time decompressing the pages than it would need to read the complete uncompressed pages; the opposite is true for Zlib. The Lz4 results are the fastest, Zlib the slowest, and in between we have the uncompressed results.

20k updates time

Figure 7, time needed to perform 20k updates

Intuitively, I was expecting the larger pages to slow down the updates. Similarly, I was also expecting Lz4 compressed pages to be slower than uncompressed pages, but faster than the ones compressed with Zlib. The above figure shows the times to perform approximately 20k single row updates for both datasets. We performed the updates to the Wikipedia dataset in small separate transactions, while we used a single large update statement for the o1543 dataset.

While the compression algorithm assumption appears to hold true, the one for the page sizes is plainly wrong. Of course, the storage consists of spinning disks, so the latency of random IO dominates. The important factor becomes the number of levels in the b-tree of the table. In the root node of the b-tree and all intermediate nodes, bigger pages mean more pointers to the next level. More pointers mean a bigger fan-out (the ratio of nodes between levels) and fewer levels. Bigger pages also mean fewer leaf-level pages, which in turn require fewer upper-level node pages.

Let’s dive a bit more into this topic. The Wikipedia dataset table has an int unsigned primary key. Remember that InnoDB always leaves 1KB free per 16KB of page, and that, along with the primary key value, each entry in a non-leaf node has an extra 9 bytes for the pointer to the next-level page. Let’s do some math:

  • Total number of pages with 16KB pages = 112.6GB / (15KB) = 7871311 pages
  • Max number of rows in the non-leaf pages for 16KB pages and an int PK = (16 * 1024)/(4 (int PK) + 9 (ptr)) = 1260 rows/page
  • Minimum number of pages in the first level above the leaf = 7871311 / 1260 = 6247 pages
  • Minimum number of pages at the next level = 6247 / 1260 = 5 pages
  • Root page = 1

Of course, our calculations are an approximation. With a 16KB page size, there are three levels above the leaves, for a total of 6253 pages and a size of 98MB. It thus requires 6253 IOPS to warm up the buffer pool with all the node pages. A SATA 7200 rpm disk delivers at best 120 IOPS (one per rotation), so that’s roughly 52 seconds. Now, let’s redo the same calculations but with a page size of 32KB:

  • Total number of pages with 32KB pages = 110.7GB / (31KB) = 3744431 pages
  • Max number of rows in the non-leaf pages for 32KB pages and an int PK = (32 * 1024)/(4 (int PK) + 9 (ptr)) = 2520 rows/page
  • Minimum number of pages in the first level above the leaf = 3744431 / 2520 = 1486 pages
  • Root page = 1

Using 32KB pages, we have one level fewer and only 1487 node pages, for a combined size of 47MB. To warm up the buffer pool, we have to load at least the node pages, an operation requiring only about a quarter of the IOPS compared to when 16KB pages were used. That’s where most of the performance gains come from. The reduced number of IOPS more than compensates for the longer time to read a large page. Again, in this setup, we used spinning disks.

Bytes written per update

Figure 8, average bytes written per update

Now, the last set of results concerns the number of bytes written per update statement (figure 8). There is a big price to pay when you want to use larger InnoDB pages: the write amplification is huge. The number of bytes written scales roughly with the page size. The worst case is the I64 result, about 192KB written for a single-row update of an integer field (Wikipedia). If your database workload includes a large number of small single-row updates, you should avoid combining expensive flash devices with 64KB InnoDB pages, as you’ll burn out your devices rapidly.

Operational considerations for larger InnoDB pages and TC

When is it a good idea to use transparent compression? When should you use a larger InnoDB page size? One valid use case is a database storing large quantities of operational metrics, like the o1543 dataset. The compression ratio will be fantastic and the performance penalty limited, at least until the filesystem starts reusing the holes.

If you collect data from a large number of devices and are struggling with TBs of highly compressible data, transparent compression might be an interesting option. The only issue I see, but it is a major one, is how to back up large sparse files. InnoDB transparent page compression with punch holes is an interesting solution but, unless I am missing something, it has a somewhat limited scope. There are other compression options with similar compression ratios and fewer drawbacks.

In this post we explored a feature available since MySQL 5.7, InnoDB transparent page compression with punch holes. Performance-wise, we have an interesting solution which offers an excellent compression ratio, especially when larger page sizes are used. The transparent compression with punch holes technique suffers from its foundation: sparse files. Backing up very large sparse files is a slow and IO-intensive process. Instead of performing large sequential IO operations, the backup process will require millions of small random IO operations.

So far we have discussed the traditional approaches to compression in MySQL (previous post) and InnoDB transparent page compression. The next post of the series on data compression with MySQL will introduce the ZFS filesystem. ZFS externalizes the compression to the filesystem in a way that is pretty similar to InnoDB transparent page compression, but the ZFS b-tree file structure removes the inconvenience of sparse files.

Stay tuned, more results are coming.

by Yves Trudeau at February 11, 2019 04:39 PM

Valeriy Kravchuk

On my Favorite FOSDEM 2019 MySQL, MariaDB and Friends Devroom Talks

This year I not only spoke about MySQL bug reporting at FOSDEM, but also spent almost the entire day listening in the MySQL, MariaDB and Friends Devroom. I missed only one talk, on ProxySQL (to get some water, drink a bottle of famous Belgian beer and chat with Geert, a former colleague from the MySQL support team whom I had not seen for a decade). So, for the first time in my 4 FOSDEM visits, I got a first-hand impression of the entire set of talks in the devroom, which I want to share today while I still remember my feelings.

Most of the talks have both slides and videos already uploaded on the site, so you can check them and draw your own conclusions, but my top 5 favorite talks (those with both videos and slides already available to the community) were the following:

  • "Un-split brain (aka Move Back in Time) MySQL", by Shlomi Noach. You can find slides at SlideShare.

    This was a replacement talk that was really interesting and had the proper style for FOSDEM. It was mostly a nice background story of the creation of the gh-mysql-rewind tool, a shell script that uses MariaDB's mysqlbinlog --flashback option and MySQL GTIDs to "rewind" the row-based binary log and roll back transactions to some previous point in time. The tool should become available to the community soon, maybe as a part of orchestrator. I was impressed that one can successfully use 49 slides for a 20-minute talk. That's far beyond my current presentation skills...
  • "Test complex database systems in a laptop with dbdeployer", by Giuseppe Maxia. You can find slides at SlideShare.

    I've already built and used dbdeployer, as described in my blog post, so I was really interested in the talk. Giuseppe was able not only to show 45 slides over 20 minutes and explain all the reasons behind re-implementing MySQL-Sandbox in Go, but also run a live demo where dozens of sandbox instances were created and used. Very impressive!
  • "MySQL and the CAP theorem: relevance & misconceptions", second great talk and show by Shlomi Noach. You can find slides at SlideShare.

    The "CAP theorem" says is a concept that a distributed database system (like any kind of MySQL replication setup) can only have 2 of the 3 features: (atomic) Consistency, (high) Availability and Partition Tolerance. This can be proved mathematically, but Shlomi had not only defined terms and conditions to present the formal proof, but also explained that they are far from real production objectives of any engineer or DBA (like 99.95% of Availability). He had shown typical MySQL setups (from simple async master-slave replication to Galera, group replication and even Vitess) and proved that formally they all are neither consistent nor available from that formal CAP theorem point of view, while, as we all know, they are practically useful and work (and with some efforts, proxies on top etc can be made both highly available and highly consistent for practical purposes). So, CAP theorem is neither representing real production systems, nor meeting their real requirements. We've also got some kind of explanation of why async master-master or circular replication are still popular... All that in 48 slides, with links, and presented in 20 minutes! Greatest short MySQL-related talk I've ever attended.
  • "TiDB: Distributed, horizontally scalable, MySQL compatible", by Morgan Tocker. You can find slides at SlideShare.

    It was probably the first time I listened to Morgan, even though we worked together for a long time. I liked his way of explaining the architecture of yet another database system speaking the MySQL protocol, and the reasons for creating it. If you are interested in the performance of this system, check this blog post.
  • "MySQL 8.0 Document Store: How to Mix NoSQL & SQL in MySQL 8.0", by Frédéric Descamps. You can find slides (70!) at SlideShare.

    LeFred managed to get me somewhat interested in MySQL Shell and the new JSON functions in MySQL, way more than ever before. It's even more surprising given that his talk was the last one and we had already spent 8+ hours listening before he started. His simple step-by-step explanation of how one may get the best of both the SQL/ACID and NoSQL (JSON, "MongoDB") worlds, if needed, in a single database management system was impressive. This talk also probably caused the longest discussion and the largest number of questions from the remaining attendees.

    He was also one of the two "hosts" and "managers" of the devroom, so I am really thankful to him for his efforts, year after year, to make the MySQL devroom at FOSDEM great!
There were more good talks, but I had to pick just a few that already have slides shared, and those of the kind that I personally prefer to listen to at FOSDEM. This year I also missed a few people whom I like to see and talk to at FOSDEM, namely Mark Callaghan and Jean-François Gagné.

The only photo I took with my Nokia dumb phone this year in Brussels, on my way to FOSDEM on February 2. We got snow and rain that morning, which was nice for anyone who had to walk 5 kilometers to the ULB campus.
Overall, based on my experience this year, it still makes a lot of sense to visit FOSDEM for anyone interested in MySQL. You can hardly find so many good, varied MySQL-related talks in just one day at any other conference.

by Valeriy Kravchuk (noreply@blogger.com) at February 11, 2019 04:26 PM

February 08, 2019

MariaDB Foundation

The Story of our Sea Lion

Why a sea lion? That’s a question we get every now and then, most recently at FOSDEM. Here is the story: Our Founder Monty likes animals in the sea. For MySQL, he picked a dolphin, after swimming with them in the Florida Keys. For the MariaDB sea lion, there was a similar encounter. It happened […]

The post The Story of our Sea Lion appeared first on MariaDB.org.

by Kaj Arnö at February 08, 2019 05:18 PM

Peter Zaitsev

ProxySQL 1.4.14 and Updated proxysql-admin Tool

ProxySQL 1.4.14


ProxySQL 1.4.14, released by ProxySQL, is now available for download in the Percona Repository along with an updated version of Percona’s proxysql-admin tool.

ProxySQL is a high-performance proxy, currently for MySQL and database servers in the MySQL ecosystem (like Percona Server for MySQL and MariaDB). It acts as an intermediary for client requests seeking resources from the database. René Cannaò created ProxySQL for DBAs as a means of solving complex replication topology issues.

The ProxySQL 1.4.14 source and binary packages available from the Percona download page for ProxySQL include ProxySQL Admin – a tool developed by Percona to configure Percona XtraDB Cluster nodes into ProxySQL. Docker images for release 1.4.14 are available as well. You can download the original ProxySQL from GitHub. GitHub hosts the documentation in the wiki format.

This release introduces an improvement in how proxysql-admin works with the --max-connections option. In previous releases, this option was always set to 1000. Now, proxysql_galera_checker uses the value of the --max-connections option set by the user, either on the command line (proxysql-admin --max-connections) or in the configuration file.

If the user doesn’t set this option, it defaults to 1000.
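A hedged usage sketch, setting the value on the command line while configuring a cluster (check proxysql-admin --help on your version for the exact flag syntax):

proxysql-admin --enable --max-connections=1500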

Improvements

  • PSQLADM-130: Every time a node is removed and then added back, the proxysql_galera_checker script restores the custom value of the --max-connections option set using proxysql-admin --max-connections.
  • The --syncusers option of proxysql-admin starts to support MariaDB. Thanks to Jonas Kint (@jonaskint) for this contribution.

ProxySQL is available under the GPLv3 open source license.

by Borys Belinsky at February 08, 2019 04:56 PM

February 07, 2019

Peter Zaitsev

Column Families in MyRocks

myrocks column families

In my webinar How To Rock with MyRocks, I briefly mentioned the column families feature in MyRocks, which allows fine-tuning of indexes and primary keys.

Let’s review it in more detail.

To recap, MyRocks is based on the RocksDB library, which stores all data in [key => value] pairs. When this translates to MySQL, all primary keys (data) and secondary keys (indexes) are stored in [key => value] pairs, which by default are assigned to the “default” column family.

Each column family has an individual set of:

  • SST files, and their parameters
  • Memtable and its parameters
  • Bloom filters, and their parameters
  • Compression settings

There is an N:1 relation between tables/indexes and column families, so schematically it looks like this:

column families myrocks

How do you assign tables and indexes to a column family?

It is defined in the COMMENT section for a key or primary key:

CREATE TABLE tab1 (
  a INT,
  b INT,
  PRIMARY KEY (a) COMMENT 'cfname=cf1',
  KEY key_b (b) COMMENT 'cfname=cf2'
) ENGINE=ROCKSDB;
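To verify which column family each index ended up in, you can query the MyRocks data dictionary (a hedged example; the available columns may differ slightly between MyRocks builds):

SELECT * FROM INFORMATION_SCHEMA.ROCKSDB_DDL WHERE TABLE_NAME = 'tab1';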

Now, if you want to define individual parameters for column families, you should use rocksdb_override_cf_options.

For example:

rocksdb_override_cf_options='cf1={compression=kNoCompression}; cf2={compression=kLZ4Compression,bottommost_compression=kZSTD}'

Be careful of defining too many column families: as I mentioned, each column family will use an individual memtable, which takes 64MB of memory by default.

There is also an individual set of SST files per column family. You can see how they perform with SHOW ENGINE ROCKSDB STATUS:

  Type: CF_COMPACTION
  Name: cf1
Status: 
** Compaction Stats [cf1] **
Level    Files   Size     Score Read(GB)  Rn(GB) Rnp1(GB) Write(GB) Wnew(GB) Moved(GB) W-Amp Rd(MB/s) Wr(MB/s) Comp(sec) Comp(cnt) Avg(sec) KeyIn KeyDrop
----------------------------------------------------------------------------------------------------------------------------------------------------------
  L0      0/0     0.00 KB   0.0      0.0     0.0      0.0       1.5      1.5       0.0   1.0      0.0     99.1        15       37    0.410       0      0
  L5      3/0   197.05 MB   0.8      0.4     0.4      0.0       0.4      0.4       0.0   1.0     75.6     75.6         6        1    5.923   8862K      0
  L6      7/0   341.24 MB   0.0      1.7     1.3      0.5       0.8      0.3       0.0   0.6     42.8     19.5        42        7    5.933     61M      0
 Sum     10/0   538.29 MB   0.0      2.2     1.7      0.5       2.7      2.2       0.0   1.8     35.5     44.1        63        45    1.392     70M     0
 Int      0/0     0.00 KB   0.0      0.0     0.0      0.0       0.0      0.0       0.0   0.0      0.0      0.0         0        0    0.000       0      0
  Type: CF_COMPACTION
  Name: cf2
Status: 
** Compaction Stats [cf2] **
Level    Files   Size     Score Read(GB)  Rn(GB) Rnp1(GB) Write(GB) Wnew(GB) Moved(GB) W-Amp Rd(MB/s) Wr(MB/s) Comp(sec) Comp(cnt) Avg(sec) KeyIn KeyDrop
----------------------------------------------------------------------------------------------------------------------------------------------------------
  L0      0/0     0.00 KB   0.0      0.0     0.0      0.0       0.3      0.3       0.0   1.0      0.0     13.5        22       22    1.023       0      0
  L6      4/0   178.61 MB   0.0      0.6     0.3      0.3       0.4      0.2       0.0   1.5      9.3      7.3        61        5   12.243     72M      0
 Sum      4/0   178.61 MB   0.0      0.6     0.3      0.3       0.7      0.5       0.0   2.5      6.8      9.0        84       27    3.100     72M      0
 Int      0/0     0.00 KB   0.0      0.0     0.0      0.0       0.0      0.0       0.0   0.0      0.0      0.0         0        0    0.000       0      0

To check the current column families and their settings you can use:

SELECT * FROM INFORMATION_SCHEMA.ROCKSDB_CF_OPTIONS;
| cf1        | COMPARATOR                                                      | RocksDB_SE_v3.10      
|
| cf1        | MERGE_OPERATOR                                                  | NULL                                                                                                      |
| cf1        | COMPACTION_FILTER                                               | NULL                                                                                                                                                                     
| cf1        | COMPACTION_FILTER_FACTORY                                       | Rdb_compact_filter_factory                                                                                                                                               
| cf1        | WRITE_BUFFER_SIZE                                               | 67108864                                                                                                       
| cf1        | MAX_WRITE_BUFFER_NUMBER                                         | 2                                                                                                                                                                        
| cf1        | MIN_WRITE_BUFFER_NUMBER_TO_MERGE                                | 1                                                                                                                                                                        
| cf1        | NUM_LEVELS                                                      | 7                                                                                                                                                                        
| cf1        | LEVEL0_FILE_NUM_COMPACTION_TRIGGER                              | 4                                                                                                                                                                        
| cf1        | LEVEL0_SLOWDOWN_WRITES_TRIGGER                                  | 20                                                                                                                                                                       
| cf1        | LEVEL0_STOP_WRITES_TRIGGER                                      | 36                                                                                                                                                                       
| cf1        | MAX_MEM_COMPACTION_LEVEL                                        | 0                                                                                                                                                                        
| cf1        | TARGET_FILE_SIZE_BASE                                           | 67108864                                                                                                                                                                 
| cf1        | TARGET_FILE_SIZE_MULTIPLIER                                     | 1                                                                                                                                                                        
| cf1        | MAX_BYTES_FOR_LEVEL_BASE                                        | 268435456                                                                                                                                                                
| cf1        | LEVEL_COMPACTION_DYNAMIC_LEVEL_BYTES                            | ON                                                                                                                                                                       
| cf1        | MAX_BYTES_FOR_LEVEL_MULTIPLIER                                  | 10.000000                                                                                                                                                                
| cf1        | SOFT_RATE_LIMIT                                                 | 0.000000                                                                                                                                                                 
| cf1        | HARD_RATE_LIMIT                                                 | 0.000000                                                                                                                                                                 
| cf1        | RATE_LIMIT_DELAY_MAX_MILLISECONDS                               | 100                                                                                                                                                                      
| cf1        | ARENA_BLOCK_SIZE                                                | 0                                                                                                                                                                        
| cf1        | DISABLE_AUTO_COMPACTIONS                                        | OFF                                                                                                                                                                      
| cf1        | PURGE_REDUNDANT_KVS_WHILE_FLUSH                                 | ON                                                                                                                                                                       
| cf1        | MAX_SEQUENTIAL_SKIP_IN_ITERATIONS                               | 8                                                                                                                                                                        
| cf1        | MEMTABLE_FACTORY                                                | SkipListFactory                                                                                                                                                          
| cf1        | INPLACE_UPDATE_SUPPORT                                          | OFF                                                                                                                                                                      
| cf1        | INPLACE_UPDATE_NUM_LOCKS                                        | ON                                                                                                                                                                       
| cf1        | MEMTABLE_PREFIX_BLOOM_BITS_RATIO                                | 0.000000                                                                                                                                                                 
| cf1        | MEMTABLE_PREFIX_BLOOM_HUGE_PAGE_TLB_SIZE                        | 0                                                                                                                                                                        
| cf1        | BLOOM_LOCALITY                                                  | 0                                                                                                                                                                        
| cf1        | MAX_SUCCESSIVE_MERGES                                           | 0                                                                                                                                                                        
| cf1        | OPTIMIZE_FILTERS_FOR_HITS                                       | ON                                                                                                                                                                       
| cf1        | MAX_BYTES_FOR_LEVEL_MULTIPLIER_ADDITIONAL                       | 1:1:1:1:1:1:1                                                                                                                                                            
| cf1        | COMPRESSION_TYPE                                                | kNoCompression                                                                                                                                                           
| cf1        | COMPRESSION_PER_LEVEL                                           | NUL                                                                                                                                                                      
| cf1        | COMPRESSION_OPTS                                                | -14:-1:0                                                                                                                                                                 
| cf1        | BOTTOMMOST_COMPRESSION                                          | kLZ4Compression                                                                                                                                                          
| cf1        | PREFIX_EXTRACTOR                                                | NULL                                                                                                                                                                     
| cf1        | COMPACTION_STYLE                                                | kCompactionStyleLevel                                                                                                                                                    
| cf1        | COMPACTION_OPTIONS_UNIVERSAL                                    | {SIZE_RATIO=1; MIN_MERGE_WIDTH=2; MAX_MERGE_WIDTH=4294967295; MAX_SIZE_AMPLIFICATION_PERCENT=200; COMPRESSION_SIZE_PERCENT=-1; STOP_STYLE=kCompactionStopStyleTotalSize} |
| cf1        | COMPACTION_OPTION_FIFO::MAX_TABLE_FILES_SIZE                    | 1073741824                                                                                                                                                               
| cf1        | TABLE_FACTORY::FLUSH_BLOCK_POLICY_FACTORY                       | FlushBlockBySizePolicyFactory(0x4715df0)                                                                                                                                 
| cf1        | TABLE_FACTORY::CACHE_INDEX_AND_FILTER_BLOCKS                    | 1                                                                                                                                                                        
| cf1        | TABLE_FACTORY::CACHE_INDEX_AND_FILTER_BLOCKS_WITH_HIGH_PRIORITY | 0                                                                                                                                                                        
| cf1        | TABLE_FACTORY::PIN_L0_FILTER_AND_INDEX_BLOCKS_IN_CACHE          | 1                                                                                                                                                                        
| cf1        | TABLE_FACTORY::INDEX_TYPE                                       | 0                                                                                                                                                                        
| cf1        | TABLE_FACTORY::HASH_INDEX_ALLOW_COLLISION                       | 1                                                                                                                                                                        
| cf1        | TABLE_FACTORY::CHECKSUM                                         | 1                                                                                                                                                                        
| cf1        | TABLE_FACTORY::NO_BLOCK_CACHE                                   | 0                                                                                                                                                                        
| cf1        | TABLE_FACTORY::BLOCK_CACHE                                      | 0x470c880                                                                                                                                                                
| cf1        | TABLE_FACTORY::BLOCK_CACHE_NAME                                 | LRUCache                                                                                                                                                                 
| cf1        | TABLE_FACTORY::BLOCK_CACHE_OPTIONS                              |                                                                                                                                                                          
| cf1        | TABLE_FACTORY::CAPACITY                                         | 536870912                                                                                                                                                                
| cf1        | TABLE_FACTORY::NUM_SHARD_BITS                                   | 6                                                                                                                                                                        
| cf1        | TABLE_FACTORY::STRICT_CAPACITY_LIMIT                            | 0                                                                                                                                                                        
| cf1        | TABLE_FACTORY::HIGH_PRI_POOL_RATIO                              | 0.000                                                                                                                                                                    
| cf1        | TABLE_FACTORY::BLOCK_CACHE_COMPRESSED                           | (nil)                                                                                                                                                                    
| cf1        | TABLE_FACTORY::PERSISTENT_CACHE                                 | (nil)                                                                                                                                                                    
| cf1        | TABLE_FACTORY::BLOCK_SIZE                                       | 16384                                                                                                                                                                    
| cf1        | TABLE_FACTORY::BLOCK_SIZE_DEVIATION                             | 10                                                                                                                                                                       
| cf1        | TABLE_FACTORY::BLOCK_RESTART_INTERVAL                           | 16                                                                                                                                                                       
| cf1        | TABLE_FACTORY::INDEX_BLOCK_RESTART_INTERVAL                     | 1                                                                                                                                                                        
| cf1        | TABLE_FACTORY::METADATA_BLOCK_SIZE                              | 4096                                                                                                                                                                     
| cf1        | TABLE_FACTORY::PARTITION_FILTERS                                | 0                                                                                                                                                                        
| cf1        | TABLE_FACTORY::USE_DELTA_ENCODING                               | 1                                                                                                                                                                        
| cf1        | TABLE_FACTORY::FILTER_POLICY                                    | rocksdb.BuiltinBloomFilter                                                                                                                                               
| cf1        | TABLE_FACTORY::WHOLE_KEY_FILTERING                              | 1                                                                                                                                                                        
| cf1        | TABLE_FACTORY::VERIFY_COMPRESSION                               | 0                                                                                                                                                                        
| cf1        | TABLE_FACTORY::READ_AMP_BYTES_PER_BIT                           | 0                                                                                                                                                                        
| cf1        | TABLE_FACTORY::FORMAT_VERSION                                   | 2                                                                                                                                                                        
| cf1        | TABLE_FACTORY::ENABLE_INDEX_COMPRESSION                         | 1                                                                                                                                                                        
| cf1        | TABLE_FACTORY::BLOCK_ALIGN                                      | 0                   
    

As a reminder, MyRocks is available in Percona Server for MySQL 5.7 and Percona Server for MySQL 8.0; you can try it and share your experience!


Photo by Debby Hudson on Unsplash

by Vadim Tkachenko at February 07, 2019 03:12 PM

February 06, 2019

MariaDB Foundation

MariaDB 10.1.38 now available

The MariaDB Foundation is pleased to announce the availability of MariaDB 10.1.38, the latest stable release in the MariaDB 10.1 series. See the release notes and changelogs for details. Download MariaDB 10.1.38 Release Notes Changelog What is MariaDB 10.1? MariaDB APT and YUM Repository Configuration Generator Contributors to MariaDB 10.1.38 Alexander Barkov (MariaDB Corporation) Alexander […]

The post MariaDB 10.1.38 now available appeared first on MariaDB.org.

by Ian Gilfillan at February 06, 2019 09:25 PM

Peter Zaitsev

Percona Responds to MySQL LOCAL INFILE Security Issues

LOCAL INFILE Security

In this post, we’ll cover Percona’s thoughts about the current MySQL community discussion around MySQL LOCAL INFILE security issues.

Some of the detail within this blog post is marked <REDACTED>. I hope to address this shortly (by the end of Feb 2019) and provide complete detail and exploit proof-of-concept code. However, this post is released given the already public discussion of this particular issue, with the exploitation code currently redacted to ensure forks of MySQL client libraries have sufficient time to implement their response strategies.

Check back at the end of the month to see updates to this post!

Background

MySQL’s LOCAL INFILE feature is fully documented by Oracle MySQL, and there is a legitimate use for the LOCAL INFILE feature to upload data to a MySQL server in a single statement from a file on the client system.

However, some MySQL clients can be coerced into sending contents local to the machine they are running upon, without having issued a LOCAL INFILE directive. This appears to be linked to how the Adminer PHP web interface was attacked: it was pointed to a MALICIOUSLY crafted MySQL service to extract file data from the host on which Adminer was deployed. This malicious “server” has, it would appear, existed since early 2013.

The attack requires the use of a malicious/crafted MySQL “server” to send a request for a file in place of the expected response to the SQL query in the normal query response flow.

IF, however, the client checks for the expected response, there is no file exfiltration without further effort. This was noted with Java and ProxySQL testing: a specific response was expected, and not sending the expected response would cause the client to retry.

I use the term “server” loosely here, as often this is simply a service emulating the MySQL v10 protocol that does not actually provide complete MySQL interaction capability, though this is theoretically possible given enough effort, or by adapting a proxy to carry out this attack whilst backing onto a real MySQL server for the interaction capability.

For example, the “server” always responds OK to any auth attempt, regardless of credentials used, and doesn’t interpret any SQL sent. Consequently, you can send any string as a query, and the “server” responds with the request for a file on the client, which the client dutifully provides if local_infile is enabled.

There is potential, no doubt, for a far more sophisticated “server”. However, in my testing I did not go to this length, and instead produced the bare minimum required to test this theory—which proved to be true where local_infile was enabled.

The attack flow is as follows:

  1. The client connects to MySQL server, performs MySQL protocol handshaking to agree on capabilities.
  2. Authentication handshake (“server” often accepts any credentials passed to it).
  3. The client issues a query, e.g. SET NAMES or other SQL (the “server” ignores this and immediately responds with the file request response described in step 4).
  4. The server responds with a packet that is normally reserved for when it has been issued a “LOAD DATA LOCAL INFILE …” SQL statement (0xFB…).
  5. IF vulnerable, the client responds with the full content of the requested file path, if present on the local file system and if permissions allow this file to be read.
    1. The client’s handling here varies; the client may drop the connection with a malformed packet error, or continue.

Exploitation testing

The following MySQL clients were tested via their respective Docker containers and default configurations; the bash script which orchestrated this is as follows: <REDACTED>

This tests the various forks of the MySQL client; along with some manual testing the results were:

  • Percona Server for MySQL 5.7.24-26 (Not vulnerable)
    • PS 5.7.x aborts after server greeting
  • Percona Server for MySQL 5.6.42-64.2  (Not vulnerable)
    • PS 5.6 accepts the server greeting, proceeds to log in, aborts without handling malicious payload.
  • MariaDB 5.5
    • Susceptible to LOCAL INFILE abuse in testing
      • MariaDB has stated they will release a fix that tracks in the client to ensure the SQL for LOAD LOCAL INFILE was requested and otherwise drops the server request without handling.
  • MariaDB 10.0
    • Susceptible to LOCAL INFILE abuse in testing
      • MariaDB has stated they will release a fix that tracks in the client to ensure the SQL for LOAD LOCAL INFILE was requested and otherwise drops the server request without handling.
  • MariaDB 10.1.37
    • susceptible to LOCAL INFILE abuse in testing
      • MariaDB has stated they will release a fix that tracks in the client to ensure the SQL for LOAD LOCAL INFILE was requested and otherwise drops the server request without handling.
  • MariaDB 10.4.1
    • susceptible to LOCAL INFILE abuse in testing
      • MariaDB has stated they will release a fix that tracks in the client to ensure the SQL for LOAD LOCAL INFILE was requested and otherwise drops the server request without handling.
  • MySQL 5.7. (Not vulnerable by default)
    • Not susceptible to LOCAL INFILE abuse by default, enabling local_infile however makes this susceptible
  • MySQL 5.6. (Not vulnerable)
    • Not susceptible to LOCAL INFILE abuse by default, enabling local_infile however makes this susceptible
  • MySQL 8.0.14 (Not vulnerable)
    • Not susceptible to LOCAL INFILE abuse, enabling local_infile however makes this susceptible.
  • PHP 7 mysqli
    • Depends on libmysqlclient in use (As PHP’s mysqli is a C wrapper of the underlying library).
  • Ruby
    • Depends on libmysqlclient in use
    • Note: I couldn’t get this to build on my laptop due to a reported syntax error in mysql.c. However, given this wraps libmysqlclient, I would suggest the result likely mirrors PHP’s test.
  • ProxySQL
    • The underlying library is known to be susceptible to LOCAL INFILE abuse.
    • ProxySQL issues SQL to the backend MySQL server, as well as protocol commands such as PING, and expects a specific result for queries it issues. This makes it difficult for the malicious server to be generic; a targeted "server" that specifically seeks to exploit ProxySQL is likely possible, however this has not been explored at this time.
  • Java
    • com.mysql.jdbc.Driver
      • As with ProxySQL, in testing this driver issues "background" SQL and expects a specific response. While it is theoretically possible to have a malicious service target this driver, this has not been explored at this time.
  • Connector/J

There are many more clients out there ranging from protocol compatible implementations to wrappers of the underlying c library.

Do your own research to ensure you are taking appropriate measures, should you choose or need to mitigate this risk in your environment.

Can/Should this be fixed?

This is a particularly tricky issue to correct in code, as the MySQL client needs to be aware that a LOAD DATA LOCAL INFILE SQL statement has been sent. MariaDB’s proposed path implements this. Even then, if a stored procedure issues a file request via LOAD DATA LOCAL INFILE..., the client has no awareness of this even being needed until the packet arrives with the request, and local_infile can be abused. However, the intent is to allow the feature to load data, and as such DBAs/Admins should seek to employ compensating controls to reduce the risk to their organization:

Mitigation

  • DO NOT implement any stored procedures which trigger a LOAD DATA INFILE.
  • Close/remove/secure access to ANY web admin interfaces.
    • Remember, security through obscurity is no security at all. This only delays time to access, it does not prevent access.
  • Deploy mandatory access controls
    • SELinux, AppArmor, GRSecurity, etc. can all help to ensure your client is not reading anything unexpected, lowering your risk of exposure through proper configuration.
  • Deploy Egress controls on your application nodes to ensure your application server can only reach your MySQL service(s) and does not attempt to connect elsewhere (As the exploit requires a malicious MySQL service).
    • Iptables/firewalld/ufw/pfsense/other firewall/etc.
    • This ensures that your vulnerable clients are not connecting to anything you do not know about.
    • This does not protect against a skilled adversary. Your application needs to communicate out to the internet to serve pages, and running a malicious MySQL service on a suitably high random port can help to "hide" this network traffic.
  • Be aware of Domain Name Service (DNS) rebinding attacks if you are using a Fully Qualified Domain Name (FQDN) to connect between application and database server. Use an IP address or socket in configurations if possible to negate this attack.
  • Deploy MySQL Transport Layer Security (TLS) configuration to ensure the server you expect requires the use of TLS during connection, set your client (if possible) to VERIFY_IDENTITY to ensure TLS “fails closed” if the client fails to negotiate TLS, and to perform basic identity checking of the server being connected to.
    • This will NOT dissuade a determined adversary who has a presence in your network long enough to perform certificate spoofing (in theory), and nothing but time to carry this out.
    • mysslstrip can also lead to issues if your configuration does "fail open"; as such, it is imperative you have:
      • In my.cnf: ssl_mode=VERIFY_IDENTITY
      • On the cli: --ssl_mode=VERIFY_IDENTITY
      • Be aware: this performs verification of the CA (Certificate Authority) and the certificate hostname; this can lead to issues if you are using self-signed certificates and the CA is not trusted.
    • This is ONLY an issue if an adversary has the capability to man-in-the-middle your Application <-> MySQL servers;
      • If they have this capability, this feature abuse is only a single avenue of data exfiltration they can perform.
  • Deploy a Network Intrusion Detection System
    • There are many open source software (OSS) options, for example Snort, Suricata, Zeek (formerly Bro), or OSSEC; an example rule is given below.
    • Set alerts on the logs, curate a response process to handle these alerts.
  • Client option mitigation may be possible; however, this varies from client to client and from underlying library to library.
    • MariaDB client binary.
      • Add to my.cnf: local_infile = 0
      • Or set –local_infile=0 on the command line
    • PHP / Ruby / Anything that relies on libmysqlclient
      • Replace libmysqlclient with a version that does not enable local_infile by default
        • This can be difficult, so ensure you test your process before running anything on production!
      • Switch to use PDO MySQL over MySQLi (PDO implementation implicitly sets, local_infile to 0 at the time of writing in php’s C code).
        • Author's note: mysqli_options($conn, MYSQLI_OPT_LOCAL_INFILE, false); failed to mitigate this in testing, YMMV (Your Mileage May Vary).
        • Attempting to set a custom handler to return nothing also failed to mitigate this. Again, YMMV.
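
To tie the client-side options above together, here is a minimal sketch of what the relevant settings could look like for a client that reads my.cnf. This is illustrative only: the [client] section, the certificate path, and whether your particular connector honours these options are assumptions you must verify against your own stack.

[client]
# Refuse to send local files, even if a (malicious) server asks for them
local_infile = 0
# Fail closed on TLS and verify the server certificate and hostname
ssl-mode = VERIFY_IDENTITY
# Hypothetical CA path; point this at the CA that signed your server certificate
ssl-ca = /etc/mysql/certs/ca.pem

The equivalent on the command line is mysql --local-infile=0 --ssl-mode=VERIFY_IDENTITY, plus your usual connection options.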

IDS Rule example

Here I provide an example "FAST" format rule for your IDS/IPS system:

Note, however, YMMV; this works with Snort and Suricata, and _may_ work with Zeek (formerly Bro), OSSEC, etc. Please test and adapt as needed:

alert tcp any any <> any any (msg:"MySQL LOCAL INFILE request packet detected"; content:"|00 00 01 FB|"; rawbytes;)

Note this is only an example; it does not detect packets flowing over TLS connections.

If you are running an Intrusion Prevention System (IPS), you should change the rule action from alert to drop.

Here the rule is set to any any, as an adversary may wish to avoid using 3306 in an attempt to evade detection; you can of course change this as desired to suit your needs.

You must also assess if your applications are running local_infile legitimately and conduct your own threat modeling as well as impact analysis, prior to implementing such a rule.

Note that increasing the "noise" threshold for your team will likely only result in your team becoming desensitized to the "noise" and potentially missing an important alert as a result.

For example, you could modify the left and right side any any so that only traffic between your internal network range and anything outside of it is matched:

alert tcp 192.168.1.0/24 any <> !192.168.1.0/24 any (msg:"MySQL LOCAL INFILE request packet detected"; content:"|00 00 01 FB|"; rawbytes;)

Adapting to your environment is key for this IDS rule to be effective.

Further reading

As noted this issue is already being publicly discussed, as such I add links here to sources relevant to this discussion and exploitation.

Exploitation Network flow

<REDACTED>

Thanks

This assessment was not a single-person effort; here I would like to link to, and give thanks where appropriate to, the following individuals who have helped with this investigation:

Willem de Groot – For sharing insights into the Adminer exploitation and for graciously responding to an inquiry from myself (this helped me get the PoC working, thank you).

<REDACTED> – original author of <REDACTED> (in 2013!), from which I was able to adapt to function for this investigation.

Ceri Williams – for helping me with proxySQL testing.

Marcelo Altman – for discussing MySQL protocol in depth.

Sergei Golubchik – for responding to my email notice for MariaDB, and implementing a workaround mitigation so quickly, as well as providing me with a notice on the Connector/J announcement url.

Peter Zaitsev – for linking me to the original reddit discussion and for feedback.

by David Busby at February 06, 2019 06:05 PM

Percona Server for MongoDB 3.6.10-3.0 Is Now Available

Percona Server for MongoDB

Percona announces the release of Percona Server for MongoDB 3.6.10-3.0 on February 6, 2019. Download the latest version from the Percona website or the Percona Software Repositories. This
release is also available for Ubuntu 18.10 (Cosmic Cuttlefish).

Percona Server for MongoDB is an enhanced, open source, and highly-scalable database that is a fully-compatible, drop-in replacement for MongoDB 3.6 Community Edition. It supports MongoDB 3.6 protocols and drivers.

Percona Server for MongoDB extends Community Edition functionality by including the Percona Memory Engine storage engine, as well as several enterprise-grade features. It also includes MongoRocks storage engine (which is now deprecated). Percona Server for MongoDB requires no changes to MongoDB applications or code.

In Percona Server for MongoDB 3.6.10-3.0, data at rest encryption becomes GA. The data at rest encryption feature now covers the temporary files used for external sorting and the rollback files. You can decrypt and examine the contents of the rollback files using the new perconadecrypt command line tool.

In this release, Percona Server for MongoDB supports the ngram full-text search engine. Thanks to Sunguck Lee (@SunguckLee) for this contribution. To enable the ngram full-text search engine, create an index passing ngram to the default_language parameter:

mongo > db.collection.createIndex({name:"text"}, {default_language: "ngram"})

New Features

  • PSMDB-276: perconadecrypt tool is now available for decrypting the encrypted rollback files.
  • PSMDB-250: The Ngram full-text search engine has been added to Percona Server for MongoDB. Thanks to @SunguckLee on GitHub

Bugs Fixed

  • PSMDB-234: It was possible to use a key file for encryption the owner of which was not the owner of the mongod process.
  • PSMDB-269: In some cases, a hot backup was not using the correct path to the keydb directory designated for data encryption.
  • PSMDB-273: When using data at rest encryption, temporary files for external sorting and rollback files were not encrypted
  • PSMDB-272: mongos could crash when running the createBackup command.
  • PSMDB-233: WiredTiger encryption options were silently ignored at server startup, although a storage engine different from WiredTiger was used.
  • PSMDB-257: MongoDB could not be started with a group-readable key file owned by root.
  • PSMDB-266: In some cases, it was possible to add arbitrary collections to the keydb directory which may only store encryption data.

Other bugs fixed: PSMDB-239, PSMDB-243

The Percona Server for MongoDB 3.6.10-3.0 release notes are available in the official documentation.

by Borys Belinsky at February 06, 2019 04:48 PM

Upcoming Webinar Thurs 2/7: Top Trends in Modern Data Architecture for 2019

Top Trends in Modern Data Architecture for 2019

Please join Percona’s PMM Product Manager, Michael Coburn, for a webinar on The Top Trends in Modern Data Architecture for 2019, hosted by DBTA, on Thursday, February 7th at 11:00 AM PST (UTC-8) / 2:00 PM EST (UTC-5).

Register Now

A strong data architecture strategy is critical to supporting your organization’s data-driven goals. AI and machine learning, data discovery and real-time analytics reflect that notion. Additionally, greater speed, flexibility, and scalability are common wish-list items. Smarter data governance and security capabilities are not that far behind. What’s more, many new technologies and approaches have come to the forefront of data architecture discussions. Data lakes, in-memory databases and engines like Spark and cloud services of all shapes and sizes are just a few examples.

In order to learn more about the top trends in modern data architecture for 2019, register for this webinar today.

by Michael Coburn at February 06, 2019 04:36 PM

MariaDB Foundation

FOSDEM Reflections / MySQL – MariaDB DevRoom

What a great place for informal interactions, strengthening the network, and hearing the latest news from the grapevine! Last weekend 1.-3. Feb 2019, over 8000 developers met in Brussels for FOSDEM 2019. For the overall atmosphere, take a look at this 1:05 long video by Sofia Ek. MariaDB Foundation was present with six staff people […]

The post FOSDEM Reflections / MySQL – MariaDB DevRoom appeared first on MariaDB.org.

by Kaj Arnö at February 06, 2019 04:25 PM

Jean-Jerome Schmidt

MySQL to MongoDB - An Admin Cheat Sheet

Most software applications nowadays involve some dynamic data storage for extensive future reference in the application itself. We all know data is stored in a database, and databases fall into two categories: relational and non-relational DBMS.

Your choice between the two will largely depend on your data structure, the amount of data involved, database performance, and scalability.

Relational DBMS store data in tables as rows and are queried using Structured Query Language (SQL), making them a good choice for applications involving many transactions. They include MySQL, SQLite, and PostgreSQL.

On the other hand, NoSQL DBMS such as MongoDB are document-oriented: data is stored in collections as documents. This gives greater storage capacity for a large data set and hence a further advantage in scalability.

In this blog we are assuming you already have some knowledge of either MongoDB or MySQL, and hence would like to know the correlation between the two in terms of querying and database structure.

Below is a cheat sheet to further familiarize yourself with the querying of MySQL to MongoDB.

MySQL to MongoDB Cheat Sheet - Terms

MySQL Term / MongoDB Term: Explanation

Table / Collection: This is the storage container for data that tends to be similar in the contained objects.
Row / Document: Defines the single object entity in the table for MySQL and the collection in the case of MongoDB.
Column / Field: For every stored item, it has properties which are defined by different values and data types. In MongoDB, documents in the same collection may have different fields from each other. In MySQL, every row must be defined with the same columns from the existing ones.
Primary key / Primary key: Every stored object is identified with a unique field value. In the case of MongoDB we have the _id field set automatically, whereas in MySQL you can define your own primary key, which is incremental as you create new rows.
Table Joins / Embedding and linking documents: Connection associated with an object in a different collection/table to data in another collection/table.
WHERE / $match: Selecting data that matches criteria.
GROUP BY / $group: Grouping data according to some criteria.
DROP / $unset: Removing a column/field from a row/document.
SET / $set: Setting the value of an existing column/field to a new value.

Schema Statements

Below, each entry shows the MySQL table statement first, the equivalent MongoDB collection statement second, followed by an explanation.

The database and tables are created explicitly through the PHP admin panel or defined within a script, i.e.:

Creating a Database

CREATE DATABASE database_name

Creating a table

CREATE TABLE users (
    id MEDIUMINT NOT NULL
        AUTO_INCREMENT,
    UserId VARCHAR(30),
    Age INT,
    Gender CHAR(1),
    Name VARCHAR(222),
    PRIMARY KEY (id)
)

The database can be created implicitly or explicitly. Implicitly, during the first document insert, the database and collection are created, and an automatic _id field is added to the document.

db.users.insert( {
    UserId: "user1",
    Age: 55,
    Name: "Berry Hellington",
    Gender: "F",
 } )

You can also create the database explicitly by running this command in the Mongo Shell:

db.createCollection("users")

In MySQL, you have to specify the columns in the table you are creating, as well as set some validation rules, such as (in this example) the type of data and the length that goes into a specific column. In the case of MongoDB, you do not have to define either the fields each document should hold or the validation rules those fields should satisfy.

However, in MongoDB, for data integrity and consistency you can set validation rules using the JSON schema validator.
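
For illustration, a sketch of such a validator on the users collection might look like the following. The field names and rules here are assumptions based on the sample data used in this cheat sheet; adjust them to your own schema:

db.createCollection("users", {
   validator: {
      $jsonSchema: {
         bsonType: "object",
         required: [ "UserId", "Age", "Gender" ],
         properties: {
            UserId: { bsonType: "string" },
            Age: { bsonType: "int", minimum: 0 },
            Gender: { bsonType: "string", enum: [ "M", "F" ] }
         }
      }
   }
})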

Dropping a table

DROP TABLE users
db.users.drop()

These are the statements for deleting a table in MySQL and a collection in the case of MongoDB.

Adding a new column called join_date

ALTER TABLE users ADD join_date DATETIME

Removing the join_date column if already defined

ALTER TABLE users DROP COLUMN join_date

Adding a new field called join_date

db.users.updateMany({}, {$set: {join_date: new Date()}})

This will update all documents in the collection to have the join date as the current date.

Removing the join_date field if already defined

db.users.updateMany({}, {$unset: {join_date: ""}})

This will remove the join_date field from all the collection documents.

Altering the structure of the schema by either adding or dropping a column/field.

Since the MongoDB architecture does not strictly enforce the document structure, documents may have fields different from each other.

Creating an index with the UserId column ascending and Age descending

CREATE INDEX idx_UserId_asc_Age_desc
ON users(UserId ASC, Age DESC)

Creating an index involving the UserId and Age fields.

db.users.createIndex( { UserId: 1, Age: -1 } )

Indices are generally created to facilitate the querying process.

INSERT INTO users(UserId,
                  Age,
                  Gender)
VALUES ("user1",
        25,
        "M")
db.users.insert( {
       UserId: "bcd001",
       Age: 25,
       Gender: "M",
     Name: "Berry Hellington",
} )

Inserting new records.

DELETE FROM users
WHERE Age = 25
db.users.deleteMany( { Age: 25 } )

Deleting records from the table/collection whose age is equal to 25.

DELETE FROM users
db.users.deleteMany({})

Deleting all records from the table/collection.

SELECT * FROM users
db.users.find()

Returns all records from the users table/collection with all columns/fields.

SELECT id, Age, Gender FROM users
db.users.find(
   { },
   { Age: 1, Gender: 1 }
)

Returns all records from the users table/collection with Age, Gender and primary key columns/fields.

SELECT  Age, Gender FROM users
db.users.find(
   { },
 { Age: 1, Gender: 1,_id: 0}
)

Returns all records from the users table/collection with Age and Gender columns/fields. The primary key is omitted.

SELECT * FROM users WHERE Gender = “M”
db.users.find({ Gender: "M"})

Returns all records from the users table/collection whose Gender value is set to M.

SELECT Gender FROM users WHERE Age = 25
db.users.find({ Age: 25}, { _id: 0, Gender: 1})

Returns all records from the users table/collection with only the Gender value but whose Age value is equal to 25.

SELECT * FROM users WHERE Age = 25 AND Gender = ‘F’
db.users.find({ Age: 25, Gender: "F"})

Returns all records from the users table/collection whose Gender value is set to F and Age is 25.

SELECT * FROM users WHERE  Age != 25
db.users.find({ Age:{$ne: 25}})

Returns all records from the users table/collection whose Age value is not equal to 25.

SELECT * FROM users WHERE Age = 25 OR Gender = ‘F’
db.users.find({ $or: [ { Age: 25 }, { Gender: "F" } ] })

Returns all records from the users table/collection whose Gender value is set to F or Age is 25.

SELECT * FROM users WHERE Age > 25
db.users.find({ Age:{$gt: 25}})

Returns all records from the users table/collection whose Age value is greater than 25.

SELECT * FROM users WHERE Age <= 25
db.users.find({ Age:{$lte: 25}})

Returns all records from the users table/collection whose Age value is less than or equal to 25.

SELECT Name FROM users WHERE Name like "He%"
db.users.find(
  { Name: /^He/ }
)

Returns all records from the users table/collection whose Name value starts with the letters "He".

SELECT * FROM users WHERE Gender = ‘F’ ORDER BY id ASC
db.users.find( { Gender: "F" } ).sort( { $natural: 1 } )

Returns all records from the users table/collection whose Gender value is set to F and sorts this result in the ascending order of the id column in case of MySQL and time inserted in the case of MongoDB.

SELECT * FROM users WHERE Gender = ‘F’ ORDER BY id DESC
db.users.find( { Gender: "F" } ).sort( { $natural: -1 } )

Returns all records from the users table/collection whose Gender value is set to F and sorts this result in the descending order of the id column in case of MySQL and time inserted in the case of MongoDB.

SELECT COUNT(*) FROM users
db.users.count()

or

db.users.find().count()

Counts all records in the users table/collection.

SELECT COUNT(Name) FROM users
db.users.count({Name:{ $exists: true }})

or

db.users.find({Name:{ $exists: true }}).count()

Counts all records in the users table/collection who happen to have a value for the Name property.

SELECT * FROM users LIMIT 1
db.users.findOne()

or

db.users.find().limit(1)

Returns the first record in the users table/collection.

SELECT * FROM users WHERE Gender = ‘F’ LIMIT 1
db.users.find( { Gender: "F" } ).limit(1)

Returns the first record in the users table/collection that happens to have Gender value equal to F.

SELECT * FROM users LIMIT 5 OFFSET 10
db.users.find().limit(5).skip(10)

Returns five records from the users table/collection after skipping the first ten records.

UPDATE users SET Age = 26 WHERE Age > 25
db.users.updateMany(
  { Age: { $gt: 25 } },
  { $set: { Age: 26 } }
)

This sets the age of all records in the users table/collection who have the age greater than 25 to 26.

UPDATE users SET Age = Age + 1
db.users.updateMany(
  {},
  { $inc: { Age: 1 } }
)

This increases the age of all records in the users table/collection by 1.

UPDATE users SET Age = Age - 1
WHERE id = 1
db.users.updateOne(
  {},
  { $inc: { Age: -1 } }
)

This decrements the age of the first record in the users table/collection by 1.

To manage MySQL and/or MongoDB centrally and from a single point, visit: https://severalnines.com/product/clustercontrol.

by Onyancha Brian Henry at February 06, 2019 11:36 AM

Chris Calender

Chris Attending OpenWorks19

Those of you who know me know that I don’t travel often.

So it’s kind of a big deal, at least for me, that I will be at OpenWorks in NYC later this month!!!

If you’re planning on attending, please stop by and say “hi”! I’ll be helping in the Security Workshop on Monday, or you can find me at one of the Expert Bars Tuesday and Wednesday.

If you’re on the fence about attending, please message (or email) me for a significant discount code (if that might help you decide or help persuade your manager!). 🙂

Anyway, I’m excited to be going, and I hope to see you there! 🙂

Dates: February 25th – February 27th

by chris at February 06, 2019 12:54 AM

February 05, 2019

Peter Zaitsev

New Percona Package Signing Key Requires Update on RHEL and CentOS

percona release package signing

On December 20th, 2018 we began to sign our packages with a new encryption key. Our percona-release package contains both the latest and older package signing keys. However, older versions of the percona-release rpm package do not contain our latest key. Users with older percona-release packages installed, that have not been updated, may see an error message when trying to install our newer packages.

Red Hat Enterprise Linux (RHEL) and CentOS users may see an error similar to the following:

The GPG keys listed for the "Percona-Release YUM repository - x86_64" repository are already installed but they are not correct for this package.
Check that the correct key URLs are configured for this repository.

Thankfully,  the solution to this problem is simple. You will need to update your percona-release package before installing packages that are signed with the latest encryption key:

$ sudo yum update percona-release
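
If you want to confirm which Percona GPG keys are currently present in your RPM database, one quick check (a sketch using standard rpm query formatting) is:

$ rpm -q gpg-pubkey --qf '%{NAME}-%{VERSION}-%{RELEASE}\t%{SUMMARY}\n' | grep -i percona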

Ubuntu and Debian systems will not encounter this error as package signing and key verification works differently on those systems.


Photo by Markus Spiske on Unsplash

by David Bennett at February 05, 2019 06:45 PM

Upcoming Webinar Wed 2/6: Percona Software News and Roadmap Update

Percona Software News and Roadmap Update Webinar

Join Percona CEO Peter Zaitsev as he presents Percona Software News and Roadmap Update on Wednesday, February 6, 2019, at 11:00 AM PST (UTC-8) / 2:00 PM EST (UTC-5).

Register Now

Come and listen to Percona CEO Peter Zaitsev discuss what’s new in Percona open source software. Topics include Percona Server for MySQL and MongoDB, Percona XtraBackup, Percona Toolkit, Percona XtraDB Cluster and Percona Monitoring and Management.

During this webinar, Peter will talk about newly released features in Percona software. He will also show a few quick demos and share with you highlights from the Percona open source software roadmap.

Peter will also talk about new developments in Percona commercial services and finish with a Q&A.

Register today to join Peter for his Percona Software News and Roadmap Update.

by Peter Zaitsev at February 05, 2019 04:29 PM

Using pg_repack to Rebuild PostgreSQL Database Objects Online

Rebuild PostgreSQL Database Objects

In this blog post, we’ll look at how to use pg_repack to rebuild PostgreSQL database objects online.

We’ve seen a lot of questions regarding the options available in PostgreSQL for rebuilding a table online. We created this blog post to explain the pg_repack extension, available in PostgreSQL for this requirement. pg_repack is a well-known extension that was created and is maintained as an open source project by several authors.

There are three main reasons why you need to use pg_repack in a PostgreSQL server:

  1. Reclaim free space from a table to disk, after deleting a huge chunk of records
  2. Rebuild a table to re-order the records and shrink/pack them into a smaller number of pages. This may let a query fetch just one page (or fewer than n pages) instead of n pages from disk. In other words, less IO and more performance.
  3. Reclaim free space from a table that has grown in size with a lot of bloat due to improper autovacuum settings.

You might have already read our previous articles that explained what bloat is, and discussed the internals of autovacuum. After reading these articles, you can see there is an autovacuum background process that removes dead tuples from a table and allows the space to be re-used by future updates/inserts on that table. Over a period of time, tables that take the maximum number of updates or deletes may have a lot of bloated space due to poorly tuned autovacuum settings. This leads to slow performing queries on these tables. Rebuilding the table is the best way to avoid this. 
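
Before deciding to rebuild, it can help to see which tables are accumulating dead tuples. As a rough sketch (dead tuple counters are only an approximation of bloat, not an exact measure), you could start with the pg_stat_user_tables view:

SELECT relname, n_live_tup, n_dead_tup, last_autovacuum
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC
LIMIT 10;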

Why is just autovacuum not enough for tables with bloat?

We have discussed several parameters that change the behavior of an autovacuum process in this blog post. There cannot be more than autovacuum_max_workers autovacuum processes running in a database cluster at a time. At the same time, due to untuned autovacuum settings and no manual vacuuming of the database as weekly or monthly jobs, many tables can be skipped by autovacuum. We have discussed in this post that the default autovacuum settings run autovacuum on a table with ten records more times than a table with a million records. So, it is very important to tune your autovacuum settings, set table-level customized autovacuum parameters, and enable automated jobs to identify tables with huge bloat and run manual vacuum on them as scheduled jobs during low peak times (after thorough testing).
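
As an illustration of the table-level tuning mentioned above, the sketch below makes autovacuum trigger more aggressively on a heavily updated table. The table name and threshold values are purely illustrative; test what suits your workload before applying anything in production.

ALTER TABLE scott.employee SET (
  autovacuum_vacuum_scale_factor = 0.05,
  autovacuum_vacuum_threshold = 1000,
  autovacuum_analyze_scale_factor = 0.05
);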

VACUUM FULL

VACUUM FULL is the default option available with a PostgreSQL installation that allows us to rebuild a table. This is similar to ALTER TABLE in MySQL. However, this command acquires an exclusive lock and blocks reads and writes on the table.

VACUUM FULL tablename;

pg_repack

pg_repack is an extension available for PostgreSQL that helps us rebuild a table online. This is similar to pt-online-schema-change for online table rebuild/reorg in MySQL. However, pg_repack works for tables with a Primary key or a NOT NULL Unique key only.

Installing pg_repack extension

In RedHat/CentOS/OEL from PGDG Repo

Obtain the latest PGDG repo from https://yum.postgresql.org/ and perform the following step:

# yum install pg_repack11 (This works for PostgreSQL 11)
Similarly, for PostgreSQL 10,
# yum install pg_repack10

In Debian/Ubuntu from PGDG repo

Add certificates, repo and install pg_repack:

Following certificate may change. Please validate before you perform these steps.
# sudo apt-get install wget ca-certificates
# wget --quiet -O - https://www.postgresql.org/media/keys/ACCC4CF8.asc | sudo apt-key add -
# sudo sh -c 'echo "deb http://apt.postgresql.org/pub/repos/apt/ $(lsb_release -cs)-pgdg main" > /etc/apt/sources.list.d/pgdg.list'
# sudo apt-get update
# apt-get install postgresql-server-dev-11
# apt-get install postgresql-11-repack

Loading and creating pg_repack extension

Step 1 :

You need to add pg_repack to shared_preload_libraries. For that, just set this parameter in the postgresql.conf or postgresql.auto.conf file.

shared_preload_libraries = 'pg_repack'

Setting this parameter requires a restart.

$ pg_ctl -D $PGDATA restart -mf

Step 2 :

In order to start using pg_repack, you must create this extension in each database where you wish to run it:

$ psql
\c percona
CREATE EXTENSION pg_repack;

Using pg_repack to Rebuild Tables Online

Similar to pt-online-schema-change, you can use the option --dry-run to see if a table can be rebuilt using pg_repack. When you rebuild a table using pg_repack, all of its associated indexes get rebuilt automatically. You can also use -t instead of --table as an argument to rebuild a specific table.

Success message you see when a table satisfies the requirements for pg_repack.

$ pg_repack --dry-run -d percona --table scott.employee
INFO: Dry run enabled, not executing repack
INFO: repacking table "scott.employee"

Error message when a table does not satisfy the requirements for pg_repack.

$ pg_repack --dry-run -d percona --table scott.sales
INFO: Dry run enabled, not executing repack
WARNING: relation "scott.sales" must have a primary key or not-null unique keys

Now to execute the rebuild of the table scott.employee ONLINE, you can use the following command. It is just the previous command without --dry-run.

$ pg_repack -d percona --table scott.employee
INFO: repacking table "scott.employee"

Rebuilding Multiple Tables using pg_repack

Use an additional --table for each table you wish to rebuild.

Dry Run

$ pg_repack --dry-run -d percona --table scott.employee --table scott.departments
INFO: Dry run enabled, not executing repack
INFO: repacking table "scott.departments"
INFO: repacking table "scott.employee"

Execute

$ pg_repack -d percona --table scott.employee --table scott.departments
INFO: repacking table "scott.departments"
INFO: repacking table "scott.employee"

Rebuilding an entire Database using pg_repack

You can rebuild an entire database online using -d. Any table that is not eligible for pg_repack is skipped automatically.

Dry Run

$ pg_repack --dry-run -d percona
INFO: Dry run enabled, not executing repack
INFO: repacking table "scott.departments"
INFO: repacking table "scott.employee"

Execute

$ pg_repack -d percona
INFO: repacking table "scott.departments"
INFO: repacking table "scott.employee"


Running pg_repack in parallel jobs

To perform a parallel rebuild of a table, you can use the option -j. Please ensure that you have sufficient free CPUs that can be allocated to run pg_repack in parallel.

$ pg_repack -d percona -t scott.employee -j 4
NOTICE: Setting up workers.conns
INFO: repacking table "scott.employee"

Running pg_repack remotely

You can always run pg_repack from a remote machine. This helps in scenarios where we have PostgreSQL databases deployed on Amazon RDS. To run pg_repack from a remote machine, you must have the same version of pg_repack installed on the remote machine as well as the database server (say AWS RDS).
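
A sketch of such a remote invocation is shown below. The hostname, user, and database are placeholders; on Amazon RDS you may also need the --no-superuser-check (-k) option, since the RDS master user is not a true superuser.

$ pg_repack -h mydb.xxxxxx.us-east-1.rds.amazonaws.com -U admin -d percona --table scott.employee -k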

by Avinash Vallarapu at February 05, 2019 01:14 AM

February 04, 2019

MariaDB Foundation

MariaDB Galera Cluster 10.0.38 now available

The MariaDB Foundation is pleased to announce the availability of MariaDB Galera Cluster 10.0.38, the latest stable release in the MariaDB Galera Cluster 10.0 series. See the release notes and changelogs for details. Download MariaDB Galera Cluster 10.0.38 Release Notes Changelog What is MariaDB Galera Cluster? Contributors to MariaDB Galera Cluster 10.0.38 Alexander Barkov (MariaDB […]

The post MariaDB Galera Cluster 10.0.38 now available appeared first on MariaDB.org.

by Ian Gilfillan at February 04, 2019 09:59 PM

February 01, 2019

MariaDB Foundation

MariaDB Galera Cluster 5.5.63 now available

The MariaDB Foundation is pleased to announce the availability of MariaDB Galera Cluster 5.5.63, the latest stable release in the MariaDB Galera Cluster 5.5 series. See the release notes and changelogs for details. Download MariaDB Galera Cluster 5.5.63 Release Notes Changelog What is MariaDB Galera Cluster? Contributors to MariaDB Galera Cluster 5.5.63 Alexander Barkov (MariaDB […]

The post MariaDB Galera Cluster 5.5.63 now available appeared first on MariaDB.org.

by Ian Gilfillan at February 01, 2019 07:19 PM

MariaDB 10.0.38, MariaDB Connector/J 2.4.0 and MariaDB Connector/Node.js 2.0.3 now available

The MariaDB Foundation is pleased to announce the availability of MariaDB 10.0.38, the latest stable release in the MariaDB 10.0 series, and also the final in the maintenance window, as well as MariaDB Connector/J 2.4.0, the latest stable MariaDB Connector/J release, and MariaDB Connector/Node.js 2.0.3, the first stable release of the 100% JavaScript non-blocking MariaDB […]

The post MariaDB 10.0.38, MariaDB Connector/J 2.4.0 and MariaDB Connector/Node.js 2.0.3 now available appeared first on MariaDB.org.

by Ian Gilfillan at February 01, 2019 09:55 AM

January 31, 2019

Peter Zaitsev

A New Dashboard to Monitor Memory Usage in the PMM plugin!

Dashboard to Monitor Memory Usage in Linux

While the PMM team works hard on our PMM 2.0 release, we have been working on a few things in the background which we’d like to show off! In particular, we have developed a new dashboard that displays metrics related to memory usage on Linux systems. The dashboard leverages information collected by node_exporter. The graphs take advantage of /proc filesystem files, specifically:

  • meminfo: Provides information about distribution and utilization of memory. This varies by architecture and compile options.
  • vmstat: Provides information about block IO and CPU activity in addition to memory.
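
If you are curious what the raw sources look like, both files can be inspected directly. A quick sketch (exact field names vary a little between kernel versions):

$ grep -E 'MemTotal|MemAvailable|Buffers|Cached' /proc/meminfo
$ grep -E 'pgpgin|pgpgout|pswpin|pswpout' /proc/vmstat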

The information is split into five sections:

  1. Total Memory
  2. VMM (Virtual Memory Manager) Statistics
  3. Memory Statistics
  4. Number and Dynamic of Pages
  5. Pages per Zone

The dashboard will be included as part of the PMM 2.0 release. For you early adopters, you can get it from GrafanaLab and install it alongside your existing Dashboards – it won’t overwrite anything!

This dashboard works with all PMM Server versions starting with 1.7 (January 31, 2018).

by Vadim Yalovets at January 31, 2019 11:37 AM

Jean-Jerome Schmidt

How Roles Have Changed in MySQL 8.0 and How to Use Them

Database Security is important to any MySQL setup. Users are the foundation of any system. In terms of database systems, I generally think of them in two distinct groups:

  1. Application, service, or program users - basically customers or clients using a service.
  2. Database developers, administrators, analyst, etc… - Those maintaining, working with or monitoring the database infrastructure.

While each user does need to access the database at some level, those permissions are not all created equal.

For instance, clients and customers need access to their 'related user account' data, but even that should be monitored with some level of control. However, some tables and data should be strictly off-limits (E.g., system tables).

Nevertheless:

  • Analyst need 'read access', to garner information and insight via querying tables…
  • Developers require a slew of permissions and privileges to carry out their work…
  • DBA's need 'root' or similar type privileges to run the show…
  • Buyers of a service need to see their order and payment history…

You can imagine (I know I do) just how difficult a task managing multiple users or groups of users within a database ecosystem is.

In older versions of MySQL, a multiple-user environment is established in a somewhat monotonous and repetitive manner.

Yet, version 8 implements an exceptional, and powerful, SQL standard feature - Roles - which alleviates one of the more redundant areas of the entire process: assigning privileges to a user.

So, what is a role in MySQL?

You can visit MySQL in 2018: What’s in 8.0 and Other Observations, which I wrote for the Severalnines blog, where I mention roles for a high-level overview. However, where I only summarized them there, this current post looks to go deeper and focus solely on roles.

Here is how the online MySQL documentation defines a role: "A MySQL role is a named collection of privileges".

Doesn't that definition alone seem helpful?

But how?

We will see in the examples that follow.

To Make Note of the Examples Provided

The examples included in this post are in a personal 'single-user' development and learning workstation/environment, so be sure to implement the best practices that benefit you for your particular needs or requirements. The user names and passwords demonstrated are purely arbitrary and weak.

Users and Privileges in Previous Versions

In MySQL 5.7, roles do not exist. Assigning privileges to users is done individually. To better understand what roles do provide, let's not use them. That doesn't make any sense at all, I know. But, as we progress through the post, it will.

Below we create some users:

CREATE USER 'reader_1'@'localhost' IDENTIFIED BY 'some_password'; 
CREATE USER 'reader_writer'@'localhost' IDENTIFIED BY 'another_password'; 
CREATE USER 'changer_1'@'localhost' IDENTIFIED BY 'a_password';

Then those users are granted some privileges:

GRANT SELECT ON some_db.specific_table TO 'reader_1'@'localhost';
GRANT SELECT, INSERT ON some_db.specific_table TO 'reader_writer'@'localhost';
GRANT UPDATE, DELETE ON some_db.specific_table TO 'changer_1'@'localhost';

Whew, glad that is over. Now back to…

And just like that, you have a request to implement two more 'read-only' users…

Back to the drawing board:

CREATE USER 'reader_2'@'localhost' IDENTIFIED BY 'password_2'; 
CREATE USER 'reader_3'@'localhost' IDENTIFIED BY 'password_3';

Assigning them privileges as well:

GRANT SELECT ON some_db.specific_table TO 'reader_2'@'localhost';
GRANT ALL ON some_db.specific_table TO 'reader_3'@'localhost';

Can you see how this is less-than-productive, full of repetition, and error-prone? But, more importantly, did you catch the mistake?

Good for you!

While granting privileges for these two additional users, I accidentally granted ALL privileges to new user reader_3.

Oops.

A mistake that anyone could make.

Enter MySQL Roles

With roles, much of the above systematic privilege assignment and delegation can be somewhat streamlined.

User creation basically remains the same, but it's assigning privileges through roles that differs:

mysql> CREATE USER 'reader_1'@'localhost' IDENTIFIED BY 'some_password';
Query OK, 0 rows affected (0.19 sec)
mysql> CREATE USER 'reader_writer'@'localhost' IDENTIFIED BY 'another_password';
Query OK, 0 rows affected (0.22 sec)
mysql> CREATE USER 'changer_1'@'localhost' IDENTIFIED BY 'a_password';
Query OK, 0 rows affected (0.08 sec)
mysql> CREATE USER 'reader_2'@'localhost' IDENTIFIED BY 'password_2';
Query OK, 0 rows affected (0.28 sec)
mysql> CREATE USER 'reader_3'@'localhost' IDENTIFIED BY 'password_3';
Query OK, 0 rows affected (0.12 sec)

Querying the mysql.user system table, you can see those newly created users exist:

(Note: I have several user accounts in this learning/development environment and have suppressed much of the output for better on-screen clarity.)

mysql> SELECT User FROM mysql.user;
+------------------+
| User             |
+------------------+
| changer_1        |
| mysql.infoschema |
| mysql.session    |
| mysql.sys        |
| reader_1         |
| reader_2         |
| reader_3         |
| reader_writer    |
| root             |
|                  | --multiple rows remaining here...
+------------------+
23 rows in set (0.00 sec)

I have this arbitrary table and sample data:

mysql> SELECT * FROM name;
+--------+------------+
| f_name | l_name     |
+--------+------------+
| Jim    | Dandy      |
| Johhny | Applesauce |
| Ashley | Zerro      |
| Ashton | Zerra      |
| Ashmon | Zerro      |
+--------+------------+
5 rows in set (0.00 sec)

Let's now use roles to establish and assign, privileges for the new users to use the name table.

First, create the roles:

mysql> CREATE ROLE main_read_only;
Query OK, 0 rows affected (0.11 sec)
mysql> CREATE ROLE main_read_write;
Query OK, 0 rows affected (0.11 sec)
mysql> CREATE ROLE main_changer;
Query OK, 0 rows affected (0.14 sec)

Notice the mysql.user table again:

mysql> SELECT User FROM mysql.user;
+------------------+
| User             |
+------------------+
| main_changer     |
| main_read_only   |
| main_read_write  |
| changer_1        |
| mysql.infoschema |
| mysql.session    |
| mysql.sys        |
| reader_1         |
| reader_2         |
| reader_3         |
| reader_writer    |
| root             |
|                  |
+------------------+
26 rows in set (0.00 sec)

Based on this output, we can surmise that, in essence, roles are in fact users themselves.

Next, privilege assignment:

mysql> GRANT SELECT ON practice.name TO 'main_read_only';
Query OK, 0 rows affected (0.14 sec)
mysql> GRANT SELECT, INSERT ON practice.name TO 'main_read_write';
Query OK, 0 rows affected (0.07 sec)
mysql> GRANT UPDATE, DELETE ON practice.name TO 'main_changer';
Query OK, 0 rows affected (0.16 sec)

A Brief Interlude

Wait a minute. Can I just log in and carry out any tasks with the role accounts themselves? After all, they are users and they have the required privileges.

Let's attempt to log in to the practice database with role main_changer:

:~$ mysql -u main_changer -p practice
Enter password: 
ERROR 1045 (28000): Access denied for user 'main_changer'@'localhost' (using password: YES)

The simple fact that we are presented with a password prompt is a good indication that we cannot (at this time at least). As you recall, I did not set a password for any of the roles during their creation.

What does the mysql.user system table's authentication_string column have to say?

mysql> SELECT User, authentication_string, password_expired
    -> FROM mysql.user
    -> WHERE User IN ('main_read_only', 'root', 'main_read_write', 'main_changer')\G
*************************** 1. row ***************************
                 User: main_changer
authentication_string: 
     password_expired: Y
*************************** 2. row ***************************
                 User: main_read_only
authentication_string: 
     password_expired: Y
*************************** 3. row ***************************
                 User: main_read_write
authentication_string: 
     password_expired: Y
*************************** 4. row ***************************
                 User: root
authentication_string: ***various_jumbled_mess_here*&&*&*&*##
     password_expired: N
4 rows in set (0.00 sec)

I included the root user among the role names for the IN() predicate check to simply demonstrate it has an authentication_string, where the roles do not.

This passage in the CREATE ROLE documentation clarifies it nicely: "A role when created is locked, has no password, and is assigned the default authentication plugin. (These role attributes can be changed later with the ALTER USER statement, by users who have the global CREATE USER privilege.)"

Back to the task at hand, we can now assign the roles to users based on their needed level of privileges.

Notice no ON clause is present in the command:

mysql> GRANT 'main_read_only' TO 'reader_1'@'localhost', 'reader_2'@'localhost', 'reader_3'@'localhost';
Query OK, 0 rows affected (0.13 sec)
mysql> GRANT 'main_read_write' TO 'reader_writer'@'localhost';
Query OK, 0 rows affected (0.16 sec)
mysql> GRANT 'main_changer', 'main_read_only' TO 'changer_1'@'localhost';
Query OK, 0 rows affected (0.13 sec)

It may be less confusing if you use some sort of 'naming convention' when establishing role names, (I am unaware if MySQL provides one at this time… Community?) if for no other reason than to differentiate between them and regular 'non-role' users visually.


There is Still Some Work Left To Do

That was super-easy wasn't it?

Less redundant than the old way of privilege assignment.

Let's put those users to work now.

We can see the granted privileges for a user with SHOW GRANTS syntax. Here is what is currently assigned to the reader_1 user account:

mysql> SHOW GRANTS FOR 'reader_1'@'localhost';
+------------------------------------------------------+
| Grants for reader_1@localhost                        |
+------------------------------------------------------+
| GRANT USAGE ON *.* TO `reader_1`@`localhost`         |
| GRANT `main_read_only`@`%` TO `reader_1`@`localhost` |
+------------------------------------------------------+
2 rows in set (0.02 sec)

Although that does provide an informative output, you can 'tune' the statement for even more granular information on the exact privileges an assigned role provides, by including a USING clause in the SHOW GRANTS statement and naming the assigned role:

mysql> SHOW GRANTS FOR 'reader_1'@'localhost' USING 'main_read_only';
+-------------------------------------------------------------+
| Grants for reader_1@localhost                               |
+-------------------------------------------------------------+
| GRANT USAGE ON *.* TO `reader_1`@`localhost`                |
| GRANT SELECT ON `practice`.`name` TO `reader_1`@`localhost` |
| GRANT `main_read_only`@`%` TO `reader_1`@`localhost`        |
+-------------------------------------------------------------+
3 rows in set (0.00 sec)

After logging in with reader_1:

mysql> SELECT * FROM practice.name;
ERROR 1142 (42000): SELECT command denied to user 'reader_1'@'localhost' for table 'name'

What on earth? That user was granted SELECT privileges through role main_read_only.

To investigate, let's visit 2 new tables in version 8, specifically for roles.

The mysql.role_edges table shows what roles have been granted to any users:

mysql> SELECT * FROM mysql.role_edges;
+-----------+-----------------+-----------+---------------+-------------------+
| FROM_HOST | FROM_USER       | TO_HOST   | TO_USER       | WITH_ADMIN_OPTION |
+-----------+-----------------+-----------+---------------+-------------------+
| %         | main_changer    | localhost | changer_1     | N                 |
| %         | main_read_only  | localhost | changer_1     | N                 |
| %         | main_read_only  | localhost | reader_1      | N                 |
| %         | main_read_only  | localhost | reader_2      | N                 |
| %         | main_read_only  | localhost | reader_3      | N                 |
| %         | main_read_write | localhost | reader_writer | N                 |
+-----------+-----------------+-----------+---------------+-------------------+
6 rows in set (0.00 sec)

But, I feel the other additional table, mysql.default_roles, will better help us solve the SELECT problems for user reader_1:

mysql> DESC mysql.default_roles;
+-------------------+----------+------+-----+---------+-------+
| Field             | Type     | Null | Key | Default | Extra |
+-------------------+----------+------+-----+---------+-------+
| HOST              | char(60) | NO   | PRI |         |       |
| USER              | char(32) | NO   | PRI |         |       |
| DEFAULT_ROLE_HOST | char(60) | NO   | PRI | %       |       |
| DEFAULT_ROLE_USER | char(32) | NO   | PRI |         |       |
+-------------------+----------+------+-----+---------+-------+
4 rows in set (0.00 sec)
mysql> SELECT * FROM mysql.default_roles;
Empty set (0.00 sec)

Empty results set.

Turns out, in order for a user to be able to use a role - and ultimately the privileges - the user must be assigned a default role.

mysql> SET DEFAULT ROLE main_read_only TO 'reader_1'@'localhost', 'reader_2'@'localhost', 'reader_3'@'localhost';
Query OK, 0 rows affected (0.11 sec)

(A default role can be assigned to multiple users in one command as above…)

mysql> SET DEFAULT ROLE main_read_only, main_changer TO 'changer_1'@'localhost';
Query OK, 0 rows affected (0.10 sec)

(A user can have multiple default roles specified as in the case for user changer_1…)

User reader_1 is now logged in...

mysql> SELECT CURRENT_USER();
+--------------------+
| CURRENT_USER()     |
+--------------------+
| reader_1@localhost |
+--------------------+
1 row in set (0.00 sec)
mysql> SELECT CURRENT_ROLE();
+----------------------+
| CURRENT_ROLE()       |
+----------------------+
| `main_read_only`@`%` |
+----------------------+
1 row in set (0.03 sec)

We can see the currently active role and also, that reader_1 can issue SELECT commands now:

mysql> SELECT * FROM practice.name;
+--------+------------+
| f_name | l_name     |
+--------+------------+
| Jim    | Dandy      |
| Johhny | Applesauce |
| Ashley | Zerro      |
| Ashton | Zerra      |
| Ashmon | Zerro      |
+--------+------------+
5 rows in set (0.00 sec)

Other Hidden Nuances

There is another important part of the puzzle we need to understand.

There are potentially 3 different 'levels' or 'variants' of role assignment:

SET ROLE …;
SET DEFAULT ROLE …;
SET ROLE DEFAULT …;

I'll GRANT an additional role to user reader_1 and then login with that user (not shown):

mysql> GRANT 'main_read_write' TO 'reader_1'@'localhost';
Query OK, 0 rows affected (0.17 sec)

Since role main_read_write does have the INSERT privilege, user reader_1 can now run that command right?

mysql> INSERT INTO name(f_name, l_name)
    -> VALUES('Josh', 'Otwell');
ERROR 1142 (42000): INSERT command denied to user 'reader_1'@'localhost' for table 'name'

What is going on here?

This may help...

mysql> SELECT CURRENT_ROLE();
+----------------------+
| CURRENT_ROLE()       |
+----------------------+
| `main_read_only`@`%` |
+----------------------+
1 row in set (0.00 sec)

Recall, we initially set user reader_1 a default role of main_read_only. This is where we need to use one of those distinct 'levels' of what I loosely term 'role setting':

mysql> SET ROLE main_read_write;
Query OK, 0 rows affected (0.00 sec)
mysql> SELECT CURRENT_ROLE();
+-----------------------+
| CURRENT_ROLE()        |
+-----------------------+
| `main_read_write`@`%` |
+-----------------------+
1 row in set (0.00 sec)

Now attempt that INSERT again:

mysql> INSERT INTO name(f_name, l_name)
    -> VALUES('Josh', 'Otwell');
Query OK, 1 row affected (0.12 sec)

However, once user reader_1 logs back out, role main_read_write will no longer be active when reader_1 logs back in. Although user reader_1 does have the main_read_write role granted to it, it is not the default.
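
As an aside, if you want every granted role to be active as soon as a user logs in, MySQL 8.0 offers two options; the sketch below assumes you run these as an account with sufficient privileges:

mysql> SET DEFAULT ROLE ALL TO 'reader_1'@'localhost';
mysql> SET PERSIST activate_all_roles_on_login = ON;

The first makes all of reader_1's granted roles default for that user; the second enables the behavior server-wide. With either in place, the SET ROLE switching shown above becomes unnecessary for routine work.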

Let’s now come to know the 3rd 'level' of 'role setting', SET ROLE DEFAULT.

Suppose user reader_1 has no roles assigned yet:

mysql> SHOW GRANTS FOR 'reader_1'@'localhost';
+----------------------------------------------+
| Grants for reader_1@localhost                |
+----------------------------------------------+
| GRANT USAGE ON *.* TO `reader_1`@`localhost` |
+----------------------------------------------+
1 row in set (0.00 sec)

Let’s GRANT this user 2 roles:

mysql> GRANT 'main_changer', 'main_read_write' TO 'reader_1'@'localhost';
Query OK, 0 rows affected (0.07 sec)

Assign a default role:

mysql> SET DEFAULT ROLE main_changer TO 'reader_1'@'localhost';
Query OK, 0 rows affected (0.17 sec)

Then with user reader_1 logged in, that default role is active:

mysql> SELECT CURRENT_ROLE();
+--------------------+
| CURRENT_ROLE()     |
+--------------------+
| `main_changer`@`%` |
+--------------------+
1 row in set (0.00 sec)

Now switch to role main_read_write:

mysql> SET ROLE 'main_read_write';
Query OK, 0 rows affected (0.01 sec)
mysql> SELECT CURRENT_ROLE();
+-----------------------+
| CURRENT_ROLE()        |
+-----------------------+
| `main_read_write`@`%` |
+-----------------------+
1 row in set (0.00 sec)

But, to return back to the assigned default role, use SET ROLE DEFAULT as shown below:

mysql> SET ROLE DEFAULT;
Query OK, 0 rows affected (0.00 sec)
mysql> SELECT CURRENT_ROLE();
+--------------------+
| CURRENT_ROLE()     |
+--------------------+
| `main_changer`@`%` |
+--------------------+
1 row in set (0.00 sec)

Roles Not Granted

Even though user changer_1 has 2 roles available during a session:

mysql> SELECT CURRENT_ROLE();
+-----------------------------------------+
| CURRENT_ROLE()                          |
+-----------------------------------------+
| `main_changer`@`%`,`main_read_only`@`%` |
+-----------------------------------------+
1 row in set (0.00 sec)

What happens if you attempt to set a user to a role they have not been granted?

mysql> SET ROLE main_read_write;
ERROR 3530 (HY000): `main_read_write`@`%` is not granted to `changer_1`@`localhost`

Denied.

Taketh Away

No user management system would be complete without the ability to constrain or even remove access to certain operations should the need arise.

We have the SQL REVOKE command at our disposal to remove privileges from users and roles.

Recall that role main_changer has this set of privileges; essentially, all of the users granted this role do as well:

mysql> SHOW GRANTS FOR main_changer;
+-----------------------------------------------------------------+
| Grants for main_changer@%                                       |
+-----------------------------------------------------------------+
| GRANT USAGE ON *.* TO `main_changer`@`%`                        |
| GRANT UPDATE, DELETE ON `practice`.`name` TO `main_changer`@`%` |
+-----------------------------------------------------------------+
2 rows in set (0.00 sec)
mysql> REVOKE DELETE ON practice.name FROM 'main_changer';
Query OK, 0 rows affected (0.11 sec)
mysql> SHOW GRANTS FOR main_changer;
+---------------------------------------------------------+
| Grants for main_changer@%                               |
+---------------------------------------------------------+
| GRANT USAGE ON *.* TO `main_changer`@`%`                |
| GRANT UPDATE ON `practice`.`name` TO `main_changer`@`%` |
+---------------------------------------------------------+
2 rows in set (0.00 sec)

To know what users this change affected, we can visit the mysql.role_edges table again:

mysql> SELECT * FROM mysql.role_edges WHERE FROM_USER = 'main_changer';
+-----------+--------------+-----------+-----------+-------------------+
| FROM_HOST | FROM_USER    | TO_HOST   | TO_USER   | WITH_ADMIN_OPTION |
+-----------+--------------+-----------+-----------+-------------------+
| %         | main_changer | localhost | changer_1 | N                 |
+-----------+--------------+-----------+-----------+-------------------+
1 row in set (0.00 sec)

And we can see that user changer_1 no longer has the DELETE privilege:

mysql> SHOW GRANTS FOR 'changer_1'@'localhost' USING 'main_changer';
+--------------------------------------------------------------------------+
| Grants for changer_1@localhost                                           |
+--------------------------------------------------------------------------+
| GRANT USAGE ON *.* TO `changer_1`@`localhost`                            |
| GRANT UPDATE ON `practice`.`name` TO `changer_1`@`localhost`             |
| GRANT `main_changer`@`%`,`main_read_only`@`%` TO `changer_1`@`localhost` |
+--------------------------------------------------------------------------+
3 rows in set (0.00 sec)
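
A role can also be taken away from an individual user without dropping the role itself. As a small, hedged illustration (not a step in this walkthrough), you could remove the main_read_only role from changer_1 like this:

mysql> REVOKE 'main_read_only' FROM 'changer_1'@'localhost';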

Finally, if we need to get rid of a role completely, we have the DROP ROLE command for that:

mysql> DROP ROLE main_read_only;
Query OK, 0 rows affected (0.17 sec)

And querying the mysql.role_edges table, role main_read_only has been removed:

mysql> SELECT * FROM mysql.role_edges;
+-----------+-----------------+-----------+---------------+-------------------+
| FROM_HOST | FROM_USER       | TO_HOST   | TO_USER       | WITH_ADMIN_OPTION |
+-----------+-----------------+-----------+---------------+-------------------+
| %         | main_changer    | localhost | changer_1     | N                 |
| %         | main_read_write | localhost | reader_1      | N                 |
| %         | main_read_write | localhost | reader_writer | N                 |
+-----------+-----------------+-----------+---------------+-------------------+
3 rows in set (0.00 sec)

(Bonus: This fantastic YouTube video was a great learning resource for me on Roles.)

This example of user creation, role assignment, and setup is rudimentary at best. Yet, roles have their own set of rules that make them far from trivial. My hope is that through this blog post, I have shed light on those areas that are less intuitive than others, enabling readers to better understand potential role uses within their systems.

Thank you for reading.

by Joshua Otwell at January 31, 2019 08:31 AM

January 30, 2019

MariaDB Foundation

MariaDB 5.5.63 and MariaDB Connector/ODBC 3.1.0 now available

The MariaDB Foundation is pleased to announce the availability of MariaDB 5.5.63, the latest stable release in the MariaDB 5.5 series and MariaDB Connector/ODBC 3.1.0, the first release candidate in the MariaDB Connector/ODBC 3.1 series. See the release notes and changelogs for details. Download MariaDB 5.5.63 Release Notes Changelog What is MariaDB 5.5? MariaDB APT […]

The post MariaDB 5.5.63 and MariaDB Connector/ODBC 3.1.0 now available appeared first on MariaDB.org.

by Ian Gilfillan at January 30, 2019 04:39 PM

Jean-Jerome Schmidt

MySQL Performance Benchmarking: MySQL 5.7 vs MySQL 8.0

MySQL 8.0 brought enormous changes and modifications that were pushed by the Oracle MySQL Team. Physical files have been changed. For instance, *.frm, *.TRG, *.TRN, and *.par no longer exist. Tons of new features have been added such as CTE (Common Table Expressions), Window Functions, Invisible Indexes, and regexp (Regular Expressions)--the latter has been changed and now provides full Unicode support and is multibyte safe. The data dictionary has also changed: it is now a transactional data dictionary that stores information about database objects, whereas in previous versions dictionary data was stored in metadata files and non-transactional tables. Security has been improved with the addition of caching_sha2_password, which is now the default authentication plugin, replacing mysql_native_password. It offers more flexibility and tighter security: clients must use either a secure connection or an unencrypted connection that supports password exchange using an RSA key pair.

With all of these cool features, enhancements, and improvements that MySQL 8.0 offers, our team was interested to determine how well the current MySQL 8.0 performs, especially given that our support for MySQL 8.0.x versions in ClusterControl is on its way (so stay tuned). This blog post won't discuss the features of MySQL 8.0; it intends to benchmark its performance against MySQL 5.7 and see how much it has improved.

Server Setup and Environment

For this benchmark, I intend to use a minimal setup for production using the following AWS EC2 environment:

Instance-type: t2.xlarge instance
Storage: gp2 (SSD storage with minimum of 100 and maximum of 16000 IOPS)
vCPUS: 4
Memory: 16GiB
MySQL 5.7 version: MySQL Community Server (GPL) 5.7.24
MySQL 8.0 version: MySQL Community Server - GPL 8.0.14

There are a few notable variables that I have set for this benchmark as well, which are:

  • innodb_max_dirty_pages_pct = 90 ## This is the default value in MySQL 8.0. See here for details.
  • innodb_max_dirty_pages_pct_lwm=10 ## This is the default value in MySQL 8.0
  • innodb_flush_neighbors=0
  • innodb_buffer_pool_instances=8
  • innodb_buffer_pool_size=8GiB

The rest of the variables set here for both versions (MySQL 5.7 and MySQL 8.0) are already tuned by ClusterControl in its my.cnf template.

Also, the user I used here does not use the new default authentication of MySQL 8.0, caching_sha2_password. Instead, both server versions use mysql_native_password. In addition, the innodb_dedicated_server variable, which is a new feature of MySQL 8.0, is left OFF (the default).
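
In case you want to reproduce that part of the setup, here is a minimal, hedged sketch of creating such a user (the user name and password match the sysbench scripts below; adjust the host and grants to your environment):

mysql> CREATE USER 'sysbench'@'%' IDENTIFIED WITH mysql_native_password BY 'MysqP@55w0rd';
mysql> GRANT ALL PRIVILEGES ON sbtest.* TO 'sysbench'@'%';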

To make life easier, I set up the MySQL 5.7 Community version node with ClusterControl from a separate host, then removed the node from the cluster and shut down the ClusterControl host to make the MySQL 5.7 node dormant (no monitoring traffic). Technically, both the MySQL 5.7 and MySQL 8.0 nodes are dormant and no active connections are going through them, so it’s essentially a pure benchmarking test.

Commands and Scripts Used

For this task, sysbench is used for testing and load simulation in the two environments. Here are the commands and scripts used in this test:

sb-prepare.sh

#!/bin/bash

host=$1
#host192.168.10.110
port=3306
user='sysbench'
password='MysqP@55w0rd'
table_size=500000
rate=20
ps_mode='disable'
sysbench /usr/share/sysbench/oltp_read_write.lua --db-driver=mysql --threads=1 --max-requests=0 --time=3600 --mysql-host=$host --mysql-user=$user --mysql-password=$password --mysql-port=$port --tables=10 --report-interval=1 --skip-trx=on --table-size=$table_size --rate=$rate --db-ps-mode=$ps_mode prepare

sb-run.sh

#!/usr/bin/env bash

host=$1
port=3306
user="sysbench"
password="MysqP@55w0rd"
table_size=100000
tables=10
rate=20
ps_mode='disable'
threads=1
events=0
time=5
trx=100
path=$PWD

counter=1

echo "thread,cpu" > ${host}-cpu.csv

for i in 16 32 64 128 256 512 1024 2048; 
do 

    threads=$i

    mysql -h $host -e "SHOW GLOBAL STATUS" >> $host-global-status.log
    tmpfile=$path/${host}-tmp${threads}
    touch $tmpfile
    /bin/bash cpu-checker.sh $tmpfile $host $threads &

    /usr/share/sysbench/oltp_read_write.lua --db-driver=mysql --events=$events --threads=$threads --time=$time --mysql-host=$host --mysql-user=$user --mysql-password=$password --mysql-port=$port --report-interval=1 --skip-trx=on --tables=$tables --table-size=$table_size --rate=$rate --delete_inserts=$trx --order_ranges=$trx --range_selects=on --range-size=$trx --simple_ranges=$trx --db-ps-mode=$ps_mode --mysql-ignore-errors=all run | tee -a $host-sysbench.log

    echo "${i},"`cat ${tmpfile} | sort -nr | head -1` >> ${host}-cpu.csv
    unlink ${tmpfile}

    mysql -h $host -e "SHOW GLOBAL STATUS" >> $host-global-status.log
done

python $path/innodb-ops-parser.py $host

mysql -h $host -e "SHOW GLOBAL VARIABLES" >> $host-global-vars.log

So the script simply prepares the sbtest schema and populates tables and records. Then it performs read/write load tests using the /usr/share/sysbench/oltp_read_write.lua script. The script dumps global status and MySQL variables, collects CPU utilization, and parses InnoDB row operations with the innodb-ops-parser.py script. The scripts then generate *.csv files based on the logs collected during the benchmark, and I used an Excel spreadsheet to generate the graphs from the *.csv files. Please check the code in this github repository.

Now, let’s proceed with the graph results!

InnoDB Row Operations

Basically, I only extracted the InnoDB row operations, which cover the selects (reads), deletes, inserts, and updates. As the number of threads goes up, MySQL 8.0 significantly outperforms MySQL 5.7! Neither version has any specific config changes beyond the notable variables I have set, so both are pretty much using default values.

Interestingly, with regard to the MySQL Server Team's claims about read and write performance in the new version, the graphs point to a significant performance improvement, especially on a high-load server. Comparing MySQL 5.7 and MySQL 8.0 across all InnoDB row operations, the difference is large, especially as the number of threads goes up. MySQL 8.0 shows that it can perform efficiently regardless of the workload.

Transactions Processed

As shown in the graph above, MySQL 8.0 again shows a huge difference in the time it takes to process transactions. The lower the value, the better it performs, meaning transactions are processed faster. The transactions processed (the second graph) also reveal that the transaction counts barely differ: both versions execute almost the same number of transactions but differ in how fast they finish. I could say MySQL 5.7 can still handle a lot at lower load, but a realistic load, especially in production, can be expected to be higher - particularly during the busiest period.

The graph above still shows the transactions processed but separates the reads from the writes. There are actually outliers in the graphs which I didn't include, as they are tiny tidbits of the result that would skew the graph.

MySQL 8.0 reveals a great improvement, especially for reads. It also displays its efficiency in writes, especially for servers with a high workload. One great addition that impacts MySQL read performance in version 8.0 is the ability to create an index in descending order (and scan it forward). Previous versions only had ascending indexes scanned backward, and MySQL had to do a filesort if it needed a descending order (if a filesort is needed, you might consider checking the value of max_length_for_sort_data). Descending indexes also make it possible for the optimizer to use multiple-column indexes when the most efficient scan order mixes ascending order for some columns and descending order for others. See here for more details.
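
As a hedged illustration of that feature (the table and index below are made up for demonstration and are not part of this benchmark), a descending key is simply declared in the index definition, and in MySQL 8.0 the optimizer can then satisfy a mixed-direction ORDER BY from the index instead of doing a filesort:

mysql> CREATE TABLE t1 (a INT, b INT, KEY idx_a_desc_b_asc (a DESC, b ASC));
mysql> EXPLAIN SELECT a, b FROM t1 ORDER BY a DESC, b ASC LIMIT 10;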


CPU Resources

During this benchmarking, I decided to look at some hardware resources, most notably CPU utilization.

Let me first explain how I collected CPU usage during benchmarking. sysbench does not collect statistics for the hardware resources used while you are benchmarking a database. Because of that, what I did is create a flag file, connect to the target host through SSH, and then harvest data from the Linux command “top” and parse it, sleeping for a second between collections. After that, I take the highest CPU usage observed for the mysqld process and then remove the flag file. You can review the code I have in github.

So let’s discuss the graph result again: it seems to reveal that MySQL 8.0 consumes more CPU than MySQL 5.7. However, this might have to do with new variables added in MySQL 8.0. For example, these variables might impact your MySQL 8.0 server:

The variables were left at their default values for this benchmark. The first three variables handle CPU usage for redo logging, which has been improved in MySQL 8.0 by re-designing how InnoDB writes to the redo log. The variable innodb_log_spin_cpu_pct_hwm respects CPU affinity, which means it would ignore other CPU cores if mysqld is pinned to only 4 cores, for instance. For parallel read threads, MySQL 8.0 adds a new variable with which you can tune how many threads to use.
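
If you want to inspect these knobs on your own MySQL 8.0 server, a quick hedged sketch (the exact variable set depends on your minor version; innodb_parallel_read_threads, for instance, appears in 8.0.14):

mysql> SHOW GLOBAL VARIABLES LIKE 'innodb%spin%';
mysql> SHOW GLOBAL VARIABLES LIKE 'innodb_parallel_read_threads';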

However, I did not dig further into the subject. There can be ways that performance can be improved by taking advantage of the features that MySQL 8.0 has to offer.

Conclusion

There are tons of improvements present in MySQL 8.0. The benchmark results reveal an impressive improvement, not only in managing read workloads, but also in high read/write workloads, compared to MySQL 5.7.

Going over the new features of MySQL 8.0, it looks like it has taken advantage of the most up-to-date technologies not only in software (like the great improvement for Memcached, Remote Management for better DevOps work, etc.) but also in hardware. Take, for example, the replacement of latin1 with UTF8MB4 as the default character encoding. This means it can require more disk space, since UTF8MB4 needs more than one byte for non-US-ASCII characters. Although this benchmark did not use the new caching_sha2_password authentication method, its encryption should not noticeably affect performance: once a client is authenticated, the result is stored in a cache, which means authentication is only done once. So if you are using one user for your client, it won't be a problem, and it is more secure than in previous versions.

Since MySQL leverages the most up-to-date hardware and software, it has changed its default variables. You can read here for more details.

Overall, MySQL 8.0 clearly outperforms MySQL 5.7 in these tests.

by Paul Namuag at January 30, 2019 08:24 AM

MariaDB Foundation

A Word from the Incoming CEO

It is my pleasure and honour to join the MariaDB Foundation as its new CEO from 1 Feb 2019. Following MariaDB closely since its inception, I welcome the opportunity to be focusing completely on adoption, collaboration and free open development of MariaDB Server. I joined MySQL AB in 2001 and worked in various leadership positions, […]

The post A Word from the Incoming CEO appeared first on MariaDB.org.

by Kaj Arnö at January 30, 2019 05:50 AM

January 29, 2019

MariaDB Foundation

MariaDB 10.4.2 now available

The MariaDB Foundation is pleased to announce the availability of MariaDB 10.4.2, the latest beta release in the MariaDB 10.4 series. See the release notes and changelogs for details. Download MariaDB 10.4.2 Release Notes Changelog What is MariaDB 10.4? MariaDB APT and YUM Repository Configuration Generator Contributors to MariaDB 10.4.2 Aleksey Midenkov (Tempesta) Alexander Barkov […]

The post MariaDB 10.4.2 now available appeared first on MariaDB.org.

by Ian Gilfillan at January 29, 2019 07:11 PM

Peter Zaitsev

Percona Server for MySQL 5.6.43-84.3 Is Now Available


Percona is glad to announce the release of Percona Server for MySQL 5.6.43-84.3 on January 29, 2019 (Downloads are available here and from the Percona Software Repositories).

This release merges changes of MySQL 5.6.43, including all the bug fixes in it. Percona Server for MySQL 5.6.43-84.3 is now the current GA release in the 5.6 series. All of Percona’s software is open-source and free.

Bugs Fixed

  • A sequence of LOCK TABLES FOR BACKUP and STOP SLAVE SQL_THREAD could cause replication to be blocked and not possible to be restarted normally. Bug fixed #4758 (upstream #93649).
  • http was replaced with https in http://bugs.percona.com in server crash messages. Bug fixed #4855.
  • Wrong query results could be received in semi-join sub queries with materialization-scan that allowed inner tables of different semi-join nests to interleave. Bug fixed #4907 (upstream bug #92809).
  • The audit logs could be corrupted due to an invalid size of the audit log file when audit_log_rotations was changed at runtime. Bug fixed #4950.
  • There was a typo in mysqld_safe.sh: trottling was replaced with throttling. Bug fixed #240. Thanks to Michael Coburn for the patch.

Other bugs fixed: #2477, #3535, #3568, #3672, #3673, #4791, #4989, #5100, #5118, #5163, #5268, #5270, #5271

This release also contains fixes for the following CVE issues: CVE-2019-2534, CVE-2019-2529, CVE-2019-2482, CVE-2019-2455, CVE-2019-2503, CVE-2018-0734.

Find the release notes for Percona Server for MySQL 5.6.43-84.3 in our online documentation. Report bugs in the Jira bug tracker.

 

by Borys Belinsky at January 29, 2019 06:29 PM

Upcoming Webinar Thurs 1/31: Percona Server for MongoDB 4.0 Feature Walkthrough

Percona Server for MongoDB 4.0 Feature Walkthrough

Please join Vinodh Krishnaswamy as he presents his talk, Percona Server for MongoDB 4.0 Feature Walkthrough on January 31st, 2019, at 6:00 AM PST (UTC-8) / 9:00 AM EST (UTC-5).

Register Now

Percona Server for MongoDB is an enhanced, open source, and highly-scalable database. Moreover, it is a fully-compatible, drop-in replacement for MongoDB 4.0 Community Edition. It also supports MongoDB 4.0 protocols and drivers.

Percona Server for MongoDB extends the functionality of the MongoDB 4.0 Community Edition by including the Percona Memory Engine storage engine, encrypted WiredTiger storage engine, audit logging, SASL authentication, hot backups, and enhanced query profiling. Additionally, Percona Server for MongoDB requires no changes to MongoDB applications or code.

This release includes all features of MongoDB 4.0 Community Edition. Most notable among these are:

– Multi-Document ACID transactions
– Type conversion through the new aggregation operators
– Enhancements to the Change Streams support

In order to learn more, register for the Percona Server for MongoDB 4.0 Feature Walkthrough.

by Vinodh Krishnaswamy at January 29, 2019 03:00 PM

January 28, 2019

Kurt von Finck

So I created a subreddit for interesting G+ refugees. Emphasis on interesting.

So I created a subreddit for interesting G+ refugees. Emphasis on interesting.

Come and play. Be suave. Don’t be a dick.

Edward Morbius

Kee Hinckley

Rugger Ducky

Sarah Lester

Ward A

Tim S

Matthew H

Yoko F Thunders

Dave Thompson

Grumpy Cat

catty _big

Dan Ramos

Di Cleverly

and the many more I know I’ve forgotten because I need an fud. And invite your friends!

https://reddit.com/r/ploos

by mneptok at January 28, 2019 08:48 PM

Peter Zaitsev

Upcoming Webinar Wed 1/30: Percona XtraDB Cluster: Failure Scenarios and their Recovery

Percona XtraDB Cluster: Failure Scenarios and their Recovery

Please join Percona’s Senior Technical Manager, Alkin Tezuysal, and Percona’s Percona XtraDB Cluster Lead, Krunal Bauskar as they present their talk, Percona XtraDB Cluster: Failure Scenarios and their Recovery on Wednesday, January 30th, 2019, at 8:00 AM PST (UTC-8) / 11:00 AM EST (UTC-5).

Register Now

Percona XtraDB Cluster (a.k.a. PXC) is an open source, multi-master, high availability MySQL clustering solution. PXC works with your MySQL / Percona Server-created database. Given the multi-master aspect, there are multiple guards to protect a cluster from entering an inconsistent state. Most of these guards are configurable based on the user environment. However, if they are not configured properly they could cause the cluster to stall, fail, or error out.

In this session, we’ll discuss failure scenarios, including a MySQL cluster entering a non-primary state due to network partitioning. We’ll also discuss a cluster stall due to flow control, data inconsistency causing the shutdown of a node and common problems during the initial catch up – a.k.a State Snapshot Transfer (SST). Other issues include delays in the purging of a transaction, a blocking DDL causing the entire cluster to stall and a misconfigured cluster.

We will also go over how to solve some of these problems and how to safely recover from these failures.

To learn more, register for Percona XtraDB Cluster: Failure Scenarios and their Recovery.

by Alkin Tezuysal at January 28, 2019 06:36 PM

Percona Server for MongoDB Operator 0.2.0 Early Access Release Is Now Available

Percona Server for MongoDB

Percona announces the availability of the Percona Server for MongoDB Operator 0.2.0 early access release.

The Percona Server for MongoDB Operator simplifies the deployment and management of Percona Server for MongoDB in a Kubernetes or OpenShift environment. It extends the Kubernetes API with a new custom resource for deploying, configuring and managing the application through the whole life cycle.

Note: PerconaLabs is one of the open source GitHub repositories for unofficial scripts and tools created by Percona staff. These handy utilities can help save your time and effort.

Percona software builds located in the Percona-Lab repository are not officially released software, and also aren’t covered by Percona support or services agreements.

You can install the Percona Server for MongoDB Operator on Kubernetes or OpenShift. While the operator does not support all the Percona Server for MongoDB features in this early access release, instructions on how to install and configure it are already available along with the operator source code in our Github repository.

The Percona Server for MongoDB Operator on Percona-Lab is an early access release. Percona doesn’t recommend it for production environments. 

New features

  • Percona Server for MongoDB backups are now supported and can be performed on a schedule or on demand.
  • Percona Server for MongoDB Operator now supports Replica Set Arbiter nodes to reduce disk IO and occupied space if needed.
  • Service per Pod operation mode implemented in this version allows assigning external or internal static IP addresses to the Replica Set nodes.

Improvements

  • CLOUD-76: Several Percona Server for MongoDB clusters can now share one namespace.

Fixed Bugs

  • CLOUD-97: The Replica Set watcher was not stopped automatically after the custom resource deletion.
  • CLOUD-46: When k8s-mongodb-initiator was running on an already-initialized Replica Set, it still attempted to initiate it.
  • CLOUD-45: The operator was temporarily removing MongoDB nodes from the Replica Set during a Pod update without the need.
  • CLOUD-51: It was not possible to set requests without limits in the custom resource configuration.
  • CLOUD-52: It was not possible to set limits without requests in the custom resource configuration.
  • CLOUD-89: The k8s-mongodb-initiator was exiting with exit code 1 instead of 0 if the Replica Set initiation had already happened, e.g., when a custom resource was deleted and recreated without deleting PVC data.
  • CLOUD-96: The operator was crashing after a re-creation of a custom resource that still had old PVC data, which caused it to skip the Replica Set init.

Percona Server for MongoDB is an enhanced, open source and highly-scalable database that is a fully-compatible, drop-in replacement for MongoDB Community Edition. It supports MongoDB protocols and drivers. Percona Server for MongoDB extends MongoDB Community Edition functionality by including the Percona Memory Engine, as well as several enterprise-grade features. It requires no changes to MongoDB applications or code.

Help us improve our software quality by reporting any bugs you encounter using our bug tracking system.

by Dmitriy Kostiuk at January 28, 2019 05:56 PM

Monitor and Optimize Slow Queries with PMM and EverSQL – Part 2

percona_pmm_eversql

EverSQL is a platform that intelligently tunes your SQL queries by providing query optimization recommendations, and feedback on missing indexes. This is the second post of our EverSQL series, if you missed our introductory post take a look there first and then come back to this article.

We’ll use the Stackoverflow data set again as we did in our first post.

Diving into query optimization

We’ll grab the worst performing query in the list from PMM and optimize it. This query builds a list of the 100 most recent posts that have at least one comment with a score greater than two, and involves joining two large tables – posts and comments. The original runtime of that query is above 20 minutes, and it causes high load on the server while running.

worst-query-in-PMM

Assuming you have EverSQL’s chrome extension installed, you’ll see a new button in the PMM Query Analytics page, allowing you to send the query and schema structure directly to EverSQL, to retrieve indexing and query optimization recommendations.

eversql recommendations

 

eversql-dashboard1

After implementing EverSQL’s recommendations, the query’s execution duration significantly improved:

improved-query-response-time

Optimization Internals

So what was the actual optimization in this specific case? And why did it work so well? Let’s look at the original query:

SELECT
   p.title
FROM
   so.posts p
       INNER JOIN
   so.comments c ON p.id = c.postid
WHERE
c.score > 2
GROUP BY p.id
ORDER BY p.creationdate DESC
LIMIT 100;

The tables’ structure:

CREATE TABLE `posts` (
  `Id` int(11) NOT NULL,
  `CreationDate` datetime NOT NULL,
  ...
  PRIMARY KEY (`Id`),
  KEY `posts_idx_creationdate` (`CreationDate`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
CREATE TABLE `comments` (
  `Id` int(11) NOT NULL,
  `CreationDate` datetime NOT NULL,
  `PostId` int(11) NOT NULL,
  `Score` int(11) DEFAULT NULL,
  ....
  PRIMARY KEY (`Id`),
  KEY `comments_idx_postid` (`PostId`),
  KEY `comments_idx_postid_score` (`PostId`,`Score`),
  KEY `comments_idx_score` (`Score`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1

This query will return the post title of the latest 100 stackoverflow posts, which had at least one popular comment (with a score higher than two). The posts table contains 39,646,923 records, while the comments table contains 64,510,258 records.

This is the execution plan MySQL (v5.7.20) chose:

original-execution-plan

One of the challenges with this query is that the GROUP BY and ORDER BY clauses contain different fields, which prevents MySQL from using an index for the ORDER BY. As MySQL’s documentation states:

“In some cases, MySQL cannot use indexes to resolve the ORDER BY, although it may still use indexes to find the rows that match the WHERE clause. Examples:  … The query has different ORDER BY and GROUP BY expressions.”.

Now let’s look into the optimized query:

SELECT
   p.title
FROM
   so.posts p
WHERE
   EXISTS( SELECT
           1
       FROM
           so.comments c
       WHERE
           p.id = c.postid AND c.score > 2)
ORDER BY p.creationdate DESC
LIMIT 100;

Since the comments table is joined in this query only to check for the existence of matching comments for each post, we can use an EXISTS subquery instead. This allows us to avoid inflating the results (with the JOIN) and then deflating them (with the GROUP BY), which are costly operations.

Now that the GROUP BY is redundant and removed, the database can optionally choose to use an index for the ORDER BY clause.
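
To confirm locally which plan the optimizer picks after the rewrite, a plain EXPLAIN of the rewritten query is enough. This is a minimal sketch reusing the so schema from above; the actual output depends on your data and MySQL version:

mysql> EXPLAIN
    -> SELECT p.title
    -> FROM so.posts p
    -> WHERE EXISTS (SELECT 1 FROM so.comments c
    ->               WHERE p.id = c.postid AND c.score > 2)
    -> ORDER BY p.creationdate DESC
    -> LIMIT 100;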

The new execution plan MySQL chooses is:


As mentioned above, this transformation reduced the query execution duration from ~20 minutes to 370ms.

We hope you enjoyed this post. Please let us know your experiences using the integration between PMM Query Analytics and EverSQL!

Co-Author: Tomer Shay

Tomer Shay, EverSQL

 

Tomer Shay is the Founder of EverSQL. He loves being where the challenge is. In the last 12 years, he had the privilege to code a lot and lead teams of developers, while focusing on databases and performance. He enjoys using technology to bring ideas into reality, help people and see them smile.

by Michael Coburn at January 28, 2019 04:17 PM

January 26, 2019

Valeriy Kravchuk

Fun with Bugs #78 - On Some Public Bugs Fixed in MySQL 5.7.25

Today I'd like to continue my tradition of ignoring MySQL 8 (after all, I cannot even build 8.0.14 any more on my Ubuntu 14.04; it's suddenly not supported because of the old gcc version) and, of all the MySQL server versions released by Oracle this week, concentrate on bugs reported in the public bugs database and fixed in the latest minor release of the MySQL 5.7 branch, 5.7.25.

This time there is only one InnoDB community-reported bug fixed, Bug #87423 - "os0file.cc assertion failed 'offset > 0' in os_file_io_complete", from Vasily Nemkov. See also its duplicate, Bug #88956, by Aidan Diffey. The assertion failure seems to happen with 32-bit binaries running on a 64-bit OS (both bugs are reported on Linux) and is related to the infamous ulint type usage in the code, but for some reason the release notes and closing comment mention Windows:
"An assertion was raised when attempting to write to a tablespace file greater than 4GB in size on a 64-bit Windows system. The failure was due to a narrowing cast."
Usually there are some bugs around flowers. But sometimes we can enjoy flowers without bugs...
There is a long enough list of replication bugs fixed:
  • Bug #92132 - "secure-file-priv breaks LOAD DATA INFILE replication in statement mode on 5.7.23". This regression bug (that had not got "regression" tag!) was reported by Nicolai Plum.
  • Bug #91941 - "Deadlock during purge_logs_before_date". Great bug report from Nikolai Ikhalainen. Note also that some hints by Jean-François Gagné and proper gdb backtrace analysis by Oracle engineers, Shane Bester and Dmitry Lenev, were needed to force proper processing of this bug. See also related Bug #92108 - "Deadlock by concurrent show binlogs, pfs session_variables table & binlog purge", from Shashank Sahni.
  • Bug #91548 - "LOCK_grant and LOCK_open can deadlock on a gtid slave". Great example of Shane Bester's regular work on bugs.
  • Bug #90640 - "`head->variables.gtid_next.type != UNDEFINED_GTID' ". One of those cases when assertion failure in debug build highlights real problem in the code. Great finding by Roel Van de Paar.
  • Bug #87832 - "Relay_Log_Space is inaccurate and leaks". Nice bug report by Manuel Ung. Related variable could be changed concurrently without any locking and get randomly wrong values.
  • Bug #84752 - "Multi-Slave Replication Fail: bogus data in log event". This bug was reported by Gonzalo Miguel Arruti. Eventually it seems the internally reported Bug #22252394 - SLAVE I/O THREAD MAY STOP WHEN BINLOG ROTATES highlighted the real problem. Good to have this case fixed!
  • Bug #83003 - "Using temporary tables on slaves increases GTID sequence number". I am so happy to see this bug, reported by my former colleague Ovais Tariq, fixed in 5.7! Patches provided by Laurynas Biveinis from Percona are finally backported. It's so sad to see slaves out of sync with the master when GTIDs are used... For some reason this bug is not listed as replication-related and the contributed patches are not mentioned in the release notes.
Some other fixes are also interesting to check:
  • Bug #92131 - "ASan: Direct leak of 272 byte(s) in main.mysqlpump_partial_bkp MTR test case". I mentioned this bug, reported by Yura Sorokin from Percona, in the past. Good to see it fixed. I truly hope Oracle is testing ASan builds regularly with MTR and we'll not see such reports from the community any more. For now the efforts of Percona engineers help, as we can see also from Bug #90238 - "Comparison of uninitailized memory in log_in_use" with patches from Zsolt Parragi and Laurynas Biveinis.
  • Bug #92049 - "bogus data when ordering results from variables_by_thread". Thanks to Shane Bester from Oracle (who still reports MySQL bugs in public), we have one bug less in the otherwise near-perfect Performance Schema.
  • Bug #90742 - "SIGHUP cause mysql server crash.". It was reported by Seunguck Lee. Good to see an S1 crashing bug not blindly classified as "security" and hidden forever. Note also the great analysis there provided by Jean-François Gagné! It would be impossible for any hidden bug to get this kind of useful feedback.
  • Bug #89214 - "The SELECT will deadlock in the stored procedure, if the result set is empty.". It took some time and effort for the bug reporter (two pg) to prove the point, confirm that the problem happens only with prepared statements, and force proper processing. This regression bug (MySQL 5.6 was not affected) still does not have the "regression" tag.
That's all bugs I wanted to mention today. To summarize:
  1. I'd surely consider immediate upgrade to 5.7.25 in any environment where replication is used.
  2. It seems Oracle engineers who process bugs still have no habit of adding the "regression" tag when it is obviously needed. I'll ask bug reporters to do this themselves during my FOSDEM talk about MySQL bugs.
  3. Percona still helps Oracle to make MySQL better with proper QA efforts and patches.
  4. Most of the bugs mentioned here affected MySQL 8 as well, so from this post you know about many important fixes that happened in MySQL 8.0.14 :) 

by Valeriy Kravchuk (noreply@blogger.com) at January 26, 2019 08:00 AM

January 25, 2019

Peter Zaitsev

Open Source Database Conference CFP Deadline Sunday January 27

open source database conference 2019

This year at our Open Source Database Conference we’re celebrating open source database technologies that don’t fit into the MySQL®, MongoDB®, MariaDB®, or PostgreSQL realms by featuring them in their very own track. The glamorously-named Other Open Source Databases track! As unbiased champions of open source database solutions, we embrace all flavors of open source database, and pride ourselves at presenting one of the biggest events dedicated to any and all OSDBs.

Another innovation this year is the introduction of a Java programming for open source databases track. Maybe that would be of interest?

The conference takes place at the end of May in Austin, a fantastic place to visit, and state capital of Texas.

As mentioned in a recent blog post, the Track Steering Committee featuring some very talented technologists is in place, and we are ready to start reviewing submissions.  We already have great content across all topics, but in order to make it even better, we would like to keep on getting new submissions until the very last day 🙂

The call for papers closes this Sunday, January 27, so there is still time for you to send a talk or two for our review. In the open source databases track, we have the following list in mind for potential good topics:

  • Apache Hive
  • Cassandra
  • ClickHouse
  • CockroachDB
  • Consul
  • Elasticsearch
  • FoundationDB
  • InfluxDB
  • Kafka
  • Neo4j
  • Prometheus
  • Redis
  • ScyllaDB
  • Solr
  • SQLite
  • TiDB
  • Timescale

This list is not exhaustive, so if you can think of any others let me know—and go ahead and submit away, please!

Of course, MySQL, MariaDB, MongoDB and PostgreSQL all have their own tracks, too, so if you have any interesting talks for those, please don’t hesitate to send them in for review by their respective track steering committees.

Did we not mention? Yes, it’s Percona Live!

by Agustín at January 25, 2019 11:10 AM

Chris Calender

Viewing the Originating Host or IP Address with MaxScale’s Proxy Protocol

If you use MaxScale to route queries from various servers to some MariaDB server(s), when viewing the processlist on the MariaDB server, you will see MaxScale’s host for any “host” information related to that connection or its queries.

When tracking down problematic queries, it can be helpful to know the originating host of the query.

MaxScale’s proxy protocol to the rescue.

The proxy protocol was introduced in MaxScale 2.2 and MariaDB 10.3.

Enabling it is quite simple (essentially just two changes).

1. In MariaDB, you need to set the variable proxy_protocol_networks in your my.cnf file (you can specify comma-separated IP addresses and/or subnetworks, as well as localhost and ::1):

proxy-protocol-networks=::1, 192.168.0.0/16, localhost

This one, which I will use as-is, is the example from the manual. It allows IPv6 connections from local machine (::1), connections from IP addresses starting with 192.168., and connections made with Unix domain sockets or named pipes.

It can also be set dynamically:

SET GLOBAL proxy_protocol_networks='::1, 192.168.0.0/16, localhost';

2. In your MaxScale configuration file, maxscale.cnf (default /etc/maxscale.cnf), you need to add the following option under each of the [server%] sections you will want to have this functionality:

proxy_protocol=true

Once those servers are restarted and proxy_protocol is in effect, you should start seeing the originating hosts.

Here is my full testing, verification, and some troubleshooting information:

First off, here is my setup:

Server1:  192.168.1.183:3306 (chris-desktop, MariaDB server)
MaxScale: 192.168.1.183:4006 (chris-desktop, MaxScale Listener)
Remote:   192.168.1.160      (chris-laptop, remote client)

Direct on MariaDB running on host 192.168.1.183 port 3306 (i.e., “Server1”):

MariaDB> create user 'mytest'@'192.168.1.160' IDENTIFIED BY 'xxxx';
MariaDB> set global proxy_protocol_networks='::1, 192.168.0.0/16, localhost';
MariaDB> select @@proxy_protocol_networks;
+--------------------------------+
| @@proxy_protocol_networks      |
+--------------------------------+
| ::1, 192.168.0.0/16, localhost |
+--------------------------------+

So I created a new user, set proxy_protocol_networks, and verified it was set correctly (note: be sure you add the setting to the my.cnf file if you opt to set this dynamically).

Then I added the following under the [server1] section of my maxscale.cnf file (again, do this for each server you want this enabled on):

proxy_protocol=true

I restarted MaxScale:

chris@chris-desktop:~$ sudo /etc/init.d/maxscale restart
* Stopping MaxScale [ OK ]
* Starting MaxScale * maxscale is running
[ OK ]

Then double-check the setting took effect:

sudo maxctrl
maxctrl: show server server1
+--------------------------------------------------------------+
¦ Server ¦ server1                                             ¦
+------------------+-------------------------------------------¦
¦ Address ¦ 127.0.0.1                                          ¦
+------------------+-------------------------------------------¦
¦ Port ¦ 3306                                                  ¦
+------------------+-------------------------------------------¦
¦ State ¦ Master, Slave of External Server, Running            ¦
+------------------+-------------------------------------------¦
¦ Last Event ¦ server_up                                       ¦
+------------------+-------------------------------------------¦
¦ Triggered At ¦ Thu, 17 Jan 2019 18:33:41 GMT                 ¦
+------------------+-------------------------------------------¦
¦ Services ¦ Read-Only-Service                                 ¦
¦ ¦ Read-Write-Service                                         ¦
+------------------+-------------------------------------------¦
¦ Monitors ¦ MySQL-Monitor                                     ¦
+------------------+-------------------------------------------¦
¦ Master ID ¦ 0                                                ¦
+------------------+-------------------------------------------¦
¦ Node ID ¦ 1                                                  ¦
+------------------+-------------------------------------------¦
¦ Slave Server IDs ¦                                           ¦
+------------------+-------------------------------------------¦
¦ Statistics ¦ {                                               ¦
¦ ¦ "connections": 0,                                          ¦
¦ ¦ "total_connections": 0,                                    ¦
¦ ¦ "persistent_connections": 0,                               ¦
¦ ¦ "active_operations": 0,                                    ¦
¦ ¦ "routed_packets": 0,                                       ¦
¦ ¦ "adaptive_avg_select_time": "0ns"                          ¦
¦ ¦ }                                                          ¦
+------------------+-------------------------------------------¦
¦ Parameters ¦ {                                               ¦
¦ ¦ "address": "127.0.0.1",                                    ¦
¦ ¦ "protocol": "MySQLBackend",                                ¦
¦ ¦ "port": 3306,                                              ¦
¦ ¦ "extra_port": 0,                                           ¦
¦ ¦ "authenticator": null,                                     ¦
¦ ¦ "monitoruser": null,                                       ¦
¦ ¦ "monitorpw": null,                                         ¦
¦ ¦ "persistpoolmax": 0,                                       ¦
¦ ¦ "persistmaxtime": 0,                                       ¦
¦ ¦ "proxy_protocol": true,                                    ¦
¦ ¦ "ssl": "false",                                            ¦
¦ ¦ "ssl_cert": null,                                          ¦
¦ ¦ "ssl_key": null,                                           ¦
¦ ¦ "ssl_ca_cert": null,                                       ¦
¦ ¦ "ssl_version": "MAX",                                      ¦
¦ ¦ "ssl_cert_verify_depth": 9,                                ¦
¦ ¦ "ssl_verify_peer_certificate": true,                       ¦
¦ ¦ "disk_space_threshold": null,                              ¦
¦ ¦ "type": "server"                                           ¦
¦ ¦ }                                                          ¦
+--------------------------------------------------------------+

In the above, we see that “proxy_protocol” == “true” under the “Parameters” section.

Now, all that is left is to connect from the remote host (chris-laptop, 192.168.1.160) to the Read-Write-Listener (i.e., host 192.168.1.183, port 4006):

mysql -umytest -pxxxx -h192.168.1.183 -P4006
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 6
Server version: 5.5.5-10.3.12-MariaDB-1:10.3.12+maria~trusty-log mariadb.org binary distribution

Copyright (c) 2000, 2016, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> select version();
+--------------------------------------------+
| version()                                  |
+--------------------------------------------+
| 10.3.12-MariaDB-1:10.3.12+maria~trusty-log |
+--------------------------------------------+

mysql> select user(), current_user();
+---------------------+----------------------------+
| user()              | current_user()             |
+---------------------+----------------------------+
| mytest@chris-laptop | mytest@192.168.1.160       |
+---------------------+----------------------------+

Thus my connection works, and I can issue queries. Also, select user() and current_user() report what I would expect.

Now back on Server1, which is where you want to “view” the originating host:

MariaDB> show processlist;
+----+--------+---------------------+------+---------+------+...
| Id | User   | Host                | db   | Command | Time |...
+----+--------+---------------------+------+---------+------+...
...
| 44 | root   | localhost           | NULL | Query   | 1    |...
| 71 | root   | chris-desktop:41256 | NULL | Sleep   | 0    |...
| 88 | mytest | chris-laptop:53983  | NULL | Sleep   | 34   |...
+----+--------+---------------------+------+---------+------+...

In the above, both connections 44 and 71 are from the localhost. 44 uses the socket, thus it reports localhost. 71 uses TCP thus it reports the machine’s hostname (chris-desktop).

And connection 88 is the connection from the remote host to MaxScale listener on port 4006, and we see it correctly reports the originating host (chris-laptop), which now differs from MaxScale’s hostname (chris-desktop).

Voila. 🙂
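
Once the originating hosts show up, it can also be handy to filter the processlist with SQL instead of scanning SHOW PROCESSLIST by eye. A small, hedged helper query (not part of the original setup above):

MariaDB> SELECT id, user, host, db, time, LEFT(info, 60) AS query
    -> FROM information_schema.processlist
    -> WHERE command <> 'Sleep'
    -> ORDER BY time DESC;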

Now, just to push things a bit, I tried various combinations which I thought people might encounter, so hopefully this helps resolve those cases quickly.

For instance, if I change proxy_protocol_networks to '::1, localhost' (i.e., I just removed the 192.168 subnet from proxy_protocol_networks) and re-attempt the connection, it appears to "connect", until I issue a command:

mysql -umytest -pxxxx -h192.168.1.183 -P4006

See it appears to connect:

Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 7
Server version: 5.5.5-10.3.12-MariaDB-1:10.3.12+maria~trusty-log

Copyright (c) 2000, 2016, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
mysql>

However, the moment I try to issue a query, I see this (note the server is still up and running fine, so no need to worry about that):

mysql> select user(), current_user();
ERROR 2006 (HY000): MySQL server has gone away
No connection. Trying to reconnect...
Connection id: 8
Current database: *** NONE ***

ERROR 2006 (HY000): MySQL server has gone away
No connection. Trying to reconnect...
Connection id: 9
Current database: *** NONE ***

ERROR 2006 (HY000): MySQL server has gone away

If I try a non-existent user, say mytest1@192.168.1.160, it reports an error straight away:

mysql -umytest1 -pxxxx -h192.168.1.183 -P4006
ERROR 1045 (28000): Access denied for user 'mytest1'@'::ffff:192.168.1.160' (using password: YES)

If I reset proxy_protocol_networks back to “::1, 192.168.0.0/16, localhost”, but then set proxy_protocol=false, restart MaxScale, and connect to port 4006, we see it connects and reports the MaxScale address instead of the originating address (as it should):

MariaDB> set global proxy_protocol_networks='::1, 192.168.0.0/16, localhost';
MariaDB> select @@proxy_protocol_networks;
+--------------------------------+
| @@proxy_protocol_networks      |
+--------------------------------+
| ::1, 192.168.0.0/16, localhost |
+--------------------------------+

The “maxctrl: show server server1” command reports “…”proxy_protocol”: false,…”.

Now SHOW PROCESSLIST only shows the MaxScale host (which is expected since we set proxy_protocol=false):

| 96  | root   | chris-desktop:53796 | NULL | Sleep |...
| 113 | mytest | chris-desktop:34444 | NULL | Sleep |...

If interested in further reading on this topic, I would recommend the following:

https://mariadb.com/kb/en/library/proxy-protocol-support/
https://mariadb.com/kb/en/mariadb-maxscale-23-mariadb-maxscale-configuration-usage-scenarios/#proxy_protocol
https://mariadb.com/kb/en/library/server-system-variables/#proxy_protocol_networks
https://www.haproxy.org/download/1.8/doc/proxy-protocol.txt

I hope this helps.

by chris at January 25, 2019 12:05 AM

January 24, 2019

Peter Zaitsev

A Quick Look into TiDB Performance on a Single Server

TiDB MySQL plot

TiDB is an open-source distributed database developed by PingCAP. This is a very interesting project as it can be used as a MySQL drop-in replacement: it implements the MySQL protocol and basically emulates MySQL. PingCAP defines TiDB as a "one-stop data warehouse for both OLTP (Online Transactional Processing) and OLAP (Online Analytical Processing) workloads". In this blog post I have decided to see how TiDB performs on a single server compared to MySQL for both OLTP and OLAP workloads. Please note, this benchmark is very limited in scope: we are only testing TiDB and MySQL on a single server – TiDB is a distributed database out of the box.

Short version: TiDB supports parallel query execution for selects and can utilize many more CPU cores – MySQL is limited to a single CPU core for a single select query. For higher-end hardware – ec2 instances in my case – TiDB can be 3-4 times faster for complex select queries (OLAP workload) which do not use, or benefit from, indexes. At the same time, point selects and writes, especially inserts, can be 5x-10x slower. Again, please note that this test was on a single server, with a single TiKV process.

Installation

Please note: the following setup is only intended for testing and not for production. 

I installed the latest version of TiDB to take advantage of the latest performance improvements, at the time of writing:

cat make-full-tidb-server
#!/bin/bash
set -x
cd /tidb
wget http://download.pingcap.org/tidb-v2.1.2-linux-amd64.tar.gz
tar -xzf tidb-*.tar.gz
cd tidb-*-linux-amd64/
./bin/pd-server  --data-dir=pd --log-file=pd.log &
sleep 5
./bin/tikv-server --pd="127.0.0.1:2379" --data-dir=tikv -A 127.0.0.1:20165 --log-file=tikv.log &
sleep 5
cd ~/go/src/github.com/pingcap/tidb
make server
./bin/tidb-server --store=tikv --path="127.0.0.1:2379"
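
Once tidb-server is up, you can connect to it with a regular mysql client (TiDB listens on port 4000 by default) and sanity-check the build; a quick hedged example:

mysql -h 127.0.0.1 -P 4000 -u root
mysql> SELECT tidb_version()\G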

The normal installation process is described here (different methods are available).

Benchmarks

The main purpose of this test is to compare MySQL to TiDB. As with any distributed database, it is hard to design an "apples to apples" comparison: we would be comparing a distributed workload spanning many servers/nodes (in this case TiDB) to a single-server workload (in this case MySQL). To overcome this challenge, I decided to focus on "efficiency". If the distributed database is not efficient – i.e. it may require 10s or 100s of nodes to do the same job as the non-distributed database – it may be cost prohibitive to use such a database for a small or medium size DB.

The preliminary results are: TiDB is much more efficient for SELECTs (OLAP workload) but much less efficient for writes and a typical OLTP workload. To overcome these limitations it is possible to use more servers.

For this test I was using two types of benchmarks:

  1. OLAP: a set of complex queries on top of an "ontime" database (airline historical flight information database). For this benchmark I used different AWS ec2 instances with CPU cores ranging from 2 to 96. This is a response time test (not a throughput test).
  2. OLTP: sysbench (as always): point-select and write-only standard workloads. This is a throughput test, increasing the number of threads.

OLAP / analytical queries test

Database size is 70Gb in MySQL and 30Gb in TiDB (compressed). The table has no secondary indexes (except the primary key).

I used the following four queries:

  1. Simple count(*): select count(*) from ontime;
  2. Simple group by: select count(*), year from ontime group by year order by year;
  3. Complex filter for a full table scan: select * from ontime where UniqueCarrier = 'DL' and TailNum = 'N317NB' and FlightNum = '2' and Origin = 'JFK' and Dest = 'FLL' limit 10;
  4. Complex group by and order by query:
    select SQL_CALC_FOUND_ROWS
    FlightDate, UniqueCarrier as carrier,
    FlightNum,
    Origin,
    Dest
    FROM ontime
    WHERE
    DestState not in ('AK', 'HI', 'PR', 'VI')
    and OriginState not in ('AK', 'HI', 'PR', 'VI')
    and flightdate > '2015-01-01'
    and ArrDelay < 15
    and cancelled = 0 and Diverted = 0
    and DivAirportLandings = '0'
    ORDER by DepDelay DESC
    LIMIT 10;

I used five ec2 instances:

  • t2.medium: 2 CPU cores
  • x1e.xlarge: 4 CPU cores
  • r4.4xlarge: 16 CPU cores
  • m4.16xlarge: 64 CPU cores
  • m5.24xlarge: 96 CPU cores

The following graph represents the results (bars represent the query response time; the smaller the better):

As we can see, TiDB scales very well with an increasing number of CPU cores as we go from lower- to higher-end instances. t2.medium and x1e.xlarge are interesting here, though:

  1. t2.medium has 2 CPU cores and not enough RAM (2Gb) to store the database in memory. Both MySQL/InnoDB and TiDB/TiKV perform a lot of disk reads – this is a disk-bound workload
  2. x1e.xlarge is an example of the opposite instance type: 4 CPU cores and 122GB RAM. This is a memory-bound workload (both MySQL and TiDB data is cached).

All other instances have enough RAM to cache the database in memory, and with more CPUs TiDB can take advantage of query parallelism and provide better response times.

Sysbench test

Select test

I used point select (meaning select one row by primary key; threads range from 1 to 128) with sysbench on an m4.16xlarge instance (memory bound: no disk reads). The results are here. The bars represent the number of transactions per second; the more the better:

This workload actually gives a great advantage to MySQL/InnoDB, as it retrieves a single row based on the primary key. MySQL is significantly faster here: 5x to 10x faster. Unlike the previous workload – a single slow query – for "point select" queries MySQL scales much better than TiDB with more CPU cores.

Write only test

I have used a write-only sysbench workload as well, with threads ranging from 1 to 128. The instance has enough memory to cache the full dataset. Here are the results:

Here we can see that TiDB is also significantly slower than MySQL (for an in-memory workload).

Limitations of the write only test
Running TiDB as a single server is not a recommended (or documented) configuration, so some optimizations for this case may be missing. To create a production level test, we would need to compare TiDB to MySQL with the binlog enabled + some sort of synchronous/virtually synchronous or semi-sync replication  (e.g. Percona XtraDB Cluster, group replication or semi-sync replication).  Both of these changes are known to decrease the write-throughput of MySQL considerably. Some tuning may be done to reduce the effects of that.
Some of the performance characteristics here are also derived from TiDB using RocksDB. The performance of InnoDB should be higher for an in-memory insert, with an LSM tree performing better for data sets that no longer fit in memory.

Conclusion

TiDB scales very well for OLAP / analytical queries (typically complex queries not able to take advantage of indexes) – this is the area where MySQL performs much worse, as it does not take advantage of multiple CPU cores. At the same time, there is always a price to pay: TiDB has worse "efficiency" for fast queries (i.e. selects by primary key) and writes. TiDB can scale across multiple servers (nodes). However, if we need to achieve the same level of write efficiency as MySQL we will have to set up tens of nodes. In my opinion, TiDB can be a great fit for an analytical workload when you need almost full compatibility with MySQL: syntax compatibility, inserts/updates, etc.

by Alexander Rubin at January 24, 2019 03:18 PM

January 23, 2019

Kurt von Finck

Azazel Hanzaki

Azazel Hanzaki

Frederica Mussolini

Best track on the album.

Disagree? Fine. I’ll have corn and peanuts for dinner so my feces is ready for your consumption.

https://www.youtube.com/watch?v=n1AeYUz-F_I

by mneptok at January 23, 2019 09:47 PM

Peter Zaitsev

MySQL 8.0.14: A Road to Parallel Query Execution is Wide Open!

road to MySQL parallel query

For a very long time – since multiple CPU cores became commonly available – I dreamed about MySQL having the ability to execute queries in parallel. This feature was lacking from MySQL, and I wrote a lot of posts on how to emulate parallel queries in MySQL using different methods: from a simple parallel bash script to using Apache Spark to using ClickHouse together with MySQL. I have watched parallelism coming to PostgreSQL, to new databases like TiDB, to Amazon Aurora… And finally: MySQL 8.0.14 has a (for now limited) ability to perform parallel query execution. At the time of writing it is limited to select count(*) from table queries as well as check table queries.

MySQL 8.0.14 contains this in the release notes: “As of MySQL 8.0.14, InnoDB supports parallel clustered index reads, which can improve CHECK TABLE performance.” Actually, parallel clustered index reads also work for a simple count(*) (without a “where” condition). You can control the parallel threads with the innodb_parallel_read_threads parameter.
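
Since the release notes call out CHECK TABLE, here is a minimal sketch of exercising the same parallel reads there (the table name is just the one used later in this post; output omitted):

mysql> set local innodb_parallel_read_threads=16;
mysql> check table ontime;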

Here is a simple test (the machine has 32 CPU cores):

mysql> set local innodb_parallel_read_threads=1;
Query OK, 0 rows affected (0.00 sec)
mysql> select count(*) from ontime;
+-----------+
| count(*)  |
+-----------+
| 177920306 |
+-----------+
1 row in set (2 min 33.93 sec)
mysql> set local innodb_parallel_read_threads=DEFAULT; -- 4 is default
Query OK, 0 rows affected (0.00 sec)
mysql> select count(*) from ontime;
+-----------+
| count(*)  |
+-----------+
| 177920306 |
+-----------+
1 row in set (21.85 sec)
mysql> set local innodb_parallel_read_threads=32;
Query OK, 0 rows affected (0.00 sec)
mysql> select count(*) from ontime;
+-----------+
| count(*)  |
+-----------+
| 177920306 |
+-----------+
1 row in set (5.35 sec)

The following graph shows CPU utilization during the execution with 4 threads and 32 threads:

Unfortunately it only works for count(*) from table without a “where” condition.

Conclusion: although this feature is currently limited, it is a great start for MySQL and opens the road to real parallel query execution.


Photo by Vidar Nordli-Mathisen on Unsplash

by Alexander Rubin at January 23, 2019 06:09 PM

MongoDB Replica Set Scenarios and Internals – Part II (Elections)

mongodb node election to primary

In this blog post, we will walk through the internals of the election process in MongoDB®, following on from a previous post on the internals of the replica set. You can read Part 1 here.

For this post, I refer to the same configuration we discussed before.

Elections: As the term suggests, in MongoDB there is a freedom to “vote”: individual nodes of the cluster can vote and select their primary member for that replica set cluster.

Why Elections? MongoDB maintains high availability through this process.

When do elections take place?

  1. When the node does not find a primary node within the election timeout limit. By default this value is 10s, and from MongoDB version 3.2 it can be changed according to your needs (a reconfiguration sketch follows at the end of this section). The parameter that sets this value is settings.electionTimeoutMillis and it can be seen in the logs as:

settings: { chainingAllowed: true, heartbeatIntervalMillis: 2000, heartbeatTimeoutSecs: 10, electionTimeoutMillis: 10000, catchUpTimeoutMillis: 60000, getLastErrorModes: {}, getLastErrorDefaults: { w: 1, wtimeout: 0 }, replicaSetId: ObjectId('5ba8ed10d4fddccfedeb7492') } }

From the mongo shell, the value for electionTimeoutMillis can be found in the replica set configuration as:

rplint:SECONDARY> rs.conf()
{
	"_id" : "rplint",
	"version" : 3,
	"protocolVersion" : NumberLong(1),
	"members" : [
		{
			"_id" : 0,
			"host" : "m103:25001",
			"arbiterOnly" : false,
			"buildIndexes" : true,
			"hidden" : false,
			"priority" : 1,
			"tags" : {
			},
			"slaveDelay" : NumberLong(0),
			"votes" : 1
		},
		{
			"_id" : 1,
			"host" : "192.168.103.100:25002",
			"arbiterOnly" : false,
			"buildIndexes" : true,
			"hidden" : false,
			"priority" : 1,
			"tags" : {
			},
			"slaveDelay" : NumberLong(0),
			"votes" : 1
		},
		{
			"_id" : 2,
			"host" : "192.168.103.100:25003",
			"arbiterOnly" : false,
			"buildIndexes" : true,
			"hidden" : false,
			"priority" : 1,
			"tags" : {
			},
			"slaveDelay" : NumberLong(0),
			"votes" : 1
		}
	],
	"settings" : {
		"chainingAllowed" : true,
		"heartbeatIntervalMillis" : 2000,
		"heartbeatTimeoutSecs" : 10,
		"electionTimeoutMillis" : 10000,
		"catchUpTimeoutMillis" : 60000,
		"getLastErrorModes" : {
		},
		"getLastErrorDefaults" : {
			"w" : 1,
			"wtimeout" : 0
		},
		"replicaSetId" : ObjectId("5c20ff87272eff3a5e28573f")
	}
}

More precisely, the value for electionTimeoutMillis can be found at:

rplint:SECONDARY> rs.conf().settings.electionTimeoutMillis
10000

2. When another node takes over the priority of the existing primary node, for example during planned maintenance using replica set configuration settings. The priority of a member node can be changed as explained here (a reconfiguration sketch follows at the end of this section).

The priority of all three members can be seen from the replica set configuration like this:

rplint:SECONDARY> rs.conf().members[0].priority
1
rplint:SECONDARY>
rplint:SECONDARY>
rplint:SECONDARY> rs.conf().members[2].priority
1
rplint:SECONDARY> rs.conf().members[1].priority
1
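
As a sketch (the values are examples only, and rs.reconfig() must be run on the current primary), both the election timeout and a member’s priority can be changed like this:

rplint:PRIMARY> cfg = rs.conf()
rplint:PRIMARY> cfg.settings.electionTimeoutMillis = 5000
rplint:PRIMARY> cfg.members[1].priority = 2
rplint:PRIMARY> rs.reconfig(cfg)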

How do elections work in a MongoDB replica set cluster?

Before real elections, the node runs a dry election. Dry election? Yes, the node first runs dry elections, and if the node wins a dry election, then an actual election begins. Here’s how:

  1. The candidate node asks every node whether it would vote for it through replSetRequestVotes, without increasing the term itself.
  2. The primary node steps down if it finds a candidate node with a term higher than its own. Otherwise the dry election fails, and the replica set continues to run as it did before.
  3. If the dry election succeeds, then an actual election begins.
  4. For the real election, the node increments its term and then votes for itself.
  5. VoteRequester sends the replSetRequestVotes command through ScatterGatherRunner, and each node responds back with its vote.
  6. The candidate that receives votes from the most nodes wins the election.
  7. Once the candidate wins, it transitions to primary. Through heartbeats it sends a notification to all other nodes.
  8. Then the candidate node checks whether it needs to catch up from the former primary node.
  9. A node that receives the replSetRequestVotes command checks its own term and then votes, but only after ReplicationCoordinator receives confirmation from TopologyCoordinator.
  10. The TopologyCoordinator grants the vote after the following checks:
    1. The config version must match.
    2. The replica set name must match.
    3. An arbiter voter must not see any healthy primary of greater or equal priority.
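
To observe this process, an election can be triggered manually by asking the current primary to step down; a minimal sketch (the argument is the number of seconds the member stays ineligible to become primary again):

rplint:PRIMARY> rs.stepDown(60)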

An example

A primary (port 25002) transitions to secondary after receiving the rs.stepDown() command:

2019-01-03T03:05:29.972+0000 I COMMAND  [conn124] Attempting to step down in response to replSetStepDown command
2019-01-03T03:05:29.976+0000 I REPL     [conn124] transition to SECONDARY
driver: { name: "NetworkInterfaceASIO-Replication", version: "3.4.15" }, os: { type: "Linux", name: "Ubuntu", architecture: "x86_64", version: "14.04" } }
2019-01-03T03:05:40.874+0000 I REPL     [ReplicationExecutor] Member m103:25001 is now in state PRIMARY
2019-01-03T03:05:41.459+0000 I REPL     [rsBackgroundSync] sync source candidate: m103:25001
2019-01-03T03:05:41.459+0000 I ASIO     [NetworkInterfaceASIO-RS-0] Connecting to m103:25001
2019-01-03T03:05:41.460+0000 I ASIO     [NetworkInterfaceASIO-RS-0] Successfully connected to m103:25001, took 1ms (1 connections now open to m103:25001)
2019-01-03T03:05:41.461+0000 I ASIO     [NetworkInterfaceASIO-RS-0] Connecting to m103:25001
2019-01-03T03:05:41.462+0000 I ASIO     [NetworkInterfaceASIO-RS-0] Successfully connected to m103:25001, took 1ms (2 connections now open to m103:25001)

Dry election at the candidate node (port 25001) succeeds: no primary found.

2019-01-03T03:05:31.498+0000 I REPL     [rsBackgroundSync] could not find member to sync from
2019-01-03T03:05:36.493+0000 I REPL     [SyncSourceFeedback] SyncSourceFeedback error sending update to 192.168.103.100:25002: InvalidSyncSource: Sync source was cleared. Was 192.168.103.100:25002
2019-01-03T03:05:39.390+0000 I REPL     [ReplicationExecutor] Starting an election, since we've seen no PRIMARY in the past 10000ms
2019-01-03T03:05:39.390+0000 I REPL     [ReplicationExecutor] conducting a dry run election to see if we could be elected. current term: 35
2019-01-03T03:05:39.391+0000 I REPL     [ReplicationExecutor] VoteRequester(term 35 dry run) received a yes vote from 192.168.103.100:25002; response message: { term: 35, voteGranted: true, reason: "", ok: 1.0 }

Dry election succeeds and increments term by 1 (here the term was 35 and is incremented to 36). It transitions to primary and enters catchup mode.

2019-01-03T03:05:39.391+0000 I REPL [ReplicationExecutor] dry election run succeeded, running for election in term 36
2019-01-03T03:05:39.394+0000 I REPL [ReplicationExecutor] VoteRequester(term 36) received a yes vote from 192.168.103.100:25003; response message: { term: 36, voteGranted: true, reason: "", ok: 1.0 }
2019-01-03T03:05:39.395+0000 I REPL [ReplicationExecutor] election succeeded, assuming primary role in term 36
2019-01-03T03:05:39.395+0000 I REPL [ReplicationExecutor] transition to PRIMARY
2019-01-03T03:05:39.395+0000 I REPL [ReplicationExecutor] Entering primary catch-up mode.

Other nodes also receive information about the new primary.

2019-01-03T03:05:31.498+0000 I REPL [rsBackgroundSync] could not find member to sync from
2019-01-03T03:05:36.493+0000 I REPL [SyncSourceFeedback] SyncSourceFeedback error sending update to 192.168.103.100:25002: InvalidSyncSource: Sync source was cleared. Was 192.168.103.100:25002
2019-01-03T03:05:41.499+0000 I REPL [ReplicationExecutor] Member m103:25001 is now in state PRIMARY

This is how MongoDB maintains high availability: by electing a new primary from the replica set members when the existing primary fails.


Photo by Daria Shevtsova from Pexels

by Aayushi Mangal at January 23, 2019 01:22 PM

Upcoming Webinar Thurs 1/24: Databases Gone Serverless?

Databases Gone Serverless Webinar

Please join Percona’s Senior Technical Manager, Alkin Tezuysal, as he presents Databases Gone Serverless? on Thursday, January 24th, at 6:00 AM PDT (UTC-7) / 9:00 AM EDT (UTC-4).

Register Now

Serverless computing is becoming more popular with developers. For instance, it enables them to build and run applications without needing to operate and manage servers. This talk will provide a high-level overview of serverless applications in the database world, including the use cases, possible solutions, services and benefits provided through the cloud ecosystem. In particular, we will focus on the capabilities of the AWS serverless platform.

In order to learn more, register for this webinar on Databases Gone Serverless.

by Alkin Tezuysal at January 23, 2019 01:31 AM

January 22, 2019

Peter Zaitsev

Monitor and Optimize Slow Queries with PMM and EverSQL – Part One

PMM EverSQL optimization integration

A common challenge with continuously deployed applications is that new and modified SQL queries are constantly being introduced to the application. Many companies choose to use a database monitoring system (such as PMM) to identify those slow queries. But identifying slow queries is only the start – what about actually optimizing them?

In this post we’ll demonstrate a new way to both identify and optimize slow queries, by utilizing the recent integration of Percona Monitoring and Management with EverSQL Query Optimizer via Chrome browser extension. This integration allows you to identify slow queries using PMM, and optimize them automatically using EverSQL Query Optimizer.

Optimizing queries with PMM & EverSQL

We’re using PMM to monitor our MySQL instance, which was pre-loaded with the publicly available StackOverflow dataset. PMM is configured to monitor for slow queries from MySQL’s slow log file.

monitor slow queries dashboard on PMM

We’ll begin with a basic example of how EverSQL can provide value for  a simple SELECT statement. In a follow-up blog post we’ll go through a more sophisticated multi-table query to show how response time can be reduced from 20 minutes to milliseconds(!) using EverSQL.

Let’s have a look at one of the slow queries identified by PMM:

PMM EverSQL optimization integration

In this example, the table posts contains two indexes by default (in addition to the primary key): one on the column AnswerCount, and the other on the column CreationDate.

CREATE TABLE `posts` (
 `Id` int(11) NOT NULL,
 `AcceptedAnswerId` int(11) DEFAULT NULL,
 `AnswerCount` int(11) DEFAULT NULL,
 `ClosedDate` datetime DEFAULT NULL,
 ….
 `CreationDate` datetime NOT NULL,
  ….
 `ViewCount` int(11) NOT NULL,
 PRIMARY KEY (`Id`),
 KEY `posts_idx_answercount` (`AnswerCount`),
 KEY `posts_idx_creationdate` (`CreationDate`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1

As you can see below, EverSQL identifies that a composite index containing both columns would be more beneficial in this case, and recommends adding an index on posts(AnswerCount, CreationDate).

EverSQL optimization report

After using pt-online-schema-change to apply the schema modification, using PMM we are able to observe that the query execution duration changed from 3m 40s to 83 milliseconds!
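
As a sketch, such a composite index can be added online with pt-online-schema-change along these lines (the database name stackoverflow and the index name are placeholders):

pt-online-schema-change \
  --alter "ADD INDEX posts_idx_answercount_creationdate (AnswerCount, CreationDate)" \
  D=stackoverflow,t=posts --execute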

execution time improvement with EverSQL

 

Note that this Extension is available for Chrome from the chrome web store:

EverSQL for Database Monitoring Applications

Summary

If you’re looking for an easy way to both monitor for slow queries and quickly optimize them, consider deploying Percona Monitoring and Management and then integrating it with EverSQL’s Chrome extension!

Co-Author: Tomer Shay

Tomer Shay, EverSQL

 

Tomer Shay is the Founder of EverSQL. He loves being where the challenge is. In the last 12 years, he had the privilege to code a lot and lead teams of developers, while focusing on databases and performance. He enjoys using technology to bring ideas into reality, help people and see them smile.

 

by Michael Coburn at January 22, 2019 04:01 PM

January 21, 2019

Peter Zaitsev

Percona Live 2019: Committee Announced and a Short Extension

Percona Live 2019

We had a great response to our call for papers, thank you! However great we believe having the conference in Austin, TX will be, moving to a new area of the US is still a leap of faith until we see the numbers. So we’re thankful that so many of you continue to support and take part in our Open Source Database Conference. We have great plans for the event going forward, and look forward to sharing them with you.

In fact, one new initiative is already underway. Instead of having one relatively small committee that reviews all of the papers for all tracks, we are taking a different approach this year. Each track is led by a Percona engineer or manager in the role of “Track Champion”, and they each are establishing a Track Steering Committee to review the submissions for that one track.

There are many benefits to this approach. First of all, it means that folks with in-depth knowledge of a technology review a smaller number of papers, and can focus on that subject. It also means that the committee members can enter into a more meaningful dialogue with the Track Champion about the ‘shape’ of the track at the conference, to make sure that the content is of high quality and offers innovative insights… or, at least, that it addresses the issues that are most important to users of the technology.

You may know, too, that at Percona we are committed to being unbiased champions of open source database solutions. By widening the team that influences the nature of the conference, we believe we are better supporting this mission.

So who’s on the committee? You can read all about them on the conference website. Thank you, again, to all that have volunteered time and experience to this effort.

A short extension

So now you know about the changes to the committee structure. You’ve read about the people who are going to read your abstracts. And you’ve realized that the conference is going to be in an exciting new city that you’ve not visited before.

You’re kicking yourself that you didn’t manage to finish that submission, right? Life gets in the way, and wouldn’t you know it, yesterday was the NFL Conference Championship games etc etc (that’s the semi-final to those of us outside North America…)

Well, we have some great news…we’re going to give you one more week. Yes, that’s right, the deadline is extended to Sunday, January 27. Don’t miss out again, this is the last chance for your submission to be guaranteed consideration for the Percona Live Open Source Database Conference 2019. It’s going to be a great event!

And if you’ve still no plans to submit, but are keen to attend?

Mark May 28-30, 2019 in your diary and subscribe to this blog to be amongst the first to hear when ticket sales are launched.

Don’t forget, the earlier you get your tickets, the more $’s you save.

by Lorraine Pocklington, Community Manager at January 21, 2019 04:24 PM

January 20, 2019

Valeriy Kravchuk

Fun with Bugs #77 - On MySQL Bug Reports I am Subscribed to, Part XIV

Slides for my talk about MySQL bugs at the FOSDEM 2019 MySQL, MariaDB and Friends Devroom are ready, support customers decided not to break anything badly over the weekend, so I have some free time for blogging. As usual, when I do not have any better idea or useful recent real life experience to share, I write about MySQL bugs.

Today I'd like to continue my review of interesting MySQL bug reports added by Community members in December, 2018. I'll review them starting from the oldest:
  • Bug #93701 - "Assertion `maybe_null' failed |Item_func_concat::val_str(String*)". It's a debug assertion, not a big deal (S6 bug), so why should anyone care about this report from Ramesh Sivaraman? Because last time the same assertion failure ended up as the serious enough Bug #83115, fixed in 8.0.11.
  • Bug #93708 - "Page Cleaner will sleep for long time if clock changes". In this bug report from Marcelo Altmann I was mostly impressed by the arguing started by an Oracle engineer about bug severity (documentation request, feature request vs real bug). IMHO it was a waste of time.
  • Bug #93728 - "mysqld crash after using alter table with SPATIAL key". Nice finding by Rui Xu. Unfortunately MariaDB 10.3.7 is also affected. At least in the error log I also see this error message for each SPATIAL index of the table:
    2019-01-20 19:26:15 9 [ERROR] InnoDB: Record in index `idx2` of table `test`.`tab` was not found on update: TUPLE (info_bits=0, 2 fields): {[32]      $@      $@
          $@      $@(0x0000000000000400000000000000040000000000000004000000000000000
    400),[4]    (0x00000001)} at: COMPACT RECORD(info_bits=0, 1 fields): {[8]infimum
     (0x090E06090D050D00)}
  • Bug #93734 - "MySQL 8.0 is 36 times slower than MySQL 5.7". I like everything about this bug reported by Vadim Tkachenko, CTO of Percona. I like the synopsis that sounds cool, the missing exact MySQL 8.0.x version (but I suspect 8.0.13), the impact of innodb_flush_log_at_trx_commit, the suggested fix, and the fact that the bug is still "Open" and nobody from Oracle has paid any attention to the report from the CTO of one of the key Oracle partners and supporters of MySQL 8. Wonderful!
  • Bug #93737 - "mysqlbinlog mermory used grows unstoped". I had not tried to reproduce it (and nobody had it seems, as the bug is still "Open"), but this report by Chandler Bong sounded interesting. Let's see what it may end up with...
  • Bug #93746 - "MySQL Crashes with deadlock at Mutex DICT_SYS created dict0dict.cc:1172". Long semaphore waits involving the DICT_SYS mutex and leading to the watchdog thread killing MySQL are common. Bug processing is still in progress, but I subscribed to this bug by Anton Ravich mostly to follow another bug he refers to as possibly related, Bug #80919. That bug is declared a duplicate of some bug in Oracle's internal bugs database, without the bug number mentioned. This is not the first time I see such an artistic way to process community bugs (which I consider totally wrong and unacceptable).
  • Bug #93760 - "InnoDB got assert failure while figuring out space id at startup". Fungo Wang found this crash while running MTR test case innodb.innodb_redo_debug_1 with debug binaries. Now I wonder if Oracle QA runs MTR tests with debug binaries at all and do they care to check failures (or rely on Percona for this activity)? Any comments?
  • Bug #93761 - "optimize table cause the slave MTS deadlock!". Sounds interesting and serious, but nobody cared to check this bug report. It's "Open" without comments for 3 weeks already.
  • Bug #93767 - "INSTALL COMPONENT fails - handled segfault during installation nothing in log". I am surprised that somebody outside of Oracle already uses new MySQL 8.0 server components feature. My former colleague, Justin Swanhart, tried but failed so far. Let's see how this report may end up. For now Oracle engineer stated that Justin tried to do something this framework was not designed for, see nice last comment there.
  • Bug #93777 - "the Originator of mysql.event is not correct". Pay attention to this bug found by Dennis Gao if you use events in replication environment.
  • Bug #93779 - "dml in table with trigger to other Table map in binlog lead to slave sql stopped". Work on this bug report from Mohamed Atef is still in progress. The bug reporter cared to add a lot of details in recent comments, so I hope to see this bug processed soon.
I've also included a couple of bugs reported in January, 2019 in the list above, but I hope you don't mind.
Today I'd like to remember those sunny days in Venice in September and those pesky MySQL bugs reported this winter...
To summarize:
  1. Some bug reports stay "Open" for weeks without any good reason recently.
  2. The shameful practice of referring to bugs in Oracle's internal bugs database without giving a number (that we see in release notes when the bug is fixed and can look for in commit messages at GitHub) should be stopped! Next time I'll blame the engineer(s) who do this by name, here and in social media. Check how this is properly done by Shane Bester.
  3. I wonder if Oracle QA runs MTR test cases on debug builds on a regular basis and checks the results, or if this is now a job for Percona mostly?
  4. I should write a separate blog post about bugs related to SPATIAL indexes implementation in InnoDB. There should be dragons...
P.S. Don't forget to click on names of bug reporters to see the entire list of still active MySQL bug reports each of them created.

by Valeriy Kravchuk (noreply@blogger.com) at January 20, 2019 06:29 PM

January 18, 2019

Peter Zaitsev

Percona XtraDB Cluster Operator 0.2.0 Early Access Release Is Now Available

Percona XtraDB Cluster Operator

 

Percona announces the release of Percona XtraDB Cluster Operator  0.2.0 early access.

The Percona XtraDB Cluster Operator simplifies the deployment and management of Percona XtraDB Cluster in a Kubernetes or OpenShift environment. It extends the Kubernetes API with a new custom resource for deploying, configuring and managing the application through the whole life cycle.

Note: PerconaLabs and Percona-QA are open source GitHub repositories for unofficial scripts and tools created by Percona staff. These handy utilities can help save your time and effort.

Percona software builds located in the Percona-Lab and Percona-QA repositories are not officially released software, and also aren’t covered by Percona support or services agreements.

You can install the Percona XtraDB Cluster Operator on Kubernetes or OpenShift. While the operator does not support all the Percona XtraDB Cluster features in this early access release, instructions on how to install and configure it are already available along with the operator source code, hosted in our Github repository.
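
As a rough sketch of what a deployment looks like (the manifest names follow the repository’s deploy directory; check the linked instructions for the exact, current sequence):

kubectl apply -f deploy/crd.yaml       # register the custom resource definition
kubectl apply -f deploy/rbac.yaml      # service account and role for the operator
kubectl apply -f deploy/operator.yaml  # the operator deployment itself
kubectl apply -f deploy/cr.yaml        # a Percona XtraDB Cluster custom resource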

The Percona XtraDB Cluster Operator on Percona-Lab is an early access release. Percona doesn’t recommend it for production environments. 

New features

  • Advanced node assignment, implemented in this version, allows running containers with Percona XtraDB Cluster nodes on different hosts, availability zones, etc. to achieve higher availability and fault tolerance.
  • Cluster backups are now supported, and can be performed on a schedule or on demand.
  • Percona XtraDB Cluster Operator now supports private container registries like those in OpenShift so that Internet access is not required to deploy the operator.

Improvements

  • CLOUD-69: Annotations and labels are now passed from the deploy/cr.yaml configuration file to a StatefulSet for both Percona XtraDB Cluster and ProxySQL Pods
  • CLOUD-55: Now setting a password for the ProxySQL admin user is supported.
  • CLOUD-48: Migration to operator SDK 0.2.1

Fixed Bugs

  • CLOUD-82: Pods were stopped in random order during cluster removal, which could cause problems when recreating a cluster with the same name.
  • CLOUD-79: Setting a long cluster name in the deploy/cr.yaml file made Percona XtraDB Cluster unable to start.
  • CLOUD-54: The clustercheck tool used the monitor user instead of its own clustercheck user for liveness and readiness probes.
Percona XtraDB Cluster is an open source, cost-effective and robust clustering solution for businesses. It integrates Percona Server for MySQL with the Galera replication library to produce a highly-available and scalable MySQL® cluster complete with synchronous multi-master replication, zero data loss and automatic node provisioning using Percona XtraBackup.

Help us improve our software quality by reporting any bugs you encounter using our bug tracking system.

by Dmitriy Kostiuk at January 18, 2019 05:55 PM

Percona XtraBackup 2.4.13 Is Now Available

Percona XtraBackup 8.0

Percona is glad to announce the release of Percona XtraBackup 2.4.13 on January 18, 2019. You can download it from our download site and apt and yum repositories.

Percona XtraBackup enables MySQL backups without blocking user queries, making it ideal for companies with large data sets and mission-critical applications that cannot tolerate long periods of downtime. Offered free as an open source solution, it drives down backup costs while providing unique features for MySQL backups.

New features and improvements:

  • PXB-1548: Percona XtraBackup enables updating the ib_buffer_pool file with the latest pages present in the buffer pool using the --dump-innodb-buffer-pool option (see the sketch below). Thanks to Marcelo Altmann for the contribution.
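
A minimal sketch of a backup run using the new option (the target directory is a placeholder; connection options come from the usual defaults):

xtrabackup --backup --dump-innodb-buffer-pool \
  --target-dir=/data/backups/full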

Bugs fixed

  • xtrabackup did not delete missing tables from the partial backup which led to error messages logged by the server on startup. Bug fixed PXB-1536.
  • The --history option did not work when autocommit was disabled. Bug fixed PXB-1569.
  • xtrabackup could fail to backup encrypted tablespace when it was recently created or altered. Bug fixed PXB-1648.
  • When the --throttle option was used, the applied value was different from the one specified by the user (off by one error). Bug fixed PXB-1668.
  • It was not allowed for MTS (multi-threaded slaves) without GTID to be backed up with --safe-slave-backup. Bug fixed PXB-1672.
  • Percona Xtrabackup could crash when the ALTER TABLE … TRUNCATE PARTITION command was run during a backup without locking DDL. Bug fixed PXB-1679.
  • xbcrypt could display an assertion failure and generate a core if the required parameters were missing. Bug fixed PXB-1683.
  • Using --lock-ddl-per-table caused the server to scan all records of partitioned tables which could lead to the “out of memory error”. Bugs fixed PXB-1691 and PXB-1698.
  • xtrabackup --prepare could hang while performing insert buffer merge. Bug fixed PXB-1704.
  • Incremental backups did not update xtrabackup_binlog_info with --binlog-info=lockless. Bug fixed PXB-1711

Other bugs fixed: PXB-1570, PXB-1609, PXB-1632.

Release notes with all the improvements for version 2.4.13 are available in our online documentation. Please report any bugs to the issue tracker.

by Borys Belinsky at January 18, 2019 05:46 PM

Replication Manager Works with MariaDB

Complex multi-cluster replication topology

Some time ago I wrote a script to manage asynchronous replication links between Percona XtraDB clusters. The original post can be found here. The script worked well with Percona XtraDB Cluster but it wasn’t working well with MariaDB®.  Finally, the replication manager works with MariaDB.

First, let’s review the purpose of the script. Managing replication links between Galera based clusters is a tedious task. There are many potential slaves and many potential masters. Furthermore, each replication link must have only a single slave. Just try to imagine how you would maintain the following replication topology:

A complex replication topology

The above topology consists of five clusters and four master-to-master links. The replication manager can easily handle this topology. Of course, it is not a fix for the limitations of asynchronous replication. You must make sure your writes are replication safe. You might want, for example, a global user list, or to centralize some access logs. Just to refresh memories, here are some of the script highlights:

  • Uses the Galera cluster for Quorum
  • Configurable, arbitrarily complex topologies
  • The script stores the topology in database tables
  • Elects slaves automatically
  • Monitors replication links
  • Slaves can connect to a list of potential masters

As you probably know, MariaDB has a different GTID implementation and syntax for the multi-source replication commands. I took some time to investigate why the script was failing and fixed it. Now, provided you are using MariaDB 10.1.4+ with GTIDs, the replication manager works fine.
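
For illustration, this is a sketch of MariaDB’s named (multi-source) replication syntax with GTIDs, which is what the script had to adapt to; the host, credentials, and connection name are placeholders:

CHANGE MASTER 'cluster_a' TO
  MASTER_HOST='10.0.0.10',
  MASTER_USER='repl',
  MASTER_PASSWORD='secret',
  MASTER_USE_GTID=slave_pos;
START SLAVE 'cluster_a';
SHOW SLAVE 'cluster_a' STATUS\G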

You can find the script here. Be aware that although I work for Percona, the script is not officially supported by Percona.

by Yves Trudeau at January 18, 2019 04:05 PM

January 17, 2019

Peter Zaitsev

Percona Server for MySQL 8.0.13-4 Is Now Available

Percona Server for MySQL 8.0

Percona Server for MySQL

Percona announces the release of Percona Server for MySQL 8.0.13-4 on January 17, 2019 (downloads are available here and from the Percona Software Repositories). This release contains a fix for a critical bug that prevented Percona Server for MySQL 5.7.24-26 from being upgraded to version 8.0.13-3 if there are more than around 1000 tables, or if the maximum allocated InnoDB table ID is around 1000. Percona Server for MySQL 8.0.13-4 is now the current GA release in the 8.0 series.

All of Percona’s software is open-source and free.

Percona Server for MySQL 8.0 includes all the features available in MySQL 8.0 Community Edition in addition to enterprise-grade features developed by Percona. For a list of highlighted features from both MySQL 8.0 and Percona Server for MySQL 8.0, please see the GA release announcement.

Note: If you are upgrading from 5.7 to 8.0, please ensure that you read the upgrade guide and the document Changed in Percona Server for MySQL 8.0.

Bugs Fixed

  • It was not possible to upgrade from MySQL 5.7.24-26 to 8.0.13-3 if there were more than around 1000 tables, or if the maximum allocated InnoDB table ID was around 1000. Bug fixed #5245.
  • SHOW BINLOG EVENTS FROM <bad offset> is not diagnosed inside Format_description_log_events. Bug fixed #5126 (Upstream #93544).
  • There was a typo in mysqld_safe.sh: trottling was replaced with throttling. Bug fixed #240. Thanks to Michael Coburn for the patch.
  • Percona Server for MySQL 8.0 could crash with the “Assertion failure: dict0dict.cc:7451:space_id != SPACE_UNKNOWN” exception during an upgrade from Percona Server for MySQL 5.7.23 to Percona Server for MySQL 8.0.13-3 with --innodb_file_per_table=OFF. Bug fixed #5222.
  • On Debian or Ubuntu, a conflict was reported on the /usr/bin/innochecksum file when attempting to install Percona Server for MySQL 8 over MySQL 8. Bug fixed #5225.
  • An out-of-bound read exception could occur on debug builds in the compressed columns with dictionaries feature. Bug fixed #5311.
  • The innodb_data_pending_reads server status variable contained an incorrect value. Bug fixed #5264. Thanks to Fangxin Lou for the patch.
  • A memory leak and needless allocation in compression dictionaries could happen in mysqldump. Bug fixed #5307.
  • A compression-related memory leak could happen in mysqlbinlog. Bug fixed #5308.

Other bugs fixed: #4797, #5209, #5268, #5270, #5306, #5309.

Find the release notes for Percona Server for MySQL 8.0.13-4 in our online documentation. Report bugs in the Jira bug tracker.

by Borys Belinsky at January 17, 2019 09:28 PM

Oli Sennhauser

MariaDB/MySQL Environment MyEnv 2.0.2 has been released

FromDual has the pleasure to announce the release of the new version 2.0.2 of its popular MariaDB, Galera Cluster and MySQL multi-instance environment MyEnv.

The new MyEnv can be downloaded here. How to install MyEnv is described in the MyEnv Installation Guide.

In the inconceivable case that you find a bug in the MyEnv please report it to the FromDual bug tracker.

Any feedback, statements and testimonials are welcome as well! Please send them to feedback@fromdual.com.

Upgrade from 1.1.x to 2.0

Please look at the MyEnv 2.0.0 Release Notes.

Upgrade from 2.0.x to 2.0.2

shell> cd ${HOME}/product
shell> tar xf /download/myenv-2.0.2.tar.gz
shell> rm -f myenv
shell> ln -s myenv-2.0.2 myenv

Plug-ins

If you are using plug-ins for showMyEnvStatus, create all the links in the new directory structure:

shell> cd ${HOME}/product/myenv
shell> ln -s ../../utl/oem_agent.php plg/showMyEnvStatus/

Upgrade of the instance directory structure

From MyEnv 1.0 to 2.0 the directory structure of instances has fundamentally changed. Nevertheless MyEnv 2.0 works fine with MyEnv 1.0 directory structures.

Changes in MyEnv 2.0.2

MyEnv

  • Error message fixed.
  • bind_address 0.0.0.0 is optimized to *.
  • State up and down are coloured now.
  • Complaint on missing symbolic link to my.cnf added.
  • New start-timeout configuration variable added. Important for Galera SST.
  • Default MariaDB my.cnf hash added to avoid complaints.
  • mysqld is consistently searched in sbin, bin and libexec now for RHEL/CentOS 7 compatibility.
  • Avoid EGPCS error messages during MyEnv start/stop.
  • Unused aReleaseVersion removed; a side effect is that the up command no longer has performance issues in huge MyEnv set-ups with older MySQL releases.

MyEnv Installer

  • Function answerQuestion on previous error message works now.
  • Try and catch for existing configuration file improved.
  • Default answer is "q" on error; instance name and blacklist name checks are fixed.
  • myenv.conf backup file has a correct timestamp now.
  • Create symlink to datadir for my.cnf.
  • Purge of database is done from instancedir and not datadir any more.

MyEnv Utilities

  • galera_monitor.sh output made nicer.
  • Script az_test.php added; the initial test already found a bug in MariaDB 10.3.
  • Script slave_monitor.sh added.
  • Option check made more careful for drop_partition.php and merge_partition.php.
  • Timestamp problem fixed for year change in split_partition.php.

For subscriptions of commercial use of MyEnv please get in contact with us.

by Shinguz at January 17, 2019 06:35 PM

Peter Zaitsev

Using Parallel Query with Amazon Aurora for MySQL

parallel query amazon aurora for mysql

Parallel query execution is my favorite, non-existent, feature in MySQL. In all versions of MySQL – at least at the time of writing – when you run a single query it will run in one thread, effectively utilizing one CPU core only. Multiple queries run at the same time will be using different threads and will utilize more than one CPU core.

On multi-core machines – which is the majority of the hardware nowadays – and in the cloud, we have multiple cores available for use. With faster disks (i.e. SSD) we can’t utilize the full potential of IOPS with just one thread.

AWS Aurora (based on MySQL 5.6) now has a version which will support parallelism for SELECT queries (utilizing the read capacity of storage nodes underneath the Aurora cluster). In this article, we will look at how this can improve the reporting/analytical query performance in MySQL. I will compare AWS Aurora with MySQL (Percona Server) 5.6 running on an EC2 instance of the same class.

In Short

Aurora Parallel Query response time (for queries which can not use indexes) can be 5x-10x better compared to the non-parallel fully cached operations. This is a significant improvement for the slow queries.

Test data and versions

For my test, I need to choose:

  1. Aurora instance type and comparison
  2. Dataset
  3. Queries

Aurora instance type and comparison

According to Jeff Barr’s excellent article (https://aws.amazon.com/blogs/aws/new-parallel-query-for-amazon-aurora/) the following instance classes will support parallel query (PQ):

“The instance class determines the number of parallel queries that can be active at a given time:

  • db.r*.large – 1 concurrent parallel query session
  • db.r*.xlarge – 2 concurrent parallel query sessions
  • db.r*.2xlarge – 4 concurrent parallel query sessions
  • db.r*.4xlarge – 8 concurrent parallel query sessions
  • db.r*.8xlarge – 16 concurrent parallel query sessions
  • db.r4.16xlarge – 16 concurrent parallel query sessions”

As I want to maximize the concurrency of parallel query sessions, I have chosen db.r4.8xlarge. For the EC2 instance I will use the same class: r4.8xlarge.

Aurora:

mysql> show global variables like '%version%';
+-------------------------+------------------------------+
| Variable_name           | Value                        |
+-------------------------+------------------------------+
| aurora_version          | 1.18.0                       |
| innodb_version          | 1.2.10                       |
| protocol_version        | 10                           |
| version                 | 5.6.10                       |
| version_comment         | MySQL Community Server (GPL) |
| version_compile_machine | x86_64                       |
| version_compile_os      | Linux                        |
+-------------------------+------------------------------+

MySQL on ec2

mysql> show global variables like '%version%';
+-------------------------+------------------------------------------------------+
| Variable_name           | Value                                                |
+-------------------------+------------------------------------------------------+
| innodb_version          | 5.6.41-84.1                                          |
| protocol_version        | 10                                                   |
| slave_type_conversions  |                                                      |
| tls_version             | TLSv1.1,TLSv1.2                                      |
| version                 | 5.6.41-84.1                                          |
| version_comment         | Percona Server (GPL), Release 84.1, Revision b308619 |
| version_compile_machine | x86_64                                               |
| version_compile_os      | debian-linux-gnu                                     |
| version_suffix          |                                                      |
+-------------------------+------------------------------------------------------+

Table

I’m using the “Airlines On-Time Performance” database from http://www.transtats.bts.gov/DL_SelectFields.asp?Table_ID=236&DB_Short_Name=On-Time  (You can find the scripts I used here: https://github.com/Percona-Lab/ontime-airline-performance).

mysql> show table status like 'ontime'\G
*************************** 1. row ***************************
          Name: ontime
        Engine: InnoDB
       Version: 10
    Row_format: Compact
          Rows: 173221661
Avg_row_length: 409
   Data_length: 70850183168
Max_data_length: 0
  Index_length: 0
     Data_free: 7340032
Auto_increment: NULL
   Create_time: 2018-09-26 02:03:28
   Update_time: NULL
    Check_time: NULL
     Collation: latin1_swedish_ci
      Checksum: NULL
Create_options:
       Comment:
1 row in set (0.00 sec)

The table is very wide, 84 columns.

Working with Aurora PQ (Parallel Query)

Documentation: https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/aurora-mysql-parallel-query.html

Aurora PQ works by doing a full table scan (parallel reads are done on the storage level). The InnoDB buffer pool is not used when Parallel Query is utilized.

For the purposes of the test I turned PQ on and off (normally AWS Aurora uses its own heuristics to determine if the PQ will be helpful or not):

Turn on and force:

mysql> set session aurora_pq = 1;
Query OK, 0 rows affected (0.00 sec)
mysql> set aurora_pq_force = 1;
Query OK, 0 rows affected (0.00 sec)

Turn off:

mysql> set session aurora_pq = 0;
Query OK, 0 rows affected (0.00 sec)

The EXPLAIN plan in MySQL will also show the details about parallel query execution statistics.
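
Besides EXPLAIN, Aurora keeps global status counters for parallel query; assuming your Aurora version exposes the Aurora_pq% counters described in the AWS documentation, you can check whether PQ was actually attempted and executed like this:

mysql> show global status like 'Aurora_pq_request%';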

Queries

Here, I use the “reporting” queries, running only one query at a time. The queries are similar to those I’ve used in older blog posts comparing MySQL and Apache Spark performance (https://www.percona.com/blog/2016/08/17/apache-spark-makes-slow-mysql-queries-10x-faster/ )

Here is a summary of the queries:

  1. Simple queries:
    • select count(*) from ontime where flightdate > '2017-01-01'
    • select avg(DepDelay/ArrDelay+1) from ontime
  2. Complex filter, single table:

select SQL_CALC_FOUND_ROWS
FlightDate, UniqueCarrier as carrier, FlightNum, Origin, Dest
FROM ontime
WHERE
  DestState not in ('AK', 'HI', 'PR', 'VI')
  and OriginState not in ('AK', 'HI', 'PR', 'VI')
  and flightdate > '2015-01-01'
   and ArrDelay < 15
and cancelled = 0
and Diverted = 0
and DivAirportLandings = 0
  ORDER by DepDelay DESC
LIMIT 10;

3. Complex filter, join “reference” table

select SQL_CALC_FOUND_ROWS
FlightDate, UniqueCarrier, TailNum, FlightNum, Origin, OriginCityName, Dest, DestCityName, DepDelay, ArrDelay
FROM ontime_ind o
JOIN carriers c on o.carrier = c.carrier_code
WHERE
  (carrier_name like 'United%' or carrier_name like 'Delta%')
  and ArrDelay > 30
  ORDER by DepDelay DESC
LIMIT 10\G

4. select one row only, no index

Query 1a: simple, count(*)

Let’s take a look at the most simple query: count(*). This variant of the “ontime” table has no secondary indexes.

select count(*) from ontime where flightdate > '2017-01-01';

Aurora, pq (parallel query) disabled:

I disabled the PQ first to compare:

mysql> select count(*) from ontime where flightdate > '2017-01-01';
+----------+
| count(*) |
+----------+
|  5660651 |
+----------+
1 row in set (8 min 25.49 sec)
mysql> select count(*) from ontime where flightdate > '2017-01-01';
+----------+
| count(*) |
+----------+
|  5660651 |
+----------+
1 row in set (2 min 48.81 sec)
mysql> select count(*) from ontime where flightdate > '2017-01-01';
+----------+
| count(*) |
+----------+
|  5660651 |
+----------+
1 row in set (2 min 48.25 sec)

Please note: the first run was a “cold run”; data was read from disk. The second and third runs used the cached data.

Now let's enable and force Aurora PQ:

mysql> set session aurora_pq = 1;
Query OK, 0 rows affected (0.00 sec)
mysql> set aurora_pq_force = 1; 
Query OK, 0 rows affected (0.00 sec)
mysql> explain select count(*) from ontime where flightdate > '2017-01-01'\G
*************************** 1. row ***************************
          id: 1
 select_type: SIMPLE
       table: ontime
        type: ALL
possible_keys: NULL
         key: NULL
     key_len: NULL
         ref: NULL
        rows: 173706586
       Extra: Using where; Using parallel query (1 columns, 1 filters, 0 exprs; 0 extra)
1 row in set (0.00 sec)

(from the EXPLAIN plan, we can see that parallel query is used).

Results:

mysql> select count(*) from ontime where flightdate > '2017-01-01';                                                                                                                          
+----------+
| count(*) |
+----------+
|  5660651 |
+----------+
1 row in set (16.53 sec)
mysql> select count(*) from ontime where flightdate > '2017-01-01';
+----------+
| count(*) |
+----------+
|  5660651 |
+----------+
1 row in set (16.56 sec)
mysql> select count(*) from ontime where flightdate > '2017-01-01';
+----------+
| count(*) |
+----------+
|  5660651 |
+----------+
1 row in set (16.36 sec)
mysql> select count(*) from ontime where flightdate > '2017-01-01';
+----------+
| count(*) |
+----------+
|  5660651 |
+----------+
1 row in set (16.56 sec)
mysql> select count(*) from ontime where flightdate > '2017-01-01';
+----------+
| count(*) |
+----------+
|  5660651 |
+----------+
1 row in set (16.36 sec)

As we can see, the results are very stable. It does not use any cache (i.e. the InnoDB buffer pool) either. The result is also interesting: utilizing multiple threads (up to 16) and reading data from disk (probably using the disk cache) can be ~10x faster compared to reading from memory in a single thread.

Result: ~10x performance gain, no index used

Query 1b: simple, avg

set aurora_pq = 1; set aurora_pq_force=1;
select avg(DepDelay) from ontime;
+---------------+
| avg(DepDelay) |
+---------------+
|        8.2666 |
+---------------+
1 row in set (1 min 48.17 sec)
set aurora_pq = 0; set aurora_pq_force=0;  
select avg(DepDelay) from ontime;
+---------------+
| avg(DepDelay) |
+---------------+
|        8.2666 |
+---------------+
1 row in set (2 min 49.95 sec)

Here we can see that PQ gives us a ~2x performance increase.

Summary of simple query performance

Here is what we learned comparing Aurora PQ performance to native MySQL query execution:

  1. Select count(*), not using index: 10x performance increase with Aurora PQ.
  2. select avg(…), not using index: 2x performance increase with Aurora PQ.

Query 2: Complex filter, single table

The following query will always be slow in MySQL. This combination of the filters in the WHERE condition makes it extremely hard to prepare a good set of indexes to make this query faster.

select SQL_CALC_FOUND_ROWS
FlightDate, UniqueCarrier as carrier, FlightNum, Origin, Dest
FROM ontime
WHERE
  DestState not in ('AK', 'HI', 'PR', 'VI')
  and OriginState not in ('AK', 'HI', 'PR', 'VI')
  and flightdate > '2015-01-01'
  and ArrDelay < 15
and cancelled = 0
and Diverted = 0
and DivAirportLandings = '0'
ORDER by DepDelay DESC
LIMIT 10;

Let’s compare the query performance with and without PQ.

PQ disabled:

mysql> set aurora_pq_force = 0;
Query OK, 0 rows affected (0.00 sec)
mysql> set aurora_pq = 0;                                                                                                                                                                  
Query OK, 0 rows affected (0.00 sec)
mysql> explain select SQL_CALC_FOUND_ROWS FlightDate, UniqueCarrier as carrier, FlightNum, Origin, Dest FROM ontime WHERE    DestState not in ('AK', 'HI', 'PR', 'VI') and OriginState not in ('AK', 'HI', 'PR', 'VI') and flightdate > '2015-01-01'     and ArrDelay < 15 and cancelled = 0 and Diverted = 0 and DivAirportLandings = 0 ORDER by DepDelay DESC LIMIT 10\G
*************************** 1. row ***************************
          id: 1
 select_type: SIMPLE
       table: ontime
        type: ALL
possible_keys: NULL
         key: NULL
     key_len: NULL
         ref: NULL
        rows: 173706586
       Extra: Using where; Using filesort
1 row in set (0.00 sec)
mysql> select SQL_CALC_FOUND_ROWS FlightDate, UniqueCarrier as carrier, FlightNum, Origin, Dest FROM ontime WHERE    DestState not in ('AK', 'HI', 'PR', 'VI') and OriginState not in ('AK', 'HI', 'PR', 'VI') and flightdate > '2015-01-01'     and ArrDelay < 15 and cancelled = 0 and Diverted = 0 and DivAirportLandings = 0 ORDER by DepDelay DESC LIMIT 10;
+------------+---------+-----------+--------+------+
| FlightDate | carrier | FlightNum | Origin | Dest |
+------------+---------+-----------+--------+------+
| 2017-10-09 | OO      | 5028      | SBP    | SFO  |
| 2015-11-03 | VX      | 969       | SAN    | SFO  |
| 2015-05-29 | VX      | 720       | TUL    | AUS  |
| 2016-03-11 | UA      | 380       | SFO    | BOS  |
| 2016-06-13 | DL      | 2066      | JFK    | SAN  |
| 2016-11-14 | UA      | 1600      | EWR    | LAX  |
| 2016-11-09 | WN      | 2318      | BDL    | LAS  |
| 2016-11-09 | UA      | 1652      | IAD    | LAX  |
| 2016-11-13 | AA      | 23        | JFK    | LAX  |
| 2016-11-12 | UA      | 800       | EWR    | SFO  |
+------------+---------+-----------+--------+------+

10 rows in set (3 min 42.47 sec)

/* another run */

10 rows in set (3 min 46.90 sec)

This query is 100% cached. Here is the graph from PMM showing the number of read requests:

  1. Read requests: logical requests from the buffer pool
  2. Disk reads: physical requests from disk

Buffer pool requests:

Buffer pool requests from PMM

Now let’s enable and force PQ:

PQ enabled:

mysql> set session aurora_pq = 1;
Query OK, 0 rows affected (0.00 sec)
mysql> set aurora_pq_force = 1;
Query OK, 0 rows affected (0.00 sec)
mysql> explain select SQL_CALC_FOUND_ROWS FlightDate, UniqueCarrier as carrier, FlightNum, Origin, Dest FROM ontime WHERE    DestState not in ('AK', 'HI', 'PR', 'VI') and OriginState not in ('AK', 'HI', 'PR', 'VI') and flightdate > '2015-01-01'     and ArrDelay < 15 and cancelled = 0 and Diverted = 0 and DivAirportLandings = 0 ORDER by DepDelay DESC LIMIT 10\G
*************************** 1. row ***************************
          id: 1
 select_type: SIMPLE
       table: ontime
        type: ALL
possible_keys: NULL
         key: NULL
     key_len: NULL
         ref: NULL
        rows: 173706586
       Extra: Using where; Using filesort; Using parallel query (12 columns, 4 filters, 3 exprs; 0 extra)
1 row in set (0.00 sec)
mysql> select SQL_CALC_FOUND_ROWS
   -> FlightDate, UniqueCarrier as carrier, FlightNum, Origin, Dest
   -> FROM ontime
   -> WHERE
   ->  DestState not in ('AK', 'HI', 'PR', 'VI')
   ->  and OriginState not in ('AK', 'HI', 'PR', 'VI')
   ->  and flightdate > '2015-01-01'
   ->   and ArrDelay < 15
   -> and cancelled = 0
   -> and Diverted = 0
   -> and DivAirportLandings = 0
   ->  ORDER by DepDelay DESC
   -> LIMIT 10;
+------------+---------+-----------+--------+------+
| FlightDate | carrier | FlightNum | Origin | Dest |
+------------+---------+-----------+--------+------+
| 2017-10-09 | OO      | 5028      | SBP    | SFO  |
| 2015-11-03 | VX      | 969       | SAN    | SFO  |
| 2015-05-29 | VX      | 720       | TUL    | AUS  |
| 2016-03-11 | UA      | 380       | SFO    | BOS  |
| 2016-06-13 | DL      | 2066      | JFK    | SAN  |
| 2016-11-14 | UA      | 1600      | EWR    | LAX  |
| 2016-11-09 | WN      | 2318      | BDL    | LAS  |
| 2016-11-09 | UA      | 1652      | IAD    | LAX  |
| 2016-11-13 | AA      | 23        | JFK    | LAX  |
| 2016-11-12 | UA      | 800       | EWR    | SFO  |
+------------+---------+-----------+--------+------+
10 rows in set (41.88 sec)
/* run 2 */
10 rows in set (28.49 sec)
/* run 3 */
10 rows in set (29.60 sec)

Now let’s compare the requests:

InnoDB Buffer Pool Requests

As we can see, Aurora PQ is almost NOT utilizing the buffer pool (there is only a minor number of read requests; compare the max of 4K requests per second with PQ to the constant 600K requests per second in the previous graph).

Result: ~8x performance gain

Query 3: Complex filter, join “reference” table

In this example I join two tables: the main “ontime” table and a reference table. If we have both tables without indexes it will simply be too slow in MySQL. To make it better, I have created an index for both tables and so it will use indexes for the join:

CREATE TABLE `carriers` (
 `carrier_code` varchar(8) NOT NULL DEFAULT '',
 `carrier_name` varchar(200) DEFAULT NULL,
 PRIMARY KEY (`carrier_code`),
 KEY `carrier_name` (`carrier_name`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
mysql> show create table ontime_ind\G
...
 PRIMARY KEY (`id`),
 KEY `comb1` (`Carrier`,`Year`,`ArrDelayMinutes`),
 KEY `FlightDate` (`FlightDate`)
) ENGINE=InnoDB AUTO_INCREMENT=178116912 DEFAULT CHARSET=latin1

Query:

select SQL_CALC_FOUND_ROWS
FlightDate, UniqueCarrier, TailNum, FlightNum, Origin, OriginCityName, Dest, DestCityName, DepDelay, ArrDelay
FROM ontime_ind o
JOIN carriers c on o.carrier = c.carrier_code
WHERE
  (carrier_name like 'United%' or carrier_name like 'Delta%')
  and ArrDelay > 30
  ORDER by DepDelay DESC
LIMIT 10\G

PQ disabled, explain plan:

mysql> set aurora_pq_force = 0;
Query OK, 0 rows affected (0.00 sec)
mysql> set aurora_pq = 0;                                                                                                                                                                  
Query OK, 0 rows affected (0.00 sec)
mysql> explain
   -> select SQL_CALC_FOUND_ROWS
   -> FlightDate, UniqueCarrier, TailNum, FlightNum, Origin, OriginCityName, Dest, DestCityName, DepDelay, ArrDelay
   -> FROM ontime_ind o
   -> JOIN carriers c on o.carrier = c.carrier_code
   -> WHERE
   ->  (carrier_name like 'United%' or carrier_name like 'Delta%')
   ->  and ArrDelay > 30
   ->  ORDER by DepDelay DESC
   -> LIMIT 10\G
*************************** 1. row ***************************
          id: 1
 select_type: SIMPLE
       table: c
        type: range
possible_keys: PRIMARY,carrier_name
         key: carrier_name
     key_len: 203
         ref: NULL
        rows: 3
       Extra: Using where; Using index; Using temporary; Using filesort
*************************** 2. row ***************************
          id: 1
 select_type: SIMPLE
       table: o
        type: ref
possible_keys: comb1
         key: comb1
     key_len: 3
         ref: ontime.c.carrier_code
        rows: 2711597
       Extra: Using index condition; Using where
2 rows in set (0.01 sec)

As we can see MySQL uses indexes for the join. Response times:

/* run 1 – cold run */

10 rows in set (29 min 17.39 sec)

/* run 2  – warm run */

10 rows in set (2 min 45.16 sec)

PQ enabled, explain plan:

mysql> explain
   -> select SQL_CALC_FOUND_ROWS
   -> FlightDate, UniqueCarrier, TailNum, FlightNum, Origin, OriginCityName, Dest, DestCityName, DepDelay, ArrDelay
   -> FROM ontime_ind o
   -> JOIN carriers c on o.carrier = c.carrier_code
   -> WHERE
   ->  (carrier_name like 'United%' or carrier_name like 'Delta%')
   ->  and ArrDelay > 30
   ->  ORDER by DepDelay DESC
   -> LIMIT 10\G
*************************** 1. row ***************************
          id: 1
 select_type: SIMPLE
       table: c
        type: ALL
possible_keys: PRIMARY,carrier_name
         key: NULL
     key_len: NULL
         ref: NULL
        rows: 1650
       Extra: Using where; Using temporary; Using filesort; Using parallel query (2 columns, 0 filters, 1 exprs; 0 extra)
*************************** 2. row ***************************
          id: 1
 select_type: SIMPLE
       table: o
        type: ALL
possible_keys: comb1
         key: NULL
     key_len: NULL
         ref: NULL
        rows: 173542245
       Extra: Using where; Using join buffer (Hash Join Outer table o); Using parallel query (11 columns, 1 filters, 1 exprs; 0 extra)
2 rows in set (0.00 sec)

As we can see, Aurora does not use any indexes and uses a parallel scan instead.

Response time:

mysql> select SQL_CALC_FOUND_ROWS
   -> FlightDate, UniqueCarrier, TailNum, FlightNum, Origin, OriginCityName, Dest, DestCityName, DepDelay, ArrDelay
   -> FROM ontime_ind o
   -> JOIN carriers c on o.carrier = c.carrier_code
   -> WHERE
   ->  (carrier_name like 'United%' or carrier_name like 'Delta%')
   ->  and ArrDelay > 30
   ->  ORDER by DepDelay DESC
   -> LIMIT 10\G
...
*************************** 4. row ***************************
   FlightDate: 2017-05-04
UniqueCarrier: UA
      TailNum: N68821
    FlightNum: 1205
       Origin: KOA
OriginCityName: Kona, HI
         Dest: LAX
 DestCityName: Los Angeles, CA
     DepDelay: 1457
     ArrDelay: 1459
*************************** 5. row ***************************
   FlightDate: 1991-03-12
UniqueCarrier: DL
      TailNum:
    FlightNum: 1118
       Origin: ATL
OriginCityName: Atlanta, GA
         Dest: STL
 DestCityName: St. Louis, MO
...
10 rows in set (28.78 sec)
mysql> select found_rows();
+--------------+
| found_rows() |
+--------------+
|      4180974 |
+--------------+
1 row in set (0.00 sec)

Result: ~5x performance gain

(this is actually comparing the index cached read to a non-index PQ execution)

Summary

Aurora PQ can significantly improve the performance of reporting queries, as such queries may be extremely hard to optimize in MySQL even when using indexes. Compared to non-parallel, fully cached, index-based execution, Aurora PQ response time can be 5x-10x better. Aurora PQ helps improve the performance of complex queries by performing parallel reads.

The following table summarizes the query response times:

Query: select count(*) from ontime where flightdate > '2017-01-01'
  Time, no PQ (with index): 2 min 48.81 sec
  Time, PQ: 16.53 sec

Query: select avg(DepDelay) from ontime;
  Time, no PQ (with index): 2 min 49.95 sec
  Time, PQ: 1 min 48.17 sec

Query: select SQL_CALC_FOUND_ROWS FlightDate, UniqueCarrier as carrier, FlightNum, Origin, Dest FROM ontime WHERE DestState not in ('AK', 'HI', 'PR', 'VI') and OriginState not in ('AK', 'HI', 'PR', 'VI') and flightdate > '2015-01-01' and ArrDelay < 15 and cancelled = 0 and Diverted = 0 and DivAirportLandings = 0 ORDER by DepDelay DESC LIMIT 10;
  Time, no PQ (with index): 3 min 42.47 sec
  Time, PQ: 28.49 sec

Query: select SQL_CALC_FOUND_ROWS FlightDate, UniqueCarrier, TailNum, FlightNum, Origin, OriginCityName, Dest, DestCityName, DepDelay, ArrDelay FROM ontime_ind o JOIN carriers c on o.carrier = c.carrier_code WHERE (carrier_name like 'United%' or carrier_name like 'Delta%') and ArrDelay > 30 ORDER by DepDelay DESC LIMIT 10\G
  Time, no PQ (with index): 2 min 45.16 sec
  Time, PQ: 28.78 sec


Photo by Thomas Lipke on Unsplash

by Alexander Rubin at January 17, 2019 06:31 PM

January 16, 2019

Peter Zaitsev

Percona Server for MongoDB 3.2.22-3.13 Is Now Available

Percona is glad to announce the release of Percona Server for MongoDB 3.2.22-3.13 on January 16, 2019. Download the latest version from the Percona website or the Percona Software Repositories.

Percona Server for MongoDB is an enhanced, open source, and highly-scalable database that is a fully-compatible, drop-in replacement for MongoDB 3.2 Community Edition. It supports MongoDB 3.2 protocols and drivers.

Percona Server for MongoDB extends the functionality of MongoDB Community Edition by including the Percona Memory Engine storage engine, as well as several enterprise-grade features. Percona Server for MongoDB requires no changes to MongoDB applications or code.

This release is based on MongoDB 3.2.22. There are no additional improvements or new features on top of those upstream fixes.

The Percona Server for MongoDB 3.2.22-3.13 release notes are available in the official documentation.

by Borys Belinsky at January 16, 2019 04:04 PM

January 15, 2019

Kurt von Finck

Valeriy Kravchuk

Using dbdeployer With MariaDB Server

Some time ago I noted that one of the tools I use for testing various MySQL and MariaDB cases and for reproducing potential bugs, MySQL-Sandbox, is not updated any more. It turned out that active development switched to its port in Go called dbdeployer. You can find detailed information about dbdeployer and the reasons behind developing it provided by its author, Giuseppe Maxia, here and there. See also this post at the Percona blog for a quick review of its main features. One of the points of dbdeployer (and reasons to use Go) is that it is built once (per supported platform) somewhere, and then binaries are downloaded from GitHub and used everywhere, without any problems with dependencies etc.

I've added checking dbdeployer to my long ToDo list, as I planned to use it (if not MySQL-Sandbox) for some tests and posts related to resolving typical practical problems with MariaDB GTID-based replication. Yesterday I allocated some time to finally try it and, as usual, I started by building it from source (for me, database-related software that I cannot build from source on my test systems is not very attractive as something new to study and use). I was immediately surprised by the lack of instructions on how to do this on GitHub: no Makefile of any kind, etc. All I was able to find was the build.sh script. Correction: just check README.md on how to build it properly, as Giuseppe Maxia explained in the comment.

Good, regular structure is important for deployment
Fortunately, this is not the first project written in Go that I have tried to build (or change somehow and then build). The first one was this replication manager (which has proper build instructions in its docs). So, I thought I knew what to do. I installed the missing golang package on my netbook with Ubuntu 14.04 that I had at hand and tried the following typical steps:
openxs@ao756:~/go$ export GOPATH=$HOME/go
openxs@ao756:~/go$ echo $GOPATH
/home/openxs/go
openxs@ao756:~/go$ go get github.com/datacharmer/dbdeployer
# github.com/datacharmer/dbdeployer/common
src/github.com/datacharmer/dbdeployer/common/strutils.go:170: undefined: sort.Slice
...
That was a bit surprising, but a quick Google search showed that this could be caused by an outdated (pre-1.8) version of the golang package. So, dbdeployer requires golang 1.8 or newer, and there was no such package for my good old Ubuntu (it has only some 1.2.x). One day I'll upgrade it, but so far I am OK with 14.04 for all other testing purposes, so I had to temporarily give up on the idea of building from source.

Today, during a few free minutes, I retried on my good old desktop box with Fedora 27 (where I had surely built some Go project(s) successfully before):
[openxs@fc23 go]$ uname -a
Linux fc23 4.18.19-100.fc27.x86_64 #1 SMP Wed Nov 14 22:04:34 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
[openxs@fc23 ~]$ ls go
pkg  src
[openxs@fc23 ~]$ echo $GOPATH

[openxs@fc23 ~]$ export GOPATH=$HOME/go
[openxs@fc23 ~]$ cd go
[openxs@fc23 go]$ go version
go version go1.9.7 linux/amd64
This environment should work for the build, so I proceeded with:
[openxs@fc23 go]$ go get github.com/datacharmer/dbdeployer
[openxs@fc23 go]$ ls src/github.com/
datacharmer/   jmoiron/       nsf/           tanji/
go-sql-driver/ mattn/         ogier/
[openxs@fc23 go]$ ls src/github.com/datacharmer/dbdeployer/
abbreviations/ compare/       docs/          mkreadme/      test/
.build/        concurrent/    .git/          rest/          unpack/
cmd/           cookbook/      .github/       sandbox/       vendor/
common/        defaults/      globals/       scripts/
Now let's try that scripts/build.sh with linux as a parameter since, based on what I found, that's the way to build Linux binaries:
[openxs@fc23 go]$ MKDOCS=1 src/github.com/datacharmer/dbdeployer/scripts/build.sh linux
+ env GOOS=linux GOARCH=386 go build --tags docs -o dbdeployer-1.17.0-docs.linux .
+ env GOOS=linux GOARCH=amd64 go build -o sort_versions.linux sort_versions.go
/home/openxs/go/src/github.com/datacharmer/dbdeployer
-rwxrwxr-x. 1 openxs openxs 8.1M Jan 14 10:27 dbdeployer-1.17.0-docs.linux
-rw-rw-r--. 1 openxs openxs 3.0M Jan 14 10:27 dbdeployer-1.17.0-docs.linux.tar.gz
[openxs@fc23 go]$ ls
bin  pkg  src
[openxs@fc23 go]$ ls bin
dbdeployer
[openxs@fc23 go]$ bin/dbdeployer --version
dbdeployer version 1.17.0
Now we know how to build dbdeployer from source, if needed. If some dependencies are missing, you'll be informed, and a similar go get ... command should allow you to install them.

I was somewhat surprised to see MariaDB NOT mentioned at all in README.md. It says:
"DBdeployer is a tool that deploys MySQL database servers easily."
while good old MySQL-Sandbox also mentions MariaDB explicitly:
"This package is a sandbox for testing features under any version of MySQL from 3.23 to 8.0 (and any version of MariaDB.)"
So, my idea was to double check that dbdeployer is both MySQL-Sandbox compatible and MariaDB compatible (it is). I have several sandboxes already created in the past. I also have MariaDB 10.2.21 .tar.gz binaries that I want to use with dbdeployer for further testing:
[openxs@fc23 go]$ ls ~/sandboxes/
clear_all        rsandbox_mariadb-10_0_19  send_kill_all  test_replication
plugin.conf      rsandbox_mariadb-10_1_12  start_all      use_all
restart_all      rsandbox_mysql-8_0_12     status_all
rsandbox_8_0_12  sandbox_action            stop_all
[openxs@fc23 go]$ ls ~/*.tar.gz
/home/openxs/galera-25.3.22-x86_64.tar.gz
/home/openxs/galera-25.3.24-x86_64.tar.gz
/home/openxs/galera-25.3.25-glibc_214-x86_64.tar.gz
/home/openxs/mariadb-10.2.12-linux-x86_64.tar.gz
/home/openxs/mariadb-10.2.21-linux-x86_64.tar.gz
With dbdeployer, one has to unpack the .tar.gz first with the dbdeployer unpack command. So, I tried it immediately:
[openxs@fc23 go]$ bin/dbdeployer unpack ~/mariadb-10.2.21-linux-x86_64.tar.gz
directory '/home/openxs/opt/mysql' not found
You should create it or provide an alternate base directory using --sandbox-binary
It seems the tool now wants to use ~/opt/mysql as the directory to unpack into, while MySQL-Sandbox silently used ~:
[openxs@fc23 go]$ ls ~ | grep 8.0
8.0.12
I made a half-hearted attempt to force it to use ~, but failed for a reason I was too lazy to investigate:
[openxs@fc23 go]$ bin/dbdeployer --sandbox-binary=/home/openxs unpack /home/openxs/mariadb-10.2.21-linux-x86_64.tar.gz
Unpacking tarball /home/openxs/mariadb-10.2.21-linux-x86_64.tar.gz to $HOME/10.2.21
.........100.........200....&tar.Header{Name:"mariadb-10.2.21-linux-x86_64/mysql-test/mysql-test-run", Mode:511, Uid:1021, Gid:1004, Size:0, ModTime:time.Time{wall:0x0, ext:63681810892, loc:(*time.Location)(0xa47aa0)}, Typeflag:0x32, Linkname:"./mysql-test-run.pl", Uname:"dbart", Gname:"my", Devmajor:0, Devminor:0, AccessTime:time.Time{wall:0x0, ext:0, loc:(*time.Location)(nil)}, ChangeTime:time.Time{wall:0x0, ext:0, loc:(*time.Location)(nil)}, Xattrs:map[string]string(nil)}
#ERROR: symlink ./mysql-test-run.pl mariadb-10.2.21-linux-x86_64/mysql-test/mysql-test-run: file exists
I just created ~/opt/mysql and proceeded with the default configuration. After the unpack step completed successfully, I proceeded with the deploy step to create a new replication sandbox:
[openxs@fc23 go]$ bin/dbdeployer unpack /home/openxs/mariadb-10.2.21-linux-x86_64.tar.gz
Unpacking tarball /home/openxs/mariadb-10.2.21-linux-x86_64.tar.gz to $HOME/opt/mysql/10.2.21
.........100.........200.........300.........400.........500.........600.........700.........800... ...
Renaming directory /home/openxs/opt/mysql/mariadb-10.2.21-linux-x86_64 to /home/openxs/opt/mysql/10.2.21

[openxs@fc23 go]$ bin/dbdeployer deploy replication 10.2.21
Installing and starting master
. sandbox server started
Installing and starting slave1
. sandbox server started
Installing and starting slave2
. sandbox server started
$HOME/sandboxes/rsandbox_10_2_21/initialize_slaves
initializing slave 1
initializing slave 2
Replication directory installed in $HOME/sandboxes/rsandbox_10_2_21
run 'dbdeployer usage multiple' for basic instructions'
We have access to nice enough documentation:
[openxs@fc23 go]$ bin/dbdeployer usage multiple
 USING MULTIPLE SERVER SANDBOX
On a replication sandbox, you have the same commands (run "dbdeployer usage single"),
with an "_all" suffix, meaning that you propagate the command to all the members.
Then you have "./m" as a shortcut to use the master, "./s1" and "./s2" to access
the slaves (and "s3", "s4" ... if you define more).

In group sandboxes without a master slave relationship (group replication and
multiple sandboxes) the nodes can be accessed by ./n1, ./n2, ./n3, and so on.

start_all    [options] > starts all nodes
status_all             > get the status of all nodes
restart_all  [options] > restarts all nodes
stop_all               > stops all nodes
use_all         "SQL"  > runs a SQL statement in all nodes
use_all_masters "SQL"  > runs a SQL statement in all masters
use_all_slaves "SQL"   > runs a SQL statement in all slaves
clear_all              > stops all nodes and removes all data
m                      > invokes MySQL client in the master
s1, s2, n1, n2         > invokes MySQL client in slave 1, 2, node 1, 2


The scripts "check_slaves" or "check_nodes" give the status of replication in the sandbox.
A typical sandbox directory (with some differences compared to MySQL-Sandbox, like use_all_slaves etc.) is created in ~/sandboxes/, and the shortcut commands work as expected:
[openxs@fc23 go]$ cd ~/sandboxes/rsandbox_10_2_21/
[openxs@fc23 rsandbox_10_2_21]$ ls
check_slaves       n2           sbdescription.json  test_sb_all
clear_all          node1        send_kill_all       use_all
initialize_slaves  node2        start_all           use_all_masters
m                  restart_all  status_all          use_all_slaves
master             s1           stop_all
n1                 s2           test_replication

[openxs@fc23 rsandbox_10_2_21]$ ls master/
add_option    init_db         restart             show_binlog    status   use
clear         load_grants     sbdescription.json  show_log       stop
data          my              sb_include          show_relaylog  test_sb
grants.mysql  my.sandbox.cnf  send_kill           start          tmp

[openxs@fc23 rsandbox_10_2_21]$ ls ../rsandbox_mariadb-10_1_12/
check_slaves             m       node1        s2             test_replication
clear_all                master  node2        send_kill_all  use_all
connection.json          n1      README       start_all
default_connection.json  n2      restart_all  status_all
initialize_slaves        n3      s1           stop_all

[openxs@fc23 rsandbox_10_2_21]$ ls ../rsandbox_mariadb-10_1_12/master/
add_option       default_connection.json  my              send_kill      tmp
change_paths     grants_5_7_6.mysql       mycli           show_binlog    use
change_ports     grants.mysql             my.sandbox.cnf  show_relaylog  USING
clear            json_in_db               proxy_start     start
connection.json  load_grants              README          status
data             msb                      restart         stop

[openxs@fc23 rsandbox_10_2_21]$ ./status_all
REPLICATION  /home/openxs/sandboxes/rsandbox_10_2_21
master : master on  -  port     23322 (23322)
node1 : node1 on  -  port       23323 (23323)
node2 : node2 on  -  port       23324 (23324)
[openxs@fc23 rsandbox_10_2_21]$ ./use_all "show variables like 'gtid%'"
# master
Variable_name   Value
gtid_binlog_pos 0-100-12
gtid_binlog_state       0-100-12
gtid_current_pos        0-100-12
gtid_domain_id  0
gtid_ignore_duplicates  OFF
gtid_seq_no     0
gtid_slave_pos
gtid_strict_mode        OFF
# server: 1
Variable_name   Value
gtid_binlog_pos
gtid_binlog_state
gtid_current_pos        0-100-12
gtid_domain_id  0
gtid_ignore_duplicates  OFF
gtid_seq_no     0
gtid_slave_pos  0-100-12
gtid_strict_mode        OFF
# server: 2
Variable_name   Value
gtid_binlog_pos
gtid_binlog_state
gtid_current_pos        0-100-12
gtid_domain_id  0
gtid_ignore_duplicates  OFF
gtid_seq_no     0
gtid_slave_pos  0-100-12
gtid_strict_mode        OFF
[openxs@fc23 rsandbox_10_2_21]$ ./m
Welcome to the MariaDB monitor.  Commands end with ; or \g.
Your MariaDB connection id is 14
Server version: 10.2.21-MariaDB-log MariaDB Server

Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

master [localhost:23322] {msandbox} ((none)) > show master status;
+------------------+----------+--------------+------------------+
| File             | Position | Binlog_Do_DB | Binlog_Ignore_DB |
+------------------+----------+--------------+------------------+
| mysql-bin.000001 |     2835 |              |                  |
+------------------+----------+--------------+------------------+
1 row in set (0.00 sec)

master [localhost:23322] {msandbox} ((none)) > show variables like 'gtid%';
+------------------------+----------+
| Variable_name          | Value    |
+------------------------+----------+
| gtid_binlog_pos        | 0-100-12 |
| gtid_binlog_state      | 0-100-12 |
| gtid_current_pos       | 0-100-12 |
| gtid_domain_id         | 0        |
| gtid_ignore_duplicates | OFF      |
| gtid_seq_no            | 0        |
| gtid_slave_pos         |          |
| gtid_strict_mode       | OFF      |
+------------------------+----------+
8 rows in set (0.00 sec)

master [localhost:23322] {msandbox} ((none)) > exit
Bye
For my further tests I needed slaves to have log_slave_updates enabled and gtid_strict_mode=ON. So, I've added these settings to my.sandbox.cnf in node1 and node2 subdirectories for both slaves and restarted them:
[openxs@fc23 rsandbox_10_2_21]$ ./restart_all
# executing 'stop' on /home/openxs/sandboxes/rsandbox_10_2_21
stop /home/openxs/sandboxes/rsandbox_10_2_21/node1
stop /home/openxs/sandboxes/rsandbox_10_2_21/node2
stop /home/openxs/sandboxes/rsandbox_10_2_21/master
# executing 'start' on /home/openxs/sandboxes/rsandbox_10_2_21
executing 'start' on master
. sandbox server started
executing 'start' on slave 1
. sandbox server started
executing 'start' on slave 2
. sandbox server started
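For reference, the two lines I added to the existing [mysqld] section of my.sandbox.cnf on both slaves (under node1/ and node2/) look like this (a minimal sketch of the settings named above):

log_slave_updates = 1
gtid_strict_mode = ON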
I need a table to play with and I want to check that slaves are in sync:
[openxs@fc23 rsandbox_10_2_21]$ ./m
Welcome to the MariaDB monitor.  Commands end with ; or \g.
Your MariaDB connection id is 12
Server version: 10.2.21-MariaDB-log MariaDB Server

Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

master [localhost:23322] {msandbox} ((none)) > use test
Database changed
master [localhost:23322] {msandbox} (test) > create table t1(id int primary key, c1 int);
Query OK, 0 rows affected (0.17 sec)

master [localhost:23322] {msandbox} (test) > exit
Bye
[openxs@fc23 rsandbox_10_2_21]$ ./use_all "show variables like 'gtid%'"
# master
Variable_name   Value
gtid_binlog_pos 0-100-13
gtid_binlog_state       0-100-13
gtid_current_pos        0-100-13
gtid_domain_id  0
gtid_ignore_duplicates  OFF
gtid_seq_no     0
gtid_slave_pos
gtid_strict_mode        OFF
# server: 1
Variable_name   Value
gtid_binlog_pos 0-100-13
gtid_binlog_state       0-100-13
gtid_current_pos        0-100-13
gtid_domain_id  0
gtid_ignore_duplicates  OFF
gtid_seq_no     0
gtid_slave_pos  0-100-13
gtid_strict_mode        ON
# server: 2
Variable_name   Value
gtid_binlog_pos 0-100-13
gtid_binlog_state       0-100-13
gtid_current_pos        0-100-13
gtid_domain_id  0
gtid_ignore_duplicates  OFF
gtid_seq_no     0
gtid_slave_pos  0-100-13
gtid_strict_mode        ON
[openxs@fc23 rsandbox_10_2_21]$
Note the value of gtid_current_pos on the master and gtid_slave_pos on each slave. They are the same, so the slaves are in sync. If you want to find out more about the format of GTIDs in MariaDB or all those gtid% server variables, please check this KB article.

* * *

To summarize, dbdeployer is a nice port of MySQL-Sandbox to Go, with some additional features. It can be easily built from source if you have golang version 1.8 or newer (or just downloaded if you have not). Sandboxes created with dbdeployer may co-exist with older sandboxes in the same default directory (but .tar.gz files are unpacked into a different directory by default). It still works well with MariaDB. I am going to use replication sandboxes built with it for some further testing of various real-life use cases and problems of MariaDB's GTID implementation (which may be presented in further posts).

by Valeriy Kravchuk (noreply@blogger.com) at January 15, 2019 05:57 PM

Peter Zaitsev

Customizing Per-Process Metrics in PMM

Process Memory Usage - a filtered graph in PMM

If you have set up per-process metrics in Percona Monitoring and Management, you may have found yourself in need of tuning it further to not only group processes together, but to display some of them in isolation. In this blogpost we will explore how to modify the rules for grouping processes, so that you can make the most out of this awesome PMM integration.

Let’s say you have followed the link above on how to set up the per-process metrics integration on PMM, and you have imported the dashboard to show these metrics. You will see something like the following:

PMM database and system monitoring and management software

This is an internal testing server we use, in which you can see a high number of VBoxHeadless (29) and mysqld (99) processes running. All the metrics in the dashboard will be grouped by the name of the command used. But, what if we want to see metrics for only one of these processes in isolation? As things stand, we will not be able to do so. It may not make sense to do so in a testing environment, but if you are running multiple mysqld processes (or mongos, postgres, etc) bound to different ports, you may want to see metrics for each of them separately.

Modifying the configuration file

Enter all.yaml!

In the process-exporter documentation on using a configuration file, we can see the following:

The general format of the -config.path YAML file is a top-level process_names section, containing a list of name matchers. […] A process may only belong to one group: even if multiple items would match, the first one listed in the file wins.

This means that even if we have two rules that would match a process, only the first one will be taken into account. This will allow us to both list processes by themselves, and not miss any non-grouped process. How? Let’s imagine we have the following processes running:

mysqld --port=1
mysqld --port=2
mysqld --port=3
mysqld --port=4

If we wanted to be able to tell apart the instances running on ports 1 and 2 from the other ones, we could use the following rules:

- name: "mysqld_port_1"
 cmdline:
 - '.*mysqld.*port=1.*'
- name: "mysqld_port_2"
 cmdline:
 - '.*mysqld.*port=2.*'
- name: "{{.Comm}}"
 cmdline:
 - '.+'

In cmdline we will need the regular expression against which to match the process command running. In this case, we made use of the fact that they were using different ports, but any difference in the command strings can be used. The last rule is the one that will default to “anything else” (with the regular expression that matches anything).

The default rule at the end will make sure you don’t miss any other process, so unless you want only some processes metrics collected, you should always have a rule for it.

A real life working example of configuring per-process metrics

In case all this generic information didn't make much sense, here is a concrete example that will hopefully make everything fit together nicely.

In this example we want to have the mysqld instance using the mysql_sandbox16679.sock socket isolated from all the others, and the VM with the ID ending in 97eafa2795da listed on its own. All other processes are to be grouped together by using the basename of the executable.

You can check the output from ps aux to see the full command used. For instance:

shell> ps aux | grep 97eafa2795da
agustin+ 27785  0.7 0.2 5619280 542536 ?      Sl Nov28 228:24 /usr/lib/virtualbox/VBoxHeadless --comment centos_node1_1543443575974_22181 --startvm a0151e29-35dd-4c14-8e37-97eafa2795da --vrde config

So, we can use the following regular expression for it (we use .* to match any string):

.*VBoxHeadless.*97eafa2795da.*

The same applies to the regular expression for the mysqld process.

The configuration file will end up looking like:

shell>  cat /etc/process-exporter/all.yaml
process_names:
 - name: "Custom VBox"
   cmdline:
   - '.*VBoxHeadless.*97eafa2795da.*'
 - name: "Custom MySQL"
   cmdline:
   - '.*mysqld.*mysql_sandbox16679.sock.*'
 - name: "{{.Comm}}"
   cmdline:
   - '.+'

Let’s restart the service, so that new changes apply, and we will check the graphs after five minutes, to see new changes. Note that you may have to reload the page for the changes to apply.

shell> systemctl restart process-exporter
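If you want to confirm that the exporter already picked up the new groups before going back to the graphs, you can query its metrics endpoint directly (a sketch; it assumes process-exporter is listening on its default port 9256):

shell> curl -s http://localhost:9256/metrics | grep namedprocess_namegroup_num_procs

The output should contain one namedprocess_namegroup_num_procs line per group, including the new "Custom VBox" and "Custom MySQL" entries.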

After refreshing, we will see the new list of processes in the drop-down list:

A new list of processes in PMM after filtering

And after we select them, we will be able to see data for those processes in particular:

Thanks to the default configuration at the end, we are still capturing data from all the other mysqld processes. However, they will have their own group, as mentioned before:

System Processes Metrics graph in PMM

 

by Agustín at January 15, 2019 04:20 PM

January 14, 2019

Peter Zaitsev

Upcoming Webinar Thurs 1/17: How to Rock with MyRocks

Please join Percona’s Chief Technology Officer, Vadim Tkachenko, as he presents How to Rock with MyRocks on Thursday, January 17th at 10:00 AM PDT (UTC-7) / 1:00 PM EDT (UTC-4).

Register Now

MyRocks is a new storage engine from Facebook and is available in Percona Server for MySQL. In what cases will you want to use it? We will check different workloads and when MyRocks is most suitable for you. Also, as for any new engine, it’s important to set it up and tune it properly. So, we will review the most important settings to pay attention to.

Register for this webinar to learn How to Rock with MyRocks.

by Vadim Tkachenko at January 14, 2019 09:35 PM

Should You Use ClickHouse as a Main Operational Database?

First of all, this post is not a recommendation but more like a “what if” story. What if we use ClickHouse (which is a columnar analytical database) as our main datastore? Well, typically, an analytical database is not a replacement for a transactional or key/value datastore. However, ClickHouse is super efficient for timeseries and provides “sharding” out of the box (scalability beyond one node). So can we use it as our main datastore?

Let’s imagine we are running a webservice and provide a public API. Public API as -a-service has become a good business model: examples include social networks like Facebook/Twitter, messaging as a service like Twilio, and even credit card authorization platforms like Marqeta. Let’s also imagine we need to store all messages (SMS messages, email messages, etc) we are sending and allow our customers to get various information about the message. This information can be a mix of analytical (OLAP) queries (i.e. how many messages was send for some time period and how much it cost) and a typical key/value queries like: “return 1 message by the message id”.

Using a columnar analytical database can be a big challenge here. Although such databases can be very efficient with counts and averages, some queries will be slow or simply not possible. Analytical databases are optimized for a small number of slow, heavy queries. The most important limitations of analytical databases are:

  1. Deletes and updates are non-existent or slow
  2. Inserts are efficient for bulk inserts only
  3. No secondary indexes means that point selects (select by ID) tend to be very slow

This is all true for ClickHouse, however, we may be able to live with it for our task.

To simulate text messages I have used ~3 billion reddit comments (10 years, from 2007 to 2017), downloaded from pushshift.io. Vadim published a blog post about analyzing reddit comments with ClickHouse. In my case, I’m using this data as a simulation of text messages, and will show how we can use ClickHouse as a backend for an API.

Loading the JSON data to ClickHouse

I used the following table in ClickHouse to load all the data:

CREATE TABLE reddit.rc(
body String,
score_hidden Nullable(UInt8),
archived Nullable(UInt8),
name String,
author String,
author_flair_text Nullable(String),
downs Nullable(Int32),
created_utc UInt32,
subreddit_id String,
link_id Nullable(String),
parent_id Nullable(String),
score Nullable(Int16),
retrieved_on Nullable(UInt32),
controversiality Nullable(Int8),
gilded Nullable(Int8),
id String,
subreddit String,
ups Nullable(Int16),
distinguished Nullable(String),
author_flair_css_class Nullable(String),
stickied Nullable(UInt8),
edited Nullable(UInt8)
) ENGINE = MergeTree() PARTITION BY toYYYYMM(toDate(created_utc)) ORDER BY created_utc ;

Then I used the following command to load the JSON data (downloaded from pushshift.io) to ClickHouse:

$ bzip2 -d -c RC_20*.bz2 | clickhouse-client --input_format_skip_unknown_fields 1 --input_format_allow_errors_num 1000000 -d reddit -n --query="INSERT INTO rc FORMAT JSONEachRow"

The data on disk in ClickHouse is not significantly larger than compressed files, which is great:

#  du -sh /data/clickhouse/data/reddit/rc/
638G    /data/clickhouse/data/reddit/rc/
# du -sh /data/reddit/
404G    /data/reddit/

We have ~4 billion rows:

SELECT
    toDate(min(created_utc)),
    toDate(max(created_utc)),
    count(*)
FROM rc
┌─toDate(min(created_utc))─┬─toDate(max(created_utc))─┬────count()─┐
│               2006-01-01 │               2018-05-31 │ 4148248585 │
└──────────────────────────┴──────────────────────────┴────────────┘
1 rows in set. Elapsed: 11.554 sec. Processed 4.15 billion rows, 16.59 GB (359.02 million rows/s., 1.44 GB/s.)

The data is partitioned and sorted by created_utc, so queries which include created_utc can use partition pruning and skip the partitions that are not needed. However, let’s say our API needs to support the following features, which are not common for analytical databases:

  1. Selecting a single comment/message by ID
  2. Retrieving the last 10 or 100 of the messages/comments
  3. Updating a single message in the past (e.g. in the case of messages, we may need to update the final price; in the case of comments, we may need to upvote or downvote a comment)
  4. Deleting messages
  5. Text search

With the latest ClickHouse version, all of these features are available, but some of them may not perform fast enough.

Retrieving a single row in ClickHouse

Again, this is not a typical operation in any analytical database; those databases are simply not optimized for it. ClickHouse does not have secondary indexes, and we are using created_utc as the primary key (sort by). So, selecting a message by ID alone will require a full table scan:

SELECT
    id,
    created_utc
FROM rc
WHERE id = 'dbumnpz'
┌─id──────┬─created_utc─┐
│ dbumnpz │  1483228800 │
└─────────┴─────────────┘
1 rows in set. Elapsed: 18.070 sec. Processed 4.15 billion rows, 66.37 GB (229.57 million rows/s., 3.67 GB/s.)

Only if we know the timestamp (created_utc)… Then it will be lightning fast: ClickHouse will use the primary key:

SELECT *
FROM rc
WHERE (id = 'dbumnpz') AND (created_utc = 1483228800)
...
1 rows in set. Elapsed: 0.010 sec. Processed 8.19 thousand rows, 131.32 KB (840.27 thousand rows/s., 13.47 MB/s.)

Actually, we can simulate an additional index by creating a materialized view in ClickHouse:

create materialized view rc_id_v
ENGINE MergeTree() PARTITION BY toYYYYMM(toDate(created_utc)) ORDER BY (id)
POPULATE AS SELECT id, created_utc from rc;

Here I’m creating a materialized view and populating it initially from the main (rc) table. The view will be updated automatically when there are any inserts into table reddit.rc. The view is actually another MergeTree table sorted by id. Now we can use this query:

SELECT *
FROM rc
WHERE (id = 'dbumnpz') AND (created_utc =
(
    SELECT created_utc
    FROM rc_id_v
    WHERE id = 'dbumnpz'
))
...
1 rows in set. Elapsed: 0.053 sec. Processed 8.19 thousand rows, 131.32 KB (153.41 thousand rows/s., 2.46 MB/s.)

This is a single query that uses our materialized view in a subquery to pass the created_utc (timestamp) to the original table. It is a little bit slower, but the response time is still under 100ms.

Using this trick (materialized views) we can potentially simulate other indexes.
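For example, a hypothetical second view keyed by author could act as a poor man's secondary index for "all comments by user X" (the view name and the author value below are made up for illustration):

create materialized view rc_author_v
ENGINE = MergeTree() PARTITION BY toYYYYMM(toDate(created_utc)) ORDER BY (author)
POPULATE AS SELECT author, id, created_utc from rc;

SELECT id, created_utc
FROM rc_author_v
WHERE author = 'example_user'
ORDER BY created_utc DESC
LIMIT 10;

The (id, created_utc) pairs returned by the view can then be used to fetch the full rows from the main rc table, exactly as in the query above.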

Retrieving the last 10 messages

This is where ClickHouse is not very efficient. Let’s say we want to retrieve the last 10 comments:

SELECT
    id,
    created_utc
FROM rc
ORDER BY created_utc DESC
LIMIT 10
┌─id──────┬─created_utc─┐
│ dzwso7l │  1527811199 │
│ dzwso7j │  1527811199 │
│ dzwso7k │  1527811199 │
│ dzwso7m │  1527811199 │
│ dzwso7h │  1527811199 │
│ dzwso7n │  1527811199 │
│ dzwso7o │  1527811199 │
│ dzwso7p │  1527811199 │
│ dzwso7i │  1527811199 │
│ dzwso7g │  1527811199 │
└─────────┴─────────────┘
10 rows in set. Elapsed: 24.281 sec. Processed 4.15 billion rows, 82.96 GB (170.84 million rows/s., 3.42 GB/s.)

In a conventional relational database (like MySQL) this can be done by reading a btree index sequentially from the end, as the index is sorted (like the “tail” command on Linux). In a partitioned, massively parallel database system, the storage format and sorting algorithm may not be optimized for that operation, as we are reading multiple partitions in parallel. Currently, an issue has been opened to make “tailing” based on the primary key much faster: slow order by primary key with small limit on big data. As a temporary workaround we can do something like this:

SELECT count()
FROM rc
WHERE (created_utc > (
(
    SELECT max(created_utc)
    FROM rc
) - ((60 * 60) * 24))) AND (subreddit = 'programming')
┌─count()─┐
│    1248 │
└─────────┘
1 rows in set. Elapsed: 4.510 sec. Processed 3.05 million rows, 56.83 MB (675.38 thousand rows/s., 12.60 MB/s.)

It is still a roughly five-second query. Hopefully, this type of query will become faster in ClickHouse.

Updating / deleting data in ClickHouse

The latest ClickHouse version allows running update/delete in the form of “ALTER TABLE .. UPDATE / DELETE” (it is called mutations in ClickHouse terms). For example, we may want to upvote a specific comment.

SELECT score
FROM rc_2017
WHERE (id = 'dbumnpz') AND (created_utc =
(
    SELECT created_utc
    FROM rc_id_v
    WHERE id = 'dbumnpz'
))
┌─score─┐
│     2 │
└───────┘
1 rows in set. Elapsed: 0.048 sec. Processed 8.19 thousand rows, 131.08 KB (168.93 thousand rows/s., 2.70 MB/s.)
:) alter table rc_2017 update score = score +1 where id =  'dbumnpz' and created_utc = (select created_utc from rc_id_v where id =  'dbumnpz');
ALTER TABLE rc_2017
    UPDATE score = score + 1 WHERE (id = 'dbumnpz') AND (created_utc =
    (
        SELECT created_utc
        FROM rc_id_v
        WHERE id = 'dbumnpz'
    ))
Ok.
0 rows in set. Elapsed: 0.052 sec.

“Mutation” queries will return immediately and will be executed asynchronously. We can see the progress by reading from the system.mutations table:

select * from system.mutations\G
SELECT *
FROM system.mutations
Row 1:
──────
database:                   reddit
table:                      rc_2017
mutation_id:                mutation_857.txt
command:                    UPDATE score = score + 1 WHERE (id = 'dbumnpz') AND (created_utc = (SELECT created_utc FROM reddit.rc_id_v  WHERE id = 'dbumnpz'))
create_time:                2018-12-27 22:22:05
block_numbers.partition_id: ['']
block_numbers.number:       [857]
parts_to_do:                0
is_done:                    1
1 rows in set. Elapsed: 0.002 sec.

Now we can try deleting comments that have been marked for deletion (body showing “[deleted]”):

ALTER TABLE rc_2017
    DELETE WHERE body = '[deleted]'
Ok.
0 rows in set. Elapsed: 0.002 sec.
:) select * from system.mutations\G
SELECT *
FROM system.mutations
...
Row 2:
──────
database:                   reddit
table:                      rc_2017
mutation_id:                mutation_858.txt
command:                    DELETE WHERE body = '[deleted]'
create_time:                2018-12-27 22:41:01
block_numbers.partition_id: ['']
block_numbers.number:       [858]
parts_to_do:                64
is_done:                    0
2 rows in set. Elapsed: 0.017 sec.

After a while, we can check the system.mutations table again:

:) select * from system.mutations\G
SELECT *
FROM system.mutations
...
Row 2:
──────
database:                   reddit
table:                      rc_2017
mutation_id:                mutation_858.txt
command:                    DELETE WHERE body = '[deleted]'
create_time:                2018-12-27 22:41:01
block_numbers.partition_id: ['']
block_numbers.number:       [858]
parts_to_do:                0
is_done:                    1

As we can see, our “mutation” is done.
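As an extra sanity check, we could also count the rows the mutation was supposed to remove (a sketch; once the DELETE mutation has finished, this should return zero):

SELECT count()
FROM rc_2017
WHERE body = '[deleted]'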

Text analysis

ClickHouse does not offer full text search, however we can use some text functions. In my previous blog post about ClickHouse I used it to find the most popular wikipedia page of the month. This time I’m trying to find the news keywords of the year using all reddit comments: basically, I’m calculating the most frequently used new words for each year (the algorithm is based on an article about finding trending topics using Google Books n-grams data). To do that I’m using the ClickHouse function alphaTokens(body), which splits the “body” field into words, together with arrayJoin, which turns the resulting array into rows; later, groupArray collects the words back into a list (similar to MySQL’s group_concat function). A quick illustration of the tokenizing part follows, and then the full example.
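For instance, this standalone query shows what alphaTokens and arrayJoin do to a sentence (the sentence is made up; each word should come back as a separate, lower-cased row):

SELECT lower(arrayJoin(alphaTokens('ClickHouse finds trending words'))) AS w

With that in place, here is the example: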

First I created a table word_by_year_news:

create table word_by_year_news ENGINE MergeTree() PARTITION BY y ORDER BY (y) as
select a.w as w, b.y as y, sum(a.occurrences)/b.total as ratio from
(
select
 lower(arrayJoin(alphaTokens(body))) as w,
 toYear(toDate(created_utc)) as y,
 count() as occurrences
from rc
where body <> '[deleted]'
and created_utc < toUnixTimestamp('2018-01-01 00:00:00')
and created_utc >= toUnixTimestamp('2007-01-01 00:00:00')
and subreddit in ('news', 'politics', 'worldnews')
group by w, y
having length(w) > 4
) as a
ANY INNER JOIN
(
select
 toYear(toDate(created_utc)) as y,
 sum(length(alphaTokens(body))) as total
from rc
where body <> '[deleted]'
and subreddit in ('news', 'politics', 'worldnews')
and created_utc < toUnixTimestamp('2018-01-01 00:00:00')
and created_utc >= toUnixTimestamp('2007-01-01 00:00:00')
group by y
) AS b
ON a.y = b.y
group by
  a.w,
  b.y,
  b.total;
0 rows in set. Elapsed: 787.032 sec. Processed 7.35 billion rows, 194.32 GB (9.34 million rows/s., 246.90 MB/s.)

This will store all frequent words (I’m filtering by subreddit; the examples are “news, politics and worldnews” or “programming”) as well as their occurrences for each year; actually, I want to store the “relative” occurrence, which is called “ratio” above: for each word I divide its number of occurrences by the total number of words that year (this is needed because the number of comments grows significantly year over year).

Now we can actually calculate the words of the year:

SELECT
    groupArray(w) as words,
    y + 1 as year
FROM
(
    SELECT
        w,
        CAST((y - 1) AS UInt16) AS y,
        ratio AS a_ratio
    FROM word_by_year_news
    WHERE ratio > 0.00001
) AS a
ALL INNER JOIN
(
    SELECT
        w,
        y,
        ratio AS b_ratio
    FROM word_by_year_news
    WHERE ratio > 0.00001
) AS b USING (w, y)
WHERE (y > 0) AND (a_ratio / b_ratio > 3)
GROUP BY y
ORDER BY
    y
LIMIT 100;
10 rows in set. Elapsed: 0.232 sec. Processed 14.61 million rows, 118.82 MB (63.01 million rows/s., 512.29 MB/s.)

And the results are (here I’m grouping words for each year):

For “programming” subreddit:

┌─year─┬─words─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│ 2007 │ ['audio','patents','swing','phones','gmail','opera','devices','phone','adobe','vista','backup','mercurial','mobile','passwords','scala','license','copyright','licenses','photoshop'] │
│ 2008 │ ['webkit','twitter','teacher','android','itunes']                                                                                                                                     │
│ 2009 │ ['downvotes','upvote','drupal','android','upvoted']                                                                                                                                   │
│ 2010 │ ['codecs','imgur','floppy','codec','adobe','android']                                                                                                                                 │
│ 2011 │ ['scala','currency','println']                                                                                                                                                        │
│ 2013 │ ['voting','maven']                                                                                                                                                                    │
│ 2014 │ ['compose','xamarin','markdown','scrum','comic']                                                                                                                                      │
│ 2015 │ ['china','sourceforge','subscription','chinese','kotlin']                                                                                                                             │
│ 2016 │ ['systemd','gitlab','autotldr']                                                                                                                                                       │
│ 2017 │ ['offices','electron','vscode','blockchain','flash','collision']                                                                                                                      │
└──────┴───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘

For news subreddit:

┌─year─┬─words──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│ 2008 │ ['michigan','delegates','obama','alaska','georgia','russians','hamas','biden','hussein','barack','elitist','mccain']                                                                                                                                       │
│ 2009 │ ['stimulus','reform','medicare','franken','healthcare','payer','insurance','downvotes','hospitals','patients','option','health']                                                                                                                           │
│ 2010 │ ['blockade','arizona']                                                                                                                                                                                                                                     │
│ 2011 │ ['protests','occupy','romney','weiner','protesters']                                                                                                                                                                                                       │
│ 2012 │ ['santorum','returns','martin','obamacare','romney']                                                                                                                                                                                                       │
│ 2013 │ ['boston','chemical','surveillance']                                                                                                                                                                                                                       │
│ 2014 │ ['plane','poland','radar','subreddits','palestinians','putin','submission','russia','automoderator','compose','rockets','palestinian','hamas','virus','removal','russians','russian']                                                                      │
│ 2015 │ ['refugees','refugee','sanders','debates','hillary','removal','participating','removed','greece','clinton']                                                                                                                                                │
│ 2016 │ ['morons','emails','opponent','establishment','trump','reply','speeches','presidency','clintons','electoral','donald','trumps','downvote','november','subreddit','shill','domain','johnson','classified','bernie','nominee','users','returns','primaries','foundation','voters','autotldr','clinton','email','supporter','election','feedback','clever','leaks','accuse','candidate','upvote','rulesandregs','convention','conduct','uncommon','server','trolls','supporters','hillary'] │
│ 2017 │ ['impeached','downvotes','monitored','accusations','alabama','violation','treason','nazis','index','submit','impeachment','troll','collusion','bannon','neutrality','permanent','insults','violations']                                                    │
└──────┴────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘

Conclusion

ClickHouse is a great massively parallel analytical system. It is extremely efficient and can potentially (with some hacks) be used as a main backend database powering a public API gateway serving both realtime and analytical queries. At the same time, it was not originally designed that way. Let me know in the comments if you are using ClickHouse for this or similar projects.


Photo by John Baker on Unsplash

by Alexander Rubin at January 14, 2019 04:34 PM

January 13, 2019

Valeriy Kravchuk

Understanding Status of MariaDB Server JIRA Issues

In my previous blog post on MariaDB's JIRA for MySQL users who are familiar with the MySQL bugs database (but may be new to JIRA) I presented some details about the statuses that JIRA issues may have. There is no one-to-one correspondence with MySQL bug statuses, which I once described in detail here. In the case of MariaDB Server bugs ("JIRA issues") one may have to check not only the "Status" field, but also the "Resolution" field and even the "Labels" field to quickly understand what the real status is and what MariaDB engineers decided or are waiting for. So, I think some additional clarifications may help MySQL users who check or report MariaDB bugs as well.

Let me present the details of this status correspondence in a simple table, where the first column contains MySQL's bug status, while the 3 other columns contain the content of the corresponding MariaDB Server JIRA issue's fields, "Status", "Resolution" and "Labels". There is also a "Comment" column with some explanation of what else is usually done in a JIRA issue when it gets this set of values defining its status, or what this may mean in the MySQL bugs database, etc. The most important MySQL bug statuses are taken from this post of mine (there are more of them, but the others are rarely used, especially since real work on bugs was moved into the internal bugs database by Oracle, or they were removed since that post, as happened to "To be fixed later").

MySQL Bug Status | MariaDB JIRA Status | MariaDB JIRA Resolution | MariaDB JIRA Label | Comment
Open | OPEN | Unresolved | | Typical status for just reported bug
Closed | CLOSED | Fixed | | You should see list of versions that got the fix in the Fix Version/s field
Duplicate | CLOSED | Duplicate | | So, in MariaDB it's "closed as a duplicate"
Analyzing | OPEN | Unresolved | | Usually bug is assigned when some engineer is working on it, including analysis stage
Verified | CONFIRMED | Unresolved | | CONFIRMED bugs are usually assigned in JIRA while in MySQL "Verified" bugs are usually unassigned
Won't fix | CLOSED | Won't Fix | | Usually remains assigned
Can't repeat | CLOSED | Cannot reproduce | | Unlike in MySQL, usually means that both engineer and bug reporter are not able to reproduce this
No Feedback | CLOSED | Incomplete | need_feedback | As in MySQL, bug should stay with "need_feedback" label for some time before it's closed as incomplete
Need Feedback | OPEN | Unresolved | need_feedback | Usually in the last comment in the bug you can find out what kind of feedback is required. No automatic setting to "No Feedback" in 30 days
Not a Bug | CLOSED | Not a Bug | |
Unsupported | CLOSED | Won't Fix | | There is no special "Unsupported" status in MariaDB. Most likely when there is a reason NOT to fix it's stated in the comment.

In the table above you can click on some links to see the list of MariaDB bugs with the status discussed in that table row. This is how I am going to use this post from now on, as a quick search starting point :) It will also be mentioned on one of the slides of my upcoming FOSDEM 2019 talk.
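If you prefer to query JIRA directly instead of clicking the links, the same lists can be produced with simple JQL filters. For example, something like this should give the rough equivalent of MySQL's "Verified" bugs (a sketch, assuming the MDEV project of MariaDB Server):

project = MDEV AND status = Confirmed AND resolution = Unresolved ORDER BY updated DESC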

by Valeriy Kravchuk (noreply@blogger.com) at January 13, 2019 06:03 PM

Kurt von Finck

Bruce Shark

Dave Thompson

catty _big

https://youtu.be/ttrNjqT9KJc

by mneptok at January 13, 2019 06:00 AM

January 11, 2019

Peter Zaitsev

AWS Aurora MySQL – HA, DR, and Durability Explained in Simple Terms

It’s a few weeks after AWS re:Invent 2018 and my head is still spinning from all of the information released at this year’s conference. This year I was able to enjoy a few sessions focused on Aurora deep dives. In fact, I walked away from the conference realizing that my own understanding of High Availability (HA), Disaster Recovery (DR), and Durability in Aurora had been off for quite a while. Consequently, I decided to put this blog out there, both to collect the ideas in one place for myself, and to share them in general. Unlike some of our previous blogs, I’m not focused on analyzing Aurora performance or examining the architecture behind Aurora. Instead, I want to focus on how HA, DR, and Durability are defined and implemented within the Aurora ecosystem.  We’ll get just deep enough into the weeds to be able to examine these capabilities alone.

Introducing the Aurora Storage Engine (diagram)

Aurora MySQL – What is it?

We’ll start with a simplified discussion of what Aurora is from a very high level.  In its simplest description, Aurora MySQL is made up of a MySQL-compatible compute layer and a multi-AZ (multi availability zone) storage layer. In the context of an HA discussion, it is important to start at this level, so we understand the redundancy that is built into the platform versus what is optional, or configurable.

Aurora Storage

The Aurora Storage layer presents a volume to the compute layer. This volume is built out in 10GB increments called protection groups.  Each protection group is built from six storage nodes, two from each of three availability zones (AZs).  These are represented in the diagram above in green.  When the compute layer—represented in blue—sends a write I/O to the storage layer, the data gets replicated six times across three AZs.

Durable by Default

In addition to the six-way replication, Aurora employs a 4-of-6 quorum for all write operations. This means that for each commit that happens at the database compute layer, the database node waits until it receives write acknowledgment from at least four out of six storage nodes. By receiving acknowledgment from four storage nodes, we know that the write has been saved in at least two AZs.  The storage layer itself has intelligence built-in to ensure that each of the six storage nodes has a copy of the data. This does not require any interaction with the compute tier. By ensuring that there are always at least four copies of data, across at least two datacenters (AZs), and ensuring that the storage nodes are self-healing and always maintain six copies, it can be said that the Aurora Storage platform has the characteristic of Durable by Default.  The Aurora storage architecture is the same no matter how large or small your Aurora compute architecture is.

One might think that waiting to receive four acknowledgments represents a lot of I/O time and is therefore an expensive write operation.  However, Aurora database nodes do not behave the way a typical MySQL database instance would. Some of the round-trip execution time is mitigated by the way in which Aurora MySQL nodes write transactions to disk. For more information on exactly how this works, check out Amazon Senior Engineering Manager, Kamal Gupta’s deep-dive into Aurora MySQL from AWS re:Invent 2018.

HA and DR Options

While durability can be said to be a default characteristic to the platform, HA and DR are configurable capabilities. Let’s take a look at some of the HA and DR options available. Aurora databases are deployed as members of an Aurora DB Cluster. The cluster configuration is fairly flexible. Database nodes are given the roles of either Writer or Reader. In most cases, there will only be one Writer node. The Reader nodes are known as Aurora Replicas. A single Aurora Cluster may contain up to 15 Aurora Replicas. We’ll discuss a few common configurations and the associated levels of HA and DR which they provide. This is only a sample of possible configurations: it is not meant to represent an exhaustive list of the possible configuration options available on the Aurora platform.

Single-AZ, Single Instance Deployment

The most basic implementation of Aurora is a single compute instance in a single availability zone. The compute instance is monitored by the Aurora Cluster service and will be restarted if the database instance or compute VM has a failure. In this architecture, there is no redundancy at the compute level. Therefore, there is no database level HA or DR. The storage tier provides the same high level of durability described in the sections above. The image below is a view of what this configuration looks like in the AWS Console.

Single-AZ, Multi-Instance

HA can be added to a basic Aurora implementation by adding an Aurora Replica.  We increase our HA level by adding Aurora Replicas within the same AZ. If desired, the Aurora Replicas can be used to also service some of the read traffic for the Aurora Cluster. This configuration cannot be said to provide DR because there are no database nodes outside the single datacenter or AZ. If that datacenter were to fail, then database availability would be lost until it was manually restored in another datacenter (AZ). It’s important to note that while Aurora has a lot of built-in automation, you will only benefit from that automation if your base configuration facilitates a path for the automation to follow. If you have a single-AZ base deployment, then you will not have the benefit of automated Multi-AZ availability. However, as in the previous case, durability remains the same. Again, durability is a characteristic of the storage layer. The image below is a view of what this configuration looks like in the AWS Console. Note that the Writer and Reader are in the same AZ.
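For illustration, adding an Aurora Replica to an existing cluster can be done from the AWS CLI roughly like this (a sketch only; the cluster name, instance identifier, instance class, and availability zone are all made up):

aws rds create-db-instance \
    --db-instance-identifier my-aurora-replica-1 \
    --db-cluster-identifier my-aurora-cluster \
    --engine aurora-mysql \
    --db-instance-class db.r4.large \
    --availability-zone us-east-1a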

Multi-AZ Options

Building on our previous example, we can increase our level of HA and add partial DR capabilities to the configuration by adding more Aurora Replicas. At this point we will add one additional replica in the same AZ, bringing the local AZ replica count to three database instances. We will also add one replica in each of the two remaining regional AZs. Aurora provides the option to configure automated failover priority for the Aurora Replicas. Choosing your failover priority is best defined by the individual business needs. That said, one way to define the priority might be to set the first failover to the local-AZ replicas, and subsequent failover priority to the replicas in the other AZs. It is important to remember that AZs within a region are physical datacenters located within the same metro area. This configuration will provide protection for a disaster localized to the datacenter. It will not, however, provide protection for a city-wide disaster. The image below is a view of what this configuration looks like in the AWS Console. Note that we now have two Readers in the same AZ as the Writer and two Readers in two other AZs.

Cross-Region Options

The three configuration types we’ve discussed up to this point represent configuration options available within an AZ or metro area. There are also options available for cross-region replication in the form of both logical and physical replication.

Logical Replication

Aurora supports replication to up to five additional regions with logical replication.  It is important to note that, depending on the workload, logical replication across regions can be notably susceptible to replication lag.

Physical Replication

One of the many announcements to come out of re:Invent 2018 is a product called Aurora Global Database. This is Aurora’s implementation of cross-region physical replication. Amazon’s published details on the solution indicate that it is storage level replication implemented on dedicated cross-region infrastructure with sub-second latency. In general terms, the idea behind a cross-region architecture is that the second region could be an exact duplicate of the primary region. This means that the primary region can have up to 15 Aurora Replicas and the secondary region can also have up to 15 Aurora Replicas. There is one database instance in the secondary region in the role of writer for that region. This instance can be configured to take over as the master for both regions in the case of a regional failure. In this scenario the secondary region becomes primary, and the writer in that region becomes the primary database writer. This configuration provides protection in the case of a regional disaster. It’s going to take some time to test this, but at the moment this architecture appears to provide the most comprehensive combination of Durability, HA, and DR. The trade-offs have yet to be thoroughly explored.

Multi-Master Options

Amazon is in the process of building out a new capability called Aurora Multi-Master. Currently, this feature is in preview phase and has not been released for general availability. While there were a lot of talks at re:Invent 2018 which highlighted some of the components of this feature, there is still no affirmative date for release. Early analysis points to the feature being localized to the AZ. It is not known if cross-region Multi-Master will be supported, but it seems unlikely.

Summary

As a post re:Invent takeaway, what I learned was that there is an Aurora configuration to fit almost any workload that requires strong performance behind it. Not all heavy workloads also demand HA and DR. If this describes one of your workloads, then there is an Aurora configuration that fits your needs. On the flip side, it is also important to remember that while data durability is an intrinsic quality of Aurora, HA and DR are not. These are completely configurable. This means that the Aurora architect in your organization must put thought and due diligence into the way they design your Aurora deployment. While we all need to be conscious of costs, don’t let cost consciousness become a blinder to reality. Just because your environment is running in Aurora does not mean you automatically have HA and DR for your database. In Aurora, HA and DR are configuration options, and just like the on-premise world, viable HA and DR have additional costs associated with them.

by Brian Walters at January 11, 2019 07:53 PM

January 10, 2019

Peter Zaitsev

Percona Backup for MongoDB 0.2.0-Alpha Is Now Available

Percona Backup for MongoDB

Percona announces the first public release of Percona Backup for MongoDB 0.2.0-Alpha on January 10, 2019.

Percona Backup for MongoDB is a distributed, low-impact solution for consistent backups of MongoDB sharded clusters and replica sets. This is a tool for creating consistent backups across a MongoDB sharded cluster (or a single replica set), and for restoring those backups to a specific point in time. Percona Backup for MongoDB uses a distributed client/server architecture to perform backup/restore actions. The project was inspired by (and intends to replace) the Percona-Lab/mongodb_consistent_backup tool.

This release features:

  • Consistent backup of sharded clusters
  • Compression of oplogs and logical backups
  • Backup and restore from local files
  • Backup to S3
  • Running the backup on a single replica set using the safest node (preferably non-Primary or hidden nodes with the lowest replication priority and smallest replication lag)

Future releases will include:

Percona Backup for MongoDB supports Percona Server for MongoDB or MongoDB Community Server version 3.6 or higher with MongoDB replication enabled. Binaries for the supported platforms as well as the tarball with source code are available from the GitHub repository (https://github.com/percona/percona-backup-mongodb/releases/tag/v0.2.0). For more information about Percona Backup for MongoDB and the installation steps, see this README file.

Note: Percona doesn't recommend this release for production use, and its API and configuration fields are likely to change in the future. It does not include any API-level security. You are welcome to report any bugs you encounter in our bug tracking system.

Percona Backup for MongoDB process and interactions between key components.

 

by Borys Belinsky at January 10, 2019 08:14 PM

ProxySQL 1.4.13 and Updated proxysql-admin Tool

ProxySQL 1.4.13, released by ProxySQL, is now available for download in the Percona Repository along with an updated version of Percona’s proxysql-admin tool.

ProxySQL is a high-performance proxy, currently for MySQL and its forks (like Percona Server for MySQL and MariaDB). It acts as an intermediary for client requests seeking resources from the database. René Cannaò created ProxySQL for DBAs as a means of solving complex replication topology issues.

The ProxySQL 1.4.13 source and binary packages available at https://percona.com/downloads/proxysql include ProxySQL Admin, a tool developed by Percona to configure Percona XtraDB Cluster nodes into ProxySQL. Docker images for release 1.4.13 are available as well: https://hub.docker.com/r/percona/proxysql/. You can download the original ProxySQL from https://github.com/sysown/proxysql/releases. GitHub hosts the documentation in wiki format.

Improvements

  • PSQLADM-53: Improved validation when --write-node is used with proxysql-admin
  • PSQLADM-122: galera/node monitor log now reports the count of async slave nodes that are online.

Bugs Fixed

  • PSQLADM-124: If the scheduler was configured with a --config-file pointing to a file that doesn't exist, ERR_FILE was set to /dev/null. As a result, the user would not be notified about the error.
  • PSQLADM-126: proxysql-admin could show an error when --syncusers was used and the mysql_users table was empty.
  • PSQLADM-127: proxysql_galera_checker could corrupt the scheduler configuration after restart
  • PSQLADM-129: Stopping or restarting ProxySQL can lead to multiple instances of proxysql_galera_checker running at the same time

ProxySQL is available under Open Source license GPLv3.

by Borys Belinsky at January 10, 2019 05:01 PM

PostgreSQL Updatable Views: Performing Schema Updates With Minimal Downtime

postgres updatable views

Recently, one of our customers asked us how to minimize downtime when upgrading the database structure with changes that are not backwards-compatible. It's an interesting question and I would like to visit some alternatives here. I will use PostgreSQL for this series of posts and walk through updatable views, INSTEAD OF Triggers, and the Rule System. Later, we'll discuss alternatives available for other databases like MySQL.

This first post will give an overview of the problem and also the first implementation of the solution in PostgreSQL using updatable Views.

The Motivation

Software is like a living organism and, as such, it evolves. It's not surprising that database schemas also evolve, and this brings us to a problem: how do we minimize downtime when performing upgrades? Or, going further, is it possible to upgrade them without activating maintenance mode and making the service unavailable for our customers?

Let's say that we want to push out update 2.0. It's a major update that includes application code changes and changes to the database, such as altered tables, dropped columns, new tables and so on. Checking the changelog, we notice that most of the database changes are backwards-compatible, but a few modified tables are not, so we can't just push out the new database changes without breaking some functionality in the existing codebase. To avoid triggering errors while we upgrade the database, we need to shut down the application servers, update the database, update the codebase, and then bring the servers back up again. That means that we need an unwanted maintenance window!

As per our definition of the problem, we want to get to the point where we don’t have to use this maintenance window, a point where the old and new codebase could coexist for a period of time while we upgrade the system. One solution is to not make changes that the current codebase can’t handle, but, as you may have already assumed, it isn’t really an option when we are constantly trying to optimize and improve our databases. Another option, then, would be to use PostgreSQL updatable views.

Updatable Views

PostgreSQL introduced automatically updatable views in 9.3. The documentation[1] says that simple views are automatically updatable and the system will allow INSERT, UPDATE or DELETE statements to be used on the view in the same way as on a regular table. A view is automatically updatable if it satisfies all of the following conditions:

  • The view must have exactly one entry in its FROM list, which must be a table or another updatable view.
  • The view definition must not contain WITH, DISTINCT, GROUP BY, HAVING, LIMIT, or OFFSET clauses at the top level.
  • The view definition must not contain set operations (UNION, INTERSECT or EXCEPT) at the top level.
  • The view’s select list must not contain any aggregates, window functions, or set-returning functions.

Note that the idea is to provide a simple mechanism that helps when using views: if the view is automatically updatable, the system will convert any INSERT, UPDATE or DELETE statement on the view into the corresponding statement on the underlying base table. This can also be used to increase security granularity, giving us the power to define privileges that operate at the view level. If the view uses a WHERE clause, we can add the CHECK OPTION to prevent the user from UPDATE-ing or INSERT-ing rows that are not in the scope of the view. For example, let's say we have a view created to limit the user to view records from a specific country. If the user changes the country of any record, those records would disappear from the view. The CHECK OPTION can help to prevent this from happening. I recommend reading the documentation for more information about how views work in PostgreSQL. A minimal sketch of that country scenario follows.
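To make this more concrete, here is a minimal sketch of the country scenario (the customer table, the customer_br view and the 'BR' filter are made up for illustration and are not part of the example used later in this post):

CREATE TABLE customer (
    id      INTEGER PRIMARY KEY,
    name    VARCHAR(100) NOT NULL,
    country CHAR(2) NOT NULL
);

-- A simple, automatically updatable view restricted to one country:
CREATE VIEW customer_br AS
    SELECT id, name, country
      FROM customer
     WHERE country = 'BR'
      WITH CHECK OPTION;

-- Works: the new row satisfies the view's WHERE clause.
INSERT INTO customer_br (id, name, country) VALUES (1, 'user_br', 'BR');

-- Rejected: the updated row would fall outside the view, so the
-- CHECK OPTION raises an error instead of letting it silently disappear.
UPDATE customer_br SET country = 'US' WHERE id = 1;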

Implementation

Using updatable views makes the implementation as simple as creating views. For our example I will use the below table:

test=# CREATE TABLE t (id INTEGER PRIMARY KEY, name VARCHAR(100) NOT NULL, password VARCHAR(300) NOT NULL, date_created TIMESTAMP NOT NULL DEFAULT now());
CREATE TABLE
test=# INSERT INTO t(id, name, password) VALUES (1, 'user_1', 'pwd_1'), (2, 'user_2','pwd_2'),(3,'user_3','pwd_3'),(4,'user_4','pwd_4'),(5,'user_5','pwd_5');
INSERT 0 5
test=# SELECT * FROM t;
id | name | password | date_created
----+--------+----------+----------------------------
1 | user_1 | pwd_1 | 2018-12-27 07:50:39.562455
2 | user_2 | pwd_2 | 2018-12-27 07:50:39.562455
3 | user_3 | pwd_3 | 2018-12-27 07:50:39.562455
4 | user_4 | pwd_4 | 2018-12-27 07:50:39.562455
5 | user_5 | pwd_5 | 2018-12-27 07:50:39.562455
(5 rows)

We then changed the schema, renaming the column password to pwd and date_created to dt_created, and adding two more columns, pwd_salt and comment. The added columns are not a real problem because they can be either nullable or have a default value, but the column renames are a problem. The changes are:

test=# create schema v_10;
CREATE SCHEMA
test=# CREATE VIEW v_10.t AS SELECT id, name, password AS password, date_created AS date_created FROM public.t;
CREATE VIEW
test=# ALTER TABLE public.t RENAME COLUMN password TO pwd;
ALTER TABLE
test=# ALTER TABLE public.t RENAME COLUMN date_created TO dt_created;
ALTER TABLE
test=# ALTER TABLE public.t ADD COLUMN pwd_salt VARCHAR(100);
ALTER TABLE
test=# ALTER TABLE public.t ADD COLUMN comment VARCHAR(500);
ALTER TABLE

To make sure our application works properly, we've defined that the tables live in a specific main schema, which in this example is the public schema, and the views live in versioned schemas. In this case, if a change in one specific version needs a view to guarantee backwards-compatibility, we just create the view inside the versioned schema and apply the changes to the table in the main schema. The application will always set the "search_path" to "versioned_schema,main_schema", which is "v_10, public" in this example:

test=# SET search_path TO v_10, public;
SET
test=# SELECT * FROM t;
id | name | password | date_created
----+--------+----------+----------------------------
1 | user_1 | pwd_1 | 2018-12-27 07:50:39.562455
2 | user_2 | pwd_2 | 2018-12-27 07:50:39.562455
3 | user_3 | pwd_3 | 2018-12-27 07:50:39.562455
4 | user_4 | pwd_4 | 2018-12-27 07:50:39.562455
5 | user_5 | pwd_5 | 2018-12-27 07:50:39.562455
(5 rows)
test=# select * from public.t;
id | name | pwd | dt_created | pwd_salt | comment
----+--------+-------+----------------------------+----------+---------
1 | user_1 | pwd_1 | 2018-12-27 07:50:39.562455 | |
2 | user_2 | pwd_2 | 2018-12-27 07:50:39.562455 | |
3 | user_3 | pwd_3 | 2018-12-27 07:50:39.562455 | |
4 | user_4 | pwd_4 | 2018-12-27 07:50:39.562455 | |
5 | user_5 | pwd_5 | 2018-12-27 07:50:39.562455 | |
(5 rows)

As we can see, the application still sees the old schema, but does this work? What if someone updates the password of ID #3? Let’s check:

test=# UPDATE t SET password = 'new_pwd_3' WHERE id = 3;
UPDATE 1
test=# SELECT * FROM t;
id | name | password | date_created
----+--------+-----------+----------------------------
1 | user_1 | pwd_1 | 2018-12-27 07:50:39.562455
2 | user_2 | pwd_2 | 2018-12-27 07:50:39.562455
4 | user_4 | pwd_4 | 2018-12-27 07:50:39.562455
5 | user_5 | pwd_5 | 2018-12-27 07:50:39.562455
3 | user_3 | new_pwd_3 | 2018-12-27 07:50:39.562455
(5 rows)
test=# SELECT * FROM public.t;
id | name | pwd | dt_created | pwd_salt | comment
----+--------+-----------+----------------------------+----------+---------
1 | user_1 | pwd_1 | 2018-12-27 07:50:39.562455 | |
2 | user_2 | pwd_2 | 2018-12-27 07:50:39.562455 | |
4 | user_4 | pwd_4 | 2018-12-27 07:50:39.562455 | |
5 | user_5 | pwd_5 | 2018-12-27 07:50:39.562455 | |
3 | user_3 | new_pwd_3 | 2018-12-27 07:50:39.562455 | |
(5 rows)
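An INSERT issued by the old codebase through the view behaves the same way; here is a small additional sketch (the sixth user is made up for illustration, and I assume the omitted columns fall back to the base table defaults):

SET search_path TO v_10, public;  -- already set above, repeated so the snippet is self-contained

-- The old codebase still uses the original column names exposed by the view:
INSERT INTO t (id, name, password) VALUES (6, 'user_6', 'pwd_6');

-- The row lands in public.t: pwd is populated, dt_created should be filled by the
-- base table default (now()), and the new columns (pwd_salt, comment) stay NULL.
SELECT id, name, pwd, dt_created, pwd_salt, comment FROM public.t WHERE id = 6;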

As we can see, the updatable view worked just like a charm! The new and old application codebases can coexist and work together while we roll out our upgrades. There are some restrictions, as explained in the documentation, like having only one table or view in the FROM list, but for its simplicity, the updatable view does a great job. What about more complex cases where we need to split or join tables? We will discuss these in future articles and show how we can solve them with both TRIGGERS and the PostgreSQL Rule System.

References

[1] https://www.postgresql.org/docs/current/sql-createview.html


Photo by Egor Kamelev from Pexels

by Charly Batista at January 10, 2019 09:35 AM

January 09, 2019

Peter Zaitsev

Percona Toolkit 3.0.13 Is Now Available

percona toolkit

Percona announces the release of Percona Toolkit 3.0.13 on January 9, 2019.

Percona Toolkit is a collection of advanced open source command-line tools, developed and used by the Percona technical staff, that are engineered to perform a variety of MySQL®, MongoDB® and system tasks that are too difficult or complex to perform manually. With over 1,000,000 downloads, Percona Toolkit supports Percona Server for MySQL, MySQL®, MariaDB®, Percona Server for MongoDB and MongoDB.

Percona Toolkit, like all Percona software, is free and open source. You can download packages from the website or install from official repositories.

This release includes the following changes:

Bug fixes:

  • PT-1673: pt-show-grants was incompatible with MariaDB 10+ (thanks Tim Birkett)
  • PT-1638: pt-online-schema-change was erroneously treating MariaDB 10.x as MySQL 8.0 and refusing to work with it, in order to stay out of the scope of upstream bug #89441.
  • PT-1616: pt-table-checksum failed to resume on large tables with binary strings containing invalid UTF-8 characters.
  • PT-1573: pt-query-digest didn’t work when the log_timestamps = SYSTEM option was set in my.cnf.
  • PT-157: Specifying a non-primary key index with the ‘i’ part of the --source argument made pt-archiver ignore the --primary-key-only option.

Improvements:

  • PT-1340: pt-stalk no longer calls the mysqladmin debug command by default, to avoid flooding the error log. Setting the CMD_MYSQLADMIN="mysqladmin debug" environment variable reverts pt-stalk to the previous behavior.
  • PT-1637: A new --fail-on-stopped-replication option allows pt-table-checksum to detect failing slave nodes, as illustrated in the sketch below.
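For instance, a checksum run that should abort when replication is stopped on a replica might look roughly like this (the host, credentials and checksum table below are placeholders or tool defaults, not taken from this release announcement):

# Hypothetical invocation: checksum all tables through the master and abort
# if replication is found to be stopped on any slave (new option in 3.0.13).
pt-table-checksum \
  --replicate=percona.checksums \
  --fail-on-stopped-replication \
  h=master.example.com,u=checksum_user,p=secret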

Help us improve our software quality by reporting any bugs you encounter using our bug tracking system.

by Dmitriy Kostiuk at January 09, 2019 04:49 PM

Amazon Aurora Serverless – The Sleeping Beauty

Amazon RDS Aurora Serverless activation times

One of the most exciting features Amazon Aurora Serverless brings to the table is its ability to go to sleep (pause) when idle. This is a fantastic feature for development and test environments. You get access to a powerful database to run tests quickly, but it goes easy on your wallet as you only pay for storage when the instance is paused.

You can configure Amazon RDS Aurora Serverless to go to sleep after a specified period of time. This can be set to anywhere between five minutes and 24 hours.

configure Amazon RDS Aurora Serverless sleep time
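The same setting can also be applied outside of the console; a rough AWS CLI sketch (the cluster identifier and capacity values are placeholders, not the values used in this post):

# Hypothetical example: pause the serverless cluster after 10 minutes of
# complete inactivity, scaling between 2 and 64 ACU while it is active.
aws rds modify-db-cluster \
  --db-cluster-identifier serverless-test \
  --scaling-configuration MinCapacity=2,MaxCapacity=64,AutoPause=true,SecondsUntilAutoPause=600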

For this feature to work, however, inactivity has to be complete. If you have so much as a single query or even maintain an idle open connection, Amazon Aurora Serverless will not be able to pause.

This means, for example, that pretty much any monitoring you may have enabled, including our own Percona Monitoring and Management (PMM), will prevent the instance from pausing. It would be great if Amazon RDS Aurora Serverless would allow us to specify user accounts to ignore, or additional service endpoints which should not prevent it from pausing, but currently you need to get by without such monitoring and diagnostic tools, or else enable them only for the duration of the test run.

If you’re using Amazon Aurora Serverless to back very low traffic applications, you might consider disabling the automatic pause function, since waking up currently takes quite a while. Otherwise, your users should be prepared for a 30+ second wait while Amazon Aurora Serverless activates.

Such a long activation time means you need to be mindful of the timeout configuration in your test/dev scripts so you do not have to deal with sporadic failures. You can also use something like the mysqladmin ping command to activate the instance before your test run, as in the sketch below.
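A rough sketch of such a warm-up step (the endpoint, credentials and test command are placeholders, and the two-minute budget is only an assumption based on the activation times discussed in this post):

#!/bin/bash
# Poke the Aurora Serverless endpoint until it resumes, then start the tests.
# Gives up after 24 * 5 = 120 seconds so the run fails fast instead of hanging.
HOST="serverless-test.cluster-XXXX.us-east-2.rds.amazonaws.com"   # placeholder endpoint

for attempt in $(seq 1 24); do
    if mysqladmin ping -h "$HOST" -u user -ppassword --silent; then
        echo "Instance resumed after roughly $(( (attempt - 1) * 5 )) seconds"
        exec ./run-tests.sh   # placeholder for the actual test command
    fi
    sleep 5
done

echo "Instance did not resume within 120 seconds" >&2
exit 1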

Some activation experiments

Let’s now take a closer look at Amazon RDS Aurora Serverless activation times. These times are measured for MySQL 5.6 based Aurora Serverless, the only edition currently available. I expect the numbers could be different in other editions.

Amazon RDS Aurora Serverless activation times

I measured the time it takes to run a trivial query (SELECT 1) after the instance goes to sleep. You’ll see I manually scaled the Amazon RDS Aurora Serverless instance to a desired capacity in ACU (Aurora Compute Units), and then had the script wait for six minutes to allow for pause to happen before running the query. The test was performed 12 times and the Min/Max/Avg times of these test runs for different settings of ACU are presented above.

You can see there is some variation between min and max times. I would expect to have even higher outliers, so plan for an activation time of more than a minute as a worst case scenario.

Also note that there is an interesting difference in the activation time between instance sizes. While in my tests the smallest possible size (2 ACU) consistently took longer to activate compared to the medium size (8 ACU), the even bigger size (64 ACU) was the slowest of all.

So make no assumptions about how long it will take an instance of a given size to wake up with your workload; rather, test it if this is an important consideration for you.

In some (rare) cases I also observed some internal timeouts during the resume process:

[root@ip-172-31-16-160 serverless]# mysqladmin ping -h serverless-test.cluster-XXXX.us-east-2.rds.amazonaws.com -u user -ppassword
mysqladmin: connect to server at 'serverless-test.cluster-XXXX.us-east-2.rds.amazonaws.com' failed
error: 'Database was unable to resume within timeout period.'

What about Autoscaling?

Finally, you may wonder: how does Amazon Aurora Serverless pausing play with Amazon Aurora Serverless Autoscaling?

In my tests, I observed that resume always restores the instance size to the same ACU as it was before it was paused. However, this is where pausing configuration matters a great deal. According to this document, Amazon Aurora Serverless will not scale down more frequently than once per 900 seconds. While the document does not clarify over what period of time the conditions initiating scale down – cpu usage, connection usage etc – have to be met for scale down to be triggered, I can see that if the instance is idle for five minutes the scale down is not performed – it is just put to sleep.

At the same time, if you change this default five minute period to a longer time, the idle instance will be automatically scaled down a notch every 900 seconds before it finally goes to sleep. Consequently, when it is awakened it will not be at the last stage at which the load was applied, but instead at the stage it was at when it was scaled down. Also, scaling down is considered an event by itself, which resets the idle counter and delays the pause. For example: if the initial instance scale is 8, and the pause timer is set to 1h, it takes 1h 30 minutes for the pause to actually happen – 30 minutes to do scale down twice, plus 1 hour at the minimum size for pause to trigger

Here is a graph to illustrate this:

Amazon Aurora Serverless scale down timings

This also shows that when the load is re-applied at about 13:47, it recovers to the last number of ACU it had before the pause.

This means that a pause time of more than 15 minutes makes the pause behavior substantially different to the default.

Summary

  • Amazon Aurora Serverless automatic pause is a great fit for test/dev environments.
  • Resume time is relatively long and can reach as much as one minute.
  • Consider disabling automatic pausing for low traffic production applications, or at least let your users know they need to wait when they wake up the application.
  • Pause and Resume behavior is different in practice for a pause timeout of more than 15 minutes. Sticking to the default 5 minutes is recommended unless you really know what you’re doing.

by Peter Zaitsev at January 09, 2019 12:59 PM

January 08, 2019

Peter Zaitsev

Percona Live 2019 Tracks

Percona Live 2019

Percona Live Open Source Database Conference 2019 in North America has moved to Austin, Texas: a cool place to be, and host to many big names in the tech space. Read what Dave Stokes, MySQL Community Manager for Oracle, has to say in favor of Austin.

If you need a conference ticket for Austin, put in your proposal now!

Those who are successful with their presentation or tutorial submissions will receive a pass to the full three days of the event. Closing date for the call for papers is Sunday, January 20.

Percona is adopting an industry trend by organizing the conference into 13 separate tracks with one Percona expert coordinating community input for each one. We believe subject-specific mini-committees of experts should provide better results than a single mega-committee covering everything.

The MySQL track is being led by Alkin Tezuysal, Senior Technical Manager

MariaDB is the responsibility of Sveta Smirnova, Principal Support Escalation Specialist.

MongoDB is being driven by Consultant Doug Duncan.

PostgreSQL is being pushed forward by Avinash Vallarapu, PostgreSQL Support Tech Lead

Other Open Source Databases: this important challenge has been handed to Senior Support Engineer Agustín Gallego.

Java Development for Open Source Databases might be of interest to developers and is being led by Rodrigo Trindade, Service Delivery Manager

The Kubernetes track is being headed by Mykola Marzhan, our Kubernetes Technical Lead.

Database Security and Compliance will be overseen by Denis Farar, General Counsel and VP of HR (but make no mistake, this is still a track where tech content is very welcome)

Automation & AI topics, at the leading edge of database technology challenges, are the responsibility of Max Bubenick, Platform Lead.

Observability & Monitoring talk selection will be led by Roma Novikov, Director of Platform Engineering – so get those PMM and other OS monitoring proposals at the ready!

Polyglot Persistence is in the hands of our Senior Software Engineer Ibrar Ahmed who is waiting to hear all about your experiences with cross-database applications, data exchange and how to meet the challenges of a hybrid database world.

Migration to Open Source Databases, a similar-but-different track full of challenges parallel to those of polyglot applications, is being watched over by Marco Tusa, Managing Consultant.

Business & Enterprise track will be driven by Brian Walters, Director of Solution Engineering who is keen to hear of your case studies and experiences of the impact of open source databases on your process and organizations.

Cloud is a special case, since it touches on virtually all aspects of open source database technology. If your talk has particular relevance to ‘cloud’ then please add this track with your submission. Similarly Innovative Technologies can apply across the board, and if you have something to share that is truly new, then add that to your track list. Those that are most exciting in the context of cloud or innovative in their approach may be selected for their cloud or innovation merit, whichever track they belong to.

Our track champions will engage with community experts to select papers and shape content. If you would like to contribute by taking on talk selection, please let me know.

New speakers, and those with less experience, are welcome; we are here to help, so first check out my community blog post with links to info and video workshops on how to put together a selection-worthy proposal. Even old hands might find some inspiration!

All in all, we think this is a great move, with the track champions contributing their passion, experience and knowledge of contemporary open source issues to the development of excellent content. Although we’re changing several things at once, no one gets a prize for standing still. We hope you’ll continue to support this great, open source, database-focused event and grow with us! Put a note in your diary to join us from May 28 – 30 in Austin, Texas.

Finally, if you would like to get in touch with any of our track champions, please let me know

by Lorraine Pocklington, Community Manager at January 08, 2019 08:23 PM

Upcoming Webinar Wed 1/9: Walkthrough of Percona Server MySQL 8.0

Walkthrough of Percona Server for MySQL 8.0

Please join Percona’s MySQL Product Manager, Tyler Duzan, as he presents Walkthrough of Percona Server MySQL 8.0 on Wednesday, January 9th at 11:00 AM PDT (UTC-7) / 2:00 PM (UTC-4).

Register Now

Our Percona Server for MySQL 8.0 software is the company’s free, enhanced, drop-in replacement for MySQL Community Edition. The software includes all of the great features in MySQL Community Edition 8.0. Additionally, it includes enterprise-class features from Percona made available free and open source. Thousands of enterprises trust Percona Server for MySQL to deliver excellent performance and reliability for their databases and mission-critical applications. Furthermore, our open source software meets their need for a mature, proven and cost-effective MySQL solution.

In sum, register for this webinar for a walkthrough of Percona Server for MySQL 8.0.

by Tyler Duzan at January 08, 2019 07:13 PM