Planet MariaDB

March 16, 2018

Peter Zaitsev

Percona Server for MongoDB 3.2.19-3.10 Is Now Available

Percona announces the release of Percona Server for MongoDB 3.2.19-3.10 on March 16, 2018. Download the latest version from the Percona web site or the Percona Software Repositories.

Percona Server for MongoDB is an enhanced, open source, and highly-scalable database that is a fully-compatible, drop-in replacement for MongoDB 3.2 Community Edition. It supports MongoDB 3.2 protocols and drivers.

Percona Server for MongoDB extends MongoDB Community Edition functionality by including the Percona Memory Engine and MongoRocks storage engine, as well as several enterprise-grade features. It requires no changes to MongoDB applications or code.

This release is based on MongoDB 3.2.19 and includes the following additional change:

  • #PSMDB-191: Fixed a bug in the MongoRocks engine initialization code which caused incorrect initialization of the _maxPrefix value. This could lead to reuse of a dropped prefix and to accidental removal of data from the collection using the reused prefix.

    Under certain conditions, data records could disappear at an arbitrary moment from collections or indexes created after a server restart.

    This could happen as the result of the following sequence of events:
    1. The user deletes one or more indexes or collections. These should be the ones using the maximum existing prefix values.
    2. The user shuts down the server before the MongoRocks compaction thread executes compactions of the deleted ranges.
    3. The user restarts the server and creates new collections. Due to the bug, those new collections and their indexes may get the same prefix values that were deleted but not yet compacted. The user inserts some data into the new collections.
    4. After the server restart, the MongoRocks compaction thread continues executing compactions of the deleted ranges, and this process may eventually delete data from the collections sharing prefixes with the deleted ranges.

The Percona Server for MongoDB 3.2.19-3.10 release notes are available in the official documentation.

by Dmitriy Kostiuk at March 16, 2018 06:03 PM

Percona Toolkit 3.0.8 Is Now Available

Percona announces the release of Percona Toolkit 3.0.8 on March 16, 2018.

Percona Toolkit is a collection of advanced open source command-line tools, developed and used by the Percona technical staff, that are engineered to perform a variety of MySQL®, MongoDB® and system tasks that are too difficult or complex to perform manually. With over 1,000,000 downloads, Percona Toolkit supports Percona Server for MySQL, MySQL, MariaDB®, Percona Server for MongoDB and MongoDB.

Percona Toolkit, like all Percona software, is free and open source. You can download packages from the website or install from official repositories.

This release includes the following changes:

New Features:

  • PT-1500: Added the --output=secure-slowlog option to pt-query-digest to replace queries in the output with their fingerprints. This makes it possible to sanitize a slow log in order to adhere to GDPR, since the slow query log can leak sensitive information because query WHERE clauses are included.
    As an example of how fingerprinting anonymizes queries, consider this UPDATE statement:
    UPDATE sbtest12 SET k='abc' WHERE id=12345
    after fingerprinting, it becomes:
    UPDATE sbtest? SET k=k? WHERE id=?
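
    A minimal usage sketch (the slow log path below is an assumption; adjust it for your server):

    pt-query-digest --output=secure-slowlog /var/log/mysql/mysql-slow.log > mysql-slow-fingerprinted.log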

Bug Fixes:

  • PT-1492:  pt-kill in version 3.0.7 ignores the value of the --busy-time option
  • PT-1503: The post-install script fails on VM due to improper UUID file detection

Help us improve our software quality by reporting any bugs you encounter using our bug tracking system.

by Borys Belinsky at March 16, 2018 03:31 PM

This Week in Data with Colin Charles 32: Stack Overflow Developer Survey, SCALE16x and Interesting MySQL 8 Version Numbers

Join Percona Chief Evangelist Colin Charles as he covers happenings, gives pointers and provides musings on the open source database community.

SCALE16x is over. Percona had a large showing — both Peter Zaitsev & myself had talks, and the booth in the expo hall saw Rick Golba, Marc Sherwood, and Dave Avery also pop by. The lead scanner suggests we had several hundred conversations 🙂 — read Dave Avery’s summary. My talk went well, though during Q&A the number of questions I got about MariaDB Server was quite interesting (considering it wasn’t part of my talk!). It is clear people are concerned about compatibility (because I spent close to 45 minutes after my talk answering Q&A outside too).

I got to catch up with Dave Stokes and asked him why there were version numbers being skipped in MySQL 8 (as noted in last week’s column). Now there’s a blog post explaining it: MySQL 8.0: It Goes to 11!. It has to do with version number alignment across the product line.

This week we saw something cool come out of Stack Overflow: their Developer Survey Results 2018. There were over 100,000 developers participating in this survey, a marked increase from 2017 when they only had 64,000.

About 66,264 respondents answered the question about what databases they use. MySQL is by far the most popular with 58.7% of the respondents saying they use it. This is followed by PostgreSQL getting 32.9%, MongoDB getting 25.9%, and MariaDB 13.4%. I’m surprised that Amazon RDS/Aurora got 5.1%. In 2017, the first year they introduced the database component, only 29,452 respondents participated, with 55.6% using MySQL, 26.5% using PostgreSQL, and 21% using MongoDB (MariaDB was not broken out last year).

When it came to the most “loved/dread/wanted” databases, apparently 62% of respondents loved PostgreSQL, with 58.8% loving Amazon RDS/Aurora, 55.1% MongoDB, 53.3% MariaDB Server, and 48.7% only loving MySQL. In terms of dread, 51.3% dread MySQL, while only 46.7% dread MariaDB; MongoDB has 44.9% dreading it, and PostgreSQL only 38%. As for the most wanted databases? 18.6% for MongoDB, 11.4% for PostgreSQL, 7.5% for MySQL, and 3.4% for MariaDB Server. It’s clear MongoDB topping the list ensures they have a lot to celebrate, as evidenced by this: Stack Overflow Research of 100,000 Developers Finds MongoDB is the Most Wanted Database. (In 2017, 60.8% loved PostgreSQL, 55% MongoDB, and 49.6% for MySQL; MySQL was the 3rd most dreaded database with 50.4%, followed by 45% for MongoDB, and 39.2% for PostgreSQL; as for the most wanted, MongoDB won with 20.8%, PostgreSQL got second at 11.5%, and MySQL 8.5%).

So if Stack Overflow surveys are an indication of usage, MySQL is still way more popular than anything else, including MariaDB Server regardless of its current distribution. Speaking of MariaDB, the MariaDB Foundation now accepts donations in cryptocurrencies.

MongoDB Evolved is something you should totally check out. I wish something like this existed for MySQL, since tonnes of people ask questions, e.g. "Does MySQL support transactions?", etc.


Link List

Upcoming appearances


I look forward to feedback/tips via e-mail or on Twitter @bytebot.


by Colin Charles at March 16, 2018 02:51 PM

Jean-Jerome Schmidt

Migrating from MySQL to PostgreSQL - What You Should Know

Whether migrating a database or project from MySQL to PostgreSQL, or choosing PostgreSQL for a new project with only MySQL knowledge, there are a few things to know about PostgreSQL and the differences between the two database systems.

PostgreSQL is a fully open source database system released under its own license, the PostgreSQL License, which is described as "a liberal Open Source license, similar to the BSD or MIT licenses.” This has allowed The PostgreSQL Global Development Group (commonly referred to as PGDG), who develops and maintains the open source project, to improve the project with help from people around the world, turning it into one of the most stable and feature rich database solutions available. Today, PostgreSQL competes with the top proprietary and open source database systems for features, performance, and popularity.

PostgreSQL is a highly standards-compliant relational database system that's scalable, customizable, and has a thriving community of people improving it every day.

What PostgreSQL Needs

In a previous blog, we discussed setting up and optimizing PostgreSQL for a new project; it is a good introduction to PostgreSQL configuration and behavior.

If migrating an application from MySQL to PostgreSQL, the best place to start would be to host it on similar hardware or hosting platform as the source MySQL database.

On Premise

If hosting the database on premise, bare metal hosts (rather than Virtual Machines) are generally the best option for hosting PostgreSQL. Virtual Machines do add some helpful features at times, but they come at the cost of losing power and performance from the host in general, while bare metal allows the PostgreSQL software to have full access to performance with fewer layers between it and the hardware. On premise hosts would need an administrator to maintain the databases, whether it’s a full time employee or contractor, whichever makes more sense for the application needs.

In The Cloud

Cloud hosting has come a long way in the past few years, and countless companies across the world host their databases in cloud based servers. Since cloud hosts are highly configurable, the right size and power of host can be selected for the specific needs of the database, with a cost that matches.

Depending on the hosting option used, new hosts can be provisioned quickly, memory / cpu / disk can be tweaked quickly, and even additional backup methods can be available. When choosing a cloud host, look for whether a host is dedicated or shared, dedicated being better for extremely high load databases. Another key is to make sure the IOPS available for the cloud host is good enough for the database activity needs. Even with a large memory pool for PostgreSQL, there will always be disk operations to write data to disk, or fetch data when not in memory.

Cloud Services

Since PostgreSQL is increasing in popularity, it’s being found available on many cloud database hosting services like Heroku, Amazon AWS, and others, and is quickly catching up to the popularity of MySQL. These services allow a third party to host and manage a PostgreSQL database easily, allowing focus to remain on the application.

Concepts / term comparisons

There are a few comparisons to cover when migrating from MySQL to PostgreSQL: common configuration parameters, terms, or concepts that operate similarly but have their differences.

Database Terms

Various database terms can have different meanings within different implementations of the technology. Between MySQL and PostgreSQL, there are a few basic terms that are understood slightly differently, so a translation is sometimes needed.


In MySQL, a ‘cluster’ usually refers to multiple MySQL database hosts connected together to appear as a single database or set of databases to clients.

In PostgreSQL, when referencing a ‘cluster’, it is a single running instance of the database software and all its sub-processes, which then contains one or more databases.


In MySQL, queries can access tables from different databases at the same time (provided the user has permission to access each database).

SELECT *
FROM customer_database.customer_table t1
JOIN orders_database.order_table t2 ON t1.customer_id = t2.customer_id
WHERE name = 'Bob';

However in PostgreSQL this cannot happen unless using Foreign Data Wrappers (a topic for another time). Instead, a PostgreSQL database has the option for multiple ‘schemas’ which operate similarly to databases in MySQL. Schemas contain the tables, indexes, etc, and can be accessed simultaneously by the same connection to the database that houses them.

SELECT *
FROM customer_schema.customer_table t1
JOIN orders_schema.order_table t2 ON t1.customer_id = t2.customer_id
WHERE name = 'Bob';

Interfacing with PostgreSQL

In the MySQL command line client (mysql), interfacing with the database uses keywords like 'DESCRIBE table' or 'SHOW TABLES'. The PostgreSQL command line client (psql) uses its own form of 'backslash commands'. For example, instead of 'SHOW TABLES', PostgreSQL's command is '\dt', and instead of 'SHOW DATABASES;', the command is '\l'.

A full list of commands for ‘psql’ can be found by the backslash command ‘\?’ within psql.
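
As a quick sketch of how this looks in practice, the backslash commands can also be run non-interactively (the database and table names below are placeholders):

psql app_db -c '\l'                 # list databases, comparable to SHOW DATABASES
psql app_db -c '\dt'                # list tables in the current schema, comparable to SHOW TABLES
psql app_db -c '\d customer_table'  # describe a table, comparable to DESCRIBE customer_table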

Language Support

Like MySQL, PostgreSQL has libraries and plugins for all major languages, as well as ODBC drivers along the lines of MySQL and Oracle. Finding a great and stable library for any language needed is an easy task.

Stored Procedures

Unlike MySQL, PostgreSQL has a wide range of supported Procedural Languages to choose from. In the base install of PostgreSQL, the supported languages are PL/pgSQL (SQL Procedural Language), PL/Tcl (Tcl Procedural Language), PL/Perl (Perl Procedural Language), and PL/Python (Python Procedural Language). Third party developers may have more languages not officially supported by the main PostgreSQL group.
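
As a flavor of the default procedural language, here is a trivial PL/pgSQL function (the function itself is purely illustrative):

CREATE OR REPLACE FUNCTION add_numbers(a integer, b integer)
RETURNS integer AS $$
BEGIN
    -- simple arithmetic, just to show the structure of a PL/pgSQL function
    RETURN a + b;
END;
$$ LANGUAGE plpgsql;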


  • Memory

    MySQL tunes this with key_buffer_size when using MyISAM, and with innodb_buffer_pool_size when using InnoDB.

    PostgreSQL uses shared_buffers for the main memory block given to the database for caching data, which is generally set to around 1/4 of system memory unless certain scenarios require that to change. Queries using memory for sorting use the work_mem value, which should be increased cautiously.
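
    As a rough illustration only (the values are assumptions for a dedicated host with 16 GB of RAM, not recommendations), a postgresql.conf excerpt could look like this:

    shared_buffers = 4GB     # roughly 1/4 of system memory
    work_mem = 16MB          # used per sort/hash operation and per connection, so increase cautiously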

Tools for migration

Migrating to PostgreSQL can take some work, but there are tools the community has developed to help with the process. Generally they will convert / migrate the data from MySQL to PostgreSQL and recreate tables / indexes. Stored procedures or functions are a different story, and usually require manual rewriting, either in part or from the ground up.

Some example tools available are pgloader and FromMySqlToPostgreSql. Pgloader is a tool written in Common Lisp that imports data from MySQL into PostgreSQL using the COPY command, and loads data, indexes, foreign keys, and comments with data conversion to represent the data correctly in PostgreSQL as intended. FromMySqlToPostgreSql is a similar tool written in PHP, and can convert MySQL data types to PostgreSQL as well as foreign keys and indexes. Both tools are free, however many other tools (free and paid) exist and are newly developed as new versions of each database software are released.
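
A minimal pgloader invocation can be as simple as pointing it at both databases (the connection strings below are placeholders):

pgloader mysql://app_user:secret@mysql-host/app_db postgresql://app_user:secret@pg-host/app_db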

Converting should always include an in-depth evaluation after the migration to make sure data was converted correctly and functionality works as expected. Testing beforehand is always encouraged for timings and data validation.

Replication Options

If coming from MySQL where replication has been used, or replication is needed at all for any reason, PostgreSQL has several options available, each with its own pros and cons, depending on what is needed through replication.

  • Built In:

    By default, PostgreSQL has its own built-in replication mode for Point In Time Recovery (PITR). This can be set up using either file-based log shipping, where Write Ahead Log files are shipped to a standby server where they are read and replayed, or Streaming Replication, where a read-only standby server fetches transaction logs over a database connection to replay them (a minimal configuration sketch follows this list).

    Either one of these built in options can be set up as either a ‘warm standby’ or ‘hot standby.’ A ‘warm standby’ doesn’t allow connections but is ready to become a master at any time to replace a master having issues. A ‘hot standby’ allows read-only connections to connect and issue queries, in addition to being ready to become a read/write master at any time as well if needed.

  • Slony:

    One of the oldest replication tools for PostgreSQL is Slony, a trigger-based replication method that allows a high level of customization. Slony allows the setup of a Master node and any number of Replica nodes, the ability to switch the Master to any desired node, and lets the administrator choose which tables to replicate (if not all of them). It's been used not just for replicating data in case of failure or for load balancing, but also for shipping specific data to other services, or even for minimal-downtime upgrades, since replication can span different versions of PostgreSQL.

    Slony does have the main requirement that every table to be replicated have either a PRIMARY KEY, or a UNIQUE index without nullable columns.

  • Bucardo:

    When it comes to multi-master options, Bucardo is one of the few for PostgreSQL. Like Slony, it's a third-party software package that sits on top of PostgreSQL. Bucardo calls itself "an asynchronous PostgreSQL replication system, allowing for both multi-master and multi-slave operations." The main benefit is multi-master replication that works fairly well; however, it lacks conflict resolution, so applications should be aware of possible issues and handle them accordingly.

    There are many other replication tools as well, and finding the one that works best for an application depends on the specific needs.
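
For the built-in option mentioned above, a rough sketch of a streaming hot standby setup looks like this (hostnames, addresses and the replication user are assumptions; on PostgreSQL 12 and later the recovery.conf settings move into postgresql.conf):

# primary: postgresql.conf
wal_level = replica
max_wal_senders = 5

# primary: pg_hba.conf, allow the standby to connect for replication
host  replication  replicator  10.0.0.20/32  md5

# standby: postgresql.conf
hot_standby = on

# standby: recovery.conf
standby_mode = 'on'
primary_conninfo = 'host=10.0.0.10 port=5432 user=replicator password=secret'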


PostgreSQL has a thriving community willing to help with any issues / info that may be needed.

  • IRC

    An active IRC chatroom named #postgresql is available on freenode, where administrators and developers worldwide chat about PostgreSQL and related projects / issues. There are even smaller rooms for specifics like Slony, Bucardo, and more.

  • Mailing lists

    There are a handful of PostgreSQL mailing lists for ‘general’, ‘admin’, ‘performance’, and even ‘novice’ (a great place to start if new to PostgreSQL in general). The mailing lists are subscribed to by many around the world, and provide a very useful wealth of resources to answer any question that may need answering.

    A full list of PostgreSQL mailing lists can be found on the PostgreSQL website.

  • User Groups

    User groups are a great place to get involved and active in the community, and many large cities worldwide have a PostgreSQL User Group (PUG) available to join and attend, and if not, consider starting one. These groups are great for networking, learning new technologies, and even just asking questions in person to people from any level of experience.

  • Documentation

    Most importantly, PostgreSQL is documented very well. Any information about configuration parameters, SQL functions, or usage can easily be found in the official documentation on PostgreSQL's website. If anything is unclear, the community will help through the options outlined above.

by Brian Fehrle at March 16, 2018 10:15 AM

March 15, 2018

Oli Sennhauser

MySQL Environment MyEnv 2.0.0 has been released

FromDual has the pleasure to announce the release of the new version 2.0.0 of its popular MySQL, Galera Cluster and MariaDB multi-instance environment MyEnv.

The new MyEnv can be downloaded here.

In the inconceivable case that you find a bug in the MyEnv please report it to the FromDual bug tracker.

Any feedback, statements and testimonials are welcome as well! Please send them to us.

Upgrade from 1.1.x to 2.0.0

# cd ${HOME}/product
# tar xf /download/myenv-2.0.0.tar.gz
# rm -f myenv
# ln -s myenv-2.0.0 myenv


If you are using plug-ins for showMyEnvStatus, create all the links in the new directory structure:

cd ${HOME}/product/myenv
ln -s ../../utl/oem_agent.php plg/showMyEnvStatus/

Upgrade of the instance directory structure

From MyEnv v1 to v2 the directory structure of instances has fundamentally changed. Nevertheless MyEnv v2 works fine with MyEnv v1 directory structures.

Old structure


New structure


But over time you may want to migrate the old structure to the new one. The following steps describe how to upgrade the MyEnv instance structure from v1 to v2:

mysql@chef:~ [mysql-57, 3320]> mypprod
mysql@chef:~ [mypprod, 3309]> stop
mysql@chef:~ [mypprod, 3309]> mkdir ~/database/mypprod
mysql@chef:~ [mypprod, 3309]> mkdir ~/database/mypprod/binlog ~/database/mypprod/data ~/database/mypprod/etc ~/database/mypprod/log ~/database/mypprod/tmp
mysql@chef:~ [mypprod, 3309]> mv ~/data/mypprod/binary-log.* ~/database/mypprod/binlog/
mysql@chef:~ [mypprod, 3309]> mv ~/data/mypprod/my.cnf ~/database/mypprod/etc/
mysql@chef:~ [mypprod, 3309]> mv ~/data/mypprod/error.log ~/database/mypprod/log/
mysql@chef:~ [mypprod, 3309]> mv ~/data/mypprod/slow.log ~/database/mypprod/log/
mysql@chef:~ [mypprod, 3309]> mv ~/data/mypprod/general.log ~/database/mypprod/log/
mysql@chef:~ [mypprod, 3309]> mv ~/data/mypprod/* ~/database/mypprod/data/
mysql@chef:~ [mypprod, 3309]> rmdir ~/data/mypprod
mysql@chef:~ [mypprod, 3309]> vi /etc/myenv/myenv.conf

- datadir              = /home/mysql/data/mypprod
+ datadir              = /home/mysql/database/mypprod/data
- my.cnf               = /home/mysql/data/mypprod/my.cnf
+ my.cnf               = /home/mysql/database/mypprod/etc/my.cnf
+ instancedir          = /home/mysql/database/mypprod

mysql@chef:~ [mypprod, 3309]> source ~/.bash_profile
mysql@chef:~ [mypprod, 3309]> cde
mysql@chef:~/database/mypprod/etc [mypprod, 3309]> vi my.cnf 

- log_bin                                = binary-log
+ log_bin                                = /home/mysql/database/mypprod/binlog/binary-log
- datadir                                = /home/mysql/data/mypprod
+ datadir                                = /home/mysql/database/mypprod/data
- tmpdir                                 = /tmp
+ tmpdir                                 = /home/mysql/database/mypprod/tmp
- log_error                              = error.log
+ log_error                              = /home/mysql/database/mypprod/log/error.log
- slow_query_log_file                    = slow.log
+ slow_query_log_file                    = /home/mysql/database/mypprod/log/slow.log
- general_log_file                       = general.log
+ general_log_file                       = /home/mysql/database/mypprod/log/general.log

mysql@chef:~/database/mypprod/etc [mypprod, 3309]> cdb
mysql@chef:~/database/mypprod/binlog [mypprod, 3309]> vi binary-log.index 

- ./binary-log.000001
+ /home/mysql/database/mypprod/binlog/binary-log.000001
- ./binary-log.000001
+ /home/mysql/database/mypprod/binlog/binary-log.000001

mysql@chef:~/database/mypprod/binlog [mypprod, 3309]> start
mysql@chef:~/database/mypprod/binlog [mypprod, 3309]> exit

Changes in MyEnv 2.0.0


  • New v2 instance directory structure and instancedir variable introduced, aliases adapted accordingly.
  • Configuration files aliases.conf and variables.conf made more user friendly.
  • PHP 7 support added.
  • Made MyEnv MySQL 8.0 ready.
  • Packaging (RPM for RHEL 6 and 7 and SLES 11 and 12, DEB for Ubuntu/Debian) available.
  • OEM agent plug-in made ready for OEM v12.
  • More strict configuration checking.
  • Version more verbose.
  • Database health check using mysqladmin replaced by UNIX socket probing.
  • Various bug fixes (#168, #161, ...)
  • MyEnv made ready for systemd.
  • Bind-address output nicer in up.
  • New variables added to my.cnf template (super_read_only, innodb_tmpdir, innodb_flush_log_at_trx_commit, MySQL Group Replication, crash-safe Replication, GTID, MySQL 8.0)

MyEnv Installer

  • Installer made ready for systemd.
  • Question for angel process (mysqld_safe) and cgroups added.
  • Check for duplicate socket added.
  • Various bug fixes.
  • Purge data implemented.

MyEnv Utilities

  • Utility mysqlstat.php added.
  • Scripts for keepalived added.
  • Utilities and removed.
  • Famous, insert_test.php and test table improved.

For subscriptions of commercial use of MyEnv please get in contact with us.

by Shinguz at March 15, 2018 08:33 PM

Peter Zaitsev

Verifying Query Performance Using ProxySQL

In this blog post, we’ll look at how you can verify query performance using ProxySQL.

In the previous blog post, I showed you how much information you can get from the "stats.stats_mysql_query_digest" table in ProxySQL. I also mentioned you could even collect and graph these metrics. I will show you that this is not just theory; it is possible.

These graphs can be very useful for understanding the impact of the changes you made on query count or execution time.

I used our all-time favorite benchmark tool called Sysbench. I was running the following query:

UPDATE sbtest1 SET c=? WHERE k=?

There was no index on “k” when I started the test. During the test, I added an index. We expect to see some changes in the graphs.
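
Adding the index during the test was a single statement along these lines (the index name is just an example):

ALTER TABLE sbtest1 ADD INDEX k_1 (k);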

I selected the contents of "stats.stats_mysql_query_digest" into a file every second, then used Percona Monitoring and Management (PMM) to create graphs from the metrics. (I am going to write another blog post on how you can use PMM to create graphs from any kind of metrics.)
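
For reference, a snapshot of that table can be pulled through the ProxySQL admin interface, for example like this (assuming the default admin port and credentials):

mysql -h 127.0.0.1 -P 6032 -u admin -padmin -e "SELECT digest_text, count_star, sum_time FROM stats.stats_mysql_query_digest ORDER BY sum_time DESC LIMIT 10;"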

Without the index, the update was running only 2-3 times per second. By adding the index, it went up to 400-500 per second. We can see the results immediately on the graph.

Let’s see the average execution time:

Without the index, it took 600000-700000 microseconds, which is around 0.7s. By adding an index, it dropped to 0.01s. This is a big win, but most importantly, we can see the effects on query response time and query count when we make changes to the schema, queries or configuration.


If you already have a ProxySQL server collecting and graphing these metrics, they could be quite useful when you are optimizing your queries. They can help make sure you are moving in the right direction with your tunings/modifications.

by Tibor Korocz at March 15, 2018 06:42 PM

Percona Live 2018 Featured Talk: MongoDB for a High Volume Logistics Application with Eric Potvin

Welcome to another interview blog for the rapidly-approaching Percona Live 2018. Each post in this series highlights a Percona Live 2018 featured talk at the conference and gives a short preview of what attendees can expect to learn from the presenter.

This blog post highlights Eric Potvin, Software Engineer at Shipwire. His talk is titled MongoDB for a High Volume Logistics Application. Leveraging old technology with modern solutions can help improve and optimize document manipulation. In our conversation, we discussed how Eric worked with MongoDB to achieve this end:

Percona: Who are you, and how did you get into databases? What was your path to your current responsibilities?

Eric: I was introduced to databases with Oracle back in college. I was fascinated by how we can store data and retrieve it instantly. After college, I worked as a back-end and front-end developer in many companies that used RDBMSs. Most of these companies couldn't afford the cost of an Oracle database, so (back in 2003) I started with MySQL version 3. Over the years, MySQL improved, and I had to keep up. This led to my decision to get my MySQL 5 Developer Certification in 2009.

Percona: Your talk is titled “MongoDB for a High Volume Logistics Application”. What high volume logistics application are you working with, and what issue were you facing that required a database solution?

Eric: We are not using any third-party software; we built in-house software that accommodates our specific needs. We are working with multiple companies (if you prefer, warehouses), and each of them has their own specific standards and formats.

Storing data in MySQL might seem like a good idea at first, but querying the data turned out to be painful and very slow. This is why we chose MongoDB. We can simply import all these documents without constantly having to "ALTER" our schema. This gives us the flexibility we need to process any information we receive.

Percona: Why a non-relational instead of a relational database?

Eric: MySQL is great, don't get me wrong there. But MongoDB offers manageability and a dynamic schema that fit our needs. Every business is different and has its own formats. We cannot enforce a specific data structure, so we need a solution that can be adapted instantly without frequently updating our database schema. This is why MongoDB provides the perfect solution to store any information or documents our clients and customers are sending to us.

Percona: Why should people attend your talk? What do you hope people will take away from it?

Eric: This session talks about what we did to improve our system. The important thing here is this session can be applied to any situation or any type of business. People who attend this session will understand how MongoDB can efficiently store, retrieve and manage document-oriented information. In addition, they will learn — and more importantly, understand — how to manage document-oriented data and how it is a solution that can enrich their application potentials.

Percona: What are you looking forward to at Percona Live (besides your talk)?

Eric: I am looking forward to a very exciting Percona Live this year. When I saw the schedule, I wasn’t sure what to attend. There are so many great talks about scaling, optimization, monitoring and security. These are my “go to” keywords, and they are everywhere this year!

Want to find out more about this Percona Live 2018 featured talk, and MongoDB? Register for Percona Live 2018, and see his talk MongoDB for a High Volume Logistics Application. Register now to get the best price! Use the discount code SeeMeSpeakPL18 for 10% off.

Percona Live Open Source Database Conference 2018 is the premier open source event for the data performance ecosystem. It is the place to be for the open source community. Attendees include DBAs, sysadmins, developers, architects, CTOs, CEOs, and vendors from around the world.

The Percona Live Open Source Database Conference will be April 23-25, 2018 at the Hyatt Regency Santa Clara & The Santa Clara Convention Center.


by Dave Avery at March 15, 2018 03:59 PM

Jean-Jerome Schmidt

Comparing Oracle RAC HA Solution to Galera Cluster for MySQL or MariaDB

Businesses have continuously desired to derive insights from information to make reliable, smarter, real-time, fact-based decisions. As firms rely more on data and databases, information and data processing is at the core of many business operations and business decisions. The faith in the database is total. None of the day-to-day company services can run without the underlying database platforms. As a consequence, the need for scalability and performance of database system software is more critical than ever. The principal benefits of a clustered database system are scalability and high availability. In this blog, we will try to compare Oracle RAC and Galera Cluster in the light of these two aspects. Real Application Clusters (RAC) is Oracle's premium solution for clustering Oracle databases, providing high availability and scalability. Galera Cluster is the most popular clustering technology for MySQL and MariaDB.

Architecture overview

Oracle RAC uses Oracle Clusterware software to bind multiple servers. Oracle Clusterware is a cluster management solution that is integrated with Oracle Database, but it can also be used with other services, not only the database. Oracle Clusterware is additional software installed on servers running the same operating system, which lets the servers be chained together to operate as if they were one server.

Oracle Clusterware watches the instance and automatically restarts it if a crash occurs. If your application is well designed, you may not experience any service interruption. Only a group of sessions (those connected to the failed instance) is affected by the failure. The blackout can be efficiently masked to the end user using advanced RAC features like Fast Application Notification and the Oracle client's Fast Connection Failover. Oracle Clusterware controls node membership and prevents split-brain situations in which two or more instances attempt to control the database.

Galera Cluster is a synchronous active-active database clustering technology for MySQL and MariaDB. Galera Cluster differs from what is known as Oracle’s MySQL Cluster - NDB. MariaDB cluster is based on the multi-master replication plugin provided by Codership. Since version 5.5, the Galera plugin (wsrep API) is an integral part of MariaDB. Percona XtraDB Cluster (PXC) is also based on the Galera plugin. The Galera plugin architecture stands on three core layers: certification, replication, and group communication framework. Certification layer prepares the write-sets and does the certification checks on them, guaranteeing that they can be applied. Replication layer manages the replication protocol and provides the total ordering capability. Group Communication Framework implements a plugin architecture which allows other systems to connect via gcomm back-end schema.

To keep the state identical across the cluster, the wsrep API uses a Global Transaction ID (GTID): a unique identifier is created and associated with each transaction committed on the database node. In Oracle RAC, various database instances share access to resources such as data blocks in the buffer cache and enqueues. Access to the shared resources between RAC instances needs to be coordinated to avoid conflict. To organize shared access to these resources, the distributed cache maintains information such as the data block ID, which RAC instance holds the current version of the data block, and the lock mode in which each instance holds the data block.

Data storage key concepts

Oracle RAC relies on a distributed disk architecture. The database files, control files and online redo logs for the database need to be accessible to each node in the cluster. There are various ways to configure shared storage, including directly attached disks, Storage Area Networks (SAN), Network Attached Storage (NAS) and Oracle ASM. The two most popular are OCFS and ASM. Oracle Cluster File System (OCFS) is a shared file system designed specifically for Oracle RAC. OCFS eliminates the requirement that Oracle database files be connected to logical drives and enables all nodes to share a single Oracle Home. Oracle ASM is Oracle's advised storage management solution that provides an alternative to conventional volume managers, file systems, and raw devices. Oracle ASM provides a virtualization layer between the database and storage. It treats multiple disks as a single disk group and lets you dynamically add or remove drives while keeping databases online.

There is no need to build sophisticated shared disk storage for Galera, as each node has its full copy of data. However it is a good practice to make the storage reliable with battery-backed write caches.

Oracle RAC, cluster storage
Galera replication, disks attached to database nodes

Cluster nodes communication and cache

Oracle Real Application Clusters has a shared cache architecture; it utilizes Oracle Grid Infrastructure to enable the sharing of server and storage resources. Communication between nodes is a critical aspect of cluster integrity. Each node must have at least two network adapters or network interface cards: one for the public network interface, and one for the interconnect. Each cluster node is connected to all other nodes via a private high-speed network, also known as the cluster interconnect.

Oracle RAC, network architecture

The private network is typically formed with Gigabit Ethernet, but for high-volume environments, many vendors offer low-latency, high-bandwidth solutions designed for Oracle RAC. Linux also extends a means of bonding multiple physical NICs into a single virtual NIC to provide increased bandwidth and availability.

While the default approach to connecting Galera nodes is to use a single NIC per host, you can have more than one card. ClusterControl can assist you with such setup. The main difference is the bandwidth requirement on the interconnect. Oracle RAC ships blocks of data between instances, so it places a heavier load on the interconnect as compared to Galera write-sets (which consist of a list of operations).

With Redundant Interconnect Usage in RAC, you can identify multiple interfaces to use for the private cluster network, without the need for bonding or other technologies. This functionality is available starting with Oracle Database 11gR2. If you use the Oracle Clusterware redundant interconnect feature, then you must use IPv4 addresses for the interfaces (UDP is the default).

To manage high availability, each cluster node is assigned a virtual IP address (VIP). In the event of node failure, the failed node's IP address can be reassigned to a surviving node to allow applications to continue to reach the database through the same IP address.

A sophisticated network setup is necessary for Oracle's Cache Fusion technology to couple the physical memory in each host into a single cache. Oracle Cache Fusion allows data stored in the cache of one Oracle instance to be accessed by any other instance by transporting it across the private network. It also protects data integrity and cache coherency by transmitting locking and supplementary synchronization information across cluster nodes.

On top of the described network setup, you can set a single database address for your application: the Single Client Access Name (SCAN). The primary purpose of SCAN is to provide ease of connection management. For instance, you can add new nodes to the cluster without changing your client connection string. This works because Oracle automatically distributes requests based on the SCAN IPs, which point to the underlying VIPs. SCAN listeners bridge between clients and the underlying local listeners, which are VIP-dependent.

For Galera Cluster, the equivalent of SCAN would be adding a database proxy in front of the Galera nodes. The proxy would be a single point of contact for applications; it can blacklist failed nodes and route queries to healthy nodes. The proxy itself can be made redundant with Keepalived and a Virtual IP.

Failover and data recovery

The main difference between Oracle RAC and MySQL Galera Cluster is that Galera is a shared-nothing architecture. Instead of shared disks, Galera uses certification-based replication with group communication and transaction ordering to achieve synchronous replication. A database cluster should be able to survive the loss of a node, although this is achieved in different ways. In the case of Galera, the critical aspect is the number of nodes: Galera requires a quorum to stay operational. A three-node cluster can survive the crash of one node. With more nodes in your cluster, your availability will grow. Oracle RAC doesn't require a quorum to stay operational after a node crash, because of its access to the distributed storage that keeps consistent information about cluster state. However, your data storage could be a potential point of failure in your high availability plan. While it's a reasonably straightforward task to have Galera cluster nodes spread across geolocation data centers, it wouldn't be that easy with RAC. Oracle RAC requires additional high-end disk mirroring; however, basic RAID-like redundancy can be achieved inside an ASM disk group.

Disk Group Type        Supported Mirroring Levels               Default Mirroring Level
External redundancy    Unprotected (none)                       Unprotected
Normal redundancy      Two-way, three-way, unprotected (none)   Two-way
High redundancy        Three-way                                Three-way
Flex redundancy        Two-way, three-way, unprotected (none)   Two-way (newly-created)
Extended redundancy    Two-way, three-way, unprotected (none)   Two-way

ASM Disk Group redundancy

Locking Schemes

In a single-user database, a user can alter data without concern for other sessions modifying the same data at the same time. However, in a multi-user, multi-node database environment, this becomes trickier. A multi-user database must provide the following:

  • data concurrency - the assurance that users can access data at the same time,
  • data consistency - the assurance that each user sees a consistent view of the data.

Cluster instances require three main types of concurrency locking:

  • Data concurrency reads on different instances,
  • Data concurrency reads and writes on different instances,
  • Data concurrency writes on different instances.

Oracle lets you choose the policy for locking, either pessimistic or optimistic, depending on your requirements. To obtain concurrency locking, RAC has two additional services: the Global Cache Service (GCS) and the Global Enqueue Service (GES). These two services cover the Cache Fusion process, resource transfers, and resource escalations among the instances. GES includes cache locks, dictionary locks, transaction locks and table locks. GCS maintains the block modes and block transfers between the instances.

In Galera cluster, each node has its storage and buffers. When a transaction is started, database resources local to that node are involved. At commit, the operations that are part of that transaction are broadcasted as part of a write-set, to the rest of the group. Since all nodes have the same state, the write-set will either be successful on all nodes or it will fail on all nodes.

At the cluster level, Galera Cluster uses optimistic concurrency control, which means a transaction can be aborted at COMMIT time. The first commit wins. When aborts occur at the cluster level, Galera Cluster returns a deadlock error. This may or may not impact your application architecture. A high number of rows to replicate in a single transaction would impact node responses, although there are techniques to avoid such behavior.

Hardware & Software requirements

Configuring hardware for either cluster doesn't require potent resources. A minimal Oracle RAC cluster configuration would be satisfied by two servers with two CPUs, at least 1.5 GB of RAM, an amount of swap space equal to the amount of RAM and two Gigabit Ethernet NICs. Galera's minimum configuration is three nodes (one of which can be an arbitrator, garbd), each with a 1 GHz single-core CPU, 512 MB of RAM and a 100 Mbps network card. While these are the minimums, we can safely say that in both cases you would probably like to have more resources for your production system.

Each node stores its own copy of the software, so you need to prepare several gigabytes of storage. Oracle and Galera both have the ability to individually patch the nodes by taking them down one at a time. This rolling patch avoids a complete application outage, as there are always database nodes available to handle traffic.

What is important to mention is that a production Galera cluster can easily run on VM’s or basic bare metal, while RAC would need investment in sophisticated shared storage and fiber communication.

Monitoring and management

Oracle Enterprise Manager is the favored approach for monitoring Oracle RAC and Oracle Clusterware. Oracle Enterprise Manager is an Oracle web-based unified management system for monitoring and administering your database environment. It's part of the Oracle Enterprise license and should be installed on a separate server. Cluster monitoring and management is done via a combination of the crsctl and srvctl commands, which are part of the cluster binaries. Below you can find a couple of example commands.

Clusterware Resource Status Check:

    crsctl status resource -t (or shorter: crsctl stat res -t)


$ crsctl stat res

Check the status of the Oracle Clusterware stack:

    crsctl check cluster


$ crsctl check cluster -all
CRS-4537: Cluster Ready Services is online
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online
CRS-4537: Cluster Ready Services is online
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online

Check the status of Oracle High Availability Services and the Oracle Clusterware stack on the local server:

    crsctl check crs


$ crsctl check crs
CRS-4638: Oracle High Availability Services is online
CRS-4537: Cluster Ready Services is online
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online

Stop Oracle High Availability Services on the local server.

    crsctl stop has

Start Oracle High Availability Services on the local server.

    crsctl start has

Displays the status of node applications:

    srvctl status nodeapps

Displays the configuration information for all SCAN VIPs

    srvctl config scan


srvctl config scan -scannumber 1
SCAN name: testscan, Network: 1
Subnet IPv4:, static
Subnet IPv6: 
SCAN VIP is enabled.
SCAN VIP is individually enabled on nodes:
SCAN VIP is individually disabled on nodes:

The Cluster Verification Utility (CVU) performs system checks in preparation for installation, patch updates, or other system changes:

    cluvfy comp ocr


Verifying OCR integrity
Checking OCR integrity...
Checking the absence of a non-clustered configuration...
All nodes free of non-clustered, local-only configurations
ASM Running check passed. ASM is running on all specified nodes
Checking OCR config file "/etc/oracle/ocr.loc"...
OCR config file "/etc/oracle/ocr.loc" check successful
Disk group for ocr location "+DATA" available on all the nodes
This check does not verify the integrity of the OCR contents. Execute 'ocrcheck' as a privileged user to verify the contents of OCR.
OCR integrity check passed
Verification of OCR integrity was successful.

Galera nodes and the cluster report their state through the wsrep API, which exposes several statuses. There are currently 34 dedicated status variables that can be viewed with the SHOW STATUS statement.

mysql> SHOW STATUS LIKE 'wsrep_%';
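
For example, two commonly checked variables are the cluster size and the local node state:

mysql> SHOW STATUS LIKE 'wsrep_cluster_size';
mysql> SHOW STATUS LIKE 'wsrep_local_state_comment';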

The administration of MySQL Galera Cluster is, in many aspects, very similar. There are just a few exceptions, like bootstrapping the cluster from the initial node or recovering nodes via SST or IST operations.

Bootstrapping cluster:

$ service mysql bootstrap # sysvinit
$ service mysql start --wsrep-new-cluster # sysvinit
$ galera_new_cluster # systemd
$ mysqld_safe --wsrep-new-cluster # command line

The equivalent web-based, out-of-the-box solution to manage and monitor Galera Cluster is ClusterControl. It provides a web-based interface to deploy clusters, monitors key metrics, provides database advisors, and takes care of management tasks like backup and restore, automatic patching, traffic encryption and availability management.

Restrictions on workload

Oracle provides the SCAN technology, which we found missing in Galera Cluster. The benefit of SCAN is that the client's connection information does not need to change if you add or remove nodes or databases in the cluster. When using SCAN, the Oracle database randomly connects to one of the available SCAN listeners (typically three) in a round-robin fashion and balances the connections between them. Two kinds of load balancing can be configured: client-side connect-time load balancing, and server-side run-time load balancing. Although there is nothing similar within Galera Cluster itself, the same functionality can be addressed with additional software like ProxySQL, HAProxy or MaxScale combined with Keepalived.

When it comes to application workload design for Galera Cluster, you should avoid conflicting updates on the same row, as it leads to deadlocks across the cluster. Avoid bulk inserts or updates, as these might be larger than the maximum allowed writeset. That might also cause cluster stalls.

When designing Oracle HA with RAC, you need to keep in mind that RAC only protects against server failure; you still need to mirror the storage and have network redundancy. Modern web applications require access to location-independent data services, and because of RAC's storage architecture limitations, that can be tricky to achieve. You also need to spend a notable amount of time to gain the relevant knowledge to manage the environment; it is a long process. On the application workload side, there are some drawbacks. Distributing separate read or write operations on the same dataset is not optimal, because latency is added by supplementary internode data exchange. Things like partitioning, sequence caches, and sorting operations should be reviewed before migrating to RAC.

Multi data-center redundancy

According to the Oracle documentation, the maximum distance between two boxes connected in a point-to-point fashion and running synchronously can be only 10 km. Using specialized devices, this distance can be increased to 100 km.

Galera Cluster is well known for its multi-datacenter replication capabilities. It has rich support for Wide Area Network (WAN) settings. It can be configured for high network latency by taking Round-Trip Time (RTT) measurements between cluster nodes and adjusting the necessary parameters. The wsrep_provider_options parameter allows you to configure settings like evs.suspect_timeout, evs.inactive_timeout, evs.join_retrans_period and many more.
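
A hedged example of what such tuning can look like in my.cnf (the values below are purely illustrative for a high-latency link and must be derived from your own RTT measurements):

wsrep_provider_options = "evs.suspect_timeout=PT30S; evs.inactive_timeout=PT1M; evs.send_window=512; evs.user_send_window=512"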

Using Galera and RAC in Cloud

According to Oracle's support notes, no third-party cloud currently meets Oracle's requirements regarding natively provided shared storage. "Native" in this context means that the cloud provider must support shared storage as part of their infrastructure, as per Oracle's support policy.

Thanks to its shared nothing architecture, which is not tied to a sophisticated storage solution, Galera cluster can be easily deployed in a cloud environment. Things like:

  • optimized network protocol,
  • topology-aware replication,
  • traffic encryption,
  • detection and automatic eviction of unreliable nodes,

make the cloud migration process more reliable.

Licenses and hidden costs

Oracle licensing is a complex topic and would require a separate blog article. The cluster factor makes it even more difficult. The cost goes up as we have to add some options to license a complete RAC solution. Here we just want to highlight what to expect and where to find more information.

RAC is a feature of the Oracle Database Enterprise Edition license. The Oracle Enterprise license is split into two types: per named user and per processor. If you consider Enterprise Edition with a per-core license, then the single-core cost is RAC at 23,000 USD + Oracle DB EE at 47,500 USD, and you still need to add a ~22% support fee. We would also refer you to a great blog post on pricing by Flashdba.

Flashdba calculated the price of a four-node Oracle RAC. The total amount was 902,400 USD plus an additional 595,584 USD for three years of DB maintenance, and that does not include features like partitioning or in-memory database, all of that with a 60% Oracle discount.

Galera Cluster is an open source solution that anyone can run for free. Subscriptions are available for production implementations that require vendor support, and good TCO comparisons are available online.


While there are significant differences in architecture, both clusters share the main principles and can achieve similar goals. Oracle's enterprise product comes with everything out of the box (and its price). With a cost in the range of >1M USD as seen above, it is a high-end solution that many enterprises would not be able to afford. Galera Cluster can be described as a decent high availability solution for the masses. In certain cases, Galera may well be a very good alternative to Oracle RAC. One drawback is that you have to build your own stack, although that can be completely automated with ClusterControl. We'd love to hear your thoughts on this.

by Bart Oles at March 15, 2018 10:42 AM

Peter Zaitsev

Saw Percona at SCaLE 16x? Come See Even More at Percona Live 2018!

Did you see Percona at SCaLE 16x? I spent a couple of days there learning about open source software, databases, and other interesting topics. You can get even more open source database information at Percona Live 2018.

SCaLE is the largest community-run open-source and free software conference in North America. It is held annually in the greater Los Angeles area. This year’s event took place on March 8-11, 2018, at the Pasadena Convention Center. SCaLE 16X hosted 150 exhibitors this year, along with nearly 130 sessions, tutorials and special events.

Percona has been attending now for a number of years, and this year was no exception. Besides our booth in the Exhibit Hall, we had two speakers giving three different talks:

Peter Zaitsev, Percona CEO and Founder

Using MySQL for Distributed Database Architectures

In modern data architectures, we’re increasingly moving from single node design systems to distributed architectures using multiple nodes – often spread across multiple databases and multiple continents. Such architectures bring many benefits (such as scalability and resiliency), but can also bring a lot of pain if not correctly architected.

In this presentation, we looked into how we can use MySQL to engineer such systems. Firstly, we looked into the application data requirements that can shape which distributed architectures will work for an application, and what are their benefits and tradeoffs. Then we looked into how to implement the architectures with MySQL, using conventional and proven options such as MySQL Replication, as well as newer options such as:

    • MySQL Multi-Source Replication
    • MySQL Group Replication
    • Percona XtraDB Cluster and Galera
    • Application-driven replication using Kafka

Finally, since a common cause of production problems is a misunderstanding of how distributed systems are designed to behave during failure, we examined what can commonly happen to cause architecture scenarios to fail.

Why We’re Excited About MySQL 8.0

There are many great new features in MySQL 8.0, but how exactly can they help your applications? This session took a practical look at MySQL 8.0 features and improvements. We looked at the bugs, issues and limitations of previous MySQL versions and how MySQL 8.0 addresses them. It also covered what you can do with MySQL 8.0 that you couldn’t before.

Colin Charles

Capacity Planning for your Data Stores

Imagine a ticket sales website that does normal events like an M2M concert, but also occasionally sells tickets to the very popular play Harry Potter and the Cursed Child. This is a perfect capacity planning example. Selling tickets requires that you never sell more tickets than you actually have. You want to load-balance your queries, to shard your data stores and split reads and writes. You need to determine where the system bottlenecks, so you need a baseline for your regular traffic. The website must be able to handle the increased load for extremely popular performances, but you don’t want to buy servers that aren’t doing anything for much of the time. (This is also why the cloud is so popular today.)

Colin Charles explored storage capacity planning for OLTP and data warehousing uses, and explained how metrics collection helps you plan your requirements. Coupled with the elastic nature of clouds, you should never have an "error establishing database connection". Along the way, Colin also covered tools such as Box Anemometer, innotop, the slow query log, Percona Toolkit (pt-query-digest), vmstat, Facebook's Prophet, and Percona Monitoring and Management (PMM).

Liked SCaLE 16x? Come to Percona Live 2018!

If you attended the SCaLE 16x conference, there was a multitude of excellent talks on many different open source topics. Many of these same speakers, companies, sponsors and attendees will also be at the Percona Live 2018 Open Source Database Conference in Santa Clara, CA, on April 23 – 25, 2018.

Join the open source database community in Santa Clara, California, to learn about the core topics in MySQL, MongoDB and other open source databases. Get briefed on the hottest topics, learn about building and maintaining high-performing deployments and listen to technical experts and top industry leaders. The Percona Live 2018 – Open Source Database Conference is a great event for users of any level exploring open source database technologies.

Some of these speakers and companies attending include:

. . . and many more.

Hurry and register before the event sells out!

by Dave Avery at March 15, 2018 02:33 AM

March 14, 2018

Peter Zaitsev

Adding Custom Graphs and Dashboards to Percona Monitoring and Management

In this blog post, we’ll look at how to create PMM custom graphs and dashboards to track what you need to see in your database.

Percona Monitoring and Management (PMM)'s default set of graphs is pretty complete: it covers most of the stuff a DBA requires to fully visualize database servers. However, sometimes custom information is needed in graphical form. Otherwise, you just feel your PMM deployment is missing a graph.

Recently, a customer request came in asking for a better understanding of a specific metric: table growth, or more specifically the daily table growth (in bytes) for the last 30 days.

The graph we came up with plots that daily table growth per table, using information that comes from a Prometheus query we will build up step by step below.


But what does that query mean, and how do I create one myself? I’m glad you asked! Let’s go deep into the technical details!

Before creating any graph, we must ensure that we have the data we want to represent graphically. So, the first step is to verify data collection.

Data collection

This data is already collected by the Percona mysqld_exporter, as defined in the “Collector Flags” table in the exporter's GitHub repo.

Cool! Now we need a Prometheus query in order to get the relevant data. Luckily, the Prometheus documentation is very helpful and we came up with a query in no time.

Prometheus query

What do we need for the query? In this case, it is a metric, a label and a time range. Every PMM deployment has access to the Prometheus console by adding “/prometheus” to the URL. The console is incredibly helpful when playing with queries.

The metric

The time series values collected by the exporter are stored as metrics inside Prometheus. For our case, the metric is called mysql_info_schema_table_size, which I figured out by using the Prometheus console “Expression” text input and its autocomplete feature. This shows you the options available as you’re writing. All the metrics collected by mysqld_exporter start with “mysql”.

The label

Labels are different per metric, but they are intuitively named. We need the instance and component labels. Instance is the hostname and component is equivalent to the column name of a MySQL table. The component we need is “data_length”.

The time frame

This is easy: since it is a daily value, the time frame is 1d.

The time frame is not mandatory, but it is a parameter asked for by the function we’re going to use to calculate the increase, which is called increase().

That’s how we ended up with the query that feeds the graph.
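
The exact expression in the original post is shown only as a screenshot; based on the metric, label and increase() function described above, it is roughly of this form (an approximation, not the verbatim query):

increase(mysql_info_schema_table_size{instance="$host", component="data_length"}[1d])
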
You will notice it’s using a variable: $host. We define that variable in the dashboard creation, explained below.

PMM dashboard

PMM best practice is to take a copy of the existing dashboard using Setting > Save as…, since edits to Percona-provided dashboards are not preserved during upgrades. In this example, we will start with an empty dashboard.

Adding a new dashboard is as easy as clicking the “New” button from the Grafana dropdown menu.

After that, you choose the type of element that you want on a new row, which is a Graph in this case.

We like to use variables for our graphs – changing which server we analyze, for example. To add variables to the dashboard, we need to head over to the Templating option and add the variables.

Make sure you give your dashboard a meaningful name, and you’re all set! A good practice is to export the JSON definition of your dashboard as a backup for future recovery, or simply to share it with others.

The final dashboard is called “MySQL Table Size” and holds another graph showing the table size during the timeframe for the top ten biggest tables.

The top right of the screen has some dropdown links to related dashboards.

You can add links on the “Link” tab of the dashboard settings.
In case you are wondering, the query for the “Table size” graph is:

topk(10,sort_desc(sum(mysql_info_schema_table_size{instance="$host",component=~".*length"}) by (schema, table)))

So next time you want to enhance PMM and you know that there is data already inside Prometheus, but PMM lacks the visualization you want, just add it! Create a new graph and add it to your own custom dashboard!

by Daniel Guzmán Burgos at March 14, 2018 11:29 PM

Basic Internal Troubleshooting Tools for MySQL Server Webinar: Q & A


In this blog, I will provide answers to the Q & A for the Basic Internal Troubleshooting Tools for MySQL Server webinar.

First, I want to thank everybody for attending my February 15, 2018, webinar on troubleshooting tools for MySQL. The recording and slides for the webinar are available here. Below is the list of your questions that I was unable to answer fully during the webinar.

Q: How do we prevent the schema prefix from appearing in SHOW CREATE VIEW? This is causing an issue with restore on another server with a different DB. See the reproducible test case below.

A: I shortened the example in order to fit it in this blog:

mysql> create table t1(f1 int);
Query OK, 0 rows affected (3.47 sec)
mysql> create view v1 as select * from t1;
Query OK, 0 rows affected (0.21 sec)
mysql> show create view v1\G
*************************** 1. row ***************************
                View: v1
         Create View: CREATE ALGORITHM=UNDEFINED DEFINER=`root`@`localhost` SQL SECURITY DEFINER VIEW `v1` AS select `t1`.`f1` AS `f1` from `t1`
character_set_client: utf8
collation_connection: utf8_general_ci
1 row in set (0.00 sec)
mysql> select * from information_schema.views where table_schema='test'\G
*************************** 1. row ***************************
       TABLE_CATALOG: def
        TABLE_SCHEMA: test
          TABLE_NAME: v1
     VIEW_DEFINITION: select `test`.`t1`.`f1` AS `f1` from `test`.`t1`
             DEFINER: root@localhost
1 row in set (0.00 sec)

The issue you experienced happened because even if you created a view as SELECT foo FROM table1;, it is stored as SELECT foo FROM your_schema.table1;. You can see it if you query the .frm file for the view:

sveta@Thinkie:~/build/ps-5.7/mysql-test$ cat var/mysqld.1/data/test/v1.frm
query=select `test`.`t1`.`f1` AS `f1` from `test`.`t1`
timestamp=2018-02-24 10:27:45
source=select * from t1
view_body_utf8=select `test`.`t1`.`f1` AS `f1` from `test`.`t1`

You cannot prevent the schema prefix from being stored. If you restore the view on a different server with a different database name, you should edit the view definition manually. If you already restored the view that points to a non-existent schema, just recreate it.

A view is metadata only and does not hold any data, so this operation is non-blocking and will run momentarily.
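
For example, with the shortened test case above, recreating the view on the target server boils down to something like:

mysql> DROP VIEW v1;
mysql> CREATE VIEW v1 AS SELECT f1 FROM t1;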

Q: What is thread/sql/compress_gtid_table in performance_schema.threads?


A: thread/sql/compress_gtid_table is the name of the instrument. You can read this and other instruments as below:

  • thread/ is a group of instruments. In this case, these are the instruments that are visible in the performance_schema.threads table.
  • thread/sql/ is the group of instruments that are part of the server kernel code. If you are not familiar with the MySQL source tree, download the source code tarball and check its content. The main components are:
    • sql – the server kernel
    • storage – where the storage engine code is located (innobase is InnoDB code, myisam is MyISAM code and so on)
    • vio – input-output functions
    • mysys – code shared between all parts of the server
    • client – the client library and utilities
    • strings – functions to work with strings

This is not a full list. For more information, consult the MySQL Internals Manual.

  • thread/sql/compress_gtid_table is the name of the particular instrument.

Unfortunately, there is no link to the source code for instrumented threads in the performance_schema.threads table, but we can easily find them in the source tree. The function compress_gtid_table is defined in the sql directory, and we can check its comments to find out what it is doing:

The main function of the compression thread.
- compress the gtid_executed table when get a compression signal.
@param p_thd Thread requesting to compress the table
@retval 0 OK. always, the compression thread will swallow any error
for going to wait for next compression signal until
it is terminated.
extern "C" {
static void *compress_gtid_table(void *p_thd)

You can also find the description of mysql.gtid_executed compression in the User Reference Manual.

You can follow the same actions to find out what other MySQL threads are doing.
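
For instance, you can locate this particular thread with a query like the following (a generic example):

mysql> SELECT thread_id, name, type FROM performance_schema.threads WHERE name = 'thread/sql/compress_gtid_table';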

Q: How does a novice learn the core basics of MySQL? The documentation is very vast, which surpasses my understanding right now. Are there any good intro books you can recommend for a system admin?

A: I learned MySQL a long time ago, and the book I can recommend was written for version 5.0: “MySQL 5.0 Certification Study Guide” by Paul DuBois, Stefan Hinz and Carsten Pedersen. The book is in two parts: one is devoted to SQL developers and explains how to run and tune queries. The second part is for DBAs and describes how to tune the MySQL server. I asked my colleagues to suggest more modern books for you, and this one is still on the list for many. It is in any case an excellent book for beginners; just note that MySQL has changed a lot since 5.0, and you will need to deepen your knowledge after you finish reading it.

Another book that was recommended is “MySQL” by Paul DuBois. It is written for beginners and has plenty of content. Paul DuBois has been working on (and continues to work on) the official MySQL documentation for many years, and knows MySQL in great detail.

Another book is “Murach’s MySQL” by Joel Murach, which is used as a course book in many colleges for “Introduction into Databases” type classes.

For System Administrators, you can read “Systems Performance: Enterprise and the Cloud” by Brendan Gregg. This book talks about how to tune operating systems for performance. This is one of the consistent tasks we have to do when administering MySQL. I also recommend that you study Brendan Gregg’s website, which is a great source of information for everyone who is interested in operating system performance tuning.

After you finish the books for novices, you can check out “High Performance MySQL, 3rd Edition” by Peter Zaitsev, Vadim Tkachenko, Baron Schwartz and “MySQL Troubleshooting” by Sveta Smirnova (yours truly =) ). These two books require at least basic MySQL knowledge, however.

Q: Does database migration go the same way? Do these tools work for migration as well?

A: The tools I discussed in this webinar are available for any version of MySQL/Percona/MariaDB server. You may use them for migration. For example, it is always useful to compare the configuration on both the “old” and “new” servers. It helps if you observe performance drops on the “new” server. Or you can check table definitions before and after migration. There are many more uses for these tools during the migration process.

Q: How can we take a backup of a single schema from a MySQL AWS instance without affecting the performance of applications? An AWS RDS instance, to be more clear. We cannot use mysqldump in the RDS instance in the current scenario.

A: You can connect to your RDS instance with mysqldump from your local machine, exactly like your MySQL clients connect to it. Then you can collect a dump of a single database or table, or even specify the --where option to limit the resulting set to only a portion of the table. Note that by default mysqldump is blocking, but if you back up solely transactional tables (InnoDB, TokuDB, MyRocks) you can run mysqldump with the --single-transaction option, which starts a transaction at the beginning of the backup job.
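
A minimal sketch of such a remote dump; the endpoint, user and schema names below are placeholders:

mysqldump --host=myinstance.abc123.us-east-1.rds.amazonaws.com --user=admin -p --single-transaction mydatabase > mydatabase.sql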

Alternatively, you can use AWS Database Migration Service, which allows you to replicate your databases. Then you can take a backup of a single schema using whatever method you like.

Q: Why do some sites suggest turning off Information Schema and Performance Schema? Is it important to keep them on or turn them off?

A: You cannot turn off Information Schema. It is always available.

Performance Schema in earlier versions (before 5.6.14) was resource-consuming when enabled, even if it was idle. These limitations were fixed a long time ago, and you don’t need to keep it off, at least not unless you hit some new bug.
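
A quick, generic way to confirm whether Performance Schema is enabled on your own server:

mysql> SHOW GLOBAL VARIABLES LIKE 'performance_schema';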

Q: How do we handle a storage-level threshold if a data file grows and reaches the maximum threshold unnoticed? Can you please help with this question?

A: Do you mean what will happen if the data file grows until the filesystem has no space? In this case, clients receive the error

"OS error code 28: No space left on device"

until space is freed and mysqld can start functioning normally again. If it can write into the error log file (for example, if it is located on a different disk), you will see messages about error 28 in the error log file too.

Q: What are the performance bottlenecks when enabling performance_schema? Are there any benchmarks we can look at?

A: Just enabling Performance Schema in version 5.6 and up does not cause any performance issues. With version 5.7, it can also start with almost zero allocated memory, so it won’t affect your other buffers. The Performance Schema causes an impact when you enable particular instruments. I performed benchmarks on the effects of particular Performance Schema instruments and published them in this post.
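
As a generic illustration (not the exact instruments benchmarked in that post), individual instruments can be switched on and off at runtime through the setup_instruments table:

mysql> UPDATE performance_schema.setup_instruments SET ENABLED='YES', TIMED='YES' WHERE NAME LIKE 'wait/synch/%';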

Q: Can you suggest some tips for creating real-time dashboards for a replication environment? It would be great if you could help us with building business-level dashboards.

A: This is a topic for yet another webinar or, better still, a tutorial. For starters, I recommend you check out the “MySQL Replication” dashboard in PMM and extend it using the metrics that you need.

Thanks for attending the webinar on internal troubleshooting tools for MySQL.

by Sveta Smirnova at March 14, 2018 11:26 PM

MariaDB AB

Recap of M|18, the MariaDB User Conference – Video Recordings Now Available

Recap of M|18, the MariaDB User Conference – Video Recordings Now Available MariaDB Team Wed, 03/14/2018 - 14:17

Thank you to everyone who joined us at our second annual MariaDB user conference, M|18, in New York City on February 26 and 27. DBAs, open source enthusiasts, engineers, executives and more from all over the world came together to explore and learn.

Couldn’t make the event or want to relive your favorite session?

Watch 40+ M|18 session recordings on demand.


Session highlights


  • The welcome keynote from MariaDB’s CEO, Michael Howard, announced several new initiatives, including MariaDB Labs, a research division focused on solving extreme challenges within the database industry on the topics of machine learning; distributed computing; and next-generation chips, storage and memory.
  • “Corporate Banking and Future-Ready Technology,” with Ng Peng Khim and Joan Tay from DBS (Development Bank of Singapore), provided an update on the bank’s journey from Oracle Enterprise to MariaDB TX.
  • “Massive Scale with MariaDB,” with ServiceNow’s Tim Yim, revealed that the company has a mind-blowing 85,000 MariaDB databases that ServiceNow manages with a custom “multi-instance deployment” implementation.
  • “How We Made the Move to MariaDB at FNI,” with William Woods from Financial Network, Inc., shared the considerations, including security, that led FNI to migrate from Oracle to MariaDB.
  • “MariaDB AX Panel Discussion on Analytical Use Cases,” featuring Aziz Vahora from Pinger, Jack Sprague from Center of Information Management, and Patrice Linel from Genus Plc., provided an opportunity to hear directly from MariaDB AX users from diverse industries and learn why they chose MariaDB.
  • “How Facebook Migrated to MyRocks,” with Facebook’s Yoshinori Matsunobu, covered how Facebook created the MyRocks storage engine (which is in RC for MariaDB Server 10.3) to replace InnoDB in the Facebook users database, and highlighted MyRocks features being developed.
  • “Panel Discussion: Open Source in the Enterprise,” featured Bill Montgomery of Red Hat, Dheeraj Golla of Copart, and Phil Mazza of Financial Network, Inc., sharing why they chose to adopt open source, the business and technical challenges they faced and how other enterprises can succeed with open source software.
  • “How Copart Switched to MariaDB and Reduced Costs During Growth,” featured Dheeraj Golla and Pravin Malali of Copart explaining why Copart chose to standardize on MariaDB TX, how they convinced operational and development teams to adopt it, and how they’re now evaluating MariaDB AX as a replacement for their current analytics database.


Networking with games

Learning opportunities abounded at M|18, all while having fun. At the opening-night party and closing reception, attendees enjoyed food, drink and conversation – plus a little good-natured competition.






Thanks to the attendees and speakers, M|18 was trending on Twitter. Here are a few of our favorite conference tweets.


An extra thanks …

… to our sponsors for their generous support.



M|19 will be announced soon!


In the next few weeks, we’ll release the dates for our next MariaDB user conference, M|19. Be on the lookout for the announcement!



by MariaDB Team at March 14, 2018 06:17 PM

MariaDB Connector/J 2.2.3 and 1.7.3 now available

MariaDB Connector/J 2.2.3 and 1.7.3 now available dbart Wed, 03/14/2018 - 12:48

The MariaDB project is pleased to announce the immediate availability of MariaDB Connector/J 2.2.3 and MariaDB Connector/J 1.7.3. See the release notes and changelogs for details, and visit the download links below.

Download MariaDB Connector/J 2.2.3

Release Notes Changelog About MariaDB Connector/J

Download MariaDB Connector/J 1.7.3

Release Notes Changelog About MariaDB Connector/J



by dbart at March 14, 2018 04:48 PM

March 13, 2018

Peter Zaitsev

Don’t Get Hit with a Database Disaster: Database Security Compliance


In this post, we discuss database security compliance, what you should be looking at and where to get more information.

As Percona’s Chief Customer Officer, I get the opportunity to talk with a lot of customers. Hearing about the problems that both their technical teams face, as well as the business challenges their companies experience first-hand is incredibly valuable in terms of what the market is facing in general. Not every problem you see has a purely technical solution, and not every good technical solution solves the core business problem.

As database technology advances and data continues to be the lifeblood of most modern applications, DBAs will have a say in business-level strategic planning more than ever. This coincides with the advances in technology and automation that make many classic manual “DBA” jobs and tasks obsolete. Traditional DBAs are evolving into a blend of system architect, data strategist and master database architect. I want to talk about the business problems that not only the C-suite cares about, but that DBAs as a whole need to care about in the near future.

Let’s start with one topic everyone should have near the top of their list: security.

We did a recent survey of our customers, and their biggest concern right now is security and compliance.

Not long ago, most DBAs I knew dismissed this topic as “someone else’s problem” (I remember being told that the database is only as secure as the network, so fix the network!). Long gone are the days when network security was enough. Even the DBAs who did worry about security only did so within the limited scope of what the database system could provide out of the box. Again, not enough.

So let me run an experiment:

Raise your hand if your company has some big security initiative this year.

I’m betting a lot of you raised your hand!

Security is not new to the enterprise. It’s been a priority for years now. However, it has not received a hyper-focus in the open source database space until the last three years or so. Why? There have been a number of high-profile database security breaches in the last year, all highlighting a need for better database security. This series of serious data breaches has exposed how fragile some companies’ security protocols are. If that was not enough, new government regulations and laws have made data protection non-optional. This means you have to take the security of your database seriously, or there could be fines and penalties.

Government regulations are nothing new, but the breadth and depth of these are growing and are opening up a whole new challenge for database systems and administrators. GDPR was signed into law two years ago and is scheduled to take effect on May 25, 2018. This has many businesses scrambling not only to understand the impact, but to figure out how they need to comply. These regulations redefine simple things, like what constitutes “personal data” (for instance, your anonymous buying preferences or location history, even without your name).

New requirements also mean some areas get a bit more complicated as they approach the gray area of definition. For instance, GDPR guarantees the right to be forgotten. What does this mean? In theory, it means end-users can request that all their personal information is removed from your systems as if they did not exist. Seems simple, but in reality, you can go as far down the rabbit hole as you want. Does your application support this already? What about legacy applications? Even if the apps can handle it, does this mean previously taken database backups have to forget you as well? There is a lot to process for sure.

So what are the things you can do?

  1. Educate yourself and understand expectations, even if you weren’t involved in compliance discussions before.
  2. Start working on incremental improvements to your data security now. This is especially true in the areas where you have some control, without massive changes to the application. Encryption at rest is a great place to start if you don’t have it.
  3. Start talking with others in the organization about how to identify and protect personal information.
  4. Look to increase security by default by getting involved in new applications early in the design phase.

The good news is you are not alone in tackling this challenge. Every company must address it. Because of this focus on security, we felt strongly about ensuring we had a security track at Percona Live 2018 this year. These talks from Fastly, Facebook, Percona, and others provide information on how companies around the globe are tackling these security issues. In true open source fashion, we are better when we learn and grow from one another.

What are the Percona Live 2018 security talks?

We have a ton of great security content this year at Percona Live, across a bunch of technologies and open source software; check the conference schedule for the full list of Percona Live 2018 security talks.

Want to attend Percona Live 2018 security talks? Register for Percona Live 2018. Register now to get the best price! Use the discount code SeeMeSpeakPL18 for 10% off.

Percona Live Open Source Database Conference 2018 is the premier open source event for the data performance ecosystem. It is the place to be for the open source community. Attendees include DBAs, sysadmins, developers, architects, CTOs, CEOs, and vendors from around the world.

The Percona Live Open Source Database Conference will be April 23-25, 2018 at the Hyatt Regency Santa Clara & The Santa Clara Convention Center.

by Matt Yonkovit at March 13, 2018 10:03 PM

The Multi-Source GTID Replication Maze


In this blog post, we’ll look at how to navigate some of the complexities of multi-source GTID replication.

GTID replication is often a real challenge for DBAs, especially if this has to do with multi-source GTID replication. A while back, I came across a really interesting customer environment with shards where multi-master, multi-source, multi-threaded MySQL 5.6 MIXED replication was active. This is a highly complex environment that has both pros and cons, introducing risks as a trade-off for specific customer requirements.

This is the setup of part of this environment.

I started looking into this setup when a statement broke replication between db1 and db10. Replication broke due to a statement executed on a schema that was not present on db10. This also resulted in changes originating from db1 not being pushed down to db100 via db10, since we stopped the replication thread on db10 (for the db1 channel).

On the other hand, replication was not stopped on db2 because the schema in question was present on db2. Replication between db2 and db20 was broken as well because the schema was not present in db20.

In order to fix db1->db10 replication, four GTID sets were injected in db10.
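
Injecting a GTID typically means committing an empty transaction with that GTID on the slave, along these lines (the UUID and sequence number below are placeholders):

SET GTID_NEXT='aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee:1234';
BEGIN;
COMMIT;
SET GTID_NEXT='AUTOMATIC';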

There are several interesting blog posts available regarding how to handle/fix GTID replication issues.

After injecting the GTID sets, we started replication again and everything ran fine.


After that, we had to check the db2->db20 replication, which, as I’ve already said, was broken as well. In this case, injecting only the first GTID trx into db20 instead of all of those causing issues on db10 was enough!

You may wonder how this is possible. Right? The answer is that the rest of them were replicated from db10 to db20, although the channel was not the same.

Another strange thing is the fact that although the replication thread for the db2->db20 channel was stopped (broken), checking the slave status on db20 showed that Executed_Gtid_Set was moving for all channels even though Retrieved_Gtid_Set for the broken one was stopped! So what was happening there?

This raised my curiosity, so I decided to do some further investigation and created scenarios regarding other strange things that could happen. An interesting one was about the replication filters. In our case, I thought “What would happen in the following scenario … ?”

Let’s say we write a row from db1 to db123.table789. This row is replicated to db10 (let’s say using channel 1) and to db2 (let’s say using channel 2). On channel 1, we filter out the db123.% tables; on channel 2 we don’t. db1 writes the row and the entry to the binary log. db2 writes the row after reading the entry from the binary log, subsequently writes the entry to its own binary log and replicates this change to db20. This change is also replicated to db10. So now, on db10 (depending on which channel finds the GTID first) it either gets filtered on channel 1 and written to its own binary log as just an empty begin/commit with any actual DDL/DML removed, or, if it is read first on channel 2 (db1->db2 and then db20->db10), then it is NOT filtered out and is executed instead. Is this correct? It definitely ISN’T!

Points of interest

You can find answers to the above questions in the points of interest listed below. Although it’s not really clear through the official documentation, this is what happens with GTID replication and multi-source GTID replication:

  • As we know, GTID sets are unique across all nodes in a given cluster. In multi-source replication, Executed_Gtid_Set is common for all channels. This means that regardless of the originating channel, when a GTID transaction is executed it is recorded in all channels’ Executed_Gtid_Set. Although it’s logical (each database is unique, so if a trx is going to affect a database it shouldn’t be tied to a single channel regardless of the channel it uses), the documentation doesn’t provide much info around this.
  • When we have multi-source, multi-level replication, there are cases where the GTID sets originating from one master can end up on one slave via different replication paths. It’s not clear if it applies any special algorithm (although it doesn’t seem that there could be one), but the preferred method seems to be FIFO. The fastest wins! This means that GTID sets can travel to the slave via different channels, and it’s related to how fast the upper-level slaves can commit changes. In fact, the path doesn’t really matter as it only executes each GTID trx once.
  • Replication filters are global regardless of the channel. This means each filter applies to all channels. This is normal, as we can’t define a replication filter per channel. In order to be able to debug such cases, adding a small replication delay per channel seems a good idea.

by Ananias Tsalouchidis at March 13, 2018 09:56 PM

Webinar Thursday, March 15, 2018: Basic External MySQL Troubleshooting Tools


Please join Percona’s Principal Support Engineer, Sveta Smirnova, as she presents Basic External MySQL Troubleshooting Tools on March 15, 2018 at 10:00 am PDT (UTC-7) / 1:00 pm EDT (UTC-4).

In my troubleshooting webinar series, I normally like to discuss built-in instruments available via the SQL interface. While they are effective and help to understand what is going on, external tools are also designed to make life of a database administrator easier.

In this webinar, I will discuss the external tools, toolkits and graphical instruments most valued by Support teams and customers. I will show the main advantages of these tools, and provide examples on how to effectively use them.

I will cover Percona Toolkit, MySQL Utilities, MySQL Sandbox, Percona Monitoring and Management (PMM) and a few other instruments.

Register for the webinar now.

Sveta Smirnova, Principal Technical Services Engineer

Sveta joined Percona in 2015. Her main professional interests are problem-solving, working with tricky issues, bugs, finding patterns that can quickly solve typical issues, and teaching others how to deal with MySQL issues, bugs and gotchas effectively. Before joining Percona Sveta worked as Support Engineer in the MySQL Bugs Analysis Support Group in MySQL AB-Sun-Oracle. She is the author of the book “MySQL Troubleshooting” and JSON UDF functions for MySQL.

by Sveta Smirnova at March 13, 2018 08:38 PM

Jean-Jerome Schmidt

Updated: Become a ClusterControl DBA: Managing your Database Configurations

In the past five posts of the blog series, we covered deployment of clustering/replication (MySQL / Galera, MySQL Replication, MongoDB & PostgreSQL), management & monitoring of your existing databases and clusters, performance monitoring and health, how to make your setup highly available through HAProxy and MaxScale and in the last post, how to prepare yourself for disasters by scheduling backups.

Since ClusterControl 1.2.11, we made major enhancements to the database configuration manager. The new version allows changing of parameters on multiple database hosts at the same time and, if possible, changing their values at runtime.

We featured the new MySQL Configuration Management in a Tips & Tricks blog post, but this blog post will go more in depth and cover Configuration Management within ClusterControl for MySQL, PostgreSQL and MongoDB.

ClusterControl Configuration management

The configuration management interface can be found under Manage > Configurations. From here, you can view or change the configurations of your database nodes and other tools that ClusterControl manages. ClusterControl will import the latest configuration from all nodes and overwrite previous copies made. Currently there is no historical data kept.

If you’d rather manually edit the config files directly on the nodes, you can re-import the altered configuration by pressing the Import button.

And last but not least: you can create or edit configuration templates. These templates are used whenever you deploy new nodes in your cluster. Of course, any changes made to the templates will not be retroactively applied to the already deployed nodes that were created using these templates.

MySQL Configuration Management

As previously mentioned, the MySQL configuration management got a complete overhaul in ClusterControl 1.2.11. The interface is now more intuitive. When changing parameters, ClusterControl checks whether the parameter actually exists. This ensures your configuration will not prevent MySQL from starting due to parameters that don’t exist.

From Manage -> Configurations, you will find an overview of all config files used within the selected cluster, including load balancer nodes.

We use a tree structure to easily view hosts and their respective configuration files. At the bottom of the tree, you will find the configuration templates available for this cluster.

Changing parameters

Suppose we need to change a simple parameter like the maximum number of allowed connections (max_connections). We can simply change this parameter at runtime.

First select the hosts to apply this change to.

Then select the section you want to change. In most cases, you will want to change the MYSQLD section. If you would like to change the default character set for MySQL, you will have to change that in both MYSQLD and client sections.

If necessary you can also create a new section by simply typing the new section name. This will create a new section in the my.cnf.

Once we change a parameter and set its new value by pressing “Proceed”, ClusterControl will check if the parameter exists for this version of MySQL. This is to prevent any non-existent parameters from blocking the initialization of MySQL on the next restart.

When we press “proceed” for the max_connections change, we will receive a confirmation that it has been applied to the configuration and set at runtime using SET GLOBAL. A restart is not required as max_connections is a parameter we can change at runtime.
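
Under the hood, applying such a runtime change boils down to something like the following on each selected host (the value here is only an example):

SET GLOBAL max_connections = 500;
SHOW GLOBAL VARIABLES LIKE 'max_connections';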

Now suppose we want to change the bufferpool size; this would require a restart of MySQL before it takes effect.

And as expected the value has been changed in the configuration file, but a restart is required. You can do this by logging into the host manually and restarting the MySQL process. Another way to do this from ClusterControl is by using the Nodes dashboard.

Restarting nodes in a Galera cluster

You can perform a restart per node by selecting “Restart Node” and pressing the “Proceed” button.

When you select “Initial Start” on a Galera node, ClusterControl will empty the MySQL data directory and force a full copy this way. This is, obviously, unnecessary for a configuration change. Make sure you leave the “initial” checkbox unchecked in the confirmation dialog. This will stop and start MySQL on the host but depending on your workload and bufferpool size this could take a while as MySQL will start flushing the dirty pages from the InnoDB bufferpool to disk. These are the pages that have been modified in memory but not on disk.

Restarting nodes in MySQL master-slave topologies

For MySQL master-slave topologies you can’t just restart node by node. Unless downtime of the master is acceptable, you will have to apply the configuration changes to the slaves first and then promote a slave to become the new master.

You can go through the slaves one by one and execute a “Restart Node” on them.

After applying the changes to all slaves, promote a slave to become the new master:

After the slave has become the new master, you can shutdown and restart the old master node to apply the change.

Importing configurations

Now that we have applied the change directly on the database, as well as the configuration file, it will take until the next configuration import to see the change reflected in the configuration stored in ClusterControl. If you are less patient, you can schedule an immediate configuration import by pressing the “Import” button.

PostgreSQL Configuration Management

For PostgreSQL, the Configuration Management works a bit different from the MySQL Configuration Management. In general, you have the same functionality here: change the configuration, import configurations for all nodes and define/alter templates.

The difference here is that you can immediately change the whole configuration file and write this configuration back to the database node.

If the changes made requires a restart, a “Restart” button will appear that allows you to restart the node to apply the changes.

MongoDB Configuration Management

The MongoDB Configuration Management works similar to the MySQL Configuration Management: you can change the configuration, import configurations for all nodes, change parameters and alter templates.

Changing the configuration is pretty straightforward, using the Change Parameter dialog (as described in the "Changing Parameters" section):

Once changed, you can see the post-modification action proposed by ClusterControl in the "Config Change Log" dialog:

You can then proceed to restart the respective MongoDB nodes, one node at a time, to load the changes.

Final thoughts

In this blog post we learned about how to manage, alter and template your configurations in ClusterControl. Changing the templates can save you a lot of time when you have deployed only one node in your topology. As the template will be used for new nodes, this will save you from altering all configurations afterwards. However for MySQL and MongoDB based nodes, changing the configuration on all nodes has become trivial due to the new Configuration Management interface.

As a reminder, we recently covered in the same series deployment of clustering/replication (MySQL / Galera, MySQL Replication, MongoDB & PostgreSQL), management & monitoring of your existing databases and clusters, performance monitoring and health, how to make your setup highly available through HAProxy and MaxScale and in the last post, how to prepare yourself for disasters by scheduling backups.

by ashraf at March 13, 2018 09:08 AM

MariaDB AB

MariaDB MaxScale: Masking and Firewall

MariaDB MaxScale: Masking and Firewall Johan Tue, 03/13/2018 - 03:13

To properly prevent some columns from being accessed, the masking filter that was introduced in MariaDB MaxScale 2.1 must be used together with the database firewall filter. In version 2.1, it was somewhat cumbersome to set up the database firewall filter for use together with the masking filter. With the recently released MariaDB MaxScale 2.2 (GA), we have introduced new features that make the combined use of the masking filter and the database firewall filter much easier.

Before moving to the new functionality introduced in version 2.2, let us recap what the masking filter actually can do. With the masking filter it is possible to mask the returned values, so that even though a column can be SELECTed, the actual value will not be revealed. This can be used for restricting access to some information without making the actual column inaccessible.

For instance, suppose there is a table person that, among other columns, contains the column ssn where the social security number of a person is stored. With the masking filter it is possible to specify that when the ssn field is queried, a masked value is returned, unless the user making the query is a specific one.

That is, when making the query

> SELECT name, ssn FROM person;

instead of getting the real result, as in

| name  | ssn         |
| Bob   | 435-22-3267 |
| Alice | 721-07-4426 |

the ssn would be masked, as in

| name  | ssn         |
| Bob   | XXX-XX-XXXX |
| Alice | XXX-XX-XXXX |

Taking masking into use is quite straightforward: in the service section the masking filter is referred to just like any other filter, and at minimum the filter configuration only needs to load the masking module and point at a rules file.
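
A minimal sketch of what this can look like in the MaxScale configuration file (the service definition and the rules path below are illustrative, not taken from the original post):

[MyService]
type=service
router=readwritesplit
servers=server1
user=maxuser
password=maxpwd
filters=Masking

[Masking]
type=filter
module=masking
rules=/path/to/masking_rules.json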


The value of the rules key is the path of the file containing the rules. The full documentation for the masking filter can be found here, so we will not cover all details but only the basics and what is new in MaxScale 2.2. The rules file used in the previous example looks as follows:

   "rules": [
           "replace": {
               "column": "ssn"
           "with": {
               "value": "XXX-XX-XXXX"

The content must be a JSON object with a single key rules whose value is an array of rule objects. Each rule object can contain various keys, the most notable of which are replace and with. The value of replace tells what should be replaced and the value of with what it should be replaced with. So, above we specify that the value of a column called ssn should be replaced with the string “XXX-XX-XXXX”.

The string specified with value is used to replace the actual value only if the length of the real value is exactly the same as the length of the string. If that is not the case, then the value will by default be replaced with as many “X”s as necessary. This is different from MaxScale 2.1, where the catch all string had to be specified using a fill key. The fill key can still be specified if some other string but “X” should be used.

Note that with the above rules, the masking will be applied to all users, but with the key applies_to it can be specified what users are affected and with the key exempted what users are exempted, so it is straightforward to apply the masking to only a specific set of users.

More Functionality

Replacing the entire value with something else was the only option there was in MaxScale 2.1. In MaxScale 2.2 we have provided additional alternatives for how the value is replaced.

Partial Replacement

In MaxScale 2.2, it is no longer necessary to replace the entire value, but it is possible to replace only some part of it. Consider, for instance, the previous example where we masked the social security number entirely. If we would like to only mask the last 4 digits of the number it could be accomplished with the following rule.

   "rules": [
           "replace": {
               "column": "ssn",
               "match": "....$"
           "with": {
               "value": "YYYY"

With the match key a PCRE2 regular expression can be defined. If some part of the value matches that regular expression, then that and only that part is replaced. The string above matches the four last characters - whatever they are - so the last four characters will be masked.

With partial matching, the string specified with value is used, provided the length of that string is exactly the same as the string matched by the regular expression. With these rules, the result looks as follows:

| name  | ssn         |
| Bob   | 435-22-YYYY |
| Alice | 721-07-YYYY |


Another feature introduced in MaxScale 2.2 is obfuscation. That is, instead of simply replacing a value with a fixed string, the value is replaced with a new value that is created from the original value, but in such a manner that the original value cannot easily be obtained from the generated one. Using obfuscation instead of replacement is easy; simply replace the replace keyword with obfuscate and remove the with object entirely.

   "rules": [
           "obfuscate": {
               "column": "ssn"

And the result is as follows:

| name  | ssn         |
| Bob   | +|bv%~6d{y  |
| Alice | ~H;Oj#Q~~(\ |

Currently, partial matching is not supported in conjunction with obfuscation; obfuscation is always applied to the entire value. Partial matching will be supported in the future.

The obfuscation algorithm is basically a hashing function that in principle makes it impossible to obtain the original value from the obfuscated one. Note, however, that if the range of values is limited, it is straightforward to figure out the possible original values by running the full range of values through the obfuscation algorithm, thus obtaining a mapping from values to obfuscated values, and then comparing the result of the query with the values in that mapping.

Note on function blocking and masking:

The masking filter works only on the result set, which means it is easy to bypass it. Consider the following:

> select name, ssn from person;
| name  | ssn         |
| Bob   | XXX-XX-XXXX |
| Alice | XXX-XX-XXXX |

> select name, concat(ssn) from person;
| name  | concat(ssn) |
| Bob   | 435-22-3267 |
| Alice | 721-07-4426 |

Simply by not using the column name as such, but “hiding” it in a function, means that the masking can be bypassed.

Combining the Masking and Firewall Filters

Preventing the bypassing of the masking filter using the firewall filter was possible already in MaxScale 2.1, but it was quite awkward as the required firewall rules became quite extensive. Basically, it was necessary to explicitly block every function with which the masking could be bypassed. In MaxScale 2.2 we have provided additional firewall functionality that makes the task much easier.

Starting with the service in the configuration file, the requests must first be passed through the firewall before they are provided to the masking filter.


And the firewall needs a section of its own.
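
A sketch of both pieces, with illustrative names and paths (only the relevant lines of the service are shown; the database firewall filter module is dbfwfilter, placed before the masking filter in the service's filter chain):

[MyService]
type=service
filters=Firewall|Masking

[Firewall]
type=filter
module=dbfwfilter
rules=/path/to/firewall_rules.txt
action=block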


Then comes the firewall rules. What we want to prevent is the use of functions in conjunction with the column we want to mask. That can be accomplished using the following firewall rules.

rule no_functions_with_ssn match uses_function ssn

users %@% match any rules no_functions_with_ssn

The rule above matches if the column ssn is used in conjunction with any function. If we now try to bypass the masking, the result will be as follows:

> select name, concat(ssn) from person;

ERROR 1141 (HY000): Access denied for user 'cecil'@'' to database 'testdb': Permission denied to column 'ssn' with function.

Not every function can be used for bypassing the masking, so it is also possible to whitelist certain functions. For instance, if we want to allow the use of the function LENGTH, that can be accomplished as follows:

rule only_length_with_ssn match not_function length columns ssn

users %@% match any rules only_length_with_ssn

This rule matches if the column ssn is used in conjunction with any other function but LENGTH. So, the following works

> select name, length(ssn) from person;
| name  | length(ssn) |
| Bob   |          11 |
| Alice |          11 |

but the use of any other function fails

> select name, concat(ssn) from person;
ERROR 1141 (HY000): Access denied for user 'cecil'@'' to database 'testdb': Permission denied to column 'ssn' with function 'concat'.


The masking and firewall filter can together be used for ensuring that unauthorized users cannot access the actual value of some column. In MariaDB MaxScale 2.2, the functionality of the firewall has been extended so that it is easy to prevent actions that otherwise could be used for bypassing the masking filter. Download MariaDB MaxScale now to get started.


Alexandru Zeve

Wed, 03/14/2018 - 10:21

readwritesplit router with masking filter problems


I tried using the masking filter together with a readwritesplit router, and for basic stuff it works. However, for complex queries, I get the following message: "warning: (81) [masking] Received data, although expected nothing." several times and the application hangs, with no other error messages (log-info is enabled). I cannot find anything related to this message. Is there any limitation when combining the readwritesplit router and the masking filter? (When using the masking filter with the readconnroute router, everything works fine.)


by Johan at March 13, 2018 07:13 AM

March 12, 2018

Peter Zaitsev

ProxySQL 1.4.6 and Updated proxysql-admin Tool Now in the Percona Repository


ProxySQL 1.4.6, released by ProxySQL, is now available for download in the Percona Repository along with an updated version of Percona’s proxysql-admin tool.

ProxySQL is a high-performance proxy, currently for MySQL and its forks (like Percona Server for MySQL and MariaDB). It acts as an intermediary for client requests seeking resources from the database. René Cannaò created ProxySQL for DBAs as a means of solving complex replication topology issues.

The ProxySQL 1.4.6 source and binary packages available from Percona include ProxySQL Admin, a tool developed by Percona to configure Percona XtraDB Cluster nodes into ProxySQL. Docker images for release 1.4.6 are available as well. You can also download the original ProxySQL from the upstream project.

This release fixes the following bugs in ProxySQL Admin:

Usability improvements:

  • #PSQLADM-32: The proxysql_admin script can now configure multiple clusters in ProxySQL when unique cluster names are specified with the wsrep_cluster_name option, and the proxysql_admin.cnf configuration contains a different ProxySQL READ/WRITE hostgroup and a different application user for each cluster. Currently, multiple-cluster support is not compatible with the host priority feature, which works only with a single cluster.
  • PR #81: The new version substantially increases the number of test cases in the ProxySQL Admin test-suite.

Bug fixes:

  • Fixed #PSQLADM-35: the proxysql_galera_checker monitoring script was unable to discover new writer nodes.
  • Fixed #PSQLADM-36: upgrade to ProxySQL 1.4.5 from the previous version was broken.
  • Fixed #79 by properly quoting the MONITOR_USERNAME environment variable in the admin script query.

ProxySQL is available under the open source GPLv3 license.

by Dmitriy Kostiuk at March 12, 2018 11:57 PM

MariaDB Foundation

MariaDB Foundation now accepts cryptocurrency donations

The MariaDB Foundation has added Bitcoin, Bitcoin Cash, Ethereum, Litecoin, Monero and Ripple to the list of ways to contribute financially. After experiencing problems with Paypal and disabling it as a means of contribution, our individual donations dropped off as not everyone found the alternatives, such as paying directly into the bank account, convenient. We […]

The post MariaDB Foundation now accepts cryptocurrency donations appeared first on MariaDB.org.

by Ian Gilfillan at March 12, 2018 12:50 PM

Peter Zaitsev

Mass Upgrade MongoDB Versions: from 2.6 to 3.6


In this blog post, we’re going to look at how to upgrade MongoDB when leaping versions.

Here at Percona, we see every type of upgrade you could imagine. Lately, however, I see an increase in very old version upgrades (OVEs). This is when you upgrade MongoDB from a version more than one step behind the version you are upgrading to. Some examples are:

  • 2.6 -> 3.2
  • 2.6 -> 3.4
  • 2.6 -> 3.6
  • 3.0 -> 3.4
  • 3.0 -> 3.6
  • 3.2 -> 3.6

Luckily, they all have the same basic path to upgrade. Unluckily, it’s not an online upgrade. You need to dump/import again on the new version. For a relatively small system with few indexes and less than 100GB of data, this isn’t a big deal.

Many people might ask: “Why can’t I just upgrade 2.6->3.0->3.2->3.4->3.6?” You can do this and it could be fine, but it is hazardous. How many times have you upgraded five major versions in a row with no issues? How much work is involved in testing one driver change, let alone five? What about the changes in the optimizer, storage engine and driver versions, bulk feature differences, and moving from stand-alone config servers to the replica set layout they use now? Or upgrading to the new election protocol, which implies more overhead?

Even if you navigate all of that, you still have to worry about what you didn’t think about. In a perfect world, you would have an ability to build a fresh duplicate cluster on 3.6 and then run production traffic on it to make sure things still work.

My advice is to only plan an in-place upgrade for a single version, and even then you should talk to an in-house expert or consulting firm to make sure you are making the right changes for future support.

As such, I am going to break things down into two areas:

  • Upgrade from the previous version (3.4.12 -> 3.6.3, for example)
  • Upgrading using dump/import

Upgrading from previous versions when using a Replica Set and in place

Generally speaking, if you are taking this path the MongoDB manual is a great help. However, its instructions are specific to 3.6, and my goal is to make this a bit more generic. As such, let’s break it down into steps acceptable in all systems.

Read the upgrade page for your version. At the end of the process below, you might have extra work to do.

  1. Set the setFeatureCompatibilityVersion to the previous version: db.adminCommand( { setFeatureCompatibilityVersion: "3.4" } )
  2. Make your current primary prefer to be primary using something like the sketch shown after this list, where I assume the primary you want is the first node in the list
  3. Now in reverse order from rs.config().members, take the highest member ID and stop one node at a time
    1. Stop the mongod node
    2. Run yum/apt upgrade, or replace the binary files with new ones
    3. Try to start the process manually; this might fail if you failed to note and fix configuration file changes
      1. A good example of this is requirements to set the engine to MMAPv1 moving from 3.0 -> 3.2, or how “smallfiles” was removed as an option and could cause the host not to start.
    4. Once started on the new version, make sure replication can keep up with ‘rs.printSlaveReplicationInfo()’
    5. Repeat this process one node at a time until only node “0” (your primary) is left.
  4. Reverse your work from step two, and remove the priority setting on the primary node. This might cause an election, but it rarely changes the primary.
  5. If the primary has not changed, run rs.stepDown(300, 30). This tells it to let someone else be primary, gives the secondaries 30 seconds to catch up, and doesn’t allow itself to become primary again for 270 more seconds.
  6. Inside those 270 seconds, you must shut down the node and repeat the upgrade from step three (but only for this one node).
  7. You are done with a replica set, however, check the nodes on anything you needed to do on the Mongos layer.
    1. In MongoDB 3.6, config servers are required to be a replica set. You can easily check this from the configdb configuration line on a mongos: it looks like “xxx/host1:port,host2:port2,host3:port” for a replica set, or “host1:port,host2:port,host3:port” for non-replica-set config servers. If you do not convert the config servers BEFORE upgrading mongos, it will fail to start. Treat the config servers as a replica set upgrade if they are already in one.
  8. You can do one shard/replica set at a time, but if you do, the balancer MUST be off during this time to prevent odd confusion and rollbacks.
  9. You’re done!
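
The priority change mentioned in step two looks roughly like this in the mongo shell (a sketch; adjust the member index and priority value to your topology):

cfg = rs.conf()
cfg.members[0].priority = 2   // make the first member the preferred primary
rs.reconfig(cfg)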

As you can see, this is pretty generic. But it is a good set of rules to follow since each version might have a deprecated feature, removed configuration options or other changes. By reading the documentation, you should be OK. However, having someone who has done tens to hundreds of these already is very helpful.

Back to the challenge at hand: how would you like to follow this process five times in a row per replica set/shard if you were moving from 2.6 to 3.6? What would be the risk of human error in all of that? Hopefully you’re starting to see, just from an operational standpoint, why we advise against OVEs. But that’s only one side. During each of these iterations, you also need to redeploy the application and test to ensure it still works by running some type of load or UAT system, including upgrading the driver for each version and applying builds for that driver (as some functions may change). I don’t know about you, but as a DBA, architect, product owner and support manager this is just too much risk.

What are our options for doing this in a much more straightforward fashion, without causing engineer fatigue, splitting risk trees and other such concerns?

Upgrading using the dump/import method

Before we get into this one, we should talk about a couple of points. You do have a choice between online and offline modes for this. I will only cover the offline mode. You need to collect and apply operations occurring during this process for the online mode, and I do not recommend this for our support customers. It is something I have helped do for our consulting customers. This is because we can make sure the process works for your environment, and at the end of the day my job is to make sure data is available and safe above anything else.

If you’re sharded, this must be done in parallel. You should use MCB (mongodb_consistent_backup). This is a good idea even if you’re not sharded, as it works with sharded and plain replica sets to ensure all the backups and config servers (if applicable) are “dumped” to the same point in time.

Next, if you are not using virtualization or the cloud, you’ll need to order in 2x the hardware and have a place for the old equipment. While not optimal, if you don’t have the budget for anything else you might consider the earlier approach, with its risks, for just the final version jump. With virtualization or cloud, people can typically use more hardware for a short time, and the cost is only the use of the equipment for that time. This is easily budgeted as part of the upgrade cost against the risks of not upgrading.

  1. Use MCB to take a backup and save it. An example config is:
         # (the backup/replication/sharding/archive section headers below are restored from the MCB example config.yml)
         host: localhost
         port: 27017
         log_dir: /var/log/mongodb-consistent-backup
         backup:
             method: mongodump
             name: upgrade
             location: /mongo_backups/upgrade_XX_to_3.6
         replication:
             max_lag_secs: 10
         sharding:
             balancer:
                 wait_secs: [1+] (default: 300)
                 ping_secs: [1+] (default: 3)
         archive:
             method: tar
             compression: none
  2. MCB figures out whether the deployment is sharded or not, and it will reach out and back up from a secondary where possible. When done, you will have a structure like:
    >find /mongo_backups/upgrade_XX_to_3.6
    and so on...
  3. Now that we have a backup, we can build new hardware for rs1. As I said, we will focus only on this one replica set in this example (a backup of a single replica set would just have that one folder):
    • Set up all nodes with a 3.6-compatible configuration file. You do not need to keep the same engine; use WiredTiger (the default) if you’re not sure.
    • Disable authentication for now.
    • Start the nodes and ensure the replica sets are working and healthy (rs.initiate, rs.add, rs.status).
    • Run the import using mongorestore with --oplogReplay on the extracted tar file (see the sketch after this list).
    • Drop admin.system.users. If the salt has changed (2.6 -> 3.0+), you’ll need to recreate all users.
    • Once the restore is complete, use rs.printReplicationInfo or PMM to verify when the replication is caught up from the import.
    • Start up your application pointing to the new location using the new driver you’ve already tested on this version, grab a beer and you’re done!
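
As a rough sketch of the restore step above (the archive name, paths and dump directory here are hypothetical; adjust them to whatever MCB produced for your replica set):

# extract the MCB archive for rs1 and replay it, including the captured oplog
tar -xf /mongo_backups/upgrade_XX_to_3.6/upgrade/rs1.tar -C /tmp/restore
mongorestore --host rs1/newnode1:27017,newnode2:27017,newnode3:27017 \
    --oplogReplay /tmp/restore/rs1/dump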

Hopefully, you can see how much more comfortable this route is. You know all the new features are working, and you do not need to do anything else (as you would in the old system) to make sure you have switched to replica-set config servers or the like.

If you used MCB in this process to upgrade MongoDB, you can do the same for sharding. You will keep all of your existing sharding metadata, which the default dump/restore of a sharded backup handles for you. It should be noted that a future version could change the layout of the config servers, and this process might need adaptation. If you think this is the case, drop a question in the Percona Forums, on Twitter, or even via the contact-us page and we will be glad to help.

I want to thank you for reading this blog on how to upgrade MongoDB, and hope it helps. This is just a base guideline and there are many specific things per-version to consider that are outside of the scope of this blog. Percona has support and experts to help guide you if you have any questions.

by David Murphy at March 12, 2018 10:10 AM

March 11, 2018

Valeriy Kravchuk

Checking User Threads With gdb in MySQL 5.7+

In one of my gdb-related posts last year I noted that there is no more simple global list of user threads in MySQL 5.7+:
"I had highlighted Global_THD_manager singleton also as during my next gdb sessions I had found out that simple global list of threads is also gone and in 5.7 everything is done via that Global_THD_manager. This is a topic for some other post, though."
In that post, and many times later when I had to deal with MySQL 5.7+, I just checked OS threads one by one in gdb using thread 1 ... thread N commands. This is inefficient at best, as I also hit numerous background threads that I often do not care about. So, a couple of weeks ago I finally decided to get back to this topic and find out how to check just the user threads one by one in recent MySQL versions. I had a nice hint from Shane Bester on how to get information about the $i-th thread (shared in one of his comments to my Facebook post):
set $value = (THD**)(Global_THD_manager::thd_manager->thd_list.m_buff.data + (sizeof(THD**) * $i))
I've attached gdb to an instance of Percona Server 5.7.x that I had running in my CentOS 6.9 VM and tried a few commands to check the types and content of the Global_THD_manager elements:
(gdb) p Global_THD_manager::thd_manager
$1 = (Global_THD_manager *) 0x7fab087fd000
(gdb) p Global_THD_manager::thd_manager->thd_list
$2 = {m_size = 2, m_capacity = 500, m_buff = {{
      data = "\000\060b\344\252\177\000\000\000\220i\344\252\177\000\000\000\200x\344\252\177", '\000' <repeats 3977 times>, align = {<No data fields>}}},
  m_array_ptr = 0x7fab087fd010, m_psi_key = 0}

So, we see that internally there is some array of elements, thd_list, with m_size items (2 in my case), probably stored in some pre-allocated buffer of m_capacity (500) elements pointed to by m_array_ptr. The type of the elements is not clear, but we can try Shane's hint and assume that they are of type THD**. Let's try to check what we see there after type castings:
(gdb) p (THD**)(Global_THD_manager::thd_manager->thd_list.m_buff.data)
$4 = (THD **) 0x7fab087fd010
(gdb) p  *(THD**)(Global_THD_manager::thd_manager->thd_list.m_buff.data)
$5 = (THD *) 0x7faae4623000
(gdb) p  **(THD**)(Global_THD_manager::thd_manager->thd_list.m_buff.data)
$6 = {<MDL_context_owner> = {
    _vptr.MDL_context_owner = 0x1c51f50}, <Query_arena> = {
So, we get reasonable addresses and when we dereference the resulting THD** pointer twice we indeed get a structure that looks like THD of MySQL 5.7+ (it's very different, say, in MariaDB 10.1.x), with reasonable content (that is huge and skipped above).

I've tried to get processlist id of thread based on findings of that post using intermediate gdb variables:

(gdb) set $ppthd = (THD**)(Global_THD_manager::thd_manager->thd_list.m_buff.data)
(gdb) p *($ppthd)
$7 = (THD *) 0x7faae4623000

(gdb) set $pthd = *($ppthd)
(gdb) p $pthd->m_thread_id
$10 = 5
and then directly, using offsets and checking for security contexts of threads:
(gdb) p  (**(THD**)(Global_THD_manager::thd_manager->thd_list.m_buff.data)).m_main_security_ctx.m_user
$14 = {m_ptr = 0x7faae463b060 "myuser", m_length = 6, m_charset = 0x1d21760,
  m_alloced_length = 8, m_is_alloced = true}
(gdb) p  (**(THD**)(Global_THD_manager::thd_manager->thd_list.m_buff.data + (sizeof(THD**)))).m_main_security_ctx.m_user
$15 = {m_ptr = 0x7faae46b1090 "root", m_length = 4, m_charset = 0x1d21760,
  m_alloced_length = 8, m_is_alloced = true}
(gdb) p  (**(THD**)(Global_THD_manager::thd_manager->thd_list.m_buff.data + (sizeof(THD**)))).m_thread_id
$16 = 9
to confirm that I correctly get user names and thread ids for both of the 2 user threads I had in that "list". As usual, Shane Bester was right!
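
Based on this, a small gdb loop over all user threads (assuming the thd_list.m_buff.data member path reconstructed above is correct for your build) might look like the following:

(gdb) set $n = Global_THD_manager::thd_manager->thd_list.m_size
(gdb) set $i = 0
(gdb) while ($i < $n)
 >p (**(THD**)(Global_THD_manager::thd_manager->thd_list.m_buff.data + sizeof(THD**) * $i)).m_thread_id
 >set $i = $i + 1
 >end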

Now, if you want to get more details about Global_THD_manager, you can just check the sql/mysqld_thd_manager.h file. I was interested mostly in the following:
  int get_num_thread_running() const { return num_thread_running; }
  uint get_thd_count() const { return global_thd_count; }

  static Global_THD_manager *thd_manager;

  // Array of current THDs. Protected by LOCK_thd_list.
  typedef Prealloced_array<THD*, 500, true> THD_array;
  THD_array thd_list;

  // Array of thread ID in current use. Protected by LOCK_thread_ids.
  typedef Prealloced_array<my_thread_id, 1000, true> Thread_id_array;
  Thread_id_array thread_ids;
First of all, how consistent is it to use both int and uint data types for values that are always >= 0?... The fact that our thd_list is actually some template-based container, Prealloced_array, is also interesting, as it is useful to find out how it is implemented. We can find all the relevant details in the include/prealloced_array.h file. I'd like to highlight the following here:
"The interface is chosen to be similar to std::vector."

  size_t         m_size;
  size_t         m_capacity;
  // This buffer must be properly aligned.
  my_aligned_storage<Prealloc * sizeof(Element_type), MY_ALIGNOF(double)> m_buff;
  Element_type *m_array_ptr;
To summarize, MySQL 5.7+ uses more C++ now, with templates, singletons, iterators and more, but still Oracle prefers to implement their own container types instead of using some standard ones. One of these generic types, Prealloced_array, is widely used and is easy to deal with in gdb, as long as you know the element type.

by Valeriy Kravchuk at March 11, 2018 02:42 PM

March 10, 2018

Peter Zaitsev

This Week in Data with Colin Charles 31: Meltdown/Spectre Performance Regressions and Percona Live 2018

Colin Charles

Join Percona Chief Evangelist Colin Charles as he covers happenings, gives pointers and provides musings on the open source database community.

Have you been following the Meltdown/Spectre performance regressions? Some of the best blog posts have been coming from Brendan Gregg, who’s keynoting at Percona Live this year. We’ve also got Scott Simpson from Upwork giving a keynote about how and why they use MongoDB. This is in addition to all the other fun talks we have, so please register now. Don’t forget to also book your hotel room!

Even though the Percona Live conference now covers much more than just MySQL, it’s worth noting that the MySQL Community Awards 2018: Call for Nominations! is happening now. You have until Friday, March 15, 2018, to make a nomination. Winners get into the Hall of Fame. Yes, I am also on the committee to make selections.

Another highlight: Open-sourcing a 10x reduction in Apache Cassandra tail latency by Dikang Gu of Instagram (Facebook). This is again thanks to RocksDB. Check out Rocksandra, and don’t forget to register for Percona Live to see the talk: Cassandra on RocksDB.

This week, I spent some time at Percona Headquarters in Raleigh, North Carolina. The building from the outside is pictured well in Google Maps. I thought it might be fun to show you a few photos (the office is huge with quite a handful working there despite the fact that Percona is largely remote).

Peter at Percona Headquarters
Percona awards and bookshelf, featuring some very antique encyclopedias.


Peter at Percona Headquarters 2
Peter Zaitsev, Percona CEO, outside his office (no, it is not an open office plan – everyone has rooms, including visitors like myself).


We’re all at SCALE16x now – so come see our talks (Peter Zaitsev and I are both speaking), and we have a booth where you can say hello to Rick Golba, Marc Sherwood and Dave Avery.


Link List

Upcoming appearances

  • SCALE16x – Pasadena, California, USA – March 8-11 2018
  • FOSSASIA 2018 – Singapore – March 22-25 2018


I look forward to feedback/tips via e-mail or on Twitter @bytebot.

by Colin Charles at March 10, 2018 12:25 AM

March 09, 2018

Peter Zaitsev

Using the New MongoDB 3.6 Expression Query Operator $expr


In this blog, we will discuss the new expression query operator $expr, added to MongoDB in version 3.6. To show the power of this functionality, I will demonstrate the use of this feature with a simple example.

The $expr Query Operator

With the exception of a few basic operators ($and, $or, $lt, $gt, etc.), before MongoDB 3.6 the more powerful expression operators could only be used on query results via the aggregation pipeline. In practice, this meant that MongoDB .find() queries could not take advantage of a lot of powerful server features.

In 3.6 and above, support for a new query operator named $expr was added to the MongoDB .find() operation. This allows queries to take advantage of operators that were previously available only in aggregations.

Users that are familiar with the aggregation framework will remember that expressions/conditions in an aggregation are evaluated on a per-document basis. Aggregations allow document fields to be used as variables in conditionals (by prefixing the field name with a dollar sign). The new .find() $expr operator adds that same flexibility and power to the .find(), and perhaps more importantly in this article: .findAndModify() commands!
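
As a quick illustration of the syntax before we get to the example (the collection and field names here are hypothetical), a .find() can now compare two fields of the same document directly:

> db.orders.find({ $expr: { $gt: [ "$spent", "$budget" ] } })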

I hope to show how this new functionality creates some very powerful and efficient application workflows in MongoDB.

Our Example Application

In this example, let’s pretend we are designing a store inventory application based on MongoDB. Among other things, one of the major functions of the store inventory application is to update items when they’re sold. In this article, we will focus on this action only.

Each item in the inventory system is stored in a single MongoDB document containing:

  1. A numeric “itemId”
  2. A “name” string
  3. The number of times the item has been sold (“sold”)
  4. The total inventory available (“total”). Importantly, each item may have a different total inventory available.

An example “items” collection document:

> db.items.findOne()
{
	"_id" : ObjectId("5a85d32f8a734d82e8bcb5b5"),
	"itemId" : 123456,
	"name" : "a really cool item",
	"total" : 10,
	"sold" : 0
}

Some additional expectations are:

  1. When the application calls the “sold item” workflow, if all items are sold an empty document (or “null”) should be returned to the application.
  2. If the item is still available, the number of sold items should be incremented and the updated document is returned to the application.

Pre-3.6: Example #1

Before 3.6, a common way to tackle our example application’s “sold item” workflow was by using a .findAndModify() operation. A .findAndModify() does exactly what it suggests: finds documents and modifies the matching documents however you specified.

In this example, our .findAndModify() operation should contain a query for the exact “itemId” we are selling and an update document to tell MongoDB to increment the number of sold items during the query.

By default, .findAndModify() returns a document BEFORE the modification/update took place. In our example, we want the result from AFTER the update, so we will also add the boolean option “new” (set to true) to cause the updated document to be returned.

In the MongoDB shell this .findAndModify() operation for “itemId” 123456 would look like this:

> db.items.findAndModify({
  query: { itemId: 123456 },
  update: { $inc: { sold: 1 } },
  new: true
})

But there’s a problem: “itemId” 123456 only has ten items available:

> db.items.find(
    { itemId: 123456 },
    { _id: 0, total: 1 }
)
{ "total" : 10 }

What if we run this .findAndModify() more than 10 times? Our query does not check whether we have exceeded the “total” number of items, so this won’t work!

Here we can see after running the query 11 times, our “sold” count of 11 is incorrect:

> db.items.find(
    { itemId: 123456 },
    { _id: 0, sold: 1 }
)
{ "sold" : 11 }

If ALL items in the inventory system had a total of 10 items, this would be quite simple; the .findAndModify() operation could just be modified to consider that “sold” should be less than ($lt) 10:

> db.items.findAndModify({
  query: {
    itemId: 123456,
    sold: { $lt: 10 }
  },
  update: {
    $inc: { sold: 1 }
  },
  new: true
})

But this isn’t good enough either.

In our example, each document has its own “total” and “sold” counts. We can’t rely on every item having ten items available; every item may have a different “total”.

Pre-3.6: Example #2

In the pre-3.6 world, there weren’t too many ways to address the problem we found in our first approach, other than breaking the database logic into two different calls and having the application run some logic on the result in the middle. Let’s try that now.

Here’s what this new “sold item” workflow would look like:

  1. A .find() query to fetch the document needed, based on “itemId”:
    > db.items.find(
        { itemId: 123456 }
    )
    { ... }
  2. The application examines the result document and checks whether the “sold” field is still less than the “total”.
  3. If there are items available, an .update() operation is run to increment the “sold” number for the item. The updated document is returned by the application:
    > db.items.update(
        { itemId: 123456 },
        { $inc: { sold: 1 } }
    )
    { ... }

Aside from increased code complexity, there are several problems with this approach that might not be obvious:

  1. Atomicity: A single MongoDB operation is atomic at the single-document level. In other words, if two operations update a document, one of the operations must wait. In the situation in “Example #1”, where our query AND document increment occur in the same operation, we can feel safe knowing that our “sold” and “total” counts were updated atomically. Unfortunately, in our new approach, we’ve broken our query and update into two separate database calls. This means that this operation is not entirely atomic and is prone to race conditions. If many sessions run this logic at the same time, it’s possible for another session to increment the counter between your first and second database operation!
  2. Several Operations: In storage engines like WiredTiger and RocksDB, operations wait in a queue when the system is busy. In our new approach, we must wait to enter the storage engine twice. Under serious load, this could create a cascading bottleneck in your architecture. It could cause application servers to stall and backlog operations in lockstep with the overwhelmed database. The most efficient approach is to perform the query and increment in a single operation.
  3. Network Inefficiency: Performing two database commands requires double the serialization overhead and network round trip time. There is also a minor increase in bandwidth usage required.

Post-3.6: Example #3

In this third example, let’s utilize the new Expression Query Operator ($expr) that was added in 3.6 to make this workflow as efficient as possible. Note that the approach in this example only works on MongoDB 3.6 or the upcoming Percona Server for MongoDB 3.6 (or greater)!

The Expression Query Operator allows powerful operators to be used on the result document(s) of regular .find() queries. How? Expression Queries/$expr are actually running an aggregation after translating your find filter/condition to a $match aggregation pipeline stage.

In our new approach, we will only need to use one basic operator in our expression: $lt (ie: less-than). The $lt operator is used in our expression to check that our “sold” field is less-than the “total” field. Using the $lt under the new $expr operator, we can make the MongoDB server compare the “sold” vs. “total” count of the item we are querying and only return and increment the document if the expression is true, all in a single, atomic, server-side operation! 

Here is what our improved query looks like:

> db.items.findAndModify({
  query: {
    itemId: 123456,
    $expr: {
      $lt: [ "$sold", "$total" ]
    }
  },
  update: {
    $inc: { sold: 1 }
  },
  new: true
})

Notice the $lt fields “sold” and “total” are prefixed with a dollar sign ($). This tells the aggregation to use the real value of each matched document in the less-than comparison dynamically. This resolves a problem we encountered earlier, and now this query only succeeds if there are items available (“sold” is less than “total”)! This efficiently pushes logic down to the database.

The itemId: 123456 has a “total” value of 10. If we run the .findAndModify() 10 times, we get this result:

> db.items.findAndModify({
   query: {
     itemId: 123456,
     $expr: {
       $lt: [ "$sold", "$total" ]
     }
   },
   update: {
     $inc: { sold: 1 }
   },
   new: true
})
{
	"_id" : ObjectId("5a85d32f8a734d82e8bcb5b5"),
	"itemId" : 123456,
	"name" : "a really cool item",
	"total" : 10,
	"sold" : 10
}

Ten sold, ten total. Great!

If we run the same query one more time we receive a “null”:

> db.items.findAndModify({
   query: {
     itemId: 123456,
     $expr: {
       $lt: [ "$sold", "$total" ]
     }
   },
   update: {
     $inc: { sold: 1 }
   },
   new: true
})
null

Perfect! A null is returned because our $lt condition in our $expr operator did not succeed, just like we wanted.

Let’s make sure there was NOT an 11th increment, an issue we had in our first example:

> db.items.find(
    { itemId: 123456 },
    { _id: 0, sold: 1 }
)
{ "sold" : 10 }

Here we can see the 11th increment did not run because the $expr failed to pass. 10 of 10 items are sold. This is very cool!


Here we can see that the combination of two existing server features (.findAndModify() and document fields as variables in aggregations) and a new 3.6 feature ($expr) has solved many problems with a previously inefficient and potentially dangerous data workflow.

More on $expr can be found in the MongoDB documentation.

This combination of functionality was able to provide the atomicity of our query in “Example #1” with the safe logic-checking that was done in “Example #2”. The end result is both efficient AND safe!

This article goes to show that while each component of MongoDB’s functionality is powerful on its own, the features can be even more powerful when they work together. In this case, with the help of a brand new 3.6 feature! What powerful feature combinations are you missing out on?

by Tim Vaillancourt at March 09, 2018 07:12 PM

MariaDB AB

MyRocks Storage Engine in MariaDB is Now Release Candidate


The MyRocks storage engine was introduced in MariaDB Server 10.2 as an alpha plugin – the maturity of plugins is separate from the database. It became a beta plugin earlier this year, and with the release of MariaDB Server 10.3.5 (RC) last week, it is now a release candidate plugin.

So, what is MyRocks? It is a storage engine like InnoDB, but optimized for disk space and write efficiency. It uses a log-structured merge-tree (LSM Tree) technology to achieve higher compression and write performance. MyRocks is developed by Facebook, where it is used in production.

We make MyRocks available for MariaDB users via binaries and packages for recent versions of Ubuntu/Debian (deb), Red Hat/CentOS (rpm), Generic Linux (tarballs) and Microsoft Windows. For developers, you can continue to use features like common table expressions (CTEs) and window functions. For administrators, you can continue to enable and configure parallel replication.

While it’s easy to get started with MyRocks using MariaDB, you have to run a couple of commands to enable it. The process is documented in our knowledge base (and there are links for further MyRocks documentation).
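
For the impatient, the gist of those commands is a sketch like the following (the plugin package name varies by platform; see the knowledge base for the authoritative steps):

-- load the MyRocks plugin (the MyRocks plugin package must already be installed)
INSTALL SONAME 'ha_rocksdb';
-- confirm ROCKSDB shows up, then create a table with it
SHOW ENGINES;
CREATE TABLE t1 (id INT PRIMARY KEY, v VARCHAR(32)) ENGINE=RocksDB;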

Additional Resources



by Sergey Petrunya at March 09, 2018 04:19 PM

Peter Zaitsev

Sneak Peek at Proxytop Utility


In this blog post, I’ll be looking at Proxytop, a new tool for managing MySQL topologies that use ProxySQL. Proxytop is a self-contained, real-time monitoring tool for ProxySQL. As some of you already know, ProxySQL is a popular open source, high-performance and protocol-aware proxy server for MySQL and its forks (Percona Server for MySQL and MariaDB).

My lab uses MySQL and ProxySQL on Docker containers provided by Nick Vyzas. This lab also uses Alexey Kopytov’s Sysbench utility to perform benchmarking against ProxySQL.


Installation of Proxytop is pretty straightforward:

## You may first need to install system Python and MySQL dev packages
## e.g. "sudo apt install python-dev libmysqlclient-dev"
pip install MySQL-python npyscreen
wget -P /usr/bin

At this stage, we have everything we need to demonstrate Proxytop. The lab we have set up provides a bunch of bash scripts to generate load for repeated runs. I’m using the following script under the bin directory:

[root@localhost docker-mysql-proxysql]# ./bin/docker-benchmark.bash
[Fri Feb 16 10:19:58 BRST 2018] Dropping 'sysbench' schema if present and preparing test dataset:mysql: [Warning] Using a password on the command line interface can be insecure.
[Fri Feb 16 10:19:58 BRST 2018] Running Sysbench Benchmarksi against ProxySQL:sysbench 1.0.12 (using bundled LuaJIT 2.1.0-beta2)

This script is fully customizable, as the benchmark parameters can be tuned within the script:
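
For illustration, a sysbench 1.0 run of the kind this script wraps might look like the following (host, port, credentials and table sizes are placeholders; ProxySQL listens on 6033 by default):

sysbench oltp_read_write --mysql-host=127.0.0.1 --mysql-port=6033 \
    --mysql-user=sbuser --mysql-password=sbpass \
    --tables=10 --table-size=100000 --threads=8 --time=300 run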


Now let’s take a look at the Proxytop utility. It has a menu-driven style similar to Innotop. Once you are in the tool, use [tab] to toggle between screens. Various shortcuts are also available to do things like changing the sort order (‘s’), filtering on specific criteria (‘l’) or changing the refresh interval for the view you are on (‘+’ / ‘-’).

Currently, it supports viewing the following aspects of a ProxySQL instance:

  • ConnPool – “ProxySQL Connection Pool” statistics
  • QueryRules – “ProxySQL Query Rules” statistics and definitions
  • GloStat – “ProxySQL Global Status” statistics
  • ProcList – “ProxySQL Processlist” for all incoming DML / DQL
  • ComCount – “ProxySQL Command Counter” statistics

We’ll go through each of these screens in detail.

ConnPool Screen:

This screen basically shows the Connection Pool, specifically:

  • MySQL hostname and port
  • Assigned ProxySQL hostgroup
  • Connection statistics: Used / Free / OK / Error
  • MySQL Server state in ProxySQL i.e. ONLINE / OFFLINE / etc.
  • MySQL Server latency

Query Rules Screen:

This screen shows query rules and their use by count, and can be sorted either by rule_id or hits (ascending or descending) by cycling through the ordering list by pressing “s”.

It also allows you to view the actual definition of each rule by selecting and entering a rule. In the popup window, you will find a list of the relevant and defined columns for the query rule. For example:

If you have a lot of query rules defined, you can filter on a specific rule by pressing the letter “l”:

Global Statistics Screen: This screen shows Global Statistics from ProxySQL divided into four sections.

  • Connection Information
  • Prepared Statement Information
  • Command Information
  • Query Cache information

Proclist Screen: In this screen, we’re able to see running active queries with a minimum of a five-second refresh interval. In this way you can monitor long running queries in flight for troubleshooting:

ComCount Screen: This screen shows all command types executed with the total time and counts for each type, and also provides a drill-down to view the number of queries executed within specific time ranges. This way the type of workload can be easily identified both during testing and in production:

You can drill down on each Com type by using the arrow keys and hitting the enter key:

We all know the power of command-line utilities such as proxysql-admin. The proxysql-admin utility is designed for configuration and ad-hoc monitoring of ProxySQL, as explained in this blog post. Proxytop, by contrast, is menu-driven and repeats commands at intervals. You can easily monitor and administer ProxySQL from the command line, but sometimes running the same commands over and over and monitoring for a period of time is tedious. This tool helps with that situation.

by Alkin Tezuysal at March 09, 2018 03:18 PM

March 08, 2018

Peter Zaitsev

Binlog Encryption with Percona Server for MySQL

binlog encryption

In this blog post, we’ll look at how to turn on binlog encryption in Percona Server for MySQL.

Why do I need this?

As you probably know, Percona Server for MySQL’s binlog contains sensitive information. Replication uses the binlog to copy events between servers; it contains all the changes from one server so that they can be applied on another. In other words, if somebody has access to a binlog, they effectively have access to all the data in the server. Moreover, that person (or “hacker”) could create a clone of our server simply by setting up a replica from it, as long as they have access to our binlog. This shows how important protecting a binlog really is – a leaked binlog not only makes a particular table/tablespace or group of tables accessible to a hacker, it puts literally the whole server at risk. The same is true of the relay log – a relay log is really a copy of the binlog on the slave server.

But have no fear – a new feature to the rescue – binary log encryption. Since Percona Server for MySQL version 5.7.20-19 (beta version) it is possible to enable binlog encryption for all the binlogs and relay logs produced by the server.

How do you turn it on?

To start binlog encryption, you need to start the server with --encrypt-binlog=1. This, in turn, requires --master_verify_checksum and --binlog_checksum to both be enabled. Also, you need to install one of the keyring plugins.
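
Put together, a minimal my.cnf sketch might look like this (assuming the keyring_file plugin; the keyring path is just an example):

[mysqld]
early-plugin-load=keyring_file.so
keyring_file_data=/var/lib/mysql-keyring/keyring
encrypt_binlog=ON
master_verify_checksum=ON
binlog_checksum=CRC32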

From now on all the binlogs and relay logs produced by the server get encrypted. However, for the replication to be safe as a whole the connection between servers also has to be encrypted. See for details on how to do this.

Please note that this does not mean that all binlogs in our replication schema get encrypted. Remember you need to turn on encrypt-binlog on slave servers too, even if they do not produce binlog files. Slave servers still produce relay logs when replicating from a master server. Turn on encrypt-binlog on slave servers so that their relay logs also get encrypted.

How does this work in the big picture?

The master encrypts the event before writing it into the binlog. The slave connects to the master and asks for events. The master decrypts the events from the binary log and sends them over to the slave.

Note that events sent between the master and slave servers are not encrypted! This is why the connection between the master and slave needs to use a secure channel, i.e., TLS.

The slave receives events from the master, encrypts them and writes them down into the relay log.

That is why we need to enable encrypt-binlog on a slave. The relay log has to get encrypted too.

Next, the slave decrypts events from the relay log and applies them. After applying an event, the slave encrypts it and writes it down into its own binlog file (if the binlog is enabled on the slave).

In summary, to make our replication secure, we need to:

  • Turn on encrypt-binlog on the master
  • Turn on encrypt-binlog on the slave
  • Use TLS for the connection between the master and the slave

It’s worth noting that servers in replication have no idea if other servers are encrypted or not.

Why do master_verify_checksum and binlog_checksum need to be turned ON?

This is needed for “authenticated encryption”. Simply put, this is how we make sure that what we decrypt has not been changed by a third party. Also, it checks if the key that was used to decrypt the event was the correct one.

Digging deeper with mysqlbinlog

Mysqlbinlog is a standalone application that lets you read binlog files. As I write this blog post, it is not capable of decrypting binary logs – at least not by itself. However, it can still read encrypted binlog files when pointed at a running Percona Server for MySQL. Use the option --read-from-remote-server to read the binary log produced by a given server.

Let’s see what happens when we try to read an encrypted binlog with mysqlbinlog without read-from-remote-server enabled. You will get something like this:

As you can see, it is only possible to read the binary log until event type 9f is read. This event is the Start_encryption_event. After this event, the rest of the binlog is encrypted. One thing to note is that Start_encryption_event is never propagated in replication. For instance, say the master server is run with --encrypt_binlog. This means that the server writes Start_encryption_event to its binary logs. However, it is never sent to the slave server (the slave has no idea whether the master is encrypted).

Another option you can use with mysqlbinlog is the --force option. It forces mysqlbinlog to read all the events from the binlog, even if they are encrypted. You will see something like this in the output:

As you can see, it is only possible to read the first two events – until the Start_encryption_event. However, this time we can see that there are other events that follow, which are encrypted.

Running mysqlbinlog (without --read-from-remote-server) on encrypted binary logs may only make sense if we want to see whether a given binary log is encrypted. For point-in-time recovery, and for other purposes that might require reading an encrypted binlog, we would use mysqlbinlog with the --read-from-remote-server option.

For instance, if we want to read the binlog master-bin.000001, and Percona Server for MySQL is running on a given host on port 3033, with user robert and password hard_password, we would use mysqlbinlog like this:

mysqlbinlog --read-from-remote-server --protocol=tcp --host=<host> --port=3033 --user=robert --password=hard_password master-bin.000001

When you look at the output of this command, you see something like this:

You can now see the decrypted binlog. One interesting thing to note here is that we do not see our Start_encryption_event (type 9f). This proves my point – Start_encryption_event never leaves the server (we are reading from the server now as we use –read-from-remote-server).

For more information on how to use mysqlbinlog for point-in-time recovery, see the MySQL documentation.

However, for more modern approaches for point-in-time recovery that do not use mysqlbinlog and make use of parallel appliers, see here:

Have fun with binlog encryption!

by Robert Golebiowski at March 08, 2018 10:11 PM

Migrating MySQL Users to Amazon RDS


In this blog post, we’ll look at what is needed when migrating MySQL users to Amazon RDS. We’ll discuss how we can transform MySQL user grants and make them compatible with Amazon RDS.

In order to deliver a managed service experience, Amazon RDS does not provide shell access to the underlying operating system. It also restricts access to certain procedures that require advanced privileges.

Every MySQL instance has some users with ALL PRIVILEGES, and you can’t directly migrate these users to Amazon RDS because it does not support the following privileges for regular users.

  • SUPER – Enable use of other administrative operations such as CHANGE MASTER TO, KILL, PURGE BINARY LOGS, SET GLOBAL, and mysqladmin debug command. Level: Global.
  • SHUTDOWN – Enable use of mysqladmin shutdown. Level: Global.
  • FILE – Enable the user to cause the server to read or write files. Level: Global.
  • CREATE TABLESPACE – Enable tablespaces and log file groups to be created, altered, or dropped. Level: Global.

The RDS parameter groups manage changes to the MySQL configuration (dynamic and non-dynamic variables). Amazon RDS also provides stored procedures to perform various administrative tasks that require SUPER privileges.
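
To illustrate, a few of the Amazon-provided stored procedures that stand in for SUPER-only statements (names as documented by AWS) are:

CALL mysql.rds_kill(1234);          -- instead of KILL 1234
CALL mysql.rds_kill_query(1234);    -- instead of KILL QUERY 1234
CALL mysql.rds_stop_replication;    -- instead of STOP SLAVE on a read replica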

For example, we’ve got this user in a MySQL instance running on Amazon EC2:

db01 (none)> show grants for percona@'%';
| Grants for percona@%                                                                                                              |
| GRANT ALL PRIVILEGES ON *.* TO 'percona'@'%' IDENTIFIED BY PASSWORD '*497030855D20D6B22E65436D0DFC75AA347B32F0' WITH GRANT OPTION |
1 row in set (0.00 sec)

If we try to run the same grants in RDS, it will fail.

[RDS] (none)> GRANT ALL PRIVILEGES ON *.* TO 'percona'@'%' IDENTIFIED BY PASSWORD '*497030855D20D6B22E65436D0DFC75AA347B32F0' WITH GRANT OPTION;
ERROR 1045 (28000): Access denied for user 'admin'@'%' (using password: YES)

We’ll follow these steps for migrating users to RDS.

  1. Identify users with privileges that aren’t supported by RDS.
  2. Export their grants using pt-show-grants.
  3. Import grants in a separate clean MySQL instance running the same version.
  4. Remove the forbidden privileges using the REVOKE statement.
  5. Export grants again using pt-show-grants and load them to RDS.

Identify users having privileges that aren’t supported by RDS

First, we’ll find the users with privileges that aren’t supported by Amazon RDS. I’ve excluded the localhost users because there is no direct shell access in RDS and you shouldn’t migrate these users.

db01 (none)> select concat("'",user,"'@'",host,"'") as 'user',
CONCAT("REVOKE SUPER, SHUTDOWN, FILE, CREATE TABLESPACE ON *.* FROM '",user,"'@'",host,"';") as 'query' from mysql.user
where host not in  ('localhost','')
and (Super_Priv='Y' OR Shutdown_priv='Y' OR File_priv='Y' OR Create_tablespace_priv='Y');
| user          | query                                                                      |
| 'appuser'@'%' | REVOKE SUPER, SHUTDOWN, FILE, CREATE TABLESPACE ON *.* FROM 'appuser'@'%'; |
| 'percona'@'%' | REVOKE SUPER, SHUTDOWN, FILE, CREATE TABLESPACE ON *.* FROM 'percona'@'%'; |
2 rows in set (0.00 sec)

We have two users with incompatible grants. Let’s transform their grants to make them compatible with RDS. We’ll use the query in the second column of the output later in this process.

Export grants using pt-show-grants

The next step is exporting these two users’ grants using pt-show-grants:

[root@db01 ~]# pt-show-grants --only='appuser'@'%','percona'@'%'
-- Grants dumped by pt-show-grants
-- Dumped from server Localhost via UNIX socket, MySQL 5.6.38-83.0 at 2018-02-24 10:02:21
-- Grants for 'appuser'@'%'
GRANT FILE ON *.* TO 'appuser'@'%' IDENTIFIED BY PASSWORD '*46BDE570B30DFEDC739A339B0AFA17DB62C54213';
-- Grants for 'percona'@'%'
GRANT ALL PRIVILEGES ON *.* TO 'percona'@'%' IDENTIFIED BY PASSWORD '*497030855D20D6B22E65436D0DFC75AA347B32F0' WITH GRANT OPTION;

As we can see from above output, both users have at least one privilege that isn’t supported by RDS. Now, all we need to do is to import these users into a separate clean MySQL instance running the same version, and REVOKE the privileges that aren’t supported by RDS.

Import users in a separate MySQL instance running the same version

I’m going to import the grants into a separate VM where I’ve just installed Percona Server for MySQL 5.6. Let’s call this instance db02:

[root@db02 ~]# pt-show-grants --host=db01 --only='appuser'@'%','percona'@'%' --user=percona --ask-pass | mysql
Enter password:

Remove the forbidden privileges using the REVOKE statement

In this step, we will use the REVOKE statements from Step 1 to remove the privileges that aren’t supported by Amazon RDS:

Query OK, 0 rows affected (0.00 sec)
Query OK, 0 rows affected (0.00 sec)

Export grants again using pt-show-grants and load them to RDS

At this point, db02 has the grants that are compatible with RDS. Let’s take a look at them:

[root@db02 ~]# pt-show-grants --only='appuser'@'%','percona'@'%'
-- Grants dumped by pt-show-grants
-- Dumped from server Localhost via UNIX socket, MySQL 5.6.39-83.1 at 2018-02-24 10:10:38
-- Grants for 'appuser'@'%'
GRANT USAGE ON *.* TO 'appuser'@'%' IDENTIFIED BY PASSWORD '*46BDE570B30DFEDC739A339B0AFA17DB62C54213';
-- Grants for 'percona'@'%'

These grants look good, these can be safely migrated to RDS now. Let’s do it:

[RDS] mysql> GRANT USAGE ON *.* TO 'appuser'@'%' IDENTIFIED BY PASSWORD '*46BDE570B30DFEDC739A339B0AFA17DB62C54213';
Query OK, 0 rows affected (0.32 sec)
Query OK, 0 rows affected (0.31 sec)
Query OK, 0 rows affected (0.34 sec)

We have successfully migrated users to Amazon RDS, which would have failed in direct migration.

What about the rest of the users, the ones that don’t have SUPER/SHUTDOWN/FILE/CREATE TABLESPACE privileges? Well, it’s easy. We can migrate them directly using pt-show-grants. They don’t need any transformation before migration.

List them using the following query:

db01 (none)> select concat("'",user,"'@'",host,"'") as 'user' from mysql.user where host not in  ('localhost','') and (Super_Priv<>'Y' AND Shutdown_priv<>'Y' AND File_priv<>'Y' AND Create_tablespace_priv<>'Y');
| user                  |
| 'readonly'@'%'        |
| 'repl'@'' |
2 rows in set (0.01 sec)

Export them using pt-show-grants and load them into RDS:

[root@db01 ~]# pt-show-grants --only='readonly'@'%','repl'@'' | mysql --host=<rds.endpoint> --user=percona -p
Enter password:


Amazon RDS is a great platform for hosting your MySQL databases. When migrating MySQL users to Amazon RDS, some grants might fail because of having privileges that aren’t supported by RDS. Using pt-show-grants from Percona Toolkit and a separate clean MySQL instance, we can easily transform grants and migrate MySQL users to Amazon RDS without any hassle.

by Alok Pathak at March 08, 2018 07:49 PM

Jean-Jerome Schmidt

Updated: ClusterControl Tips & Tricks - Transparent Database Failover for your Applications

ClusterControl is a great tool to deploy and manage database clusters - if you are into MySQL, you can easily deploy clusters based on traditional MySQL master-slave replication, Galera Cluster or MySQL NDB Cluster. To achieve high availability, however, deploying a cluster is not enough. Nodes may (and most probably will) go down, and your system has to be able to adapt to those changes.

This adaptation can happen at different levels. You can implement some kind of logic within the application - it would check the state of cluster nodes and direct traffic to the ones which are reachable at the given moment. You can also build a proxy layer which will implement high availability in your system. In this blog post, we’d like to share some tips on how you can achieve that using ClusterControl.

Deploying HAProxy using the ClusterControl

HAProxy is the standard - one of the most popular proxies used in connection with MySQL (but not only, of course). ClusterControl supports deployment and monitoring of HAProxy nodes. It also helps to implement high availability of the proxy itself using keepalived.

Deployment is pretty simple - you need to pick or fill in the IP address of the host where HAProxy will be installed, pick the port and load balancing policy, and decide whether ClusterControl should use an existing repository or the most recent source code to deploy HAProxy. You can also pick which backend nodes you’d like to have included in the proxy configuration, and whether they should be active or backup.

By default, the HAProxy instance deployed by ClusterControl will work with MySQL Cluster (NDB), Galera Cluster, PostgreSQL streaming replication and MySQL Replication. For master-slave replication, ClusterControl can configure two listeners, one for read-only and another for read-write. Applications will then have to send reads and writes to the respective ports. For multi-master replication, ClusterControl will set up standard TCP load balancing based on the least-connection balancing algorithm (e.g., for Galera Cluster, where all nodes are writeable).
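
To give an idea of the result, a stripped-down sketch of the two listeners for a master-slave setup could look like the following (IPs and ports are placeholders, and the configuration ClusterControl generates contains more health-check detail than shown here):

listen haproxy_rw
    bind *:3307
    mode tcp
    balance leastconn
    server master 10.0.0.11:3306 check

listen haproxy_ro
    bind *:3308
    mode tcp
    balance leastconn
    server slave1 10.0.0.12:3306 check
    server slave2 10.0.0.13:3306 check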

Keepalived is used to add high availability to the proxy layer. When you have at least two HAProxy nodes in your system, you can install Keepalived from the ClusterControl UI.

You’ll have to pick two HAProxy nodes and they will be configured as an active - standby pair. A Virtual IP would be assigned to the active server and, should it fail, it will be reassigned to the standby proxy. This way you can just connect to the VIP and all your queries will be routed to the currently active and working HAProxy node.

You can find more details in how the internals are configured by reading through our HAProxy tutorial.

Deploying ProxySQL using ClusterControl

While HAProxy is a rock-solid proxy and very popular choice, it lacks database awareness, e.g., read-write split. The only way to do it in HAProxy is to create two backends and listen on two ports - one for reads and one for writes. This is, usually, fine but it requires you to implement changes in your application - the application has to understand what is a read and what is a write, and then direct those queries to the correct port. It’d be much easier to just connect to a single port and let the proxy decide what to do next - this is something HAProxy cannot do as what it does is just routing packets - no packet inspection is done and, especially, it has no understanding of the MySQL protocol.

ProxySQL solves this problem - it talks the MySQL protocol and it can (among other things) perform a read-write split through its powerful query rules, routing the incoming MySQL traffic according to various criteria. Installation of ProxySQL from ClusterControl is simple - you want to go to the Manage -> Load Balancer section and fill in the “Deploy ProxySQL” tab with the required data.

In short, we need to pick where ProxySQL will be installed, what administration user and password it should have, and which monitoring user it should use to connect to the MySQL backends and verify their status. From ClusterControl, you can either create a new user to be used by the application - you decide on its name, password, which databases it can access and what MySQL privileges it will have. Such a user will be created on both the MySQL and ProxySQL side. The second option, more suitable for existing infrastructures, is to use existing database users. You just need to pass the username and password, and such a user will be created only on the ProxySQL side.

Finally, you need to answer a question: are you using implicit transactions? By that we mean transactions started by running SET autocommit=0. If you do use them, ClusterControl will configure ProxySQL to send all of the traffic to the master. This is required to ensure ProxySQL handles transactions correctly in ProxySQL 1.3.x and earlier. If you don’t use SET autocommit=0 to create new transactions, ClusterControl will configure a read/write split.
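
Under the hood, the read/write split boils down to a couple of query rules in ProxySQL’s admin interface, roughly like the following sketch (the hostgroup numbers here are placeholders; ClusterControl picks its own):

INSERT INTO mysql_query_rules (rule_id, active, match_digest, destination_hostgroup, apply)
VALUES (100, 1, '^SELECT .* FOR UPDATE', 10, 1),
       (200, 1, '^SELECT', 20, 1);
LOAD MYSQL QUERY RULES TO RUNTIME;
SAVE MYSQL QUERY RULES TO DISK;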

ProxySQL, as every proxy, can become a single point of failure and it has to be made redundant to achieve high availability. There are a couple of methods to do that. One of them is to collocate ProxySQL on the web nodes. The idea here is that, most of the time, the ProxySQL process will work just fine and the reason for its unavailability is that the whole node went down. In such case, if ProxySQL is collocated with the web node, not much harm has been done because that particular web node will not be available either.

Another method, is to use Keepalived in a similar way like we did in the case of HAProxy.

You can find more details in how the internals are configured by reading through our ProxySQL tutorial.

by ashraf at March 08, 2018 10:50 AM

March 07, 2018

Peter Zaitsev

Using MongoDB 3.6 Change Streams

MongoDB 3.6 Change Streams

In this blog post, we will explore MongoDB 3.6 change streams – a powerful new feature added in the latest version of MongoDB.

MongoDB 3.6 Change Streams

Streaming data workflows are becoming increasingly popular in open-source due to their ability to allow efficient, asynchronous, near-real-time updates that benefit users becoming more and more accustomed to real-time experiences in technology.

Before the rise of streaming workflows and open-source streaming frameworks such as Akka, Spark and Storm, it was much more common to process data in what is called a “batch-processing” workflow. Here, potentially massive amounts of data are queried and processed in a large batch, often once or a few times daily. This processing style has the drawbacks of operating on data AFTER it was written/changed, the inefficiencies caused by querying large amounts of data in a single instance, not to mention the latency in receiving results when doing so.

Stream workflows benefit from the high-efficiency of processing at change-time (usually asynchronously) while also providing more up-to-date results as a free side effect. These benefits make this approach popular in “real-time” user-facing systems like social media, gaming and trading and even backend ETL systems.


Before MongoDB 3.6, the most common way to implement a “change stream”-like data system in MongoDB was by using tailable cursors to “watch” the MongoDB replication oplog.

The use of the oplog requires that you enable replication, whether or not you have a single node. The tailable cursor method is possible because the replication oplog is a queryable capped collection in MongoDB, available to any MongoDB shell or driver.

The drawbacks to using the oplog as a change stream source are:

  1. Server-wide Changelog. As the oplog is intended for replication purposes, it represents ALL changes to data in a server. Chances are the stream of changes you’re interested is limited to one or a handful of collections, not every change in the entire server/replica-set! This usually means an application that “tails” the oplog must read and throw away entries it doesn’t care about as it processes the stream. This is inefficient in processing and network usage as many oplog changes are not useful but still processed.
  2. Tailable Cursors. While creating a tailable cursor is possible in virtually any MongoDB driver, it’s often not the most friendly thing to code and generally drivers do not have a singular “helper function” to do so. Usually, the application developer needs to use loops and handle various conditions that may occur in the tailing of the tailable cursor. Remember, more code generally equals more opportunities for bugs.
  3. Internal Oplog Format. Designed for “internal” server use, the oplog format is optimized for efficiency and uses obscure field names to describe changes. Also, theoretically the oplog format could change in the future. These two problems can lead to increased code maintenance.

MongoDB 3.6 Change Streams

MongoDB 3.6 added “Change Streams”, handled via the new collection-level method named “watch()”. This function opens a Change Stream Cursor on a given collection, returning changes to the collection in a predictable format called Change Events.

An example of a change returned by this new feature (due to an insert to “test.test” in another session):

	"_id" : {
		"_data" : BinData(0,"glqW/CsAAAABRmRfaWQAZFqW/CukzAygJGLkVwBaEASV+CeIpHBBKKVaH0KcDV5OBA==")
	"operationType" : "insert",
	"fullDocument" : {
		"_id" : ObjectId("5a96fc2ba4cc0ca02462e457"),
		"x" : 1
	"ns" : {
		"db" : "test",
		"coll" : "test"
	"documentKey" : {
		"_id" : ObjectId("5a96fc2ba4cc0ca02462e457")

The “watch()” function takes an aggregation pipeline as its first optional parameter and a document of “options” as the second optional parameter. Passing no parameters to the function causes it to perform no aggregation and use default options.

“watch()” supports the following aggregation stages in the optional pipeline:

  1. $match
  2. $project
  3. $addFields
  4. $replaceRoot
  5. $redact

Similar to the pre-3.6 method described earlier, the change stream feature requires you to enable replication, and the operation returns an error if it is not enabled. If you run a standalone server, you can still enable replication with a single member only.

The benefits of the new feature are numerous:

  1. Collection-Level. Streaming of changes can now occur on a per-collection basis (not a server-wide basis!). Further filtering is possible via passing a $match aggregation to the function.
  2. Efficient Processing. Collection and $match-level filtering mean only relevant changes are returned to the application instead of every change occurring in the server, reducing processing and network usage.
  3. Change Event Format. Changes are presented as Change Events, not internal-replication oplog entries.
  4. Simplified Approach. Application developers have less code to maintain due to moving a lot of the logic required to implement “tailable cursors” server-side.
  5. Majority Read Concern. The change streams feature uses Majority Read Concern, meaning changes returned are guaranteed to be durable following a replica set rollback. This is fantastic for data integrity!

Resuming Change Streams

By default, Change Streams will stop on error or if no changes occurred within the default timeout of 1000 milliseconds; this timeout can be overridden using the ‘maxAwaitTimeMS’ option to your operation.

This behavior means Change Streams sometimes need to be resumed from the last successful change. Resuming change streams from the last successful change can be done by passing the ‘_id’ of the last event read as the ‘resumeAfter’ option to your operation.
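
A minimal mongo shell sketch of resuming (assuming lastEventId holds the ‘_id’ token of the last event your application processed and persisted somewhere durable):

var cursor = db.emails.watch([], { resumeAfter: lastEventId, maxAwaitTimeMS: 5000 });
while (cursor.hasNext()) {
    var change = cursor.next();
    lastEventId = change._id;    // persist this token so the stream can be resumed after a restart
    printjson(change);
}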

Production Considerations

As change streams use MongoDB’s replication technologies behind the scenes, there are some things to consider in production:

  1. Majority Read Concern. Change streams require a majority of data-bearing members to be alive, meaning streams may pause if you lose a majority of members or if the majority is relying on an arbiter.
  2. Oplog Size. The oplog must be large enough for the stream events to exist until the time of processing by the application.
  3. Drop / Rename Collection. Change streams on dropped or renamed collections receive an error, breaking the stream.

In a sharded cluster, there are additional items to consider:

  1. Fan-Out Stream. To maintain total ordering, the change stream is executed on all shards in a cluster and is as fast as the slowest shard.
  2. Multi-Updates. Under sharding, failed multi-document updates can sometimes create change events for orphaned documents. Try to avoid using multi updates if this is important. This problem is fixable in MongoDB 4.0+ via ACID-compliant transactions.

Use Case: Email Sender

Let’s pretend we have an email system that uses MongoDB as the source of truth. A single email is a single MongoDB document in the collection: “email.emails”.

In our example, we must write an application that sends our emails (over SMTP) when the “sendNow” boolean field is set to “true” in the emails collection document.

When an email is ready to be sent, the application issues this update on a single ‘_id’ field:

> db.emails.update(
    { "_id": ObjectId("5a97fdd4a4cc0ca02462e45c") },
    { $set: { sendNow: true } }
)
WriteResult({ "nMatched" : 1, "nUpserted" : 0, "nModified" : 1 })

Using change streams, we can “watch” the emails collection for update operations that match our criteria!

Below I have created a change stream that uses an aggregation. The aggregation matches change events containing the “update” operationType and the field “sendNow” is set to “true” in the update. We store updated fields under the “updateDescription.updatedFields” sub-document in Change Events, so the full name for the “sendNow” field becomes: “updateDescription.updatedFields.sendNow”.

As our sender application only needs the document key to query an email for sending, I added a $project aggregation step to strip-down the result to only the Change Event “documentKey” plus the “_id” field that is returned by default.

The result is this operation:

> db.emails.watch([
    { $match: {
      "operationType": "update",
      "updateDescription.updatedFields.sendNow": true
    } },
    { $project: {
      documentKey: 1
    } }
])
{
	"_id" : {
		"_data" : BinData(0,"glqYAA4AAAABRmRfaWQAZFqX/dSkzAygJGLkXABaEARxbXGgdm5K9ZnzwSfCfmNbBA==")
	},
	"documentKey" : {
		"_id" : ObjectId("5a97fdd4a4cc0ca02462e45c")
	}
}

Now, when emails get marked as “sendNow” we have a stream of document keys that are ready to be sent immediately!

This makes a very intuitive and responsive workflow! In this case, our email sender now knows it can send the email with ‘_id’ of ObjectId(“5a97fdd4a4cc0ca02462e45c”)!

Use Case: Backend Synchronization

Often large infrastructures have several data-related components that require synchronization. Some examples are caching tiers (Redis, Memcache, etc.), search engines (Apache Solr, Elasticsearch, etc.) and backend analytics systems.

Change streams make it easy for systems other than MongoDB to “hook into” a real-time stream of events, making synchronization of several backends easy. Used correctly, this feature can also remove or reduce reliance on message queues.

Some ideas this brings to mind:

  • Caching tiers pre-emptively cache data based on change events
  • Search Engines index important data based on change events
  • Replicating changes to otherwise incompatible backend data stores (business analytics, cold-storage, etc.)


I hope this article gives you some ideas on how to use MongoDB 3.6 change streams, a powerful new feature!

by Tim Vaillancourt at March 07, 2018 11:50 PM

Percona Live 2018 Featured Talk: Securing Your Data on PostgreSQL with Payal Singh

Payal PostgreSQL 1

Welcome to another interview blog for the rapidly-approaching Percona Live 2018. Each post in this series highlights a Percona Live 2018 featured talk at the conference and gives a short preview of what attendees can expect to learn from the presenter.

This blog post highlights Payal Singh, DBA at OmniTI Computer Consulting Inc. Her talk is titled Securing Your Data on PostgreSQL. There is often a lack of understanding about how best to manage minimum basic application security features – especially with major security features being released with every major version of PostgreSQL. In our conversation, we discussed how Payal works to improve application security using Postgres:

Percona: Who are you, and how did you get into databases? What was your path to your current responsibilities?

Payal: I’m primarily a data addict. I fell in love with databases when they were first taught to me in high school. The declarative SQL syntax was intuitive to me, and efficient compared to other languages I had used (C and C++). I realized that if given the opportunity, I’d choose to become a database administrator. I joined OmniTI in the summer of 2012 as a web engineer intern during my Masters, but grabbed the chance to work on an internal database migration project. Working with the DBA team gave me a lot of new insight and exposure, especially into open source databases. The more I learned, the more I loved my job. Right after completing my Masters I joined OmniTI as a full-time database administrator, and never looked back!

Percona: Your talk is titled “Securing Your Data on PostgreSQL”. Why do you think that security (or the lack of it) is such an issue?

Payal: Securing your data is critical. In my experience, the one reason people using commercial databases are apprehensive of switching to open source alternatives is a lack of exposure to security features. If you look at open source databases today, specifically PostgreSQL, it has the most advanced security features: data encryption, auditing and row-level security, to name a few. People don’t know about them, though. As a FOSS project, we don’t have a centralized marketing team to advertise these features to our potential user base, which makes it necessary to spread information through other channels. Speaking about it at a popular conference like Percona Live is one of them!

In addition to public awareness, Postgres is advancing at a lightning pace. With each new major version released every year, a bunch of new security feature additions and major improvements in existing security features are added. So much so that it becomes challenging to keep up with all these features, even for existing Postgres users. My talk on Postgres security aims to inform current as well as prospective Postgres users about the advanced security features that exist and their use case, useful tips to use them, the gotchas, what’s lacking and what’s currently under development.

Percona: Is PostgreSQL better or worse with security and security options than either MySQL or MongoDB? Why?

Payal: I may be a little biased, but I think Postgres is the best database from a security point of view. MySQL is pretty close though! There are quite a few reasons why I consider Postgres to be the best, but I’d like to save that discussion for my talk at Percona Live! For starters though, I think that Postgres’s authentication and role architecture is significantly clearer and more straightforward than MySQL’s implementation. Focusing strictly on security, I’d also say that access control and management is more granular and customizable in Postgres than it is in MySQL – although here I’d have to say MySQL’s ACL is easier and more intuitive to manage.

Percona: What is the biggest challenge for database security we are facing?

Payal: For all the databases? I’d say with the rapid growth of IoT, encrypted data processing is a huge requirement that none of the well-known databases currently provide. Even encryption of data at rest outside of the IoT context requires more attention. It is one of the few things that a DBMS can do as a last-ditch effort to protect its data in SQL injection attacks, if all other layers of security (network, application layer, etc.) have failed (which very often is the case).

Percona: Why should people attend your talk? What do you hope people will take away from it? 

Payal: My talk is a run-through of all current and future Postgres security features, from the basic to the very advanced and niche. It is not an isolated talk that assumes Postgres is the only database in the world. I often compare and contrast other database implementations of similar security features as well. Not only is it a decent one-hour primer for people new and interested in Postgres, but also a good way to weigh the pros and cons among databases from a security viewpoint.

Percona: What are you looking forward to at Percona Live (besides your talk)?

Payal: I’m looking forward to all the great talks! I got a lot of information out of the talks at Percona Live last year. The tutorials on new MySQL features were especially great!

Want to find out more about this Percona Live 2018 featured talk, and Payal and PostgreSQL security? Register for Percona Live 2018, and see her talk Securing Your Data on PostgreSQL. Register now to get the best price! Use the discount code SeeMeSpeakPL18 for 10% off.

Percona Live Open Source Database Conference 2018 is the premier open source event for the data performance ecosystem. It is the place to be for the open source community. Attendees include DBAs, sysadmins, developers, architects, CTOs, CEOs, and vendors from around the world.

The Percona Live Open Source Database Conference will be April 23-25, 2018 at the Hyatt Regency Santa Clara & The Santa Clara Convention Center.

by Dave Avery at March 07, 2018 06:43 PM

Henrik Ingo

impress.js 1.0 is released!

I have released impress.js v1.0. Impress.js is a framework for creating awesome 3D presentations with standard HTML5 and CSS. (Similar to Prezi + 3D and open source.)

From the release notes:

  • New plugin based architecture allows adding more features without bloating core src/impress.js file
  • Source files are in src/ and compiled into js/impress.js with npm run build. End users should continue to use js/impress.js as before.
  • 19 new plugins
  • Integrates impressConsole.js by default (press 'P' to open speaker console)
  • Markdown support for those that are too much in a hurry to type HTML
  • read more

by hingo at March 07, 2018 02:18 PM

March 06, 2018

Peter Zaitsev

Enabling InnoDB Tablespace Encryption on Percona XtraDB Cluster 5.7


Security is one of the hottest topics lately, and in this blog post, I will walk you through what needs to be configured to have a working three-node Percona XtraDB Cluster running with InnoDB Tablespace Encryption enabled.

This article will not cover the basics of setting up a cluster nor will it cover how to create SSL certs and keys since both of these topics have been well explained here and here.

Just to give you a brief history, InnoDB tablespace encryption was introduced in MySQL 5.7.11, and starting from Percona XtraDB Cluster 5.7.16 this feature is fully supported when coupled with SSL-based encryption of SST traffic. However, for this blog post I recommend using the latest Percona XtraDB Cluster 5.7.20-19 release, as it includes a recent fix for an issue affecting incremental state transfer when keyring-file-data is set.

What do you need to enable InnoDB tablespace encryption? If you are an avid reader of this blog, then you might have read this awesome article from Manjot Singh and Matthew Boehm about MySQL Encryption at rest – Part 2. The two important configuration options are:
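
Judging by the option names that appear in the sample configuration later in this post, these are the keyring plugin options. A typical setting, assuming the stock MySQL keyring_file plugin (the plugin library name here is my assumption), looks like this:

early-plugin-load=keyring_file.so
keyring-file-data=/var/lib/mysql-keyring/keyring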

This alone lets you use the plugin to encrypt InnoDB tablespaces, given that innodb_file_per_table is enabled. But to get state transfer (SST/IST) working between cluster nodes, you should also configure the SSL-related options in the [mysqld] and [sst] sections of your configuration file.

Doing It the Easy Way

To make life easier and less complicated, we’ve added an option to take care of the job for you through the automatic configuration of SSL encryption with one variable: pxc-encrypt-cluster-traffic=ON. This is the recommended option. Once set, it will look for the SSL keys and certificate files in the ssl-ca, ssl-cert and ssl-key options under [mysqld]. If you don’t set these, it then looks for the necessary SSL keys and certificate files in the data directory.

The next step is to create the SSL certs and keys by following the instructions in the manual. Note that for some distributions, like RPM packages, the SSL keys and certificate file are automatically created upon data directory initialization by invoking mysql_ssl_rsa_setup.
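
If your packages do not create them automatically, the certificates and keys can be generated manually. A minimal sketch, assuming the default data directory location:

mysql_ssl_rsa_setup --datadir=/var/lib/mysql --uid=mysql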

You only need to securely transfer the SSL files from one node to another. This doesn’t include the keyring file, which the wsrep_sst_xtrabackup-v2 script handles.

We recommend using wsrep_sst_method=xtrabackup-v2, so we need to declare the keyring-file-data option under the [xtrabackup] section of the configuration file.


Taking everything into consideration, we should have something like this as a working configuration file:

[mysqld]
socket = /var/lib/mysql/mysql.sock
datadir = /var/lib/mysql
user = mysql
log-error = /var/log/mysqld.err
wsrep_cluster_name = my_pxc_cluster
wsrep_provider = /usr/lib64/
wsrep_auto_increment_control = ON
wsrep_sst_method = xtrabackup-v2
wsrep_sst_auth = sstuser:passw0rd
wsrep_cluster_address = gcomm://,,
wsrep_node_address =
wsrep_node_name = pxc_node1
innodb_autoinc_lock_mode = 2
innodb_file_per_table = 1
server_id = 100
log_bin = mysql-bin
pxc-encrypt-cluster-traffic = ON
early-plugin-load =
keyring-file-data = /var/lib/mysql-keyring/keyring

[sst]
streamfmt = xbstream

[xtrabackup]
keyring-file-data = /var/lib/mysql-keyring/keyring

Doing It The Hard Way

If you prefer to keep your SSL keys and certificate files in a separate directory outside of the data directory, then you should declare the SSL-related variables under the [mysqld] section like this:

[mysqld]
ssl-ca = /etc/mysql/ca.pem
ssl-cert = /etc/mysql/server-cert.pem
ssl-key = /etc/mysql/server-key.pem
pxc-encrypt-cluster-traffic = ON
early-plugin-load =
keyring-file-data = /var/lib/mysql-keyring/keyring

Lastly, if you prefer not to use the pxc-encrypt-cluster-traffic variable, you will need to declare the same SSL-related variables under the [sst] section like this:

[sst]
streamfmt = xbstream
encrypt = 4
ssl-ca = /etc/mysql/ca.pem
ssl-cert = /etc/mysql/server-cert.pem
ssl-key = /etc/mysql/server-key.pem

And here is sample content from the log on the JOINER node:

2018-02-01T15:19:14.099468Z 0 [Note] WSREP: Member 1.0 (pxc_enc_10.0.3.9) requested state transfer from '*any*'. Selected 0.0 (pxc_enc_10.0.3.152)(SYNCED) as donor.
2018-02-01T15:19:14.099562Z 0 [Note] WSREP: Shifting PRIMARY -> JOINER (TO: 70230)
2018-02-01T15:19:14.099753Z 2 [Note] WSREP: Requesting state transfer: success, donor: 0
2018-02-01T15:19:14.099873Z 2 [Note] WSREP: GCache history reset: 00000000-0000-0000-0000-000000000000:0 -> 39679102-c8e3-11e7-bdf7-77fd128ab7b1:70230
2018-02-01T15:19:16.526266Z 0 [Note] WSREP: (424bffbb, 'ssl://') connection to peer 424bffbb with addr ssl:// timed out, no messages seen in PT3S (gmcast.peer_timeout)
2018-02-01T15:19:16.526845Z 0 [Note] WSREP: (424bffbb, 'ssl://') turning message relay requesting off
        2018-02-01T15:19:18.807781Z WSREP_SST: [INFO] donor keyring received at: '/var/lib/mysql-keyring/donor-keyring'
        2018-02-01T15:19:18.826503Z WSREP_SST: [INFO] Proceeding with SST.........
        2018-02-01T15:19:18.994007Z WSREP_SST: [INFO] ............Waiting for SST streaming to complete!
2018-02-01T15:19:34.571567Z 0 [Note] WSREP: 0.0 (pxc_enc_10.0.3.152): State transfer to 1.0 (pxc_enc_10.0.3.9) complete.
2018-02-01T15:19:34.572022Z 0 [Note] WSREP: Member 0.0 (pxc_enc_10.0.3.152) synced with group.
        2018-02-01T15:19:34.684483Z WSREP_SST: [INFO] Preparing the backup at /var/lib/mysql//.sst
        2018-02-01T15:19:41.733319Z WSREP_SST: [INFO] Moving the backup to /var/lib/mysql/
        2018-02-01T15:19:41.853808Z WSREP_SST: [INFO] Moving sst keyring into place: moving /var/lib/mysql-keyring/donor-keyring to /var/lib/mysql-keyring/keyring
        2018-02-01T15:19:41.866181Z WSREP_SST: [INFO] Galera co-ords from recovery: 39679102-c8e3-11e7-bdf7-77fd128ab7b1:70230
2018-02-01T15:19:41.877406Z 0 [Note] WSREP: SST complete, seqno: 70230

And from the DONOR node, we will see this from the log file:

2018-02-01T15:19:14.099510Z 0 [Note] WSREP: Member 1.0 (pxc_enc_10.0.3.9) requested state transfer from '*any*'. Selected 0.0 (pxc_enc_10.0.3.152)(SYNCED) as donor.
2018-02-01T15:19:14.099615Z 0 [Note] WSREP: Shifting SYNCED -> DONOR/DESYNCED (TO: 70230)
2018-02-01T15:19:14.099877Z 1 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
2018-02-01T15:19:14.100194Z 0 [Note] WSREP: Initiating SST/IST transfer on DONOR side (wsrep_sst_xtrabackup-v2 --role 'donor' --address '' --socket '/var/lib/mysql/mysql.sock' --datadir '/var/lib/mysql/' --defaults-file '/etc/my.cnf' --defaults-group-suffix ''  --binlog 'pxc_enc_1-bin' --gtid '39679102-c8e3-11e7-bdf7-77fd128ab7b1:70230')
2018-02-01T15:19:14.100986Z 1 [Note] WSREP: DONOR thread signaled with 0
2018-02-01T15:19:16.071875Z 0 [Note] WSREP: (133946dd, 'ssl://') turning message relay requesting off
        2018-02-01T15:19:17.679926Z WSREP_SST: [INFO] Streaming donor-keyring file before SST
        2018-02-01T15:19:28.810417Z WSREP_SST: [INFO] Streaming the backup to joiner at 4444
2018-02-01T15:19:34.570569Z 0 [Note] WSREP: 0.0 (pxc_enc_10.0.3.152): State transfer to 1.0 (pxc_enc_10.0.3.9) complete.
2018-02-01T15:19:34.570628Z 0 [Note] WSREP: Shifting DONOR/DESYNCED -> JOINED (TO: 70230)
2018-02-01T15:19:34.571971Z 0 [Note] WSREP: Member 0.0 (pxc_enc_10.0.3.152) synced with group.
2018-02-01T15:19:34.572061Z 0 [Note] WSREP: Shifting JOINED -> SYNCED (TO: 70230)

Setting up Percona XtraDB Cluster with InnoDB tablespace encryption is easy. We just need to make sure the above configuration is declared on all nodes so that state transfer works.

by Jericho Rivera at March 06, 2018 10:40 PM

Percona Monitoring and Management 1.8.1 Is Now Available


Percona announces the release of Percona Monitoring and Management 1.8.1. PMM (Percona Monitoring and Management) is a free and open-source platform for managing and monitoring MySQL and MongoDB performance. You can run PMM in your own environment for maximum security and reliability. It provides thorough time-based analysis for MySQL and MongoDB servers to ensure that your data works as efficiently as possible.

This release contains bug fixes only and supersedes Percona Monitoring and Management 1.8.0.

Improvements
  • PMM-2051: The ProxySQL Overview dashboard enables selecting more than one host group in the Hostgroup field.
  • PMM-2163: Dashboards based on the rds_exporter now use the node_cpu_average metric instead of node_cpu

Bug fixes

  • PMM-854: In some cases, databases and tables could be detected incorrectly
  • PMM-1745: For some queries, Query Abstract showed incorrect database name in QAN.
  • PMM-1928: In some cases, Query Analytics added a wrong schema
  • PMM-2014: Query Analytics could incorrectly include a schema from another server
  • PMM-2082: The PMM Query Analytics Settings dashboard had minor user interface problems.
  • PMM-2122: The time selector in Query Analytics showed time in the local timezone while time values in the Query Abstract were in the UTC format.
  • PMM-2127: There was a typo in the QAN interface when there was no data
  • PMM-2129: In some cases, QAN could show an incorrect fingerprint if the query contained no table.
  • PMM-2171: The JSON section in Query Analytics was displayed incorrectly
  • PMM-2172: The CPU Usage metric was not consistent in the System Summary dashboard
  • PMM-2173: Summary values were inconsistent
  • PMM-2174: Amazon Aurora nodes were not shown in the System Overview dashboard
  • PMM-2176: Lengthy queries were not displayed correctly in Query Analytics.
  • PMM-2177: The Incorrect Table name error appeared on the first load of Query Analytics.
  • PMM-2184: When port forwarding was used with Docker, the permanent redirects would break

by Borys Belinsky at March 06, 2018 07:52 PM

MariaDB AB

MaxScale HA setup using Keepalived and MaxCtrl


MariaDB MaxScale is a database proxy which does load balancing and query routing from client applications to backend database servers. In a basic configuration, MaxScale is a single point of failure. In this blog post we show how to setup a more resilient MaxScale HA cluster using Keepalived and MaxCtrl.

Keepalived is routing software for load balancing and high availability. It has several applications, but for this tutorial the goal is to set up a simple IP failover between two machines running MaxScale. If the main server fails, the backup machine takes over, receiving any new connections. The Keepalived settings used in this tutorial follow the example given in simple keepalived failover setup on Ubuntu 14.04.

The configuration examples in this blog are for a setup where two MaxScales are monitoring one database cluster. Two hosts and one client machine are used, all in the same LAN. Hosts run MaxScale and Keepalived. The backend servers may be running on one of the hosts, e.g. in docker containers, or on separate machines for a more realistic setup. Clients connect to the virtual IP (VIP), which is claimed by the current master host.


Once configured and running, the different Keepalived nodes continuously broadcast their status to the network and listen for each other. If a node does not receive a status message from another node with a higher priority than itself, it will claim the VIP, effectively becoming the master. Thus, a node can be put online or removed by starting and stopping the Keepalived service.

If the current master node is removed (e.g. by stopping the service or pulling the network cable) the remaining nodes will quickly elect a new master and future traffic to the VIP will be directed to that node. Any connections to the old master node will naturally break. If the old master comes back online, it will again claim the VIP, breaking any connections to the backup machine.

MaxScale has no knowledge of this even happening. Both MaxScales are running normally, monitoring the backend servers and listening for client connections. Since clients are connecting through the VIP, only the machine claiming the VIP will receive incoming connections. The connections between MaxScale and the backends are using real IPs and are unaffected by the VIP.


MaxScale does not require any specific configuration to work with Keepalived in this simple setup; it just needs to be running on both hosts. The MaxScale configurations should be roughly similar on both hosts if you plan on synchronizing any changes between the MaxScale instances. Specifically, both instances should have the same services and listeners so they appear identical to client applications. Setting the service-level setting “version_string” to different values on the MaxScale nodes is recommended, as it is printed to any connecting clients and indicates which node the client connected to.

[Read-Write Service]

Keepalived requires specific setups on both machines. On the primary host, the /etc/keepalived/keepalived.conf-file should be as follows.

vrrp_instance VI_1 {
    state MASTER
    interface eth0
    virtual_router_id 51
    priority 150
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass mypass
    }
    virtual_ipaddress {
    }
}

The state must be MASTER on both hosts. virtual_router_id and auth_pass must be identical on all hosts. The interface defines the network interface used. This depends on the system, but often the correct value is eth0, enp0s12f3 or similar. priority defines the voting strength between different Keepalived instances when negotiating which should be the master. The instances should have different values of priority. In this example, the backup host(s) could have priority 149, 148 and so on. advert_int is the interval between a host “advertising” its existence to the other Keepalived hosts. One second is a reasonable value.

virtual_ipaddress (VIP) is the IP the different Keepalived hosts try to claim and must be identical between the hosts. For IP negotiation to work, the VIP must be in the local network address space and unclaimed by any other machine in the LAN.

An example keepalived.conf-file for a backup host is listed below.

vrrp_instance VI_1 {
    state MASTER
    interface eth0
    virtual_router_id 51
    priority 100
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass mypass
    }
    virtual_ipaddress {
    }
}

Once the Keepalived service is running, recent log entries can be printed with the command service keepalived status.

Aug 11 10:27:59 maxscale2 Keepalived_vrrp[27369]: VRRP_Instance(VI_1) Received higher prio advert
Aug 11 10:27:59 maxscale2 Keepalived_vrrp[27369]: VRRP_Instance(VI_1) Entering BACKUP STATE
Aug 11 10:27:59 maxscale2 Keepalived_vrrp[27369]: VRRP_Instance(VI_1) removing protocol VIPs.

MariaDB MaxScale Health Check

So far, none of this tutorial has been MaxScale-specific and the health of the MaxScale process has been ignored. To ensure that MaxScale is running on the current master host, a check script should be set. Keepalived runs the script  regularly and if the script returns an error value, the Keepalived node will assume that it has failed, stops broadcasting its state and relinquishes the VIP. This allows another node to take the master status and claim the VIP.


To define a check script, modify the configuration as follows. The example is for the primary node. See Keepalived Check and Notify Scripts for more information.

vrrp_script chk_myscript {
    script "/home/scripts/"
    interval 2 # check every 2 seconds
    fall 2 # require 2 failures for KO
    rise 2 # require 2 successes for OK
}

vrrp_instance VI_1 {
    state MASTER
    interface wlp2s0
    virtual_router_id 51
    priority 150
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass mypass
    }
    virtual_ipaddress {
    }
    track_script {
        chk_myscript
    }
}

An example check script is listed below. The script uses MaxAdmin to try to contact the locally running MaxScale and request a server list, then checks that the list has at least some expected elements. The timeout command ensures the MaxAdmin call exits in reasonable time. The script detects if MaxScale has crashed, is stuck or is totally overburdened and no longer responds to connections. Simply checking that the MaxScale process is running would be a simpler, yet likely adequate, option.

#!/bin/bash
# File used to store the maxadmin output; the path is an assumed placeholder
fileName=/tmp/maxadmin_output.txt
rm -f $fileName
timeout 2s maxadmin list servers > $fileName
to_result=$?
if [ $to_result -ge 1 ]
then
    echo Timed out or error, timeout returned $to_result
    exit 3
else
    echo MaxAdmin success, rval is $to_result
    echo Checking maxadmin output sanity
    grep1=$(grep server1 $fileName)
    grep2=$(grep server2 $fileName)
    if [ "$grep1" ] && [ "$grep2" ]
    then
        echo All is fine
        exit 0
    else
        echo Something is wrong
        exit 3
    fi
fi

Aug 11 10:51:56 maxscale2 Keepalived_vrrp[20257]: VRRP_Script(chk_myscript) failed
Aug 11 10:51:57 maxscale2 Keepalived_vrrp[20257]: VRRP_Instance(VI_1) Entering FAULT STATE
Aug 11 10:51:57 maxscale2 Keepalived_vrrp[20257]: VRRP_Instance(VI_1) removing protocol VIPs.
Aug 11 10:51:57 maxscale2 Keepalived_vrrp[20257]: VRRP_Instance(VI_1) Now in FAULT state

MaxScale active/passive-setting

MariaDB MaxScale 2.2.2 introduced master/slave replication cluster management features (failover, switchover and rejoin). When running a setup with multiple MaxScales, only one MaxScale instance should be allowed to modify the master/slave replication cluster at any given time. This instance should be the one with MASTER Keepalived status. MaxScale does not know its Keepalived state, but MaxCtrl (a replacement for MaxAdmin) can set a MaxScale instance to passive mode. A passive MaxScale behaves similarly to an active one, with the exception that it won't perform failover, switchover or rejoin. Even manual versions of these commands will end in error. The passive/active mode differences may be expanded in the future.

To have Keepalived modify the MaxScale operating mode, a notify script is needed. This script is run whenever Keepalived changes its state. The script file is defined in the Keepalived configuration file as notify.

    virtual_ipaddress {
    }
    track_script {
        chk_myscript
    }
    notify /home/scripts/
}

Keepalived calls the script with three parameters. In our case, only the third parameter, STATE, is relevant. An example script is below.


#!/bin/bash
# Keepalived passes three arguments; only the third one (the new state) is used here.
STATE=$3
# Log file path is an assumed placeholder; the original value was not shown.
OUTFILE=/tmp/keepalived_state.txt

case $STATE in
 "MASTER") echo "Setting this MaxScale node to active mode" > $OUTFILE
              maxctrl alter maxscale passive false
              exit 0
              ;;
 "BACKUP") echo "Setting this MaxScale node to passive mode" > $OUTFILE
              maxctrl alter maxscale passive true
              exit 0
              ;;
 "FAULT")  echo "MaxScale failed the status check." > $OUTFILE
              maxctrl alter maxscale passive true
              exit 0
              ;;
    *)     echo "Unknown state" > $OUTFILE
              exit 1
              ;;
esac

The script logs the current state to a text file and sets the operating mode of MaxScale. The FAULT case also attempts to set MaxScale to passive mode, although the MaxCtrl command will likely fail.

If all MaxScale/Keepalived instances have a similar notify script, only one MaxScale should ever be in active mode. The mode of a MaxScale instance can be checked with the command maxctrl show maxscale, shown below. This MaxScale is “active”. A later blog post will show MaxCtrl use in more detail.

[vagrant@maxscale1 ~]$ maxctrl show maxscale
│ Version      │ 2.2.2                                                  │
│ Parameters   │ {                                                      │
│              │ "libdir": "/usr/lib64/maxscale",                       │
│              │ "datadir": "/var/lib/maxscale",   
│              │ "passive": false,                                      │
│              │ "query_classifier": ""                                 │
│              │ }                                                      │

Get started with MariaDB MaxScale—download it today!

In this blog post, we configured a simple HA setup of two MariaDB MaxScale instances using Keepalived and MaxCtrl.


by Esa Korhonen at March 06, 2018 04:38 PM

Peter Zaitsev

Webinar Thursday March 8, 2018: How Percona Maintains Optimal Customer Health


Please join Percona Technical Account Manager Tim Sharp, as he presents How Percona Maintains Optimal Customer Health on Thursday, March 8, 2018 at 11:00 am PST (UTC -8) / 2:00 pm EST (UTC -5).


How do you guarantee optimal database performance for your critical applications and services? Proactive monitoring and intervention is one way.

Percona Technical Account Managers (TAM) are expert DBAs that provide proactive assistance to our Managed Service customers – helping to guarantee performance, availability and reliability.

In today’s webinar, we’ll discuss some of the tools and methodologies used by Percona’s Technical Account Manager (TAM) team to ensure our customers’ database infrastructures are both healthy and optimized. These practices include activities such as:

  • Regularly testing backups
  • Performing index reviews
  • Capacity planning

Please join us as we explore some of the best practices employed by Percona’s TAMs for maintaining our Managed Services customers.

Register for the webinar now.

Tim Sharp, Technical Account Manager

Tim joined Percona in the summer of 2013 as a MySQL Technical Account Manager. His areas of knowledge include embedded database technologies, customer service and the traditional Linux/Apache/MySQL/PHP stack. In his free time, Tim enjoys cheese, IPA beer, exploring the hiking trails of Cascadia and knitting. Tim lives on Vancouver Island with his partner Aeron, a Bernese Mountain Dog named Maggy and a kitten called Peaches.

by Tim Sharp at March 06, 2018 04:14 PM

Jean-Jerome Schmidt

New Webinar on How to Design Open Source Databases for High Availability

Join us March 27th for this webinar on how to design open source databases for high availability with Ashraf Sharif, Senior Support Engineer at Severalnines. From discussing high availability concepts through to failover or switch over mechanisms, Ashraf will cover all the need-to-know information when it comes to building highly available database infrastructures.

It’s been said that not designing for failure leads to failure; but what is the best way to design a database system from the ground up to withstand failure?

Designing open source databases for high availability can be a challenge as failures happen in many different ways, which sometimes go beyond imagination. This is one of the consequences of the complexity of today’s open source database environments.

At Severalnines we’re big fans of high availability databases and have seen our fair share of failure scenarios across the thousands of database deployment attempts that we come across every year.

In this webinar, we’ll look at the different types of failures you might encounter and what mechanisms can be used to address them. We will also look at some of the popular high availability solutions used today, and how they can help you achieve different levels of availability.

Sign up for the webinar

Date, Time & Registration


Tuesday, March 27th at 09:00 BST / 10:00 CEST (Germany, France, Sweden)

Register Now

North America/LatAm

Tuesday, March 27th at 09:00 PDT (US) / 12:00 EDT (US)

Register Now


  • Why design for High Availability?
  • High availability concepts
    • CAP theorem
    • PACELC theorem
  • Trade offs
    • Deployment and operational cost
    • System complexity
    • Performance issues
    • Lock management
  • Architecting databases for failures
    • Capacity planning
    • Redundancy
    • Load balancing
    • Failover and switchover
    • Quorum and split brain
    • Fencing
    • Multi datacenter and multi-cloud setups
    • Recovery policy
  • High availability solutions
    • Database architecture determines Availability
    • Active-Standby failover solution with shared storage or DRBD
    • Master-slave replication
    • Master-master cluster
  • Failover and switchover mechanisms
    • Reverse proxy
    • Caching
    • Virtual IP address
    • Application connector

Sign up for the webinar


Ashraf Sharif is a System Support Engineer at Severalnines. He was previously involved in the hosting world and the LAMP stack, where he worked as a principal consultant and head of a support team, delivering clustering solutions for large websites in the South East Asia region. His professional interests are system scalability and high availability.

by jj at March 06, 2018 02:35 PM

MariaDB Foundation

2017 in the MariaDB Foundation

2017 was an excellent year for MariaDB. The Foundation was joined by several new sponsoring members: IBM, Alibaba Cloud, Tencent Cloud and Microsoft. This was the first year the Foundation did not run out of funds at the end of the year, and we were able to fully compensate our staff. The members have also […]


by Ian Gilfillan at March 06, 2018 10:02 AM

March 05, 2018

Peter Zaitsev

TPCC-Like Workload for Sysbench 1.0


In this post I’ll look at some of our recent work for benchmark enthusiasts: a TPCC-like workload for Sysbench (version 1.0 or later).

Despite being 25 years old, the TPC-C benchmark can still provide an interesting, intensive workload for a database in my opinion. It runs multi-statement transactions and is write-heavy. We also decided to use Sysbench 1.0, which supports much more flexible Lua scripting and allows us to implement a TPCC-like workload.

For a long time, we used the tpcc-mysql tool for performance evaluations of MySQL and Percona Server for MySQL, but we recognize that the tool is far from intuitive and simple to use. So we hope the adaptation for Sysbench will make it easier to run.

Although we are trying to mimic the TPC-C standard guidance, there are some major variations we decided to introduce.

First, we do not use fully random text fields. These are hard to compress, and we want to be able to evaluate different compression methods in InnoDB and MyRocks.

Second, we allow you to use multiple table sets, compared to the standard one set of nine tables. The reason is that we want to test workloads on multiple tables and to somewhat emulate SaaS environments, where multiple clients share the same database.

So, there is a DISCLAIMER: this benchmark script was not validated or certified by the TPC organization. The results obtained can’t be referred to as TPC-C results, and they are not comparable with any official TPC-C results.

How to run the benchmark:

We tried to make it as easy as possible to run the benchmark. You still need to take the following steps:

  1. Make sure you have Sysbench 1.0+ properly installed
  2. Get our scripts, located at
  3. Prepare the dataset
  4. Run

The command line might look like this:

./tpcc.lua --mysql-socket=/tmp/mysql.sock --mysql-user=root --mysql-db=sbt --threads=20 --tables=10 --scale=100 prepare

Where --scale is the number of warehouses, and --tables is the number of table sets.

As a rough estimation, 100 warehouses with 1 table set produces about 10GB of data in non-compressed InnoDB tables (so 100 warehouses with 10 table sets gives about 100GB).

The nice thing about Sysbench is that it can load data in parallel (using N --threads). It also allows some extra options. For example, for MyRocks:

./tpcc.lua --mysql-socket=/tmp/mysql.sock --mysql-user=root --mysql-db=sbr --threads=20 --tables=10 --scale=100 --use_fk=0
--mysql_storage_engine=rocksdb --mysql_table_options='COLLATE latin1_bin' --trx_level=RC prepare

As MyRocks does not support foreign keys, we pass --use_fk=0. MyRocks in Percona Server for MySQL also does not support REPEATABLE READ, so we use READ COMMITTED (--trx_level=RC). MyRocks also requires a binary collation for string fields in indexes (--mysql_table_options='COLLATE latin1_bin').

To run the benchmark, execute:

./tpcc.lua --mysql-socket=/tmp/mysql.sock --mysql-user=root --mysql-db=sbt --time=300 --threads=64 --report-interval=1 --tables=10 --scale=100 run

We hope a TPCC-like workload for Sysbench will be helpful for database performance evaluations. Now that Sysbench includes support for PostgreSQL, these TPCC-like benchmarks should allow for more consistent performance comparisons.

Happy benchmarking!

by Vadim Tkachenko at March 05, 2018 11:20 PM

Percona Server for MongoDB 3.4.13-2.11 Is Now Available


Percona announces the release of Percona Server for MongoDB 3.4.13-2.11 on March 5, 2018. Download the latest version from the Percona web site or the Percona Software Repositories.

Percona Server for MongoDB 3.4 is an enhanced, open source, and highly-scalable database that is a fully-compatible, drop-in replacement for MongoDB 3.4 Community Edition. It supports MongoDB 3.4 protocols and drivers.

Percona Server for MongoDB extends MongoDB Community Edition functionality by including the Percona Memory Engine and MongoRocks storage engines, as well as several enterprise-grade features:

Percona Server for MongoDB requires no changes to MongoDB applications or code.

This release is based on MongoDB 3.4.13 and includes the following additional changes:

  • PSMDB-191: Fixed a bug in MongoRocks engine initialization code which caused wrong initialization of _maxPrefix value. This could lead to reuse of dropped prefix and accidental removal of data from the collection using a reused prefix. In some specific conditions, data records could disappear at an arbitrary moment of time from the collections or indexes created after server restart. This could happen as the result of the following sequence of events:
    • User deletes one or more indexes or collections. These should be the ones using maximum existing prefixes values.
    • User shuts down the server before MongoRocks compaction thread executes compactions of deleted ranges.
    • User restarts the server and creates new collections. Due to the bug, those new collections and their indexes may get the same prefix values which were deleted and not yet compacted. The user inserts some data into the new collections.
    • After the server restart MongoRocks compaction thread continues executing compactions of the deleted ranges and this process may eventually delete data from the collections sharing prefixes with deleted ranges.
  • PSMDB-164: MongoRocks would fail to repair if metadata was inconsistent with dropped idents.
  • SERVER-30790: ServerStatus on MongoRocks is now accessing the storage engine without any locks.

by Hrvoje Matijakovic at March 05, 2018 10:55 PM

Webinar Tuesday March 6, 2018: Percona Software News and Roadmap Update


Come and listen to Percona CEO Peter Zaitsev on March 6, 2018 at 12:00 pm PST (UTC -8) / 3:00 pm EST (UTC -5) as Peter discusses Percona software news and what’s new in Percona open source software, including Percona Server for MySQL and MongoDB, Percona XtraDB Cluster, Percona XtraBackup, Percona Toolkit, and Percona Monitoring and Management.


During this webinar, Peter will talk about newly released features in Percona software, show a few quick demos and share with you highlights from the Percona open source software roadmap.

Peter will also talk about new developments in Percona commercial services and finish with a Q&A.

Register for this webinar now.

Peter Zaitsev, CEO

Peter Zaitsev co-founded Percona and assumed the role of CEO in 2006. As one of the foremost experts on MySQL strategy and optimization, Peter leveraged both his technical vision and entrepreneurial skills to grow Percona from a two-person shop to one of the most respected open source companies in the business. With over 140 professionals in 30 plus countries, Peter’s venture now serves over 3000 customers – including the “who’s who” of internet giants, large enterprises and many startups. Percona was named to the Inc. 5000 in 2013, 2014, 2015 and 2016.

Peter was an early employee at MySQL AB, eventually leading the company’s High-Performance Group. A serial entrepreneur, Peter co-founded his first startup while attending Moscow State University where he majored in Computer Science. Peter is a co-author of High-Performance MySQL: Optimization, Backups, and Replication, one of the most popular books on MySQL performance. Peter frequently speaks as an expert lecturer at MySQL and related conferences, and regularly posts on the Percona Database Performance Blog. He has also been tapped as a contributor to Fortune and DZone, and his recent ebook Practical MySQL Performance Optimization Volume 1 is one of’s most popular downloads.

by Peter Zaitsev at March 05, 2018 06:58 PM

March 04, 2018

Valeriy Kravchuk

On InnoDB's FULLTEXT Indexes

I had recently written about InnoDB features that I try to avoid by all means if not hate: "online" DDL and persistent optimizer statistics. Time to add one more to the list - FULLTEXT indexes.

This feature had a lot of problems when initially introduced in MySQL 5.6. There was a nice series of blog posts about the initial experience with it by my colleague from Percona (at that time): part I, part II, and part III. Many of the problems mentioned there have been resolved or properly documented since then, but even more were discovered. So, InnoDB FULLTEXT indexes may be used, with care, when MyISAM or other engines/means of adding fulltext search are not an option. The list of bugs that are still important and must be taken into account is presented below.

What forced me to get back to this feature recently, and hate it sincerely, is one customer issue that led to this bug report: MDEV-14773 - "ALTER TABLE ... MODIFY COLUMN ... hangs for InnoDB table with FULLTEXT index". Note that I have to refer to the MariaDB bug report here, as the related upstream Bug #88844 is hidden from the community (probably considered a shame, if not a security problem)! The bug is simple: if one applies any ALTER to an InnoDB table with a FULLTEXT index, even one not related to that index or its columns in any way, chances are high that this ALTER may cause a kind of hang/infinite loop due to a conflict between the thread that tries to drop the temporary table used by ALTER, as one of its last steps, and the FTS background optimize thread. Similar to the other two problematic features, new background threads were introduced, and their cooperation with other threads in InnoDB seems to be not that well designed/implemented.

There are many other bugs to take into account if you ever plan to add any single FULLTEXT index to your InnoDB table. Here is the list of the most important ones, mostly still "Verified" or open and ignored, that I collected during one of my calm night shifts this week:
  • Bug #78048 - "INNODB Full text Case sensitive not working". This bug was fixed only recently, in MySQL 5.6.39, 5.7.21, and 8.0.4.
  • Bug #83776 - "InnoDB FULLTEXT search returns incorrect result for operators on ignored words". Still "Verified" on all GA versions and 8.0.x.
  • Bug #76210 - "InnoDB FULLTEXT index returns wrong results for key/value pair documents". This bug was reported by Justin Swanhart 3 years ago, quickly verified and then seems to be ignored.
  • Bug #86036 - "InnoDB FULLTEXT index has too strict innodb_ft_result_cache_limit max limit". I reported this bug 10 months ago, and it was immediately "Verified". It seems FULLTEXT indexes are hardly useful in general for large InnoDB tables because of this limitation.
  • Bug #78977 - "Enable InnoDB fulltext index to use generated FTS_DOC_ID column". This is a feature request (still "Open") to get rid of this well known limitation/specific column.
  • Bug #86460 - "Deleted DOCID are not maintained during OPTIMIZE of InnoDB FULLTEXT tables". If you want to get rid of deleted DOC_IDs in the INNODB_FT_DELETED, better just run ALTER TABLE ... ENGINE=InnoDB.
  • Bug #75763 - "InnoDB FULLTEXT index reduces insert performance by up to 6x on JSON docs". Yet another verified bug report by Justin Swanhart.
  • Bug #69762 - "InnoDB fulltext match against in boolean mode misses results on join". Let me quote last comment there:
    "Since innodb doesn't support fulltext search on columns without fulltext index, and it is very complicated to support search on columns in multiple fulltext indexes in optimizer, it won't be fixed.

    We admit it's a point innodb fulltext is not compatible with myisam."
  • Bug #85880 - "Fulltext query is too slow when each ngram token match a lot of documents". This bug is still "Open".
  • Bug #78485 - "Fulltext search with char * produces a syntax error with InnoDB". Yet another verified regression comparing to MyISAM FULLTEXT indexes. Nobody cares for 2.5 years.
  • Bug #80432 - "No results in fulltext search for top level domain in domain part of email ". It ended up as "Won't fix", but at least a workaround was provided by Oracle developer.
  • Bug #81819 - "ALTER TABLE...LOCK=NONE is not allowed when FULLTEXT INDEX exists". Online ALTER just does not work for tables with FULLTEXT indexes. This is a serious limitation.
  • Bug #72132 - "Auxiliary tables for InnoDB FTS indexes are always created in shared tablespace". This my bug report was fixed in 5.6.20+ and 5.7.5+, but the fact that this regression was not noted for a long time internally says a lot about the way the feature was developed and maintained.
  • Bug #83560 - "InnoDB FTS - output from mysqldump extremely slow and blocks unrelated inserts". I have yet to check the metadata locks set when the table with FULLTEXT index is used in various SQL statements, but from this "Verified" report it is clear that just loading a dump of a table with FULLTEXT indexes may work too slow for any large table.
  • Bug #71551 - "ft_boolean_syntax has no impact on InnoDB FTS". Yet another inconsistency with MyISAM FULLTEXT indexes that was reported 4 years ago and "Verified", but still ignored after that.
  • Bug #83741 - "InnoDB: Failing assertion: lock->magic_n == 22643". Surely, debug assertions can be ignored, but in most cases they are in the code for a good reason. This failure was reported by Roel Van de Paar from Percona.
  • Bug #83397 - "INSERT INTO ... SELECT FROM ... fails if source has > 65535 rows on FTS". This "Verified" bug alone, reported by Daniël van Eeden, makes InnoDB FULLTEXT indexes hardly usable in production for large tables.
  • Bug #80296 - "FTS query exceeds result cache limit". The bug is "Closed" silently (by the bug reporter maybe, Monty Solomon?), but users report that recent enough versions like 5.6.35 and 5.7.17 are still affected. See also Bug #82971 (no fix for MySQL 5.6.x for sure).
  • Bug #85876 - "Fulltext search can not find word which contains "," or ".". Still "Verified" for 1 month.
  • Bug #68987 - "MySQL crash with InnoDB assertion failure in file". Crash was reported in MySQL 5.6.10, not repeatable. Then (different?) assertion failure was reported in debug builds only in MySQL 5.6.21+, and verified. Not sure what's going on with this bug report...
  • Bug #83398 - "Slow and unexpected explain output on FTS". The fact that EXPLAIN may be slow when the table with FULLTEXT index is involved is now documented, so this report by Daniël van Eeden is closed.
  • Bug #81930 - "incorrect result with InnoDB FTS and subquery". This bug report about wrong results by Sergei Golubchik from MariaDB was immediately "Verified", but ignored since that time.
  • Bug #80347 - "mysqldump backup restore fails due to invalid FTS_DOC_ID (Error 182 and 1030)". There is a workaround based on mydumper/myloader at least...
To summarize, InnoDB FULLTEXT indexes are one of the most problematic InnoDB features for any production use because:
  • There are all kinds of serious bugs, from wrong results to hangs, debug assertions and crashes, that do not seem to get any internal priority and stay "Verified" for years.
  • There are performance regressions and missing features comparing to MyISAM FULLTEXT indexes, so migration may cause problems.
  • InnoDB FULLTEXT indexes are not designed to work with really large tables/result sets.
  • You should expect problems during routine DBA activities, like ALTERing tables or dumps and restores when any table with InnoDB FULLTEXT index is involved. 
If you still plan/have to use it, please, make sure to use the latest MySQL version, check the list above carefully and test/check the results of fulltext searches and routine DBA operations like altering the table. You may get a lot of surprises. Consider alternatives like Sphinx seriously.

by Valeriy Kravchuk ( at March 04, 2018 06:28 PM

March 02, 2018

Peter Zaitsev

Percona XtraDB Cluster 5.7.21-29.26 Is Now Available


Percona announces the release of Percona XtraDB Cluster 5.7.21-29.26 (PXC) on March 2, 2018. Binaries are available from the downloads section or our software repositories.

Percona XtraDB Cluster 5.7.21-29.26 is now the current release, based on the following:

Starting from now, the Percona XtraDB Cluster issue tracking system has moved from Launchpad to JIRA. All Percona software is open-source and free.

Fixed Bugs

  • PXC-2039: Node consistency was compromised for INSERT INTO ... ON DUPLICATE KEY UPDATE workload because the regression introduced in Percona XtraDB Cluster 5.7.17-29.20 made it possible to abort local transactions without further re-evaluation in case of a lock conflict.
  • PXC-2054: Redo-optimized DDL operations (like sorted index build) were not blocked while a backup process was running, leading to SST failure. To fix this, the --lock-ddl option now blocks all DDL during the xtrabackup backup stage.
  • General code improvement was made in the GTID event handling, when events are captured as a part of the slave replication and appended to the Galera replicated write-set. This fixed PXC-2041 (starting async slave on a single node Percona XtraDB Cluster led to a crash) and PXC-2058 (binlog-based master-slave replication broke the cluster) caused by the incorrect handling in the GTID append logic.
  • An issue caused by non-coincidence between the order of recovered transaction and the global seqno assigned to the transaction was fixed ensuring that the updated recovery wsrep coordinates are persisted.
  • PXC-904: Replication filters were not working with account management statements like CREATE USER in case of Galera replication; as a result, such commands were blocked by the replication filters on async slave nodes but not on Galera ones.
  • PXC-2043: SST script was trying to use pv (the pipe viewer) for progress and rlimit options even on nodes with no pv installed, resulting in SST fail instead of just ignoring these options for inappropriate nodes.
  • PXC-911: When node’s own IP address was defined in the wsrep_cluster_address variable, the node was receiving “no messages seen in” warnings from its own IP address in the info log.

This release also contains fixes for the following CVE issues: CVE-2018-2565, CVE-2018-2573, CVE-2018-2576, CVE-2018-2583, CVE-2018-2586, CVE-2018-2590, CVE-2018-2612, CVE-2018-2600, CVE-2018-2622, CVE-2018-2640, CVE-2018-2645, CVE-2018-2646, CVE-2018-2647, CVE-2018-2665, CVE-2018-2667, CVE-2018-2668, CVE-2018-2696, CVE-2018-2703, CVE-2017-3737.

Help us improve our software quality by reporting any bugs you encounter using our bug tracking system. As always, thanks for your continued support of Percona!

by Dmitriy Kostiuk at March 02, 2018 07:46 PM

Percona Monitoring and Management for MongoDB: Basic Graphs for a Good System Overview


In this blog post, we will discuss how to use Percona Monitoring and Management for MongoDB monitoring, and how to get some key graphs to monitor your MongoDB database.

All production environments need metrics and historical data for easy and fast comparison of performance and throughput over time.

Percona Monitoring and Management (PMM) can help your company with that. PMM has a client on each instance and a server; every second, the server connects to the clients to capture data and plot it into easily understood graphs.

Most of the data captured comes from the following administrative commands/stats:

  • db.serverStatus()
  • db.printSlaveReplicationInfo()
  • rs.status() and so on…

For this blog, I’m using a three-instance replica set, which is the most common architecture.

We are going to use the MongoDB Overview and MongoDB Replicaset dashboards in PMM 1.7 to demonstrate how to interpret metrics proactively.

PMM comes with a few MongoDB dashboards; the first one we are going to use is MongoDB Overview. This dashboard gives us an overview of a single instance and can be very useful for finding issues isolated to one instance.

Command Operations

The number of operations the database is receiving. If the instance is a primary the possible values are: insert, delete, query or getmore. For secondaries, these values can be repl_insert, repl_delete, query and getmore.

MongoDB command:
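
A likely way to read these counters in the mongo shell (the exact query behind the graph may differ) is:

db.serverStatus().opcounters        // primary operation counters
db.serverStatus().opcountersRepl    // replicated operation counters on secondaries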


Number of Connections

A rapidly increasing number of connections can point to a problem. With the connections graph, we can see the connection pattern. This graph is valid for both primaries and secondaries.

MongoDB command:
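
In the mongo shell, the connection counts most likely come from:

db.serverStatus().connections    // current, available and totalCreated connections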


Queued operations:

If this value is different from 0, it means there is (or was) a query that had to wait before running. This graph should stay as close to 0 as possible; a high number of queued operations means the database is under high load. Queues can be either reads or writes, and both are bad.

MongoDB command:
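
The queue counters are most likely read from the global lock section of serverStatus, for example:

db.serverStatus().globalLock.currentQueue    // readers, writers and total queued operations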


Page Faults

If your database size is bigger than the RAM, it is very probable (and OK) to have page faults. If this graph is always high, you may need to consider upgrading the memory.

MongoDB command:
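
On Linux, the page fault counter is most likely taken from:

db.serverStatus().extra_info.page_faults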


With those graphs, we can see what each instance is doing. But PMM also offers a better view for a replica set, so we can see what the database is doing as a whole.

Replica Set Dashboard

The first rows give detailed information about the instance, such as its state, when the last election happened, the storage engine, and the number of members in the replica set.

Most of the graphs have a line per instance. Others require us to choose the instance at the top. In this case, I’m using node1 as an example, and this instance is the primary in the replica set.

In a replica set, the most common and necessary metric to view is Replication Lag.

This is how many seconds a secondary is behind its primary. Usually, a few seconds (such as 0 to 3) is OK, considering that MongoDB replication is asynchronous. There is no strict threshold: 10 seconds or 30 seconds can be an issue, but it really varies from business to business.

MongoDB command:
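
As mentioned earlier in this post, the lag is most likely taken from the replication info helper:

db.printSlaveReplicationInfo()    // prints how far each secondary is behind the primary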


Oplog Window

Replica sets need the oplog to replicate their data to the other members. The oplog is a capped collection that can only hold a fixed amount of data. The difference between the first and the last timestamp in the collection is called the oplog window. This is the amount of time a secondary can be offline before an initial sync is needed to sync the instance.

MongoDB command:
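
The oplog window is most likely derived from the replication info, for example:

rs.printReplicationInfo()    // shows oplog size and the time between the first and last oplog entries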


Heartbeat time

How long are the servers taking to confirm they are alive? A high number can mean the clocks are different or there is a serious network issue.

MongoDB command:
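
The heartbeat timing is most likely taken from the replica set status, for example:

rs.status().members    // compare the lastHeartbeat and lastHeartbeatRecv fields per member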


There are a couple of other graphs available; check our online demo for more information.

I hope you find this article about Percona Monitoring and Management for MongoDB useful! Please feel free to contact me @AdamoTonete or @percona on Twitter anytime!

by Adamo Tonete at March 02, 2018 05:10 PM

This Week in Data with Colin Charles 30: Schedule for Percona Live, and Tracking Those Missing Features


Join Percona Chief Evangelist Colin Charles as he covers happenings, gives pointers and provides musings on the open source database community.

Have you registered for Percona Live already? The tutorial grid, the schedules for day 1 and day 2 are pretty amazing, and there is even an extra track being added, for a total of 10 concurrent/parallel tracks during day 1 & day 2. If you submitted a talk and it didn’t get accepted (competition was high), you should have received a discount code to register for the event.

I plan to write more dedicated blog posts around M|18 and the MariaDB Developer’s Unconference. Hang on till next week? I gave my presentation, targeted at developers, titled MySQL features missing in MariaDB Server. It was a room of maybe 40-45 people, and had some debate, but not much more; there were some other excellent presentations as well.

Next week, Percona will be at SCALE16x. We are sponsors, so there will also be a booth (where you can get some interesting schwag), and don’t forget that both Peter Zaitsev and I have a talk (one on Friday, the other on Saturday). Looking forward to seeing you all there.


Link List

Upcoming appearances

  • SCALE16x – Pasadena, California, USA – March 8-11 2018
  • FOSSASIA 2018 – Singapore – March 22-25 2018


I look forward to feedback/tips via e-mail at or on Twitter @bytebot.


by Colin Charles at March 02, 2018 02:50 PM

Jean-Jerome Schmidt

Failover for MySQL Replication (and others) - Should it be Automated?

Automatic failover for MySQL Replication has been subject to debate for many years.

Is it a good thing or a bad thing?

For those with long memory in the MySQL world, they might remember the GitHub outage in 2012 which was mainly caused by software taking the wrong decisions.

GitHub had then just migrated to a combo of MySQL Replication, Corosync, Pacemaker and Percona Replication Manager. PRM decided to do a failover after failing health checks on the master, which was overloaded during a schema migration. A new master was selected, but it performed poorly because of cold caches. The high query load from the busy site caused PRM heartbeats to fail again on the cold master, and PRM then triggered another failover to the original master. And the problems just continued, as summarized below.

Fast forward a couple of years and GitHub is back with a pretty sophisticated framework for managing MySQL Replication and automated failover! As Shlomi Noach puts it:

“To that effect, we employ automated master failovers. The time it would take a human to wake up & fix a failed master is beyond our expectancy of availability, and operating such a failover is sometimes non-trivial. We expect master failures to be automatically detected and recovered within 30 seconds or less, and we expect failover to result in minimal loss of available hosts.”

Most companies are not GitHub, but one could argue that no company likes outages. Outages are disruptive to any business, and they also cost money. My guess is that most companies out there probably wished they had some sort of automated failover, and the reasons not to implement it are probably the complexity of the existing solutions, lack of competence in implementing such solutions, or lack of trust in software to take such an important decision.

There are a number of automated failover solutions out there, including (and not limited to) MHA, MMM, MRM, mysqlfailover, Orchestrator and ClusterControl. Some of them have been on the market for a number of years, others are more recent. That is a good sign, multiple solutions mean that the market is there and people are trying to address the problem.

When we designed automatic failover within ClusterControl, we used a few guiding principles:

  • Make sure the master is really dead before you failover

    In the case of a network partition, the failover software loses contact with the master and stops seeing it. But the master might be working well and can still be seen by the rest of the replication topology.

    ClusterControl gathers information from all the database nodes as well as any database proxies/load balancers used, and then builds a representation of the topology. It will not attempt a failover if the slaves can see the master, nor if ClusterControl is not 100% sure about the state of the master.

    ClusterControl also makes it easy to visualize the topology of the setup, as well as the status of the different nodes (this is ClusterControl’s understanding of the state of the system, based on the information it gathers).

  • Failover only once

    Much has been written about flapping. It can get very messy if the availability tool decides to do multiple failovers. That’s a dangerous situation. Each master elected, however brief the period it held the master role, might have its own set of changes that were never replicated to any other server. So you may end up with inconsistency across all the elected masters.

  • Do not failover to an inconsistent slave

    When selecting a slave to promote as master, we ensure the slave does not have inconsistencies, e.g. errant transactions, as this may very well break replication.

  • Only write to the master

    Replication goes from the master to the slave(s). Writing directly to a slave would create a diverging dataset, and that can be a potential source of problems. We set the slaves to read_only, and super_read_only in more recent versions of MySQL or MariaDB (see the sketch after this list). We also advise the use of a load balancer, e.g., ProxySQL or MaxScale, to shield the application layer from the underlying database topology and any changes to it. The load balancer also enforces writes on the current master.

  • Do not automatically recover the failed master

    If the master has failed and a new master has been elected, ClusterControl will not try to recover the failed master. Why? That server might have data that has not yet been replicated, and the administrator would need to do some investigation into the failure. Ok, you can still configure ClusterControl to wipe out the data on the failed master and have it join as a slave to the new master - if you are ok with losing some data. But by default, ClusterControl will let the failed master be, until someone looks at it and decides to re-introduce it into the topology.
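
As a minimal illustration of the “only write to the master” principle above, this is roughly what a failover tool (or an administrator) runs on each slave; the host and account names are placeholders, and super_read_only is only available on servers that support it:

mysql -h slave-host -u admin -p -e "
  SET GLOBAL read_only = ON;
  SET GLOBAL super_read_only = ON;  -- where available; also blocks users with SUPER from writing
"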

So, should you automate failover? It depends on how you have configured replication. Circular replication setups with multiple write-able masters or complex topologies are probably not good candidates for auto failover. We would stick to the above principles when designing a replication solution.

On PostgreSQL

When it comes to PostgreSQL streaming replication, ClusterControl uses similar principles to automate failover. For PostgreSQL, ClusterControl supports both asynchronous and synchronous replication models between the master and the slaves. In both cases and in the event of failure, the slave with the most up-to-date data is elected as the new master. Failed masters are not automatically recovered/fixed to rejoin the replication setup.

There are a few protective measures taken to make sure the failed master is down and stays down, e.g. it is removed from the load balancing set in the proxy, and it is killed if, for example, the user restarts it manually. Detecting network splits between ClusterControl and the master is a bit more challenging there, since the slaves do not provide any information about the status of the master they are replicating from. So a proxy in front of the database setup is important, as it can provide another path to the master.

On MongoDB

MongoDB replication within a replicaset via the oplog is very similar to binlog replication, so how come MongoDB automatically recovers a failed master? The problem is still there, and MongoDB addresses that by rolling back any changes that were not replicated to the slaves at the time of failure. That data is removed and placed in a ‘rollback’ folder, so it is up to the administrator to restore it.
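
As a hedged sketch of what that looks like in practice (the dbpath, database and collection names below are made up for illustration):

# rolled-back documents end up as BSON files under the dbpath
ls /var/lib/mongodb/rollback/
# e.g. shop.orders.2018-03-02T10-11-12.0.bson  (example file name)

# inspect what was rolled back
bsondump /var/lib/mongodb/rollback/shop.orders.2018-03-02T10-11-12.0.bson | head

# re-apply the documents once you have decided they are safe to restore
mongorestore --db shop --collection orders /var/lib/mongodb/rollback/shop.orders.2018-03-02T10-11-12.0.bson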

To find out more, check out ClusterControl; and feel free to comment or ask questions below.

by vinay at March 02, 2018 11:34 AM

March 01, 2018

Open Query Pty Ltd

RDS Aurora MySQL Cost

I promised to do a pricing post on the Amazon RDS Aurora MySQL pricing, so here we go.  All pricing is noted in USD (we’ll explain why).

We compared pricing of equivalent EC2+EBS server instances, and verified our calculation model with Amazon’s own calculator and examples.  We use the pricing for Australia (Sydney data centre).  The pricing numbers, formulae, and calculation examples below were taken from the relevant Amazon pricing pages.

Base Pricing Details

EC2 instance   vCPU  ECU   RAM (GB)  Storage   Linux/hr  |  RDS Aurora MySQL instance  Price/hr
r4.large        2     7     15.25    EBS Only   $0.160   |  db.r4.large                 $0.350
r4.xlarge       4    13.5   30.5     EBS Only   $0.319   |  db.r4.xlarge                $0.700
r4.2xlarge      8    27     61       EBS Only   $0.638   |  db.r4.2xlarge               $1.400
r4.4xlarge     16    53    122       EBS Only   $1.277   |  db.r4.4xlarge               $2.800
r4.8xlarge     32    99    244       EBS Only   $2.554   |  db.r4.8xlarge               $5.600
r4.16xlarge    64   195    488       EBS Only   $5.107   |  db.r4.16xlarge              $11.200

That’s not all we need, because both EBS and Aurora have some additional costs we need to factor in.

EBS pricing components (EBS Provisioned IOPS SSD (io1) volume)

“Volume storage for EBS Provisioned IOPS SSD (io1) volumes is charged by the amount you provision in GB per month until you release the storage. With Provisioned IOPS SSD (io1) volumes, you are also charged by the amount you provision in IOPS (input/output operations per second) per month. Provisioned storage and provisioned IOPS for io1 volumes will be billed in per-second increments, with a 60 second minimum.”

  • Storage Rate $0.138 /GB/month of provisioned storage
    “For example, let’s say that you provision a 2000 GB volume for 12 hours (43,200 seconds) in a 30 day month. In a region that charges $0.125 per GB-month, you would be charged $4.167 for the volume ($0.125 per GB-month * 2000 GB * 43,200 seconds / (86,400 seconds/day * 30 day-month)).”
  • I/O Rate $0.072 /provisioned IOPS-month
    “Additionally, you provision 1000 IOPS for your volume. In a region that charges $0.065 per provisioned IOPS-month, you would be charged $1.083 for the IOPS that you provisioned ($0.065 per provisioned IOPS-month * 1000 IOPS provisioned * 43,200 seconds /(86,400 seconds /day * 30 day-month)).”
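
To make the proration explicit, here is a small sketch (ours, not Amazon’s) that reproduces the two quoted examples:

awk 'BEGIN {
  prorate = 43200 / (86400 * 30)                      # 12 hours out of a 30 day month
  printf "storage: $%.3f\n", 0.125 * 2000 * prorate   # $0.125/GB-month * 2000 GB
  printf "IOPS:    $%.3f\n", 0.065 * 1000 * prorate   # $0.065/IOPS-month * 1000 IOPS
}'
# storage: $4.167
# IOPS:    $1.083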

Other Aurora pricing components

  • Storage Rate $0.110 /GB/month
    (No price calculation examples given for Aurora storage and I/O)
  • I/O Rate $0.220 /1 million requests
    (Presuming IOPS equivalence / Aurora ratio noted from arch talk)

So this provides us with a common base: instance types that are equivalent between Aurora and EC2.  All other Aurora instance types are different, so it’s not possible to do a direct comparison in those cases.  Presumably the pricing ratio will be similar for equivalent specs.

On Demand vs Reserved Instances

We realise we’re calculating on the basis of On Demand pricing.  But we’re comparing pricing within AWS space, so presumably the savings for Reserved Instances are in a similar ballpark.

Other factors

  • We have 720 hours in a 30 day month, which is 2592000 seconds.
  • 70/30 read/write ratio – 70% reads, 30% writes (used to calculate the effective Aurora IOPS)
  • 10% read cache miss – a 10% cache miss rate on reads
  • Aurora I/O ratio: 3 (Aurora requiring 2 IOPS for a commit vs 6 in MySQL – even though this is a pile of extreme hogwash in terms of that pessimistic MySQL baseline)
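
Putting these factors together, here is a back-of-the-envelope sketch (ours, using the Sydney rates above and the 2,000 ops/s, 250 GB per-instance workload from the comparison table further down):

awk 'BEGIN {
  ops = 2000                                    # DB I/O rate per instance (ops/s)
  eff = ops*0.70*0.10 + ops*0.30                # cache-missed reads + writes = effective IOPS
  printf "effective IOPS:        %d\n", eff
  printf "EBS I/O per instance:  $%.2f/month\n", eff * 0.072
  printf "EBS storage (250 GB):  $%.2f/month\n", 250 * 0.138
  printf "Aurora I/O (cluster):  $%.2f/month\n", (eff / 3) * 2592000 / 1000000 * 0.22
}'
# effective IOPS:        740
# EBS I/O per instance:  $53.28   (x3 instances is roughly the $160 in the table)
# EBS storage (250 GB):  $34.50   (x3 instances is roughly the $104 in the table)
# Aurora I/O (cluster):  $140.66  (roughly the $141 in the table)

The Aurora storage figure in the table ($83) appears consistent with charging the 250 GB once per AZ at $0.110/GB-month; we treat that as part of our model rather than a published formula.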

We also spotted this note regarding cross-AZ Aurora traffic:

“Amazon RDS DB Instances inside VPC: For data transferred between an Amazon EC2 instance and Amazon RDS DB Instance in different Availability Zones of the same Region, Amazon EC2 Regional Data Transfer charges apply on both sides of transfer.”

So this would apply to application DB queries issued across an AZ boundary, which would commonly happen during failover scenarios.  In fact, we know that this happens during regular operations with some EC2 setups, because the load balancing already goes cross-AZ.  So that costs extra as well.  Now you know!  (Note: we did not factor this into our calculations.)

Calculation Divergence

Our model comes up with identical outcomes for the examples Amazon provided; however, it comes out 10-15% lower than Amazon’s calculator for specific Aurora configurations.  We presume that the difference lies in the calculated Aurora I/O rate, as that’s the only real “unknown” in the model.  Amazon’s calculator does not show which formulae it uses for the sub-components, nor sub-totals, and we didn’t bother to tweak until we got the same result.

It’s curious though, as the architecture talk makes specific claims about Aurora’s I/O efficiency (which presume an optimal Aurora situation and a dismal MySQL reference setup, something which I already raised in our initial Aurora post).  So apparently the Amazon calculator assumes worse I/O performance than the technical architecture talk!

Anyhow, let’s just say our costing is conservative, as the actual cost is higher on the Aurora end.


Here we compare with, say, a MySQL/MariaDB Galera setup across 3 AZs running on EC2+EBS.  While this should be similar in overall availability and read-capacity, note that

  1. you can write to all nodes in a Galera cluster, whereas Aurora currently has a single writer/master;
  2. Galera doesn’t require failover changes as all its nodes are technically writers anyhow, whereas Aurora failover causes a cluster outage of at least 30 seconds.
Servers  R/Zones  Instance   GB   DB I/O rate  Eff. IOPS |  EC2 Instances  EBS Storage  EBS I/O  EC2 Total |  Aurora Instances  Aurora Storage  Aurora I/O  Aurora Total
3        3        r4.xlarge  250  2,000        740       |  $689           $104         $160     $952      |  $1,512            $83             $141        $1,735
6        3        r4.xlarge  250  2,000        740       |  $1,378         $207         $320     $1,905    |  $3,024            $83             $141        $3,247

When using the Amazon calculator, Aurora comes out at about double the EC2 cost.  But don’t take our word for it, do try this for yourself.

Currency Consequences

While pricing figures are distinct for each country that Amazon operates in, the charges are always in USD.  So the indicated pricing is, in the end, in USD, and thus subject to currency fluctuations (if your default currency is not USD).  What does this mean?

USD-AUD rate chart, 2008-2018

So USD 1,000 can cost as little as AUD 906 or as much as AUD 1,653, at different times over the last 10 years.  That’s quite a range!


As shown above, our calculation with Aurora MySQL shows it costing about twice as much.  This is based on a reference MySQL/MariaDB+Galera setup with roughly the same scaling and resilience profile (e.g. the ability to survive DC outages).  In functional terms, particularly with Aurora’s 30+ second outage profile during failover, Galera comes out on top at half the cost.

So when is Aurora cheaper, as claimed by Amazon?

Amazon makes claims in the realm of “1/10th the cost”. Well, that may well be the case when comparing with the TCO of Oracle or MS SQL Server, and it’s fairly typical when comparing a proprietary system with an Open Source based one (mind again that Aurora is not actually Open Source as Amazon does not make their source code available, but it’s based on MySQL).

The only other way we see is to seriously compromise on the availability (resilience).  In our second sample calculation, we use 2 instances per AZ.  This is not primarily for performance, but so that application servers in an AZ don’t have to do cross-DC queries when one instance fails.  In the case of Aurora, spinning up a new instance on the same dataset requires 15 minutes.  So, do you want to take that hit?  If so, you can save money there.  If not, it’s still costly.

But hang on, if you’re willing to make the compromise on availability, you could reduce the Galera setup also, to only one instance per AZ.  Yep!

So, no matter how you tweak it, Aurora is about twice the cost, with (in our opinion) a less interesting failover profile.

The Price of RDS Convenience

What you get with RDS/Aurora is the promise of convenience, and that’s what you pay for.  But mind that our comparison stayed within AWS space anyway: the EC2 instances we used for MySQL/MariaDB+Galera already use the same basic infrastructure, dashboard and management API.  So you pay double just to go to RDS/Aurora, relative to building on EC2.

To us, that cost seems high.  If you spend some, or even all that money on engineering that convenience around your particular setup, and even outsource that task and its maintenance, you get a nicer setup at the same or a lower cost.  And last but not least, that cost will be more predictable – most likely the extra work will be charged in your own currency, too.

Cost Predictability and Budget

You can do a reasonable ball-park calculation of AWS EC2 instances that are always active, but EBS already has some I/O charges which make the actual cost rather more variable, and Aurora adds a few more variables on top of that.  I’m still amazed that companies go for this, even though they traditionally prefer a known fixed cost (even if higher) over a variable cost.  Choosing the variable cost breaks with some fundamental business rules, for the sake of some convenience.

The advantage of known fixed costs is that you can budget properly, as well as project future costs based on growth and other business factors.  Purposefully ditching that realm, while exposing yourself to currency fluctuations at the same time, seems most curious.  How do companies work this into their budgets?  Because others do so?  Well, following the neighbours is not always a good idea.  In this case, it might be costly as well as financially risky.

by Arjen Lentz at March 01, 2018 11:43 PM

February 27, 2018

MariaDB AB

Write Optimizations for Qualcomm Centriq 2400 in MariaDB 10.3.5 Release Candidate

Write Optimizations for Qualcomm Centriq 2400 in MariaDB 10.3.5 Release Candidate david_thompson_g Mon, 02/26/2018 - 22:27

MariaDB has been collaborating with Qualcomm Datacenter Technologies in pushing the performance envelope by leveraging innovative ARM-based hardware architecture with MariaDB’s unique database architecture.  As part of the Qualcomm Centriq™ 2400 product launch back in Nov 2017, we demonstrated the strong read scalability of MariaDB on this chip.  Since then, MariaDB and Qualcomm engineering have been working to improve the scalability of write operations, which we would like to share with the developer community today.

We are pleased to announce a number of performance improvements that are being made available in the recently shipped 10.3 release candidate, 10.3.5.  By leveraging the highly parallelized 48-core Qualcomm Centriq 2400 processor running at 2.6GHz, with 6 memory channels in a fully coherent ring architecture, our goal is to optimize write performance for a single-row write use case in a highly threaded application.

MariaDB uses the sysbench benchmark software to measure performance.  In this blog, we'll examine the following 2 benchmarks using sysbench 1.0:

  • oltp_update_index: simulates updating a single row value by primary key index where a secondary index must be updated as a result of the update.
  • oltp_update_nonindex: simulates updating a single row value by primary key index where there is no secondary index. This obviously requires less work than oltp_update_index.
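
For reference, an invocation along these lines exercises that workload with sysbench 1.0; the host, credentials, table count/size and thread counts are placeholders rather than the exact configuration behind the results below:

sysbench oltp_update_index --mysql-host=127.0.0.1 --mysql-user=sbtest --mysql-password=sbtest \
    --mysql-db=sbtest --tables=8 --table-size=1000000 prepare

for t in 64 128 256 512 ; do
  sysbench oltp_update_index --mysql-host=127.0.0.1 --mysql-user=sbtest --mysql-password=sbtest \
      --mysql-db=sbtest --tables=8 --table-size=1000000 \
      --threads=$t --time=300 --report-interval=10 run
done
# repeat with oltp_update_non_index (the bundled non-indexed update script) for the second workload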

What we see is that as the number of concurrent threads increases, performance is up to 48% faster in 10.3 than in 10.2 on the Centriq™ 2400.




The improvements made remove points of contention and optimize for the ARM64 chipset, specifically:

  • MDEV-15090 : Reduce the overhead of writing undo log records
  • MDEV-15132 : Avoid accessing the TRX_SYS page
  • MDEV-15019 : InnoDB: store ReadView on trx
  • MDEV-14756 : Remove trx_sys_t::rw_trx_list
  • MDEV-14482 : Cache line contention on ut_rnd_ulint_counter()
  • MDEV-15158 : On commit, do not write to the TRX_SYS page
  • MDEV-15104 : Remove trx_sys_t::rw_trx_ids and trx_sys_t::serialisation_list
  • MDEV-14638 : Replace trx_sys_t::rw_trx_set with LF_HASH
  • MDEV-14529 : InnoDB rw-locks: optimize memory barriers
  • MDEV-14374 : UT_DELAY code : Removing hardware barrier for arm64 bit platform
  • MDEV-14505 : Threads_running becomes scalability bottleneck

In summary, what this means is that MariaDB will perform significantly better under high levels of concurrent updates, improving response times in your applications at peak load.

The improvements will also provide benefits on other chip architectures, but a much greater gain is seen on the Centriq™ 2400 due to its design, which supports a much higher thread count. By utilizing physical cores rather than hyper-threading on a lower number of cores, the Centriq™ 2400 demonstrates an additional 13% gain over a comparable reference Broadwell platform.

As Centriq™ 2400 systems come to market this year, we are excited to see customer workloads taking advantage of this scalability, combined with lower power consumption, to run high-scale database workloads.



by david_thompson_g at February 27, 2018 03:27 AM

Open Query Pty Ltd

Keeping Data Secure

We often get asked about data security (how to keep things safe) and about local regulations and certifications regarding it. Our general thoughts on this are as follows:

  1. Government regulations tend to end up becoming part of the risk/cost/benefit equations in a business, which is not particularly comforting for customers.
    • Example: some years ago an Australian bank had a mail server mis-configured to allow relaying (i.e., people could send phishing emails pretending to legitimately originate from that bank).  A caring tech citizen reported the issue to the bank.  Somehow, it ended up with the legal department rather than a system/network administrator.  The legal eagles decided that the risk to the organisation was fairly low, and didn’t forward it for action at that time.  Mind that the network admin would’ve been able to fix up the configuration within minutes.
  2. Appreciate that certifications tend to mainly give you a label to wave in front of a business partner requiring it; they do not make your business more secure.
    • Data leaves footprints.  For instance, some people use a separate email address for each website they interact with.  Thus, when a list of email addresses leaks, saying “it didn’t come from us” won’t hold.  That’s only a simple example, but it illustrates the point.  Blatant denial was never a good policy, but these days it’ll backfire even faster.
  3. Recent legislation around mandatory data retention only makes things worse, as
    • companies tend to already store much more detail about their clients and web visitors than is warranted, and
    • storing more activity data for longer just increases the already enlarged footprint.

So what do we recommend?

  1. Working within the current legal requirements, we still advise keeping as little data as possible.
    • More data does not intrinsically mean more value – while it’s cheap and easy to gather and store more data, if you’re more strategic about what you collect and store, you’ll find there’s much more value in that.
  2. Fundamentally, data that you don’t have can’t be leaked/stolen/accessed through you.  That’s obvious, but still worth noting.
    • Our most critical example of this is credit card details.  You do not want to store credit card details, ever.  Not for any perceived reason.  There are sensible alternatives using tokens provided by your credit card gateway, so that clients’ credit cards never touch your system.  We wrote about this (again) in our post “Your Ecommerce Site and Credit Cards” last year.
      Why?  It’s fairly easy to work out from a site’s frontend behaviour whether it stores credit cards locally, and if it does, you’re much more of a target.  Credit card details provide instant anonymous access to financial resources.  Respect your clients.
  3. More secure online architecture.
    • We’ll do a separate post on this.
  4. If you have a data breach, be sensible and honest about it.
    • If your organisation operates in Australia and falls within its scope (“with an annual turnover of $3 million or more, credit reporting bodies, health service providers, and TFN recipients, among others”), the Notifiable Data Breaches scheme (part of the Australian Privacy Act), which came into force in February 2018, applies to you.

We’re happy to advise and assist.  Ideally, before trouble occurs.  For any online system, that’s a matter of when, not if.
(And, of course, we’re not lawyers.  We’re techies.  You may need both, but never confuse the two!)

by Arjen Lentz at February 27, 2018 01:56 AM

February 26, 2018

MariaDB AB

Keynote Highlights from MariaDB’s M|18 User Conference

Keynote Highlights from MariaDB’s M|18 User Conference MariaDB Team Mon, 02/26/2018 - 18:34

M|18, the second annual MariaDB user conference, is in full swing, with more than 330 companies represented. The action kicked off today with a trio of live-streamed keynotes with themes ranging from the power of change and community to truly massive scale and future-ready technology. Take a look at the highlights:

Dare to Be Different

Michael Howard | MariaDB

MariaDB’s CEO opened the show with the idea of global community as an agent of change – creating momentum, helping solve hard problems and making the future better. 


Howard touched on MariaDB’s aim to make it easier for global enterprises to change and migrate, and announced the release candidate of MariaDB 10.3, which offers Oracle compatibility and more to ensure portability in terms of skill sets, not just code.

Howard recognized that community is a vital part of continual innovation, and the role that MariaDB partners such as Google, Facebook, Intel, ServiceNow and the Development Bank of Singapore are playing in MariaDB’s growth. He highlighted MariaDB’s focus for the future, including:

  • MariaDB cloud and DBaaS offering. The first product to support this endeavor is MariaDB Manager—a visual interface allowing management and deployment of MariaDB. 
  • MariaDB Labs, a new research division that brings together three key concepts: 
    • Machine learning to intelligently determine your needs for storage, compute power and more.
    • Distributed computing with seamless write scalability.
    • Use of new chips, persistent storage, and in-memory processing, thanks to work with Intel. Howard envisions this recharging the conversation with regard to price point on public and private clouds. 

Massive Scale with MariaDB

Tim Yim | ServiceNow

85,000 databases around the world. Inside those, 176 million InnoDB tables, accessed at a rate of 25 billion queries per hour. That’s the scale of ServiceNow’s business with MariaDB.

Is this infrastructure multi-tenant or single-tenant? Neither! ServiceNow works with a new deployment model, “multi-instance deployment,” where each customer gets its own database and its own front-end app tier; there’s no commingling of data at all. This allows for “surgical failover” and scaling – one customer at a time.

Every customer instance is running on bare metal, with hardware shared across the app tier and the database tier, but processes are “containerized.” Every piece of gear is also duplicated across the country, as is every customer app node and database. And each one of those is backed up nightly. That’s a lot going on! How did they achieve this?


All of the server instances to power this system are on MariaDB, and ServiceNow aims for five 9s availability – with hospitals, factories and power stations among ServiceNow’s clients, high availability is critical. But it’s more than that; the stability they get from MariaDB TX is also vital. 

Corporate Banking and Future-Ready Technology

Ng Peng Khim and Joan Tay | DBS (Development Bank of Singapore)

Khim and Tay covered the impressive results of DBS's journey in “forklifting out” Oracle Enterprise, and replacing their institutional, transaction environments with MariaDB. 


By moving to MariaDB, DBS has realized a net savings of $4.1 million from replatforming by removing the need for DB2 and Oracle Enterprise. Here’s a rundown of the impressive achievements in just two years:

  • 700+ MariaDB instances put in place
  • 54% of critical applications running on MariaDB – $70-80 billion in transactions daily
  • 100% increase in automated app releases
  • 10x increase in testing capabilities
  • 7x performance improvement from moving PL/SQL capability from Oracle to MariaDB

In addition, MaxScale, MariaDB’s advanced database proxy, allows DBS to do quick schema changes to handle master/slave data replication, then do live verification—reducing downtime significantly.

Keep up with all the M|18 action! Follow #MARIADBM18 on Twitter.



by MariaDB Team at February 26, 2018 11:34 PM

MariaDB Foundation

MariaDB 10.3.5 and MariaDB Connector/J 2.2.2 and 1.7.2 now available

The MariaDB project is pleased to announce the availability of MariaDB 10.3.5, the first release candidate in the MariaDB 10.3 series, as well as MariaDB Connector/J 2.2.2, the latest stable release in the MariaDB Connector/J 2.2 series, and MariaDB Connector/J 1.7.2, the latest stable release in the MariaDB Connector/J 1.7 series. See the release notes […]

The post MariaDB 10.3.5 and MariaDB Connector/J 2.2.2 and 1.7.2 now available appeared first on

by Ian Gilfillan at February 26, 2018 04:48 PM

MariaDB AB

MariaDB Server 10.3 Release Candidate Now Available

MariaDB Server 10.3 Release Candidate Now Available RalfGebhardt Mon, 02/26/2018 - 10:07

Today, we are releasing MariaDB Server 10.3.5, which is our first Release Candidate for 10.3.

Now that MariaDB Server 10.3 is feature complete, our user and customer base can move from reviewing and testing new features to integrating it into their staging environments: proving it against existing applications, and enhancing those applications with new use cases enabled by the abundance of new and exciting features in MariaDB Server, like temporal data processing. Another strong focus of this version is database compatibility, where we have added several new features. With MariaDB Server 10.3, it’s even easier to migrate from legacy database systems to open source MariaDB.

Try MariaDB Server 10.3.5 Release Candidate! There are significant features to test out in MariaDB Server 10.3, including:

  • Temporal Data Processing

  • Database Compatibility Enhancements

  • User Flexibility

    • User Defined Aggregate Functions: In addition to creating SQL functions it is now also possible to create aggregate functions

    • Lifted limitations for updates and deletes: A DELETE statement can now delete from a table that is also used in the WHERE clause, and UPDATE can use the same table as source and target (see the sketch after this list)

  • Performance/Storage Enhancements

  • Storage Engine Enhancements

    • Spider Storage Engine: The partitioning storage engine has been updated to the newest release of the Spider Storage engine to support new Spider features including direct join support, direct update and delete, direct aggregates

  • Proxy Layer Support for MariaDB Server: Client/server authentication via a proxy like MariaDB MaxScale, using server proxy protocol support
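
As a quick, hedged sketch of two of the features above (system-versioned tables for temporal data processing, and the lifted DELETE limitation); the database and table names are made up for illustration:

mysql -e "
  CREATE DATABASE IF NOT EXISTS demo;
  USE demo;
  -- temporal data processing: a system-versioned table keeps row history
  CREATE TABLE prices (item VARCHAR(20), price DECIMAL(8,2)) WITH SYSTEM VERSIONING;
  INSERT INTO prices VALUES ('widget', 1.00), ('gadget', 4.00);
  UPDATE prices SET price = 1.25 WHERE item = 'widget';
  SELECT * FROM prices FOR SYSTEM_TIME ALL;  -- current and historical row versions
  -- lifted limitation: DELETE from a table that is also referenced in the WHERE clause
  DELETE FROM prices WHERE price > (SELECT AVG(price) FROM prices);
"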

Download MariaDB Server 10.3 RC

Release Notes




by RalfGebhardt at February 26, 2018 03:07 PM

February 24, 2018

MariaDB Foundation

2018 MariaDB Developers Unconference New York Presentations

The 2018 MariaDB Developers UnConference is being held in New York City on February 24 and February 25. Below are a list of the sessions with links to slides where available. This post will be updated as slides become available. Day One * Welcome (Otto Kekäläinen) * New Developers Tutorial and Best Practices (Vicențiu Ciorbaru) […]

The post 2018 MariaDB Developers Unconference New York Presentations appeared first on

by Ian Gilfillan at February 24, 2018 08:33 PM

MariaDB AB

New MariaDB AX Release Featuring MariaDB ColumnStore 1.1.3 GA

New MariaDB AX Release Featuring MariaDB ColumnStore 1.1.3 GA Dipti Joshi Fri, 02/23/2018 - 22:58

We are happy to announce a new MariaDB AX release featuring MariaDB ColumnStore 1.1.3. This is the largest MariaDB ColumnStore maintenance release to date – the streaming data adapters are now GA, a Spark adapter has been introduced (beta), package repositories are now available and there are a large number of fixes.

Notable changes in MariaDB ColumnStore 1.1.3 include:


Additional Resources:


For any questions, please email me at




by Dipti Joshi at February 24, 2018 03:58 AM

February 23, 2018

Peter Zaitsev

Webinar Tuesday February 27, 2018: Monitoring Amazon RDS with Percona Monitoring and Management (PMM)


Please join Percona’s Build / Release Engineer, Mykola Marzhan, as he presents Monitoring Amazon RDS with Percona Monitoring and Management on February 27, 2018, at 7:00 am PST (UTC-8) / 10:00 am EST (UTC-5).

Are you concerned about how you are monitoring your AWS environment? Keeping track of what is happening in your Amazon RDS deployment is key to guaranteeing the performance and availability of your database for your critical applications and services.

Did you know that Percona Monitoring and Management (PMM) ships with support for MySQL on Amazon RDS and Amazon Aurora out of the box? It does!

Percona Monitoring and Management (PMM) is a free and open-source platform for managing and monitoring MySQL, Percona Server for MySQL, MariaDB, MongoDB and Percona Server for MongoDB performance, both on-premises and in the cloud.

In this session we’ll discuss:

  • Configuring PMM (metrics and queries) against Amazon RDS MySQL and Amazon Aurora using an EC2 instance
  • Configuring PMM against CloudWatch metrics
  • Setting configuration parameters for AWS for maximum PMM visibility

Register for the webinar now.

Mykola Marzhan, Release Engineer

Mykola joined Percona in 2016 as a release engineer. He has been developing monitoring systems since 2004, and has been working as a Release Engineer/Release Manager/DevOps engineer for ten years. Recently, Mykola achieved the AWS Certified Solutions Architect (Professional) certification.


by Mykola Marzhan at February 23, 2018 10:15 PM

This Week in Data with Colin Charles 29: Percona Live Full Schedule, MariaDB Events, and a Matter of Compatibility


Join Percona Chief Evangelist Colin Charles as he covers happenings, gives pointers and provides musings on the open source database community.

I think the biggest news from Percona-land is that besides the tutorial grid, the schedules for day 1 and day 2 are live! Also notice the many “sub-themes”: a 101 track, using MySQL, MongoDB, cloud, MySQL database software, PostgreSQL, containers & automation, monitoring & ops, and misc. database stuff. Learn from 107 different experts (this number is likely to increase). So register now.

This coming week, Peter Zaitsev, Tom Basil, and I will be in New York. Some of us will be at the MariaDB 2018 Developers Unconference, and all of us will be attending M|18. We have a schedule for the Developers Unconference, and I hope you find time on Sunday to join us as I present MySQL features missing in MariaDB between 12:15-13:00. Being an unconference, it shouldn’t just be a presentation, but also an active discussion. I recall that during the FOSDEM MySQL DevRoom, MariaDB Foundation developer Vicentiu Ciorbaru assigned himself support for the super read-only feature (see tweet).

If you have thoughts of what you like in MySQL but are missing from MariaDB Server, please don’t hesitate to tweet at me @bytebot, or even drop me an email: I will happily change and add to the slides until Sunday morning, Eastern Standard Time.

Why is this important? Quite simply, take a look at Todd Farmer’s blog post: Bitten by MariaDB 10.2 Incompatible Change. Here’s Cloudera Manager failing, on specific minor versions of software since the behavior changed (so this particular issue occurs in 10.2.8+ but not before!). I’d definitely spend some time reading the comments as well as the associated Jira. Maybe with 10.3/10.4, it’s time to stop calling it a “drop-in replacement” (an initial goal when I worked on MariaDB Server), and just call it something else. Maybe something for the new Chief Marketing Officer to think about?


Link List

Upcoming appearances

  • SCALE16x – Pasadena, California, USA – March 8-11 2018
  • FOSSASIA 2018 – Singapore – March 22-25 2018


I look forward to feedback/tips via e-mail at or on Twitter @bytebot.

by Colin Charles at February 23, 2018 05:04 PM

Jean-Jerome Schmidt

Updated: ClusterControl Tips & Tricks: MySQL Query Performance Tuning

Bad query performance is the most common problem DBAs have to deal with. There are numerous ways to collect, process and analyze data related to query performance - we’ve covered one of the most popular tools, pt-query-digest, in some of our previous blog posts:

Become a MySQL DBA blog series

When you use ClusterControl though, this is not always needed. You can use the data available in ClusterControl to solve your problem. In this blog post, we’ll look into how ClusterControl can help you solve problems related to query performance.

It may happen that a query cannot complete in a timely manner. The query may be stuck due to some locking issues, it may be suboptimal or not indexed properly, or it may be too heavy to complete in a reasonable amount of time. Keep in mind that a couple of non-indexed joins can easily scan billions of rows if you have a large production database. Whatever happened, the query is probably using some of the resources - be it CPU or I/O for a non-optimized query, or even just row locks. Those resources are also required by other queries, and it may seriously slow things down. One very simple yet important task is to pinpoint the offending query and stop it.

It is pretty easily done from the ClusterControl interface. Go to the Query Monitor tab -> Running Queries section - you should see an output similar to the screenshot below.

As you can see, we have a pile of queries stuck. Usually the offending query is the one that takes a long time; you might want to kill it. You may also want to investigate it further to make sure you pick the correct one. In our case, we clearly see a SELECT … FOR UPDATE which joins a couple of tables and which is in the ‘Sending data’ state, meaning it has been processing data for the last 90 seconds.
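
If you are not in front of the ClusterControl UI, the same investigation can be done from the MySQL client; the 60-second threshold and the thread id below are placeholders:

mysql -e "SELECT id, user, db, time, state, LEFT(info, 80) AS query
          FROM information_schema.processlist
          WHERE command <> 'Sleep' AND time > 60
          ORDER BY time DESC;"

# once you are sure you picked the right query:
mysql -e "KILL QUERY 12345;"    # stops the statement, keeps the connection
# or: mysql -e "KILL 12345;"    # terminates the whole connection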

Another type of question a DBA may need to answer is: which queries take the most time to execute? This is a common question, as such queries may be low hanging fruit - they may be optimizable, and the more execution time a given query is responsible for in the whole query mix, the larger the gain from its optimization. It is a simple equation - if a query is responsible for 50% of total execution time, making it 10x faster gives a much better result than optimizing a query which is responsible for just 1% of the total execution time.

ClusterControl can help you answer such questions, but first we need to ensure the Query Monitor is enabled. You can toggle the Query Monitor to ON under the Query Monitor page. Furthermore you can configure the "Long Query Time" and "Log queries not using indexes" option under Settings to suit your workload:

The Query Monitor in ClusterControl works in two modes, depending on whether you have the Performance Schema available with the required data on the running queries or not. If it is available (and this is true by default in MySQL 5.6 and newer), Performance Schema will be used to collect query data, minimizing the impact on the system. Otherwise, the slow query log will be used and all of the settings visible in the above screenshot are used. Those are pretty well explained in the UI, so there’s no need to do it here. When the Query Monitor uses Performance Schema, those settings are not used (except for toggling ON/OFF the Query Monitor to enable/disable data collection).
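
For reference, when the slow query log mode is in use, these are the underlying server variables being managed (the values here are examples only):

mysql -e "
  SET GLOBAL slow_query_log = ON;
  SET GLOBAL long_query_time = 0.5;               -- the 'Long Query Time' setting, in seconds
  SET GLOBAL log_queries_not_using_indexes = ON;  -- the 'Log queries not using indexes' setting
"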

When you confirmed that the Query Monitor is enabled in ClusterControl, you can go to Query Monitor -> Top Queries, where you’ll be presented with a screen similar to the below:

What you can see here is a list of the most expensive queries (in terms of execution time) that hit our cluster. Each of them has some further details - how many times it was executed, how many rows were examined or sent to the client, how execution time varied, how much time the cluster spent on executing a given type of query. Queries are grouped by query type and schema.

You may be surprised to find out that the main place where execution time is spent is a ‘COMMIT’ query. Actually, this is fairly typical for quick OLTP queries executed on Galera cluster. Committing a transaction is an expensive process because certification has to happen. This leads to COMMIT being one of the most time-consuming queries in the query mix.

When you click on a query, you can see the full query, maximum execution time, number of occurrences, some general optimization hints and an EXPLAIN output for it - pretty useful to identify if something’s wrong with it. In our example we’ve checked a SELECT … FOR UPDATE with a high number of rows examined. As expected, this query is an example of terrible SQL - a JOIN which does not use any index. You can see on the EXPLAIN output that no index is used, not a single one was even considered possible to use. No wonder this query seriously impacted the performance of our cluster.

Another way to get some insight into query performance is to look at Query Monitor -> Query Outliers. This is basically a list of queries whose performance significantly differs from their average.

As you can see in the above screenshot, the second query took 0.01116s where the average execution time for that query is much lower (0.000142s). We also have some additional statistical info on standard deviation and maximum query execution time. Such a list of queries may not seem very useful - but that’s not really true. When you see a query on this list, it means that something was different from the usual - the query did not complete in its regular time. It may be an indication of some performance issues on your system, and a signal that you should investigate other metrics and check if anything else happened at that time.

People tend to focus on achieving max performance, forgetting that it is not enough to have high throughput - it also has to be consistent. Users like performance to be stable - you may be able to squeeze more transactions per second from your system but if it means that some transactions will start to stall for seconds, that’s not worth it. Looking at the Query Histogram in ClusterControl helps you identify such consistency issues in your query mix.

Happy query monitoring!

PS.: To get started with ClusterControl, click here!

by ashraf at February 23, 2018 09:01 AM

Peter Zaitsev

How to Restore MySQL Logical Backup at Maximum Speed


The ability to restore MySQL logical backups is a significant part of disaster recovery procedures. It’s a last line of defense.

Even if you lost all data from a production server, physical backups (data file snapshots created with an offline copy or with Percona XtraBackup) could show the same internal database structure corruption as the production data. Backups in a simple plain text format allow you to avoid such corruption and migrate between database formats (e.g., during a software upgrade or downgrade), or even help with migration from a completely different database solution.

Unfortunately, the restore speed for logical backups is usually bad, and for a big database it could require days or even weeks to get data back. Thus it’s important to tune backups and MySQL for the fastest data restore and change settings back before production operations.


All results are specific to my combination of hardware and dataset, but could be used as an illustration for MySQL database tuning procedures related to logical backup restore.


There is no general advice for tuning a MySQL database for a bulk logical backup load, and any parameter should be verified with a test on your hardware and database. In this article, we will explore some variables that help that process. To illustrate the tuning procedure, I’ve downloaded IMDB CSV files and created a MySQL database with pyimdb.

You may repeat the whole benchmark procedure, or just look at settings changed and resulting times.


  • 16GB – InnoDB database size
  • 6.6GB – uncompressed mysqldump sql
  • 5.8GB – uncompressed CSV + create table statements.

The simplest restore procedure for logical backups created by the mysqldump tool:

mysql -e 'create database imdb;'
time mysql imdb < imdb.sql
# real 129m51.389s

This requires slightly more than two hours to restore the backup into the MySQL instance started with default settings.

I’m using the Docker image percona:latest – it contains Percona Server 5.7.20-19 running on a laptop with 16GB RAM, Intel(R) Core(TM) i7-7700HQ CPU @ 2.80GHz, two disks: SSD KINGSTON RBU-SNS and HDD HGST HTS721010A9.

Let’s start with some “good” settings: a buffer pool bigger than default, 2x1GB transaction log files, disabled sync (because we are using a slow HDD), and big values for IO capacity. The load should be faster with big batches, so use 1GB for max_allowed_packet.

Values were chosen to be bigger than the default MySQL parameters because I’m trying to see the difference between the usually suggested values (like 80% of RAM should belong to InnoDB buffer pool).

docker run --publish-all --name p57 -it -e MYSQL_ALLOW_EMPTY_PASSWORD=1 percona:5.7
  time (mysql --max_allowed_packet=1G imdb1 < imdb.sql )
  # real 59m34.252s

The load is IO bounded, and there is no reaction on set global foreign_key_checks=0 and unique_checks=0 because these variables are already disabled in the dump file.

How can we reduce IO?

Disable InnoDB double write: --innodb_doublewrite=0

time (mysql --max_allowed_packet=1G imdb1 < imdb.sql )
# real 44m49.963s

A huge improvement, but we still have an IO-bounded load.

We will not be able to improve load time significantly for IO bounded load. Let’s move to SSD:

time (mysql --max_allowed_packet=1G imdb1 < imdb.sql )
# real 33m36.975s

Is it vital to disable disk sync for the InnoDB transaction log?

sudo rm -rf mysql/*
docker rm p57
docker run -v /home/ihanick/Private/Src/tmp/data-movies/imdb.sql:/root/imdb.sql -v /home/ihanick/Private/Src/tmp/data-movies/mysql:/var/lib/mysql \
--name p57 -it -e MYSQL_ALLOW_EMPTY_PASSWORD=1 percona:5.7
# real 33m49.724s

There is no significant difference.

By default, mysqldump produces SQL data, but it could also save data to CSV format:

cd /var/lib/mysql-files
mkdir imdb
chown mysql:mysql imdb/
time mysqldump --max_allowed_packet=128M --tab /var/lib/mysql-files/imdb imdb1
# real 1m45.983s
sudo rm -rf mysql/*
docker rm p57
docker run -v /srv/ihanick/tmp/imdb:/var/lib/mysql-files/imdb -v /home/ihanick/Private/Src/tmp/data-movies/mysql:/var/lib/mysql \
--name p57 -it -e MYSQL_ALLOW_EMPTY_PASSWORD=1 percona:5.7
time (
mysql -e 'drop database imdb1;create database imdb1;set global FOREIGN_KEY_CHECKS=0;'
(echo "SET FOREIGN_KEY_CHECKS=0;";cat *.sql) | mysql imdb1 ;
for i in $PWD/*.txt ; do mysqlimport imdb1 $i ; done
)
# real 21m56.049s
1.5X faster, just because of changing the format from SQL to CSV!

We’re still using only one CPU core, let’s improve the load with the --use-threads=4 option:

time (
mysql -e 'drop database if exists imdb1;create database imdb1;set global FOREIGN_KEY_CHECKS=0;'
(echo "SET FOREIGN_KEY_CHECKS=0;";cat *.sql) | mysql imdb1
mysqlimport --use-threads=4 imdb1 $PWD/*.txt
)
# real 15m38.147s

In the end, the load is still not fully parallel due to a big table: all other tables are loaded, but one thread is still active.

Let’s split CSV files into smaller ones. For example, 100k rows in each file and load with GNU/parallel:

# /var/lib/mysql-files/imdb/
apt-get update ; apt-get install -y parallel
cd /var/lib/mysql-files/imdb
time (
mkdir -p split1 ; cd split1
for i in ../*.txt ; do echo $i ; split -a 6 -l 100000 -- $i `basename $i .txt`. ; done
for i in `ls *.*|sed 's/^[^.]*\.//'|sort -u` ; do
mkdir ../split-$i
for j in *.$i ; do mv $j ../split-$i/${j/$i/txt} ; done
done
)
# real 2m26.566s
time (
mysql -e 'drop database if exists imdb1;create database imdb1;set global FOREIGN_KEY_CHECKS=0;'
(echo "SET FOREIGN_KEY_CHECKS=0;";cat *.sql) | mysql imdb1
parallel 'mysqlimport imdb1 /var/lib/mysql-files/imdb/{}/*.txt' ::: split-*
)
#real 16m50.314s

Split is not free, but you can split your dump files right after backup.

The load is parallel now, but the single big table strikes back with ‘setting auto-inc lock’ in SHOW ENGINE INNODB STATUS\G

Using the --innodb_autoinc_lock_mode=2 option fixes this issue: 16m2.567s.

We got slightly better results with just mysqlimport --use-threads=4. Let’s check if hyperthreading helps and if the problem is caused by the “parallel” tool:

  • Using four parallel jobs for load: 17m3.662s
  • Using four parallel jobs for load and two threads: 16m4.218s

There is no difference between GNU/Parallel and --use-threads option of mysqlimport.

Why 100k rows? With 500k rows: 15m33.258s

Now we have performance better than for mysqlimport --use-threads=4.

How about 1M rows at once? Just 16m52.357s.

I see periodic “flushing logs” messages, so let’s use bigger transaction logs (2x4GB): 12m18.160s:

--innodb_buffer_pool_size=4GB --innodb_log_file_size=4G --skip-log-bin --innodb_flush_log_at_trx_commit=0 --innodb_io_capacity=700 --innodb_io_capacity_max=1500 --max_allowed_packet=1G --innodb_doublewrite=0 --innodb_autoinc_lock_mode=2 --performance-schema=0

Let’s compare the numbers with myloader 0.6.1, also running with four threads (myloader has only a -d parameter; the myloader execution time is listed under the corresponding mydumper command):

# oversized statement size to get 0.5M rows in one statement, single statement per chunk file
mydumper -B imdb1 --no-locks --rows 500000 --statement-size 536870912 -o 500kRows512MBstatement
mydumper -B imdb1 --no-locks -o default_options
mydumper -B imdb1 --no-locks --chunk-filesize 128 -o chunk128MB
mydumper -B imdb1 --no-locks --chunk-filesize 64 -o chunk64MB

It would be great to test mydumper with CSV format but, unfortunately, that hasn’t been implemented in the last 1.5 years.

Returning to the parallel CSV file load, with even bigger transaction logs (2x8GB): 11m15.132s.

What about a bigger buffer pool: --innodb_buffer_pool_size=12G? 9m41.519s

Let’s check six-year-old server-grade hardware: Intel(R) Xeon(R) CPU E5-2430 with SAS raid (used only for single SQL file restore test) and NVMe (Intel Corporation PCIe Data Center SSD, used for all other tests).

I’m using similar options as for previous tests, with 100k rows split for CSV files load:

--innodb_buffer_pool_size=8GB --innodb_log_file_size=8G --skip-log-bin --innodb_flush_log_at_trx_commit=0 --innodb_io_capacity=700 --innodb_io_capacity_max=1500 --max_allowed_packet=1G --innodb_doublewrite=0 --innodb_autoinc_lock_mode=2

  • Single SQL file created by mysqldump loaded for 117m29.062s = 2x slower.
  • 24 parallel processes of mysqlimport: 11m51.718s
  • Again, hyperthreading makes a huge difference! 12 parallel jobs: 18m3.699s.
  • Due to higher concurrency, adaptive hash index is a reason for locking contention. After disabling it with --skip-innodb_adaptive_hash_index: 10m52.788s.
  • In many places, disabling unique checks is referred to as a performance booster: 10m52.489s
    You can spend more time reading advice about unique_checks, but it might help for some databases with many unique indexes (in addition to the primary one).
  • The buffer pool is smaller than the dataset; can you change the old/new pages split to make inserts faster? No: --innodb_old_blocks_pct=5: 10m59.517s.
  • O_DIRECT is also recommended: --innodb_flush_method=O_DIRECT: 11m1.742s.
  • O_DIRECT is not able to improve performance by itself, but if you can use a bigger buffer pool: O_DIRECT + 30% bigger buffer pool: --innodb_buffer_pool_size=11G: 10m46.716s.


  • There is no common solution to improve logical backup restore procedure.
  • If you have IO-bounded restore: disable InnoDB double write. It’s safe because even if the database crashes during restore, you can restart the operation.
  • Do not use SQL dumps for databases > 5-10GB. CSV files are much faster for mysqldump+mysql. Implement mysqldump --tab + mysqlimport, or use mydumper/myloader with an appropriate chunk-filesize.
  • The number of rows per load data infile batch is important. Usually 100K-1M, use binary search (2-3 iterations) to find a good value for your dataset.
  • InnoDB log file size and buffer pool size are really important options for backup restore performance.
  • O_DIRECT reduces insert speed, but it’s good if you can increase the buffer pool size.
  • If you have enough RAM or SSD, the restore procedure is limited by CPU. Use a faster CPU (higher frequency, turboboost).
  • Hyperthreading also counts.
  • A powerful server could be slower than your laptop (12×2.4GHz vs. 4×2.8+turboboost).
  • Even with modern hardware, it’s hard to expect backup restore faster than 50MBps (for the final size of InnoDB database).
  • You can find a lot of different advice on how to improve backup load speed. Unfortunately, it’s not possible to implement improvements blindly, and you should know the limits of your system with general Unix performance tools like vmstat, iostat and various MySQL commands like SHOW ENGINE INNODB STATUS (all can be collected together with pt-stalk).
  • Percona Monitoring and Management (PMM) also provides good graphs, but you should be careful with QAN: full slow query log during logical database dump restore can cause significant processing load.
  • Default MySQL settings could cost you a 10x backup restore slowdown.
  • This benchmark is aimed at speeding up the restore procedure while the application is not running and the server is not used in production. Make sure that you have reverted all configuration parameters back to production values after the load. For example, if you disable the InnoDB double write buffer during restore and leave it enabled in production, you may see scary data corruption due to partial InnoDB page writes.
  • If the application is running during restore, in most cases you will get an inconsistent database due to missing support for locking or correct transactions for restore methods (discussed above).

by Nickolay Ihalainen at February 23, 2018 12:35 AM

February 22, 2018

MariaDB AB

Streaming Live: M|18 User Conference Keynotes

Streaming Live: M|18 User Conference Keynotes MariaDB Team Thu, 02/22/2018 - 18:39

MariaDB’s sold-out user conference, M|18, kicks off on Monday! It’ll be two jam-packed days of learning and networking in New York City. Not able to join us in person? You can still get a peek at the action – we’re live-streaming the opening-day keynote presentations on Monday, February 26.
MariaDB’s CEO, Michael Howard, will deliver a welcome keynote to provide an inside look at what MariaDB is working on. Then two exceptional MariaDB customers – ServiceNow and DBS Bank – will share their unique journeys with MariaDB and open source.

The stream goes live at 1:15 PM ET / 7:15 PM CET. Want a reminder when it happens?

Sign Up to Be Notified

While you await the start of M|18 action, you can read about how DBS Bank got started with MariaDB. DBS’s keynote will pick up where that story left off.
Craving more? Follow #MARIADBM18 on Twitter for ongoing M|18 coverage.



by MariaDB Team at February 22, 2018 11:39 PM

MariaDB MaxScale 2.2: Introducing Failover, Switchover and Automatic Rejoin

MariaDB MaxScale 2.2: Introducing Failover, Switchover and Automatic Rejoin Esa Korhonen Thu, 02/22/2018 - 18:16

Failure tolerance and recoverability are essential for a high availability (HA) database setup. Although modern systems are quite reliable, hardware errors or software bugs (not necessarily in the database itself) can bring a system down. MariaDB HA setups use master-slave replication to copy the data to multiple servers, which may be located in different datacenters. Should the master server fail, the application can be directed to use one of the slave servers. This operation either requires manual intervention from a DBA or a custom automated script. Depending on the time of day and personnel, manual operation may be slow. Custom scripts may lack testing and flexibility. Clearly, recovery should be automatic, thoroughly tested and preferably included in existing database scalability software.

To answer this demand, MariaDB MaxScale 2.2.2 adds the following master-slave replication cluster management features:

  • Failover: replace a failed master with the most up-to-date slave
  • Switchover: swap the running master with a designated slave
  • Rejoin: rejoin a standalone server to the cluster as a slave

MariaDB MaxScale is an advanced database proxy for MariaDB database servers. It sits between client applications and the database servers, routing client queries and server responses. MaxScale also monitors the servers, so it will quickly notice any changes in server status or replication topology. This makes MaxScale a natural choice for controlling failover and similar features.

Failover for the master-slave cluster can and often should be set to activate automatically. Switchover must be activated manually through MaxAdmin, MaxCtrl or the REST interface. Rejoin can be set to automatic or activated manually. These features are implemented in the mariadbmonitor-module. This module replaces the old mysqlmonitor (MaxScale is still backwards compatible with the old name). All three operations require GTID-based replication and are intended for simple single-master replication topologies. Additionally, failover and switchover expect the topology to be one-layer deep. The cluster master may be replicating from an external master, in which case a promoted master server is instructed to replicate from the external master.
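
As a taste of the manual interface, a switchover can be triggered with something along the lines below, using the monitor and server names from this post’s example setup (cluster-monitor, box02, box03, introduced further down); check the MaxScale 2.2 documentation for the exact argument order:

#: promote box03 to master and demote the current master box02
maxadmin call command mariadbmon switchover cluster-monitor box03 box02

#: failover and rejoin have corresponding module commands, e.g.
#: maxadmin call command mariadbmon failover cluster-monitor
#: maxadmin call command mariadbmon rejoin cluster-monitor box02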

In this blog post, we present an example setup and experiment with the new features. The database setup for this example is:

  • One VM for MariaDB MaxScale 2.2.2
  • One VM for the master MariaDB Server
  • One VM for the slave MariaDB Server 


[root@box01 ~]# maxadmin list servers
Server             | Address         | Port  | Connections | Status
box02              |   |  3306 |           0 | Master, Running
box03              |   |  3306 |           0 | Slave, Running

Here is the vagrantfile used for the examples of this blog:

# -*- mode: ruby -*-
# vi: set ft=ruby :

# Vagrantfile API/syntax version. Don't touch unless you know what you're doing!
VAGRANTFILE_API_VERSION = "2"

Vagrant.configure(VAGRANTFILE_API_VERSION) do |config|

 #: adding ansible stuff
 config.ssh.insert_key = false
 #: NOTE: fill in the private_network ip values below for your environment

 #: maxscale box
 config.vm.define "box01" do |box01|
   box01.vm.hostname = "box01"
   box01.vm.box = "centos7.2_x86_64"
   box01.vm.network "private_network", ip: "", virtualbox__intnet: "XY"
 end

 ######: MASTER / SLAVE SERVERS :######
 #: master, async && semisync replication
 config.vm.define "box02" do |box02|
   box02.vm.hostname = "box02"
   box02.vm.box = "centos7.2_x86_64"
   box02.vm.network "private_network", ip: "", virtualbox__intnet: "XY"
 end

 #: slave01, async && semisync replication
 config.vm.define "box03" do |box03|
   box03.vm.hostname = "box03"
   box03.vm.box = "centos7.2_x86_64"
   box03.vm.network "private_network", ip: "", virtualbox__intnet: "XY"
 end
end

Setting up MariaDB MaxScale 2.2.2

In these examples we are running CentOS 7.2. If you are running a Debian-based Linux distribution, check the MariaDB downloads page for the best MaxScale package for your system. After downloading, install the packages and configure MaxScale as presented below.

#: packages you want to download
[root@box01 ~]# ls -lhS
total 15M
-rw-r--r--  1 root root 7.8M Jan 10 20:44 maxscale-client-2.2.2-1.centos.7.x86_64.rpm
-rw-r--r--  1 root root 7.0M Jan 10 20:44 maxscale-2.2.2-1.centos.7.x86_64.rpm

#: install the packages
[root@box01 ~]# rpm -ivh *.rpm
Preparing...                          ################################# [100%]
Updating / installing...
  1:maxscale-client-2.2.2-1          ################################# [ 50%]
  2:maxscale-2.2.2-1                 ################################# [100%]

#: checking the version
[root@box01 ~]# maxscale --version-full
MaxScale 2.2.2 - eda82881619388a3512d6cfcbcf9ad83ea930339

#: basic configuration - /etc/maxscale.cnf



[CLI Unix Listener]

[CLI Inet Listener]

Above, the password for the service user is encrypted. An encrypted password can be generated with the maxkeys and maxpasswd utilities. For more information, check maxkeys/maxpasswd. Once configuration is complete, start MaxScale:
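
For reference, a minimal /etc/maxscale.cnf for this kind of setup might look like the sketch below. The threads setting, router choice and encrypted password are illustrative placeholders rather than the exact values used in this walkthrough, and parameter names can vary slightly between MaxScale versions.

[maxscale]
threads=auto

[rwsplit-service]
type=service
router=readwritesplit
user=maxuser
password=ACEEF153D52F8391E3218F9F2B259EAD

[CLI]
type=service
router=cli

[CLI Unix Listener]
type=listener
service=CLI
protocol=maxscaled
socket=default

[CLI Inet Listener]
type=listener
service=CLI
protocol=maxscaled
address=localhost
port=6603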

[root@box01 ~]# systemctl enable maxscale.service
Created symlink from /etc/systemd/system/ to /usr/lib/systemd/system/maxscale.service.
[root@box01 ~]# systemctl is-enabled maxscale.service
[root@box01 ~]# systemctl start maxscale.service
[root@box01 ~]# systemctl status maxscale.service
● maxscale.service - MariaDB MaxScale Database Proxy
  Loaded: loaded (/usr/lib/systemd/system/maxscale.service; enabled; vendor preset: disabled)
  Active: active (running) since Fri 2018-01-12 00:24:21 GMT; 5s ago
 Process: 4035 ExecStart=/usr/bin/maxscale (code=exited, status=0/SUCCESS)
 Process: 4032 ExecStartPre=/usr/bin/install -d /var/run/maxscale -o maxscale -g maxscale (code=exited, status=0/SUCCESS)
Main PID: 4038 (maxscale)
  CGroup: /system.slice/maxscale.service
          └─4038 /usr/bin/maxscale

The following script demonstrates MaxScale’s runtime configuration management. These items could have been added to the configuration file instead. The commands generate a server cluster monitor within MaxScale and set it up for automatic cluster management. The individual parameters set here are presented in the next section.


#: creating the monitor
maxadmin create monitor cluster-monitor mariadbmon

#: adding more features for the MariaDBMon monitor
maxadmin alter monitor cluster-monitor user=maxuser password=ACEEF153D52F8391E3218F9F2B259EAD monitor_interval=1000 replication_user=mariadb replication_password=ACEEF153D52F8391E3218F9F2B259EAD failcount=5 auto_failover=true auto_rejoin=true

#: restarting the monitor
maxadmin restart monitor cluster-monitor

#: creating the service listener
maxadmin create listener rwsplit-service rwsplit-listener 53310

#: creating and adding the servers
maxadmin create server prod_mariadb01 3306
maxadmin create server prod_mariadb02 3306
maxadmin add server prod_mariadb01 cluster-monitor rwsplit-service
maxadmin add server prod_mariadb02 cluster-monitor rwsplit-service

Before executing the script above, you should create the users maxuser and mariadb (or whichever usernames your script uses) on the backend servers. Again, their encrypted passwords for the script should be generated with maxpasswd.
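
As an illustration, the backend users could be created with statements like the following sketch. The host pattern '%' and the plain-text passwords are placeholders, and the exact privilege list required by the monitor user depends on your MaxScale version, so check the MariaDB MaxScale documentation before applying it.

-- monitor/service user referenced by user/password in the monitor settings
CREATE USER 'maxuser'@'%' IDENTIFIED BY 'maxuser-password';
GRANT SHOW DATABASES, REPLICATION CLIENT, PROCESS, SUPER, RELOAD ON *.* TO 'maxuser'@'%';
GRANT SELECT ON mysql.* TO 'maxuser'@'%';

-- replication user referenced by replication_user/replication_password
CREATE USER 'mariadb'@'%' IDENTIFIED BY 'mariadb-password';
GRANT REPLICATION SLAVE ON *.* TO 'mariadb'@'%';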

#: script execution output

[root@box01 ~]# ./
Created monitor 'cluster-monitor'
Listener 'rwsplit-listener' created
Created server 'prod_mariadb01'
Created server 'prod_mariadb02'
Added server 'prod_mariadb01' to 'cluster-monitor'
Added server 'prod_mariadb01' to 'rwsplit-service'
Added server 'prod_mariadb02' to 'cluster-monitor'
Added server 'prod_mariadb02' to 'rwsplit-service'

The monitor is now running. To check the status of MaxScale, execute the following:

#: listing servers after creating the configurations

[root@box01 ~]# maxadmin list servers
Server             | Address         | Port  | Connections | Status
prod_mariadb01     |   |  3306 |           0 | Master, Running
prod_mariadb02     |   |  3306 |           0 | Slave, Running

#: listing monitors

[root@box01 ~]# maxadmin list monitors
Monitor              | Status
cluster-monitor      | Running

Finally, you may check the listener and the open port:

[root@box01 ~]# maxadmin list listeners | grep rwsplit-listener
Rwsplit-listener | rwsplit-service | MariaDBClient | | 53310 | Running 

[root@box01 ~]# netstat -l | grep 53310
tcp        0      0 *               LISTEN

Monitor configuration parameters

The following parameters enable and control the cluster management features.

  • replication_user and replication_password: These are the username and password used by MariaDB Monitor when generating a CHANGE MASTER TO command.

  • auto_failover: Enables automatic failover. Failover can also be activated manually regardless of this setting (see the example command after this list).

  • failover_timeout: Time limit (in seconds) for executing a failover, measured from the moment failover (automatic or manual) is activated. If time runs out, an event is logged, and automatic failover is disabled. Typically, the timeout is only reached if the selected new master server cannot consume its relay log quickly enough.

  • auto_rejoin: Enable automatic rejoin. When enabled, two types of servers are set to replicate from the current cluster master:

    • Standalone servers (no slave thread)
    • Any server replicating from (or attempting to replicate from) a server which is not the cluster master server.
  • failcount: How many times (during different monitoring passes) a server must fail to respond to the status query before it is declared down and an automatic failover may be triggered (if enabled).

  • verify_master_failure: This enables an additional criterion for triggering an automatic failover. The monitor looks at the master binlog file positions of the slave servers; if they have advanced within the configured timeout, failover is not activated even if the monitor cannot connect to the master. This means that at least one slave is still receiving events even though MaxScale cannot connect to the master.

  • master_failure_timeout: The timeout for verify_master_failure.

  • switchover_timeout: Similar to failover_timeout, just for switchover.
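
Manual failover (useful when auto_failover is disabled, or after it has been automatically disabled following an error) goes through the module commands of mariadbmon. A sketch of the call, assuming the monitor name used in this post, is:

#: manually trigger failover for the monitor named cluster-monitor
maxadmin call command mariadbmon failover cluster-monitor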

An example configuration file section for a monitor with these settings is below.

[wb@maxscale maxscale.cnf.d]$ cat /var/lib/maxscale/maxscale.cnf.d/cluster-monitor.cnf
[cluster-monitor]
type=monitor
module=mariadbmon
monitor_interval=1000 #: it should be >= 5000 for production
failover_timeout=5    #: it should be >= 10 for production


Switchover

If the current master is showing any issues, you may want to promote a slave to take its place. The switchover command takes three arguments: the monitor name, the slave to be promoted and the current master.

#: switchover process
#: listing servers and current status
[root@box01 ~]# maxadmin list servers
Server             | Address         | Port  | Connections | Status
prod_mariadb01     |   |  3306 |           0 | Master, Running
prod_mariadb02     |   |  3306 |           0 | Slave, Running

#: command to make the current slave a new master
[root@box01 ~]# maxadmin call command mariadbmon switchover cluster-monitor prod_mariadb02 prod_mariadb01

#: what Maxscale logs says, default location /var/log/maxscale/maxscale.log
2018-01-12 20:00:28   info   : (2) Started CLI client session [8] for 'root' from localhost
2018-01-12 20:00:28   info   : (8) [cli] MaxAdmin: call command "mariadbmon" "switchover" "cluster-monitor" "prod_mariadb02" "prod_mariadb01"
2018-01-12 20:00:29   notice : (8) [mariadbmon] Stopped the monitor cluster-monitor for the duration of switchover.
2018-01-12 20:00:29   notice : (8) [mariadbmon] Demoting server 'prod_mariadb01'.
2018-01-12 20:00:29   notice : (8) [mariadbmon] Promoting server 'prod_mariadb02' to master.
2018-01-12 20:00:29   notice : (8) [mariadbmon] Old master 'prod_mariadb01' starting replication from 'prod_mariadb02'.
2018-01-12 20:00:29   notice : (8) [mariadbmon] Redirecting slaves to new master.
2018-01-12 20:00:29   notice : (8) [mariadbmon] Switchover prod_mariadb01 -> prod_mariadb02 performed.
2018-01-12 20:00:29   info   : Stopped CLI client session [8]

The warning messages suggest activating gtid_strict_mode on the servers, as this enables some additional checks when a server is starting replication.

#: listing servers again
[root@box01 ~]# maxadmin list servers
Server             | Address         | Port  | Connections | Status
prod_mariadb01     |   |  3306 |           0 | Slave, Running
prod_mariadb02     |   |  3306 |           0 | Master, Running

Switchover uses the server global setting read_only to freeze the master server when preparing to switch. Users with the SUPER privilege bypass read_only, which allows them to modify data during a switchover. This often causes replication to break, as different servers end up with different events. To prevent this, make sure that any users who regularly perform writes do not have the SUPER privilege.


Failover

Failover is activated when the master crashes or becomes unavailable. MariaDB Monitor detects that the master is out of reach, waits for a while in case the master quickly comes back (the wait time is configurable), and finally begins failover to replace the failed master with a slave.

For example, if failcount is 5 and monitor_interval is 1000, failover requires five monitor passes without a connection to the master, with one-second waits between passes.

Let's demonstrate by shutting down the current master with systemctl. 

#: failover, let’s kill the current master
[root@box01 ~]# maxadmin list servers
Server             | Address         | Port  | Connections | Status
prod_mariadb01     |   |  3306 |           0 | Slave, Running
prod_mariadb02     |   |  3306 |           0 | Master, Running

[root@box03 mysql]# systemctl stop mariadb
[root@box03 mysql]# systemctl status mariadb
● mariadb.service - MariaDB database server
Loaded: loaded (/usr/lib/systemd/system/mariadb.service; enabled; vendor preset: disabled)
Drop-In: /etc/systemd/system/mariadb.service.d
Active: inactive (dead) since Fri 2018-01-12 20:19:39 GMT; 12s ago
Process: 4295 ExecStartPost=/bin/sh -c systemctl unset-environment _WSREP_START_POSITION (code=exited, status=0/SUCCESS)
Process: 4259 ExecStart=/usr/sbin/mysqld $MYSQLD_OPTS $_WSREP_NEW_CLUSTER $_WSREP_START_POSITION (code=exited, status=0/SUCCESS)
Process: 4223 ExecStartPre=/bin/sh -c [ ! -e /usr/bin/galera_recovery ] && VAR= ||   VAR=`/usr/bin/galera_recovery`; [ $? -eq 0 ]   && systemctl set-environment _WSREP_START_POSITION=$VAR || exit 1 (code=exited, status=0/SUCCESS)
Process: 4221 ExecStartPre=/bin/sh -c systemctl unset-environment _WSREP_START_POSITION (code=exited, status=0/SUCCESS)
Main PID: 4259 (code=exited, status=0/SUCCESS)
Status: "MariaDB server is down"

Below is an excerpt of the MaxScale log. mariadbmon detects that the current master has gone away and after 2 monitor passes failover activates.

#: what Maxscale logs says, default location /var/log/maxscale/maxscale.log
2018-01-12 20:19:39   error  : Monitor was unable to connect to server []:3306 : "Can't connect to MySQL server on '' (115)"
2018-01-12 20:19:39   notice : [mariadbmon] Server []:3306 lost the master status.
2018-01-12 20:19:39   notice : Server changed state: prod_mariadb02[]: master_down. [Master, Running] -> [Down]
2018-01-12 20:19:39   warning: [mariadbmon] Master has failed. If master status does not change in 2 monitor passes, failover begins.
2018-01-12 20:19:39   error  : [mariadbmon] No Master can be determined. Last known was
2018-01-12 20:19:41   notice : [mariadbmon] Performing automatic failover to replace failed master 'prod_mariadb02'.
2018-01-12 20:19:41   notice : [mariadbmon] Promoting server 'prod_mariadb01' to master.
2018-01-12 20:19:41   notice : [mariadbmon] Redirecting slaves to new master.
2018-01-12 20:19:42   warning: [mariadbmon] Setting standalone master, server 'prod_mariadb01' is now the master.
2018-01-12 20:19:42   notice : Server changed state: prod_mariadb01[]: new_master. [Slave, Running] -> [Master, Running]
#: checking the server's status
[root@box01 ~]# maxadmin list servers
Server             | Address         | Port  | Connections | Status
prod_mariadb01     |   |  3306 |           0 | Master, Running
prod_mariadb02     |   |  3306 |           0 | Down

Automatic rejoin

When auto_rejoin is enabled, the monitor will rejoin any standalone database servers or any slaves replicating from a relay master to the main cluster. The typical use case for this feature is rejoining the old master after a failover. Should the master come back online after a slave was already promoted to its place, it would not be immediately replicating. Auto-rejoin will detect this and redirect the master. This is not certain to succeed as the master may have conflicting events. In this case the slave thread will end in an error.

Below is an example of a successful operation:

#: let’s test auto_rejoin now by bringing back up
#: the server we shut down in the failover exercise
[root@box01 ~]# maxadmin list servers
Server             | Address         | Port  | Connections | Status
prod_mariadb01     |   |  3306 |           0 | Master, Running
prod_mariadb02     |   |  3306 |           0 | Down

#: what Maxscale logs says, default location /var/log/maxscale/maxscale.log
2018-01-12 20:22:43   notice : Server changed state: prod_mariadb02[]: server_up. [Down] -> [Running]
2018-01-12 20:22:43   notice : [mariadbmon] Directing standalone server 'prod_mariadb02' to replicate from 'prod_mariadb01'.
2018-01-12 20:22:43   notice : [mariadbmon] 1 server(s) redirected or rejoined the cluster.
2018-01-12 20:22:44   notice : Server changed state: prod_mariadb02[]: new_slave. [Running] -> [Slave, Running]

Above, the backend server prod_mariadb02 has returned and was rejoined to the cluster as a slave of the current master.

#: checking the server's status
[root@box01 ~]# maxadmin list servers
Server             | Address         | Port  | Connections | Status
prod_mariadb01     |   |  3306 |           0 | Master, Running
prod_mariadb02     |   |  3306 |           0 | Slave, Running

Additional Comments

  • If you omit replication_user and replication_password in the monitor configuration, the username and password the monitor uses to check the current state of the backends are used instead. In this case the monitor user needs, in addition to its normal rights, the ability to connect between the backends as well. Usually the MariaDB Monitor user is restricted to connections from the MaxScale host only.

  • If you use an encrypted password for the monitor user, the replication_password should be encrypted as well. Otherwise, the CHANGE MASTER TO query will fail.

  • MariaDB Servers forming a cluster should be configured with gtid_strict_mode enabled to make sure databases have the same binary log order among the instances.

MariaDB MaxScale 2.2 introduces failover, switchover and automatic rejoin for MariaDB Master/Slave replication clusters.

sakthi sri

Tue, 02/27/2018 - 03:09

getting ERROR

@Esa Korhonen, I am testing this in our lab.

The MySQL topology is simple: 1 master & 2 slaves. Both slaves replicate from the master.

But when I stop the MySQL service on the master, the auto failover does not happen. I am getting the below ERROR.


2018-02-27 07:48:30 notice : [mariadbmon] Server []:3306 lost the master status.
2018-02-27 07:48:30 notice : Server changed state: master[]: master_down. [Master, Running] -> [Down]
2018-02-27 07:48:30 warning: [mariadbmon] Master has failed. If master status does not change in 4 monitor passes, failover begins.
2018-02-27 07:48:30 notice : [mariadbmon] Performing automatic failover to replace failed master 'master'.
2018-02-27 07:48:30 error : [mariadbmon] Slave server slave1 is not using gtid replication.
2018-02-27 07:48:30 error : [mariadbmon] Slave server slave2 is not using gtid replication.
2018-02-27 07:48:30 error : [mariadbmon] Failover not allowed due to errors.
2018-02-27 07:48:30 error : [mariadbmon] Failed to perform failover, disabling automatic failover. To re-enable failover, manually set 'auto_failover' to 'true' for monitor 'cluster-monitor' via MaxAdmin or the REST API, or restart MaxScale.

Note - as per the document, I have configured GTID replication only:

mysql> show variables like '%gtid%';
+----------------------------------+-----------+
| Variable_name                    | Value     |
+----------------------------------+-----------+
| binlog_gtid_simple_recovery      | ON        |
| enforce_gtid_consistency         | ON        |
| gtid_executed_compression_period | 1000      |
| gtid_mode                        | ON        |
| gtid_next                        | AUTOMATIC |
| gtid_owned                       |           |
| gtid_purged                      |           |
| session_track_gtids              | OFF       |
+----------------------------------+-----------+
8 rows in set (0.03 sec)

And, the Maxscale config file has the below content in [mariadbmon]


Any suggestions regarding this will really help me. Thanks in advance!!!


by Esa Korhonen at February 22, 2018 11:16 PM

Peter Zaitsev

Percona Live 2018 Featured Talk – Scaling a High-Traffic Database: Moving Tables Across Clusters with Bryana Knight

Percona Live 2018 Featured Talk

Percona Live 2018 Featured TalkWelcome to the first interview blog for the upcoming Percona Live 2018. Each post in this series highlights a Percona Live 2018 featured talk that will be at the conference and gives a short preview of what attendees can expect to learn from the presenter.

This blog post highlights Bryana Knight, Platform Engineer at GitHub. Her talk is titled Scaling a High-Traffic Database: Moving Tables Across Clusters. Facing an immediate need to distribute load, GitHub came up with creative ways to move a significant amount of traffic off of their main MySQL cluster – with no user impact. In our conversation, we discussed how Bryana and GitHub solved some of these issues:

Percona: Who are you, and how did you get into databases? What was your path to your current responsibilities?

Bryana: I started at GitHub as a full-stack engineer working on a new business offering, and was then shortly offered the opportunity to transition to the database services team. Our priorities back then included reviewing every single database migration. Having spent my whole career as a full-stack engineer, I had to level-up pretty quickly on MySQL, data modeling, data access patterns – basically everything databases. I spent the first few months learning our schema and setup through lots of reading, mentorship from other members of my team, reviewing migrations for most of our tables, and asking a million questions.

Originally, my team spent a lot of time addressing immediate performance concerns. Then we started partnering with product engineering teams to build out the backends for new features. Now we are focused on the long-term scalability and availability of our database, stemming from how we access it. I work right between our DBAs and our product and API engineers.

Percona: Your talk is titled “Scaling a High-Traffic Database: Moving Tables Across Clusters”. What were the challenges GitHub faced that required redistributing your tables?

Bryana: The biggest part of the GitHub codebase is an 8-year-old monolith. As a company, we’ve been fortunate enough to see a huge amount of user growth since the company started. User growth means data growth. The schema and setup that worked for GitHub early on, and very much allowed GitHub to get to where it is today with tons of features and an extremely robust API, is not necessarily the right schema and setup for the size GitHub is today.

We were seeing that higher than “normal” load was starting to have a more noticeable effect. The monolith aspect of our database, organic growth, plus inefficiencies in our code base were putting a lot of pressure on the master of our primary database cluster, which held our most core tables (think users, repos, permissions). From the database perspective, this meant contention, locking, and replica lag. From the user’s perspective, this meant anything from longer page loads to delays in UI updates and notifications, to timeouts. 

Percona: What were some of the other options you looked at (if any)?

Bryana: Moving tables out of our main cluster was not the only action we took to alleviate some of the pressure in our database. However, it was the highest impact change we could make in the medium-term to give us the breathing room we needed and improve performance and availability. We also prioritized efforts around moving more reads to replicas and off the master, throttling more writes where possible, index improvements and query optimizations. Moving these tables gave us the opportunity to start thinking more long-term about how we can store and access our data differently to allow us to scale horizontally while maintaining our healthy pace of feature development.

Percona: What were the issues that needed to be worked out between the different teams you mention in your description? How did they impact the project?

Bryana: Moving tables out of our main database required collaboration between multiple teams. The team I’m on, database-services, was responsible for coming up with the strategy to move tables without user impact, writing the code to handle query isolation and routing, connection switching, backgrounding writes, and so on. Our database-infrastructure team determined where the tables we were moving should go (new cluster or existing), setup the clusters, and advised us on how to safely copy the data. In some cases, we were able to use MySQL replication. When that wasn’t possible, they weighed in on other options. 

We worked with production engineers to isolate data access to these tables and safely split JOINs with other tables. Everybody needed to be sure we weren’t affecting performance and user experience when doing this. We discussed with our support team the risk of what we were doing. Then we worked with them to determine if we should preemptively status yellow when there was a higher risk of user impact. During the actual cut-overs, representatives from all these groups would get on a war-room-like video call and “push the button”, and we always made sure to have a roll-out and roll-back plan. 

Percona: Why should people attend your talk? What do you hope people will take away from it?

Bryana: In terms of database performance, there are a lot of little things you can do immediately to try and make improvements: things like adding indexes, tweaking queries, and denormalizing data. There are also more drastic, architectural changes you can pursue, that many companies need to do when they get to certain scale. The topic of this talk is a valid strategy that fits between these two extremes. It relieved some ongoing performance problems and availability risk, while giving us some breathing room to think long term. I think other applications and databases might be in a similar situation and this could work for them. 

Percona: What are you looking forward to at Percona Live (besides your talk)?

Bryana: This is actually the first time I’m attending a Percona Live conference. I’m hoping to learn from some of the talks around scaling a high-traffic database and sharding. I’m also looking forward to seeing some talks from the wonderful folks on the GitHub database-infrastructure team.

Want to find out more about this Percona Live 2018 featured talk, and Bryana and GitHub’s migration? Register for Percona Live 2018, and see her talk Scaling a High-Traffic Database: Moving Tables Across Clusters. Register now to get the best price!

Percona Live Open Source Database Conference 2018 is the premier open source event for the data performance ecosystem. It is the place to be for the open source community. Attendees include DBAs, sysadmins, developers, architects, CTOs, CEOs, and vendors from around the world.

The Percona Live Open Source Database Conference will be April 23-25, 2018 at the Hyatt Regency Santa Clara & The Santa Clara Convention Center.

by Dave Avery at February 22, 2018 10:25 PM

Open Query Pty Ltd

RDS Aurora MySQL and Service Interruptions

In Amazon space, any EC2 or Service instance can “disappear” at any time.  Depending on which service is affected, the service will be automatically restarted.  In EC2 you can choose whether an interrupted instance will be restarted, or left shutdown.

For an Aurora instance, an interrupted instance is always restarted. Makes sense.

The restart timing, and other consequences during the process, are noted in our post on Aurora Failovers.

Aurora Testing Limitations

As mentioned earlier, we love testing “uncontrolled” failovers.  That is, we want to be able to pull any plug on any service, and see that the environment as a whole continues to do its job.  We can’t do that with Aurora, because we can’t control the essentials:

  • power button;
  • reset switch;
  • ability to kill processes on a server;
  • and the ability to change firewall settings.

In Aurora, an instance is either running, or will (again) be running shortly.  That much we know.  Aurora MySQL also offers some commands that simulate various failure scenarios, but since they are built-in we can presume that those scenarios are both very well tested, as well as covered by the automation around the environment.  Those clearly defined cases are exactly the situations we’re not interested in.

What if, for instance, a server accepts new connections but is otherwise unresponsive?  We’ve seen MySQL do this on occasion.  Does Aurora catch this?  We don’t know and  we have no way of testing that, or many other possible problem scenarios.  That irks.

The Need to Know

If an automated system is able to catch a situation, that’s great.  But if your environment can end up in a state such as described above and the automated systems don’t catch and handle it, you could be dead in the water for an undefined amount of time.  If you have scripts to catch cases such as these, but the automated systems catch them as well, you want to be sure that you don’t trigger “double failovers” or otherwise interfere with a failover-in-progress.  So either way, you need to know and be aware whether a situation is caught and handled, and be able to test specific scenarios.

In summary: when you know the facts, then you can assess the risk in relation to your particular needs, and mitigate where and as desired.

A corporate guarantee of “everything is handled and it’ll be fine” (or as we say in Australia “She’ll be right, mate!“) is wholly unsatisfactory for this type of risk analysis and mitigation exercise.  Guarantees and promises, and even legal documents, don’t keep environments online.  Consequently, promises and legalities don’t keep a company alive.

So what does?  In this case, engineers.  But to be able to do their job, engineers need to know what parameters they’re working with, and have the ability to test any unknowns.  Unfortunately Aurora is, also in this respect, a black box.  You have to trust, and can’t comprehensively verify.  Sigh.

by Arjen Lentz at February 22, 2018 12:10 AM

February 21, 2018

MariaDB AB

MariaDB Connector/J 2.2.2 and 1.7.2 now available

MariaDB Connector/J 2.2.2 and 1.7.2 now available dbart Wed, 02/21/2018 - 11:40

The MariaDB project is pleased to announce the immediate availability of MariaDB Connector/J 2.2.2 and MariaDB Connector/J 1.7.2. See the release notes and changelogs for details, and visit the MariaDB downloads page to download.

Download MariaDB Connector/J 2.2.2

Release Notes Changelog About MariaDB Connector/J

Download MariaDB Connector/J 1.7.2

Release Notes Changelog About MariaDB Connector/J



by dbart at February 21, 2018 04:40 PM

Peter Zaitsev

Percona XtraDB Cluster and SELinux: Getting It To Work

Percona XtraDB Cluster and SELinux

Percona XtraDB Cluster and SELinuxIn this blog post, I’ll look at how to make Percona XtraDB Cluster and SELinux work when used together.

Recently, I encountered an issue with Percona XtraDB Cluster startup. We tried to set up a three-node cluster using Percona XtraDB Cluster with a Vagrant CentOS box, but somehow node2 was not starting. The donor/joiner error log did not give me enough information to debug the issue. I got only the following error message:

2018-02-08 16:58:48 7910 [Note] WSREP: Running: 'wsrep_sst_xtrabackup-v2 --role 'joiner' --address '' --datadir '/var/lib/mysql/' --defaults-file '/etc/my.cnf' --defaults-group-suffix '' --parent '7910' --binlog 'mysql-bin' '
2018-02-08 16:58:48 7910 [ERROR] WSREP: Failed to read 'ready <addr>' from: wsrep_sst_xtrabackup-v2 --role 'joiner' --address '' --datadir '/var/lib/mysql/' --defaults-file '/etc/my.cnf' --defaults-group-suffix '' --parent '7910' --binlog 'mysql-bin'
 Read: '(null)'
2018-02-08 16:58:48 7910 [ERROR] WSREP: Process completed with error: wsrep_sst_xtrabackup-v2 --role 'joiner' --address '' --datadir '/var/lib/mysql/' --defaults-file '/etc/my.cnf' --defaults-group-suffix '' --parent '7910' --binlog 'mysql-bin' : 2 (No such file or directory)
2018-02-08 16:58:48 7910 [ERROR] WSREP: Failed to prepare for 'xtrabackup-v2' SST. Unrecoverable.
2018-02-08 16:58:48 7910 [ERROR] Aborting
2018-02-08 16:58:50 7910 [Note] WSREP: Closing send monitor...

The donor node error log also failed to give any information to debug the issue. After spending a few hours on the problem, one of our developers (Krunal) found that the error is due to SELinux. By default, SELinux is enabled in Vagrant CentOS boxes.

We have already documented how to disable SELinux when installing Percona XtraDB Cluster. Since we did not find any SELinux-related errors in the error log, we had to spend a few hours finding the root cause.

You should also disable SELinux on the donor node to start the joiner node. Otherwise, the SST script starts but startup will fail with this error:

2018-02-09T06:55:06.099021Z 0 [Note] WSREP: Initiating SST/IST transfer on DONOR side (wsrep_sst_xtrabackup-v2 --role 'donor' --address '' --socket '/var/lib/mysql/mysql.sock' --datadir '/var/lib/mysql/' --defaults-file '/etc/my.cnf' --defaults-group-suffix '' '' --gtid '0dc70996-0d60-11e8-b008-074abdb3291a:1')
2018-02-09T06:55:06.099556Z 2 [Note] WSREP: DONOR thread signaled with 0
2018-02-09T06:55:06.099722Z 0 [ERROR] WSREP: Process completed with error: wsrep_sst_xtrabackup-v2 --role 'donor' --address '' --socket '/var/lib/mysql/mysql.sock' --datadir '/var/lib/mysql/' --defaults-file '/etc/my.cnf' --defaults-group-suffix '' '' --gtid '0dc70996-0d60-11e8-b008-074abdb3291a:1': 2 (No such file or directory)
2018-02-09T06:55:06.099781Z 0 [ERROR] WSREP: Command did not run: wsrep_sst_xtrabackup-v2 --role 'donor' --address '' --socket '/var/lib/mysql/mysql.sock' --datadir '/var/lib/mysql/' --defaults-file '/etc/my.cnf' --defaults-group-suffix '' '' --gtid '0dc70996-0d60-11e8-b008-074abdb3291a:1'

Disable SELinux on all nodes to start Percona XtraDB Cluster.
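
For reference, one common way to take SELinux out of the picture on CentOS is sketched below; the hostname is a placeholder, and you should follow the Percona XtraDB Cluster installation documentation for the recommended procedure.

#: switch to permissive mode immediately (does not survive a reboot)
[root@pxc-node ~]# setenforce 0
#: make the change persistent across reboots
[root@pxc-node ~]# sed -i 's/^SELINUX=enforcing/SELINUX=permissive/' /etc/selinux/config
#: verify the current mode
[root@pxc-node ~]# getenforce
Permissive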

The Percona XtraDB Cluster development team is working on providing the proper error message for SELinux issues.

by Ramesh Sivaraman at February 21, 2018 04:18 PM

Percona Live 2018 Open Source Database Conference Full Schedule Now Available

Percona Live 2018 Featured Talk

Percona Live 2018The conference session schedule for the seventh annual Percona Live 2018 Open Source Database Conference, taking place April 23-25 at the Santa Clara Convention Center in Santa Clara, CA is now live and available for review! Advance Registration Discounts can be purchased through March 4, 2018, 11:30 p.m. PST.

Percona Live Open Source Database Conference 2018 is the premier open source database event. With a theme of “Championing Open Source Databases,” the conference will feature multiple tracks, including MySQL, MongoDB, Cloud, PostgreSQL, Containers and Automation, Monitoring and Ops, and Database Security. Once again, Percona will be offering a low-cost database 101 track for beginning users who want to start learning how to use and operate open source databases.

Major areas of focus at the conference include:

  • Database operations and automation at scale, featuring speakers from Facebook, Slack, Github and more
  • Databases in the cloud – how database-as-a-service (DBaaS) is changing the DB Landscape, featuring speakers from AWS, Microsoft, Alibaba and more
  • Security and compliance – how GDPR and other government regulations are changing the way we manage databases, featuring speakers from Fastly, Facebook, Pythian, Percona and more
  • Bridging the gap between developers and DBAs – finding common ground, featuring speakers from Square, Oracle, Percona and more

Conference Session Schedule

Conference sessions take place April 24-25 and will feature 90+ in-depth talks by industry experts related to each of the key areas. Several sessions from Oracle and Percona will focus on how the new features and enhancements in the upcoming release of MySQL 8.0 will impact businesses. Conference session examples include:


Sponsorship opportunities for Percona Live Open Source Database Conference 2018 are available and offer the opportunity to interact with the DBAs, sysadmins, developers, CTOs, CEOs, business managers, technology evangelists, solution vendors and entrepreneurs who typically attend the event. Contact for sponsorship details.

  • Diamond Sponsors – Continuent, VividCortex
  • Platinum – Microsoft
  • Gold Sponsors – Facebook, Grafana
  • Bronze Sponsors – Altinity, BlazingDB, SolarWinds, Timescale, TwinDB, Yelp
  • Other Sponsors – cPanel
  • Media Sponsors – Database Trends & Applications, Datanami, EnterpriseTech, HPCWire, Packt

Hyatt Regency Santa Clara & The Santa Clara Convention Center

Percona Live 2018 Open Source Database Conference is held at the Hyatt Regency Santa Clara & The Santa Clara Convention Center, at 5101 Great America Parkway Santa Clara, CA 95054.

The Hyatt Regency Santa Clara & The Santa Clara Convention Center is a prime location in the heart of the Silicon Valley. Enjoy this spacious venue with complimentary wifi, on-site expert staff and three great restaurants. You can reserve a room by booking through the Hyatt’s dedicated Percona Live reservation site.

Book your hotel using Percona’s special room block rate!

by Dave Avery at February 21, 2018 01:00 PM

MariaDB AB

MaxCtrl: Advanced Administrative Tool for MariaDB MaxScale

MaxCtrl: Advanced Administrative Tool for MariaDB MaxScale markusmakela Wed, 02/21/2018 - 03:47

One of the new things in MariaDB MaxScale 2.2 is the addition of a fully integrated administrative interface implemented as a JSON-API conforming REST API. The MariaDB MaxScale REST API is a great way to monitor and interact with MariaDB MaxScale from a wide variety of programming languages but it is not the most convenient thing for command line use. For this very reason, we added a new command line client, MaxCtrl.

The MaxCtrl client is intended to be the successor/complement/alternative of MaxAdmin, the classic admin client for MariaDB MaxScale. It leverages the MaxScale REST API to provide human-readable information about MariaDB MaxScale and its modules to the user. While providing the same set of functionality that MaxAdmin does, it comes with the added benefit of being ready for multi-MaxScale setups (more on that later on).

One of the biggest benefits of the MaxCtrl client is the fact that you don’t need to be on the same server as MariaDB MaxScale to use it in a secure fashion. Since the MaxScale REST API supports HTTPS, you can now safely manage MariaDB MaxScale via encrypted connections. The client, written in Node.js, is a self-contained executable that works on a wide variety of platforms.

Getting Started with MaxCtrl

As most of the time MariaDB MaxScale is initially managed locally, the MaxCtrl client is included as a part of the maxscale package and the name of the executable is maxctrl. This means that when you install MariaDB MaxScale, you already have MaxCtrl installed as well.

The REST API listens to localhost on port 8989 by default. This is to make it more secure so that, upon installation, you don’t expose the API until you configure it and make it fully secure. In addition to this, the API requires authentication before any information can be accessed. The same users that are used in MaxAdmin also work for the REST API. This means that the default admin:mariadb credentials are also the default credentials for the REST API.

The end-user experience should be relatively similar to how MaxAdmin works but with a slightly modernized look and feel. Take, for example, the list servers command.
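
An illustrative invocation is shown below; the server names, addresses and states are placeholders, and the exact set of columns can differ between MaxScale versions.

$ maxctrl list servers
┌────────────────┬───────────────┬──────┬─────────────┬─────────────────┐
│ Server         │ Address       │ Port │ Connections │ State           │
├────────────────┼───────────────┼──────┼─────────────┼─────────────────┤
│ prod_mariadb01 │ 192.168.50.12 │ 3306 │ 0           │ Master, Running │
│ prod_mariadb02 │ 192.168.50.13 │ 3306 │ 0           │ Slave, Running  │
└────────────────┴───────────────┴──────┴─────────────┴─────────────────┘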

We can see a familiar layout of the data but with a more readable table-like formatting.

Since processing HTTP requests consisting of JSON in shell scripts is only slightly less painful compared to parsing MaxAdmin output, we’ve added the --tsv option to make it much easier.

As can be seen, the output is directly consumable by the majority of UNIX tools. For instance, if we want to only see the name and state of each server, we simply pipe the output of maxctrl --tsv list servers to cut -f 1,5.
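
For example, with the same placeholder servers as above:

$ maxctrl --tsv list servers | cut -f 1,5
prod_mariadb01	Master, Running
prod_mariadb02	Slave, Running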

Another nice feature of MaxCtrl is that it integrates the help output into the client program. This means that if you wonder what the exact syntax was to alter server parameters, you can simply look it up from the help output.

By starting from maxctrl help, you can explore all the commands that are available in MaxCtrl. In addition to the client itself, the same documentation can be found in the MariaDB Knowledgebase.
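
A sketch of that lookup, assuming the command names shown by the top-level help listing:

$ maxctrl help
$ maxctrl help alter
$ maxctrl help alter server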

Connecting to Remote MaxScales

After the initial installation and configuration of a MariaDB MaxScale instance, management will often be done remotely. This means that a secure network connection is required and we achieve that by enabling the encrypted HTTPS protocol of the MaxScale REST API. To enable it, configure the admin_ssl_key, admin_ssl_cert and admin_ssl_ca_cert parameters to point to PEM format private key, public certificate and CA certificate file.
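
A sketch of the relevant part of the configuration, assuming the PEM files live under /certs:

[maxscale]
admin_ssl_key=/certs/maxscale-key.pem
admin_ssl_cert=/certs/maxscale-cert.pem
admin_ssl_ca_cert=/certs/ca-cert.pem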


Then just use the --secure option of maxctrl to enable encrypted connections and give the hostname and port as the parameters. If you are using a self-signed certificate, pass the path to the CA certificate with --tls-ca-cert and disable server certificate verification with --tls-verify-server-cert=false.


As our example uses localhost:8989 as the hostname and port and we use self-signed certificates, we add --hosts=localhost:8989 --secure --tls-verify-server-cert=false as the extra options to maxctrl. We also explicitly define the path to our self-signed CA certificate to tell MaxCtrl to use a custom CA certificate instead of using the bundled OS certificates.
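
Put together, an invocation against that instance could look like this sketch (the CA certificate path is a placeholder):

$ maxctrl --hosts=localhost:8989 --secure --tls-ca-cert=/certs/ca-cert.pem --tls-verify-server-cert=false list servers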

By using HTTPS, MariaDB MaxScale can be securely managed over open networks. This makes management of multiple remote MaxScale servers significantly more convenient.

Managing Multiple MaxScales

With the addition of the REST API, multiple MaxScale instances can now be controlled from a remote location. As we saw in the previous section, the description of the --hosts parameter mentions a list of MaxScale hosts. This means that you can execute the same command on multiple MaxScale servers and see the output grouped by each host. Here’s an example of the output when a server is set into maintenance mode in two different MaxScale instances (localhost:8989 and localhost:8990).

In addition to being able to execute these simpler commands on multiple MaxScale instances, you can execute cluster-wide commands with the cluster group of commands.

The cluster diff command shows all differences in the logical configuration of each MaxScale server. This is done by comparing the APIs of the hosts given with the --hosts option to the host given as the value to the cluster diff command. If we compare the two local setups, we see that the only difference these two servers have is the listener port they listen on.

If we swap the hosts around, we’ll see how the other host differs.

This command is intended to be a tool for detecting configuration divergence between multiple MaxScale instances. The output is not in the traditional diff format mainly due to the fact that, based on our testing, a diff of two JSON objects doesn’t produce actionable output. You always need to see the full object, or at least the path of the object that changed, to know what the real difference is. The diff calculation may still receive small changes in the beta to make the output more concise and readable.

Compared to the cluster diff command, the cluster sync command is used to synchronize the configurations of multiple MaxScales.

Similar to other cluster commands, the list of MaxScale instances to synchronize are given with the --hosts option. The target MaxScale, used as the baseline, is given as the argument to the command.

The command builds on top of the cluster diff command by first detecting what parts are different, what parts are new and what parts were removed. This is all done via the REST API which allows MaxCtrl to leverage the HTTP methods for first detecting the changes and then creating, deleting and updating all the REST API resources.

This command can be used to converge the runtime configurations of MaxScales that have diverged due to downtime. The best way to understand how it works in practice is to see it in action.

Example: Converging Diverged Configurations

We have two MaxScale servers, maxscale-a and maxscale-b, in front of a two node master-slave cluster. Both MaxScales are configured with the following configuration.
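
A minimal base configuration consistent with the commands used below might look like the following sketch; the server names, addresses and credentials are placeholders.

[server1]
type=server
address=192.168.0.11
port=3306
protocol=MariaDBBackend

[server2]
type=server
address=192.168.0.12
port=3306
protocol=MariaDBBackend

[Cluster-Monitor]
type=monitor
module=mariadbmon
servers=server1,server2
user=maxuser
password=maxpwd
monitor_interval=10000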







Let’s say that some maintenance needs to be done on maxscale-b and we need to stop it. While we’re doing the maintenance, we realize that we want to change the monitoring interval from 10 seconds down to 2 seconds on the running MaxScale. Since maxscale-b is down, we can only do this on maxscale-a. We do it with the following command.

maxctrl --hosts maxscale-a alter monitor Cluster-Monitor monitor_interval 2000

Once we finish the maintenance on maxscale-b, we bring it up. The problem is that even though the base configurations are the same, the runtime configurations are different. We could just run the same commands again on the second MaxScale and they would end up in the same state. A more convenient way to synchronize is to simply run the following command.

maxctrl --hosts maxscale-b cluster sync maxscale-a

Now both of the MaxScale servers have the same logical configuration. This can be verified by executing the cluster diff command to see that there are no differences between the two installations.
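
For example, assuming maxscale-a and maxscale-b resolve to the REST API host:port of the two instances:

$ maxctrl --hosts maxscale-b cluster diff maxscale-a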

You can also use MaxCtrl to manage multiple MaxScale instances that are managed by a Keepalived setup. For a detailed look into how MariaDB MaxScale is used with Keepalived, read the great article by Esa Korhonen on MariaDB MaxScale and Keepalived.


As can be seen in MaxCtrl, providing an easy-to-use API enriches the way you can interact with software and opens up new opportunities for further improvements.

We’d love to hear your input on how to improve MaxCtrl and the REST API. You can install it from the MariaDB download page. The MaxCtrl client can be found in the maxscale-client package which is a part of the MariaDB MaxScale repository.



by markusmakela at February 21, 2018 08:47 AM

Jean-Jerome Schmidt

Updated: Become a ClusterControl DBA - SSL Key Management and Encryption of MySQL Data in Transit

Databases usually work in a secure environment. It may be a datacenter with a dedicated VLAN for database traffic. It may be a VPC in EC2. If your network spreads across multiple datacenters in different regions, you’d usually use some kind of Virtual Private Network or SSH tunneling to connect these locations in a secure manner. With data privacy and security being hot topics these days, you might feel better with an additional layer of security.

MySQL supports SSL as a means to encrypt traffic both between MySQL servers (replication) and between MySQL servers and clients. If you use Galera cluster, similar features are available - both intra-cluster communication and connections with clients can be encrypted using SSL.

A common way of implementing SSL encryption is to use self-signed certificates. Most of the time, it is not necessary to purchase an SSL certificate issued by the Certificate Authority. Anybody who’s been through the process of generating a self-signed certificate will probably agree that it is not the most straightforward process - most of the time, you end up searching through the internet to find howto’s and instructions on how to do this. This is especially true if you are a DBA and only go through this process every few months or even years. This is why we added a ClusterControl feature to help you manage SSL keys across your database cluster. In this blog post, we’ll be making use of ClusterControl 1.5.1.

Key Management in the ClusterControl

You can enter Key Management by going to Side Menu -> Key Management section.

You will be presented with the following screen:

You can see two certificates generated, one being a CA and the other one a regular certificate. To generate more certificates, switch to the ‘Generate Key’ tab:

A certificate can be generated in two ways - you can first create a self-signed CA and then use it to sign a certificate. Or you can go directly to the ‘Client/Server Certificates and Key’ tab and create a certificate. The required CA will be created for you in the background. Last but not least, you can import an existing certificate (for example a certificate you bought from one of many companies which sell SSL certificates).

To do that, you should upload your certificate, key and CA to your ClusterControl node and store them in /var/lib/cmon/ca directory. Then you fill in the paths to those files and the certificate will be imported.

If you decided to generate a CA or generate a new certificate, there’s another form to fill - you need to pass details about your organization, common name, email, pick the key length and expiration date.

Once you have everything in place, you can start using your new certificates. ClusterControl currently supports deployment of SSL encryption between clients and MySQL databases and SSL encryption of intra-cluster traffic in Galera Cluster. We plan to extend the variety of supported deployments in future releases of the ClusterControl.

Full SSL encryption for Galera Cluster

Now let’s assume we have our SSL keys ready and we have a Galera Cluster, which needs SSL encryption, deployed through our ClusterControl instance. We can easily secure it in two steps.

First - encrypt Galera traffic using SSL. From your cluster view, one of the cluster actions is 'Enable SSL Galera Encryption'. You’ll be presented with the following options:

If you do not have a certificate, you can generate it here. But if you already generated or imported an SSL certificate, you should be able to see it in the list and use it to encrypt Galera replication traffic. Please keep in mind that this operation requires a cluster restart - all nodes will have to stop at the same time, apply config changes and then restart. Before you proceed here, make sure you are prepared for some downtime while the cluster restarts.

Once intra-cluster traffic has been secured, we want to cover client-server connections. To do that, pick ‘Enable SSL Encryption’ job and you’ll see following dialog:

It’s pretty similar - you can either pick an existing certificate or generate new one. The main difference is that to apply client-server encryption, downtime is not required - a rolling restart will suffice. Once restarted, you will find a lock icon right under the encrypted host on the Overview page:

The label 'Galera' means Galera encryption is enabled, while 'SSL' means client-server encryption is enabled for that particular host.

Of course, enabling SSL on the database is not enough - you have to copy certificates to clients which are supposed to use SSL to connect to the database. All certificates can be found in /var/lib/cmon/ca directory on the ClusterControl node. You also have to remember to change grants for users and make sure you’ve added REQUIRE SSL to them if you want to enforce only secure connections.
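
As a sketch of the grants part, assuming an application account named 'app' (adjust the user name and host to your environment):

-- require TLS for an existing account (MySQL 5.7+ / MariaDB 10.2+)
ALTER USER 'app'@'%' REQUIRE SSL;
-- on older versions, the equivalent is:
GRANT USAGE ON *.* TO 'app'@'%' REQUIRE SSL;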

We hope you’ll find those options easy to use and help you secure your MySQL environment. If you have any questions or suggestions regarding this feature, we’d love to hear from you.

by ashraf at February 21, 2018 07:16 AM

Open Query Pty Ltd

RDS Aurora MySQL Failover

Right now Aurora only allows a single master, with up to 15 read-only replicas.

Master/Replica Failover

We love testing failure scenarios, however our options for such tests with Aurora are limited (we might get back to that later).  Anyhow, we told the system, through the RDS Aurora dashboard, to do a failover. These were our observations:

Role Change Method

Both master and replica instances are actually restarted (the MySQL uptime resets to 0).

This is quite unusual these days, we can do a fully controlled role change in classic asynchronous replication without a restart (CHANGE MASTER TO …), and Galera doesn’t have read/write roles as such (all instances are technically writers) so it doesn’t need role changes at all.

Failover Timing

Failover between running instances takes about 30 seconds.  This is in line with information provided in the Aurora FAQ.

Failover where a new instance needs to be spun up takes 15 minutes according to the FAQ (similar to creating a new instance from the dash).

Instance Availability

During a failover operation, we observed that all connections to the (old) master, and the replica that is going to be promoted, are first dropped, then refused (the connection refusals will be during the period that the mysqld process is restarting).

According to the FAQ, reads to all replicas are interrupted during failover.  Don’t know why.

Aurora can deliver a DNS CNAME for your writer instance. In a controlled environment like Amazon, with guaranteed short TTL, this should work ok and be updated within the 30 seconds that the shortest possible failover scenario takes.  We didn’t test with the CNAME directly as we explicitly wanted to observe the “raw” failover time of the instances themselves, and the behaviour surrounding that process.

Caching State

On the promoted replica, the buffer pool is saved and loaded (warmed up) on the restart; good!  Note that this is not special, it’s desired and expected to happen: MySQL and MariaDB have had InnoDB buffer pool save/restore for years.  Credit: Jeremy Cole initially came up with the buffer pool save/restore idea.

On the old master (new replica/slave), the buffer pool is left cold (empty).  Don’t know why.  This was a controlled failover from a functional master.

Because of the server restart, other caches are of course cleared also.  I’m not too fussed about the query cache (although, deprecated as it is, it’s currently still commonly used), but losing connections is a nuisance. More detail on that later in this article.


Because of the instance restarts, the running statistics (SHOW GLOBAL STATUS) are all reset to 0. This is annoying, but should not affect proper external stats gathering, other than for uptime.

On any replica, SHOW ENGINE INNODB STATUS comes up empty. Always.  This seems like obscurity to me, I don’t see a technical reason to not show it.  I suppose that with a replica being purely read-only, most running info is already available through SHOW GLOBAL STATUS LIKE ‘innodb%’, and you won’t get deadlocks on a read-only slave.


Aurora MySQL multi-master was announced at Amazon re:Invent 2017, and appears to currently be in restricted beta test.  No date has been announced for general availability.

We’ll have to review it when it’s available, and see how it works in practice.


Requiring 30 seconds or more for a failover is unfortunate, this is much slower than other MySQL replication (writes can failover within a few seconds, and reads are not interrupted) and Galera cluster environments (which essentially delivers continuity across instance failures – clients talking to the failed instance will need to reconnect to the loadbalancer/cluster to continue).

I don’t understand why the old master gets a cold InnoDB buffer pool.

I wouldn’t think a complete server restart should be necessary, but since we don’t have insight in the internals, who knows.

On Killing Connections (through the restart)

Losing connections across an Aurora cluster is a real nuisance that really impacts applications.  Here’s why:

When the MySQL C client library (which most MySQL APIs either use or are modelled on) is disconnected, it passes back a specific error to the application.  When the application makes its next query call, the C client will automatically reconnect first (so the client does not have to explicitly reconnect).  So a client only needs to catch the error and re-issue its last command, and all will generally be fine.  Of course, if it relies on different SESSION settings, or was in the middle of a multi-statement transaction, it will need to do a bit more.

So, this means that the application has to handle disconnects gracefully without chucking hissy-fits at users, and I know for a fact that that’s not how many (most?) applications are written.  Consequently, an Aurora failover will make the frontend of most applications look like a disaster zone for about 30 seconds (provided functional instances are available for the failover, which is the preferred and best case scenario).

I appreciate that this is not directly Aurora’s fault, it’s sloppy application development that causes this, but it’s a real-world fact we have to deal with.  And, perhaps importantly: other cluster and replication options do not trigger this scenario.

by Arjen Lentz at February 21, 2018 03:10 AM

Colin Charles

MariaDB Developer’s unconference & M|18

Been a while since I wrote anything MySQL/MariaDB related here, but there’s the column on the Percona blog, that has weekly updates.

Anyway, I’ll be at the developer’s unconference this weekend in NYC. Even managed to snag a session on the schedule, MySQL features missing in MariaDB Server (Sunday, 12.15–13.00). Signup on meetup?

Due to the prevalence of “VIP tickets”, I too signed up for M|18. If you need a discount code, I’ll happily offer them up to you to see if they still work (though I’m sure a quick Google will solve this problem for you). I’ll publish notes, probably in my weekly column.

If you’re in New York and want to say hi, talk shop, etc. don’t hesitate to drop me a line.

by Colin Charles at February 21, 2018 02:07 AM

February 20, 2018

Peter Zaitsev

Understand Your Prometheus Exporters with Percona Monitoring and Management (PMM)

Prometheus Exporters 2 small

In this blog post, I will look at the new dashboards in Percona Monitoring and Management (PMM) for Prometheus exporters.

Percona Monitoring and Management (PMM) uses Prometheus exporters to capture metrics data from the system it monitors. Those Prometheus exporters are an important part of your monitoring infrastructure, and understanding their performance and other operational details is critical for well-implemented monitoring.    

To help you with this we’ve added a number of new dashboards to Percona Monitoring and Management.

The Prometheus Exporters Overview dashboard provides a high-level overview of your installed Prometheus exporter infrastructure:

Prometheus Exporters

The summary shows you how many hosts are monitored and how many exporters you have running, as well as how much CPU and memory they are using.

Note that the CPU usage shown in this graph is only the CPU usage of the exporter itself. It does not include the additional resource usage that is required to produce metrics by the application or operating system.

Next, we have an overview of resource usage by the host:  

Prometheus Exporters 2

Prometheus Exporters 3

These graphs allow us to analyze the resource usage for different hosts, allowing us to clearly see if any of the hosts have unusually high CPU or memory usage by exporters.

You may notice some of the CPU usage reported on these graphs is very high. This is due to the fact that we use very high-resolution sampling and very underpowered instances for this demonstration environment. CPU usage numbers like this are not typical.

The next graphs show resource usage by the type of exporter:

Prometheus Exporters 4

Prometheus Exporters 5

In this case, we measure CPU usage in “CPU Cores” rather than as a percent – it is more meaningful. Otherwise, the same amount of actual resource usage by the exporter will look very different on a system with one core versus a system with 64 cores. Core usage numbers have a pretty stable baseline, though.

Then there is a list of your monitored hosts and the exporters they are running:

Prometheus Exporters 6

This shows your CPU usage and memory usage per host, as well as the number of exporters running and system details.

You can click on a host to get to the System Overview, or jump to the Prometheus Exporter Status dashboard.

The Prometheus Exporter Status dashboard lets you investigate how specific exporters are performing for a given host. Each of the well-known exporters has its own row in this dashboard.

Node Exporter Status shows us the resource usage, uptime and performance of Node Exporter (the exporter responsible for capturing OS-level metrics):   

Prometheus Exporters 7

Prometheus Exporters 8

The “Collector Scrape Successful” graph shows which node_exporter collector categories (modules that collect specific information) have returned data reliably. If you see anything but a flat line at “1” here, you need to check for problems.

“Collector Execution Time” shows how long on average it takes to execute your enabled collectors. This shows which collectors are generally more expensive to run (or if some of them are experiencing performance problems).
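
If you want to cross-check these graphs on a monitored host, node_exporter exposes the underlying metrics itself. A hedged example, assuming the upstream default port 9100 (a PMM install may use a different port):

# Collector success (1 = OK) and per-collector scrape duration,
# read straight from the exporter's metrics endpoint.
curl -s http://localhost:9100/metrics | grep -E 'node_scrape_collector_(success|duration_seconds)'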

MySQL Exporter Status shows us how MySQL exporter is performing:

Prometheus Exporters 9

Additionally, in resource usage we see the rate of scrapes for High, Medium and Low resolution data.

Generally, you should see three flat lines here if everything is working well. This is not the case for this host, and we can see some scrapes are not successful – either failing to complete, or not triggered by Prometheus Server altogether (due to overload or connectivity issues).

Prometheus Exporters 10

These graphs provide information about MySQL Exporter Errors – permission errors and other issues. It also shows if MySQL Server was up during this time. There are also similar details reported for MongoDB and ProxySQL exporters if they are running on the host.

I hope these new dashboards help you to understand your Prometheus exporter performance better!

by Peter Zaitsev at February 20, 2018 10:40 PM

MariaDB AB

MariaDB MaxScale 2.2.2 with Built-in Failover Management Reaches GA!


We're excited to announce that MariaDB MaxScale 2.2, an advanced database proxy for MariaDB, is now GA. It introduces new features for replication cluster failover management, high availability of MaxScale itself, security features for General Data Protection Regulation (GDPR) compliance, and readiness for the upcoming MariaDB Server 10.3, while making MaxScale easier to manage with a REST API and an improved management interface – all to make things easier for DBAs.

MariaDB MaxScale provides master/slave deployments with high availability using automatic failover, manual switchover and automatic rejoin. If the master fails, MariaDB MaxScale can automatically promote the most up-to-date slave to master. If the failed master is recovered, MariaDB MaxScale can automatically reconfigure it as a slave to the new master. In addition, administrators can perform a manual switchover to change the master on demand.

MariaDB MaxScale can now be managed via a REST API or MaxCtrl, an improved command line interface (CLI). Further continuing the high availability theme, an HA cluster consisting of two MaxScale instances can now be configured using MaxCtrl. When integrated with Keepalived and a virtual IP address, this enables automatic failover between the two MaxScale instances of the HA cluster.

For security compliance such as GDPR, HIPAA and PCI, the masking filter now also allows pseudo-anonymization (obfuscation) and partial masking of query data results returned through MaxScale. Additionally, the database firewall filter can now prevent the use of functions on a specific column, so as not to leak column data that is supposed to be masked.

In addition, this release introduces support for the Proxy Protocol. Proxy Protocol support, added for MariaDB MaxScale 2.2 and MariaDB Server 10.3, makes it easier to configure and authorize users by eliminating the need to duplicate them in both MariaDB MaxScale and MariaDB Server. With PAM support in MariaDB MaxScale, a single user can be authenticated from the client, the database proxy and the database.

For compatibility with the latest MariaDB Server releases, we made sure the query parser in MariaDB MaxScale 2.2 supports all of the new features in MariaDB Server 10.2 (common table expression and window functions) and MariaDB Server 10.3 (PL/SQL).

During the beta period, community members as well as our customers helped us validate this release. Specifically we would like to shout out to Rick Lane, Soumya Das, Matt Mencel, Ketan Kunde and Chandranana Naik for their valuable feedback.

Coming up, stay tuned for several follow-up blogs by our engineering team members about the new capabilities of MariaDB MaxScale 2.2.

More information:

Feel free to post questions in our Knowledge Base or email me.



by Dipti Joshi at February 20, 2018 01:57 PM

Shlomi Noach

Using dbdeployer in CI tests

I was very pleased when Giuseppe Maxia (aka datacharmer) unveiled dbdeployer in his talk at pre-FOSDEM MySQL day. The announcement came just at the right time. I wish to briefly describe how we use dbdeployer (work in progress).

The case for gh-ost

A user opened an issue on gh-ost, and the user was using MySQL 5.5. gh-ost is being tested on 5.7 where the problem does not reproduce. A discussion with Gillian Gunson raised the concern of not testing on all versions. Can we run gh-ost tests for all MySQL/Percona/MariaDB versions? Should we? How easy would it be?

gh-ost tests

gh-ost has three different test types:

  • Unit tests: these are plain golang logic tests which are very easy and quick to run.
  • Integration tests: the topic of this post, see following. Today these do not run as part of automated CI testing.
  • System tests: putting our production tables to the test, continuously migrating our production data on dedicated replicas, verifying checksums are identical and data is intact, read more.

Unit tests are already running as part of automated CI (every PR is subjected to those tests). Systems tests are clearly tied to our production servers. What's the deal with the integration tests?

gh-ost integration tests

The gh-ost integration tests are a suite of scenarios which verify gh-ost's operation is sound. These scenarios are mostly concerned with data types, special alter statements etc. Is converting DATETIME to TIMESTAMP working properly? Are latin1 columns being updated correctly? How about renaming a column? Changing a PRIMARY KEY? Column reorder? 5.7 JSON values? And so on. Each test will recreate the table, run migration, stop replication, check the result, resume replication...

The environment for these tests is a master-replica setup, where gh-ost modifies the table on the replica and can then checksum or compare both the original and the altered ghost table.

We develop gh-ost internally at GitHub, but it's also an open source project. We have our own internal CI environment, but then we also wish the public to have visibility into test failures (so that a user can submit a PR and get a reliable automated feedback). We use Travis CI for the public facing tests.

To run gh-ost's integration tests as described above as part of our CI tests we should be able to:

  • Create a master/replica setup in CI.
  • Actually, create a master/replica setup in any CI, and namely in Travis CI.
  • Actually, create multiple master/replica setups, of varying versions and vendors, in any CI, including both our internal CI and Travis CI.

I was about to embark on a MySQL Sandbox setup, which I was not keen on. But FOSDEM was around the corner and I had other things to complete beforehand. Lucky me, dbdeployer stepped in.


dbdeployer is a rewrite, a replacement for MySQL Sandbox. I've been using MySQL Sandbox for many years, and my laptop is running two sandboxes at this very moment. But MySQL Sandbox has a few limitations or complications:

  • Perl. Versions of Perl. Dependencies of packages of Perl. I mean, it's fine, we can automate that.
  • Command line flag complexity: I always get lost in the complexity of the flags.
  • Get it right or prepare for battle: if you deployed something, but not the way you wanted, there are sometimes limbo situations where you cannot re-deploy the same sandbox again, or you have to start deleting files everywhere.
  • Deploy, not remove. Adding a sandbox is one thing. How about removing it?

dbdeployer is a golang rewrite, which solves the dependency problem. It ships as a single binary and nothing more is needed. It is simple to use. While it generates the equivalent of a MySQL Sandbox, it does so with fewer command line flags and less confusion. There's first class handling of the MySQL binaries: you unpack MySQL tarballs, and you can list what's available. You can then create sandbox environments: replication, standalone, etc. You can then delete those.

It's pretty simple and I have not much more to add -- which is the best thing about it.

So, with dbdeployer it is easy to create a master/replica. Something like:

dbdeployer unpack path/to/5.7.21.tar.gz --unpack-version=5.7.21 --sandbox-binary ${PWD}/sandbox/binary
dbdeployer replication 5.7.21 --nodes 2 --sandbox-binary ${PWD}/sandbox/binary --sandbox-home ${PWD}/sandboxes --gtid --my-cnf-options log_slave_updates --my-cnf-options log_bin --my-cnf-options binlog_format=ROW
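
Once the sandbox is up, dbdeployer drops helper scripts into the sandbox directory (following the familiar MySQL Sandbox naming: m for the master, s1/s2 for the slaves). A hedged example of using and then removing the sandbox – the rsandbox_5_7_21 directory name is the default convention and may differ on your setup:

# List the sandboxes dbdeployer knows about.
dbdeployer sandboxes --sandbox-home ${PWD}/sandboxes
# Talk to the master and the first slave through the generated client scripts.
${PWD}/sandboxes/rsandbox_5_7_21/m -e "SHOW MASTER STATUS\G"
${PWD}/sandboxes/rsandbox_5_7_21/s1 -e "SHOW SLAVE STATUS\G"
# Tear the whole replication sandbox down when done.
dbdeployer delete rsandbox_5_7_21 --sandbox-home ${PWD}/sandboxes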

Where does it all fit in, and what about the MySQL binaries though?

So, should dbdeployer be part of the gh-ost repo? And where does one get those MySQL binaries from? Are they to be part of the gh-ost repo? Aren't they a few GB to extract?

Neither dbdeployer nor MySQL binaries should be added to the gh-ost repo. And fortunately, Giuseppe also solved the MySQL binaries problem.

The scheme I'm looking at right now is as follows:

  • A new public repo, gh-ost-ci-env is created. This repo includes:
    • dbdeployer compiled binaries
    • Minimal MySQL tarballs for selected versions. Those tarballs are reasonably small: between `14MB` and `44MB` at this time.
  • gh-ost's CI to git clone (code)
  • gh-ost's CI to set up a master/replica sandbox (one, two).
  • Kick the tests.

The above is a work in progress:

  • At this time it only runs a single MySQL version.
  • There is a known issue where after a test, replication may take time to resume. Currently on slower boxes (such as the Travis CI containers) this leads to failures.

Another concern I have at this time is build time. For a single MySQL version, it takes some 5-7 minutes on my local laptop to run all integration tests. It will be faster on our internal CI. It will be considerably slower on Travis CI, I can expect between 10m - 15m. Add multiple versions and we're looking at a 1hr build. Such long build times will affect our development and delivery times, and so we will split them off the main build. I need to consider what the best approach is.

That's all for now. I'm pretty excited for the potential of dbdeployer and will be looking into incorporating the same for orchestrator CI tests.



by shlomi at February 20, 2018 07:29 AM

Open Query Pty Ltd

Exploring Amazon RDS Aurora: replica writes and cache chilling

Our clients operate on a variety of platforms, and RDS (Amazon Relational Database Service) Aurora has received quite a bit of attention in recent times. On behalf of our clients, we look beyond the marketing, and see what the technical architecture actually delivers.  We will address specific topics in individual posts, this time checking out what the Aurora architecture means for write and caching behaviour (and thus performance).

What is RDS Aurora?

First of all, let’s declare the baseline.  MySQL Aurora is not a completely new RDBMS. It comprises a set of Amazon modifications on top of stock Oracle MySQL 5.6 and 5.7, implementing a different replication mechanism and some other changes/additions.  While we have some information (for instance from the “deep dive” by AWS VP Anurag Gupta), the source code of the Aurora modifications is not published, so unfortunately it is not immediately clear how things are implemented.  Any architecture requires choices to be made, trade-offs, and naturally these have consequences.  Because we don’t get to look inside the “black box” directly, we need to explore indirectly.  We know how stock MySQL is architected, so by observing Aurora’s behaviour we can try to derive how it is different and what it might be doing.  Mind that this is equivalent to looking at a distant star, seeing a wobble, and deducing from the pattern that there must be one or more planets orbiting.  It’s an educated guess.

For the sake of brevity, I have to skip past some aspects that can be regarded as “obvious” to someone with insight into MySQL’s architecture.  I might also defer explaining a particular issue in depth to a dedicated post on that topic.  Nevertheless, please do feel free to ask “so why does this work in this way”, or other similar questions – that’ll help me check my logic trail and tune to the reader audience, as well as help create a clearer picture of the Aurora architecture.

Instead of using the binary log, Aurora replication ties into the storage layer.  It only supports InnoDB, and instead of doing disk reads/writes, the InnoDB I/O system talks to an Amazon storage API which delivers a shared/distributed storage, which can work across multiple availability zones (AZs).  Thus, a write on the master will appear on the storage system (which may or may not really be a filesystem).  Communication between AZs is fairly fast (only 2-3 ms extra overhead, relative to another server in the same AZ) so clustering databases or filesystems across AZs is entirely feasible, depending on the commit mechanism (a two-phase commit architecture would still be relatively slow).  We do multi-AZ clustering with Galera Cluster (Percona XtraDB Cluster or MariaDB Galera Cluster).  Going multi-AZ is a good idea that provides resilience beyond a single data centre.

So, imagine an individual instance in an Aurora setup as an EC2 (Amazon Elastic Computing) instance with MySQL using an SSD EBS (Amazon Elastic Block Storage) volume, where the InnoDB I/O threads interface more directly with the EBS API.  The actual architecture might be slightly different still (more on that in a later post), but this rough description helps set up a basic idea of what a node might look like.

Writes in MySQL

In a regular MySQL, on commit a few things happen:

  • the InnoDB log is written to and flushed,
  • the binary log is written to (and possibly flushed), and
  • the changed pages (data and indexes)  in the InnoDB buffer pool are marked dirty, so a background thread knows they need to be written back to disk (this does not need to happen immediately).  When a page is written to disk, normally it uses a “double-write” mechanism where first the original page is read and written to a scratch space, and then the new page is put in the original position.  Depending on the filesystem and underlying storage (spinning disk, or other storage with different block size from InnoDB page size) this may be required to be able to recover from write fails.

This does not translate into as many IOPS because, in practice, transaction commits are put together (for instance with MariaDB’s group commit) and thus many commits that happen in a short space of time effectively only use a few IOs for their log writes.  With Galera cluster, the local logs are written but not flushed, because the guaranteed durability is provided by other nodes in the cluster rather than local persistence of the logfile.
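
For reference, the server settings that drive this commit-time behaviour in stock MySQL/MariaDB can be inspected directly (a minimal sketch, assuming a local server and client credentials already in place):

# innodb_flush_log_at_trx_commit=1 flushes the InnoDB log at every commit,
# sync_binlog controls binary log flushes, and innodb_doublewrite covers
# the torn-page protection described above.
mysql -e "SHOW GLOBAL VARIABLES WHERE Variable_name IN
  ('innodb_flush_log_at_trx_commit','sync_binlog','innodb_doublewrite')"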

In Aurora, a commit has to send either the InnoDB log entries or the changed data pages to the storage layer; which one it is doesn’t particularly matter.  The storage layer has a “quorum set” mechanism to ensure that multiple nodes accept the new data.  This is similar to Galera’s “certification” mechanism that provides the “virtual synchrony”.  The Aurora “deep dive” talk claims that it requires many fewer IOPS for a commit; however, it appears they are comparing a worst-case plain MySQL scenario with an optimal Aurora environment.  Very marketing.

Aurora does not use the binary log, which does make one wonder about point-in-time recovery options. Of course, it is possible to recover to any point-in-time from an InnoDB snapshot + InnoDB transaction logs – this would require adding timestamps to the InnoDB transaction log format.

While it is noted that the InnoDB transaction log is also backed up to S3, it doesn’t appear to be used directly (so, only for recovery purposes then).  After all, any changed page needs to be communicated to the other instances, so essentially all pages are always flushed (no dirty pages).  When we look at the InnoDB stats in GLOBAL STATUS, we sometimes do see up to a couple of dozen dirty pages with Aurora, but their existence or non-existence doesn’t appear to have any correlation with user-created tables and data.

Where InnoDB gets its Speed

InnoDB rows and indexing

We all know that disk-access is slow.  In order for InnoDB to be fast, it is dependent on most active data being in the buffer pool.  InnoDB does not care for local filesystem buffers – something is either in persistent storage, or in the buffer pool.  In configurations, we prefer direct I/O so the system calls that do the filesystem I/O bypass the filesystem buffers and any related overhead.  When a query is executed, any required page that’s not yet in the buffer pool is requested to be loaded in the background. Naturally, this does slow down queries, which is why we preferably want all necessary pages to already be in memory.  This applies for any type of query.  In InnoDB, all data/indexes are structured in B+trees, so an INSERT has to be merged into a page and possibly causes pages to be split and other items shuffled so as to “re-balance” the tree.  Similarly, a delete may cause page merges and a re-balancing operation.  This way the depth of the tree is controlled, so that even for a billion rows you would generally see a depth of no more than 6-8 pages.  That is, retrieving any row would only require a maximum of 6-8 page reads (potentially from disk).

I’m telling you all this because, while most replication and clustering mechanisms essentially work with the buffer pool, Aurora replication appears to work against it.  As I mentioned: choices have consequences (trade-offs).  So, what happens?

Aurora Replication

When you do a write in MySQL which gets replicated through classic asynchronous replication, the slaves or replica nodes apply the row changes in memory.  This means that all the data (which is stored with the PRIMARY KEY, in InnoDB) as well as any other indexes are updated, the InnoDB log is written, and the pages marked as dirty.  It’s very similar to what happens on the writer/master system, and thus the end result in memory is virtually identical.  While Galera’s cluster replication operates differently from the asynchronous mechanism shown in the diagram, the resulting caching (which pages are in memory) ends up similar.

MySQL Replication architecture

Not so with Aurora.  Aurora replicates in the storage layer, so all pages are updated in the storage system but not in the in-memory InnoDB buffer pool.  A secondary notification system between the instances ensures that cached InnoDB pages are invalidated.  When you next run a query that needs any of those no-longer-valid cached pages, they will have to be re-read from the storage system.  You can see a representation of this in the diagram below, indicating invalidated cache pages in different indexes; as shown, for INSERT operations, you’re likely to have pages higher up in the tree and one sideways page change as well, because of the B+tree re-balancing.

Aurora replicated insert

The Chilling Effect

We can tell the replica is reading from storage, because the same query is much slower than before we did the insert from the master instance.  Note: this wasn’t a matter of timing. Even if we waited slightly longer (to enable a possible background thread to refresh the pages) the post-insert query was just as slow.

Interestingly, the invalidation process does not actually remove the pages from the buffer pool (that is, the # of pages in the buffer pool does not go down); however, the # of page reads does not go up either when a page is clearly re-read.  Remember though that a status variable is just that: it has to be updated to be visible, and this simply means that the new functions Amazon implemented don’t bother updating these status variables.  Accidental omission or purposeful obscurity?  Can’t say.  I will say that it’s very annoying when server statistics don’t reflect what’s actually going on, as it makes the stats (and their analysis) meaningless.  In this case, the picture looks better than it is.

With each Aurora write (insert/update/delete), the in-memory buffer pool on replicas is “chilled”.

Unfortunately, it’s not even just the one query on the replica that gets affected after a write. The primary key as well as the secondary indexes get chilled. If the initial query uses one particular secondary index, that index and the primary key will get warmed up again (at the cost of multiple storage system read operations), however the other secondary indexes are still chattering their teeth.

Being Fast on the Web

In web applications (whether websites or web-services for mobile apps), typically the most recently added data is the most likely to be read again soon.  This is why InnoDB’s buffer pool is normally very effective: frequently accessed pages remain in memory, while lesser used ones “age” and eventually get tossed out to make way for new pages.

Having caches cleared by a write slows things down.  In the MySQL space, the fairly simple query cache is a good example.  Whenever you write to table A, any cached SELECTs that access table A are cleared out of the cache.  Regardless of whether the application is read-intensive, having regular writes makes the query cache useless and we turn it off in those cases.  Oracle has already deprecated the “good old” query cache (which was introduced in MySQL 4.0 in the early 2000s) and soon its code will be completely removed.
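
If you still run a MySQL/MariaDB version that ships the query cache, checking it and switching it off looks roughly like this (a hedged sketch; to make it permanent, set query_cache_type=0 and query_cache_size=0 in my.cnf):

# How big is the cache, and how often is it being hit and invalidated?
mysql -e "SHOW GLOBAL VARIABLES LIKE 'query_cache%'; SHOW GLOBAL STATUS LIKE 'Qcache%'"
# Turn it off at runtime.
mysql -e "SET GLOBAL query_cache_type = OFF; SET GLOBAL query_cache_size = 0"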


With InnoDB, you’d generally have an AUTO_INCREMENT PRIMARY KEY, and thus newly inserted rows are sequenced to the outer end of the B+tree.  This also means that the next inserted row often ends up in the same page, again invalidating that recently written page on the replicas and slowing down reads of any of the rows it contained.

For secondary indexes, the effect is obviously scattered although if the indexed column is temporal (time-based), it will be similarly affected to the PRIMARY KEY.

How much all of this slows things down will very much depend on your application’s DB access profile.  The read/write ratio will matter little; what matters is whether individual tables are written to fairly frequently.  If they are, SELECT queries on those tables made on replicas will suffer from the chill.

Aurora uses SSD EBS so of course the storage access is pretty fast.  However, memory is always faster, and we know that that’s important for web application performance.  And we can use similarly fast SSD storage on EC2 or another hosting provider, with mature scaling technologies such as Galera (or even regular asynchronous multi-threaded replication) that don’t give your caches the chills.

by Arjen Lentz at February 20, 2018 12:40 AM

Peter Zaitsev

Archiving MySQL Tables in ClickHouse

In this blog post, I will talk about archiving MySQL tables in ClickHouse for storage and analytics.

Why Archive?

Hard drives are cheap nowadays, but storing lots of data in MySQL is not practical and can cause all sorts of performance bottlenecks. To name just a few issues:

  1. The larger the table and index, the slower the performance of all operations (both writes and reads)
  2. Backup and restore for terabytes of data is more challenging, and if we need to have redundancy (replication slave, clustering, etc.) we will have to store all the data N times

The answer is archiving old data. Archiving does not necessarily mean that the data will be permanently removed. Instead, the archived data can be placed into long-term storage (i.e., AWS S3) or loaded into a special purpose database that is optimized for storage (with compression) and reporting. The data is then available.

Actually, there are multiple use cases:

  • Sometimes the data just needs to be stored (i.e., for regulatory purposes) but does not have to be readily available (it’s not “customer facing” data)
  • The data might be useful for debugging or investigation (i.e., application or access logs)
  • In some cases, the data needs to be available for the customer (i.e., historical reports or bank transactions for the last six years)

In all of those cases, we can move the older data away from MySQL and load it into a “big data” solution. Even if the data needs to be available, we can still move it from the main MySQL server to another system. In this blog post, I will look at archiving MySQL tables in ClickHouse for long-term storage and real-time queries.

How To Archive?

Let’s say we have a 650G table that stores the history of all transactions, and we want to start archiving it. How can we approach this?

First, we will need to split this table into “old” and “new”. I assume that the table is not partitioned (partitioned tables are much easier to deal with). For example, if we have data from 2008 (ten years worth) but only need to store data from the last two months in the main MySQL environment, then deleting the old data would be challenging. So instead of deleting 99% of the data from a huge table, we can create a new table and load the newer data into that. Then rename (swap) the tables. The process might look like this:

  1. CREATE TABLE transactions_new LIKE transactions
  2. INSERT INTO transactions_new SELECT * FROM transactions WHERE trx_date > now() - interval 2 month
  3. RENAME TABLE transactions TO transactions_old, transactions_new TO transactions
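
Spelled out as a runnable snippet (a sketch only – the database name below and the two-month cut-off are assumptions carried over from the example above):

mysql mydb <<'SQL'
CREATE TABLE transactions_new LIKE transactions;
-- Copy only the data we want to keep in the "hot" table.
INSERT INTO transactions_new
  SELECT * FROM transactions
  WHERE trx_date > NOW() - INTERVAL 2 MONTH;
-- Atomic swap: applications keep using the name "transactions".
RENAME TABLE transactions TO transactions_old,
             transactions_new TO transactions;
SQL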

Second, we need to move the transactions_old into ClickHouse. This is straightforward — we can pipe data from MySQL to ClickHouse directly. To demonstrate I will use the Wikipedia:Statistics project (a real log of all requests to Wikipedia pages).

Create a table in ClickHouse:

CREATE TABLE wikistats
(
    id bigint,
    dt DateTime,
    project String,
    subproject String,
    path String,
    hits UInt64,
    size UInt64
)
ENGINE = MergeTree
PARTITION BY toYYYYMM(dt)
ORDER BY dt
0 rows in set. Elapsed: 0.010 sec.

Please note that I’m using the new ClickHouse custom partitioning. It does not require that you create a separate date column to map the table in MySQL to the same table structure in ClickHouse.

Now I can “pipe” data directly from MySQL to ClickHouse:

mysql --quick -h localhost wikistats -NBe \
"SELECT concat(id,',\"',dt,'\",\"',project,'\",\"',subproject,'\",\"',path,'\",',hits,',',size) FROM wikistats" |
clickhouse-client -d wikistats --query="INSERT INTO wikistats FORMAT CSV"

Third, we need to set up a constant archiving process so that the data is removed from MySQL and transferred to ClickHouse. To do that we can use the “pt-archiver” tool (part of Percona Toolkit). In this case, we can first archive to a file and then load that file into ClickHouse. Here is an example:

Remove data from MySQL and load to a file (tsv):

pt-archiver --source h=localhost,D=wikistats,t=wikistats,i=dt --where "dt <= '2018-01-01 0:00:00'"  --file load_to_clickhouse.txt --bulk-delete --limit 100000 --progress=100000
TIME                ELAPSED   COUNT
2018-01-25T18:19:59       0       0
2018-01-25T18:20:08       8  100000
2018-01-25T18:20:17      18  200000
2018-01-25T18:20:26      27  300000
2018-01-25T18:20:36      36  400000
2018-01-25T18:20:45      45  500000
2018-01-25T18:20:54      54  600000
2018-01-25T18:21:03      64  700000
2018-01-25T18:21:13      73  800000
2018-01-25T18:21:23      83  900000
2018-01-25T18:21:32      93 1000000
2018-01-25T18:21:42     102 1100000

Load the file to ClickHouse:

cat load_to_clickhouse.txt | clickhouse-client -d wikistats --query="INSERT INTO wikistats FORMAT TSV"

The newer version of pt-archiver can use a CSV format as well:

pt-archiver --source h=localhost,D=wikitest,t=wikistats,i=dt --where "dt <= '2018-01-01 0:00:00'"  --file load_to_clickhouse.csv --output-format csv --bulk-delete --limit 10000 --progress=10000

How Much Faster Is It?

Actually, it is much faster in ClickHouse. Even queries that are based on index scans can be much slower in MySQL than in ClickHouse.

For example, in MySQL just counting the number of rows for one year can take 34 seconds (index scan):

mysql> select count(*) from wikistats where dt between '2017-01-01 00:00:00' and '2017-12-31 00:00:00';
| count(*)  |
| 103161991 |
1 row in set (34.82 sec)
mysql> explain select count(*) from wikistats where dt between '2017-01-01 00:00:00' and '2017-12-31 00:00:00'\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: wikistats
   partitions: NULL
         type: range
possible_keys: dt
          key: dt
      key_len: 6
          ref: NULL
         rows: 227206802
     filtered: 100.00
        Extra: Using where; Using index
1 row in set, 1 warning (0.00 sec)

In ClickHouse, it only takes 0.062 sec:

:) select count(*) from wikistats where dt between  toDateTime('2017-01-01 00:00:00') and  toDateTime('2017-12-31 00:00:00');
SELECT count(*)
FROM wikistats
WHERE (dt >= toDateTime('2017-01-01 00:00:00')) AND (dt <= toDateTime('2017-12-31 00:00:00'))
│ 103161991 │
1 rows in set. Elapsed: 0.062 sec. Processed 103.16 million rows, 412.65 MB (1.67 billion rows/s., 6.68 GB/s.)

Size on Disk

In my previous blog on comparing ClickHouse to Apache Spark to MariaDB, I also compared disk size. Usually, we can expect a 5x to 10x decrease in disk size in ClickHouse due to compression. Wikipedia:Statistics, for example, contains actual URIs, which can be quite large due to the article name/search phrase. This can be compressed very well. If we use only integers or use MD5 / SHA1 hashes instead of storing actual URIs, we can expect much smaller compression (i.e., 3x). Even with a 3x compression ratio, it is still pretty good as long-term storage.
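
To check the achieved compression on your own data, ClickHouse exposes part sizes in the system.parts table. A hedged example (the database name follows the wikistats example above):

clickhouse-client --query="
  SELECT table,
         formatReadableSize(sum(bytes)) AS size_on_disk,
         sum(rows) AS total_rows
  FROM system.parts
  WHERE active AND database = 'wikistats'
  GROUP BY table"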


As the data in MySQL keeps growing, the performance of all queries keeps decreasing. Typically, queries that originally took milliseconds can start taking seconds (or more), and making them fast again requires a lot of changes (application code, MySQL configuration, etc.).

The main goal of archiving the data is to increase performance (“make MySQL fast again”), decrease costs and improve ease of maintenance (backup/restore, cloning the replication slave, etc.). Archiving to ClickHouse allows you to preserve old data and make it available for reports.

by Alexander Rubin at February 20, 2018 12:05 AM

February 19, 2018

Peter Zaitsev

Percona Server for MySQL 5.7.21-20 Is Now Available

Percona announces the GA release of Percona Server for MySQL 5.7.21-20 on February 19, 2018. Download the latest version from the Percona web site or the Percona Software Repositories. You can also run Docker containers from the images in the Docker Hub repository.

Based on MySQL 5.7.21, including all the bug fixes in it, Percona Server for MySQL 5.7.21-20 is the current GA release in the Percona Server for MySQL 5.7 series. Percona provides completely open-source and free software.

New Features:
  • A new string variable version_suffix allows changing the suffix for the Percona Server version string returned by the read-only version variable. Also, version_comment was converted from a global read-only to a global read-write variable.
  • A new keyring_vault_timeout variable allows setting the number of seconds for the Vault server connection timeout. Bug fixed #298.
Bugs Fixed:
  • The mysqld startup script was unable to detect the jemalloc library location for preloading, which prevented starting Percona Server on systemd-based machines. Bugs fixed #3784 and #3791.
  • There was a problem with fulltext search, which could find a word with punctuation marks in natural language mode only, but not in boolean mode. Bugs fixed #258, #2501 (upstream #86164).
  • Build errors were present on FreeBSD (caused by fixing the bug #255 in Percona Server 5.6.38-83.0) and on MacOS (caused by fixing the bug #264 in Percona Server 5.7.20-19). Bugs fixed #2284 and #2286.
  • A bunch of fixes was introduced to remove GCC 7 compilation warnings for the Percona Server build. Bugs fixed #3780 (upstream #89420, #89421, and #89422).
  • A CMake error occurred during compilation with bundled zlib. Bug fixed #302.
  • A GCC 7 warning fix introduced a regression in Percona Server that led to a wrong SQL query being built to access the remote server when the Federated storage engine was used. Bug fixed #1134.
  • It was possible to enable encrypt_binlog with no binary or relay logging enabled. Bug fixed #287.
  • Long buffer wait times were occurring on busy servers in the case of the IMPORT TABLESPACE command. Bug fixed #276.
  • Server queries that contained JSON special characters and were logged by Audit Log Plugin in JSON format caused invalid output due to lack of escaping. Bug fixed #1115.
  • Percona Server now uses Travis CI for additional tests. Bug fixed #3777.

Other bugs fixed: #257, #264, #1090 (upstream #78048), #1109, #1127, #2204, #2414, #2415, #3767, #3794, and #3804 (upstream #89598).

 This release also contains fixes for the following CVE issues: CVE-2018-2565, CVE-2018-2573, CVE-2018-2576, CVE-2018-2583, CVE-2018-2586, CVE-2018-2590, CVE-2018-2612, CVE-2018-2600, CVE-2018-2622, CVE-2018-2640, CVE-2018-2645, CVE-2018-2646, CVE-2018-2647, CVE-2018-2665, CVE-2018-2667, CVE-2018-2668, CVE-2018-2696, CVE-2018-2703, CVE-2017-3737.
MyRocks Changes:
  • A new behavior makes Percona Server fail to restart on detected data corruption; the rocksdb_allow_to_start_after_corruption variable can be passed to mysqld as a command line parameter to switch off this restart failure.
  • A new cmake option ALLOW_NO_SSE42 was introduced to allow building MyRocks on hosts that do not support the SSE 4.2 instruction set, which makes MyRocks usable without FastCRC32-capable hardware. Bug fixed MYR-207.
  • rocksdb_bytes_per_sync  and rocksdb_wal_bytes_per_sync  variables were turned into dynamic ones.
  • rocksdb_flush_memtable_on_analyze variable has been removed.
  • rocksdb_concurrent_prepare is now deprecated, as it has been renamed in upstream to  rocksdb_two_write_queues.
  • rocksdb_row_lock_deadlocks and rocksdb_row_lock_wait_timeouts global status counters were added to track the number of deadlocks and the number of row lock wait timeouts.
  • Creating a table with a string indexed column using a non-binary collation now generates a warning about the inefficient collation instead of an error. Bug fixed MYR-223.
TokuDB Changes:
  • A memory leak was fixed in the PerconaFT library, caused by not destroying PFS key objects on shutdown. Bug fixed TDB-98.
  • A clang-format configuration was added to PerconaFT and TokuDB. Bug fixed TDB-104.
  • A data race was fixed in minicron utility of the PerconaFT. Bug fixed TDB-107.
  • Row count and cardinality decreased to zero after a long-running REPLACE load.

Other bugs fixed: TDB-48, TDB-78, TDB-93, and TDB-99.

The release notes for Percona Server for MySQL 5.7.21-20 are available in the online documentation. Please report any bugs on the project bug tracking system.

by Dmitriy Kostiuk at February 19, 2018 05:11 PM

February 16, 2018

Peter Zaitsev

Why ZFS Affects MySQL Performance


In this blog post, we’ll look at how ZFS affects MySQL performance when the two are used together.

ZFS and MySQL have a lot in common since they are both transactional software. Both have properties that, by default, favor consistency over performance. By doubling the complexity layers for getting committed data from the application to a persistent disk, we are logically doubling the amount of work within the whole system and reducing the output. From the ZFS layer, where does the bulk of the extra work really come from?

Consider the comparative test below from a bare metal server. It has a reasonably tuned config (discussed in a separate post; results and scripts here). These numbers are from sysbench tests on hardware with six SAS drives behind a RAID controller with a write-backed cache. Ext4 was configured as RAID10 softraid, while ZFS uses the same layout (three striped pairs of mirrored VDEVs).

There are a few obvious observations here, one being that the ZFS results have a high variance between the median and the 95th percentile. This indicates a regular sharp drop in performance. However, the most glaring thing is that with the write-only update-index workload, overall performance could drop to 50%:


Looking further into the IO metrics for the update-index tests (95th percentile from /proc/diskstats), ZFS’s behavior tells us a few more things.



  1. ZFS batches writes better, with minimal increases in latency with larger IO size per operation.
  2. ZFS reads are heavily scattered and random – the high response times and low read IOPs and throughput means significantly higher disk seeks.

If we focus on observation #2, there are a number of possible sources of random reads:

  • InnoDB pages that are not in the buffer pool
  • When ZFS records are updated, metadata also has to be read and updated

This means that for updates on cold InnoDB records, multiple random reads are involved that are not present with filesystems like ext4. While ZFS has some tunables for improving synchronous reads, tuning them can be touch and go when trying to fit specific workloads. For this reason, ZFS introduced the use of L2ARC, where faster drives are used to cache frequently accessed data and serve reads at lower latency.
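
Adding an L2ARC device is a one-line operation; a hedged sketch, where the pool name (tank) and the NVMe device path are placeholders:

# Attach a fast device as a level-2 ARC (read cache) for the pool.
zpool add tank cache /dev/nvme0n1
# Confirm it shows up under the "cache" section of the pool layout.
zpool status tank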

We’ll look more into the details of how ZFS affects MySQL, the tests above and the configuration behind them, and how we can further improve performance from here in upcoming posts.

by Jervin Real at February 16, 2018 10:43 PM

This Week in Data with Colin Charles 28: Percona Live, MongoDB Transactions and Spectre/Meltdown Rumble On

Join Percona Chief Evangelist Colin Charles as he covers happenings, gives pointers and provides musings on the open source database community.

In case you missed last week’s column, don’t forget to read the fairly lengthy FOSDEM MySQL & Friends DevRoom summary.

From a Percona Live Santa Clara 2018 standpoint, beyond the tutorials getting picked and scheduled, the talks have also been picked and scheduled (so you were very likely getting acceptance emails from the system by Tuesday). The rejections have not gone out yet but will follow soon. I expect the schedule to go live either today (end of week) or early next week. Cheapest tickets end March 4, so don’t wait to register!

Amazon Relational Database Service has had a lot of improvements in 2017, and the excellent summary from Jeff Barr is worth a read: Amazon Relational Database Service – Looking Back at 2017. Plenty of improvements for the MySQL, MariaDB Server, PostgreSQL and Aurora worlds.

Spectre/Meltdown and its impact are still being discovered. You need to read Brendan Gregg’s amazing post: KPTI/KAISER Meltdown Initial Performance Regressions. And if you visit Percona Live, you’ll see an amazing keynote from him too! Are you still using MyISAM? MyISAM and KPTI – Performance Implications From The Meltdown Fix suggests switching to Aria or InnoDB.

Probably the biggest news this week though? Transactions are coming to MongoDB 4.0. From the site, “MongoDB 4.0 will add support for multi-document transactions, making it the only database to combine the speed, flexibility, and power of the document model with ACID guarantees. Through snapshot isolation, transactions will provide a globally consistent view of data, and enforce all-or-nothing execution to maintain data integrity.”. You want to read the blog post, MongoDB Drops ACID (the title works if you’re an English native speaker, but maybe not quite if you aren’t). The summary diagram was a highlight for me because you can see the building blocks, plus future plans for MongoDB 4.2.


Link List

Upcoming appearances

  • SCALE16x – Pasadena, California, USA – March 8-11 2018
  • FOSSASIA 2018 – Singapore – March 22-25 2018


I look forward to feedback/tips via e-mail or on Twitter @bytebot.

by Colin Charles at February 16, 2018 02:12 PM

February 15, 2018

Peter Zaitsev

ProxySQL 1.4.5 and Updated proxysql-admin Tool Now in the Percona Repository

ProxySQL 1.4.5, released by ProxySQL, is now available for download in the Percona Repository along with an updated version of Percona’s proxysql-admin tool.

ProxySQL is a high-performance proxy, currently for MySQL and its forks (like Percona Server for MySQL and MariaDB). It acts as an intermediary for client requests seeking resources from the database. René Cannaò created ProxySQL for DBAs as a means of solving complex replication topology issues.

The ProxySQL 1.4.5 source and binary packages available from the Percona Repository include ProxySQL Admin – a tool developed by Percona to configure Percona XtraDB Cluster nodes into ProxySQL. Docker images for release 1.4.5 are available as well, and the original ProxySQL can be downloaded from the upstream project.

This release includes the following changes in ProxySQL Admin:

Usability improvements:

  • #PSQLADM-6: If the cluster node goes offline, the proxysql_node_monitor script now sets the node status as OFFLINE_HARD, and does not remove it from the ProxySQL database. Also, logging is consistent regardless of the cluster node online status.
  • #PSQLADM-30: Validation was added for the host priority file.
  • #PSQLADM-33: Added --proxysql-datadir option to run the proxysql-admin script with a custom ProxySQL data directory.
  • Also, BATS test suite was added for the proxysql-admin testing.

Bug fixes:

  • Fixed #PSQLADM-5: the PXC mode specified with proxysql-admin using the --mode parameter was not persistent.
  • Fixed #PSQLADM-8: high ProxySQL CPU load occurred when mysqld was hanging.

ProxySQL is available under the open source GPLv3 license.

by Dmitriy Kostiuk at February 15, 2018 08:15 PM

Troubleshooting MySQL Crashes Webinar: Q&A

In this blog, I will provide answers to the Q & A for the Troubleshooting MySQL Crashes webinar.

First, I want to thank everybody for attending our January 25, 2018, webinar. The recording and slides for the webinar are available here. Below is the list of your questions that I was unable to answer fully during the webinar.

Q: I have the 600 seconds “Long semaphore wait” assertion failure / crashing issue following DDL queries, sometimes on the master, sometimes just the slaves. Any hints for troubleshooting these? How can I understand what semaphore holding threads are doing?

A: These are the hardest errors to troubleshoot, especially because in some cases (such as long-running DDL statements) long semaphore waits can be expected and appropriate behavior. If you see long semaphore waits when performing DDL operations, it makes sense to consider using the pt-online-schema-change or gh-ost utilities. Also, check the list of supported online DDL operations in the MySQL User Reference Manual.
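
As a hedged illustration of the pt-online-schema-change route (the database, table and column names here are placeholders, not anything from the webinar):

pt-online-schema-change \
  --alter "ADD COLUMN archived_at DATETIME NULL" \
  D=appdb,t=transactions \
  --chunk-size 1000 \
  --dry-run   # switch to --execute once the dry run is clean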

But if you want to know how to analyze such messages, let’s check the output from page #17 in the slide deck used in the webinar:

2018-01-19T20:38:43.381127Z 0 [Warning] InnoDB: A long semaphore wait:
--Thread 139970010412800 has waited at line 3454 for 321.00 seconds the semaphore:
S-lock on RW-latch at 0x7f4dde2ea310 created in file line 1453
a writer (thread id 139965530261248) has reserved it in mode exclusive
number of readers 0, waiters flag 1, lock_word: fffffffff0000000
Last time read locked in file line 3454
Last time write locked in file /mnt/workspace/percona-server-5.7-binaries-release/label_exp/
debian-wheezy-x64/percona-server-5.7.14-8/storage/innobase/btr/ line 177
2018-01-19T20:38:43.381143Z 0 [Warning] InnoDB: A long semaphore wait:
--Thread 139965135804160 has waited at line 4196 for 321.00 seconds the semaphore:
S-lock on RW-latch at 0x7f4f257d33c0 created in file line 353
a writer (thread id 139965345621760) has reserved it in mode exclusive
number of readers 0, waiters flag 1, lock_word: 0
Last time read locked in file line 4196
Last time write locked in file ...

The line

--Thread 139970010412800 has waited at line 3454 for 321.00 seconds the semaphore:

Shows that some transaction was waiting for a semaphore. The code responsible for this wait is located on line 3454 of the file named in the warning. I received this crash when I ran Percona Server for MySQL version 5.7.14-8. Therefore, to check what this code is doing, I need to use the Percona Server 5.7.14-8 source code:

sveta@Thinkie:~/mysql_packages/percona-server-5.7.14-8$ vim storage/innobase/ibuf/
3454 btr_pcur_open(ibuf->index, ibuf_entry, PAGE_CUR_LE, mode, &pcur, &mtr);

A few lines above in the same file contain function definition and comment:

3334 /** Buffer an operation in the insert/delete buffer, instead of doing it
3335 directly to the disk page, if this is possible.
3336 @param[in] mode BTR_MODIFY_PREV or BTR_MODIFY_TREE
3337 @param[in] op operation type
3338 @param[in] no_counter TRUE=use 5.0.3 format; FALSE=allow delete
3339 buffering
3340 @param[in] entry index entry to insert
3341 @param[in] entry_size rec_get_converted_size(index, entry)
3342 @param[in,out] index index where to insert; must not be unique
3343 or clustered
3344 @param[in] page_id page id where to insert
3345 @param[in] page_size page size
3346 @param[in,out] thr query thread
3347 @return DB_SUCCESS, DB_STRONG_FAIL or other error */
3348 static MY_ATTRIBUTE((warn_unused_result))
3349 dberr_t
3350 ibuf_insert_low(
3351 ulint mode,
3352 ibuf_op_t op,
3353 ibool no_counter,
3354 const dtuple_t* entry,
3355 ulint entry_size,
3356 dict_index_t* index,
3357 const page_id_t& page_id,
3358 const page_size_t& page_size,
3359 que_thr_t* thr)
3360 {

The first line of the comment gives us an idea that InnoDB tries to insert data into change buffer.

Now, let’s check the next line from the error log file:

S-lock on RW-latch at 0x7f4dde2ea310 created in file line 1453
sveta@Thinkie:~/mysql_packages/percona-server-5.7.14-8$ vim storage/innobase/buf/
1446 /* If PFS_SKIP_BUFFER_MUTEX_RWLOCK is defined, skip registration
1447 of buffer block rwlock with performance schema.
1449 If PFS_GROUP_BUFFER_SYNC is defined, skip the registration
1450 since buffer block rwlock will be registered later in
1451 pfs_register_buffer_block(). */
1453 rw_lock_create(PFS_NOT_INSTRUMENTED, &block->lock, SYNC_LEVEL_VARYING);

And again let’s check what this function is doing:

1402 /********************************************************************//**
1403 Initializes a buffer control block when the buf_pool is created. */
1404 static
1405 void
1406 buf_block_init(

Even without knowledge of how InnoDB works internally, by reading only these comments I can guess that a thread waits for some global InnoDB lock when it tries to insert data into the change buffer. The solution for this issue could be either disabling the change buffer, limiting write concurrency, upgrading, or using a software solution that allows you to scale writes.
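
For the first of those options, the change buffer can be switched off at runtime (a minimal sketch; persist the setting in my.cnf if it turns out to help):

# Disable change buffering so secondary-index changes are applied to the pages directly.
mysql -e "SET GLOBAL innodb_change_buffering = 'none'"
mysql -e "SHOW GLOBAL VARIABLES LIKE 'innodb_change_buffering'"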

Q: For the page cleaner messages, when running app using replication we didn’t get them. After switching to PXC we started getting them. Something we should look at particular to PXC to help resolve this?

A: Page cleaner messages can be a symptom of starved IO activity. You need to compare the Percona XtraDB Cluster (PXC) and standalone server installations and check how exactly the write load increased.

Q: Hi, I have one question: we have a query joining on certain fields that is causing system locks and high CPU alerts. Can you please suggest how we can make it work? Can you please send the answer in text? I missed some information.

A: If you are joining on such fields you most likely don’t use indexes. This means that InnoDB has to perform a full table scan. That increases IO and CPU activity by itself, but also increases the number of locks that InnoDB has to set to resolve the query. Even if you have partial (prefix) indexes on those columns, mysqld has to compare the full values for the equation, so it cannot use the index alone to resolve the join. It is a best practice to avoid joins of this kind. You can use surrogate integer keys, for example.

Q: Hi, please notice that “MySQL server has gone away” is the worst one, in my opinion, and there was no mention of it… can you share some tips on this? Thank you. Both the MySQL from Oracle and the Percona error log do not help with that, by the way…

A: The “MySQL server has gone away” error may be the result of a crash. In this case, you need to handle it like any other crash symptom. But in most cases, this is a symptom of network failure. Unfortunately, MySQL doesn’t have much information about why connection failures happen, probably because, from mysqld’s point of view, a problematic network only means that the client unexpectedly disconnected after a timeout, and the client still waiting for a response receives “MySQL server has gone away”. I discussed these kinds of errors in my “Troubleshooting hardware resource usage” webinar. A good practice for situations where you see this kind of error often is: don’t leave idle connections open for a long time.

Q: I see that a lot of hard investigative work goes into finding out what is going wrong… is there a plan on the development roadmap to improve the error log output messages? If you can comment on that…

A: Percona Engineering does a lot for better diagnostics. For example, Percona Server for MySQL has an extended slow log file format, and Percona Server for MySQL 5.7.20 introduced a new variable that allows logging information about all InnoDB lock wait timeout errors (see the manual). More importantly, it logs not only the blocked transaction, but also the locking transaction. This feature was requested at lp:1657737 for one of our Percona Support customers and is now implemented.

Oracle MySQL Engineering team also does a lot for better error logging. The start of these improvements happened in version 5.7.2, when variable log_error_verbosity was introduced. Version 8.0.4 added much better tuning control. You can read about it in the Release Notes.

Q: Hello, do you use strace to find exactly which table has problems when there is no clear information in the MySQL error log?

A: I am not a big fan of strace when debugging mysqld crashes, but Percona Support certainly uses this tool. I myself prefer to work with strace when debugging client issues, such as trying to identify why Percona XtraBackup behaves incorrectly.

Thanks everybody for attending the webinar. You can find the slides and recording of the webinar at the Troubleshooting MySQL Crashes web page.

by Sveta Smirnova at February 15, 2018 07:56 PM

February 14, 2018

Peter Zaitsev

Update on Percona Platform Lifecycle for Ubuntu “Stable” Versions

This blog post highlights changes to the Percona Platform Lifecycle for Ubuntu “Stable” Versions.

We have recently made some changes to our Percona Platform and Software Lifecycle policy in an effort to more strongly align with upstream Linux distributions. As part of this, we’ve set our timeframe for providing supported builds for Ubuntu “Stable” (non-LTS) releases to nine (9) months. This matches the current Ubuntu distribution upstream policy.

In the future, we will continue to shift as necessary to match the upstream policy specified by Canonical. Along with this, as we did with Debian 9 before, we will only produce 64-bit builds for this platform going forward. It has been our intention for some time to slowly phase out 32-bit builds, as they are rarely downloaded and largely unnecessary in contemporary times.

If you have any questions or concerns, please feel free to contact Percona Support or post on our Community Forums.

by Tyler Duzan at February 14, 2018 10:56 PM