Planet MariaDB

December 14, 2017

MariaDB AB

MariaDB ColumnStore Data Redundancy – A Look Under the Hood


In this blog post, we take a close look at MariaDB ColumnStore data redundancy, a new feature of MariaDB AX. This feature enables you to have highly available storage and automated PM failover when using local disk storage.

MariaDB ColumnStore data redundancy leverages GlusterFS, an open source, distributed file system maintained by Red Hat that provides continued access to data and is capable of scaling to very large amounts of data. To enable data redundancy, you must install and enable GlusterFS prior to running postConfigure. For more information on this topic, refer to Preparing for MariaDB ColumnStore Installation - 1.1.X. Failover is configured automatically by MariaDB ColumnStore, so that if a physical server experiences a service interruption, data is still accessible from another PM node.

During postConfigure you are prompted to enter the number of redundant copies for each DBRoot:

Enter Number of Copies [2-N] (2) >

N = number of PMs (the actual number is displayed by postConfigure).

On a multi-node install with internal storage, the DBRoots are tied directly to a single PM.



On a multi-node install with data redundancy, replicated GlusterFS volumes are created for each DBRoot. To users on the outside, this appears the same as above. Under the hood, a DBRoot is now a Gluster volume, where a Gluster volume is a collection of Gluster bricks that map to directories on the local file system, located here:

/usr/local/mariadb/columnstore/gluster/brick(n) (Default). 

This directory contains the subdirectories brick1 through brick[n], where n is the number of copies configured. Note: bricks are numbered sequentially on each PM as they are created by MariaDB ColumnStore; the numbers are not related to each other across PMs or to DBRoot IDs.


A Three-PM Installation with Data Redundancy Copies = 2


In mcsadmin getStorageConfig this is displayed in text form, like this:

Data Redundant Configuration

Copies Per DBroot = 2
DBRoot #1 has copies on PMs = 1 2 
DBRoot #2 has copies on PMs = 2 3 
DBRoot #3 has copies on PMs = 1 3 
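The placement above follows a simple rotation. As a quick illustration, here is a hypothetical sketch (this is not ColumnStore's documented assignment algorithm, just one simple scheme that happens to reproduce the getStorageConfig output above):

```python
# Hypothetical copy-placement sketch: copy j of DBRoot i lands on
# PM ((i - 1 + j) mod n_pms) + 1. Illustrative only, not ColumnStore code.
def place_copies(n_pms, copies):
    """Return, for each DBRoot, the sorted list of PMs holding a copy."""
    return {
        dbroot: sorted({(dbroot - 1 + j) % n_pms + 1 for j in range(copies)})
        for dbroot in range(1, n_pms + 1)
    }

print(place_copies(3, 2))  # {1: [1, 2], 2: [2, 3], 3: [1, 3]}
```

With copies equal to the number of PMs, every DBRoot ends up on every PM.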

The number of copies can be increased as high as the number of PMs. For a three-PM system, that would look like this:

A Three-PM Installation with Data Redundancy Copies = 3


It is important to note that as the number of copies increases, the network resources needed to distribute redundant data between PMs also increase. The number of copies should be kept as low as your data-redundancy requirements allow. Alternatively, if the hardware allows, a dedicated network can be configured during installation with postConfigure to help offload Gluster network traffic.

MariaDB ColumnStore assigns DBRoots to a PM by using GlusterFS to mount each DBRoot to its associated data directory, where it is used as normal:


mount -t glusterfs PM1:/dbroot1 /usr/local/mariadb/columnstore/data1
mount -t glusterfs PM2:/dbroot2 /usr/local/mariadb/columnstore/data2
mount -t glusterfs PM3:/dbroot3 /usr/local/mariadb/columnstore/data3

At this point, when a change is made to any file in a data(n) directory, it is replicated to the volume's bricks. Only the assigned bricks are mounted as the logical DBRoots. The unassigned bricks are standby copies waiting for a failover event.


Three-PM Data Redundancy Copies = 2


A failover occurs when a service interruption is detected on a PM. In a normal local-disk installation, data stored on the DBRoot for that module would be inaccessible. With data redundancy, only a small interruption occurs while the DBRoot is reassigned to the secondary brick.

In our example system, PM #3 has lost power. PM #1 is assigned DBRoot3 along with DBRoot1, since it has been maintaining the replica brick for DBRoot3. PM #2 sees no change.
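The reassignment logic can be sketched as follows (an illustrative model, not ColumnStore code: each DBRoot whose primary PM fails is remounted on a surviving PM that holds a replica brick):

```python
def failover(placement, primary, failed_pm):
    """Reassign DBRoots away from a failed PM to a surviving replica holder."""
    new_primary = dict(primary)
    for dbroot, pm in primary.items():
        if pm == failed_pm:
            # Any surviving PM listed for this DBRoot holds a replica brick.
            survivors = [p for p in placement[dbroot] if p != failed_pm]
            new_primary[dbroot] = survivors[0]
    return new_primary

# Three PMs, two copies, as in the example: PM #3 loses power.
placement = {1: [1, 2], 2: [2, 3], 3: [1, 3]}
print(failover(placement, {1: 1, 2: 2, 3: 3}, failed_pm=3))  # {1: 1, 2: 2, 3: 1}
```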


Three-PM Data Redundancy Copies = 2 & Failure of PM #3


When PM #3 returns, data changes for DBRoot3 and DBRoot2 are synced across the bricks of their volumes by GlusterFS. Once PM #3 is operational again, DBRoot3 is unmounted from PM #1 and returned to PM #3.


Three-PM Data Redundancy Copies = 2 & PM #3 Recovered


This is only a simple example meant to illustrate how MariaDB ColumnStore with data redundancy leverages GlusterFS to provide a simple and effective way to keep your data accessible through service interruptions. 

We are excited to offer data redundancy as part of MariaDB ColumnStore 1.1, which is available for download as part of MariaDB AX, an enterprise open source solution for modern data analytics and data warehousing.


by Ben Thompson at December 14, 2017 05:48 PM

December 13, 2017

MariaDB AB

Atomic Compound Statements


Recently, we had a discussion about a hypothetical feature, "START TRANSACTION ON ERROR ROLLBACK", that our users would find very useful. It would allow sending several commands to the server in one batch (a single network packet), and let the server handle the errors. This would combine efficient network use, and atomic execution. It turns out that it is already possible to do this with MariaDB, albeit with a slightly different syntax.

To execute several statements, stmt1; .. stmtN, and roll back on error, we can use MariaDB Server 10.1's compound statement. Put a transaction inside it, combine it with an EXIT HANDLER, and here you are:
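For example (the two INSERTs stand in for stmt1 ... stmtN, and the table name is illustrative):

```sql
DELIMITER $$
BEGIN NOT ATOMIC
  DECLARE EXIT HANDLER FOR SQLEXCEPTION
  BEGIN
    ROLLBACK;
    RESIGNAL;  -- propagate the original error to the client
  END;
  START TRANSACTION;
    INSERT INTO t1 VALUES (1);
    INSERT INTO t1 VALUES (2);
  COMMIT;
END $$
DELIMITER ;
```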



The statements are executed in a single transaction, which ends with a COMMIT, unless there is an exception, in which case the EXIT HANDLER takes care of ROLLBACK on any error and propagates the original error with RESIGNAL. QED.

The above is quite similar to what the ANSI-standard BEGIN ATOMIC would be able to do, if it were implemented. It is not yet, but for the use case mentioned, BEGIN NOT ATOMIC can already be helpful.

For illustration, here is a comparison of a conventional "atomic batch" implementation in Java vs using compounds.

Conventional example

Some Java boilerplate and lots of communication between client and server. On the positive side, it is portable JDBC code.

void atomicBatch(Connection con, Statement stmt, String[] commands) throws SQLException {
    try {
        con.setAutoCommit(false);   // begin the transaction
        for (String command : commands)
            stmt.execute(command);
        con.commit();
    } catch (SQLException e) {
        con.rollback();             // undo the partial batch
        throw e;
    } finally {
        con.setAutoCommit(true);
    }
}

Compound statement example

Shorter, with more efficient network communication. This does not work with MySQL (it has no compound statement support).


void atomicBatch(Statement stmt, String[] commands) throws SQLException {
    // ATOMIC_BATCH_PREFIX/SUFFIX hold the opening and closing parts of the
    // compound statement described above (BEGIN NOT ATOMIC ... / COMMIT; END)
    stmt.execute(ATOMIC_BATCH_PREFIX + String.join(";", Arrays.asList(commands)) + ATOMIC_BATCH_SUFFIX);
}



by vaintroub_g at December 13, 2017 09:39 PM

5 Simple Steps to Get Started with MariaDB and Tableau


For organizations in today’s data-driven economy, easy and fast access to data and insight into data are crucial to stay competitive. In order to leverage the insight from data, companies need easy-to-use and scalable data platforms and visual reporting of the data through front-end business intelligence tools.

MariaDB offers a transactional solution with MariaDB TX and a high-performance analytics solution with MariaDB AX. Both MariaDB TX and MariaDB AX provide an ANSI SQL interface for end users and BI tools. Tableau’s advanced data-visualization tool can interface directly with MariaDB TX and MariaDB AX so BI professionals can confidently use the very same tooling and interface with the same semantics and behavior regardless of whether the data resides in a transactional system or data warehouse. Recently, the Tableau integration with MariaDB was certified.

In this blog post, we show how to connect Tableau to MariaDB TX or MariaDB AX and how to create advanced visualizations in Tableau using data in MariaDB.

Step 1: Download the right packages and load your data into MariaDB.

  • Download and install MariaDB TX (or MariaDB AX).

  • Download and install Tableau Desktop.

  • Next, populate data in MariaDB. For this exercise, we populated MariaDB Server with the TPC-H dataset.


Step 2: Connect from the Tableau desktop to MariaDB.


Step 3: Understand the relationship of the database tables through Tableau.


Step 4: Create basic visualizations in Tableau against the data in MariaDB.


Step 5: Create advanced visualizations in Tableau against the data in MariaDB.

As shown, you can leverage your data assets with the combination of powerful analytic capabilities of MariaDB AX and transactional data in MariaDB TX with the advanced data visualization from Tableau software.

Please leave a comment or contact us for any questions.




by Dipti Joshi at December 13, 2017 04:41 AM

Billy Mobile Gets Fast Data-Driven Insights With MariaDB ColumnStore and Tableau


This is a guest post by Geoff Cleaves, Business Intelligence Manager at Billy Mobile.

At startups, we acutely feel the technology challenges of running a business. The consequences of implementing a technology that doesn’t deliver the anticipated performance, security or ROI can be disastrous. Unlike long-established companies, startups don’t have the luxury of making time-intensive customizations or waiting for vendor-delivered fixes or upgrades.  

About Billy Mobile

Billy Mobile is a fast growing mobile advertising startup. With headquarters in Barcelona and offices in Singapore, Billy Mobile operates exclusively in the mobile arena of the ad tech industry. Billy Mobile is an ad exchange that programmatically connects advertisers with high traffic publishers in a mobile environment.

Our mission is to optimize advertisers’ ROI and to maximize publishers’ income. Billy Mobile’s competitive advantage is located in the optimization and real-time capacity of its technological platform, Active Bx, an automated, in-house developed and exclusively used algorithm capable of creating predictive models to decide when, where and to whom an advertisement will be shown, thereby attaining the highest performance.

The Challenge

Billy Mobile serves approximately 400 million advertisements a day around the world – primarily in Asia and Europe. With this volume, we needed responsive business intelligence (BI) on more than half a billion events a day to glean key insights into the health of our business and to improve customer service.

We needed the right solution that gave us the speed and insight while also allowing for scalability.

We knew we wanted an open source, columnar database. Also, the ability to build out and run queries in parallel for high performance and fast results was extremely important. Unfortunately, many of the open source solutions we researched didn't give us these features while also giving us the speed we needed.

Lastly, we needed a database technology that could integrate, out of the box, with our BI solution, Tableau.

The Solution

Several months ago we deployed MariaDB ColumnStore and immediately saw results.

With MariaDB ColumnStore, Billy Mobile is able to easily aggregate and continually update approximately 10 million rows of data per hour. MariaDB ColumnStore allows Billy Mobile to accomplish something we never had before - fast, interactive analysis of big data. Using MariaDB ColumnStore with Tableau, we can explore, drill down and filter data, resulting in valuable insights, in less than 10 seconds.

I’m grateful to MariaDB for saving us so much time and giving us a quick ROI. I don’t know of any other open source column-based database with similar parallelism that you can layer third-party BI software on top of.

Learn More

Billy Mobile was able to get up and running with MariaDB ColumnStore and Tableau quickly and easily and saw immediate results. Learn more about MariaDB ColumnStore, a core component of MariaDB AX, our enterprise open source solution for modern data analytics and data warehousing. Get started today–download MariaDB AX.



by guest at December 13, 2017 04:30 AM

December 12, 2017

Oli Sennhauser

Galera Cluster and Antivirus Scanner on Linux

Today we had to investigate a very strange behaviour of IST and SST on a MariaDB Galera Cluster.

The symptom was that some Galera Cluster nodes took a very long time to start: up to 7 minutes. So the customer concluded that the Galera Cluster node was doing an SST instead of an IST, and asked why the SST happened.

It has to be mentioned here that the MariaDB error log is very confusing about whether it is an SST or an IST. So the customer was confused and concluded that MariaDB Galera Cluster was doing an SST instead of an IST.

Further confusion arose because this behaviour was not consistent across all 3 nodes, nor across the 3 stages: production, test and integration.

First we had to clarify whether the Galera node was doing an IST or an SST, to exclude problems with the Galera Cache or even bugs in MariaDB Galera Cluster. For this we did some node restarts, with and without forcing SST.

As a Galera Cluster operator, you must be able to determine from the MariaDB error log which of the two state transfers happened:

MariaDB Error Log with IST on Joiner

2017-12-12 22:29:33 140158145914624 [Note] WSREP: Shifting OPEN -> PRIMARY (TO: 204013)
2017-12-12 22:29:33 140158426741504 [Note] WSREP: State transfer required: 
        Group state: e2fbbca5-df26-11e7-8ee2-bb61f8ff3774:204013
        Local state: e2fbbca5-df26-11e7-8ee2-bb61f8ff3774:201439
2017-12-12 22:29:33 140158426741504 [Note] WSREP: New cluster view: global state: e2fbbca5-df26-11e7-8ee2-bb61f8ff3774:204013, view# 7: Primary, number of nodes: 3, my index: 2, protocol version 3
2017-12-12 22:29:33 140158426741504 [Warning] WSREP: Gap in state sequence. Need state transfer.
2017-12-12 22:29:33 140158116558592 [Note] WSREP: Running: 'wsrep_sst_rsync --role 'joiner' --address '' --datadir '/home/mysql/database/magal-101-b/data/'  --defaults-file '/home/mysql/database/magal-101-b/etc/my.cnf'  --parent '16426' --binlog '/home/mysql/database/magal-101-b/binlog/laptop4_magal-101-b__binlog' '
2017-12-12 22:29:33 140158426741504 [Note] WSREP: Prepared SST request: rsync|
2017-12-12 22:29:33 140158426741504 [Note] WSREP: REPL Protocols: 7 (3, 2)
2017-12-12 22:29:33 140158426741504 [Note] WSREP: Assign initial position for certification: 204013, protocol version: 3
2017-12-12 22:29:33 140158203852544 [Note] WSREP: Service thread queue flushed.
2017-12-12 22:29:33 140158426741504 [Note] WSREP: IST receiver addr using tcp://
2017-12-12 22:29:33 140158426741504 [Note] WSREP: Prepared IST receiver, listening at: tcp://
2017-12-12 22:29:33 140158145914624 [Note] WSREP: Member 2.0 (Node B) requested state transfer from 'Node C'. Selected 1.0 (Node C)(SYNCED) as donor.
2017-12-12 22:29:33 140158145914624 [Note] WSREP: Shifting PRIMARY -> JOINER (TO: 204050)
2017-12-12 22:29:33 140158426741504 [Note] WSREP: Requesting state transfer: success, donor: 1
2017-12-12 22:29:33 140158426741504 [Note] WSREP: GCache history reset: old(e2fbbca5-df26-11e7-8ee2-bb61f8ff3774:0) -> new(e2fbbca5-df26-11e7-8ee2-bb61f8ff3774:204013)
2017-12-12 22:29:33 140158145914624 [Note] WSREP: 1.0 (Node C): State transfer to 2.0 (Node B) complete.
2017-12-12 22:29:33 140158145914624 [Note] WSREP: Member 1.0 (Node C) synced with group.
WSREP_SST: [INFO] Joiner cleanup. rsync PID: 16663 (20171212 22:29:34.474)
WSREP_SST: [INFO] Joiner cleanup done. (20171212 22:29:34.980)
2017-12-12 22:29:34 140158427056064 [Note] WSREP: SST complete, seqno: 201439
2017-12-12 22:29:35 140158427056064 [Note] WSREP: Signalling provider to continue.
2017-12-12 22:29:35 140158427056064 [Note] WSREP: SST received: e2fbbca5-df26-11e7-8ee2-bb61f8ff3774:201439
2017-12-12 22:29:35 140158426741504 [Note] WSREP: Receiving IST: 2574 writesets, seqnos 201439-204013
2017-12-12 22:29:35 140158426741504 [Note] WSREP: IST received: e2fbbca5-df26-11e7-8ee2-bb61f8ff3774:204013
2017-12-12 22:29:35 140158145914624 [Note] WSREP: 2.0 (Node B): State transfer from 1.0 (Node C) complete.
2017-12-12 22:29:35 140158145914624 [Note] WSREP: Shifting JOINER -> JOINED (TO: 204534)
2017-12-12 22:29:35 140158145914624 [Note] WSREP: Member 2.0 (Node B) synced with group.
2017-12-12 22:29:35 140158145914624 [Note] WSREP: Shifting JOINED -> SYNCED (TO: 204535)
2017-12-12 22:29:35 140158426741504 [Note] WSREP: Synchronized with group, ready for connections

MariaDB Error Log with SST on Joiner

2017-12-12 22:32:15 139817123833600 [Note] WSREP: Shifting OPEN -> PRIMARY (TO: 239097)
2017-12-12 22:32:15 139817401395968 [Note] WSREP: State transfer required: 
        Group state: e2fbbca5-df26-11e7-8ee2-bb61f8ff3774:239097
        Local state: 00000000-0000-0000-0000-000000000000:-1
2017-12-12 22:32:15 139817401395968 [Note] WSREP: New cluster view: global state: e2fbbca5-df26-11e7-8ee2-bb61f8ff3774:239097, view# 9: Primary, number of nodes: 3, my index: 2, protocol version 3
2017-12-12 22:32:15 139817401395968 [Warning] WSREP: Gap in state sequence. Need state transfer.
2017-12-12 22:32:15 139817094477568 [Note] WSREP: Running: 'wsrep_sst_rsync --role 'joiner' --address '' --datadir '/home/mysql/database/magal-101-b/data/'  --defaults-file '/home/mysql/database/magal-101-b/etc/my.cnf'  --parent '25291' --binlog '/home/mysql/database/magal-101-b/binlog/laptop4_magal-101-b__binlog' '
2017-12-12 22:32:15 139817401395968 [Note] WSREP: Prepared SST request: rsync|
2017-12-12 22:32:15 139817401395968 [Note] WSREP: REPL Protocols: 7 (3, 2)
2017-12-12 22:32:15 139817401395968 [Note] WSREP: Assign initial position for certification: 239097, protocol version: 3
2017-12-12 22:32:15 139817178507008 [Note] WSREP: Service thread queue flushed.
2017-12-12 22:32:15 139817401395968 [Warning] WSREP: Failed to prepare for incremental state transfer: Local state UUID (00000000-0000-0000-0000-000000000000) does not match group state UUID (e2fbbca5-df26-11e7-8ee2-bb61f8ff3774): 1 (Operation not permitted)
         at galera/src/replicator_str.cpp:prepare_for_IST():482. IST will be unavailable.
2017-12-12 22:32:15 139817123833600 [Note] WSREP: Member 2.0 (Node B) requested state transfer from 'Node C'. Selected 1.0 (Node C)(SYNCED) as donor.
2017-12-12 22:32:15 139817123833600 [Note] WSREP: Shifting PRIMARY -> JOINER (TO: 239136)
2017-12-12 22:32:15 139817401395968 [Note] WSREP: Requesting state transfer: success, donor: 1
2017-12-12 22:32:15 139817401395968 [Note] WSREP: GCache history reset: old(00000000-0000-0000-0000-000000000000:0) -> new(e2fbbca5-df26-11e7-8ee2-bb61f8ff3774:239097)
2017-12-12 22:32:17 139817123833600 [Note] WSREP: 1.0 (Node C): State transfer to 2.0 (Node B) complete.
2017-12-12 22:32:17 139817123833600 [Note] WSREP: Member 1.0 (Node C) synced with group.
WSREP_SST: [INFO] Joiner cleanup. rsync PID: 25520 (20171212 22:32:17.846)
WSREP_SST: [INFO] Joiner cleanup done. (20171212 22:32:18.352)
2017-12-12 22:32:18 139817401710528 [Note] WSREP: SST complete, seqno: 239153
2017-12-12 22:32:18 139817132226304 [Note] WSREP: (ebfd9e9c, 'tcp://') turning message relay requesting off
2017-12-12 22:32:22 139817401710528 [Note] WSREP: Signalling provider to continue.
2017-12-12 22:32:22 139817401710528 [Note] WSREP: SST received: e2fbbca5-df26-11e7-8ee2-bb61f8ff3774:239153
2017-12-12 22:32:22 139817123833600 [Note] WSREP: 2.0 (Node B): State transfer from 1.0 (Node C) complete.
2017-12-12 22:32:22 139817123833600 [Note] WSREP: Shifting JOINER -> JOINED (TO: 239858)
2017-12-12 22:32:22 139817123833600 [Note] WSREP: Member 2.0 (Node B) synced with group.
2017-12-12 22:32:22 139817123833600 [Note] WSREP: Shifting JOINED -> SYNCED (TO: 239866)
2017-12-12 22:32:22 139817401395968 [Note] WSREP: Synchronized with group, ready for connections
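A quick way to tell the two cases apart, based on the tell-tale lines in the excerpts above (the log path here is illustrative):

```shell
# Sample lines taken from the IST log excerpt above.
cat > /tmp/node_b_error.log <<'EOF'
2017-12-12 22:29:35 [Note] WSREP: Receiving IST: 2574 writesets, seqnos 201439-204013
2017-12-12 22:29:35 [Note] WSREP: IST received: e2fbbca5-df26-11e7-8ee2-bb61f8ff3774:204013
EOF

# An IST logs "IST received: ..."; a full SST logs "IST will be unavailable"
# before falling back to the SST method.
if grep -q "IST received:" /tmp/node_b_error.log; then
    echo "node joined via IST"
elif grep -q "IST will be unavailable" /tmp/node_b_error.log; then
    echo "node joined via SST (IST was unavailable)"
fi
```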

After we had established that it really was an IST, and not an SST for some other reason, the question arose: why did an IST of only a few thousand transactions take 420 seconds? And this was not always the case...

So we watched the donor and the joiner with top during IST, and we found that on the donor node the antivirus software was heavily using CPU (2 x 50%), while the system otherwise did nothing for a while and then suddenly started to transfer data over the network (presumably the IST).
Later we found that the MariaDB datadir (/var/lib/mysql) was not excluded from the antivirus scans. And finally, it looks like the antivirus agent was not properly configured by its master server, because the agent came from a cloned VM and was never reinitialized. The antivirus master server was apparently confused by two agents with the same ID.

Another very surprising situation, which we did not expect, was that IST was influenced much more heavily by the antivirus software than SST: SST finished in a few seconds, while IST took 420 seconds.

Conclusion: Be careful when using antivirus software in combination with MariaDB Galera Cluster databases, and exclude at least all database directories from virus scanning. If you want to be sure to avoid side effects (noisy neighbours), disable the antivirus software on the database server entirely and make sure by other means that no virus reaches your precious MariaDB Galera Cluster...
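With ClamAV, for instance, such an exclusion can be configured in clamd.conf (ClamAV is an assumption here for illustration; the ExcludePath directive takes a regular expression, so adapt the syntax and paths to whatever scanner you actually run):

```
# clamd.conf: exclude the MariaDB datadir and log directory from scanning
ExcludePath ^/var/lib/mysql/
ExcludePath ^/var/log/mysql/
```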

by Shinguz at December 12, 2017 09:51 PM

December 10, 2017

Valeriy Kravchuk

Fun with Bugs #58 - Bug of the Day From @mysqlbugs

In 2013 I had a habit of writing about MySQL bugs on Facebook almost every day. A typical post looked like this one: a link to the bug and a few words of wondering, with a bit of sarcasm.
By the way, check the last comments in Bug #68892 mentioned there - the problem of LOST_EVENTS in the master's binary log and the way to work around it are still valid as of MySQL 5.7.17.
At that time I often got private messages from colleagues saying that Facebook was the wrong medium for this kind of post, that these posts made MySQL look "buggy", etc., and eventually I was shut up (for a year or so) in a more or less official way by the combined efforts of my then employer and Oracle MySQL officials.

A year later I started to write about MySQL bugs on Facebook again, but the posts were no longer regular, and maybe became a bit less sarcastic. Years later, I agree that Facebook is better used for sharing photos of cats, or nice places, or family members, or even for hot political discussions, as these things annoy people much less than MySQL bugs. So the need for a different medium for short annoying messages about MySQL bugs is clear to me. I do think the right medium is Twitter (it is annoying by itself), and I have been present there as @mysqlbugs since December 2012. Historically I am not a big fan of it and used to open it mostly to share information about yet another blog post of mine, but recently (after they allowed longer messages) I decided to pay more attention to it (until it's gone entirely and replaced by something else). So, I have been posting a link to one MySQL bug there every day, for a week or so now, with the #bugoftheday tag. I quickly noticed that the tag is also used by others to share photos of nice insects/bugs, but I don't mind my posts ending up among those, and I am ready to share some of my related photos as well.

To summarize, I am going to write short messages about MySQL bugs regularly on Twitter. I try to write about bugs I recently noticed, because they are new, improperly handled, or were involved in some customer issue that I worked on. Let me share the last 5 bugs mentioned there, so you can decide whether it makes sense for you to follow me or #bugoftheday:
  • Bug #88827 - "innodb uses too much space in the PK for concurrent inserts into the same table". Interesting finding by Mark Callaghan during his recent benchmarking efforts. Still "Open".
  • Bug #88788 - "log_bin is not considered correctly an makes binary logging unusable!". This report by Oli Sennhauser (quickly declared a duplicate of my older Bug #75507) got a really nice number, and it's also interesting how efforts to name hosts in a nicely structured manner may play against poor DBA...
  • Bug #88776 - "Issues found by PVS-Studio static analyzer". Related post with the detailed analysis of problems found by the tool was mentioned few days ago by somebody from MariaDB on Slack, so I immediately noted that the bug comes from the same author, Sergey Vasiliev.
  • Bug #88765 - "This bug reporting form has a ridiculously short character limit for the bug syn". The bug report about MySQL bugs database itself. I also hit the limit there way more often than I'd like to. Discussion continues, but I feel that this is not going to be fixed...
  • Bug #87526 - "The output of 'XA recover convert xid' is not useful". One of our customers hit this problem recently. I can only agree with Sveta Smirnova here, "Output of XA RECOVER should show ID which can be used in XA COMMIT statement."
So, if the list above seems useful or interesting, please, follow me on Twitter. Let's continue to have regular fun with MySQL bugs, now using (hopefully) a more appropriate media!

As a side note, if you are interested in older bugs opened this day years ago and still "Verified", please check this great page by Hartmut! You may find really great reports like Bug #79581 - "Error 1064 on selects from Information Schema if routine name has '\0'", by Sveta Smirnova. This bug is 2 years old today...

by Valeriy Kravchuk ( at December 10, 2017 02:59 PM

December 07, 2017

Peter Zaitsev

Hands-On Look at ZFS with MySQL

This post is a hands-on look at ZFS with MySQL.

In my previous post, I highlighted the similarities between MySQL and ZFS. Before going any further, I’d like you to be able to play and experiment with ZFS. This post shows you how to configure ZFS with MySQL in a minimalistic way on either Ubuntu 16.04 or Centos 7.


In order to be able to use ZFS, you need some available storage space. For storage – since the goal here is just to have a hands-on experience – we’ll use a simple file as a storage device. Although simplistic, I have now been using a similar setup on my laptop for nearly three years (just can’t get rid of it, it is too useful). For simplicity, I suggest you use a small Centos7 or Ubuntu 16.04 VM with one core, 8GB of disk and 1GB of RAM.

First, you need to install ZFS as it is not installed by default. On Ubuntu 16.04, you simply need to run:

root@Ubuntu1604:~# apt-get install zfs-dkms zfsutils-linux

On RedHat or Centos 7.4, the procedure is a bit more complex. First, we need to install the EPEL ZFS repository:

[root@Centos7 ~]# yum install
[root@Centos7 ~]# gpg --quiet --with-fingerprint /etc/pki/rpm-gpg/RPM-GPG-KEY-zfsonlinux
[root@Centos7 ~]# gpg --quiet --with-fingerprint /etc/pki/rpm-gpg/RPM-GPG-KEY-CentOS-7

Apparently, there were issues with ZFS kmod kernel modules on RedHat/Centos. I never had any issues with Ubuntu (and who knows how often the kernel is updated). Anyway, it is recommended that you enable kABI-tracking kmods. Edit the file /etc/yum.repos.d/zfs.repo, disable the ZFS repo and enable the zfs-kmod repo. The beginning of the file should look like:

[zfs]
name=ZFS on Linux for EL7 - dkms
enabled=0
...

[zfs-kmod]
name=ZFS on Linux for EL7 - kmod
enabled=1
...

Now, we can proceed and install ZFS:

[root@Centos7 ~]# yum install zfs

After the installation, I have ZFS version on Ubuntu and version on Centos7. The version difference doesn’t matter for what will follow.


So, we need a container for the data. You can use any of the following options for storage:

  • A free disk device
  • A free partition
  • An empty LVM logical volume
  • A file

The easiest solution is to use a file, and that's what I'll use here. A file is not the fastest or most efficient storage, but it is fine for our hands-on purposes. In production, please use real devices. A more realistic server configuration will be discussed in a future post. The following steps are identical on Ubuntu and Centos. The first step is to create the storage file. I'll use a 1 GB file in /mnt. Adjust the size and path to whatever suits the resources you have:

[root@Centos7 ~]# dd if=/dev/zero of=/mnt/zfs.img bs=1024 count=1048576

The result is a 1GB file in /mnt:

[root@Centos7 ~]# ls -lh /mnt
total 1,0G
-rw-r--r--.  1 root root 1,0G 16 nov 16:50 zfs.img
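As an aside, the backing file can also be created as a sparse file, which is instantaneous (a shortcut of mine, not part of the original steps; a sparse file is fine for a test pool):

```shell
# Create a 1 GB sparse file: blocks are allocated only when written.
truncate -s 1G /tmp/zfs-sparse.img
ls -lsh /tmp/zfs-sparse.img   # 'ls -s' shows the small number of blocks actually allocated
```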

Now, we will create our ZFS pool, mysqldata, using the file we just created:

[root@Centos7 ~]# modprobe zfs
[root@Centos7 ~]# zpool create mysqldata /mnt/zfs.img
[root@Centos7 ~]# zpool status
  pool: mysqldata
 state: ONLINE
  scan: none requested
        NAME            STATE     READ WRITE CKSUM
        mysqldata       ONLINE       0     0     0
          /mnt/zfs.img  ONLINE       0     0     0
errors: No known data errors
[root@Centos7 ~]# zfs list
NAME        USED  AVAIL  REFER  MOUNTPOINT
mysqldata  79,5K   880M    24K  /mysqldata

If you have a result similar to the above, congratulations, you have a ZFS pool. If you put files in /mysqldata, they are in ZFS.

MySQL installation

Now, let’s install MySQL and play around a bit. We’ll begin by installing the Percona repository:

root@Ubuntu1604:~# cd /tmp
root@Ubuntu1604:/tmp# wget$(lsb_release -sc)_all.deb
root@Ubuntu1604:/tmp# dpkg -i percona-release_*.deb
root@Ubuntu1604:/tmp# apt-get update
[root@Centos7 ~]# yum install

Next, we install Percona Server for MySQL 5.7:

root@Ubuntu1604:~# apt-get install percona-server-server-5.7
root@Ubuntu1604:~# systemctl start mysql
[root@Centos7 ~]# yum install Percona-Server-server-57
[root@Centos7 ~]# systemctl start mysql

The installation command pulls all the dependencies and sets up the MySQL root password. On Ubuntu, the install script asks for the password, but on Centos7 a random password is set. To retrieve the random password:

[root@Centos7 ~]# grep password /var/log/mysqld.log
2017-11-21T18:37:52.435067Z 1 [Note] A temporary password is generated for root@localhost: XayhVloV+9g+

The following step is to reset the root password:

[root@Centos7 ~]# mysql -p -e "ALTER USER 'root'@'localhost' IDENTIFIED BY 'Mysql57OnZfs_';"
Enter password:

Since 5.7.15, the password validation plugin by default requires a length greater than 8, mixed case, at least one digit and at least one special character. On either Linux distribution, I suggest you set the credentials in the /root/.my.cnf file like this:

# cat /root/.my.cnf
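A minimal /root/.my.cnf using the password set above would look like this (protect it with chmod 600):

```ini
[client]
user     = root
password = Mysql57OnZfs_
```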

MySQL configuration for ZFS

Now that we have both ZFS and MySQL, we need some configuration to make them play together. From here, the steps are the same on Ubuntu and Centos. First, we stop MySQL:

# systemctl stop mysql

Then, we’ll configure ZFS. We will create three ZFS filesystems in our pool:

  • mysql will be the top level filesystem for the MySQL related data. This filesystem will not directly have data in it, but data will be stored in the other filesystems that we create. The utility of the mysql filesystem will become obvious when we talk about snapshots. Something to keep in mind for the next steps, the properties of a filesystem are by default inherited from the upper level.
  • mysql/data will be the actual datadir. The files in the datadir are mostly accessed through random IO operations, so we’ll set the ZFS recordsize to match the InnoDB page size.
  • mysql/log will be where the log files will be stored. By log files, I primarily mean the InnoDB log files. But the binary log file, the slow query log and the error log will all be stored in that directory. The log files are accessed through sequential IO operations. We’ll thus use a bigger ZFS recordsize in order to maximize the compression efficiency.
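
The recordsize choice above can be sanity-checked with simple arithmetic: a random 16KB InnoDB page read from a 128KB ZFS record forces the whole record to be read and decompressed. A tiny illustration (no ZFS needed):

```shell
# Back-of-the-envelope read amplification of the default 128K recordsize
# versus the 16K InnoDB page size (illustration only, no ZFS involved)
amp=$(awk 'BEGIN { printf "%dx", 128 / 16 }')
echo "worst-case read amplification with 128K records: $amp"
```

Matching the recordsize to the InnoDB page size avoids that amplification on the datadir, while the log filesystem keeps the bigger recordsize for better sequential compression.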

Let's begin with the top-level MySQL container. I could have used mysqldata directly, but that would have been somewhat limiting. The following steps create the filesystem and set some properties:

# zfs create mysqldata/mysql
# zfs set compression=gzip mysqldata/mysql
# zfs set recordsize=128k mysqldata/mysql
# zfs set atime=off mysqldata/mysql

I just set compression to ‘gzip’ (the equivalent of gzip level 6), recordsize to 128KB and atime (the file’s access time) to off. Once we are done with the mysql filesystem, we can proceed with the data and log filesystems:

# zfs create mysqldata/mysql/log
# zfs create mysqldata/mysql/data
# zfs set recordsize=16k mysqldata/mysql/data
# zfs set primarycache=metadata mysqldata/mysql/data
# zfs get compression,recordsize,atime mysqldata/mysql/data
NAME                  PROPERTY     VALUE     SOURCE
mysqldata/mysql/data  compression  gzip      inherited from mysqldata/mysql
mysqldata/mysql/data  recordsize   16K       local
mysqldata/mysql/data  atime        off       inherited from mysqldata/mysql

Of course, there are other properties that could be set, but let’s keep things simple. Now that the filesystems are ready, let’s move the files to ZFS (make sure you stopped MySQL):

# mv /var/lib/mysql/ib_logfile* /mysqldata/mysql/log/
# mv /var/lib/mysql/* /mysqldata/mysql/data/

and then set the real mount points:

# zfs set mountpoint=/var/lib/mysql mysqldata/mysql/data
# zfs set mountpoint=/var/lib/mysql-log mysqldata/mysql/log
# chown mysql.mysql /var/lib/mysql /var/lib/mysql-log

Now we have:

# zfs list
NAME                   USED  AVAIL  REFER  MOUNTPOINT
mysqldata             1,66M   878M  25,5K  /mysqldata
mysqldata/mysql       1,54M   878M    25K  /mysqldata/mysql
mysqldata/mysql/data   890K   878M   890K  /var/lib/mysql
mysqldata/mysql/log    662K   878M   662K  /var/lib/mysql-log

We must adjust the MySQL configuration accordingly. Here’s what I put in my /etc/my.cnf file (/etc/mysql/my.cnf on Ubuntu):

[mysqld]
innodb_log_group_home_dir = /var/lib/mysql-log
innodb_doublewrite = 0
innodb_checksum_algorithm = none
slow_query_log = 1
slow_query_log_file = /var/lib/mysql-log/slow.log
log-error = /var/lib/mysql-log/error.log
server_id = 12345
log_bin = /var/lib/mysql-log/binlog
# Disabling symbolic-links is recommended to prevent assorted security risks
symbolic-links = 0

On CentOS 7, SELinux prevented MySQL from accessing files in /var/lib/mysql-log. I had to perform the following steps:

[root@Centos7 ~]# yum install policycoreutils-python
[root@Centos7 ~]# semanage fcontext -a -t mysqld_db_t "/var/lib/mysql-log(/.*)?"
[root@Centos7 ~]# chcon -Rv --type=mysqld_db_t /var/lib/mysql-log/

I could have just disabled SELinux since it is a test server, but if I don't get my hands dirty with semanage and chcon once in a while, I won't remember how to do it. SELinux is an important security tool on Linux (but that's another story).

At this point, feel free to start using your test MySQL database on ZFS.

Monitoring ZFS

To monitor ZFS, you can use the zpool command like this:

[root@Centos7 ~]# zpool iostat 3
              capacity     operations     bandwidth
pool        alloc   free   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
mysqldata   19,6M   988M      0      0      0    290
mysqldata   19,3M   989M      0     44      0  1,66M
mysqldata   23,4M   985M      0     49      0  1,33M
mysqldata   23,4M   985M      0     40      0   694K
mysqldata   26,7M   981M      0     39      0   561K
mysqldata   26,7M   981M      0     37      0   776K
mysqldata   23,8M   984M      0     43      0   634K

This shows the ZFS activity while I was loading some data. Also, the following command gives you an estimate of the compression ratio:

[root@Centos7 ~]# zfs get compressratio,used,logicalused mysqldata/mysql
NAME             PROPERTY       VALUE  SOURCE
mysqldata/mysql  compressratio  4.10x  -
mysqldata/mysql  used           116M   -
mysqldata/mysql  logicalused    469M   -
[root@Centos7 ~]# zfs get compressratio,used,logicalused mysqldata/mysql/data
NAME                  PROPERTY       VALUE  SOURCE
mysqldata/mysql/data  compressratio  4.03x  -
mysqldata/mysql/data  used           67,9M  -
mysqldata/mysql/data  logicalused    268M   -
[root@Centos7 ~]# zfs get compressratio,used,logicalused mysqldata/mysql/log
NAME                 PROPERTY       VALUE  SOURCE
mysqldata/mysql/log  compressratio  4.21x  -
mysqldata/mysql/log  used           47,8M  -
mysqldata/mysql/log  logicalused    201M   -

In my case, the dataset compresses very well (4x). Another way to see how well files compress is to use ls and du: ls returns the apparent, uncompressed size of the file, while du returns the size actually used on disk. Here's an example:

[root@Centos7 mysql]# ls -lah ibdata1
-rw-rw---- 1 mysql mysql 90M nov 24 16:09 ibdata1
[root@Centos7 mysql]# du -hs ibdata1
14M     ibdata1
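
The two numbers above can be turned into a per-file ratio with a small helper (my hypothetical function, not part of ZFS; du --apparent-size gives the size ls shows, plain du gives the on-disk size):

```shell
# Hypothetical helper (not part of ZFS): per-file compression ratio,
# i.e. apparent size (what ls shows) divided by on-disk size (what du shows)
compression_ratio() {
    apparent=$(du --apparent-size -k "$1" | cut -f1)
    ondisk=$(du -k "$1" | cut -f1)
    awk -v a="$apparent" -v d="$ondisk" 'BEGIN { printf "%.2fx\n", a / d }'
}
```

On the ibdata1 file above, this would report roughly 6.4x (90M / 14M); on a filesystem without compression it stays around 1.00x.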

I really invite you to further experiment and get a feeling of how ZFS and MySQL behave together.

Snapshots and backups

A great feature of ZFS that works really well with MySQL is snapshots. A snapshot is a consistent view of the filesystem at a given point in time. Normally, it is best to perform a snapshot while a flush tables with read lock is held. That allows you to record the master position, and also to flush MyISAM tables. It is quite easy to do. Here's how I create a snapshot with MySQL:

[root@Centos7 ~]# mysql -e 'flush tables with read lock;show master status;\! zfs snapshot -r mysqldata/mysql@my_first_snapshot'
| File          | Position  | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set |
| binlog.000002 | 110295083 |              |                  |                   |
[root@Centos7 ~]# zfs list -t snapshot
NAME                                     USED  AVAIL  REFER  MOUNTPOINT
mysqldata/mysql@my_first_snapshot          0B      -    24K  -
mysqldata/mysql/data@my_first_snapshot     0B      -  67,9M  -
mysqldata/mysql/log@my_first_snapshot      0B      -  47,8M  -

The command took about 1s. The only time such commands take longer is when there are MyISAM tables with a lot of pending updates to the indices, or when there are long-running transactions. You probably wonder why the "USED" column reports 0B. That's simply because there were no changes to the filesystem since the snapshot was created. It is a measure of the amount of data that hasn't been freed because the snapshot still requires it. Said otherwise, it is a measure of how far the snapshot has diverged from its parent. You can access the snapshot through a clone or through ZFS as a filesystem. To access the snapshot through ZFS, you have to set the snapdir parameter to "visible", and then you can see the files. Here's how:

[root@Centos7 ~]# zfs set snapdir=visible mysqldata/mysql/data
[root@Centos7 ~]# zfs set snapdir=visible mysqldata/mysql/log
[root@Centos7 ~]# ls /var/lib/mysql-log/.zfs/snapshot/my_first_snapshot/
binlog.000001  binlog.000002  binlog.index  error.log  ib_logfile0  ib_logfile1

The files in the snapshot directory are read-only. If you want to be able to write to the files, you first need to clone the snapshots:

[root@Centos7 ~]# zfs create mysqldata/mysqlslave
[root@Centos7 ~]# zfs clone mysqldata/mysql/data@my_first_snapshot mysqldata/mysqlslave/data
[root@Centos7 ~]# zfs clone mysqldata/mysql/log@my_first_snapshot mysqldata/mysqlslave/log
[root@Centos7 ~]# zfs list
NAME                        USED  AVAIL  REFER  MOUNTPOINT
mysqldata                   116M   764M    26K  /mysqldata
mysqldata/mysql             116M   764M    24K  /mysqldata/mysql
mysqldata/mysql/data       67,9M   764M  67,9M  /var/lib/mysql
mysqldata/mysql/log        47,8M   764M  47,8M  /var/lib/mysql-log
mysqldata/mysqlslave         28K   764M    26K  /mysqldata/mysqlslave
mysqldata/mysqlslave/data     1K   764M  67,9M  /mysqldata/mysqlslave/data
mysqldata/mysqlslave/log      1K   764M  47,8M  /mysqldata/mysqlslave/log

At this point, it is up to you to use the clones to spin up a local slave. As with snapshots, a clone only grows in size when actual data is written to it; ZFS records that haven't changed since the snapshot was taken are shared. That's a huge space saving. For a customer, I once wrote a script to automatically create five MySQL slaves for their developers. The developers would run tests, and often replication broke. Rerunning the script would recreate fresh slaves in a matter of minutes. My ZFS snapshot script and the script I wrote to create the clone-based slaves are available here:

Optional features

In the previous post, I talked about a SLOG device for the ZIL, and the L2ARC, a disk extension of the ARC cache. If you promise never to use the following trick in production, here's how to speed up MySQL on ZFS drastically:

[root@Centos7 ~]# dd if=/dev/zero of=/dev/shm/zil_slog.img bs=1024 count=131072
131072+0 records in
131072+0 records out
134217728 bytes (134 MB) copied, 0.373809 s, 359 MB/s
[root@Centos7 ~]# zpool add mysqldata log /dev/shm/zil_slog.img
[root@Centos7 ~]# zpool status
  pool: mysqldata
 state: ONLINE
  scan: none requested
        NAME                     STATE     READ WRITE CKSUM
        mysqldata                ONLINE       0     0     0
          /mnt/zfs.img           ONLINE       0     0     0
          /dev/shm/zil_slog.img  ONLINE       0     0     0
errors: No known data errors

The data in the SLOG is critical for ZFS recovery. I performed some tests with virtual machines, and if you crash the server and lose the SLOG you may lose all the data stored in the ZFS pool. Normally, the SLOG is on a mirror in order to lower the risk of losing it. The SLOG can be added and removed online.

I know I asked you to promise to never use an shm file as SLOG in production. Actually, there are exceptions. I would not hesitate to temporarily use such a trick to speed up a lagging slave. Another situation where such a trick could be used is with Percona XtraDB Cluster. With a cluster, there are multiple copies of the dataset. Even if one node crashed and lost its ZFS filesystems, it could easily be reconfigured and reprovisioned from the surviving nodes.

The other optional feature I want to cover is a cache device. The cache device is what is used for the L2ARC. The content of the L2ARC is compressed in the same way as the original data. To add a cache device (again an shm file), do:

[root@Centos7 ~]# dd if=/dev/zero of=/dev/shm/l2arc.img bs=1024 count=131072
131072+0 records in
131072+0 records out
134217728 bytes (134 MB) copied, 0.272323 s, 493 MB/s
[root@Centos7 ~]# zpool add mysqldata cache /dev/shm/l2arc.img
[root@Centos7 ~]# zpool status
  pool: mysqldata
 state: ONLINE
  scan: none requested
    NAME                     STATE     READ WRITE CKSUM
    mysqldata                ONLINE       0     0     0
      /mnt/zfs.img           ONLINE       0     0     0
      /dev/shm/zil_slog.img  ONLINE       0     0     0
      /dev/shm/l2arc.img     ONLINE       0     0     0
errors: No known data errors

To monitor the L2ARC (and also the ARC), look at the file /proc/spl/kstat/zfs/arcstats. As the ZFS filesystems are configured right now, very little will go to the L2ARC. This can be frustrating. The reason is that the L2ARC is filled by the elements evicted from the ARC. If you recall, we set primarycache=metadata for the filesystem containing the actual data. Hence, in order to get some data into our L2ARC, I suggest the following steps:

[root@Centos7 ~]# zfs set primarycache=all mysqldata/mysql/data
[root@Centos7 ~]# echo 67108864 > /sys/module/zfs/parameters/zfs_arc_max
[root@Centos7 ~]# echo 3 > /proc/sys/vm/drop_caches
[root@Centos7 ~]# grep '^size' /proc/spl/kstat/zfs/arcstats
size                            4    65097584

It takes the echo command to drop_caches to force a re-initialization of the ARC. Now, InnoDB data starts to be cached in the L2ARC. The way data is sent to the L2ARC has many tunables, which I won't discuss here. I chose 64MB for the ARC size mainly because I am using a low-memory VM. A size of 64MB is aggressively small and will slow down ZFS if the metadata doesn't fit in the ARC. Normally you should use a larger value. The actual good size depends on many parameters, like the filesystem size, the number of files and the presence of an L2ARC. You can monitor the ARC and L2ARC using the arcstat tool that comes with ZFS on Linux (when you use CentOS 7). With Ubuntu, download the tool from here.
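
To get a feel for what arcstats contains, here is a sketch that computes the ARC hit ratio from the hits and misses counters. The two-line sample below is made up for illustration, but the same awk works on the real /proc/spl/kstat/zfs/arcstats file ("name  type  data" format):

```shell
# Sketch: compute the ARC hit ratio from the hits/misses counters.
# The two-line sample below is made up; the same awk works on the real
# /proc/spl/kstat/zfs/arcstats file.
cat > /tmp/arcstats.sample <<'EOF'
hits                            4    900000
misses                          4    100000
EOF
ratio=$(awk '$1 == "hits" { h = $3 }
             $1 == "misses" { m = $3 }
             END { printf "%d%%", 100 * h / (h + m) }' /tmp/arcstats.sample)
echo "ARC hit ratio: $ratio"
```

A low hit ratio on a warmed-up server is a hint that the ARC (or an L2ARC) is undersized for the working set.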


Cleanup

So the ZFS party is over? We need to clean up the mess! Let's begin:

[root@Centos7 ~]# systemctl stop mysql
[root@Centos7 ~]# zpool remove mysqldata /dev/shm/l2arc.img
[root@Centos7 ~]# zpool remove mysqldata /dev/shm/zil_slog.img
[root@Centos7 ~]# rm -f /dev/shm/*.img
[root@Centos7 ~]# zpool destroy mysqldata
[root@Centos7 ~]# rm -f /mnt/zfs.img
[root@Centos7 ~]# yum erase spl kmod-spl libzpool2 libzfs2 kmod-zfs zfs

The last step is different on Ubuntu:

root@Ubuntu1604:~# apt-get remove spl-dkms zfs-dkms libzpool2linux libzfs2linux spl zfsutils-linux zfs-zed


Conclusion

With this guide, I hope I provided a positive first experience in using ZFS with MySQL. The configuration is simple and not optimized for performance. However, we'll look at more realistic configurations in future posts.

by Yves Trudeau at December 07, 2017 10:37 PM

Percona Server for MySQL 5.5.58-38.10 is Now Available


Percona announces the release of Percona Server for MySQL 5.5.58-38.10 on December 7, 2017. Based on MySQL 5.5.58, including all of its bug fixes, Percona Server for MySQL 5.5.58-38.10 is now the current stable release in the 5.5 series.

Percona Server for MySQL is open-source and free. You can find release details in the 5.5.58-38.10 milestone on Launchpad. Downloads are available here and from the Percona Software Repositories.

New Features:
  • Percona Server packages are now available for Ubuntu 17.10 (Artful).
Bugs Fixed:
  • If an I/O syscall returned an error during the server shutdown with Thread Pool enabled, a mutex could be left locked. Bug fixed #1702330 (Daniel Black).
  • The dynamic row format feature that supports BLOB/VARCHAR columns in MEMORY tables requires all key columns to come before any BLOB columns. This requirement, however, was not enforced, allowing the creation of MEMORY tables with unsupported column configurations, which then crashed or lost data in usage. Bug fixed #1731483.

Other bugs fixed: #1729241.

Find the release notes for Percona Server for MySQL 5.5.58-38.10 in our online documentation. Report bugs on the launchpad bug tracker.

by Hrvoje Matijakovic at December 07, 2017 07:22 PM

MariaDB AB

MariaDB ColumnStore Write API

MariaDB ColumnStore Write API linuxjedi Thu, 12/07/2017 - 02:19

There are many great unique features of MariaDB ColumnStore, one of which is the speed in which you can get data from CSV files into MariaDB ColumnStore using the tool ‘cpimport’. But what if your data is in another format? Or you wish to stream data into MariaDB ColumnStore from another application? This is where MariaDB ColumnStore’s new write API comes into play.

The MariaDB ColumnStore write API is a new C++ API which lets you inject data directly into MariaDB ColumnStore’s WriteEngine using a series of simple calls. This allows you to easily write custom data injection tools that are much faster than using the SQL interface. If you are a Python or Java developer, then we also bundle in wrappers for those languages.

We designed the API to be familiar to users of ORM-style database access: there is a setColumn() function to set a column value in the row you are going to write, writeRow() to store the row, as well as commit() and rollback() functions.

This is an example of a simple application that writes to a MariaDB ColumnStore table that has just two integer columns. The full source code can be found in the example/basic_bulk_insert.cpp file in the API's source code:

int main(void)
{
    mcsapi::ColumnStoreDriver* driver = nullptr;
    mcsapi::ColumnStoreBulkInsert* bulk = nullptr;
    try {
        driver = new mcsapi::ColumnStoreDriver();

The ColumnStoreDriver class automatically discovers the MariaDB ColumnStore cluster by looking for the Columnstore.xml file that is present on every module; there is also an optional parameter to specify a location for this XML file. It will throw an error if the file cannot be found. You can copy the XML file to another server and use it there, as long as that server can talk to all of your PM nodes.

        bulk = driver->createBulkInsert("test", "t1", 0, 0);

This creates an instance of a bulk insert class from the driver class and sets everything up as required. You can see that we are writing to the table “test.t1”. The API can create many bulk insert objects from a single driver object. Each bulk insert object should be considered a database transaction.

        for (int i = 0; i < 1000; i++)
        {
            bulk->setColumn(0, (uint32_t)i);
            bulk->setColumn(1, (uint32_t)1000 - i);
            bulk->writeRow();
        }

This is a very simple 'for' loop which sets an integer for the first and second columns of the table and then stores the row. Note that, for performance, writeRow() is designed to not immediately send data to ColumnStore. It instead buffers 100,000 rows or waits for a commit() before it actually sends the data.


        bulk->commit();

After we have set these 1000 rows, we ask the MariaDB ColumnStore API to commit the data. In this example, the data is sent to the MariaDB ColumnStore installation along with the commit(), since we only have 1000 rows. At this point the 'bulk' object cannot be used to send any more data; it can only be used to retrieve summary information.

    } catch (mcsapi::ColumnStoreError &e) {
        std::cout << "Error caught: " << e.what() << std::endl;
    }
    delete bulk;
    delete driver;
}

Finally we have a little bit of error handling and cleanup.

The data is written using the same atomic methods as ‘cpimport’ by appending new blocks of data to the column files and moving an atomic “High Water Mark” block pointer upon commit. This means that select queries are not blocked during the insert process and do not see the data until after it has been committed.

There are several more advanced features of the API which you can find in the documentation. The API is an Open Source project and the source code can be easily obtained via GitHub. The API has been released alongside MariaDB ColumnStore 1.1 and you can get it from our download page along with other components that make up MariaDB AX, our modern data warehousing solution.


by linuxjedi at December 07, 2017 07:19 AM

December 06, 2017

Peter Zaitsev

MongoDB 3.6 Community Is Here!


By now you surely know that MongoDB 3.6 Community became generally available on December 5, 2017. Of course, this is great news: it has some big-ticket items that we are all excited about! But I also want to talk about my general thoughts on this release.

It is always a good idea for your internal teams to study and consider new versions. This is crucial for understanding if and when you should start using it. After deciding to use it, there is the question of if you want your developers using the new features (or are they not suitably implemented yet to be used)?

So what is in MongoDB 3.6 Community? Check it out:

  • Sessions
  • Change Streams
  • Retryable Writes
  • Security Improvement
  • Major love for Arrays in Aggregation
  • A better balancer
  • JSON Schema Validation
  • Better Time management
  • Compass is Community
  • Major WiredTiger internal overhaul

As you can see, this is an extensive list. There are 1,400+ implemented Jira tickets on the server alone (not even counting the WiredTiger project).

To that end, I thought we should break my review into a few areas. We will have blog posts out soon covering these areas in more depth. This blog is more about my thoughts on the topics above.

Expected blogs (we will link to them as they become available):

  • Change Streams –  Nov 11 2017
  • Sessions
  • Transactions and new functions
  • Aggregation improvements
  • Security Controls to use ASAP
  • Other changes from diagnostics to Validation

Today let’s quickly recap the above areas.


Sessions

We will have a blog on this (it has some history). This move has been long-awaited by anyone using MongoDB before 2.4. There were connection changes in that release that made life complicated for load balancers, due to the inability to "re-attach" to the same session. If you were not careful in 2.4+, you could easily use a load balancer and see very odd behavior: from broken queries to invisibly incomplete getMores (big queries).

The aim of sessions is to change this. Now the client drivers know about the internal session to the database used for reads and writes. Better yet, MongoDB tracks these sessions, so even if an election occurs, when your driver fails over so will the session. For anyone whose applications handled fail-overs badly, this is an amazing improvement. Some of the other new features that make 3.6 a solid release require this new behavior.

Does this mean this solution is perfect and works everywhere? No! It, like newer features we have seen, leave MMAPv1 out in the cold due to its inability without major re-work to support logic that is so native to Wired Tiger. Talking with engine engineers, it’s clear that some of the logic behind the underlying database snapshots and rollbacks added here can cause issues (which we will talk about more in the transactions blog).

Change streams

As one of the most talked about (but most specialized) features, I can see its appeal in a very specific use case (but it is rather limited). However, I am getting ahead of myself! Let’s talk about what it is first and where it came from.

Before this feature, people streamed data out of MySQL and MongoDB into Elastic and Hadoop stacks. I mention MySQL, as this was the primer for the initial method MongoDB used. The tools would read the MySQL binlogs – typically saved off somewhere – and then apply those operations to some other technology. When they went to do the same thing in MongoDB, there was a big issue: if writes are not sent to the majority of the nodes, it can cause a rollback. In fact, such rollbacks were not uncommon. The default was w:1 (meaning the primary only needed to have the write), which resulted in data existing in Elastic that had been removed from MongoDB. Hopefully, everyone can see the issue here, and why a better solution was needed than just reading the oplog.


The solution is the new $changeStream aggregation stage, which in the shell has a helper called watch(). This is a method that uses a multi-node consistent read to ensure the data is on the majority of nodes before the command returns the data in a tailable cursor. For this use case this is amazing, as it gives the data-replicating tool much more assurance that data is not going to vanish. watch() is not without limits: if we have 10k collections and we want to watch them all, that is 10k separate operations and cursors. This puts a good deal of strain on the systems, so MongoDB Inc. suggests you do this on no more than 1000 namespaces at a time to prevent overhead issues.

Sadly it is not currently possible to take a mongod-wide snapshot to support this under the hood, as this is done on each namespace to implement the snapshot directly inside WiredTiger’s engine. So for anyone with a very large collection count, this will not be your silver bullet yet. This also means streams between collections and databases are not guaranteed to be in sync. This could be an issue for someone even with a smaller number of namespaces that expect this. Please don’t get me wrong: it’s a step in the correct direction, but it falls short.

I had very high hopes for this to simplify backups in MongoDB. Percona Labs's GitHub has a tool called MongoDB-Consistent-Backup, which tails multiple oplogs to get a consistent sharded backup without the need to pay for MongoDB's backup service or use the complicated design that is Ops Manager (when you host it yourself). Due to the inability to do a system-wide change stream, this type of tool still needs to use the oplog. If you are not using a majority write concern, it could trigger a failure if you have an election or if a rollback occurs. Still, I have hopes this will be something that can be considered in the future to make things better for everyone.

Retryable writes

Unlike change streams, this feature is much more helpful to the general MongoDB audience, and I am very excited about it. If you have not watched this video, please do so right now! Samantha does a good job explaining the issue and solution in detail. For now, just know there has been a problem where, if a write hit an issue (network, app shutdown, DB shutdown, election), you had no way to know whether the write succeeded or not. This unknown situation was terrible for a developer, who would not know whether to run the command again. It is especially bad if you have an ordering system and you're trying to remove stock from your inventory. Sessions, as discussed before, allow the client to reconnect after a broken connection and keep getting results, so it knows what happened or didn't. To me, this is the second-best feature of 3.6; only security is more important to me personally.

Security improvement

Speaking of security, there is one change that the security community wanted (which I don't think is that big of a deal). For years now, the MongoDB packaging for all OSs (and even the Percona Server for MongoDB packaging) would by default limit the bindIP setting to localhost. This was to prevent unintended situations where you had a database open to the public. With 3.6, the binaries now also default to this. So, yes, it will help some. But I would (and have) argued that when you install a database from binaries or source, you are taking more ownership of its setup compared to using Docker, Yum or Apt.

The other major move forward, however, is something I have been waiting for since 2010. Seriously, I am not joking! It offers the ability to limit users to specific CIDR or IP address ranges. Please note MySQL has had this since at least 1998. I can’t recall if it’s just always been there, so let’s say two decades.

This is also known as "authenticationRestrictions", and it's an array you can put into the user document when creating a user. The manual describes it as:

The authentication restrictions the server enforces on the created user. Specifies a list of IP addresses and CIDR ranges from which the user is allowed to connect to the server or from which the server can accept users.

I can not overstate how huge this is. MongoDB Inc. did a fantastic job on it. Not only does it support the classic client address matching, it supports an array of these with matching on the server IP/host also. This means supporting multiple IP segments with a single user is very easy. By extension, I could see a future where you could even limit some actions by range – allowing dev/load test to drop collections, but production apps would not be allowed to. While they should have separate users, I regularly see clients who have one password everywhere. That extension would save them from unplanned outages due to human errors of connecting to the wrong thing.

We will have a whole blog talking about these changes, their importance and using them in practice. Yes, security is that important!

Major love for array and more in Aggregation

This one is a bit easier to summarize. Arrays and dates get some much-needed aggregation love in particular. I could list all the new operators here, but I feel it’s better served in a follow-up blog where we talk about each operator and how to get the most of it. I will say my favorite new option is the $hint. Finally, I can try to control the work plan a bit if it’s making bad decisions, which sometimes happens in any technology.

A better balancer

Like many other areas, there was a good deal that went into balancer improvements. However, there are a few key things that continue the work of 3.4’s parallel balancing improvements.

Some of it makes a good deal of sense for anyone in a support role, such as FTDC now also existing in mongos’. If you do not know what this is, basically MongoDB collects some core metrics and state data and puts it into binary files in dbPath for engineers at companies like Percona and MongoDB Inc. to review. That is not to say you can’t use this data also. However, think of it as a package of performance information if a bug happens. Other diagnostic type improvements include moveChunk, which provides data about what happened when it runs in its output. Previously you could get this data from the config.changeLog or config.actionLog collections in the config servers. Obviously, more and more people are learning MongoDB’s internals and this should be made more available to them.

Having talked about diagnostic items, let’s move more into the operations wheelhouse. The single biggest frustration about sharding and replica-sets is the sensitivity to time variations that cause odd issues, even when using ntpd. To this point, as of 3.6 there is now a logical clock in MongoDB. For the geekier in the crowd, this was implemented using a Lamport Clock (great video of them). For the less geeky, think of it as a cluster-wide clock preventing some of the typical issues related to varying system times. In fact, if you look closer at the oplog record format in 3.6 there is a new wt field for tracking this. Having done that, the team at MongoDB Inc. considered what other things were an issue. At times like elections of the config servers, meta refreshes did not try enough times and could cause a mongos to stop working or fail. Those days are gone! Now it will check three times as much, for a total of ten attempts before giving up. This makes the system much more resilient.
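
For intuition, the Lamport rule itself is tiny: each node keeps a counter, and on receiving a message it advances its counter to max(local, received) + 1, so causally related events always get increasing timestamps. A toy sketch (my illustration, not MongoDB code):

```shell
# Toy illustration of the Lamport rule (not MongoDB code):
# on receiving a message, a node's clock becomes max(local, received) + 1
lamport_recv() {
    awk -v l="$1" -v r="$2" 'BEGIN { print (l > r ? l : r) + 1 }'
}
lamport_recv 5 9   # a node at 5 receiving a message stamped 9 moves to 10
```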

A final change that is still somewhat invisible to the user but helps make dropping collections more stable, is that they remove the issue MongoDB had about dropping and recreating sharded collections. Your namespaces look as they always have. Behind the scenes, however, they have UUID’s attached to them so that if drops and gets recreated, it would be a different UUID. This allows for less-blocking drops. Additionally, it prevents confusion in a distributed system if we are talking about the current or old collection.

JSON Schema validation

Some non-developers might not know much about something called JSON Schema. It allows you to set rules on schema design more efficiently and strictly than MongoDB’s internal rules. With 3.6, you can use this directly. Read more about it here. Some key points:

  • Describes your existing data format
  • Clear, human- and machine-readable documentation
  • Complete structural validation, useful for:
    • Automated testing
    • Validating client-submitted data

You can even make it reject documents when certain fields are missing. As for MySQL DBAs, you might ask why this is a big deal: you could always have a DBA define a schema in an RDBMS, and the point of MongoDB was to be flexible. That's a fair and correct view. However, the big point is that you can apply this in production, not just in development. This gives developers the freedom to move quickly, but provides operational teams with control methods to understand when mistakes or bugs are present before production is adversely affected. Taking a step back, it's all about bridging the control and freedom ravines to ensure both camps are happy and able to do their jobs efficiently.

Compass is Community

If you have never used Compass, you might think this isn’t that great. You could use things like RoboMongo and such. You absolutely could, but Compass can do visualization as well as CRUD operations. It’s also a very fluid experience that everyone should know is available for use. This is especially true for QA teams who want to review how often some types of data are present, or a business analyst who needs to understand in two seconds what your demographics are.

Major WiredTiger internal overhaul

There is so much here that I recommend any engineer-minded person take a look at this deck, presented by one of the great minds behind WiredTiger. It does a fantastic job explaining all the reasons behind some of the 3.2 and 3.4 scaling issues MongoDB had on WiredTiger. Of particular interest is why it tended to have worse and worse performance as you added more collections and indexes. It then goes into how they fixed those issues:

Some key points on what they did:

  • Made evictions smarter, as they are not uniform across collections
  • Improved assumptions around the handle cache
  • Made checkpoints better in all ways
  • Aggressively cleaned up old handles

I hope this provides a peek into the good, bad, and ugly in MongoDB 3.6 Community! Please check back as we publish more in-depth blogs on how these features work in practice, and how to best use them.

by David Murphy at December 06, 2017 10:09 PM

Webinar Thursday, December 7, 2017: Percona XtraDB Cluster (PXC) 101

Percona XtraDB Cluster

Join Percona’s Software Engineer (PXC Lead), Krunal Bauskar, as he presents Percona XtraDB Cluster 101 on Thursday, December 7, 2017, at 7:00 am PST / 10:00 am EST (UTC-8).

Tags: Percona XtraDB Cluster, MySQL, High Availability, Clustering

Experience Level: Beginner

Percona XtraDB Cluster (PXC) is a multi-master solution that offers virtually synchronous replication among cluster nodes. It is based on the Codership Galera replication library. In this session, we will explore some key features of Percona XtraDB Cluster that make it enterprise-ready, including some recently added 5.7-exclusive features.

This webinar is introductory and will cover the following topics:

  • ProxySQL load balancer
  • Multi-master replication
  • Synchronous replication
  • Data at rest encryption
  • Improved SST Security through simplified configuration
  • Easy-to-set-up encrypted communication between nodes
  • ProxySQL-assisted Percona XtraDB Cluster maintenance mode
  • Automatic node provisioning
  • Percona XtraDB Cluster “strict-mode”

Register for the webinar now.

Krunal Bauskar, C/C++ Engineer

Krunal joined Percona in September 2015. Before joining Percona, he worked as part of the InnoDB team at MySQL/Oracle, where he authored most of the temporary table revamp work, among many other features. In the past, he was associated with Yahoo! Labs, researching big data problems, and with a database startup that is now part of Teradata. His interests mainly include data management at any scale, and he has been practicing it for more than a decade now. He loves to spend time with his family or get involved in social work, unless he is out on a nearby exploration drive. He is located in Pune, India.

by Krunal Bauskar at December 06, 2017 03:22 PM

Percona Monitoring and Management 1.5.2 Is Now Available

Percona Monitoring and Management

Percona announces the release of Percona Monitoring and Management 1.5.2. This release contains fixes for bugs found after Percona Monitoring and Management 1.5.1 was released.

Bug fixes

  • PMM-1790: QAN displayed query metrics even for a host that was not configured for mysql:queries or mongodb:queries. We have fixed the behaviour to display an appropriate warning message when there are no query metrics for the selected host.
  • PMM-1826: If PMM Server 1.5.0 is deployed via Docker, the Update button would not upgrade the instance to a later release.
  • PMM-1830: If PMM Server 1.5.0 is deployed via AMI (Amazon Machine Image) instance, the Update button would not upgrade the instance to a later release.

by Borys Belinsky at December 06, 2017 01:27 PM

December 05, 2017

Jean-Jerome Schmidt

Deploying MySQL, MariaDB, Percona Server, MongoDB or PostgreSQL - Made Easy with ClusterControl

Helping users securely automate and manage their open source databases has been at the core of our efforts from the inception of Severalnines.

And ever since the first release of our flagship product, ClusterControl, it’s always been about making it as easy and secure as possible for users to deploy complex, open source database cluster technologies in any environment.

Since our first steps with deployment, automation and management we’ve perfected the art of securely deploying highly available open source database infrastructures by developing ClusterControl from a deployment and monitoring tool to a full-blown automation and management system adopted by thousands of users worldwide.

As a result, ClusterControl can be used today to deploy, monitor, and manage over a dozen versions of the most popular open source database technologies - on premises or in the cloud.

Whether you’re looking to deploy standalone MySQL, MySQL replication, MySQL Cluster, Galera Cluster, MariaDB, MariaDB Cluster, Percona XtraDB Cluster and Percona Server for MongoDB, MongoDB itself or PostgreSQL - ClusterControl has you covered.

In addition to the database stores, users can also deploy and manage load balancing technologies such as HAProxy, ProxySQL, MaxScale and Keepalived.

“Very easy to deploy a cluster, also it facilitates administration and monitoring.”

Michel Berger, IT Applications Manager, European Broadcasting Union (EBU)

Using ClusterControl, you can either deploy new database clusters or import existing ones.

A deployment wizard makes it easy and secure to deploy production-ready database clusters with a point and click interface that walks the users through the deployment process step by step.

Select Deploy or Import Cluster


Walk through of the Deploy Wizard

View your cluster list

“ClusterControl is great for deploying and managing a high availability infrastructure. Also find the interface very easy to manage.”

Paul Masterson, Infrastructure Architect, Dunnes

Deploying with the ClusterControl CLI

Users can also choose to work with our CLI, which allows for easy integration with infrastructure orchestration tools such as Ansible.

s9s cluster --os-user=vagrant --wait

The ClusterControl deployment supports multiple NICs and templated configurations.

In short, ClusterControl provides:

  • Topology-aware deployment jobs for MySQL, MariaDB, Percona, MongoDB and PostgreSQL
  • Self-service and on-demand
  • From standalone nodes to load-balanced clusters
  • Your choice of barebone servers, private/public cloud and containers

To see for yourself, download ClusterControl today and give us your feedback.

by jj at December 05, 2017 07:26 PM

Peter Zaitsev

MySQL 8.0 Window Functions: A Quick Taste

Window Functions

In this post, we’ll briefly look at window functions in MySQL 8.0.

One of the major features coming in MySQL 8.0 is support for window functions. The detailed documentation is already available here: Window functions. I wanted to take a quick look at the cases where window functions help.

Probably one of the most frequent limitations of MySQL’s SQL syntax came up when analyzing a dataset. I tried to find the answer to the following question: “Find the top N entries for each group in a grouped result.”

To give an example, I will refer to this request on Stackoverflow. While there is a solution, it is hardly intuitive and portable.

This is a popular problem, so databases without window function support have tried to solve it in different ways. For example, ClickHouse introduced a special extension for LIMIT: you can use LIMIT n BY expr to find “n” entries per group.
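As a sketch of that extension (the table and columns here are made up for illustration), finding the two best-rated movies per century in ClickHouse would look like:

```sql
-- LIMIT n BY expr keeps at most n rows per distinct value of expr.
SELECT title, century, rating
FROM movies
ORDER BY century, rating DESC
LIMIT 2 BY century
```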

This is a case where window functions come in handy.

As an example, I will take the IMDB dataset and find the top 10 movies per century (well, the previous 20th and the current 21st). To download the IMDB dataset, you need to have an AWS account and download the data from S3 storage (the details are provided on the IMDB page).

I will use the following query with MySQL 8.0.3:

SELECT primaryTitle, century*100, rating, genres, rn AS `Rank`
FROM (
  SELECT primaryTitle, startYear DIV 100 AS century, rating, genres,
         RANK() OVER (PARTITION BY startYear DIV 100 ORDER BY rating DESC) rn
  FROM title, ratings
  WHERE title.tconst = ratings.tconst
    AND titleType = 'movie'
    AND numVotes > 100000
) t1
WHERE rn <= 10
ORDER BY century, rating DESC;

The main part of this query is RANK() OVER (PARTITION BY startYear div 100 ORDER BY rating desc), which is the window function mentioned above. PARTITION BY divides the rows into groups, ORDER BY specifies the order, and RANK() calculates the rank using that order within each group.

The result is:

| primaryTitle                                      | century*100 | rating | genres                     | Rank |
| The Shawshank Redemption                          |        1900 |    9.3 | Crime,Drama                |    1 |
| The Godfather                                     |        1900 |    9.2 | Crime,Drama                |    2 |
| The Godfather: Part II                            |        1900 |      9 | Crime,Drama                |    3 |
| 12 Angry Men                                      |        1900 |    8.9 | Crime,Drama                |    4 |
| The Good, the Bad and the Ugly                    |        1900 |    8.9 | Western                    |    4 |
| Schindler's List                                  |        1900 |    8.9 | Biography,Drama,History    |    4 |
| Pulp Fiction                                      |        1900 |    8.9 | Crime,Drama                |    4 |
| Star Wars: Episode V - The Empire Strikes Back    |        1900 |    8.8 | Action,Adventure,Fantasy   |    8 |
| Forrest Gump                                      |        1900 |    8.8 | Comedy,Drama,Romance       |    8 |
| Fight Club                                        |        1900 |    8.8 | Drama                      |    8 |
| The Dark Knight                                   |        2000 |      9 | Action,Crime,Drama         |    1 |
| The Lord of the Rings: The Return of the King     |        2000 |    8.9 | Adventure,Drama,Fantasy    |    2 |
| The Lord of the Rings: The Fellowship of the Ring |        2000 |    8.8 | Adventure,Drama,Fantasy    |    3 |
| Inception                                         |        2000 |    8.8 | Action,Adventure,Sci-Fi    |    3 |
| The Lord of the Rings: The Two Towers             |        2000 |    8.7 | Action,Adventure,Drama     |    5 |
| City of God                                       |        2000 |    8.7 | Crime,Drama                |    5 |
| Spirited Away                                     |        2000 |    8.6 | Adventure,Animation,Family |    7 |
| Interstellar                                      |        2000 |    8.6 | Adventure,Drama,Sci-Fi     |    7 |
| The Intouchables                                  |        2000 |    8.6 | Biography,Comedy,Drama     |    7 |
| Gladiator                                         |        2000 |    8.5 | Action,Adventure,Drama     |   10 |
| Memento                                           |        2000 |    8.5 | Mystery,Thriller           |   10 |
| The Pianist                                       |        2000 |    8.5 | Biography,Drama,Music      |   10 |
| The Lives of Others                               |        2000 |    8.5 | Drama,Thriller             |   10 |
| The Departed                                      |        2000 |    8.5 | Crime,Drama,Thriller       |   10 |
| The Prestige                                      |        2000 |    8.5 | Drama,Mystery,Sci-Fi       |   10 |
| Like Stars on Earth                               |        2000 |    8.5 | Drama,Family               |   10 |
| Whiplash                                          |        2000 |    8.5 | Drama,Music                |   10 |
27 rows in set (0.19 sec)

The previous century was dominated by “The Godfather” and the current one by “The Lord of the Rings”. While we may or may not agree with the results, this is what the IMDB rating tells us.
If we look at the result set, we can see that there are actually more than ten movies per century, but this is how the RANK() function works: it gives the same rank to rows with an identical rating, and if multiple rows share a rating, all of them are included in the result set.
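The tie behavior is easy to reproduce in isolation. Here is a small sketch using Python’s sqlite3 module (SQLite has supported the same RANK() window function since version 3.25; the tiny dataset is made up): asking for the top 2 per group returns three rows for the group that contains a tie.

```python
import sqlite3

# Tiny made-up dataset: two "centuries", with a rating tie in the first one.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE movies (title TEXT, century INTEGER, rating REAL);
    INSERT INTO movies VALUES
        ('A', 1900, 9.3), ('B', 1900, 8.9), ('C', 1900, 8.9),
        ('D', 2000, 9.0), ('E', 2000, 8.8), ('F', 2000, 8.7);
""")

# Ask for the top 2 per century; the tie between B and C means the
# 1900 group returns three rows, just like the IMDB result above.
rows = conn.execute("""
    SELECT title, century, rn FROM (
        SELECT title, century,
               RANK() OVER (PARTITION BY century ORDER BY rating DESC) AS rn
        FROM movies
    ) WHERE rn <= 2
    ORDER BY century, rn
""").fetchall()
print(rows)
```

If you want exactly two rows per group regardless of ties, ROW_NUMBER() (or DENSE_RANK(), depending on the semantics you need) is the alternative.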

I welcome the addition of window functions to MySQL 8.0. This definitely simplifies some complex analytical queries. Unfortunately, complex queries will still be single-threaded, which is a performance-limiting factor. Hopefully, we will see multi-threaded query execution in future MySQL releases.

by Vadim Tkachenko at December 05, 2017 06:21 PM

Webinar Wednesday, December 6, 2017: Gain a MongoDB Advantage with the Percona Memory Engine

Percona Memory Engine

Join Percona’s CTO, Vadim Tkachenko, as he presents Gain a MongoDB Advantage with the Percona Memory Engine on Wednesday, December 6, 2017, at 11:00 am PST / 2:00 pm EST (UTC-8).

Experience: Entry Level to Intermediate

Tags: Developer, DBAs, Operations

Looking for the performance of Redis or Memcache, the expressiveness of the MongoDB query language and simple high availability and sharding? Percona Memory Engine, available as part of Percona Server for MongoDB, has it all!

In this webinar, Vadim explains the architecture of the MongoDB In-Memory storage engine. He’ll also show some benchmarks compared to disk-based storage engines and other in-memory technologies.

Vadim will share specific use cases where Percona Memory Engine for MongoDB excels, such as:

  • Caching documents
  • Highly volatile data
  • Workloads with predictable response time requirements

Register for the webinar now.

Vadim Tkachenko, CTO

Vadim Tkachenko co-founded Percona in 2006 and serves as its Chief Technology Officer. Vadim leads Percona Labs, which focuses on technology research and performance evaluations of Percona’s and third-party products. Percona Labs designs no-gimmick tests of hardware, filesystems, storage engines, and databases that surpass the standard performance and functionality scenario benchmarks. Vadim’s expertise in LAMP performance and multi-threaded programming helps optimize MySQL and InnoDB internals to take full advantage of modern hardware. Oracle Corporation and its predecessors have incorporated Vadim’s source code patches into the mainstream MySQL and InnoDB products. He also co-authored the book High-Performance MySQL: Optimization, Backups, and Replication 3rd Edition. Previously, he founded a web development company in his native Ukraine and spent two years in the High-Performance Group within the official MySQL support team. Vadim received a BS in Economics and an MS in Computer Science from the National Technical University of Ukraine.


by Vadim Tkachenko at December 05, 2017 05:06 PM

December 04, 2017

MariaDB AB

MariaDB Connector/C 2.3.4 now available

MariaDB Connector/C 2.3.4 now available dbart Mon, 12/04/2017 - 12:37

The MariaDB project is pleased to announce the immediate availability of MariaDB Connector/C 2.3.4. See the release notes and changelog for details, and use the link below to download.

Download MariaDB Connector/C 2.3.4

Release Notes Changelog About MariaDB Connector/C



by dbart at December 04, 2017 05:37 PM

Peter Zaitsev

Internal Temporary Tables in MySQL 5.7

InnoDB row operations graph from PMM

In this blog post, I investigate a case of spiking InnoDB Rows inserted in the absence of a write query, and find internal temporary tables to be the culprit.

Recently I was investigating an interesting case for a customer. We could see regular spikes on a graph depicting the “InnoDB rows inserted” metric (jumping from 1K/sec to 6K/sec), however we were not able to correlate those spikes with any other activity. The graph (picture from the PMM demo) looked similar to this (but on a much larger scale):

InnoDB row operations graph from PMM

Other graphs (Com_*, Handler_*) did not show any spikes like that. I examined the logs (we were not able to enable the general log or change the threshold of the slow log), performance_schema, triggers, stored procedures, prepared statements, and even reviewed the binary logs. However, I was not able to find any single write query that could have caused the spike to 6K rows inserted.

Finally, I figured out that I was focusing on the wrong queries. I was trying to correlate the spikes on the InnoDB Rows inserted graph to the DML queries (writes). However, the spike was caused by SELECT queries! But why would SELECT queries cause the massive InnoDB insert operation? How is this even possible?

It turned out that this is related to temporary tables on disk. In MySQL 5.7, the default setting for internal_tmp_disk_storage_engine is InnoDB. That means that if a SELECT needs to create a temporary table on disk (e.g., for GROUP BY), it will use the InnoDB storage engine.

Is that bad? Not necessarily. Krunal Bauskar originally published a blog post about InnoDB intrinsic table performance in MySQL 5.7. InnoDB internal temporary tables are not redo/undo logged, so in general performance is better. However, here is what we need to watch out for:

  1. Change of the place where MySQL stores temporary tables. InnoDB temporary tables are stored in the ibtmp1 tablespace file. There are a number of challenges with that:
    • Location of the ibtmp1 file. By default it is located inside the InnoDB datadir, while originally MyISAM temporary tables were stored in tmpdir. We can configure the size of the file, but the location is always relative to the InnoDB datadir, so to move it to tmpdir we need to use a relative path.
    • Like other tablespaces, it never shrinks back (though it is truncated on restart). A huge temporary table can fill the disk and hang MySQL (a bug is open for this). One way to mitigate that is to cap the maximum size of the ibtmp1 file.
    • Like other InnoDB tables, it has all the InnoDB limitations, i.e., the InnoDB row and column limits. If a temporary table exceeds them, it will return “Row size too large” or “Too many columns” errors. The workaround is to set internal_tmp_disk_storage_engine to MYISAM.
  2. When all temporary tables go to InnoDB, it may increase the total engine load as well as affect other queries. For example, if originally all datasets fit into the buffer pool and temporary tables were created outside of InnoDB, they did not affect the InnoDB memory footprint. Now, if a huge temporary table is created as an InnoDB table, it will use innodb_buffer_pool and may “evict” existing pages, so that other queries may perform slower.
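Both of the mitigations mentioned in point 1 (relocating ibtmp1 out of the datadir and capping its size) go through the innodb_temp_data_file_path option; a sketch with example values (the relative path and the 5G cap are illustrative, not recommendations):

```ini
[mysqld]
# Path is relative to the InnoDB datadir: start at 12M, autoextend,
# but never let ibtmp1 grow beyond 5G.
innodb_temp_data_file_path = ../../tmp/ibtmp1:12M:autoextend:max:5G
```

Note that this variable is not dynamic, so changing it requires a server restart.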


Beware of this change in MySQL 5.7: internal temporary tables (those that are created for SELECTs when a temporary table is needed) are stored in the InnoDB ibtmp1 file. In most cases this is faster. However, it can change the original behavior. If needed, you can switch the creation of internal temporary tables back to MyISAM:

SET GLOBAL internal_tmp_disk_storage_engine = MYISAM;

by Alexander Rubin at December 04, 2017 02:51 PM

MariaDB AB

MariaDB MaxScale Setup with Binlog Server and SQL Query Routing

MariaDB MaxScale Setup with Binlog Server and SQL Query Routing massimiliano_pinto_g Mon, 12/04/2017 - 03:26

A binlog server is a MariaDB MaxScale replication proxy setup that involves one master server and several slave servers using the MariaDB replication protocol.

Up to MariaDB MaxScale version 2.1, due to the lack of some SQL variables needed by the monitor for MariaDB instances, it was not possible to use it in conjunction with SQL routing operations, such as read/write split routing.

With MariaDB MaxScale 2.2 (currently in beta) this is no longer a limitation as the monitor can detect a Binlog server setup and SQL statements can be properly routed among Master and Slave servers.

Depending on the configuration value of the optional variable “master_id”, the binlog server can be seen as a ‘Relay Master’ with its own slaves or just a ‘Running’ server, without its slaves being listed.

MariaDB MaxScale configuration:

# binlog server details

# Mysql monitor
[MySQL Monitor]

# R/W split service

# Binlog server configuration

# Binlog server listener

Note: the ‘binlog_server’ is not needed in the server list of the R/W split service; if set, it doesn’t harm MariaDB MaxScale, as it never has Slave or Master states.

The Binlog Server identity post is a reminder of which parameters affect the way MaxScale is seen by slave servers and the MaxScale monitor.

Scenario A: only server_id is given in configuration.

MySQL [(none)]> select @@server_id; // The server_id of master, query from slaves.
| @@server_id |
|       10124 |

MySQL [(none)]> select @@server_id, @@read_only; // Maxscale server_id, query from MySQL monitor only.

| @@server_id | @@read_only |
|          93 |           0 |

*************************** 1. row ***************************
              Slave_IO_State: Binlog Dump
                 Master_Host:  // Master server IP
                 Master_User: repo
                 Master_Port: 3306
            Master_Server_Id: 10124 // Master Server_ID

MaxAdmin> show servers
Server 0x1f353b0 (server1)
    Status:                              Slave, Running
    Protocol:                            MySQLBackend
    Port:                                25231
    Server Version:                      10.0.21-MariaDB-log
    Node Id:                             101
    Master Id:                           10124
    Slave Ids:                           
    Repl Depth:                          1

Server 0x1f31af0 (server2)
    Status:                              Master, Running
    Protocol:                            MySQLBackend
    Port:                                10124
    Server Version:                      10.1.24-MariaDB
    Node Id:                             10124
    Master Id:                           -1
    Slave Ids:                           101, 93
    Repl Depth:                          0

Server 0x1f32d90 (binlog_server)
    Status:                              Running
    Protocol:                            MySQLBackend
    Port:                                8808
    Server Version:                      10.1.17-log
    Node Id:                             93
    Master Id:                           10124
    Slave Ids:                           
    Repl Depth:                          1

Scenario B: server_id and common_identity (master_id)

router_options=server-id=93, master_id=1111

MySQL [(none)]> select @@server_id; // Maxscale common identity
| @@server_id |
|        1111 |
1 row in set (0.00 sec)

MySQL [(none)]> select @@server_id, @@read_only; // Maxscale common identity
| @@server_id | @@read_only |
|        1111 |           0 |
1 row in set (0.00 sec)

MySQL [(none)]> show slave status\G
*************************** 1. row ***************************
              Slave_IO_State: Binlog Dump
                 Master_Host:  // Master server IP
                 Master_User: repl
                 Master_Port: 3306
            Master_Server_Id: 10124 // Master Server_ID

MaxAdmin> show servers
Server 0x24103b0 (server1)
    Status:                              Slave, Running
    Protocol:                            MySQLBackend
    Port:                                25231
    Server Version:                      10.0.21-MariaDB-log
    Node Id:                             101
    Master Id:                           1111
    Slave Ids:                           
    Repl Depth:                          2
Server 0x240dd90 (binlog_server)
    Status:                              Relay Master, Running
    Protocol:                            MySQLBackend
    Port:                                8808
    Server Version:                      10.1.17-log
    Node Id:                             1111
    Master Id:                           10124
    Slave Ids:                           101
    Repl Depth:                          1

Server 0x240caf0 (server2)
    Status:                              Master, Running
    Protocol:                            MySQLBackend
    Port:                                10124
    Server Version:                      10.1.24-MariaDB
    Node Id:                             10124
    Master Id:                           -1
    Slave Ids:                           1111
    Repl Depth:                          0

The latter configuration, with the extra master_id option, is clearly the one that best represents the setup with the binlog server as a replication proxy: the user can immediately see that.

The picture shows the setup and makes it clear that MariaDB MaxScale both handles the replication protocol between master and slaves and routes read and write application traffic.



This post shows how easy it is for any user to improve a MariaDB replication setup with MariaDB MaxScale, combining the benefits of a replication proxy with query routing scalability.

MariaDB MaxScale 2.2 is in beta and we do not recommend using it in production environments. However, we do encourage you to download, test it and share your successes!




by massimiliano_pinto_g at December 04, 2017 08:26 AM

December 02, 2017

Valeriy Kravchuk

Using strace for MySQL Troubleshooting

I'd say that the strace utility is even more useful for MySQL DBAs than lsof. Basically, it is a general-purpose diagnostic and debugging tool for tracing the system calls a process makes and the signals it receives. The name of each system call, its arguments and its return value are printed to stderr or to the file specified with the -o option.

In the context of MySQL, strace is usually used to find out what files the mysqld process accesses, and to check the details of any I/O errors. For example, if I really wanted to verify Bug #88479 - "Unable to start mysqld using a single config file (and avoiding reading defaults)" by Simon Mudd, I'd just run mysqld from some 5.7.x version as a command argument for strace. On my Ubuntu 14.04 netbook I have the following files:
openxs@ao756:~/dbs/5.7$ ls -l /usr/my.cnf
-rw-r--r-- 1 root root 943 лип 19  2013 /usr/my.cnf
openxs@ao756:~/dbs/5.7$ ls -l /etc/my.cnf
-rw-r--r-- 1 root root 260 чер 24 20:40 /etc/my.cnf
openxs@ao756:~/dbs/5.7$ ls -l /etc/mysql/my.cnf
-rw-r--r-- 1 root root 116 лют 26  2016 /etc/mysql/my.cnf
openxs@ao756:~/dbs/5.7$ bin/mysqld --version
bin/mysqld  Ver 5.7.18 for Linux on x86_64 (MySQL Community Server (GPL))
So, what happens if I try to run mysqld --defaults-file=/etc/mysql/my.cnf, like this:
openxs@ao756:~/dbs/5.7$ strace bin/mysqld --defaults-file=/etc/mysql/my.cnf --print-defaults 2>&1 | grep 'my.cnf'
stat("/etc/mysql/my.cnf", {st_mode=S_IFREG|0644, st_size=116, ...}) = 0
open("/etc/mysql/my.cnf", O_RDONLY)     = 3
It seems we proved that only the file passed as --defaults-file is read (if it exists). By default other locations are also checked in a specific order (note that return codes are mapped to symbolic errors when possible):
openxs@ao756:~/dbs/5.7$ strace bin/mysqld --print-defaults 2>&1 | grep 'my.cnf'
stat("/etc/my.cnf", {st_mode=S_IFREG|0644, st_size=260, ...}) = 0
open("/etc/my.cnf", O_RDONLY)           = 3
stat("/etc/mysql/my.cnf", {st_mode=S_IFREG|0644, st_size=116, ...}) = 0
open("/etc/mysql/my.cnf", O_RDONLY)     = 3
stat("/home/openxs/dbs/5.7/etc/my.cnf", 0x7ffd68f0e020) = -1 ENOENT (No such file or directory)
stat("/home/openxs/.my.cnf", 0x7ffd68f0e020) = -1 ENOENT (No such file or directory)
If we think that --print-defaults may matter, we can try without it:
openxs@ao756:~/dbs/5.7$ strace bin/mysqld --defaults-file=/etc/mysql/my.cnf 2>&1 | grep 'my.cnf'
stat("/etc/mysql/my.cnf", {st_mode=S_IFREG|0644, st_size=116, ...}) = 0
open("/etc/mysql/my.cnf", O_RDONLY)     = 3
The last example also shows how one can terminate tracing with Ctrl-C.
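As a quick, hypothetical reference for the strace options that come up in the bug reports that follow (the binaries, PID and paths here are placeholders, not commands to copy verbatim):

```
strace -o /tmp/trace.out bin/mysqld    # write the trace to a file instead of stderr
strace -p 12345                        # attach to an already-running process by PID
strace -f -v bin/my_print_defaults x   # follow forked children, verbose output
strace -e stat bin/mysqlbinlog f.bin   # trace only the stat() system call
strace -r bin/mysql                    # prefix each call with a relative timestamp
```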

Now, let me illustrate typical use cases with (surprise!) some public MySQL bug reports where strace was important to find or verify the bug:
  • Bug #20748 - "Configuration files should not be read more than once". In this old bug report, Domas Mituzas proved the point by running mysqld as a command via strace. He filtered out the lines related to opening the my.cnf file with egrep and made it obvious that one file may be read more than once. The bug was closed a long time ago, but it is not obvious that all possible cases are covered, based on further comments...
  • Bug #49336 - "mysqlbinlog does not accept input from stdin when stdin is a pipe". Here the bug reporter showed how to run mysqlbinlog as a command under strace and redirect the output to a file with the -o option.
  • Bug #62224 - "Setting open-files-limit above the hard limit won't be logged in errorlog". This minor bug in all versions before old 5.6.x (it still remains "Verified", probably forgotten) was reported by Daniël van Eeden. strace allowed him to show what limits are really set by the setrlimit() calls. The fact that the arguments of system calls are also shown matters sometimes.
  • Bug #62578 - "mysql client aborts connection on terminal resize". This bug report by Jervin Real from Percona is one of my all-time favorites! I clearly remember how a desperate customer tried to dump data and load them on a remote server with a nice command line, and after waiting for many days (mysqldump was run for a data set of 5TB+, don't ask, not my idea) complained that they got a "Lost connection to MySQL server during query" error message and the loading failed. Surely, the command was run from a terminal window on his Mac, and in the process of moving it here and there he just resized the window by chance... You can see a nice example of strace usage with MySQL Sandbox in the bug report, as well as some outputs for the SIGWINCH signal. Note that the bug is NOT fixed by Oracle in MySQL 5.5.x (and never happened in 5.6+), while Percona has fixed it since 5.5.31.
  • Bug #65956 - "client and libmysqlclient VIO drops connection if signal received during read()". In this yet another 5.5-only regression bug that was NOT fixed in 5.5.x by Oracle (until, in the frame of Bug #82019 - "Is client library supposed to retry EINTR indefinitely or not", a patch was contributed for a special case of it by Laurynas Biveinis from Percona, which in a somewhat changed form allowed, hopefully, to get some fix into 5.5.52+), the problem was noted while trying to attach strace to a running client program (with the -p option).
  • Bug #72340 - "my_print_defaults fails to parse include directive if there is no new line". Here the bug reporter used strace (with the -f option to trace child/forked processes and the -v option to get verbose output) to show that the file name is NOT properly recognized by my_print_defaults. The bug is still "Verified".
  • Bug #72701 - "mysqlbinlog uses localtime() to print events, causes kernel mutex contention". Yet another nice bug report by Domas Mituzas, who used the -e stat option of strace to trace only stat() calls and show how many times they are applied to /etc/localtime. The bug is fixed since 5.5.41, 5.6.22 and 5.7.6, which removed the related kernel mutex contention.
  • Bug #76627 - "MySQL/InnoDB mix buffered and direct IO". It was demonstrated with strace that InnoDB was opening each .ibd file twice (this is expected as explained by Marko Mäkelä) in different modes (and this was not any good). The bug is fixed in recent renough MySQL server versions.
  • Bug #80020 - "mysqlfrm doesn't work with 5.7". In this bug report Aleksandr Kuzminsky had to use strace to find out why mysqlfrm utility failed mostly silently even with --verbose option.
  • Bug #80319 - ""flush tables" semantics deviate from manual". Here Jörg Brühe proved with strace that the close() system call is not used for an individual InnoDB table t when FLUSH TABLES t is executed. The manual had to be clarified.
  • Bug #80889 - "PURGE BINARY LOGS TO is reading the whole binlog file and causing MySql to Stall". By attaching strace to the running mysqld process and running the command, it was shown that when GTID_MODE=OFF the entire file we purge to was read. No reason to do this, really. Note how the return value of open(), a file descriptor, was further used to track reads from this specific file.
  • Bug #81443 - "mysqld --initialize-insecure silently fails with --user flag". As a MySQL DBA, you should be ready to use strace when requested by support or developers. Here the bug reporter was asked to use it, and this revealed a permission problem (that was a result of the bug). No need to guess what may go wrong with permissions - just use strace to find out!
  • Bug #84708 - "mysqld fills up error logs with [ERROR] Error in accept: Bad file descriptor". Here strace was used to find out that "When mysqld is secured with tcp_wrappers it will close a socket from an unauthorized ip address and then immediately start polling that socket"... The bug is fixed in MySQL 5.7.19+ and 8.0.2+, so take care!
  • Bug #86117 - "reconnect is very slow". I do not care about MySQL Shell, but in this bug report Daniël van Eeden used the -r option of strace to print a relative timestamp for every system call, and this way it was clear how much time was spent during reconnect, and where it was spent. Very useful!
  • Bug #86462 - "mysql_ugprade: improve handling of upgrade errors". In this case strace allowed Simon Mudd to find out what exact SQL statement generated by mysql_upgrade failed.
Last but not least, note that you may need root or sudo privileges to use strace. On my Ubuntu 14.04, messages like this may appear even if I am the user that owns the mysqld process (same with gdb):
openxs@ao756:~/dbs/maria10.2$ strace -p 2083
strace: attach: ptrace(PTRACE_ATTACH, ...): Operation not permitted
Could not attach to process.  If your uid matches the uid of the target
process, check the setting of /proc/sys/kernel/yama/ptrace_scope, or try
again as the root user.  For more details, see /etc/sysctl.d/10-ptrace.conf
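The message points at Yama's ptrace_scope setting. A small sketch to check it (the sysctl path is standard on kernels built with the Yama security module; this is my illustration, not part of the original report):

```shell
#!/bin/sh
# Check the Yama ptrace scope; 1 (the Ubuntu default) only lets a user
# trace its own direct children, which blocks attaching to a running mysqld.
scope=/proc/sys/kernel/yama/ptrace_scope
if [ -r "$scope" ]; then
    echo "ptrace_scope=$(cat "$scope")"
else
    echo "Yama not enabled on this kernel"
fi
# To allow same-uid attaches until the next reboot (needs root):
#   echo 0 | sudo tee /proc/sys/kernel/yama/ptrace_scope
```

Alternatively, just run strace via sudo, as the error message itself suggests.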
To summarize, strace may help a MySQL DBA to find out:

  • what files are accessed by the mysqld process or related utilities, and in what order
  • why some MySQL-related command (silently) fails or hangs
  • why some commands end up with permission denied or other errors
  • what signals MySQL server and tools get
  • what system calls could take a lot of time when something works slowly
  • when files are opened and closed, and how much data are read from the files
  • where the error log and other logs are really located (we can look for system calls related to writing to stderr, for example)
  • how MySQL really works with files, ports and sockets
It also helps to find and verify MySQL bugs, and clarify missing details in MySQL manual.
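As a quick reference, the kinds of invocations used in the bug reports above can be sketched as follows (binary names, PIDs and paths are examples of mine, not taken from the reports; the runnable part only probes whether strace is installed):

```shell
#!/bin/sh
# Sketch of typical strace invocations for MySQL troubleshooting.
# Adjust binaries, PIDs and output paths for your own system.
if command -v strace >/dev/null 2>&1; then
    echo "strace available"
    # Trace only stat() calls, following forked children:
    #   strace -f -v -e trace=stat my_print_defaults mysqld
    # Relative timestamps to see where time is spent:
    #   strace -r -o /tmp/client.trace mysql -e 'select 1'
    # Attach to a running server (root or relaxed ptrace_scope required):
    #   strace -p "$(pidof mysqld)" -o /tmp/mysqld.trace
else
    echo "strace not installed"
fi
```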

There are other similar tools for tracing system calls (maybe among other things) on Linux that I am going to review in this blog some day. The performance impact of running MySQL server under this kind of tracing is also a topic to study.

by Valeriy Kravchuk ( at December 02, 2017 04:42 PM

December 01, 2017

Peter Zaitsev

Percona Monitoring and Management 1.5: QAN in Grafana Interface


In this post, we’ll examine how we’ve improved the GUI layout for Percona Monitoring and Management 1.5 by moving the Query Analytics (QAN) functions into the Grafana interface.

For Percona Monitoring and Management users, you might notice that QAN appears a little differently in our 1.5 release. We’ve taken steps to unify the PMM interface so that it feels more natural to move from reviewing historical trends in Metrics Monitor to examining slow queries in QAN.  Most significantly:

  1. QAN moves from a stand-alone application into Metrics Monitor as a dashboard application
  2. We updated the color scheme of QAN to match Metrics Monitor (but you can toggle a button if you prefer to still see QAN in white!)
  3. Date picker and host selector now use the same methods as Metrics Monitor

Percona Monitoring and Management 1.5 QAN 1

Starting from the PMM landing page, you still see two buttons – one for Metrics Monitor and another for Query Analytics (this hasn’t changed):

Percona Monitoring and Management 1.5 QAN 2

Once you select Query Analytics on the left, you see the new Metrics Monitor dashboard page for PMM Query Analytics. It is now hosted as a Metrics Monitor dashboard, and notice the URL is no longer /qan:

Percona Monitoring and Management 1.5 QAN 3

Another advantage of the Metrics Monitor dashboard integration is that the QAN inherits the host selector from Grafana, which supports partial string matching. This makes it simpler to find the host you’re searching for if you have more than a handful of instances:

Percona Monitoring and Management 1.5 QAN 4

The last feature enhancement worth mentioning is the native Grafana time selector, which lets you select down to the minute resolution time frames. This was a frequent source of feature requests — previously PMM limited you to our pre-defined default ranges. Keep in mind that QAN has an internal archiving job that caps QAN history at eight days.

Percona Monitoring and Management 1.5 QAN 5

Last but not least is the ability to toggle between the default dark interface and the optional white. Look for the small lightbulb icon at the bottom left of any QAN screen and give it a try!

Percona Monitoring and Management 1.5 QAN 7

We hope you enjoy the new interface, and we look forward to your feedback on these improvements!

by Michael Coburn at December 01, 2017 09:21 PM

This Week in Data with Colin Charles 17: AWS Re:Invent, a New Book on MySQL Cluster and Another Call Out for Percona Live 2018

Colin Charles

Join Percona Chief Evangelist Colin Charles as he covers happenings, gives pointers and provides musings on the open source database community.

The CFP for Percona Live Santa Clara 2018 closes December 22, 2017: please consider submitting as soon as possible. We want to make an early announcement of talks, so we’ll definitely do a first pass even before the CFP date closes. Keep in mind the expanded view of what we are after: it’s more than just MySQL and MongoDB. And don’t forget that with one day less, there will be intense competition to fit all the content in.

A new book on MySQL Cluster is out: Pro MySQL NDB Cluster by Jesper Wisborg Krogh and Mikiya Okuno. At 690 pages, it is a weighty tome, and something I fully plan on reading, considering I haven’t played with NDBCLUSTER for quite some time.

Did you know that since MySQL 5.7.17, connection control plugins are included? They help DBAs introduce an increasing delay in server response to clients after a certain number of consecutive failed connection attempts. Read more at the connection control plugins.

While there are a tonne of announcements coming out from the Amazon re:Invent 2017 event, I highly recommend also reading Some data of interest as AWS reinvent 2017 ramps up by James Governor. Telemetry data from sumologic’s 1,500 largest customers suggest that NoSQL database usage has overtaken relational database workloads! Read The State of Modern Applications in the Cloud. Page 8 tells us that MySQL is the #1 database on AWS (I don’t see MariaDB Server being mentioned which is odd; did they lump it in together?), and MySQL, Redis & MongoDB account for 40% of database adoption on AWS. In other news, Andy Jassy also mentions that less than 1.5 months after hitting 40,000 database migrations, they’ve gone past 45,000 over the Thanksgiving holiday last week. Have you started using AWS Database Migration Service?


Link List

Upcoming appearances

  • ACMUG 2017 gathering – Beijing, China, December 9-10 2017 – it was very exciting being there in 2016, I can only imagine it’s going to be bigger and better in 2017, since it is now two days long!


I look forward to feedback/tips via e-mail at or on Twitter @bytebot.

by Colin Charles at December 01, 2017 02:58 PM

Jean-Jerome Schmidt

Upgrading to the ClusterControl Enterprise Edition

The ClusterControl Enterprise Edition provides you with a full suite of management and scaling features in addition to the deployment and monitoring functions offered as part of the free Community Edition. You also have the ability to deploy, configure and manage the top open source load balancing and caching technologies to drive peak performance for your mission-critical applications.

Whether you have been benefiting from the free resources included in the Community Edition or have evaluated the product through the Enterprise Trial, we’ll walk you through how our licensing works and explain how to get you up-and-running with all the automation and scaling that ClusterControl Enterprise has to offer.

“With quick installation, ease of use, great support, stable deployments and a scalable architecture, ClusterControl is just the solution we were looking for to provide a strong MySQL HA platform to our customers.”

Xavi Morrus, CMO, MediaCloud

How to Upgrade from Community to Enterprise

While using the ClusterControl Community Edition you may have clicked on a feature and got a pop-up indicating that it was not included in the version you are using. When this happens you have two options. You can activate (or extend) your Enterprise Trial OR you can contact sales to purchase an enterprise license.

“Our back-end is reliant on different databases to tackle different tasks. Using several different tools, rather than a one-stop shop, was detrimental to our productivity. Severalnines is that ‘shop’ and we haven’t looked back. ClusterControl is an awesome solution like no other.”

Zeger Knops, Head of Business Technology, vidaXL

Enterprise Trial

The ClusterControl Enterprise trial provides you with free access to our full suite of features for 30 days. The purpose of this trial is to allow you to “kick the tires” using your environments and applications to make sure that ClusterControl meets your needs.

With the trial you have access to all our Community features plus: Custom Dashboards, Load Balancers, Configuration Management, Backup and Restore, Automatic Node and Cluster Recovery, Role Based Access Control, Key Management, LDAP, SSL Encryption Scaling, and more!

The trial also grants you Enterprise Level access to our support teams 24/7. We want to make sure that you have the best experience during your trial and also introduce you to our amazing support that you can count on when you become a customer of Severalnines.

At the end of your trial, you will have the option to meet with our sales team to continue with ClusterControl Enterprise on a paid license. Or you may also continue with our ClusterControl Community Edition, which you can use for free - forever.

Extending Your Trial

Sometimes thirty days isn’t enough time to evaluate a product as extensive as ClusterControl. In these situations we can sometimes grant an extension to allow you some more time to evaluate the product. This extension can be requested from the product itself and you will be contacted by an account manager to arrange for the extension.

“ClusterControl is phenomenal software…I’m usually not impressed with vendors or the software we buy, because usually it’s over promised and under delivered. ClusterControl is a nice handy system that makes me feel confident that we can run this in a production environment.”

Jordan Marshall, Manager of Database Administration, Black Hills Corporation

Purchasing a Commercial License

ClusterControl offers three separate plans and different support options. Our account managers are available to assist, and recommend the best plan. We also offer volume discounts for larger orders. In short, we will work very hard to make sure our price meets your needs and budget. Once we’ve all signed on the dotted line, you will then be provided with Commercial License keys that you can put into your already deployed environment (or into a new one) which will then immediately grant you full access to the entire suite of ClusterControl features that you have contracted.

Single Console for Your Entire Database Infrastructure
Find out what else is new in ClusterControl

Benefits of Upgrading

While the free ClusterControl community version provides rich features that allow you to easily and securely deploy and monitor your open source databases, the Enterprise Edition provides much much more!

These are just some of the features awaiting you in the Enterprise Edition...

  • Advanced Backup & Restoration: With ClusterControl you can schedule logical or physical backups with failover handling and easily restore backups to bootstrap nodes or systems.
  • Automated Failover: ClusterControl includes advanced support for failure detection and handling; it also allows you to deploy different proxies to integrate them with your HA stack.
  • Topology Changes: Making topology changes with ClusterControl is easy; it does all the background work to elect a new master, deploy fail-over slave servers, rebuild slaves in case of data corruption, and maintain load balancer configurations to reflect all the changes.
  • Load Balancing: Load balancers are an essential component of database high availability, especially when making topology changes transparent to applications and implementing read-write split functionality. ClusterControl provides support for ProxySQL, HAProxy, and MaxScale.
  • Advanced Security: ClusterControl removes human error and provides access to a suite of security features automatically protecting your databases from hacks and other threats.
  • Operational Reports: These come in handy whether you need to show you are meeting your SLAs or wish to keep track of the historical data of your cluster.
  • Scaling: Easily add and remove nodes, resize instances, and clone your production clusters with ClusterControl.

In short, ClusterControl is an all-inclusive database management system that removes the need for your team to have to cobble together multiple tools, saving you time and money.

If you ever have any issues during this process you can always consult the documentation or contact us. If you need support you can contact us here.

by Severalnines at December 01, 2017 02:11 PM

MariaDB AB

The Binlog Server Common Identity

The Binlog Server Common Identity massimiliano_pinto_g Fri, 12/01/2017 - 03:19

The “identity” of the binlog server layer, from the slave server’s point of view, is something that can be modified in order to present the same common parameters to the slaves for every MariaDB MaxScale server they could replicate from.

This way the slave servers see the same values for these config options, and they will not be able to tell whether the real master has changed after a failure or a new server has been promoted as master, for any reason.

Set of parameters useful for binlog server identity configuration.

    Some parameters must be configured with different values in each MariaDB MaxScale server:

  • server-id
    As with uuid, MariaDB MaxScale must have a unique server-id for the connection it makes to the master. This parameter provides the value of server-id that MariaDB MaxScale will use when connecting to the master.

  • uuid
    This is used to set the unique uuid that the binlog router uses when it connects to the master server.
    If no explicit value is given for the uuid in the configuration file then a uuid will be generated.

    Let’s look in detail at the parameters that can be set for the binlog server common identity:

  • master-id
    The server-id value that MariaDB MaxScale should use to report to the slaves that connect to MariaDB MaxScale.
    This may either be the same as the server-id of the real master or can be chosen to be different if the slaves need to be aware of the proxy layer.
    The real master server-id will be used if the option is not set.

  • master_uuid
    It is a requirement of replication that each slave have a unique UUID value. The MariaDB MaxScale router will identify itself to the slaves using the uuid of the real master if this option is not set.

  • master_version
    The MariaDB MaxScale router will identify itself to the slaves using the server version of the real master if this option is not set.

  • master_hostname
    The MariaDB MaxScale router will identify itself to the slaves using the server hostname of the real master if this option is not set.
  • slave_hostname
    MariaDB MaxScale can optionally identify itself to the master using a custom hostname.
    The specified hostname can be seen in the master server via SHOW SLAVE HOSTS command.
    The default is not to send any hostname string during registration.
    This parameter doesn’t affect the identity seen by the slave servers but it’s useful in order to properly list all MaxScale servers that are replicating from the master.

An example with one Master, two MariaDB MaxScale servers and many slaves per each proxy:



Master
  • server_id=1, server_uuid: A, version: 5.6.15

MariaDB MaxScale

  • Max_1: server-id=2, uuid=B        
  • Max_2: server-id=3, uuid=C

Slaves (1 ... N)

  • Slave_1: server_id=10,server_uuid=FFF,Ver: 5.6.18
  • Slave_2: server_id=11,server_uuid=DDD,Ver: 5.6.19
  • Slave_3: server_id=12,server_uuid=ABC,Ver: 5.6.17

The MariaDB MaxScale common identity we want to show all slaves is:

  1. master_id: 1111
  2. master_version: 5.6.99-common
  3. master_uuid: xxx-fff-cccc-fff
  4. master_hostname=common_server


    Detailed options set in maxscale.cnf for each MaxScale server:
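The maxscale.cnf excerpts did not survive in this copy of the post. As a hedged sketch, the common-identity options for Max_1 could look like the following; the service name and overall layout are illustrative assumptions, only the option names come from this post, and Max_2 would differ only in server-id, uuid and slave_hostname:

```ini
# Hypothetical maxscale.cnf fragment for Max_1 -- a sketch, not the
# original configuration; option names are the ones described above.
[BinlogServer]
type=service
router=binlogrouter
router_options=server-id=2,uuid=B,master-id=1111,master_uuid=xxx-fff-cccc-fff,master_version=5.6.99-common,master_hostname=common_server,slave_hostname=binlog-server-1
```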





    Query results for the common parameters, issued to any MaxScale server on its client port:

    MariaDB> select @@server_id;
    +-------------+
    | @@server_id |
    +-------------+
    |        1111 |
    +-------------+
    1 row in set (0.00 sec)

    MariaDB> select @@server_uuid;
    +------------------+
    | @@server_uuid    |
    +------------------+
    | xxx-fff-cccc-fff |
    +------------------+
    1 row in set (0.00 sec)

    MariaDB> select version();
    +---------------+
    | VERSION()     |
    +---------------+
    | 5.6.99-common |
    +---------------+
    1 row in set (0.00 sec)

    MariaDB> select @@hostname;
    +---------------+
    | @@hostname    |
    +---------------+
    | common_server |
    +---------------+
    1 row in set (0.00 sec)

    Example from the log file of Max_1: both the master-side and slave-side identities are logged.

    2016-01-13 11:06:45  notice : BinlogServer: identity seen by the master: server_id: 2, uuid: XYZ, Host: binlog-server-1
    2016-01-13 11:06:45  notice : BinlogServer: identity seen by the slaves: server_id: 1111, uuid: XYZ, hostname: common_server, MySQL version: 5.6.17-mxs



    We encourage you to download MariaDB MaxScale, test the Binlog Server setup with the Common Identity variables and share your experience!

    We are done for now but stay tuned, a follow up blog post will show how to configure MariaDB MaxScale for both replication proxy and query routing applications.

    Additional Resources


    by massimiliano_pinto_g at December 01, 2017 08:19 AM

    November 30, 2017

    MariaDB AB

    Validating a MariaDB ColumnStore System Setup

    Validating a MariaDB ColumnStore System Setup davidhill2 Thu, 11/30/2017 - 17:41


    We have learned that the MariaDB ColumnStore prerequisites and install can be complex for multi-node installs, so we decided to develop a tool that helps our users validate install readiness.

    This tool will validate the setup whether installing on a single server or a multi-server system. It is called MariaDB ColumnStore Cluster Test Tool. It is part of the MariaDB ColumnStore package and can be run from the installing server once the MariaDB ColumnStore package has been installed.

    You can also use this tool on a MariaDB ColumnStore system if it fails to start up or if a server has been replaced within an existing system. Some OS settings could have been changed by a system admin or lost during a reboot. This tool will help detect those issues.

    The tool will:

    • Communicate with all the servers that are going to be used in the MariaDB ColumnStore system to test out the SSH connectivity

    • Check for matching OSs and locale settings on all servers

    • Check that each of the required dependency packages are installed on each node

    • Check the firewall settings and test the ports that the MariaDB ColumnStore product will utilize to communicate between the servers

    • Compare the system date and time to make sure the local server and the other servers are in sync

    Running this tool will detect and report any issues that might prevent the MariaDB ColumnStore product from installing and starting up, which can save a lot of time during the initial install.
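A couple of these checks can also be reproduced by hand. Here is a hypothetical sketch of the date/time validation (the node name is a placeholder of mine; the real tool performs this over SSH for every node):

```shell
#!/bin/sh
# Manual spot-check mirroring the tool's date/time validation.
# On a real cluster, fetch the remote timestamp over SSH, e.g.:
#   remote_ts=$(ssh pm2 date +%s)
local_ts=$(date +%s)
remote_ts=$local_ts          # stand-in so this sketch runs locally
delta=$((local_ts - remote_ts))
if [ "${delta#-}" -le 10 ]; then
    echo "time OK"           # prints "time OK" with the local stand-in
else
    echo "time drift: ${delta}s"
fi
# Port availability in ColumnStore's 8600-8620 range could be probed with:
#   nc -z pm2 8600 && echo "port 8600 reachable"
```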

    Here is an example where the tool detected an issue on a three-node system. The two non-local nodes are referenced by the IP addresses provided in the command, which is run from the server designated as ‘pm1’:


    # ./ --ipaddr=,
    *** This is the MariaDB Columnstore Cluster System test tool ***
    ** Validate local OS is supported
    Local Node OS System Name : CentOS Linux 7 (Core)
    ** Run Ping access Test to remote nodes  Node Passed ping test  Node Passed ping test
    ** Run SSH Login access Test to remote nodes  Node Passed SSH login test using ssh-keys  Node Passed SSH login test using ssh-keys
    ** Run OS check - OS version needs to be the same on all nodes
    Local Node OS Version : CentOS Linux 7 (Core) Node OS Version : CentOS Linux 7 (Core) Node OS Version : CentOS Linux 7 (Core)
    ** Run Locale check - Locale needs to be the same on all nodes
    Local Node Locale : LANG=en_US.UTF-8 Node Locale : LANG=en_US.UTF-8 Node Locale : LANG=en_US.UTF-8
    ** Run SELINUX check - Setting should to be disabled on all nodes
    Local Node SELINUX setting is Not Enabled Node SELINUX setting is Not Enabled Node SELINUX setting is Not Enabled
    ** Run Firewall Services check - Firewall Services should to be Inactive on all nodes
    Local Node iptables service is Not Active
    Local Node ufw service is Not Active
    Local Node firewalld service is Not Active
    Local Node firewall service is Not Active Node iptables service is Not Enabled Node ufw service is Not Enabled Node firewalld service is Not Enabled Node firewall service is Not Enabled Node iptables service is Not Enabled Node ufw service is Not Enabled Node firewalld service is Not Enabled Node firewall service is Not Enabled
    ** Run MariaDB ColumnStore Port (8600-8620) availability test  Node Passed port test  Node Passed port test
    ** Run Date/Time check - Date/Time should be within 10 seconds on all nodes
    Passed: Node date/time is within 10 seconds of local node
    Passed: Node date/time is within 10 seconds of local node
    ** Run MariaDB ColumnStore Dependent Package Check
    Local Node - Passed, all dependency packages are installed Node - Passed, all dependency packages are installed
    Failed, Node package expect is not installed, please install
    Failure occurred, do you want to continue? (y,n) > y
    *** Finished Validation of the Cluster, Failures occurred. Check for Error/Failed test results ***



    As you can see from the report above, one issue was detected:

    • Package ‘expect’ was missing on one of the other servers.

    So we fix the issue as shown here:


    # ssh yum install expect -y
    Loaded plugins: fastestmirror
    Loading mirror speeds from cached hostfile
     * base:
     * extras:
     * updates:
    Resolving Dependencies
    --> Running transaction check
    ---> Package expect.x86_64 0:5.45-14.el7_1 will be installed
    --> Finished Dependency Resolution
    Dependencies Resolved
     Package          Arch             Version                 Repository      Size
     expect           x86_64           5.45-14.el7_1           base           262 k
    Transaction Summary
    Install  1 Package
    Total download size: 262 k
    Installed size: 566 k
    Downloading packages:
    Running transaction check
    Running transaction test
    Transaction test succeeded
    Running transaction
      Installing : expect-5.45-14.el7_1.x86_64                                  1/1 
      Verifying  : expect-5.45-14.el7_1.x86_64                                  1/1 
      expect.x86_64 0:5.45-14.el7_1                                                 




    Now rerun the test to validate a clean setup

    # ./ --ipaddr=,
    *** This is the MariaDB Columnstore Cluster System test tool ***
    ** Validate local OS is supported
    Local Node OS System Name : CentOS Linux 7 (Core)
    ** Run Ping access Test to remote nodes  Node Passed ping test  Node Passed ping test
    ** Run SSH Login access Test to remote nodes  Node Passed SSH login test using ssh-keys  Node Passed SSH login test using ssh-keys
    ** Run OS check - OS version needs to be the same on all nodes
    Local Node OS Version : CentOS Linux 7 (Core) Node OS Version : CentOS Linux 7 (Core) Node OS Version : CentOS Linux 7 (Core)
    ** Run Locale check - Locale needs to be the same on all nodes
    Local Node Locale : LANG=en_US.UTF-8 Node Locale : LANG=en_US.UTF-8 Node Locale : LANG=en_US.UTF-8
    ** Run SELINUX check - Setting should to be disabled on all nodes
    Local Node SELINUX setting is Not Enabled Node SELINUX setting is Not Enabled Node SELINUX setting is Not Enabled
    ** Run Firewall Services check - Firewall Services should to be Inactive on all nodes
    Local Node iptables service is Not Active
    Local Node ufw service is Not Active
    Local Node firewalld service is Not Active
    Local Node firewall service is Not Active Node iptables service is Not Enabled Node ufw service is Not Enabled Node firewalld service is Not Enabled Node firewall service is Not Enabled Node iptables service is Not Enabled Node ufw service is Not Enabled Node firewalld service is Not Enabled Node firewall service is Not Enabled
    ** Run MariaDB ColumnStore Port (8600-8620) availability test  Node Passed port test  Node Passed port test
    ** Run Date/Time check - Date/Time should be within 10 seconds on all nodes
    Passed: Node date/time is within 10 seconds of local node
    Passed: Node date/time is within 10 seconds of local node
    ** Run MariaDB ColumnStore Dependent Package Check
    Local Node - Passed, all dependency packages are installed Node - Passed, all dependency packages are installed Node - Passed, all dependency packages are installed
    *** Finished Validation of the Cluster, all Test Passed ***




    Please find more details about the MariaDB ColumnStore Cluster Test Tool and how to use it.

    We are excited to offer this new tool for MariaDB ColumnStore 1.1, which is available for download as part of MariaDB AX, an enterprise open source solution for modern data analytics and data warehousing.


    by davidhill2 at November 30, 2017 10:41 PM

    Peter Zaitsev

    Percona Server for MongoDB 3.4.10-2.10 Is Now Available

    Percona Server for MongoDB 3.4

    Percona announces the release of Percona Server for MongoDB 3.4.10-2.10 on November 30, 2017. Download the latest version from the Percona web site or the Percona Software Repositories.

    Percona Server for MongoDB is an enhanced, open source, fully compatible, highly-scalable, zero-maintenance downtime database supporting the MongoDB v3.4 protocol and drivers. It extends MongoDB with Percona Memory Engine and MongoRocks storage engine, as well as several enterprise-grade features:

    Percona Server for MongoDB requires no changes to MongoDB applications or code.

    This release is based on MongoDB 3.4.10 and includes the following additional change:

    • oplog searches have been optimized in MongoRocks, which should also increase overall performance.

    by Hrvoje Matijakovic at November 30, 2017 09:50 PM

    November 29, 2017

    Peter Zaitsev

    Percona Monitoring and Management 1.5.1 Is Now Available

    Percona Monitoring and Management

    Percona announces the release of Percona Monitoring and Management 1.5.1. This release contains fixes for bugs found after Percona Monitoring and Management 1.5.0 was released.

    Bug fixes

    • PMM-1771: When upgrading PMM to 1.5.0 using Docker commands, the PMM System Summary, PMM Add Instance, and PMM Query Analytics dashboards were not available.
    • PMM-1761: The PMM Query Analytics dashboard did not display the list of hosts correctly.
    • PMM-1769: It was possible to add an Amazon RDS instance providing invalid credentials on the PMM Add Instance dashboard.

    Other bug fixes: PMM-1767, PMM-1762

    by Borys Belinsky at November 29, 2017 07:12 PM

    Jean-Jerome Schmidt

    Free Open Source Database Deployment & Monitoring with ClusterControl Community Edition

    The ClusterControl Community Edition is a free-to-use, all-in-one database management system that allows you to easily deploy and monitor the top open source database technologies like MySQL, MariaDB, Percona, MongoDB, PostgreSQL, Galera Cluster and more. It also allows you to import and monitor your existing database stack.

    Free Database Deployment

    The ClusterControl Community Edition ensures your team can easily and securely deploy production-ready open source database stacks that are built using battle-tested, proven methodologies. You don’t have to be a database expert to utilize the ClusterControl Community Edition - deploying the most popular open sources databases is easy with our point-and-click interface. Even if you are a master of deploying databases, ClusterControl’s point-and-click deployments will save you time and ensure your databases are deployed correctly, removing the chance for human error. There is also a CLI for those who prefer the command line, or need to integrate with automation scripts.

    The ClusterControl Community Edition is not restricted to a single database technology and supports the major flavors and versions. With it you’re able to apply point-and-click deployments of MySQL standalone, MySQL replication, MySQL Cluster, Galera Cluster, MariaDB, MariaDB Cluster, Percona XtraDB and Percona Server for MongoDB, MongoDB itself and PostgreSQL!

    Free Database Monitoring

    The ClusterControl Community Edition makes monitoring easy by providing you the ability to look at all your database instances across multiple data centers or drill into individual nodes and queries to pinpoint issues. Offering a high-level, multi-dc view as well as a deep-dive view, ClusterControl lets you keep track of your databases so you can keep them running at peak performance.

    In addition to monitoring the overall stack and node performance you can also monitor the specific queries to identify potential errors that could affect performance and uptime.

    Why pay for a monitoring tool when the ClusterControl Community Edition gives you a great one for free?

    Free Database Developer Studio

    The Developer Studio provides you a set of monitoring and performance advisors to use and lets you create custom advisors to add security and stability to your database infrastructures. It lets you extend the functionality of ClusterControl, which helps you detect and solve unique problems in your environments.

    We even encourage our users to share the advisors they have created on GitHub by adding a fork to our current advisor bundle. If we like them and think that they might be good for other users we’ll include them in future ClusterControl releases.

    Single Console for Your Entire Database Infrastructure

    Why Should I Use the ClusterControl Community Edition?

    These are just a few of the reasons why you should use ClusterControl as your system to deploy and monitor your open source database environments…

    • You can deploy knowing you are using proven methodologies and industry best practices.
    • If you are just getting started with open source database technology, ClusterControl makes it easy for beginners to deploy and monitor their stacks, removing human error and saving time.
    • Not familiar with orchestration programs like Puppet and Chef? Don’t worry! The ClusterControl Community Edition uses a point-and-click GUI to make it easy to get your environment production-ready.
    • The ClusterControl Community Edition gives you deployment and monitoring in one battle-tested all-in-one system. Why use one tool for scripting only to use a different tool for monitoring?
    • Not sure which database technology is right for your application? The ClusterControl Community Edition supports nearly two dozen database versions that you can try.
    • Have a load balancer running on an existing stack? With the ClusterControl Community Edition you can import your existing, already configured load balancer to run alongside your database instances.

    If you are ready to give it a try click here to download and install the latest version of ClusterControl. Each install comes with the option to activate a 30-day enterprise trial as well.

    by jj at November 29, 2017 12:36 PM

    MariaDB Foundation

    MariaDB 10.2.11 now available

    The MariaDB project is pleased to announce the availability of MariaDB 10.2.11. See the release notes and changelogs for details. Download MariaDB 10.2.11 Release Notes Changelog What is MariaDB 10.2? MariaDB APT and YUM Repository Configuration Generator Thanks, and enjoy MariaDB!

    The post MariaDB 10.2.11 now available appeared first on

    by Ian Gilfillan at November 29, 2017 05:36 AM

    November 28, 2017

    Peter Zaitsev

    Best Practices for Percona XtraDB Cluster on AWS


    In this blog post I’ll look at the performance of Percona XtraDB Cluster on AWS using different service instances, and recommend some best practices for maximizing performance.

    You can use Percona XtraDB Cluster in AWS environments. We often get questions about how best to deploy it, and how to optimize both performance and spend when doing so. I decided to look into it with some benchmark testing.

    For these benchmark tests, I used the following configuration:

    • Region: US East – 1 (N. Virginia), availability zones b, c and d
    • Sysbench 1.0.8
    • ProxySQL 1.4.3
    • 10 tables, 40 mln records – ~95GB dataset
    • Percona XtraDB Cluster 5.7.18
    • Amazon Linux AMI

    We evaluated different AWS instances to provide the best recommendation to run Percona XtraDB Cluster. We used instances:

    • With General Purpose (GP2) storage volumes, 200GB each
    • With IO provisioned (IO1) volumes, 200GB, 10,000 IOPS
    • I3 instances with locally attached NVMe storage

    We also used different instance sizes:

    Instance    vCPU  Memory (GB)
    r4.large       2        15.25
    r4.xlarge      4        30.5
    r4.2xlarge     8        61
    r4.4xlarge    16        122
    i3.large       2        15.25
    i3.xlarge      4        30.5
    i3.2xlarge     8        61
    i3.4xlarge    16        122


    While I3 instances with NVMe storage do not provide the same functionality for handling shared storage and snapshots as General Purpose and IO provisioned volumes, since Percona XtraDB Cluster provides data duplication by itself we think it is still valid to include them in this comparison.

    We ran benchmarks in the US East 1 (N. Virginia) Region, and we used different availability zones for each of the Percona XtraDB Cluster zones (mostly zones “b”, “c” and “d”):

    Percona XtraDB Cluster on AWS 1

    The client connected both directly and through ProxySQL, so we were able to measure ProxySQL’s performance overhead as well.

    ProxySQL is an advanced method to access Percona XtraDB Cluster. It can perform a health check of the nodes and route the traffic to the ONLINE node. It can also split read and write traffic and route read traffic to different nodes (although we didn’t use this capability in our benchmark).

    In our benchmarks, we used 1, 4, 16, 64 and 256 user threads. For this detailed review, however, we’ll look at the 64 thread case.


    First, let’s review the average throughput (higher is better) and latency (lower is better) results (we measured the 99th percentile with one-second resolution):

    Percona XtraDB Cluster on AWS 2
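    As a side note on methodology: a 99th percentile over one-second samples can be computed with standard shell tools. The sketch below uses made-up latency values for illustration, not our benchmark data:

```shell
# Sort the per-second latency samples (ms) and pick the value at the
# 99th-percentile rank. Sample values here are hypothetical.
printf '%s\n' 12 15 11 14 13 200 12 13 14 12 \
  | sort -n \
  | awk '{ a[NR] = $1 } END { i = int(0.99 * NR); if (i < 1) i = 1; print a[i] }'
```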

    Results summary, raw performance:

    The performance of Percona XtraDB Cluster running on GP2 volumes is often pretty slow, so it is hard to recommend this volume type for serious workloads.

    IO provisioned volumes perform much better, and should be considered as the primary target for Percona XtraDB Cluster deployments. I3 instances show even better performance.

    I3 instances use locally attached volumes and do not provide equal functionality as EBS IO provisioned volumes — although some of these limitations are covered by Percona XtraDB Cluster’s ability to keep copies of data on each node.

    Results summary for jitter:

    Along with average throughput and latency, it is important to take into account “jitter” — how stable is the performance during the runs?

    Percona XtraDB Cluster on AWS 3

    Latency variation for GP2 volumes is significant — practically not acceptable for serious usage. Let’s review the latency for only IO provisioning and NVMe volumes. The following chart provides better scale for just these two:

    Percona XtraDB Cluster on AWS 4

    At this scale, we see that NVMe provides a better 99th percentile response time and is more stable. There is still variation for IO provisioned volumes.

    Results summary, cost

    When speaking about instance and volume types, it would be impractical not to mention instance costs. We need to analyze how much we have to pay to achieve better performance, so we prepared data on how much it costs to produce a throughput of 1,000 transactions per second.
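    The cost metric is simply the hourly instance price divided by throughput in thousands of transactions per second. A minimal sketch, using hypothetical placeholder numbers rather than the measured benchmark figures:

```shell
# cost_per_1000_tps = hourly_instance_price / (throughput / 1000)
awk 'BEGIN {
  price_per_hour = 1.33    # assumed USD/hour for instance + volume (placeholder)
  tps            = 4000    # assumed measured throughput (placeholder)
  printf "USD %.4f per hour per 1000 tps\n", price_per_hour / (tps / 1000)
}'
```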

    We compare on-demand and reserved instances pricing (reserved for one year / all upfront / tenancy-default):

    Percona XtraDB Cluster on AWS 5

    Because IO provisioned instances give much better performance, their price/performance is comparable to, if not better than, that of GP2 instances.

    I3 instances are a clear winner.

    It is also interesting to compare the raw cost of benchmarked instances:

    Percona XtraDB Cluster on AWS 6

    We can see that IO provisioned instances are the most expensive, and using reserved instances does not provide much savings. To understand the reason for this, let’s take a look at how cost is calculated for components:

    Percona XtraDB Cluster on AWS 7

    So for IO provisioned volumes, the majority of the cost comes from IO provisioning (which is the same for both on-demand and reserved instances).

    Percona XtraDB Cluster scalability

    Another interesting question is how Percona XtraDB Cluster performance scales with the instance size. As we double resources (both CPU and memory) with each step up in instance size, how does it affect Percona XtraDB Cluster?

    So let’s take a look at throughput:

    Percona XtraDB Cluster on AWS 8

    Throughput improves as the instance size increases. Let’s calculate the speedup with increasing instance size for IO provisioned and I3 instances:

    Speedup (x, relative to large)  IO1   i3
    large                          1.00  1.00
    xlarge                         2.67  2.11
    2xlarge                        5.38  4.31
    4xlarge                        5.96  7.83
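    The speedup figures are just each instance size’s throughput divided by the large instance’s throughput. A quick sketch (the throughput values below are hypothetical, not the benchmark numbers):

```shell
# speedup(size) = tps(size) / tps(large); values here are placeholders.
awk 'BEGIN {
  tps["large"] = 1000; tps["xlarge"] = 2670; tps["2xlarge"] = 5380
  n = split("large xlarge 2xlarge", sizes, " ")
  for (i = 1; i <= n; i++)
    printf "%-8s %.2f\n", sizes[i], tps[sizes[i]] / tps["large"]
}'
```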


    Percona XtraDB Cluster can scale (improve performance) with increasing instance size. Keep in mind, however, that it depends significantly on the workload. You may not get the same performance speedup as in this benchmark.

    ProxySQL overhead

    As mentioned above, ProxySQL adds additional functionality to the cluster. It can also add overhead, however. We would like to understand the expected performance penalty, so we compared the throughput and latency with and without ProxySQL.

    Out of the box, ProxySQL performance was not great and required additional tuning.

    ProxySQL specific configuration:

    • Use a connection through a TCP/IP address, not through the local socket
    • Adjust the mysql-max_stmts_per_connection variable (default: 50) to an optimal value; in our case 1000
    • Ensure that the "monitor@<host>" user has the right permissions, as this is important for proper handling of prepared statements.
      • CREATE USER 'monitor'@'172.30.%.%' IDENTIFIED BY 'monitor';


    Percona XtraDB Cluster on AWS 9

    Response time:

    Percona XtraDB Cluster on AWS 10

    ProxySQL performance penalty in throughput

    ProxySQL performance penalty (ratio)  IO1   i3
    large                                 0.97  0.98
    xlarge                                1.03  0.97
    2xlarge                               0.95  0.95
    4xlarge                               0.96  0.93


    It appears that ProxySQL adds 3-7% overhead. I wouldn’t consider this a significant penalty for additional functionality.
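    To make the ratios concrete: the overhead percentage is one minus the penalty ratio, times 100, so a ratio of 0.95 means roughly a 5% throughput loss. A tiny sketch:

```shell
# overhead % = (1 - tps_with_proxysql / tps_direct) * 100
awk 'BEGIN { ratio = 0.95; printf "overhead: %.0f%%\n", (1 - ratio) * 100 }'
```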


    Amazon instances

    First, the results show that instances based on General Purpose volumes do not provide acceptable performance and should be avoided in general for serious production usage. The choice is between IO provisioned instances and NVMe based instances.

    IO provisioned instances are more expensive, but offer much better performance than General Purpose volumes. If we also look at price/performance metric, IO provisioned volumes are comparable with General Purpose volumes. You should use IO provisioned volumes if you are looking for the functionality provided by EBS volumes.

    If you do not need EBS volumes, however, then i3 instances with NVMe volumes are a better choice: they are both cheaper and provide better performance than IO provisioned instances. Percona XtraDB Cluster provides data duplication on its own, which mitigates the need for EBS volumes to some extent.

    ProxySQL overhead

    We recommend using Percona XtraDB Cluster in combination with ProxySQL, as ProxySQL provides additional management and routing functionality. In general, the overhead for ProxySQL is not significant. In our experience, however, ProxySQL has to be properly tuned; otherwise the performance penalty could become a bottleneck.

    Percona XtraDB Cluster scalability

    AWS makes it easy to increase the instance size (both CPU and memory) if we exceed the capacity of the current instance. From our experiments, we see that Percona XtraDB Cluster can scale along with and benefit from increased instance size.

    Below is a chart showing the speedup in relation to the instance size:

    Percona XtraDB Cluster on AWS 11

    So increasing the instance size is a feasible strategy for improving Percona XtraDB Cluster performance in an AWS environment.

    Thanks for reading this benchmark! Put any questions or thoughts in the comments below.

    by Vadim Tkachenko at November 28, 2017 10:52 PM

    MariaDB AB

    MariaDB Server 10.2.11 now available

    MariaDB Server 10.2.11 now available dbart Tue, 11/28/2017 - 16:33

    The MariaDB project is pleased to announce the immediate availability of MariaDB Server 10.2.11. See the release notes and changelog for details.

    Download MariaDB Server 10.2.11

    Release Notes Changelog What is MariaDB 10.2?



    by dbart at November 28, 2017 09:33 PM

    Peter Zaitsev

    Percona Monitoring and Management 1.5.0 Is Now Available


    Percona announces the release of Percona Monitoring and Management 1.5.0 on November 28, 2017.

    This release focuses on the following features:

    • Enhanced support for MySQL on Amazon RDS and Amazon Aurora – A dedicated Amazon Aurora dashboard offers maximum visibility into key database characteristics, eliminating the need for additional monitoring nodes. We renamed Amazon RDS OS Metrics to Amazon RDS / Aurora MySQL Metrics
    • Simpler configuration – Percona Monitoring and Management now offers easier configuration of key Amazon RDS and Amazon Aurora settings via a web interface
    • One-click data collection – One button retrieves vital information on server performance to assist with troubleshooting
    • Improved interface – A simple, consistent user interface makes it faster and more fluid to switch between Query Analytics and Metrics Monitor

    Highlights from our new Amazon RDS / Aurora MySQL Metrics dashboard:

    Shared elements for Amazon Aurora MySQL and RDS MySQL

    Amazon Aurora MySQL unique elements

    Amazon RDS for MySQL unique elements

    We’ve integrated Query Analytics into Metrics Monitor, and it appears as a separate dashboard known as PMM Query Analytics.

    With this release, Percona Monitoring and Management introduces a new deployment option via AWS Marketplace. This is in addition to our distribution method of Amazon Machine Images (AMI).

    We have upgraded Grafana and Prometheus in this release. PMM now includes Grafana 4.6.1. One of the most prominent features that the upgraded Grafana offers is support for annotations. You can mark a point or select a region in a graph and give it a meaningful description. For more information, see the release highlights.

    Prometheus version 1.8.2, shipped with this release, offers a number of bug fixes. For more information, see the Prometheus change log.

    New features

    • PMM-434: PMM enables monitoring of Amazon RDS and Amazon Aurora metrics
    • PMM-1133: Query Analytics is available from Grafana as a dashboard
    • PMM-1470: Integrated CloudWatch metrics into Prometheus
    • PMM-699: Combined AWS RDS and Amazon Aurora metrics into one dashboard
    • PMM-722: Distributed the MariaDB dashboard graph elements among other existing dashboards and removed the MariaDB dashboard. Further, we renamed the MyISAM dashboard to MyISAM/Aria Metrics
    • PMM-1258: The DISABLE_UPDATES option makes it possible to prevent manual updates when PMM Server is run from a Docker container.
    • PMM-1500: Added InnoDB Buffer Disk Reads to graph InnoDB Buffer Pool Requests to better understand missed InnoDB BP cache hits


    • PMM-1577: Updated Prometheus to version 1.8.2
    • PMM-1603: Updated Grafana to version 4.6.1
    • PMM-1669: The representation of numeric values in the Context Switches graph in the System Overview dashboard was changed to improve readability.
    • PMM-1575: Templating rules were improved for the MyRocks and TokuDB dashboards so that only those instances with these storage engines are displayed

    Bug fixes

    • PMM-1082: The CPU Usage graph on the Trends dashboard showed incorrect spikes
    • PMM-1549: The authentication of the mongodb:queries monitoring service did not work properly when the name of the database to authenticate was not provided.
    • PMM-1673: Fixed display issue with Microsoft Internet Explorer 11

    by Borys Belinsky at November 28, 2017 12:56 PM

    November 27, 2017

    Peter Zaitsev

    autoxtrabackup v1.5.0: A Tool for Automatic Backups


    There is a new version of the autoxtrabackup tool. In this post, I’ll provide some of the highlights available this time around.

    autoxtrabackup is a tool created by PerconaLabs. We’ve now put out the 1.5.0 version, and you can test it further.

    Note: PerconaLabs and Percona-QA are open source GitHub repositories for unofficial scripts and tools created by Percona staff. While not covered by Percona support or services agreements, these handy utilities can help you save time and effort.

    autoxtrabackup is written in Python3 and hosted in PerconaLabs (forked from Shako’s repo). Basically, this tool automates backup/prepare/copy-back actions. I want to talk about recent changes and additions.

    First of all, autoxtrabackup now has a --test_mode option, intended to test the XtraBackup automation process.

    Here is the brief flow for this:

    • Clone percona-qa repo
    • Clone Percona Server for MySQL 5.6 and 5.7 from GitHub.
    • Build PS servers in debug mode.
    • Get 2.3 and 2.4 versions of XtraBackup
    • Generate autoxtrabackup .conf files for each version of PS and XtraBackup
    • Pass different combinations of options to the PS start command and initialize PS servers each time with different options
    • Run sysbench against each started PS server
    • Take backup in cycles for each started PS + prepare
    • If make_slaves is defined, then create slave1 server from this backup (i.e., copy-back to another directory and start the slave from it)
    • Then take a backup, prepare and copy-back from this new slave1 to create slave2
    • Run pt-table-checksum on the master to check backup consistency

    I have prepared my environment, and now want to start --test_mode. Basically, it creates option combinations and passes them to the start script:

    2017-11-15 22:28:21 DEBUG    Starting cycle1
    2017-11-15 22:28:21 DEBUG    Will start MySQL with --innodb_buffer_pool_size=1G --innodb_log_file_size=1G
    --log-bin=mysql-bin --log-slave-updates --server-id=1 --gtid-mode=ON --enforce-gtid-consistency --binlog-format=row

    So as you see, it is starting MySQL with --innodb_buffer_pool_size=1G --innodb_log_file_size=1G --innodb_page_size=64K. In cycle2, it will likely pick --innodb_buffer_pool_size=1G --innodb_log_file_size=1G --innodb_page_size=32K, and so on. It depends on what you have passed in the config:

    # Do not touch; this is for --test_mode, which is testing for XtraBackup itself.
    ps_branches=5.6 5.7
    gitcmd=--recursive --depth=1
    xb_configs=xb_2_4_ps_5_6.conf xb_2_4_ps_5_7.conf xb_2_3_ps_5_6.conf
    mysql_options=--innodb_buffer_pool_size=1G 2G 3G,--innodb_log_file_size=1G 2G 3G,--innodb_page_size=4K 8K 16K 32K 64K

    You can pass more options by changing the mysql_options in the config file. Also you can specify how many incremental backups you want by setting the incremental_count option. You can enable creating slaves from backup to test it as well, by enabling the make_slaves option. This is not recommended for daily usage. You can read more about it here: –test_mode.
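    To picture how the option cycling works, here is a hypothetical shell sketch (not the tool’s actual implementation): the three mysql_options groups from the config above expand into 3 x 3 x 5 = 45 start-command combinations.

```shell
# Expand the three option groups into every start-command combination,
# then count them (3 * 3 * 5 = 45 cycles).
for bp in 1G 2G 3G; do
  for lf in 1G 2G 3G; do
    for ps in 4K 8K 16K 32K 64K; do
      echo "--innodb_buffer_pool_size=$bp --innodb_log_file_size=$lf --innodb_page_size=$ps"
    done
  done
done | wc -l
```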

    For daily backup actions, I have added the --tag and --show_tags options, which can be quite useful. They help you to tag your backups. Take a full backup:

    $ sudo autoxtrabackup --tag="My Full backup" -v
    -lf /home/shahriyar.rzaev/autoxtrabackup_2_4_5_7.log
    -l DEBUG --defaults_file=/home/shahriyar.rzaev/XB_TEST/server_dir/xb_2_4_ps_5_7.conf --backup

    Take an incremental one:

    $ autoxtrabackup --tag="First incremental backup" -v
    -lf /home/shahriyar.rzaev/autoxtrabackup_2_4_5_7.log
    -l DEBUG --defaults_file=/home/shahriyar.rzaev/XB_TEST/server_dir/xb_2_4_ps_5_7.conf --backup

    Take a second incremental one:

    $ autoxtrabackup --tag="Second incremental backup" -v
    -lf /home/shahriyar.rzaev/autoxtrabackup_2_4_5_7.log
    -l DEBUG --defaults_file=/home/shahriyar.rzaev/XB_TEST/server_dir/xb_2_4_ps_5_7.conf --backup

    Now you can use the --show_tags to list tags:

    $ sudo autoxtrabackup --show_tags
    Backup              Type    Status  TAG
    2017-11-16_20-10-53 Full        OK  'My Full backup'
    2017-11-16_20-12-23 Inc         OK  'First incremental backup'
    2017-11-16_20-13-39 Inc         OK  'Second incremental backup'
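    Since the listing is plain columnar text, it is easy to post-process. For example, a sketch that filters the sample output above for incremental backups (using a here-document in place of the live command):

```shell
# Keep only rows whose Type column is "Inc".
awk '$2 == "Inc"' <<'EOF'
2017-11-16_20-10-53 Full        OK  'My Full backup'
2017-11-16_20-12-23 Inc         OK  'First incremental backup'
2017-11-16_20-13-39 Inc         OK  'Second incremental backup'
EOF
```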

    It would be quite nice if we could prepare those backups with a tag name. In other words, if I have a full backup and five incremental backups, what if I want to prepare until the second or third incremental, or just a full backup?

    Pass the tag name with the --prepare option, and it will do the trick:

    $ autoxtrabackup --tag="First incremental backup" -v
    -lf /home/shahriyar.rzaev/autoxtrabackup_2_4_5_7.log
    -l DEBUG --defaults_file=/home/shahriyar.rzaev/XB_TEST/server_dir/xb_2_4_ps_5_7.conf --prepare

    It will prepare the full and “First incremental backup” – the remaining incremental backups will be ignored.

    autoxtrabackup 1.5.0 also has a --dry_run option, which is going to show but not run exact commands. It is described here: –dry_run.

    How about autoxtrabackup 1.5.0 installation? You can install it from the source or use pip3:

    pip3 install mysql-autoxtrabackup

    For more please read: Installation.

    Do you want to enable encryption and compression for backups? Yes? You can enable this from the autoxtrabackup config as described here: Config file structure.

    You can enable taking partial backups again by editing the config: partial backups.

    autoxtrabackup 1.5.0 allows you to perform a partial recovery – i.e., restoring only a single specified table from a full backup. If the table was dropped,  autoxtrabackup will try to extract the create table statement from the .frm file using the mysqlfrm tool and then discard/import the tablespace from full backup. This is related to the transportable tablespace concept. You can read more here: restoring-single-table-after-drop.

    For a full list of available options, read the DOC: autoxtrabackup DOC.

    Thanks for reading! If you are going to try autoxtrabackup 1.5.0, don’t hesitate to provide some feedback!

    by Shahriyar Rzayev at November 27, 2017 06:39 PM

    November 24, 2017

    Oli Sennhauser

    First Docker steps with MySQL and MariaDB

    The Docker version shipped with the distributions is often quite old. On Ubuntu 16.04, for example:

    shell> docker --version 
    Docker version 1.13.1, build 092cba3

    But the current Docker version is 17.09.0-ce (2017-09-26). It seems they switched from the old x.y.z version schema to the new year.month.version schema in February/March 2017.

    Install Docker CE Repository

    Add Docker's official PGP key:

    shell> curl -fsSL | sudo apt-key add -

    Add the Docker repository:

    shell> echo "deb [arch=amd64] \
       $(lsb_release -cs) \
       stable" > /etc/apt/sources.list.d/docker.list
    shell> apt-get update

    Install or upgrade Docker:

    shell> apt-get install docker-ce
    shell> docker --version
    Docker version 17.09.0-ce, build afdb6d4

    To test your Docker installation run:

    shell> docker run --rm hello-world

    Add Docker containers for MariaDB, MySQL and MySQL Enterprise Edition

    First we want to see what Docker containers are available:

    shell> docker search mysql --no-trunc --filter=stars=100
    NAME               DESCRIPTION                                                                                         STARS OFFICIAL AUTOMATED
    mysql              MySQL is a widely used, open-source relational database management system (RDBMS).                  5273  [OK]
    mariadb            MariaDB is a community-developed fork of MySQL intended to remain free under the GNU GPL.           1634  [OK]
    mysql/mysql-server Optimized MySQL Server Docker images. Created, maintained and supported by the MySQL team at Oracle 368            [OK]
    percona            Percona Server is a fork of the MySQL relational database management system created by Percona.     303   [OK]

    OK. It seems like MySQL Server Enterprise Edition is missing. So we have to create an account on Docker Store and get the MySQL Server Enterprise Edition Image from there:

    shell> docker login --username=fromdual
    Login Succeeded

    Unfortunately, one still cannot see MySQL Server Enterprise Edition there.

    But we can try anyway:

    shell> docker pull store/oracle/mysql-enterprise-server:5.7
    shell> docker logout
    shell> docker pull mysql
    shell> docker pull mariadb
    shell> docker pull mysql/mysql-server

    To see what is going on on your local Docker registry you can type:

    shell> docker images
    REPOSITORY                           TAG    IMAGE ID     CREATED       SIZE
    mariadb                              latest abcee1d29aac 8 days ago    396MB
    mysql                                latest 5709795eeffa 2 weeks ago   408MB
    mysql/mysql-server                   latest a3ee341faefb 5 weeks ago   246MB
    store/oracle/mysql-enterprise-server 5.7    41bf2fa0b4a1 4 months ago  244MB
    hello-world                          latest 48b5124b2768 10 months ago 1.84kB

    I personally do not like all those images tagged with latest, because I want clear control over which version is used. MariaDB and the MySQL community server have implemented this quite nicely, but MySQL Enterprise Edition has not:

    shell> docker pull mariadb:10.0
    shell> docker pull mariadb:10.0.23
    shell> docker pull mysql:8.0
    shell> docker pull mysql:8.0.3
    shell> docker images | sort
    REPOSITORY                           TAG     IMAGE ID     CREATED       SIZE
    hello-world                          latest  48b5124b2768 10 months ago 1.84kB
    mariadb                              10.0.23 93631b528e67 21 months ago 305MB
    mariadb                              10.0    eecd58425049 8 days ago    337MB
    mariadb                              latest  abcee1d29aac 8 days ago    396MB
    mysql                                8.0.3   e691422324d8 2 weeks ago   343MB
    mysql                                8.0     e691422324d8 2 weeks ago   343MB
    mysql                                latest  5709795eeffa 2 weeks ago   408MB
    mysql/mysql-server                   latest  a3ee341faefb 5 weeks ago   246MB
    store/oracle/mysql-enterprise-server 5.7     41bf2fa0b4a1 4 months ago  244MB
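    A quick way to spot which repositories you have only pulled by the floating latest tag is to filter the docker images output. The sketch below runs against a captured listing (as a here-document), so it works without a Docker daemon:

```shell
# Print the repository column for rows tagged "latest", skipping the header.
awk 'NR > 1 && $2 == "latest" { print $1 }' <<'EOF'
REPOSITORY                           TAG     IMAGE ID     CREATED       SIZE
mariadb                              10.0    eecd58425049 8 days ago    337MB
mariadb                              latest  abcee1d29aac 8 days ago    396MB
mysql                                latest  5709795eeffa 2 weeks ago   408MB
EOF
```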

    Run a MariaDB server container

    Start a new Docker container from the MariaDB image by running:

    shell> CONTAINER_NAME=mariadb
    shell> CONTAINER_IMAGE=mariadb
    shell> TAG=latest
    shell> MYSQL_ROOT_PASSWORD=Secret-123
    shell> MYSQL_ROOT_USER=root
    shell> docker run \
      --name=${CONTAINER_NAME} \
      --detach \
      --env=MYSQL_ROOT_PASSWORD=${MYSQL_ROOT_PASSWORD} \
      ${CONTAINER_IMAGE}:${TAG}
    shell> docker ps
    CONTAINER ID IMAGE          COMMAND                CREATED        STATUS        PORTS    NAMES
    60d7b6de7ed1 mariadb:latest "docker-entrypoint..." 24 seconds ago Up 23 seconds 3306/tcp mariadb
    shell> docker logs ${CONTAINER_NAME}
    shell> docker exec \
      --interactive \
      --tty \
      ${CONTAINER_NAME} \
      mysql --user=${MYSQL_ROOT_USER} --password=${MYSQL_ROOT_PASSWORD} --execute="status"
    shell> docker image tag mariadb:latest mariadb:10.2.10
    shell> docker exec --interactive \
      --tty \
    shell> docker stop ${CONTAINER_NAME}
    shell> docker rm ${CONTAINER_NAME}

    Run a MySQL Community server container

    shell> CONTAINER_NAME=mysql
    shell> CONTAINER_IMAGE=mysql/mysql-server
    shell> TAG=latest
    shell> MYSQL_ROOT_PASSWORD=Secret-123
    shell> docker run \
      --name=${CONTAINER_NAME} \
      --detach \
      --env=MYSQL_ROOT_PASSWORD=${MYSQL_ROOT_PASSWORD} \
      ${CONTAINER_IMAGE}:${TAG}
    shell> docker stop ${CONTAINER_NAME}
    shell> docker rm ${CONTAINER_NAME}

    Run a MySQL Server Enterprise Edition container

    shell> CONTAINER_NAME=mysql-ee
    shell> CONTAINER_IMAGE=store/oracle/mysql-enterprise-server
    shell> TAG=5.7
    shell> MYSQL_ROOT_PASSWORD=Secret-123
    shell> docker run \
      --name=${CONTAINER_NAME} \
      --detach \
      --env=MYSQL_ROOT_PASSWORD=${MYSQL_ROOT_PASSWORD} \
      ${CONTAINER_IMAGE}:${TAG}
    shell> docker ps --all
    CONTAINER ID IMAGE                                    COMMAND                CREATED        STATUS                  PORTS               NAMES
    0cb4e6a8a621 store/oracle/mysql-enterprise-server:5.7 "/ my..." 37 seconds ago Up 36 seconds (healthy) 3306/tcp, 33060/tcp mysql-ee
    1832b98da6ef mysql:latest                             "docker-entrypoint..." 6 minutes ago  Up 6 minutes            3306/tcp            mysql
    60d7b6de7ed1 mariadb:latest                           "docker-entrypoint..." 21 minutes ago Up 21 minutes           3306/tcp            mariadb

    All my 3 docker containers are currently running as root:

    shell> ps -ef | grep docker
    root 13177     1 20:20 ? 00:00:44 /usr/bin/dockerd -H fd://
    root 13186 13177 20:20 ? 00:00:04 docker-containerd -l unix:///var/run/docker/libcontainerd/docker-containerd.sock --metrics-interval=0 --start-timeout 2m --state-dir /var/run/docker/libcontainerd/containerd --shim docker-containerd-shim --runtime docker-runc
    root 24004 13186 21:41 ? 00:00:00 docker-containerd-shim 60d7b6de7ed1ff62b67e66c6effce0094fd60e9565ede65fa34e188b636c54ec /var/run/docker/libcontainerd/60d7b6de7ed1ff62b67e66c6effce0094fd60e9565ede65fa34e188b636c54ec docker-runc
    root 26593 13186 21:56 ? 00:00:00 docker-containerd-shim 1832b98da6ef7459c33181e9b9ddd89a4136c3b2676335bcbbb533389cbf6219 /var/run/docker/libcontainerd/1832b98da6ef7459c33181e9b9ddd89a4136c3b2676335bcbbb533389cbf6219 docker-runc
    root 27714 13186 22:02 ? 00:00:00 docker-containerd-shim 0cb4e6a8a62103b66164ccddd028217bb4012d8a6aad1f62d3ed6ae71e1a38b4 /var/run/docker/libcontainerd/0cb4e6a8a62103b66164ccddd028217bb4012d8a6aad1f62d3ed6ae71e1a38b4 docker-runc

    But the user running the process IN the container is not root:

    shell> docker exec \
      --interactive \
      --tty \
      ${CONTAINER_NAME} \
      grep ^Uid /proc/1/status
    Uid:    27      27      27      27
    shell> docker exec \
      --interactive \
      --tty \
      ${CONTAINER_NAME} \
      bash -c "id 27"
    uid=27(mysql) gid=27(mysql) groups=27(mysql)

    Run a Docker container as the mysql user

    shell> id
    uid=1001(mysql) gid=1001(mysql) groups=1001(mysql)
    shell> docker images
    Got permission denied while trying to connect to the Docker daemon socket at unix:///var/run/docker.sock: Get http://%2Fvar%2Frun%2Fdocker.sock/v1.32/images/json: dial unix /var/run/docker.sock: connect: permission denied
    shell> sudo adduser mysql docker
    Adding user `mysql' to group `docker' ...
    Adding user mysql to group docker


    by Shinguz at November 24, 2017 10:05 PM

    Jean-Jerome Schmidt

    Deploying & Managing MySQL NDB Cluster with ClusterControl

    In ClusterControl 1.5 we added support for MySQL NDB Cluster 7.5. In this blog post, we’ll look at some of the features that make ClusterControl a great tool to manage MySQL NDB Cluster. First and foremost, as there are numerous products with “Cluster” in their name, we’d like to say a couple of words about MySQL NDB Cluster itself and how it differs from other solutions.

    MySQL NDB Cluster

    MySQL NDB Cluster is a shared-nothing synchronous cluster for MySQL, based on the NDB engine. It is a product with its own list of features, and quite different from Galera Cluster or MySQL InnoDB Cluster. One main difference is the use of the NDB engine, not InnoDB, which is the default engine for MySQL. In an NDB cluster, data is partitioned across multiple data nodes, while Galera Cluster or MySQL InnoDB Cluster contain the full data set on each of the nodes. This has serious repercussions for the way MySQL NDB Cluster deals with queries that use JOINs and large chunks of the dataset.

    When it comes to architecture, MySQL NDB Cluster consists of three different node types. Data nodes store the data using the NDB engine. Data is mirrored for redundancy, with up to 4 replicas of data. Note that ClusterControl will deploy 2 replicas per node group, as this is the most tested and stable configuration. Management nodes are intended to control the cluster - for high availability reasons, typically, you have two such nodes. SQL nodes are used as the entry points to the cluster. They parse SQL, ask for data from the data nodes and aggregate result sets when needed.
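    A handy sizing relation follows from the replica setting: the number of node groups equals the number of data nodes divided by NoOfReplicas. A tiny sketch with hypothetical counts:

```shell
# With ClusterControl's default of 2 replicas, 4 data nodes form 2 node groups.
awk 'BEGIN { data_nodes = 4; replicas = 2; print "node groups:", data_nodes / replicas }'
```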

    ClusterControl features for MySQL NDB Cluster


    ClusterControl 1.5 supports deployment of MySQL NDB Cluster 7.5. It’s done through the same deployment wizard as the other cluster types.

    In the first step, you need to configure how ClusterControl can log in via SSH to the hosts. This is a standard requirement for ClusterControl: it is agentless, so it requires SSH access either directly to the root account or via (password or passwordless) sudo.

    In the next step, you define management nodes for your cluster.

    Here, you need to decide how many data nodes you’d like to have. As we previously stated, every 2 nodes will be part of a node group so this should be an even number.

    Finally, you need to decide how many SQL nodes you’d like to deploy in your cluster. Once you click deploy, ClusterControl will connect to the hosts, install the software and configure all services. After a while, you should see your cluster deployed.

    Scaling of MySQL NDB Cluster

    For MySQL NDB Cluster, ClusterControl 1.5.0 supports scaling of SQL nodes. You can access the job from the Cluster jobs dropdown.

    There you can fill in the hostname of the node you’d like to add and that’s all you need - ClusterControl will take care of the rest.

    Management of MySQL NDB Cluster

    ClusterControl helps you manage MySQL NDB Cluster. In this section we’d like to go through some of the management features that we have.


    Backups are crucial for any production environment. In case of disaster, only a good backup can minimize the data loss and help you to quickly recover from the issue. Replication might not always be a solution that works - DROP TABLE will drop the table on all of the hosts in the topology. Even a delayed slave can delay the inevitable only by so much.

    ClusterControl supports ndb backup for MySQL NDB Cluster.

    You can easily create a backup schedule to be executed by ClusterControl.

    Proxy layer

    ClusterControl lets you deploy a full high availability stack on top of the MySQL NDB Cluster. For the proxy layer, we support deployment of HAProxy and MaxScale.

    As shown on the screenshot above, deployment looks very similar to the other cluster types. You need to decide whether to use an existing HAProxy or deploy a new one, then choose how to install it - using packages from repositories available on the node, or compiling it from the source code of the latest release.

    If you decide to use HAProxy, you will have the possibility to configure high availability using Keepalived and Virtual IP.

    The process is the following - you define a Virtual IP and the interface on which it should be brought up. Then, you can deploy it for every HAProxy that you have installed. One of the Keepalived processes will be determined as a “master” and it’ll enable VIP on its node. Your application then connects to this particular IP. When a current active HAProxy is not available, the VIP will be moved to another available HAProxy, restoring the connectivity.
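    A minimal sketch of that election logic, assuming a fixed priority order among the HAProxy hosts (the names and function below are hypothetical; real Keepalived negotiates the master via VRRP priorities and advertisements):

```python
def elect_vip_holder(haproxy_nodes, health):
    """Return the node that should hold the Virtual IP: the first
    healthy HAProxy in priority order, mimicking how one Keepalived
    instance becomes 'master' and brings the VIP up on its node."""
    for node in haproxy_nodes:
        if health.get(node):
            return node
    return None  # no HAProxy available, the VIP stays down

nodes = ["haproxy1", "haproxy2"]
# Both healthy: the first node holds the VIP
assert elect_vip_holder(nodes, {"haproxy1": True, "haproxy2": True}) == "haproxy1"
# haproxy1 fails: the VIP moves to haproxy2, restoring connectivity
holder = elect_vip_holder(nodes, {"haproxy1": False, "haproxy2": True})
```

    The application always connects to the VIP, so the failover is transparent to it.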

    Recovery management

    While MySQL NDB Cluster can tolerate failures of individual nodes, it is important to promptly react to these. ClusterControl provides automated recovery for all components of the cluster. No matter what fails (management node, data node or SQL node), ClusterControl will automatically restart them.

    Monitoring of the MySQL NDB Cluster

    Any production-ready environment has to be monitored. ClusterControl provides you with a range of metrics to monitor. In the “Overview” page, we show graphs based on the most important metrics for your cluster. You can also create your own dashboards, showing additional data that would be useful in your environment.

    In addition to the graphs, the “Overview” page gives you insights into the state of the cluster based on some MySQL NDB Cluster metrics like used Index Memory, Data Memory and state of some buffers.

    It also provides monitoring of the host metrics, including CPU utilization, RAM, Disk or Network stats. Those graphs are also crucial in building a view of the health of the cluster.

    ClusterControl can also help you to improve performance of your databases by giving you access to the Query Monitor, which holds statistics about your traffic.

    As seen on the screenshot above, you can see what kind of queries are running against your cluster, how many queries of a given type, what are their execution times and the total execution times. This helps identify which queries are slow and which of them are responsible for the majority of the traffic. You can then focus on the queries which can provide you with the biggest performance improvement.

    by krzysztof at November 24, 2017 10:59 AM

    Peter Zaitsev

    This Week in Data with Colin Charles 16: FOSDEM, Percona Live call for papers, and ARM

    Colin Charles
 
    Join Percona Chief Evangelist Colin Charles as he covers happenings, gives pointers and provides musings on the open source database community.

    Hurry up – the call for papers (CFP) for FOSDEM 2018 ends December 1, 2017. I highly recommend submitting, as it’s a really fun, free, and technically-oriented event.

    Don’t forget that the CFP for Percona Live Open Source Database Conference 2018 in Santa Clara closes December 22, 2017, so please also consider submitting as soon as possible. We want to make an early announcement of the talks, so we’ll definitely do the first pass even before the CFP date closes.

    Is ARM the new hotness? Marvell confirms $6 billion purchase of chip maker Cavium. This month we’ve seen Red Hat Enterprise Linux for ARM arrive. We’ve also seen a press release from MariaDB about the performance on the Qualcomm Centriq 2400 Processor.

    Some new books to add to your bookshelf and read: MariaDB and MySQL Common Table Expressions and Window Functions Revealed by Daniel Bartholomew. The accompanying source code repository will also be useful. Much awaited for, by Percona Live keynote speakers, Database Reliability Engineering: Designing and Operating Resilient Database Systems by Laine Campbell and Charity Majors is now ready to read.



    Upcoming appearances

    • ACMUG 2017 gathering – Beijing, China, December 9-10 2017 – it was very exciting being there in 2016, I can only imagine it’s going to be bigger and better for 2017 since it is now two days long!


    I look forward to feedback/tips via e-mail or on Twitter @bytebot.

    by Colin Charles at November 24, 2017 09:53 AM

    November 23, 2017

    Jean-Jerome Schmidt

    ClusterControl 1.5 - Announcing MariaDB 10.2 Support

    Announced as part of the ClusterControl 1.5 release, we now provide full support for MariaDB version 10.2. This new version provides even greater integration with Galera Cluster, MariaDB’s HA solution of choice, and also features enhancements to SQL like window functions, common table expressions, and JSON functions.

    MariaDB is the fastest growing open source database, reaching more than 60 million developers worldwide through its inclusion in every major Linux distribution, as well as a growing presence in the world’s leading cloud providers. Its widespread use across Linux distributions and cloud platforms, as well as its ease of use, have quickly made MariaDB the open source database standard for the modern enterprise.

    MariaDB Server was listed in the recent OpenStack survey among the top two database technologies in use today.

    What’s New in Version 10.2?

    MariaDB Server 10.1 brought default built-in integration of Galera Cluster, allowing its users to achieve the ultimate in high availability. Severalnines was an early adopter of this clustering technology and was excited to see MariaDB embrace it for HA.

    Here are some of the enhancements included in the new 10.2 version, as announced by MariaDB:

    • SQL enhancements like window functions, common table expressions and JSON functions allow new use cases for MariaDB Server
    • Standard MariaDB Server replication has further optimizations
    • Many limitations have been removed, which allows easier use and removes the need for limitation handling at the application level
    • MyRocks, a new storage engine developed by Facebook, has been introduced, which will further enrich the use cases for MariaDB Server (NOTE: This new Storage Engine is also now available for MariaDB deployments in ClusterControl, however ClusterControl does not yet support MyRocks specific monitoring.)

    Window Functions

    Window functions are popular in Business Intelligence (BI) where more complex report generation is needed based on a subset of the data, like country or sales team metrics. Another common use case is where time-series based data should be aggregated based on a time window instead of just a current record, like all rows inside a certain time span.

    As analytics is becoming more and more important to end users, window functions deliver a new way of writing performance optimized analytical SQL queries, which are easy to read and maintain, and eliminates the need to write expensive subqueries and self-joins.
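    To make the idea concrete, here is what a window aggregate computes, modeled in plain Python on hypothetical sales data. In SQL this would be SUM(amount) OVER (PARTITION BY team ORDER BY day): a cumulative sum per partition, with one output row per input row, which is what distinguishes window functions from GROUP BY aggregation.

```python
from itertools import groupby
from operator import itemgetter

def running_totals(rows):
    """Per-team cumulative sums over (team, day, amount) tuples,
    preserving one output row per input row."""
    out = []
    ordered = sorted(rows, key=itemgetter(0, 1))  # PARTITION BY team ORDER BY day
    for team, group in groupby(ordered, key=itemgetter(0)):
        total = 0
        for team_, day, amount in group:
            total += amount
            out.append((team_, day, amount, total))
    return out

sales = [("east", 1, 10), ("east", 2, 5), ("west", 1, 7)]
result = running_totals(sales)
```

    The window function version avoids the expensive self-join you would otherwise need to carry the running total alongside each row.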

    Common Table Expressions

    Hierarchical and recursive queries are usually implemented using common table expressions (CTEs). They are similar to derived tables in a FROM clause, but by having an identification keyword WITH, the optimizer can produce more efficient query plans. Acting as an automatically created temporary and named result set, which is only valid for the time of the query, it can be used for recursive and hierarchical execution, and also allows for reuse of the temporary dataset. Having a dedicated method also helps to create more expressive and cleaner SQL code.
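    The evaluation model of a recursive CTE can be sketched in Python: an anchor result set, plus a recursive member applied to the previous step's rows until no new rows appear. The org-chart data and function names below are hypothetical:

```python
def recursive_cte(anchor, recurse):
    """Model of WITH RECURSIVE evaluation: start from the anchor
    rows, then repeatedly apply the recursive member to the rows
    produced in the previous step until no new rows appear."""
    result, frontier = list(anchor), list(anchor)
    while frontier:
        frontier = [row for prev in frontier for row in recurse(prev)]
        result.extend(frontier)
    return result

# Hypothetical hierarchy: person -> direct reports
reports = {"ceo": ["vp1", "vp2"], "vp1": ["eng1"], "vp2": [], "eng1": []}
# Everyone under "ceo", like a recursive CTE walking the tree
tree = recursive_cte(["ceo"], lambda person: reports[person])
```

    A SQL recursive CTE does the same fixed-point iteration, with the optimizer free to pick an efficient plan for each step.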

    JSON Functions

    JSON (JavaScript Object Notation), a text-based and platform independent data exchange format, is used not only to exchange data, but also as a format to store unstructured data. MariaDB Server 10.2 offers more than 24 JSON functions to allow querying, modification, validation and indexing of JSON formatted data, which is stored in a text-based field of a database. As a result, the powerful relational model of MariaDB can be enriched by working with unstructured data, where required.

    Through the use of virtual columns, the JSON function JSON_VALUE, and the newest MariaDB Server 10.2 feature of indexing virtual columns, JSON values can be automatically extracted from the JSON string, stored in a virtual column and indexed, providing the fastest access to the JSON string.

    Using the JSON function JSON_VALID, the new CHECK CONSTRAINTS in MariaDB Server 10.2 guarantee that only JSON strings of the correct JSON format can be added into a field.

    Binary Log Based Rollback

    The enhanced mysqlbinlog utility delivered with MariaDB Server 10.2 includes a new point-in-time rollback function, which allows a database or table to revert to an earlier state, and delivers binary log based rollback of already committed data. The mysqlbinlog tool does not directly modify any data; it generates an “export file” containing the reverting statements of the transactions logged in a binary log file. The created file can be used with the command line client or another SQL tool to execute the included SQL statements. This way, all committed transactions up to a given timestamp will be rolled back.
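    The idea of generating reverting statements can be sketched as follows. This is a simplified model with an invented log format, not mysqlbinlog's actual binary log handling:

```python
def revert_statements(ops):
    """Produce 'undo' operations for a sequence of logged row
    changes, newest first -- the core idea behind point-in-time
    rollback: emit reverting SQL rather than touching the data."""
    inverse = []
    for op, table, row in reversed(ops):
        if op == "INSERT":
            inverse.append(("DELETE", table, row))
        elif op == "DELETE":
            inverse.append(("INSERT", table, row))
        elif op == "UPDATE":
            before, after = row
            inverse.append(("UPDATE", table, (after, before)))  # swap images
    return inverse

# Hypothetical log: an insert, then an update of the same row
log = [("INSERT", "t1", {"id": 1}), ("UPDATE", "t1", ({"id": 1}, {"id": 2}))]
undo = revert_statements(log)
```

    Executing the emitted operations in order undoes the original transactions, which mirrors how the generated export file is replayed through a SQL client.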

    In the case of addressing logical mistakes like adding, changing or deleting data, so far the only possible way has been to use mysqlbinlog to review transactions and fix the problems manually. However, this often leads to data inconsistency because corrections typically only address the wrong statement, thereby ignoring other data dependencies.

    Typically caused by DBA or user error, restoring a huge database can result in a significant outage of service. Rolling back the last transactions using point-in-time roll back takes only the time of the extract, a short review and the execution of the reverted transactions – saving valuable time, resources and service.

    Single Console for Your Entire Database Infrastructure
    Find out what else is new in ClusterControl

    Why MariaDB?

    With several MySQL options to choose from, why select MariaDB as the technology to power your application? Here are some of the benefits to selecting MariaDB...

    • MariaDB is built on a modern architecture that is extensible at every layer: client, cluster, kernel and storage. This extensibility provides two major advantages. It allows for continual community innovation via plugins and it makes it easy for customers to configure MariaDB to support a wide variety of use cases from OLTP to OLAP.
    • MariaDB develops features and enhancements that are part of its own roadmap, independent from Oracle / MySQL. This allows MariaDB to accept and attract broader community innovation, as well as to add internally developed new features that make it easier to migrate from proprietary systems to open source MariaDB.
    • MariaDB is engineered to secure the database at every layer, making it a trusted general-purpose database used in industries such as government and banking that require the highest level security features.
    • MariaDB offers support for a variety of storage engines, including NoSQL support, giving its users several choices to determine the one which will work best with their environment.
    • MariaDB has deployed many performance enhancing improvements including query optimizations which, in several benchmark tests, lets MariaDB perform 3-5% better than a similarly configured MySQL environment.

    ClusterControl for MariaDB

    ClusterControl provides support for each of the top MariaDB technologies...

    • MariaDB Server: MariaDB Server is a general purpose database engineered with an extensible architecture to support a broad set of use cases via pluggable storage engines – such as InnoDB, MyRocks and Spider.
      • Built-in asynchronous master/slave replication
      • Dynamic columns that allow different rows to store different data in the same column
      • Built-in encryption
      • Query optimization
      • Improved schema compatibility
    • MariaDB Cluster: MariaDB Cluster is made for today’s cloud based environments. It is fully read-write scalable, comes with synchronous replication, allows multi-master topologies, and guarantees no lag or lost transactions.
      • Synchronous replication with no slave lag or lost transactions
      • Active-active multi-master topology
      • Read and write to any cluster node
      • Automatic membership control, with failed nodes dropped from the cluster
      • Automatic node joining
      • True row-level parallel replication
      • Direct client connections, native MariaDB look and feel
      • Both read and write scalability
    • MariaDB MaxScale: MariaDB MaxScale is a database proxy that extends the high availability, scalability, and security of MariaDB Server while at the same time simplifying application development by decoupling it from underlying database infrastructure.
      • Includes Database Firewall and DoS protection
      • Read-Write Splitting
      • Data Masking
      • Schema-based Sharding
      • Query Caching

    by jj at November 23, 2017 10:59 AM

    MariaDB Foundation

    Shenzhen MariaDB Developers Unconference Reportback

    Last week saw an excellent Developers Unconference, with many of the top MariaDB developers from Asia and the rest of the world attending. Working remotely has many advantages, but there’s a certain magic to working through a difficult problem in the same room. We were made most welcome by our hosts, Shannon Systems. The location […]


    by Ian Gilfillan at November 23, 2017 07:54 AM

    Peter Zaitsev

    MongoDB 3.6 Change Streams: A Nest Temperature and Fan Control Use Case

    MongoDB 3.6 Change Streams

    In this post, I’ll look at what MongoDB 3.6 change streams are, in a creative way. Just in time for the holidays!

    What is a change stream?

    Change streams in MongoDB provide a cross-platform unified API that can be supported with sharding. It has an option for talking to secondaries, and even allows for security controls like restrictions and action controls.

    How is this important? To demonstrate, I’ll walk through an example of using a smart oven with a Nest Thermostat to keep your kitchen from becoming a sauna while you bake a cake — without the need for you to moderate the room temperature yourself.

    What does a change stream look like?

        {
            $match: {
                "documentKey.device": {
                    $in: [ "jennair_oven", "nest_kitchen_thermo" ]
                },
                operationType: "insert"
            }
        }

    What can we watch?

    We can use change streams to watch these actions:

    • Insert
    • Delete
    • Replace
    • Update
    • Invalidate

    Why not just use the Oplog?

    Any change presented in the oplog could be rolled back as it’s only single node durable. Change streams need at least one other node to receive the change. In general, this represents a majority for a typical three node replica-set.

    In addition, change streams are resumable. Having a collector job that survives an election is easy as pie, as by default it will automatically retry once. However, you can also record the last seen token to know how to resume where it left off.
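    The resume-token mechanism can be sketched like this, using plain integers as stand-in tokens (real resume tokens are opaque documents) and a dict as the persisted state:

```python
def consume(stream, state, process):
    """Tail a change stream, persisting the last seen resume token
    so a collector restarted after an election can pick up exactly
    where it left off without re-processing events."""
    start = state.get("resume_after", -1)
    for token, event in stream:
        if token <= start:
            continue  # already processed before the restart
        process(event)
        state["resume_after"] = token  # record progress durably

events = [(1, "a"), (2, "b"), (3, "c")]
seen, state = [], {}
consume(events, state, seen.append)   # first run processes everything
consume(events, state, seen.append)   # "restart": nothing is re-processed
```

    In a real driver you would pass the recorded token back when reopening the stream; the skip loop above just models the effect.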

    Finally, since this is sharding supported with the new cluster clock (wc in Oplog), you can trust the operations order you get, even across shards. This was problematic both with the old oplog format and when managing connections manually.

    In short, this is the logical evolution of oplog scraping, and it answers a long-standing request to be able to tail the oplog via mongos rather than per replica set.

    So what’s the downside?

    It’s estimated that after 1,000 streams you will start to see very measurable performance drops. It is not clear why there is no global change stream option to avoid having so many cursors floating around; I think it’s something that should be looked at for future versions of this feature. Many MongoDB deployments, specifically in the multi-tenant world, have more than 1,000 namespaces on a system, which would make the performance drop problematic.

    What’s in a change stream anyhow?

    The first thing to understand is that while some drivers expose a helper function for opening a change stream, it is just an alias for an actual aggregation pipeline stage, $changeStream. This means you could mix it with much more powerful pipelines, though you should be careful: things like projection could break the ability to resume if the token is not passed on accurately.

    So a change stream:

    1. Is a view of an oplog entry for a change. This sometimes means you know the change contents, and sometimes you don’t - for example, in a delete.
    2. Is an explicit API for drivers and code, but also ensures you can get data via Mongos rather than having to connect directly to each node.
    3. Is scalable, resumable, and well ordered – even when sharded!
    4. Harnesses the power of aggregations.
    5. Provides superior ACL support via roles and privileges

    Back to a real-world example. Let’s assume you have a Nest unit in your house (because none of us techies have those, right?). Let’s also assume you’re fancy and have the Jenn-Air oven, which can talk to the Nest. If you’re familiar with the Nest, you might know that its API lets you enable the Jenn-Air fan or set its oven temperature remotely. Sure, the oven has a fan schedule to prevent it running at night, but its ability to work with other appliances is a bit more limited.

    So for our example, assume you want the temperature in the kitchen to drop by 15 degrees F whenever the oven is on, and that the fan should run even if it’s outside its standard time window.

    Hopefully, you can see how such an app, powered by MongoDB, could be useful? However, there are a few more assumptions, which we have already set up: a collection of “device_states” to record the original state of the temperature setting in the Nest; and to record the oven’s status so that we know how to reset the oven using the Nest once cooking is done.

    As we know we have the state changes for the devices coming in on a regular basis, we could simply say:

        {
            $match: {
                "documentKey.device": {
                    $in: [ "jennair_oven", "nest_kitchen_thermo" ]
                },
                operationType: "insert"
            }
        }

    This will watch for any changes to either of these devices whether it be inserting new states or updating old ones.

    Now let’s assume that anytime something comes in for the Nest, we update db.nest_settings with that document. However, when the oven turns on, we also update a secondary document with an _id of “override” to record the last known nest_setting before the oven was enabled. This means we can revert to it later.

    This would be accomplished via something like…

    Change Event document

        {
            _id: <resume_token>,
            operationType: 'insert',
            ns: { db: 'example', coll: "device_states" },
            documentKey: { device: 'nest_kitchen_thermo' },
            fullDocument: {
                _id: ObjectId(),
                device: 'nest_kitchen_thermo',
                temp: 68
            }
        }

    So you could easily run the following from your code:

    db.nest_settings.update({_id:"current"},{_id:"current",data: event.fullDocument})

    Now the current document is set to the last check-in from the Nest API.

    That was simple enough, but now we can do even more cool things…

    Change Event document

        {
            _id: <resume_token>,
            operationType: 'insert',
            ns: { db: 'example', coll: "device_states" },
            documentKey: { device: 'jennair_oven' },
            fullDocument: {
                _id: ObjectId(),
                device: 'jennair_oven',
                temp: 350,
                power: 1,
                state: "warming"
            }
        }

    This next segment is more pseudocode:

    var event = /* next document from the change stream cursor */;
    var device = event.documentKey.device;
    var data = event.fullDocument;
    if (device == "jennair_oven"){
        var override_enabled = db.nest_settings.count({_id:"override"});
        if (data.power && !override_enabled){
            // Snapshot the last known settings, then lower the temp by 15
            var doc = db.nest_settings.findOne({_id:"current"});
            doc._id = "override";
            doc.data.temp += -15;
            db.nest_settings.insert(doc);
        }
        if (data.power){
            var override_doc = db.nest_settings.findOne({_id:"override"});
            NestObj.thermostat.setTemp(override_doc.data.temp);
            NestObj.thermostat.enableFan(15); // Enable the fan for 15 minutes
        } else {
            var override_doc = db.nest_settings.findOne({_id:"override"});
            NestObj.thermostat.setTemp(override_doc.data.temp + 15);
            NestObj.thermostat.enableFan(0);  // Disable the fan
            db.nest_settings.remove({_id:"override"});
        }
    }

    This code is doing a good deal, but it’s pretty basic at the same time:

    1. If the oven is on, but there is no override document, create one from the most recent thermostat settings.
    2. Decrease the current temp setting by 15, and then insert it with the “override” _id value
    3. If the power is set to on
      (a) read in the current override document
      (b) set the thermostat to that setting
      (c) enable the fan for 15 minutes
    4. If the power is now off
      (a) read in the current override document
      (b) set the thermostat to 15 degrees higher
      (c) set the fan to disabled
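    The four steps above can also be sketched as self-contained Python against an in-memory store. FakeNest and the settings dict are hypothetical stand-ins for the Nest API client and db.nest_settings:

```python
class FakeNest:
    """Hypothetical stand-in for the Nest API client."""
    def __init__(self):
        self.temp, self.fan_minutes = None, 0
    def set_temp(self, t):
        self.temp = t
    def enable_fan(self, minutes):
        self.fan_minutes = minutes  # 0 disables the fan

def handle_oven_event(power_on, settings, nest):
    """settings maps _id -> document, modeling the nest_settings collection."""
    if power_on:
        if "override" not in settings:
            # Steps 1-2: snapshot current settings, drop the temp by 15
            settings["override"] = {"temp": settings["current"]["temp"] - 15}
        # Step 3: apply the override and run the fan for 15 minutes
        nest.set_temp(settings["override"]["temp"])
        nest.enable_fan(15)
    else:
        # Step 4: restore the temperature and stop the fan
        nest.set_temp(settings.pop("override")["temp"] + 15)
        nest.enable_fan(0)

nest, settings = FakeNest(), {"current": {"temp": 68}}
handle_oven_event(True, settings, nest)    # oven turns on: 53F, fan running
handle_oven_event(False, settings, nest)   # oven turns off: back to 68F
```

    The real version would run inside the change stream loop, invoked once per matching event.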

    Assuming you are constantly tailing the watch cursor, this means you will restore the thermostat and disable the fan as soon as the oven is off.

    Hopefully, this blog has helped explain how change streams work by using a real-world logical application to keep your kitchen from becoming a sweat sauna while making some cake… and then eating it!

    by David Murphy at November 23, 2017 02:16 AM

    November 22, 2017

    Peter Zaitsev

    Sudoku Recursive Common Table Expression Solver

    Recursive Common Table Expressions

    In this blog post, we’ll look at solving Sudoku using a MySQL 8.0 recursive common table expression.

    Vadim was recently having a little Saturday morning fun solving Sudoku using MySQL 8. The whole idea comes from SQLite, where Richard Hipp has come up with some outlandish recursive query examples using the WITH clause.

    The SQLite query:

     WITH RECURSIVE
       input(sud) AS (
         VALUES('53..7....6..195....98....6.8...6...34..8.3..17...2...6.6....28....419..5....8..79')
       ),
       digits(z, lp) AS (
         VALUES('1', 1)
         UNION ALL SELECT
         CAST(lp+1 AS TEXT), lp+1 FROM digits WHERE lp<9
       ),
       x(s, ind) AS (
         SELECT sud, instr(sud, '.') FROM input
         UNION ALL
         SELECT
           substr(s, 1, ind-1) || z || substr(s, ind+1),
           instr( substr(s, 1, ind-1) || z || substr(s, ind+1), '.' )
          FROM x, digits AS z
         WHERE ind>0
           AND NOT EXISTS (
                 SELECT 1
                   FROM digits AS lp
                  WHERE z.z = substr(s, ((ind-1)/9)*9 + lp, 1)
                     OR z.z = substr(s, ((ind-1)%9) + (lp-1)*9 + 1, 1)
                     OR z.z = substr(s, (((ind-1)/3) % 3) * 3
                             + ((ind-1)/27) * 27 + lp
                             + ((lp-1) / 3) * 6, 1)
               )
       )
     SELECT s FROM x WHERE ind=0;

    Which should provide the answer: 534678912672195348198342567859761423426853791713924856961537284287419635345286179.

    The modified query to run on MySQL 8.0.3 release candidate and MariaDB Server 10.2.9 stable GA courtesy of Vadim:

     WITH RECURSIVE
       input(sud) AS (
         SELECT '53..7....6..195....98....6.8...6...34..8.3..17...2...6.6....28....419..5....8..79'
       ),
       digits(z, lp) AS (
         SELECT '1', 1
         UNION ALL SELECT
         CAST(lp+1 AS CHAR), lp+1 FROM digits WHERE lp<9
       ),
       x(s, ind) AS (
         SELECT sud, instr(sud, '.') FROM input
         UNION ALL
         SELECT
           concat(substr(s, 1, ind-1), z, substr(s, ind+1)),
           instr( concat(substr(s, 1, ind-1), z, substr(s, ind+1)), '.' )
          FROM x, digits AS z
         WHERE ind>0
           AND NOT EXISTS (
                 SELECT 1
                   FROM digits AS lp
                  WHERE z.z = substr(s, ((ind-1) DIV 9)*9 + lp, 1)
                     OR z.z = substr(s, ((ind-1)%9) + (lp-1)*9 + 1, 1)
                     OR z.z = substr(s, (((ind-1) DIV 3) % 3) * 3
                             + ((ind-1) DIV 27) * 27 + lp
                             + ((lp-1) DIV 3) * 6, 1)
               )
       )
     SELECT s FROM x WHERE ind=0;

    The test environment for the setup is a standard Linode 1024 instance, with one CPU core and 1GB of RAM. The base OS was Ubuntu 17.04. MySQL and MariaDB Server were installed via their respective tarballs. No configuration is done beyond a basic out-of-the-box install inside of the MySQL sandbox. This is similar for sqlite3. Remember to run “.timer on” for sqlite3.

    Note that initially they were done on separate instances, but because of the variance you get in cloud instances, it was decided that it would be better to run on the same instance using the MySQL Sandbox.

    MySQL 8 first run time: 0.16s. 5 runs: 0.16, 0.16, 0.17, 0.16, 0.16
    MariaDB Server 10.2 first run time: 0.20s. 5 runs: 0.22, 0.22, 0.21, 0.21, 0.20
    MariaDB Server 10.3.2 first run time: 0.206s. 5 runs: 0.237, 0.199, 0.197, 0.198, 0.192
    SQLite3 first run time: Run Time: real 0.328 user 0.323333 sys 0.003333 / Run Time: real 0.334 user 0.333333 sys 0.000000

    Trying a more complex Sudoku puzzle, "..41..2.3........12.....8..82.6.43.....8.9.....67.2.48..5.....64........3.7..69.." to produce the result "574198263638425791219367854821654379743819625956732148195273486462981537387546912", the results are:

    MySQL 8 first run time: 4.87s. 5 runs: 5.43, 5.35, 5.10, 5.19, 5.05
    MariaDB Server 10.2 first run time: 6.65s. 5 runs: 7.03, 6.57, 6.61, 6.59, 7.12
    MariaDB Server 10.3.2 first run time: 6.121s. 5 runs: 5.701, 6.043, 6.043, 5.849, 6.199
    SQLite3 first run time: Run Time: real 10.105 user 10.099999 sys 0.000000 / Run Time: real 11.305 user 11.293333 sys 0.000000

    Conclusions from this fun little exercise? SQL, even though it’s a standard, is not portable between databases. Thankfully, MySQL and MariaDB are syntax-compatible in this case! MySQL and MariaDB Server are both faster than sqlite3 when running a recursive CTE. It would seem that the MySQL 8.0.3 release candidate is faster at solving these Sudoku puzzles than the MariaDB Server 10.2 stable GA release. It also seems that MariaDB Server 10.3.2 alpha is marginally quicker than MariaDB Server 10.2.

    Kudos to Team MariaDB for getting recursive common table expression support first in the MySQL ecosystem, and kudos to Team MySQL for making it fast!

    by Colin Charles at November 22, 2017 07:17 PM

    MariaDB AB

    MariaDB AX for Analytics: Out With The Old, In With The New

    MariaDB AX for Analytics: Out With The Old, In With The New Shane Johnson Tue, 11/21/2017 - 19:07

    The market wanted an enterprise open source database for modern transactional workloads. It wanted a database capable of meeting traditional enterprise requirements, but from an open source vendor committed to community innovation. We called it MariaDB TX.

    Today, the market wants an enterprise open source data warehouse for modern analytical workloads and with the same requirements. We call it MariaDB AX. It’s for everything from traditional business intelligence/reporting and data mining to modern analytics, including decision support systems and recommendation engines.

    What is an enterprise open solution for modern analytics and data warehousing?

    It’s analytics made flexible.

    You don’t have to model data around a handful of predefined queries. You can, but with a columnar database, you don’t have to – query data however you like, whenever you like.

    It’s analytics made simple.

    You don’t have to worry about complex, time-consuming batch jobs anymore. You can import data directly from C++, Python and Java applications, continuously or on demand.

    It’s analytics made easy.

    You don’t have to learn a new query language or programming model. You have the full power of standard SQL at your disposal – no limitations, no workarounds.

    It’s analytics made current.

    You don’t have to wait for data to become available for analysis. MariaDB AX can continuously import data from Apache Kafka or MariaDB TX (via change-data-capture).

    It’s analytics made fast.

    You don’t have to wait when millions to billions of rows can be queried in a matter of seconds – a columnar database is optimized for querying most if not all data.
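    The reason a columnar layout helps here can be shown with a toy example: an aggregate over one column only touches that column's contiguous values, instead of scanning every full row (the three-row table below is hypothetical):

```python
# Row store: the table is a list of whole rows, so any scan
# touches every column of every row.
rows = [(1, "widget", 9.99), (2, "gadget", 4.50), (3, "gizmo", 12.00)]

# Columnar layout of the same table: each column stored contiguously.
columns = {
    "id":    [r[0] for r in rows],
    "name":  [r[1] for r in rows],
    "price": [r[2] for r in rows],
}

# SELECT SUM(price): read one contiguous column, skip the rest.
total = sum(columns["price"])
```

    At millions to billions of rows this difference, together with per-column compression, is what makes analytical scans fast.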

    It’s analytics made scalable.

    You don’t have to scale up. You can scale out with distributed storage and parallel query processing. If you need to query more data, add more nodes.

    It’s analytics made powerful.

    You don’t have to be limited to what’s available out of the box. You can create custom analytical functions, and you can analyze semi-structured and unstructured data.

    It’s analytics made affordable.

    You don’t have to invest in million-dollar appliances or commit to a cloud vendor. MariaDB AX runs on commodity hardware, on premises or in the cloud of your choice.

    Times have changed.

    MariaDB AX is the data warehouse built for everyone, today and tomorrow.

    If you want to see what’s under the hood, MariaDB AX now includes MariaDB ColumnStore 1.1 (blog) as well as bulk and streaming data adapters (blog).

    To learn more about MariaDB AX, join our upcoming webinar on December 12 – register here to attend. We hope you can join us.



    by Shane Johnson at November 22, 2017 12:07 AM

    November 21, 2017

    MariaDB AB

    Real-time Data Streaming with MariaDB AX

    Real-time Data Streaming with MariaDB AX Dipti Joshi Tue, 11/21/2017 - 17:10

    Since we started working on big data and distributed columnar technology with MariaDB ColumnStore, the focus has been on helping our customers get the most value out of their data assets. Time to insight and time to action are competitive differentiators for our customers. To achieve faster time to insight and time to action, it’s critical that:

    • Organizations make data available for analysis as soon as it arrives, and;

    • Applications stream data from data sources to the analytics platform seamlessly.

    With this in mind, the latest MariaDB AX analytics solution introduces MariaDB ColumnStore 1.1.2 and the MariaDB ColumnStore Data Adapters, enabling easy integration with data from various sources such as web/mobile services, IoT sensors, social networks, device logs and machine learning model output.

    In this blog, we explore the two new data streaming capabilities of MariaDB AX and how they help users.

    Bulk Data Adapters

    Previously, data was ingested into MariaDB ColumnStore through high-speed bulk loading with cpimport or LOAD DATA INFILE for batch load operations. However, these required manual operational processes and introduced delays: CSV files had to be generated from the data sources and moved to a UM or PM node. The new bulk data adapter API introduced in MariaDB ColumnStore 1.1, available as an SDK, enables near real-time data analytics by streaming data directly from ETL and data source applications into MariaDB ColumnStore programmatically. The API is available as a C++ SDK, along with Python and Java bindings.


    The API uses the MariaDB ColumnStore configuration file (ColumnStore.xml) to locate the distributed PM nodes upon startup. The application can then perform per-table writes by passing input data to the API calls as data structures. The API lets the application stream data row by row, buffering a configurable number of rows (100,000 by default) before flushing them from the application to the network. When the application commits, the data is written on the PM node. The application can, however, commit rows at any time and does not have to wait for the 100,000-row buffer to fill. Because the API streams data over the network, data streaming applications using it can run outside the MariaDB ColumnStore UM and PM nodes. Applications can therefore run very close to the data source and push data to MariaDB ColumnStore as it is generated, resulting in real-time streaming.
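The buffer-and-flush semantics described above can be sketched with a toy writer. This is an illustration of the behavior only, not the real SDK; the class and method names here are hypothetical:

```python
class BufferedRowWriter:
    """Toy sketch of the bulk adapter's buffer-and-flush behavior
    (illustrative only -- this is not the real SDK class)."""

    def __init__(self, ship_batch, buffer_rows=100_000):
        self.ship_batch = ship_batch    # callable that sends a batch over the network
        self.buffer_rows = buffer_rows  # 100,000 rows by default in ColumnStore
        self.buffer = []
        self.rows_shipped = 0

    def write_row(self, row):
        self.buffer.append(row)
        if len(self.buffer) >= self.buffer_rows:
            self._flush()

    def _flush(self):
        if self.buffer:
            self.ship_batch(self.buffer)
            self.rows_shipped += len(self.buffer)
            self.buffer = []

    def commit(self):
        # Commit may happen at any time: it flushes whatever is
        # buffered, so the writer never waits for a full buffer.
        self._flush()
        return self.rows_shipped

batches = []
writer = BufferedRowWriter(batches.append, buffer_rows=3)
for i in range(7):
    writer.write_row((i, "row-%d" % i))
total = writer.commit()
print(total, [len(b) for b in batches])  # 7 [3, 3, 1]
```

With a tiny buffer of 3 rows, writing 7 rows ships two full batches plus a final partial batch on commit, mirroring how the real API flushes on buffer fill and on commit.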

    Users utilize the bulk data API for various use cases, such as publishing data from Python machine learning models; ingesting data from collection points across IoT, computing and telecommunication networks; streaming data from ad engines; and feeding data from transactional databases or queuing systems such as Kafka. A detailed usage guide is available on our KnowledgeBase page. Source code examples of API usage can be found here.


    Streaming Data Adapters

    MaxScale CDC Data Adapter

    Many MariaDB users who run both MariaDB TX (OLTP) and MariaDB AX (analytics) feed data from the InnoDB tables in MariaDB TX into MariaDB AX. While the InnoDB tables in MariaDB TX serve transactional purposes such as daily financial transactions, on the MariaDB AX side users want data from certain tables for analytics. The natural inclination is to replicate data from MariaDB Server in TX to MariaDB ColumnStore in AX. However, MariaDB ColumnStore is not optimized to act as a replication slave: replication executes individual SQL inserts, updates and deletes, while MariaDB ColumnStore is optimized for bulk writes rather than row-based DML. MariaDB TX includes MariaDB MaxScale, which can stream change data events to external targets. We marry this with the new bulk data adapter API of MariaDB AX to provide continuous data streaming from MariaDB TX to MariaDB AX. The out-of-the-box integration of MaxScale CDC streams into MariaDB ColumnStore is available as the MaxScale CDC Data Adapter. No development is required to use this adapter.


    The MaxScale CDC Data Adapter registers with MariaDB MaxScale as a CDC client using the MaxScale CDC Connector API, receiving change data records from MariaDB MaxScale (converted from binlog events received from the master on MariaDB TX) in JSON format. Then, using the MariaDB ColumnStore bulk data adapter API, it converts the JSON data into API calls and streams it to a MariaDB PM node. The adapter has options to insert all the events using the same schema as the source database table, or to insert each event with metadata as well as table data. The event metadata includes the event timestamp, the GTID, the event sequence and the event type (insert, update or delete).
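The split between event metadata and table data can be sketched in a few lines. The JSON below is a hypothetical CDC record; the metadata field names and table columns (id, amount) are illustrative assumptions, not the adapter's exact schema:

```python
import json

# Hypothetical CDC event, roughly as MaxScale might emit it. The
# metadata field names and the table columns (id, amount) are
# illustrative, not the adapter's exact wire format.
event = json.loads(
    '{"event_timestamp": 1511300000, "gtid": "0-1-42",'
    ' "event_number": 1, "event_type": "insert",'
    ' "id": 7, "amount": 19.99}'
)

METADATA_FIELDS = {"event_timestamp", "gtid", "event_number", "event_type"}

def split_event(evt):
    """Split a CDC record into metadata and row data, mirroring the
    adapter's choice between inserting the bare row or row + metadata."""
    meta = {k: v for k, v in evt.items() if k in METADATA_FIELDS}
    row = {k: v for k, v in evt.items() if k not in METADATA_FIELDS}
    return meta, row

meta, row = split_event(event)
print(meta["event_type"], meta["gtid"], row)
```

Depending on the adapter option chosen, either only `row` or the full `meta` plus `row` would be streamed to the target table.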

    The usage guide for the adapter can be found here. Using the MaxScale CDC Data Adapter, you can now stream directly from your OLTP MariaDB Servers to your analytics MariaDB ColumnStore servers.

    Kafka Adapter

    The Kafka data adapter streams all messages published to Apache Kafka topics to MariaDB AX automatically and continuously, enabling data from many sources to be streamed and collected for analysis without complex code. The Kafka adapter is built using librdkafka and the MariaDB ColumnStore bulk data adapter API.


    So far we have tested the Kafka data adapter with CDC events from MariaDB MaxScale as the source of the events to the Kafka broker. Going forward, we will also support generic key-value events. The ability to stream data from Kafka opens the adapter up to a variety of data sources such as websites, advertising engines, social network feeds, system logs, IoT events, etc.



    The bulk data adapters allow users to build their own custom ETL applications, and the streaming data adapters provide out-of-the-box capabilities to continuously stream data from MariaDB TX and various other sources without any coding. Try the new data adapters today and let us know your feedback.

    Learn more about MariaDB AX, our modern data warehousing solution for large scale advanced analytics.





    by Dipti Joshi at November 21, 2017 10:10 PM

    What's Great About MariaDB ColumnStore 1.1

    What's Great About MariaDB ColumnStore 1.1 david_thompson_g Tue, 11/21/2017 - 16:51

    I'm excited that our second major GA release of MariaDB ColumnStore 1.1 is now available for download. In this blog, I review some of the major features that make up this release. Our focus for this release was to enable greater extensibility and to provide some additional features that have come up as we have worked with prospects and customers.

    One of the features I'm most excited about is the bulk write SDK. This is a separate new product available here. It has been built to enable data streaming, integration and publishing use cases. Streaming means that it enables you to consume data from queuing systems such as Apache Kafka. The SDK will enable creation of higher-performance adapters for ETL integration. Finally, I see this being used as a way to programmatically record and publish results from machine learning platforms, enabling business users to interact with the data in their tool of choice. The SDK is implemented in C++ and currently provides Python and Java wrapper implementations. More details can be found here. We also plan to develop and support a number of streaming data adapters applying the SDK to specific use cases such as replication and Kafka integration. A blog from my colleague Dipti Joshi provides more details on the streaming data adapters.

    Continuing the extensibility theme, the capability to support user defined aggregate and window functions now exists. This provides a C++ SDK framework enabling the creation of functions that can scale out aggregate calculation across many PMs. Distributed reference implementations of median and sum of squares are provided for use or extension. You can learn more about this feature here.
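Why an aggregate like sum of squares scales out across PMs can be sketched in a few lines. This is a conceptual illustration of two-phase aggregation, not the C++ SDK itself:

```python
# Two-phase aggregation sketch: each PM computes a partial sum of
# squares over its local rows, and the partials combine with simple
# addition -- which is what lets the calculation scale out. Median,
# by contrast, has no such cheap combine step, which is why a
# distributed median is the interesting reference implementation.
pm_partitions = [
    [1.0, 2.0, 3.0],  # rows local to PM1
    [4.0, 5.0],       # rows local to PM2
    [6.0],            # rows local to PM3
]

def partial_sum_sq(rows):
    """Phase 1: runs independently on each PM."""
    return sum(x * x for x in rows)

def combine(partials):
    """Phase 2: merges the per-PM partial results (on the UM)."""
    return sum(partials)

total = combine(partial_sum_sq(p) for p in pm_partitions)
print(total)  # 91.0 == 1 + 4 + 9 + 16 + 25 + 36
```

The same partial/combine split is the general shape of any user defined aggregate that distributes well.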

    The number of data types supported by MariaDB ColumnStore has been extended with Blob and Text types. I was surprised to see these in demand for analytics, but many users are looking at MariaDB ColumnStore as an archive database for OLTP data, and this was one of the gaps. In addition, Text columns are a common workaround to allow a greater number of long string columns while keeping within the MariaDB row size limit.

    There are a number of improvements for installation and manageability in this release. First, the postConfigure script now offers a 'Data Redundancy' storage option to leverage GlusterFS, providing data high availability for on-premises customers that lack a networked storage device. Second, the install now offers an option where you can pre-install the software packages rather than having postConfigure perform remote installs. This will enable us to support package repository installs and will also make integration with orchestration tools simpler. Finally, a backup and restore tool is now provided that automates the previously manual procedure.

    MariaDB ColumnStore 1.1 has been updated to be based off of MariaDB Server 10.2. As part of this, the window function implementation was migrated to use the same front end SQL parser introduced in MariaDB Server 10.2. 

    Finally, we have made some significant investments internally in our processes. We have migrated to Buildbot as our continuous integration tool (which is also used by MariaDB Server). Also, with the significant increase in OS distributions and deployment options, we have invested in parallelizing and fully automating install, upgrade, and system verification for well over 100 permutations of operating system, deployment topology, and configuration. In addition, we have made ongoing improvements to our developer regression test, other system tests, and performance benchmark tests.

    Over the coming weeks, we'll publish more detailed blogs drilling into each of these features. However, if you can't wait, feel free to download MariaDB ColumnStore and provide us feedback so we can continue to improve and make this the best open source OLTP and OLAP database out there.

    Learn more about MariaDB AX, our modern data warehousing solution for large scale advanced analytics.



    by david_thompson_g at November 21, 2017 09:51 PM

    Peter Zaitsev

    Percona Toolkit 3.0.5 is Now Available

    Percona announces the release of Percona Toolkit 3.0.5 on November 21, 2017.

    Percona Toolkit is a collection of advanced command-line tools that perform a variety of MySQL and MongoDB server and system tasks too difficult or complex for DBAs to perform manually. Percona Toolkit, like all Percona software, is free and open source.

    You can download Percona Toolkit packages from the web site or install them from official repositories.

    This release includes the following changes:

    New Features:

    • PT-216: The pt-mongodb-query-digest tool now supports MongoDB versions lower than 3.2; incorrect output was fixed.
    • PT-182: The pt-summary, pt-mysql-summary, and pt-mongodb-summary commands now provide output in the JSON format.
    • pt-mysql-summary shows the output of the SHOW SLAVE HOSTS command.
    • pt-table-sync supports replication channels (requires MySQL version 5.7.6 or higher)
    • PMM-1590: MongoDB Profiler for Percona Management and Monitoring and Percona Toolkit has been improved.

    Bug fixes:

    • pt-mext would fail if the Rsa_public_key variable was empty.
    • PT-212: pt-mongodb-query-digest --version produced incorrect values.
    • PT-202: pt-online-schema-change incorrectly processed virtual columns.
    • PT-200: pt-online-schema-change reported an error when the name of an index contained UNIQUE as the prefix or suffix.
    • pt-table-checksum did not detect differences on a system with ROW-based replication active.
    • PT-196: pt-online-schema-change --max-load paused if a status variable was passed 0 as the value.
    • PT-193: pt-table-checksum reported a misleading error if a column comment contained an apostrophe. For more information, see #1708749.
    • PT-187: In some cases, pt-table-checksum did not report that the same table contained different values on the master and slave.
    • PT-186: pt-online-schema-change --alter could fail if field names contained upper case characters. For more information, see #1705998.
    • PT-183: In some cases pt-mongodb-query-digest could not connect to a database using authentication.
    • PT-167: In some cases, pt-kill could ignore the value of the --busy-time parameter. For more information, see #1016272.
    • PT-161: When run with --skip-check-slave-lag, pt-table-checksum could fail in some cases.

    by Hrvoje Matijakovic at November 21, 2017 06:32 PM

    Jean-Jerome Schmidt

    ClusterControl 1.5 - Automatic Backup Verification, Build Slave from Backup and Cloud Integration

    At the core of ClusterControl is its automation, and a key part of that is ensuring your data is securely backed up and ready for restoration whenever something goes wrong. Having an effective backup strategy and disaster recovery plan is key to the success of any application or environment.

    In our latest release, ClusterControl 1.5, we have introduced a number of enhancements for backing up MySQL and MariaDB-based systems.

    One of the key improvements is the ability to back up from ClusterControl to the cloud provider of your choice. Cloud providers like Google Cloud Services and Amazon S3 offer virtually unlimited storage, reducing local space needs. This lets you retain your backup files for as long as you would like, without concerns about local disk space.

    Let’s explore all the exciting new backup features for ClusterControl 1.5...

    Backup/Restore Wizard Redesign

    First of all, you will notice backup and restore wizards have been revamped to better improve the user experience. It will now load as a side menu on the right of the screen:

    The backup list is also getting a minor tweak where backup details are displayed when you click on the particular backup:

    You will be able to view backup location and which databases are inside the backup. There are also options to restore the backup or upload it into the cloud.

    PITR Compatible Backup

    ClusterControl performs the standard mysqldump backup with separate schema and data dumps. This makes it easy to restore partial backups. However, it breaks the consistency of the backup (schema and data are dumped in two separate sessions), so it cannot be used to provision a slave or for point-in-time recovery.

    A mysqldump PITR-compatible backup contains one single dump file, with GTID info, binlog file and position. Thus, only the database node that produces binary log will have the "PITR compatible" option available, as highlighted in the screenshot below:

    When PITR compatible option is toggled, the database and table fields are greyed out since ClusterControl will always perform the backup against all databases, events, triggers and routines of the target MySQL server.

    The following lines will appear in the first ~50 lines of the completed dump file:

    $ head -50 mysqldump_2017-11-07_072250_complete.sql
    -- GTID state at the beginning of the backup
    SET @@GLOBAL.GTID_PURGED='20dc5247-4a98-ee18-73af-5c79373388ee:1-1681';
    -- Position to start replication or point-in-time recovery from

    The information can be used to build slaves from the backup, or to perform point-in-time recovery together with the binary logs: you can start the recovery from the MASTER_LOG_FILE and MASTER_LOG_POS reported in the dump file, using the mysqlbinlog utility. Note that binary logs are not backed up by ClusterControl.
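The GTID line in that header is easy to pick out programmatically before pointing a new slave at the master or starting a recovery. A small sketch, using sample text modeled on the dump header shown above:

```python
import re

# Sketch: extract the GTID state from the first lines of a
# PITR-compatible dump. The sample text mimics the header shown
# in the post; a real script would read the first ~50 lines of
# the dump file instead.
dump_head = (
    "-- GTID state at the beginning of the backup\n"
    "SET @@GLOBAL.GTID_PURGED="
    "'20dc5247-4a98-ee18-73af-5c79373388ee:1-1681';\n"
)

match = re.search(r"GTID_PURGED='([^']+)'", dump_head)
gtid_purged = match.group(1) if match else None
print(gtid_purged)  # 20dc5247-4a98-ee18-73af-5c79373388ee:1-1681
```

The extracted value is what you would feed to SET GLOBAL gtid_purged (or compare against the binary logs) when provisioning a slave from the backup.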

    Single Console for Your Entire Database Infrastructure
    Find out what else is new in ClusterControl

    Build Slaves from Backup

    Another feature is the ability to build a slave directly from a PITR-compatible backup, instead of doing it from a chosen master. This is a huge advantage as it offloads the master server. This option can be used with MySQL Replication or Galera Cluster. An existing backup can be used to rebuild an existing replication slave or add a new replication slave during the staging phase, as shown in the following screenshot:

    Once the staging completes, the slave will connect to the chosen master and start catching up. Previously, ClusterControl performed a streaming backup directly from the chosen master using Percona Xtrabackup. This could impact performance of the master when scaling out a large dataset, despite the operation being non blocking on the master. With the new option, if the backup is stored on ClusterControl, only these hosts (ClusterControl + the slave) will be busy when staging the data on the slave.

    Backup to Cloud

    Backups can now be automatically uploaded to the cloud. This requires two ClusterControl modules to be installed: clustercontrol-cloud (the cloud integration module) and clustercontrol-clud (the cloud download/upload CLI), both available in v1.5 and later. The upgrade instructions are included with these packages and they come without any extra configuration. At the moment, the supported cloud platforms are Amazon Web Services and Google Cloud Platform. Cloud credentials are configured under ClusterControl -> Settings -> Integrations -> Cloud Providers.

    When creating or scheduling a backup, you should see the following additional options when "Upload Backup to the cloud" is toggled:

    The feature allows a one time upload or to schedule backups to be uploaded after completion (Amazon S3 or Google Cloud Storage). You can then download and restore the backups as required.

    Custom Compression for mysqldump

    This feature was in fact first introduced with ClusterControl v1.4.2 after its release. We added a backup compression level based on gzip. Previously, ClusterControl used the default backup compression (level 6) if the backup destination was on the controller node. The lowest compression (level 1 - fastest, less compression) was used if the backup destination was on the database host itself, to ensure minimal impact to the database during the compressing operation.

    In this version, we have polished the compression aspect and you can now customize the compression level, regardless of the backup destination. When upgrading your ClusterControl instance, all the scheduled backups will be automatically converted to use level 6, unless you explicitly edit them in v1.5.

    Backup compression is vital when your dataset is large and your backup retention policy is long, while storage space is limited. mysqldump output, being text-based, can benefit from compression with savings of up to 60% of the original file size. On some occasions, the highest compression level is the best option, although it comes at the price of longer decompression when restoring.
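The level trade-off is easy to see with Python's zlib, which implements the same DEFLATE levels gzip uses. A quick sketch on repetitive, dump-like input:

```python
import zlib

# Compare compression levels on repetitive, text-like input
# (mysqldump output is similarly compressible). Level 1 is the
# fastest, level 6 is the default, level 9 compresses hardest
# but costs the most CPU time.
data = b"INSERT INTO t VALUES (1,'alpha'),(2,'beta'),(3,'gamma');\n" * 2000

sizes = {level: len(zlib.compress(data, level)) for level in (1, 6, 9)}
print(len(data), sizes)
```

On this kind of input, higher levels never produce larger output, so the choice is purely compression ratio versus compression time.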

    Bonus Feature: Automatic Backup Verification

    As old sysadmins say: a backup is not a backup if it's not restorable. Backup verification is something that is usually neglected by many. Some sysadmins have developed in-house routines for this, usually more manual than automated. Automating it is hard, mainly due to the complexity of the operation as a whole - starting from host provisioning, MySQL installation and preparation, backup file transfer, decompression, the restore operation, verification procedures and finally cleaning up the system after the process. All these hassles make people neglect such an important aspect of a reliable backup. In general, a backup restore test should be done at least once a month, or after significant changes in data size or database structure. Find a schedule that works for you and formalize it with a scheduled event.

    ClusterControl can automate the backup verification by performing the restoration on a fresh host, without compromising any of the verification procedures mentioned above. This can be done after some delay, or right after the backup has completed. It will report the backup status based on the exit code of the restore operation, perform automatic shutdown if the backup is verified, or simply let the restored host run so you perform additional manual verifications on the data.

    When creating or scheduling a backup, you will have additional options if "Verify Backup" is toggled:

    If "Install Database Software" is enabled, ClusterControl will remove any existing MySQL installation on the target host and reinstall the database software with the same version as the existing MySQL server. Otherwise, if you have a specific setup for the restored host, you can skip this option. The rest of the options are self-explanatory.

    Bonus Feature: Don’t Forget PostgreSQL

    In addition to all this great functionality for MySQL and MariaDB, ClusterControl 1.5 also provides PostgreSQL with an additional backup method (pg_basebackup) that can be used for online binary backups. Backups taken with pg_basebackup can later be used for point-in-time recovery and as the starting point for log shipping or streaming replication standby servers.

    That’s it for now. Do give ClusterControl v1.5 a try, play around with the new features and let us know what you think.

    by ashraf at November 21, 2017 10:59 AM

    November 20, 2017

    Peter Zaitsev

    InnoDB Page Compression: the Good, the Bad and the Ugly

    In this blog post, we’ll look at some of the facets of InnoDB page compression.

    Somebody recently asked me about the best way to handle JSON data compression in MySQL. I took a quick look at InnoDB page compression and wanted to share my findings.

    There is also some great material on this topic prepared and presented by Yura Sorokin at Percona Live Europe 2017. Yura also implemented Compressed Columns in Percona Server.

    First, the good part.

    InnoDB page compression is actually really easy to use and provides a decent compression ratio. To use it, I just ran:

    CREATE TABLE commententry (...) COMPRESSION="zlib";

    That’s all. By the way, for my experiment I used the subset of Reddit comments stored in JSON (described here: Big Dataset: All Reddit Comments – Analyzing with ClickHouse).

    This method got me a compressed table of 3.9GB. Compare this to 8.4GB for an uncompressed table and it’s about a 2.15x compression ratio.

    Now, the bad part.

    As InnoDB page compression uses “hole punching,” the standard Linux utilities do not always properly support files created this way. In fact, to see the size “3.9GB” I had to use

    du --block-size=1 tablespace_name.ibd

    as the standard

    ls -l tablespace_name.ibd

    shows the wrong size (8.4GB). There is a similar limitation on copying files. The standard way

    cp old_file new_file

    may not always work, and to be sure I had to use

    cp --sparse=always old_file new_file
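The du-versus-ls mismatch is a property of sparse files in general, not just hole-punched InnoDB tablespaces. A quick sketch that creates one and compares the apparent size with the allocated size (assuming a filesystem with sparse file support):

```python
import os
import tempfile

# Create a sparse file: seek far past end-of-file and write one byte,
# leaving a "hole". st_size is the apparent size (what `ls -l` shows),
# while st_blocks * 512 is the allocated size (what `du` counts) --
# the same mismatch seen with hole-punched InnoDB tablespaces.
fd, path = tempfile.mkstemp()
try:
    os.lseek(fd, 10 * 1024 * 1024, os.SEEK_SET)  # 10 MB hole
    os.write(fd, b"x")
    st = os.fstat(fd)
    apparent = st.st_size            # 10 MB + 1 byte
    allocated = st.st_blocks * 512   # usually far smaller when the
                                     # filesystem supports sparse files
    print(apparent, allocated)
finally:
    os.close(fd)
    os.remove(path)
```

Tools that are not sparse-aware (like a plain cp) may read and write all those zero bytes in the hole, which is part of why copying such files behaves so differently.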

    Speaking about copying, here’s the ugly part.

    The actual time to copy the sparse file was really bad.

    On a fairly fast device (a Samsung SM863), copying the sparse file mentioned above in its compressed size of 3.9GB took 52 minutes! That’s shocking, so let me repeat it again: 52 minutes to copy a 3.9GB file on an enterprise SATA SSD.

    By comparison, copying the regular 8.4GB file takes 9 seconds! Compare 9 seconds with 52 minutes.

    To be fair, the NVMe device (Intel® SSD DC D3600) handles sparse files much better. It took only 12 seconds to copy the same sparse file on this device.

    Having considered all this, it is hard to recommend that you use InnoDB page compression for serious production. Well, unless you power your database servers with NVMe storage.

    For JSON data, the Compressed Columns in Percona Server for MySQL should work quite well using Dictionary to store JSON keys – give it a try!

    by Vadim Tkachenko at November 20, 2017 06:54 PM

    Valeriy Kravchuk

    How lsof Utility May Help MySQL DBAs

    While working in Support, I noticed that probably at least once a week I have to use or mention the lsof utility in some context. This week, for example, we had a customer trying to find out if his running mysqld process is linked with the tcmalloc library. He started it in different ways, using LD_PRELOAD directly, the --malloc-lib option of the mysqld_safe script, etc., but wanted to verify that his attempts really worked as expected. My immediate comment in the internal chat was: "Just let them run lsof -p `pidof mysqld` | grep mall and check!" My MariaDB 10.2 instance uses jemalloc, and this can be checked exactly the same way:
    openxs@ao756:~/dbs/maria10.2$ ps aux | grep mysqld...
    openxs    4619  0.0  0.0   4452   804 pts/2    S    17:02   0:00 /bin/sh bin/mysqld_safe --no-defaults --port=3308 --malloc-lib=/usr/lib/x86_64-linux-gnu/
    openxs    4734  0.5  2.9 876368 115156 pts/2   Sl   17:02   0:00 /home/openxs/dbs/maria10.2/bin/mysqld --no-defaults --basedir=/home/openxs/dbs/maria10.2 --datadir=/home/openxs/dbs/maria10.2/data --plugin-dir=/home/openxs/dbs/maria10.2/lib/plugin --log-error=/home/openxs/dbs/maria10.2/data/ao756.err --port=3308
    openxs    5391  0.0  0.0  14652   964 pts/2    S+   17:05   0:00 grep --color=auto mysqld
    openxs@ao756:~/dbs/maria10.2$ lsof -p 4734 | grep mall
    mysqld  4734 openxs  mem    REG              252,2    219776 12058822 /usr/lib/x86_64-linux-gnu/
    I think it's time to summarize the most important use cases of the lsof utility for MySQL DBAs. I am going to show different cases where it can be useful, based on public MySQL bug reports.

    As one can read in the manual, lsof "lists on its standard output file information about files opened by processes". In one of the simplest possible calls, presented above, we just pass the PID of the process after the -p option and get a list of open files for that process. This includes the shared libraries the process uses. By default the following output format is used:
    openxs@ao756:~/dbs/maria10.2$ lsof -p 4734 | more
    COMMAND  PID   USER   FD   TYPE             DEVICE  SIZE/OFF     NODE NAME
    mysqld  4734 openxs  cwd    DIR              252,2      4096 29638597 /home/openxs/dbs/maria10.2/data
    mysqld  4734 openxs  rtd    DIR              252,2      4096        2 /
    mysqld  4734 openxs  txt    REG              252,2 147257671 29514843 /home/openxs/dbs/maria10.2/bin/mysqld
    mysqld  4734 openxs  mem    REG              252,2     31792  1311130 /lib/x86_64-linux-gnu/
    mysqld  4734 openxs  mem    REG              252,2 101270905 29241175 /home/openxs/dbs/maria10.2/lib/plugin/

    mysqld  4734 openxs  DEL    REG               0,11             443265 /[aio]
    mysqld  4734 openxs    0r   CHR                1,3       0t0     1050 /dev/null
    mysqld  4734 openxs    1w   REG              252,2     52623  5255961 /home/openxs/dbs/maria10.2/data/.rocksdb/LOG
    mysqld  4734 openxs    2w   REG              252,2    458880 29647192 /home/openxs/dbs/maria10.2/data/ao756.err
    mysqld  4734 openxs    3r   DIR              252,2      4096  5255249 /home/openxs/dbs/maria10.2/data/.rocksdb
    mysqld  4734 openxs  451u  IPv6             443558       0t0      TCP *:3308 (LISTEN)
    mysqld  4734 openxs  452u  unix 0x0000000000000000       0t0   443559 /tmp/mysql.sock
    mysqld  4734 openxs  470u   REG              252,2         0 29756970 /home/openxs/dbs/maria10.2/data/mysql/event.MYD
    mysqld  4734 openxs  471u   REG              252,2         0 29647195 /home/openxs/dbs/maria10.2/data/
    Most columns have an obvious meaning, so let me concentrate on a few. FD is normally a numeric file descriptor, and for regular files it is. In that case it is followed by a letter describing the mode under which the file is open (r for read, w for write and u for update), and possibly one more letter describing the type of lock applied to the file. But we can also see values without any digit in the output above, so obviously some special values can appear there, like cwd for the current working directory, rtd for the root directory, txt for program text (code and data), mem for a memory-mapped file, etc.

    The TYPE column is also interesting and may have plenty of values (as there are many types of files in Linux). REG means a regular file and DIR is, obviously, a directory. Note also unix for a socket and IPv6 for the TCP port the mysqld process listens on.

    In the SIZE/OFF column for normal files we usually see their size in bytes. Values for an offset in a file are prefixed with 0t if the value is decimal, or 0x if it's hex. NAME is, obviously, a fully specified file name (with symbolic links resolved). Some more details about the output format are discussed in the following examples.
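On Linux, the per-process information lsof reports is exposed under /proc: each open descriptor appears as a symlink under /proc/&lt;pid&gt;/fd, pointing at the file (or socket, pipe, ...) it refers to. A minimal sketch, assuming a Linux system:

```python
import os
import tempfile

# On Linux, each open file descriptor of a process appears as a
# symlink under /proc/<pid>/fd -- the same information lsof reads.
# /proc/self is a shortcut for the current process.
with tempfile.NamedTemporaryFile(prefix="lsof_demo_") as f:
    link = "/proc/self/fd/%d" % f.fileno()
    target = os.readlink(link)
    print(f.fileno(), "->", target)
```

Reading that symlink is also how lsof can show a "(deleted)" suffix: the link target keeps naming the original path even after the file has been unlinked.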

    Another usual way to use lsof is to pass a file name and get details about processes that have it opened, like this:
    openxs@ao756:~/dbs/maria10.2$ lsof /tmp/mysql.sock
    mysqld  4734 openxs  452u  unix 0x0000000000000000      0t0 443559 /tmp/mysql.sock
    openxs@ao756:~/dbs/maria10.2$ lsof /home/openxs/dbs/maria10.2
    mysqld_sa  4619 openxs  cwd    DIR  252,2     4096 29235594 /home/openxs/dbs/maria10.2
    lsof      14354 openxs  cwd    DIR  252,2     4096 29235594 /home/openxs/dbs/maria10.2
    lsof      14355 openxs  cwd    DIR  252,2     4096 29235594 /home/openxs/dbs/maria10.2
    bash      29244 openxs  cwd    DIR  252,2     4096 29235594 /home/openxs/dbs/maria10.2
    In this case we see that /home/openxs/dbs/maria10.2 is used as the current working directory by 4 processes. Usually this kind of check is used when we cannot unmount some directory, but it may also be useful in the context of MySQL when you get error messages that some file is already in use by another process. In the first example above I was checking what process could be using the /tmp/mysql.sock file.

    Now, with the above details on basic usage in mind, let's check several recent enough MySQL bug reports that demonstrate typical and more advanced usage of lsof:
    • Bug #66237 - "Temporary files created by binary log cache are not purged after transaction commit". My former colleague and mentor from Percona, Miguel Angel Nieto (who recently joined the dark side of MongoDB employees), used lsof to show numerous files with names ML* created and left (until the connection is closed) by the mysqld process in the /tmp directory (tmpdir, to be precise) of a server with binary logging enabled, when the transaction size was larger than the binlog cache size. The bug is fixed in 5.6.17+ and 5.7.2+. It shows us the usual way MySQL server creates temporary files:
      # lsof -p 6112|grep ML
      mysqld 6112 root 38u REG 7,0 106594304 18 /tmp/MLjw4ecJ (deleted)
      mysqld 6112 root 39u REG 7,0 237314310 17 /tmp/MLwdWDGW (deleted)
      Notice (deleted) above. This is a result of immediate call to unlink() when temporary files are created. Check this in the source code, as well as my_delete() implementation.
    • Bug #82870 - "mysqld opens too many descriptors for slow query log". This bug (still "Verified") was opened by my former colleague Sveta Smirnova (now in Percona). Basically, mysqld opens too many descriptors for the slow query log (and general query log) if it is turned ON and OFF while concurrent sessions are running. lsof made it possible to see multiple descriptors created for the same file, until eventually open_files_limit is hit.
    • Bug #83434 - "Select statement with partition selection against MyISAM table opens all partitions". This bug (later declared a duplicate of an older one and, eventually, a documented, even if unexpected, behavior by design) was opened by my colleague from MariaDB, Geoff Montee. The lsof utility helped to show that all partitions are actually opened by the mysqld process in this case.
    • Bug #74145 - "FLUSH LOGS improperly disables the logging if the log file cannot be accessed". This bug (still "Verified") was reported by Jean Weisbuch. Here we can see how lsof was used to find out whether the slow log is open after FLUSH. The logging gets disabled, but MySQL continues to claim that it is enabled. I remember many cases when lsof also helped to find out where the error log (the file with descriptor 2w) is really located/redirected to.
    • Bug #77752 - "bind-address wrongly prefers IPv4 over IPv6". This was not a bug (more like a configuration issue), but see how lsof -i is used by Daniël van Eeden to find out which process listens on a specific port, and whether it listens on an IPv4 or IPv6 address.
    • Bug #87589 - "Documentation incorrectly states that LOAD DATA LOCAL INFILE does not use tmpdir". In this "Verified" bug report Geoff Montee used lsof to show that temporary files are really created in tmpdir, not in /tmp (the OS temporary directory). This is how you can find out when the MySQL manual lies...
    • Bug #77519 - "Reported location of Innodb Merge Temp File is wrong". One more bug from Daniël van Eeden, this time "Verified". By calling lsof +L1 during an online ALTER TABLE, he demonstrated that two temp files are created in tmpdir instead of in the datadir (as described by the manual), while the events_waits_history_long table in performance_schema seems to claim it waited on a temporary file in the datadir. Note that in another of his bug reports, Bug #76225, fixed since 5.7.9 and 5.8.0, he had also shown ML* binlog cache files created that were not instrumented by performance_schema.
    • Bug #75706 - "alter table import tablespace creates a temporary table". This bug report by BJ Quinn is formally still "Verified", but according to my former colleague Przemyslaw Malkowski from Percona, in recent 5.6.x and 5.7.x versions lsof does NOT show a temporary table being created. Maybe it's time to re-verify this bug, if a decision is made on how to proceed with it?
    • Bug #83717 - "Manual does not explain when ddl_log.log file is deleted and how large it can be". My own bug report where lsof was used to show that the ddl_log.log file remains open even after online ALTER completes. Manual is clear about this now.
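The "(deleted)" suffix seen in the Bug #66237 output above is easy to reproduce outside MySQL. Below is a minimal, Linux-only sketch of the same create-then-unlink pattern, using /proc introspection in place of lsof (the ML prefix merely mimics the binlog cache file names; nothing here is MySQL code):

```python
import os
import tempfile

# Reproduce the "(deleted)" pattern: create a temporary file, unlink it
# immediately (as mysqld does for ML* binlog cache files), and observe that
# the descriptor still works while the name shows up as deleted.
fd, path = tempfile.mkstemp(prefix="ML")
os.unlink(path)                        # no directory entry anymore
os.write(fd, b"binlog cache spill")    # writes through the fd still work
target = os.readlink("/proc/self/fd/%d" % fd)  # Linux-specific introspection
print(target)
os.close(fd)
```

This is exactly why the space used by such files is only reclaimed when the descriptor is closed, i.e. when the connection ends.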
    To summarize, lsof may help a MySQL DBA to find out:
    • what dynamic libraries are really used by the mysqld process
    • where the error log and other logs are really located
    • which other process may have a file, port or socket open that is needed by the current MySQL instance
    • why you may hit open_files_limit or use all free space in some filesystem unexpectedly
    • where all kinds of temporary files are created during specific operations
    • how MySQL really works with files, ports and sockets
    It also helps to find MySQL bugs and clarify missing details in the MySQL manual.

    by Valeriy Kravchuk ( at November 20, 2017 11:00 AM

    November 17, 2017

    Jean-Jerome Schmidt

    Several Ways to Intentionally Fail or Crash your MySQL Instances for Testing

    You can take down a MySQL database in multiple ways. Some obvious ways are to shut down the host, pull out the power cable, or hard kill the mysqld process with SIGKILL to simulate an unclean MySQL shutdown behaviour. But there are also less obvious ways to deliberately crash your MySQL server, and then see what kind of chain reaction it triggers. Why would you want to do this? Failure and recovery can have many corner cases, and understanding them can help reduce the element of surprise when things happen in production. Ideally, you would want to simulate failures in a controlled environment, and then design and test database failover procedures.

    There are several areas in MySQL that we can tackle, depending on how you want it to fail or crash. You can corrupt the tablespace, overflow the MySQL buffers and caches, limit the resources to starve the server, and also mess around with permissions. In this blog post, we are going to show you some examples of how to crash a MySQL server in a Linux environment. Some of them would be suitable for e.g. Amazon RDS instances, where you would have no access to the underlying host.

    Kill, Kill, Kill, Die, Die, Die

    The easiest way to fail a MySQL server is to simply kill the process or host, and not give MySQL a chance to do a graceful shutdown. To simulate a mysqld crash, just send signal 4, 6, 7, 8 or 11 to the process:

    $ kill -11 $(pidof mysqld)

    When looking at the MySQL error log, you can see the following lines:

    11:06:09 UTC - mysqld got signal 11 ;
    This could be because you hit a bug. It is also possible that this binary
    or one of the libraries it was linked against is corrupt, improperly built,
    or misconfigured. This error can also be caused by malfunctioning hardware.
    Attempting to collect some information that could help diagnose the problem.
    As this is a crash and something is definitely wrong, the information
    collection process might fail.
    Attempting backtrace. You can use the following information to find out
    where mysqld died. If you see no messages after this, something went
    terribly wrong...

    You can also use kill -9 (SIGKILL) to kill the process immediately. More details on Linux signal can be found here. Alternatively, you can use a meaner way on the hardware side like pulling off the power cable, pressing down the hard reset button or using a fencing device to STONITH.
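For the curious, the "got signal 11" report can be demystified with a tiny, illustrative sketch (not MySQL-specific): fork a stand-in child process, deliver SIGSEGV, and decode the wait status, which is how a parent process or init system learns which signal terminated a daemon.

```python
import os
import signal

# Fork a child (standing in for mysqld), deliver SIGSEGV (signal 11),
# then read the terminating signal back from the child's wait status.
pid = os.fork()
if pid == 0:
    signal.pause()   # child blocks until a signal arrives
    os._exit(0)      # never reached: SIGSEGV terminates the child

os.kill(pid, signal.SIGSEGV)
_, status = os.waitpid(pid, 0)
print("child terminated by signal", os.WTERMSIG(status))
```

The same decoding explains the error-log header: mysqld installs a handler that prints the signal number and a backtrace before dying.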

    Triggering OOM

    Popular MySQL cloud offerings like Amazon RDS and Google Cloud SQL have no straightforward way to crash them: firstly because you won't get any OS-level access to the database instance, and secondly because the provider uses a proprietary patched MySQL server. One way is to overflow some buffers and let the out-of-memory (OOM) killer kick out the MySQL process.

    You can increase the sort buffer size to something bigger than what the RAM can handle, and run a number of sort queries against the MySQL server. Let's create a 10-million-row table using sysbench on our Amazon RDS instance, so we can build a huge sort:

    $ sysbench \
    --db-driver=mysql \
    --oltp-table-size=10000000 \
    --oltp-tables-count=1 \
    --threads=1 \
    --mysql-host=<rds-endpoint> \
    --mysql-port=3306 \
    --mysql-user=rdsroot \
    --mysql-password=password \
    /usr/share/sysbench/tests/include/oltp_legacy/parallel_prepare.lua \
    run

    Change the sort_buffer_size to 5G (our test instance is db.t2.micro - 1GB, 1vCPU) by going to Amazon RDS Dashboard -> Parameter Groups -> Create Parameter Group -> specify the group name -> Edit Parameters -> choose "sort_buffer_size" and specify the value as 5368709120.

    Apply the parameter group changes by going to Instances -> Instance Action -> Modify -> Database Options -> Database Parameter Group -> and choose our newly created parameter group. Then, reboot the RDS instance to apply the changes.

    Once up, verify the new value of sort_buffer_size:

    MySQL [(none)]> select @@sort_buffer_size;
    +--------------------+
    | @@sort_buffer_size |
    +--------------------+
    |         5368709120 |
    +--------------------+

    Then fire 48 simple queries that require sorting from a client:

    $ for i in {1..48}; do (mysql -urdsroot -ppassword -e 'SELECT * FROM sbtest.sbtest1 ORDER BY c DESC' >/dev/null &); done

    If you run the above on a standard host, you will notice the MySQL server will be terminated and you can see the following lines appear in the OS's syslog or dmesg:

    [164199.868060] Out of memory: Kill process 47060 (mysqld) score 847 or sacrifice child
    [164199.868109] Killed process 47060 (mysqld) total-vm:265264964kB, anon-rss:3257400kB, file-rss:0kB

    With systemd, MySQL or MariaDB will be restarted automatically, as will Amazon RDS. You can see the uptime for our RDS instance reset back to 0 (under mysqladmin status), and the 'Latest restore time' value (under the RDS Dashboard) updated to the moment it went down.
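A quick back-of-the-envelope check shows why this works (worst case; a real sort may allocate less than the full buffer, but the order of magnitude is what matters on a 1 GB instance):

```python
# With sort_buffer_size at 5 GiB, each sorting session may allocate up to
# that much, so 48 concurrent sorts can demand far more memory than a
# 1 GB db.t2.micro instance has.
sort_buffer_bytes = 5368709120          # 5 GiB, as set in the parameter group
sessions = 48
demand_gib = sessions * sort_buffer_bytes / 1024**3
print("worst-case sort memory demand: %.0f GiB" % demand_gib)
```

That figure is consistent with the total-vm size the OOM killer reported in the dmesg output above.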

    Corrupting the Data

    InnoDB has its own system tablespace to store the data dictionary, buffers and rollback segments inside a file named ibdata1. It also stores table data if innodb_file_per_table is disabled (that option has been enabled by default since MySQL 5.6.6). We can just zero this file, send a write operation and flush tables to crash mysqld:

    # empty ibdata1
    $ cat /dev/null > /var/lib/mysql/ibdata1
    # send a write
    $ mysql -uroot -p -e 'CREATE TABLE sbtest.test (id INT)'
    # flush tables
    $ mysql -uroot -p -e 'FLUSH TABLES'
    After you send a write, in the error log, you will notice:

    2017-11-15T06:01:59.345316Z 0 [ERROR] InnoDB: Tried to read 16384 bytes at offset 98304, but was only able to read 0
    2017-11-15T06:01:59.345332Z 0 [ERROR] InnoDB: File (unknown): 'read' returned OS error 0. Cannot continue operation
    2017-11-15T06:01:59.345343Z 0 [ERROR] InnoDB: Cannot continue operation.

    At this point, MySQL will hang because it cannot perform any operation, and after the flushing you will get "mysqld got signal 11" lines and mysqld will shut down. To clean up, you have to remove the corrupted ibdata1, as well as the ib_logfile* files, because the redo log files cannot be used with the new system tablespace that mysqld will generate on the next restart. Data loss is expected.

    For MyISAM tables, we can mess around with .MYD (MyISAM data file) and .MYI (MyISAM index) under the MySQL datadir. For instance, the following command replaces any occurrence of string "F" with "9" inside a file:

    $ replace F 9 -- /var/lib/mysql/sbtest/sbtest1.MYD

    Then, send some writes (e.g., using sysbench) to the target table and perform the flushing:

    mysql> FLUSH TABLE sbtest.sbtest1;

    The following should appear in the MySQL error log:

    2017-11-15T06:56:15.021564Z 448 [ERROR] /usr/sbin/mysqld: Incorrect key file for table './sbtest/sbtest1.MYI'; try to repair it
    2017-11-15T06:56:15.021572Z 448 [ERROR] Got an error from thread_id=448, /export/home/pb2/build/sb_0-24964902-1505318733.42/rpm/BUILD/mysql-5.7.20/mysql-5.7.20/storage/myisam/mi_update.c:227

    The MyISAM table will be marked as crashed, and running a REPAIR TABLE statement is necessary to make it accessible again.
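If the bundled `replace` utility is not at hand, the same blunt byte substitution can be sketched in a few lines of Python (the temporary file below is an illustrative stand-in; only ever run something like this against a throwaway copy of a table's .MYD file):

```python
import os
import tempfile

# Stand-in for sbtest1.MYD: write some bytes, then swap every "F" for "9",
# which is all the `replace F 9 -- file` invocation above does.
fd, path = tempfile.mkstemp()
os.write(fd, b"FOO F BAR")
os.close(fd)

with open(path, "rb") as f:
    data = f.read()
corrupted = data.replace(b"F", b"9")
with open(path, "wb") as f:
    f.write(corrupted)
os.unlink(path)
print(corrupted)  # b'9OO 9 BAR'
```

Because MyISAM data files carry no page checksums, the damage only surfaces when the server next touches the mangled rows, as the "Incorrect key file" error above shows.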

    Limiting the Resources

    We can also apply operating system resource limits to our mysqld process, for example the number of open file descriptors. The open_files_limit variable (default 5000) makes mysqld reserve file descriptors using the setrlimit() system call. You can set this variable relatively small (just enough for mysqld to start up) and then send multiple queries to the MySQL server until it hits the limit.

    If mysqld is managed by systemd, we can set it in the systemd unit file located at /usr/lib/systemd/system/mysqld.service, changing the following value to something lower (the systemd default is 6000):

    # Sets open_files_limit
    LimitNOFILE = 30

    Apply the changes to systemd and restart MySQL server:

    $ systemctl daemon-reload
    $ systemctl restart mysqld

    Then, start sending new connections/queries against different databases and tables so mysqld has to open multiple files. You will notice the following error:

    2017-11-16T04:43:26.179295Z 4 [ERROR] InnoDB: Operating system error number 24 in a file operation.
    2017-11-16T04:43:26.179342Z 4 [ERROR] InnoDB: Error number 24 means 'Too many open files'
    2017-11-16T04:43:26.179354Z 4 [Note] InnoDB: Some operating system error numbers are described at
    2017-11-16T04:43:26.179363Z 4 [ERROR] InnoDB: File ./sbtest/sbtest9.ibd: 'open' returned OS error 124. Cannot continue operation
    2017-11-16T04:43:26.179371Z 4 [ERROR] InnoDB: Cannot continue operation.
    2017-11-16T04:43:26.372605Z 0 [Note] InnoDB: FTS optimize thread exiting.
    2017-11-16T04:45:06.816056Z 4 [Warning] InnoDB: 3 threads created by InnoDB had not exited at shutdown!

    At this point, when the limit is reached, MySQL will freeze and it will not be able to perform any operation. When trying to connect, you would see the following after a while:

    $ mysql -uroot -p
    ERROR 2013 (HY000): Lost connection to MySQL server at 'reading initial communication packet', system error: 104
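The "Too many open files" condition (OS error 24, EMFILE) can be reproduced in isolation by capping RLIMIT_NOFILE, which is exactly the limit systemd's LimitNOFILE sets for mysqld. A self-contained sketch:

```python
import os
import resource

# Cap the soft open-files limit at 30 for this process, then open
# descriptors until the kernel refuses with EMFILE (errno 24).
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
resource.setrlimit(resource.RLIMIT_NOFILE, (30, hard))

fds, err = [], None
try:
    while True:
        fds.append(os.open("/dev/null", os.O_RDONLY))
except OSError as exc:
    err = exc.errno          # 24 == errno.EMFILE, "Too many open files"
finally:
    for fd in fds:
        os.close(fd)

resource.setrlimit(resource.RLIMIT_NOFILE, (soft, hard))  # restore the limit
print("opened %d extra descriptors before errno %d" % (len(fds), err))
```

mysqld hits the same errno once its table files, sockets and temp files exhaust the budget; unlike this sketch, it cannot simply close descriptors and carry on.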

    Messing up with Permissions

    The mysqld process runs as the "mysql" user, which means all the files and directories it needs to access are owned by the mysql user/group. By messing with the permissions and ownership, we can make the MySQL server useless:

    $ chown root:root /var/lib/mysql
    $ chmod 600 /var/lib/mysql

    Generate some load on the server, then connect to the MySQL server and flush all tables to disk:

    mysql> FLUSH TABLES;
    At this moment, mysqld is still running but it's kind of useless. You can access it via a mysql client but you can't do any operation:

    mysql> SHOW DATABASES;
    ERROR 1018 (HY000): Can't read dir of '.' (errno: 13 - Permission denied)

    To clean up the mess, set the correct permissions:

    $ chown mysql:mysql /var/lib/mysql
    $ chmod 750 /var/lib/mysql
    $ systemctl restart mysqld

    Lock it Down

    FLUSH TABLES WITH READ LOCK (FTWRL) can be destructive in a number of conditions. For example, in a Galera cluster where all nodes are able to process writes, you can use this statement to lock down the cluster from within one of the nodes. The statement simply blocks other queries from being processed by mysqld during the flushing until the lock is released, which is very handy for backup processes (MyISAM tables) and file system snapshots.

    Although this action won't crash or bring down your database server while the lock is held, the consequence can be huge if the session that holds the lock does not release it. To try this, simply:

    mysql> FLUSH TABLES WITH READ LOCK;

    Then abandon the client without releasing the lock, for example by suspending it (Ctrl+Z) or killing its terminal; note that a clean mysql> exit would release the lock.

    Then send a bunch of new queries to mysqld until they reach the max_connections value. Obviously, you cannot get back the same session as the previous one once you are out, so the lock will be held indefinitely. The only ways to release it are to kill the holding connection from another session (by a user with the SUPER privilege), kill the mysqld process itself, or perform a hard reboot.


    This blog is written to give alternatives to sysadmins and DBAs to simulate failure scenarios with MySQL. Do not try these on your production server :-)

    by ashraf at November 17, 2017 03:12 PM

    Peter Zaitsev

    This Week in Data with Colin Charles 15: Percona Live 2018 Call for Papers and Best Practices for Observability

    Colin Charles

    Join Percona Chief Evangelist Colin Charles as he covers happenings, gives pointers and provides musings on the open source database community.

    So we have announced the call for presentations for Percona Live Santa Clara 2018. Please send your submissions in!

    As you probably already know, we have been expanding the content to be more than just MySQL and MongoDB. It really does include more open source databases: the whole of 2016 had a “time series” theme to it, and we of course love to have more PostgreSQL content (there have been tracks dedicated to PostgreSQL for some time now). I found this one comment interesting recently, from John Arundel, “If you’re going to learn one database really well, make it Postgres.” I have been noticing newer developers jump on the PostgreSQL bandwagon. I presume much of this blog’s readership is still MySQL centric, but it will be interesting to see where this goes.

    Charity Majors recently wrote Best Practices for Observability. In addition, her book alongside Laine Campbell is now available for purchase on Kindle: Database Reliability Engineering: Designing and Operating Resilient Database Systems. Highly recommended purchase. You can also get it on O’Reilly Safari (free month with those codes for Percona Live Europe Dublin attendees).

    Are you using Google Cloud Spanner? It now has multi-region support, and has an updated SLA for 99.999% uptime. That’s basically no more than 5.25 minutes of downtime per year!


    Releases

    • orchestrator 3.0.3 – auto-provisioning Raft nodes, native Consul support, SQLite or MySQL backed setups, web UI improvements and more. Solid release.
    • MongoDB 3.6 – you can download this soon.
    • MariaDB 10.1.29 – important changes to Mariabackup, InnoDB/XtraDB, and some security fixes
    • Apache Kylin 2.2 – OLAP for Hadoop, originally developed at eBay, has enhanced ACL support amongst other improvements.
    • Cassandra on Azure Cosmos DB

    Link List

    Upcoming Appearances

    • ACMUG 2017 gathering – Beijing, China, December 9-10 2017 – it was very exciting being there in 2016, I can only imagine it’s going to be bigger and better for 2017, since it is now two days long!


    I look forward to feedback/tips via e-mail at or on Twitter @bytebot.

    by Colin Charles at November 17, 2017 02:10 PM

    November 16, 2017

    Shlomi Noach

    orchestrator 3.0.3: auto provisioning raft nodes, native Consul support and more

    orchestrator 3.0.3 is released! There's been a lot going on since 3.0.2:

    orchestrator/raft: auto-provisioning nodes via lightweight snapshots

    In an orchestrator/raft setup, we have n hosts forming a raft cluster. In a 3-node setup, for example, one node can go down, and still the remaining two will form a consensus, keeping the service operational. What happens when the failed node returns?

    With 3.0.3 the failed node can go down for as long as it wants. Once it comes back, it attempts to join the raft cluster. A node keeps its own snapshots and its raft log outside the relational backend DB. If it has recent-enough data, it just needs to catch up with the raft replication log, which it acquires from one of the active nodes.

    If its data is very stale, it will request a snapshot from an active node, which it will import, and will just resume from that point.

    If its data is gone, that's not a problem. It gets a snapshot from an active node, imports it, and keeps running from that point.

    If it's a newly provisioned box, that's not a problem. It gets a snapshot from an active node, ... etc.

    • SQLite backed setups can just bootstrap new nodes. No need to dump+load or import any data.
      • Side effect: you may actually use :memory:, where SQLite does not persist any data to disk. Remember that the raft snapshots and replication log will cover you. The cheat is that the raft replication log itself is managed and persisted by an independent SQLite database.
    • MySQL backed setups will still need to make sure orchestrator has the privileges to deploy itself.

    More info in the docs.

    This plays very nicely into the hands of kubernetes, which is on orchestrator's roadmap.

    Key Value, native Consul support (Zk TODO)

    orchestrator now supports Key-Value stores built-in, and Consul in particular.

    At this time the purpose of orchestrator KV is to support master discovery. orchestrator will write the identity of the master of each cluster to KV store. The user will use that information to apply changes to their infrastructure.

    For example, the user will rely on Consul KV entries, written by orchestrator, to generate proxy config files via consul-template, such that traffic is directed via the proxy onto the correct master.

    orchestrator supports:

    • Manually writing identity of cluster's master to KV store
      • e.g. `orchestrator-client -c submit-masters-to-kv-stores -alias mycluster`
    • Automatically updating the master's identity upon failover

    Key-value pairs are in the form of <cluster-alias> → <master>. For example:

    • Key is `main_cluster`
    • Value is the master's host:port

    Web UI improvements

    Using the web UI, you can now:

    • Promote a new master

      graceful takeover via ui

      Dragging onto the left part of the master's box implies promoting a new server. Dragging onto the right side of a master's box implies relocating a server below the master.

    • "reverse" replication (take local master)

      take master via UI

      Dragging onto the left part of a server's local master implies taking over the master. Dragging onto the right part of a server's local master implies relocating a server below that local master.

    • Work in quiet mode: click `mute` icon on the left sidebar to avoid being prompted when relocating replicas. You'll still be prompted for risky operations such as master promotion.

    Other noteworthy changes

    • Raft advertise addresses: a contribution by Sami Ahlroos allows orchestrator/raft to work over NAT, and `kubernetes` in particular.
    • Sparser histories: especially for the `orchestrator/raft` setup, but true in general, we wish to keep the `orchestrator` backend database lightweight. orchestrator will now keep less history than it used to.
      • Detection/recovery history is kept for 7 days
      • Encouraging general audit to go to log file instead of `audit` table.
    • Building via go1.9 which will soon become a requirement for developers wishing to build `orchestrator` on their own.


    We're looking to provision orchestrator on kubernetes, and will publish as much of that work as possible.

    There are many incoming feature requests from the community, and we'll try to address them where it makes sense and time allows. We greatly appreciate all input from the community!


    orchestrator is free and open source, released under the Apache 2 license.

    Source & binary releases are available from the GitHub repository:

    Packages are also available in package cloud.

    by shlomi at November 16, 2017 09:38 AM

    November 15, 2017

    Peter Zaitsev

    Understanding how an IST donor is selected

    IST donor cluster

    In a clustering environment, we often see a node that needs to be taken down for maintenance. For a node to rejoin, it should re-sync with the cluster state. In PXC (Percona XtraDB Cluster), there are 2 ways for the rejoining node to re-sync: State Snapshot Transfer (SST) and Incremental State Transfer (IST). SST involves a full data transfer (which could be time consuming). IST is an incremental data transfer whereby only missing write-sets are donated by a DONOR to the rejoining node (a.k.a. the JOINER).

    In this article I will try to show how a DONOR for the IST process is selected.

    Selecting an IST DONOR

    First, a word about gcache. Each node retains some write-sets in its cache, known as gcache. Once this gcache is full, it is purged to make room for new write-sets. Based on the gcache configuration, each node may retain a different span of write-sets. The wider the span, the greater the probability of the node acting as a prospective DONOR. The lowest seqno in gcache can be queried using:

    show status like 'wsrep_local_cached_downto'

    Let’s understand the IST DONOR algorithm with a topology and working example:

    • Say we have 3 node cluster: N1, N2, N3.
    • To start with, all 3 nodes are in sync (wsrep_last_committed is the same for all 3 nodes, let’s say 100).
    • N3 is scheduled for maintenance and is taken down.
    • In the meantime, N1 and N2 process the workload, thereby moving from 100 -> 1100.
    • N1 and N2 also purge their gcache. Let’s say wsrep_local_cached_downto for N1 and N2 is 110 and 90 respectively.
    • Now N3 is restarted and discovers that the cluster has made progress from 100 -> 1100, so it needs the write-sets from (101, 1100).
    • It starts looking for a prospective DONOR.
      • N1 can service data from (110, 1100) but the request is for (101, 1100) so N1 can’t act as DONOR
      • N2 can service data from (90, 1100) and the request is for (101, 1100) so N2 can act as DONOR.

    Safety gap and how it affects DONOR selection

    So far so good. But can N2 reliably act as DONOR? While N3 is evaluating the prospective DONOR, what if N2 purges more data and now wsrep_local_cached_downto on N2 is 105? In order to accommodate this, the N3 algorithm adds a safety gap.

    safety gap = (Current State of Cluster – Lowest available seqno from any of the existing node of the cluster) * 0.008

    So the N2 range is considered to be (90 + (1100 – 90) * 0.008, 1100) = (98, 1100).

    Can N2 now act as DONOR? Yes: its adjusted range (98, 1100) still covers the requested (101, 1100).

    What if N2 had purged up to 95 and then N3 started looking for prospective DONOR?

    In this case the N2 range would be (95 + (1100 – 95) * 0.008, 1100) = (103, 1100), ruling N2 out from the prospective DONOR list.
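The selection arithmetic above can be condensed into a small sketch. This is hedged: the 0.008 factor and the seqnos come from the worked example in this post, not from a reading of the Galera source, and the function name is illustrative.

```python
# A donor can serve IST if its gcache lower bound, padded by the safety gap,
# still lies at or below the first write-set the joiner needs.
def can_donate_ist(donor_lowest, cluster_seqno, joiner_first_needed):
    safety_gap = (cluster_seqno - donor_lowest) * 0.008
    return donor_lowest + safety_gap <= joiner_first_needed

# N3 needs write-sets starting at seqno 101, and the cluster is at 1100:
print(can_donate_ist(90, 1100, 101))   # N2 with gcache down to 90
print(can_donate_ist(95, 1100, 101))   # N2 after purging up to 95
```

With 90 the padded bound is 98.08, so IST is possible; with 95 it is 103.04, ruling N2 out, just as described above.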

    Twist at the end

    Considering the latter case above (N2 purged up to 95), it has been proven that N2 can’t act as the IST DONOR and the only way for N3 to join is through SST.

    What if I say that N3 still joins back using IST? CONFUSED?

    Once N3 falls back from IST to SST, it will select an SST donor. This selection is done sequentially and nominates N1 as the first choice. N1 doesn’t have the required write-sets, so SST is forced.

    But what if I configure wsrep_sst_donor=N2 on N3? This will cause N2 to get selected instead of N1. But wait: N2 doesn’t qualify either, as with the safety gap its range is (103, 1100).

    That’s true. But the request is a combined IST + SST request, so even though N3 ruled out N2 as the IST DONOR, a request is sent for one last try. If N2 can service the request using IST, it is allowed to do so. Otherwise, it falls back to SST.

    Interesting! This is a well-thought-out algorithm from Codership: I applaud them for this and the many other important control functions that go on behind the scenes of Galera Cluster.

    by Krunal Bauskar at November 15, 2017 04:11 PM

    MariaDB Foundation

    Microsoft joins the MariaDB Foundation as a Platinum level sponsor

    MariaDB Foundation today announced that Microsoft has become a platinum sponsor. The sponsorship will help the Foundation in its goals to support continuity and open collaboration in the MariaDB ecosystem, and to drive adoption, serving an ever growing community of users and developers. “Joining the MariaDB Foundation as a Platinum member is a natural next […]

    The post Microsoft joins the MariaDB Foundation as a Platinum level sponsor appeared first on

    by Otto Kekäläinen at November 15, 2017 03:10 PM

    Peter Zaitsev

    ZFS from a MySQL perspective

    Open ZFS logo

    Since the purpose of a database system is to store data, there is a close relationship with the filesystem. As MySQL consultants, we always look at filesystems for performance tuning opportunities. The most common choices in terms of filesystems are XFS and EXT4; on Linux it is exceptional to encounter anything else. Both XFS and EXT4 have pros and cons, and their behaviors are well known and they perform well, but they are not without shortcomings.

    Over the years, we have developed a bunch of tools and techniques to overcome these shortcomings. For example, since they don’t allow a consistent view of the filesystem, we wrote tools like Xtrabackup to backup a live MySQL database. Another example is the InnoDB double write buffer. The InnoDB double write buffer is required only because neither XFS nor EXT4 is transactional. There is one filesystem which offers nearly all the features we need, ZFS.  ZFS is arguably the most advanced filesystem available on Linux. Maybe it is time to reconsider the use of ZFS with MySQL.

    ZFS on Linux or ZoL (from the OpenZFS project), has been around for quite a long time now. I first started using ZoL back in 2012, before it was GA (general availability), in order to solve a nearly impossible challenge to backup a large database (~400 GB) with a mix of InnoDB and MyISAM tables. Yes, ZFS allows that very easily, in just a few seconds. As of 2017, ZoL has been GA for more than 3 years and most of the issues that affected it in the early days have been fixed. ZFS is also GA in FreeBSD, illumos, OmniOS and many others.

    This post will hopefully be the first of many posts, devoted to the use of ZFS with MySQL. The goal here is not to blindly push for ZFS but to see when ZFS can help solve real problems. We will first examine ZFS and try to draw parallels with the architecture of MySQL. This will help us to better understand how ZFS works and behaves. Future posts will be devoted to more specific topics like performance, PXC, backups, compression, database operations, bad and poor use cases and sample configurations.

    Some context

    ZFS is a filesystem that was developed by Sun Microsystems and introduced with OpenSolaris in 2005. ZFS is unique in many ways; let’s first have a look at its code base using the sloccount tool, which provides an estimation of the development effort:

    • EXT4: 8.5 person-years
    • XFS: 17 person-years
    • ZFS: 77 person-years

    graph of the estimated development efforts for ZFS versus other filesystems

    In terms of code base complexity, it is approaching 10 times the complexity of EXT4; the above graphic shows the scale. To put things in perspective, the sloccount development effort for Percona Server 5.7, which is based on MySQL Community 5.7, is estimated at 680 person-years. The ZoL development is sponsored by the Lawrence Livermore National Laboratory and the project is very active.

    ZFS features

    Why does ZFS need such a large code base? Well, in Linux, it functionally replaces MD (software raid), LVM (volume manager) and the filesystem. ZFS is really a transactional database designed to support filesystem operations. Let’s review the ZFS main features.

    128-bit filesystem

    That’s huge! According to Jeff Bonwick, the energy required to fully populate such a storage device would be enough to boil the oceans. It seems inconceivable that we’d ever need a larger filesystem.

    Copy-on-write (COW)

    When ZFS needs to update a record, it does not overwrite it in place. Instead, it writes a new record, changes the pointers, and then frees up the old one if it is no longer referenced. That design is at the core of ZFS. It allows for features like free snapshots and transactions.


    Snapshots

    ZFS supports snapshots, and because of its COW architecture, taking a snapshot is merely a matter of recording a transaction number and telling ZFS to protect the referenced records from its garbage collector. This is very similar to InnoDB MVCC. If a read view is kept open, InnoDB keeps a copy of each of the rows that changed in the undo log, and those rows are not purged until the transaction commits.


    Clones

    A ZFS snapshot can be cloned and then written to. At that point, the clone is like a fork of the original data. There is no equivalent feature in MySQL/InnoDB.


    Checksums

    All ZFS records have a checksum. This is exactly like the page checksums of InnoDB. If a record is found to have an invalid checksum, it is automatically replaced by a copy, provided one is available. It is normal to define a ZFS production setup with more than one copy of the data set. With ZFS, we can safely disable InnoDB checksums.


    Compression
    ZFS records can be compressed transparently. The most common algorithms are gzip and lz4. The data is compressed per record and the recordsize is an adjustable property. The principle is similar to transparent InnoDB page compression but without the need for punching holes. In nearly all the ZFS setups I have worked with, enabling compression helped performance.
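    For example (dataset name hypothetical; the 16KB recordsize matching the InnoDB page size is a common recommendation, not a universal rule):

```shell
# Enable lz4 compression and align recordsize with InnoDB's 16KB pages
zfs set compression=lz4 tank/mysql
zfs set recordsize=16k tank/mysql

# Report the achieved compression ratio
zfs get compressratio tank/mysql
```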


    Encryption
    ZoL doesn’t support transparent encryption of the records yet, but the encryption code is currently under review. If all goes well, the encryption should be available in a matter of a few months. Once there, it will offer another option for encryption at rest with MySQL. That feature compares very well with InnoDB tablespace encryption.


    Transactions and fsync
    An fsync on ZFS is transactional. This comes mainly from the fact that ZFS uses COW. When a file is opened with O_SYNC or O_DSYNC, ZFS behaves like a database where the fsync calls represent commits. The writes are atomic. The fsync calls return as soon as ZFS has written the data to the ZIL (ZFS Intent Log).  Later, a background process flushes the data accumulated in the ZIL to the actual data store. This flushing process is called at an interval of txg_timeout. By default, txg_timeout is set to 5s.  The process is extremely similar to the way InnoDB flushes pages.  A direct benefit for MySQL is the possibility of disabling the InnoDB doublewrite buffer. The InnoDB doublewrite buffer is often a source of contention in a heavy write environment, although the latest Percona Server releases have parallel doublewrite buffers that relieve most of the issue.
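    Under these assumptions (checksummed records, ZIL-backed atomic writes), the corresponding my.cnf fragment is a minimal sketch, not a tuned configuration:

```ini
[mysqld]
# ZFS already checksums every record and makes writes atomic through the ZIL,
# so the InnoDB doublewrite buffer and page checksums become redundant
innodb_doublewrite = 0
innodb_checksum_algorithm = none
```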


    ZIL and SLOG
    The transactional support in ZFS bears a huge price in terms of latency, since the synchronous writes and fsyncs involve many random write IO operations. Since ZFS is transactional, it needs a transactional journal, the ZIL. ZIL stands for 'ZFS Intent Log'. There is always a ZIL. The ZIL serves a purpose very similar to the InnoDB log files. The ZIL is written to sequentially, is fsynced often, and read from only for recovery after a crash. The goal is to delay random write IO operations by writing pending changes sequentially to a device. By default the ZIL delays the actual writes by only 5s (zfs_txg_timeout), but that's still very significant. To help synchronous write performance, ZFS has the possibility of locating the ZIL on a Separate Intent Log (SLOG).

    The SLOG device doesn’t need to be very large, a few GB is often enough, but it must be fast for sequential writes and fast for fsyncs. A fast flash device with good write endurance or spinners behind a raid controller with a protected write cache are great SLOG devices. Normally, the SLOG is on a redundant device like a mirror since losing the ZIL can be dramatic. With MySQL, the presence of a fast SLOG is extremely important for performance.
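    Adding a mirrored SLOG to an existing pool is a one-liner (the pool and device names are hypothetical):

```shell
# Synchronous writes now land on fast, redundant flash instead of the main vdevs
zpool add tank log mirror /dev/nvme0n1 /dev/nvme1n1
```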


    ARC and L2ARC
    The ARC is the ZFS file cache. It is logically split into two parts, the ARC and the L2ARC. The ARC is the in-memory file cache, while the L2ARC is an optional on-disk cache that stores items evicted from the ARC. The L2ARC is especially interesting with MySQL because it allows the use of a small flash storage device as a cache for a large slow storage device. Functionally, the ARC is like the InnoDB buffer pool, while the L2ARC is similar to tools like flashcache/bcache/dm-cache.


    RAID
    ZFS has its own way of dealing with disks. At the lowest level, ZFS can use the bare disks individually with no redundancy, a bit like JBOD devices used with LVM. Redundancy can be added with a mirror, which is essentially a software RAID-1 device. These mirrors can then be striped together to form the equivalent of a RAID-10 array. Going further, there are RAIDZ-1, RAIDZ-2 and RAIDZ-3, which are respectively the equivalent of RAID-5, RAID-6 and RAID... well, an array with 3 parities has no standard name yet. When you build a RAID array with Linux MD, you can hit the RAID-5 write hole issue if you do not have a write journal; the write journal option is available only in recent kernels and with the latest mdadm packages. ZFS is not affected by the RAID-5 write hole.
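    A couple of layout sketches, with hypothetical pool and device names:

```shell
# Striped mirrors, the RAID-10 equivalent
zpool create tank mirror sda sdb mirror sdc sdd

# Double-parity raidz2, the RAID-6 equivalent, with no RAID-5 write hole
zpool create tank raidz2 sda sdb sdc sdd sde sdf
```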


    Self-healing
    I already touched on this feature when I talked about the checksums. If more than one copy of a record is available and one of the copies is found to be corrupted, ZFS will return only a valid copy and will repair the damaged record. You can trigger a full check with the zpool scrub command.
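    A full check walks every record and verifies its checksum, repairing from a good copy where possible (pool name hypothetical):

```shell
zpool scrub tank

# Shows scrub progress plus any repaired or unrecoverable errors
zpool status tank
```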


    ZVOL block devices

    Not only can ZFS manage filesystems, it can also offer block devices. The block devices, called ZVOLs, can be snapshotted and cloned. That's a very handy feature when I want to create a cluster of similar VMs. I create a base image, then snapshot it and create clones for all the VMs. The whole image is stored only once, and each clone contains only the records that have been modified since the original clone was created.
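    The base-image workflow might look like this (all names are made up for illustration):

```shell
# 20GB block device, exposed as /dev/zvol/tank/vm-base
zfs create -V 20G tank/vm-base

# After installing the OS on it, freeze a "gold" image and clone it per VM;
# each clone only stores the records it later modifies
zfs snapshot tank/vm-base@gold
zfs clone tank/vm-base@gold tank/vm1
```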


    Send and receive
    ZFS allows you to send and receive snapshots. This feature is very useful to send data between servers. If there is already a copy of the data on the remote server, you can also send only the incremental changes.
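    A sketch of a full send followed by an incremental one (host and dataset names hypothetical):

```shell
# Initial full copy of a snapshot to a remote server
zfs send tank/mysql@monday | ssh backup zfs receive backup/mysql

# Later, ship only the blocks that changed between the two snapshots
zfs send -i @monday tank/mysql@tuesday | ssh backup zfs receive backup/mysql
```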


    Deduplication
    ZFS can automatically deduplicate files (or records) that have identical content. Although interesting if you have a lot of redundant data, the dedup feature is very memory- and CPU-intensive. I don't see a practical use case of dedup for databases, except maybe for a backup server.

    This concludes this first post about ZFS, stay tuned for more.


    by Yves Trudeau at November 15, 2017 10:44 AM

    Jean-Jerome Schmidt

    The Galera Cluster & Severalnines Teams Present: How to Manage Galera Cluster with ClusterControl - The Replay

    Watch the replay of this joint webinar in which we combine forces with the Codership Galera Cluster Team to talk about how to manage Galera Cluster using ClusterControl!

    Galera Cluster has become one of the most popular high availability solutions for MySQL and MariaDB; and ClusterControl is the de facto automation and management system for Galera Cluster.

    In this webinar we’re joined by Seppo Jaakola, CEO of Codership - Galera Cluster, and together, we demonstrate what it is that makes Galera Cluster such a popular high availability solution for MySQL and MariaDB and how to best manage it with ClusterControl.

    We discuss the latest features of Galera Cluster with Seppo, one of its creators. And we also demo how to automate it all from deployment, monitoring, backups, failover, recovery, rolling upgrades and scaling using the new ClusterControl CLI and of course the ClusterControl GUI.

    Watch the replay


    • Introduction
      • About Codership, the makers of Galera Cluster
      • About Severalnines, the makers of ClusterControl
    • What’s new with Galera Cluster
      • Core feature set overview
      • The latest features
      • What’s coming up
    • ClusterControl for Galera Cluster
      • Deployment
      • Monitoring
      • Management
      • Scaling
    • Live Demo
    • Q&A


    Seppo Jaakola, Founder of Codership, has over 20 years of experience in software engineering. He started his professional career at Digisoft and Novo Group Oy, working as a software engineer on various technical projects. He then worked for 10 years at Stonesoft Oy as a Project Manager on projects dealing with DBMS development, data security and firewall clustering. In 2003, Seppo Jaakola joined Continuent Oy, where he worked as team leader for the MySQL clustering product. This position linked together his earlier experience in DBMS research and distributed computing. Now he's applying his years of experience and administrative skills to steer Codership on the right course. Seppo Jaakola has an MSc degree in Software Engineering from Helsinki University of Technology.

    Krzysztof Książek, Senior Support Engineer at Severalnines, is a MySQL DBA with experience managing complex database environments for companies like Zendesk, Chegg, Pinterest and Flipboard.

    by jj at November 15, 2017 08:33 AM

    MariaDB Foundation

    Presentations from the 2017 MariaDB Developers Unconference in Shenzhen

    The following sessions were held on the two presentation days of the MariaDB Developers Unconference in Shenzhen. Day 1 MariaDB in 2017 (Otto Kekäläinen) What’s in the pipeline for 10.3 and beyond (Monty) – Slides AliSQL Roadmap (Xiao Bin) JSON support in MariaDB (Vicențiu Ciorbaru) – Slides Replication (Lixun Peng) – Slides Encryption key management […]

    The post Presentations from the 2017 MariaDB Developers Unconference in Shenzhen appeared first on the MariaDB Foundation blog.

    by Ian Gilfillan at November 15, 2017 08:17 AM

    MariaDB AB

    MariaDB Server 10.1.29 & MariaDB Galera Cluster 10.0.33 now available

    MariaDB Server 10.1.29 & MariaDB Galera Cluster 10.0.33 now available dbart Tue, 11/14/2017 - 22:28

    The MariaDB project is pleased to announce the immediate availability of MariaDB Server 10.1.29 and MariaDB Galera Cluster 10.0.33. See the release notes and changelogs for details, and visit the downloads page to download.

    Download MariaDB Server 10.1.29

    Release Notes Changelog About MariaDB Server 10.1

    Download MariaDB Galera Cluster 10.0.33

    Release Notes Changelog About MariaDB Galera Cluster



    by dbart at November 15, 2017 03:28 AM

    MariaDB Foundation

    MariaDB 10.1.29, MariaDB Galera Cluster 10.0.33 and MariaDB Connector/J Releases now available

    The MariaDB project is pleased to announce the availability of MariaDB 10.1.29, MariaDB Galera Cluster 10.0.33 and MariaDB Connector/J 2.2.0. See the release notes and changelogs for details. Download MariaDB 10.1.29 Release Notes Changelog What is MariaDB 10.1? MariaDB APT and YUM Repository Configuration Generator Download MariaDB Galera Cluster 10.0.33 Release Notes Changelog What is […]

    The post MariaDB 10.1.29, MariaDB Galera Cluster 10.0.33 and MariaDB Connector/J Releases now available appeared first on the MariaDB Foundation blog.

    by Ian Gilfillan at November 15, 2017 01:29 AM

    November 14, 2017

    Peter Zaitsev

    Webinars on Wednesday November 15, 2017: Proxy Wars and Percona Software Update for Q4


    Do you need to get to grips with MySQL proxies? Or maybe you could do with discovering the latest developments and plans for Percona’s software?

    Well, wait no more: on Wednesday November 15, 2017, we bring you a webinar double bill.

    Join Percona's Chief Evangelist, Colin Charles, as he presents "The Proxy Wars – MySQL Router, ProxySQL, MariaDB MaxScale" at 7:00 am PST / 10:00 am EST (UTC-8).

    Reflecting on his past experience with MySQL proxies, Colin will provide a short review of three open source solutions. He’ll run through a comparison of MySQL Router, MariaDB MaxScale and ProxySQL and talk about the reasons for using the right tool for an application.


    Meanwhile, return a little later in the day at 10:00 am PST / 1:00 pm EST (UTC-8) to hear Percona CEO Peter Zaitsev discuss what's new in Percona open source software. In "Percona Software News and Roadmap Update – Q4 2017", Peter will talk about new features in Percona software, show some quick demos and share highlights from the Percona open source software roadmap. He will also talk about new developments in Percona commercial services and finish with a Q&A.


    You are, of course, very welcome to register for either one or both webinars. Please register for your place soon!

    Peter Zaitsev, Percona CEO and Co-Founder

    Peter Zaitsev co-founded Percona and assumed the role of CEO in 2006. As one of the foremost experts on MySQL strategy and optimization, Peter leveraged both his technical vision and entrepreneurial skills to grow Percona from a two-person shop to one of the most respected open source companies in the business. With over 150 professionals in 30+ countries, Peter's venture now serves over 3000 customers – including the "who's who" of internet giants, large enterprises and many exciting startups. Percona was named to the Inc. 5000 in 2013, 2014, 2015 and 2016. Peter was an early employee at MySQL AB, eventually leading the company's High Performance Group. A serial entrepreneur, Peter co-founded his first startup while attending Moscow State University, where he majored in Computer Science. Peter is a co-author of High Performance MySQL: Optimization, Backups, and Replication, one of the most popular books on MySQL performance. Peter frequently speaks as an expert lecturer at MySQL and related conferences, and regularly posts on the Percona Database Performance Blog. Fortune and DZone have both tapped Peter as a contributor, and his recent ebook Practical MySQL Performance Optimization is one of Percona's most popular downloads.

    Colin Charles, Chief Evangelist

    Colin Charles is the Chief Evangelist at Percona. He was previously on the founding team for MariaDB Server in 2009, has worked with MySQL since 2005 and has been a MySQL user since 2000. Before joining MySQL, he worked actively on open source projects including Fedora. He's well known within many open source communities and has spoken on the conference circuit.


    by Peter Zaitsev at November 14, 2017 06:00 PM

    MariaDB AB

    What’s New in MariaDB Connector/J 2.2 and 1.7

    What’s New in MariaDB Connector/J 2.2 and 1.7 diego Dupin Tue, 11/14/2017 - 05:56

    We are pleased to announce the general availability (GA) of MariaDB Connector/J 2.2 and 1.7, the newest versions of MariaDB Connector/J. 

    As both new versions are fully compatible with their corresponding latest maintenance releases (supporting Java 6/7 and Java 8+ respectively), versions 2.1.2 and 1.6.5 are the last maintenance releases of the 2.1 and 1.6 branches.

    New enhancements include:

    Pool Datasource

    There are now two different Datasources implementations:

    • MariaDbDataSource: The existing basic implementation. A new connection is created each time the getConnection() method is called.
    • MariaDbPoolDataSource: The connection pooling implementation. The MariaDB driver keeps a pool of connections and lends one out when asked for it.

    Good frameworks that can accomplish this job already exist, such as DBCP2, HikariCP and C3P0, so why have another implementation? Here are some of the reasons:

    • Reliability: When reusing a connection from the pool, the connection must be like a "freshly created" connection. Depending on connection state, frameworks may end up executing multiple commands to reset state (some frameworks even choose to skip some of those resets to avoid a performance impact). MariaDB has a dedicated command to refresh connection state, permitting a real reset (rollback of any remaining transaction, reset of the transaction isolation level, reset of session variables, deletion of user variables, removal of all PREPARE statements, ...) in one command.
    • Performance: The pool can save some information at the first connection, allowing faster creations when making the next connection.
    • Easy configuration: Solves some frequent issues, such as the server closing a socket that has been idle for too long (wait_timeout defaults to 8h). The pool implementation avoids handing out a connection in a bad state.

    The pool is implemented at the connection level, which allows enabling the pool for a particular use case directly in the connection string: "jdbc:mariadb://host/db?pool=true".

    Configuration example using Spring:

    @Configuration
    public class DatabaseConfig {
        // property keys are illustrative
        @Value("${db.password}") private String DB_PASSWORD;
        @Value("${db.url}") private String DB_URL;
        @Value("${db.username}") private String DB_USERNAME;

        @Bean
        public DataSource dataSource() throws SQLException {
            MariaDbPoolDataSource dataSource = new MariaDbPoolDataSource();
            dataSource.setUrl(DB_URL);
            dataSource.setUser(DB_USERNAME);
            dataSource.setPassword(DB_PASSWORD);
            return dataSource;
        }
    }


    Download the MariaDB Connector now and learn about the newest evolution of MariaDB Connector/J.


    Download Release Notes Knowledge Base




    by diego Dupin at November 14, 2017 10:56 AM

    Peter Zaitsev

    Common MongoDB Topologies


    In this blog post, we’ll look at some common MongoDB topologies used in database deployments.

    The question of the best architecture for MongoDB will arise in your conversations between developers and architects. In this blog, we wanted to go over the main sharded and unsharded designs, with their pros and cons.

    We will first look at “Replica Sets.” Replica sets are the most basic form of high availability (HA) in MongoDB, and the building blocks for sharding. From there, we will cover sharding approaches and if you need to go that route.

    Replica Set


    From the MongoDB manual:

    replica set in MongoDB is a group of mongod processes that maintain the same data set. Replica sets provide redundancy and high availability, and are the basis for all production deployments.

    Short of sharding, this is the ideal way to run MongoDB. Things like high availability, failover and recovery become automated with no action typically needed. If you expect large growth or more than 200G of data, you should consider using this plus sharding to reduce your mean time to recovery on a restore from backup.


    Pros:
    • Elections happen automatically and unnoticed by the application when it is set up with retries
    • Rebuilding a new node, or adding an additional read-only node, is as easy as  “rs.add(‘hostname’)”
    • Can skip building indexes to improve write speed
    • Can have members
      • hidden in other geographic location
      • delayed replication
      • analytics nodes via taggings


    Cons:
    • Depending on the size of the oplog used, you can use 10-100+% more space to hold the change data for replication
    • You must scale up, not out, meaning more expensive hardware
    • Recovery using a sharded approach is faster than having it all on a single node (parallelism)

    Flat Mongos (not load balanced)


    This is one of MongoDB’s more suggested deployment designs. To understand why, we should talk about the driver and the fact that it supports a CSV list of mongos hosts for fail-over.

    You can’t distribute writes in a single replica set. Instead, they all need to go to the primary node. You can distribute reads to the secondaries using Read Preferences. The driver keeps track of what is a primary and what is a secondary and routes queries appropriately.

    Conceptually, the driver should have connections bucketed by the mongos they go to. This allowed the 3.0+ driver to be semi-stateless and ask any connection to a specific mongos to perform a getMore against that mongos. In theory, this allows slightly more concurrency. Realistically, you only use one mongos, since this is only a fail-over system.


    Pros:
    • Mongos is on its own gear, so it will not run the application out of memory
    • If Mongos doesn’t respond, the driver “fails-over” to the next in the list
    • Can be put closer to the database or application depending on your network and sorting needs


    Cons:
    • You can't use the mongos in the list evenly, so it is only good for fail-over (not evenness) in most drivers. Check your specific driver's documentation for support, and test thoroughly.

    Load Balanced (preferred if possible)


    According to the Mongo docs:

    You may also deploy a group of mongos instances and use a proxy/load balancer between the application and the mongos. In these deployments, you must configure the load balancer for client affinity so that every connection from a single client reaches the same mongos.

    This is the model used by platforms such as ObjectRocket. In this pattern, you move mongos nodes to their own tier but then put them behind a load-balancer. In this design, you can even out the use of mongos by using a least-connection system. The challenge, however, is new drivers have issues with getMores. By this we mean the getMore selects a new random connection, and the load balancer can’t be sure which mongos should get it. Thus it has a one in N (number of mongos) chance of selecting the right one, or getting a “Cursor Not Found” error.


    Pros:
    • Ability to have an even use of mongos
    • Mongos are separated from each other and the applications to prevent memory and CPU contention
    • You can easily remove or add mongos to help scale the layer without code changes
    • High availability at every level (multiple mongos, multiple configs, ReplSet for high availability and even multiple applications for app failures)


    Cons:
    • If batching is used, you can get "Cursor Not Found" errors when the wrong mongos receives the getMore or bulk connector connections, unless you switch to an IP pinning algorithm (which loses evenness)

    App-Centric Mongos


    By and large, this is one of the most typical deployment designs for MongoDB sharding. In it, we have each application host talking to a mongos on the local network interface. This ensures there is very little latency to the application from the mongos.

    Additionally, this means if a mongos fails, at most its own host is affected instead of the wider range of all application hosts.


    Pros:
    • Local mongos on the loopback interface mean low to no latency
    • Limited scope of outage if this mongos fails
    • Can be geographically farther from the data storage in cases where you have a DR site


    Cons:
    • Mongos is a memory hog; you could steal from your application memory to support running it here
      • Made worse with large batches, many connections, and sorting
    • Mongos is single-threaded and could become a bottleneck for your application
    • It is possible for a slow network to cause bad decision making, including duplicate databases on different shards. The functional result is data writing intermittently to two locations, and a DBA must remediate that at some point (think MMM VIP ping pong issues)
    • All sorting and limits are applied on the application host. In cases where the sort uses an index this is OK, but if it's not indexed, the entire result set must be held in memory by mongos, sorted, and then the limited number of results returned to the client. This is the typical cause of mongos OOM errors, due to the memory issues listed before.


    The topologies above cover many of the deployment needs for MongoDB environments. I hope this helps; please list any questions in the comments below.

    by David Murphy at November 14, 2017 10:34 AM

    November 13, 2017

    Peter Zaitsev

    Percona Live Open Source Database Conference 2018 Call for Papers Is Now Open!


    Announcing the opening of the call for papers for the Percona Live Open Source Database Conference 2018 in Santa Clara, CA. It will be open from now until December 22, 2017.

    Our theme is "Championing Open Source Databases," with topics of MySQL, MongoDB and other open source databases, including PostgreSQL, time series databases and RocksDB. Session tracks include Developers, Operations and Business/Case Studies.

    We’re looking forward to your submissions! We want proposals that cover the many aspects and current trends of using open source databases, including design practices, application development, performance optimization, HA and clustering, cloud, containers and new technologies, as well as new and interesting ways to monitor and manage database environments.

    Describe the technical and business values of moving to or using open source databases. How did you convince your company to make the move? Was there tangible ROI? Share your case studies, best practices and technical knowledge with an engaged audience of open source peers.

    Possible topics include:

    • Application development. How are you building applications using open source databases to power the data layers? What languages, frameworks and data models help you to build applications that your customers love? Are you using MySQL, MongoDB, PostgreSQL, time series or other databases?  
    • Database performance. What database issues have you encountered while meeting new application and new workload demands? How did they affect the user experience? How did you address them? Are you using WiredTiger or a new storage engine like RocksDB? Have you moved to an in-memory engine? Let us know about the solutions you have found to make sure your applications can get data to users and customers.
    • DBaaS and PaaS. Are you using a Database as a Service (DBaaS) in the public cloud, or have you rolled out your own? Are you on AWS, Google Cloud, Microsoft Azure or RackSpace/ObjectRocket? Are you using a database in a Platform as a Service (PaaS) environment? Tell us how it’s going.
    • High availability. Are your applications a crucial part of your business model? Do they need to be available at all times, no matter what? What database challenges have you come across that impacted uptime, and how did you create a high availability environment to address them?
    • Scalability. Has scaling your business affected database performance, user experience or the bottom line? How are you addressing the database environment workload as your business scales? Let us know what technologies you used to solve issues.
    • Distributed databases. Are you moving toward a distributed model? Why? What is your plan for replication and sharding?
    • Observability and monitoring. How do we design open source database deployment with observability in mind? Are you using Elasticsearch or some other analysis tool? What tools are you using to monitor data? Grafana? Prometheus? Percona Monitoring and Management? How do you visualize application performance trends for maximum impact?
    • Container solutions. Do you use Docker, Kubernetes or other containers in your database environment? What are the best practices for using open source databases with containers and orchestration? Has it worked out for you? Did you run into challenges and how did you solve them?
    • Security. What security and compliance challenges are you facing and how are you solving them?
    • Migrating to open source databases. Did you recently migrate applications from proprietary to open source databases? How did it work out? What challenges did you face, and what obstacles did you overcome? What were the rewards?
    • What the future holds. What do you see as the “next big thing”? What new and exciting features just released? What’s in your next release? What new technologies will affect the database landscape? AI? Machine learning? Blockchain databases? Let us know what you see coming.

    The Percona Live Open Source Database Conference 2018 Call for Papers is open until December 22, 2017. We invite you to submit your speaking proposal for breakout, tutorial or lightning talk sessions. Share your open source database experiences with peers and professionals in the open source community by presenting a:

    • Breakout Session. Broadly cover a technology area using specific examples. Sessions should be either 25 minutes or 50 minutes in length (including Q&A).
    • Tutorial Session. Present a technical session that aims for a level between a training class and a conference breakout session. Encourage attendees to bring and use laptops for working on detailed and hands-on presentations. Tutorials will be three or six hours in length (including Q&A).
    • Lightning Talk. Give a five-minute presentation focusing on one key point that interests the open source community: technical, lighthearted or entertaining talks on new ideas, a successful project, a cautionary story, a quick tip or demonstration.

    Speaking at Percona Live is a great way to build your personal and company brands. If selected, you will receive a complimentary full conference pass!

    Submit your talks now.

    Tips for Submitting to Percona Live

    Include presentation details, but be concise. Clearly state:

    • Purpose of the talk (problem, solution, action format, etc.)
    • Covered technologies
    • Target audience
    • Audience takeaway

    Keep proposals free of sales pitches. The Committee is looking for case studies and in-depth technical talks, not ones that sound like a commercial.

    Be original! Make your presentation stand out by submitting a proposal that focuses on real-world scenarios, relevant examples, and knowledge transfer.

    Submit your proposals as soon as you can – the call for papers is open until December 22, 2017.

    by Colin Charles at November 13, 2017 06:09 PM

    November 11, 2017

    Valeriy Kravchuk

    Fun with Bugs #57 - On MySQL Bug Reports I am Subscribed to, Part I

    I've decided to stop reviewing MySQL Release Notes in this series, but that does not mean I am not interested in MySQL bugs any more. At the moment I am subscribed to 91 active MySQL bugs reported by other MySQL users, and in this blog post I am going to present 15 of them, the most recently reported ones. I'd really like to see them fixed, or at least properly processed, as soon as possible.

    In some cases I am going to add my speculations on how the bug had better be handled, or maybe highlight some important details about it. It is not my job any more to process/"verify" community bug reports for any kind of MySQL, but I did that for many years, and I've spent more than 5 years "on the other side" as a member of the Community, so in some cases I let myself share a strong opinion on what may be done differently on the Oracle side.

    As a side note, I started to subscribe to MySQL bugs mostly after I left Oracle, as before that I got email notification about each and every change in every MySQL bug report ever created...

    Here is the list, starting from the most recent ones:
    • Bug #88422 - "MySQL 5.7 innodb purge thread get oldest readview could block other transaction". It is one of those bug reports without a test case from the reporter. It is tempting to set it to "Verified" just "based on code review", as the 5.7 code quite obviously shows both the holding of trx_sys->mutex and the linear complexity of the search, depending on the number of read views, in the worst case (when most of them are closed):
      /**
      Get the oldest (active) view in the system.
      @return oldest view if found or NULL */

      ReadView*
      MVCC::get_oldest_view() const
      {
              ReadView*       view;

              ut_ad(mutex_own(&trx_sys->mutex));

              for (view = UT_LIST_GET_LAST(m_views);
                   view != NULL;
                   view = UT_LIST_GET_PREV(m_view_list, view)) {

                      if (!view->is_closed()) {
                              break;
                      }
              }

              return(view);
      }
      But probably current Oracle bug verification rules do not let him just mark it as verified. After all, somebody will have to create a test case... So, my dear old friend Sinisa Milivojevic decided to try to force the bug reporter to provide a test case instead of spending some time trying to create one himself. I am not going to blame him for that, why not try the easy way :) But I consider this statement of his in the comment dated [10 Nov 16:21]:
      "... 5.7 methods holds no mutex what so ever..."
      a bit wrong, as we can see the mutex is acquired before the get_oldest_view() method is called:
      void
      MVCC::clone_oldest_view(ReadView* view)
      {
              mutex_enter(&trx_sys->mutex);

              ReadView*       oldest_view = get_oldest_view();

              if (oldest_view == NULL) {
    • Bug #88381 - "Predicate cannot be pushed down "past" window function". Here the bug reporter had provided enough hints for a test case. One can probably just check 'Handler%' status variables before and after query execution to come to the conclusion. Moreover, it seems an Oracle developer, Dag Wanvik, accepted this as a known limitation, but the bug still remains "Open" and nobody knows if it was copied to the internal bugs database, got prioritized, or if any work on this is planned any time soon. We shall see. You may also want to monitor MDEV-10855.
    • Bug #88373 - "Renaming a column breaks replication from 5.7 to 8.0 because of impl. collation". This bug was quickly verified by Umesh Shastry. I expect a lot of "fun" for users upgrading to MySQL 8.0 when it becomes GA, especially in replication setups.
    • Bug #88328 - "Performance degradation with the slave_parallel_workers increase". There is no test case, just some general description and ideas about the root cause when semi-sync replication is used. I expect this bug to stay "Open" for a long time, as it is a topic for good research and blog posts like this one, that is, work for a real expert!
    • Bug #88223 - "Replication with no tmpdir space and InnoDB as tmp_storage_engine can break". Here we have a clear and simple test case from Sveta Smirnova (no wonder, as she also worked in the bugs verification team at MySQL, Sun and Oracle). I hope Umesh will verify it soon. As a side note, it is explained (in the comments) elsewhere that InnoDB as internal_tmp_disk_storage_engine may not be the best possible option. We do not have this variable and do not plan to support InnoDB for internal temporary tables in MariaDB 10.2+.
    • Bug #88220 - "compressing and uncompressing InnoDB tables seems to be inconsistent". See also other, older bug reports mentioned there that are duplicates/closely related, but were not getting proper attention.
    • Bug #88150 - "'Undo log record is too big.' error occurring in very narrow range of str length". It was reported by my colleague Geoff Montee and is already fixed in recent versions of MariaDB (see MDEV-14051 for the details and some nice examples of gdb usage by a developer)!
    • Bug #88127 - "Index not used for 'order by' query with utf8mb4 character set". Here I am just curious when bugs like that would be caught up by Oracle QA before any public releases.
    • Bug #88071 - "An arresting Performance degradation when set sort_buffer_size=32M". Here the test case is clear: just run the sysbench oltp test at high concurrency with different values of sort_buffer_size. Still, Sinisa Milivojevic decided to explain when the RAM limit may play a role instead of just showing how great it works (if it does) on any server with enough RAM... Let's see how this attempt to make the bug reporter work/explain more may end up...
    • Bug #87947 - "Optimizer chooses ref over range when access when range access is faster". Nice example of a case when optimizer trace may be really useful. Øystein Grøvlen kindly explained that "range access and ref access are not comparable costs". I wish we get better cost model for such cases in MySQL one day.
    • Bug #87837 - "MySQL 8 does not start after upgrade to 8.03". It is actually expected, and even somewhat documented in the release notes, that MySQL 8.0.3 is not compatible with any older version. So, it is more of a MySQL Installer bug (a component I do not care much about), but I still subscribed to it as yet another source of potential fun during further upgrade attempts.
    • Bug #87716 - "SELECT FOR UPDATE with BETWEEN AND gets row lock excessively". I think I already studied once why rows are locked differently by InnoDB with IN() compared to a BETWEEN that selects the same rows. But I'd like to know what Oracle's take on this is, and I'd like to study this specific test case in detail one day as well.
    • Bug #87670 - "Force index for group by is not always honored". Clear and simple test case, so no wonder it was immediately verified.
    • Bug #87621 - "Huge InnoDB slowdown when selecting strings without indexes". I'd like to check with perf one day where the time is mostly spent during this test. For now I think this is a result of the way "long" data is stored on separate pages in InnoDB. What's also interesting here is the test case, where R is used to generate the data set.
    • Bug #87589 - "Documentation incorrectly states that LOAD DATA LOCAL INFILE does not use tmpdir". This was yet another report from my colleague Geoff Montee. lsof is your friend; maybe I should talk about it one day at FOSDEM (the call for papers is still open :) I like to find and follow bugs and missing details in the MySQL manual, maybe because I would never be able to contribute to it as a writer directly...

    So, this list shows my typical recent interests related to MySQL bugs: mostly InnoDB, optimizer and replication problems, the fine manual, and just some fun details like the way some Oracle engineers try to avoid working extra hard while processing bugs... I am also happy to know that in some cases MariaDB is able to deliver fixes faster.

    by Valeriy Kravchuk at November 11, 2017 03:54 PM

    MariaDB AB

    Blog entry title

    Blog entry title mariadb.drupal Sat, 11/11/2017 - 05:39

    MariaDB Corporation is the testing process of combinations of testimonials process.


    by mariadb.drupal at November 11, 2017 10:39 AM

    Oli Sennhauser

    MariaDB master/master GTID based replication with keepalived VIP

    Some of our customers still want to have old-style MariaDB master/master replication clusters. Time goes by, new technologies appear but some old stuff still remains.

    The main problem in a master/master replication set-up is making the service highly available for the application (applications typically cannot deal with more than one point of contact). This can be achieved with a load balancer (HAProxy, Galera Load Balancer (GLB), ProxySQL or MaxScale) in front of the MariaDB master/master replication cluster. But the load balancer itself should also be highly available, which is typically achieved with a virtual IP (VIP) in front of one of the load balancers. To make operating the VIP easier, the VIP is controlled by a service like keepalived or corosync.

    M/M with LB and keepalived

    Because I like simple solutions (I am a strong believer in the KISS principle), I thought about avoiding the load balancer in the middle, attaching the VIP directly to the master/master replication servers, and letting it be controlled by keepalived as well.

    M/M with keepalived

    Important: A master/master replication set-up is vulnerable to split-brain situations. Neither keepalived nor master/master replication helps you to avoid conflicts or prevents this situation in any way. If you are sensitive to split-brain situations you should look at Galera Cluster. Keepalived is made for stateless services like load balancers, not for databases.

    Set-up a MariaDB master/master replication cluster

    Because most Linux distributions deliver somewhat old versions of the software, we use the MariaDB 10.2 repository from the MariaDB website:

    # /etc/yum.repos.d/MariaDB-10.2.repo
    # MariaDB 10.2 CentOS repository list - created 2017-11-08 20:32 UTC
    [mariadb]
    name = MariaDB
    baseurl =

    Then we install the MariaDB server and start it:

    shell> yum makecache
    shell> yum install MariaDB-server MariaDB-client
    shell> systemctl start mariadb
    shell> systemctl enable mariadb

    For the MariaDB master/master replication set-up configuration we use the following parameters:

    # /etc/my.cnf
    server_id                = 1           # 2 on the other node
    log_bin                  = binlog-m1   # binlog-m2 on the other node
    log_slave_updates        = 1
    gtid_domain_id           = 1           # 2 on the other node
    gtid_strict_mode         = On
    auto_increment_increment = 2
    auto_increment_offset    = 1           # 2 on the other node
    read_only                = On          # super_read_only for MySQL 5.7 and newer
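    The auto_increment_increment / auto_increment_offset pair is what keeps the two masters from ever generating the same AUTO_INCREMENT value: node 1 hands out 1, 3, 5, ... while node 2 hands out 2, 4, 6, ... A quick sketch of the resulting sequences (plain shell arithmetic, no database needed):

```shell
# AUTO_INCREMENT value = auto_increment_offset + n * auto_increment_increment
increment=2
for n in 0 1 2 3; do
  node1=$((1 + n * increment))   # offset 1 on node 1
  node2=$((2 + n * increment))   # offset 2 on node 2
  echo "n=$n node1=$node1 node2=$node2"
done
```

    With increment=2 and distinct offsets, the two ID ranges are disjoint, so concurrent inserts on both masters never collide on a generated key.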

    Then we close the master/master replication ring according to: Starting with empty server.

    mariadb> SET GLOBAL gtid_slave_pos = "";
    mariadb> CHANGE MASTER TO master_host="", master_user="replication"
                            , master_use_gtid=current_pos;
    mariadb> START SLAVE;

    Installing keepalived


    The next step is to install and configure keepalived. This can be done as follows:

    shell> yum install keepalived
    shell> systemctl enable keepalived

    Important: In my tests I got crashes and core dumps with keepalived which disappeared after a full upgrade of CentOS 7.

    Configuring keepalived

    The most important part is the keepalived configuration file:

    # /etc/keepalived/keepalived.conf
    global_defs {
      notification_email {
      }
      notification_email_from root@master1   # master2 on the other node
      smtp_server localhost 25
      router_id MARIADB_MM
    }

    # Health checks
    vrrp_script chk_mysql {
      script "/usr/sbin/pidof mysqld"
      weight 2     # Is relevant for the diff in priority
      interval 1   # every ... seconds
      timeout 3    # script considered failed after ... seconds
      fall 3       # number of failures for K.O.
      rise 1       # number of success for OK
    }

    vrrp_script chk_failover {
      script "/etc/keepalived/"
      weight -4    # Is relevant for the diff in priority
      interval 1   # every ... seconds
      timeout 1    # script considered failed after ... seconds
      fall 1       # number of failures for K.O.
      rise 1       # number of success for OK
    }

    # Main configuration
    vrrp_instance VI_MM_VIP {
      state MASTER           # BACKUP on the other side
      interface enp0s9       # private heartbeat interface
      priority 100           # Higher means: elected first (BACKUP: 99)
      virtual_router_id 42   # ID for all nodes of Cluster group
      debug 0                # 0 .. 4, seems not to work?
      unicast_src_ip   # Our private IP address
      unicast_peer {       # Peers private IP address
      }

      # For keepalived communication
      authentication {
        auth_type PASS
        auth_pass Secr3t!
      }

      # VIP to move around
      virtual_ipaddress {
        dev enp0s8   # public interface for VIP
      }

      # Check health of local system. See vrrp_script above.
      track_script {
        chk_mysql
        # If File /etc/keepalived/failover is touched failover is triggered
        # Similar can be reached when priority is lowered followed by a reload
        chk_failover
      }

      # When node becomes MASTER this script is triggered
      notify_master "/etc/keepalived/ --user=root --password= --wait=yes --variable=read_only"
      # When node becomes SLAVE this script is triggered
      notify_backup "/etc/keepalived/ --user=root --password= --kill=yes --variable=read_only"
      # Possibly fault and stop should also call to be on the safe side...
      notify_fault "/etc/keepalived/ arg1 arg2"
      notify_stop "/etc/keepalived/ arg1 arg2"

      # ANY state transit is triggered
      notify /etc/keepalived/
      smtp_alert   # send notification during state transit
    }

    With the command:

    shell> systemctl restart keepalived

    the service is started and/or the configuration is reloaded.
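    The vrrp_script weights above interact with the base priority to decide which node holds the VIP: keepalived adds a positive weight while the check succeeds and applies a negative weight while the check fails. A minimal sketch of that arithmetic (semantics as I understand them from the keepalived documentation; the variable names are mine):

```shell
# Effective VRRP priority on the MASTER node (base 100, BACKUP uses 99):
# chk_mysql: weight 2, added while mysqld is running
# chk_failover: weight -4, applied while /etc/keepalived/failover exists
base=100
mysql_up=1          # 1 = pidof mysqld succeeds
failover_touched=0  # 1 = failover file present

prio=$base
if [ "$mysql_up" -eq 1 ] ; then prio=$((prio + 2)) ; fi
if [ "$failover_touched" -eq 1 ] ; then prio=$((prio - 4)) ; fi
echo "effective priority: $prio"
```

    Touching the failover file on the MASTER drops its effective priority to 98, below the healthy BACKUP's 99 + 2 = 101, which is exactly what the intentional fail-over test below relies on.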

    The scripts we used in the configuration file are the following:

    # /etc/keepalived/
    #!/bin/bash

    LOG="/var/log/keepalived.log"   # log target assumed; original value not shown

    TS=$(date '+%Y-%m-%d_%H:%M:%S')
    echo $TS $0 $@ >>${LOG}

    # /etc/keepalived/
    #!/bin/bash

    /usr/bin/stat /etc/keepalived/failover 2>/dev/null 1>&2
    if [ ${?} -eq 0 ] ; then
      exit 1
    else
      exit 0
    fi

    To make MariaDB master/master replication more robust against replication problems we took the following (configurable) actions on the database side:

    Getting the MASTER role:

    • Waiting for catch-up replication
    • Make the MariaDB instance read/write

    Getting the BACKUP role:

    • Make the MariaDB instance read-only
    • Kill all open connections

    Testing scenarios

    The following scenarios were tested under load:

    • Intentional fail-over for maintenance:
      shell> touch /etc/keepalived/failover
      shell> rm -f /etc/keepalived/failover
    • Stopping keepalived:
      shell> systemctl stop keepalived
      shell> systemctl start keepalived
    • Stopping MariaDB node:
      shell> systemctl stop mariadb
      shell> systemctl start mariadb
    • Reboot server:
      shell> reboot
    • Simulation of split-brain:
      shell> ip link set enp0s9 down
      shell> ip link set enp0s9 up


    Problems we faced during set-up and testing were:

    • SElinux/AppArmor
    • Firewall

    Keepalived controlling 2 virtual IPs

    A second scenario we wanted to build is a MariaDB master/master GTID based replication cluster with 2 VIP addresses. This is to achieve either a read-only VIP and a read/write VIP or to have half of the load on one master and half of the load on the other master:

    M/M with keepalived and 2 VIPs

    For this scenario we used the same scripts but a slightly different keepalived configuration:

    # /etc/keepalived/keepalived.conf
    global_defs {
      notification_email {
      }
      notification_email_from root@master1   # master2 on the other node
      smtp_server localhost 25
      router_id MARIADB_MM
    }

    # Health checks
    vrrp_script chk_mysql {
      script "/usr/sbin/pidof mysqld"
      weight 2     # Is relevant for the diff in priority
      interval 1   # every ... seconds
      timeout 3    # script considered failed after ... seconds
      fall 3       # number of failures for K.O.
      rise 1       # number of success for OK
    }

    vrrp_script chk_failover {
      script "/etc/keepalived/"
      weight -4    # Is relevant for the diff in priority
      interval 1   # every ... seconds
      timeout 1    # script considered failed after ... seconds
      fall 1       # number of failures for K.O.
      rise 1       # number of success for OK
    }

    # Main configuration
    vrrp_instance VI_MM_VIP1 {
      state MASTER           # BACKUP on the other side
      interface enp0s9       # private heartbeat interface
      priority 100           # Higher means: elected first (BACKUP: 99)
      virtual_router_id 42   # ID for all nodes of Cluster group
      unicast_src_ip   # Our private IP address
      unicast_peer {       # Peers private IP address
      }

      # For keepalived communication
      authentication {
        auth_type PASS
        auth_pass Secr3t!
      }

      # VIP to move around
      virtual_ipaddress {
        dev enp0s8   # public interface for VIP
      }

      # Check health of local system. See vrrp_script above.
      track_script {
        chk_mysql
        chk_failover
      }

      # ANY state transit is triggered
      notify /etc/keepalived/
      smtp_alert   # send notification during state transit
    }

    vrrp_instance VI_MM_VIP2 {
      state BACKUP           # MASTER on the other side
      interface enp0s9       # private heartbeat interface
      priority 99            # Higher means: elected first (MASTER: 100)
      virtual_router_id 43   # ID for all nodes of Cluster group
      unicast_src_ip   # Our private IP address
      unicast_peer {       # Peers private IP address
      }

      # For keepalived communication
      authentication {
        auth_type PASS
        auth_pass Secr3t!
      }

      # VIP to move around
      virtual_ipaddress {
        dev enp0s8   # public interface for VIP
      }

      # Check health of local system. See vrrp_script above.
      track_script {
        chk_mysql
        chk_failover
      }

      # ANY state transit is triggered
      notify /etc/keepalived/
      smtp_alert   # send notification during state transit
    }

    by Shinguz at November 11, 2017 10:29 AM

    November 10, 2017

    Peter Zaitsev

    This Week in Data with Colin Charles 14: A Meetup in Korea and The Magic Quadrant

    Colin Charles

    Join Percona Chief Evangelist Colin Charles as he covers happenings, gives pointers and provides musings on the open source database community.

    We’re close to opening up the call for papers for Percona Live Santa Clara 2018 and I expect this to happen next week. We also have a committee all lined up and ready to vote on submissions.

    In other news, I’ve spent some time preparing for the Korean MySQL Power Group meetup to be held in Seoul this Saturday, 11 November 2017. This is a great opportunity for us to extend our reach in Asia. This meetup gathers together top DBAs from Internet companies that use MySQL and related technologies.

    Gartner has released their Magic Quadrant for Operational Database Management Systems 2017. Reprint rights have been given to several vendors, e.g. EnterpriseDB and Microsoft, and I'm sure you can find other links. The Magic Quadrant now features far fewer database vendors; many have been dropped. What's your take on it?


    This was a slow release week. Check out:

    Link List


    I look forward to feedback/tips via e-mail at or on Twitter @bytebot.

    by Colin Charles at November 10, 2017 11:03 AM

    Jean-Jerome Schmidt

    MySQL & MariaDB Database Backup Resources

    Most organizations do not realize they have a problem with database backups until they need to restore the data and find it’s not there or not in the form that they were expecting.

    The designated administrator managing the database environments must be prepared for situations where any failure may cause an impact to the availability, integrity, or usability of a database or application. Reacting to these failures is a key component of the administrator’s responsibilities and their ability to react correctly depends on whether they have a well-planned strategy for database backups and recovery.

    Pixar’s “Toy Story 2” famously almost never happened: a mistakenly run command caused the movie to be deleted, and an ineffective backup strategy was in place. That movie went on to take in nearly $500 million worldwide at the box office… money that may never have been made had one team member not kept their own personal backup.

    ClusterControl provides you with sophisticated backup and failover features using a point-and-click interface to easily restore your data if something goes wrong and can be your DBA-sidekick when it comes to building an effective backup strategy. There are many aspects to consider though when building such a strategy.

    Here at Severalnines we have database experts who have written much about the topic and in this blog we will collect the top resources to help you build your own database backup strategy for MySQL and MariaDB databases more specifically.

    If you are running a MySQL or MariaDB environment our best resource for you is the free whitepaper “The DevOps Guide to Database Backups for MySQL and MariaDB.” The guide covers the two most popular backup utilities available for MySQL and MariaDB, namely mysqldump and Percona XtraBackup. It further covers topics such as how database features like binary logging and replication can be leveraged in backup strategies and provides best practices that can be applied to high availability topologies in order to make database backups reliable, secure and consistent.

    In addition to the whitepaper there are two webinars focused on backups that you can watch on-demand. “MySQL Tutorial - Backup Tips for MySQL, MariaDB & Galera Cluster” and “Become a MySQL DBA - Deciding on a Relevant Backup Solution.” Each of these webinars offer tips and best practices on building a backup plan and summarize much of the content that is available throughout our website.

    Here are our most popular and relevant blogs on the topic...

    Overview of Backup and Restores

    In the blog “Become a MySQL DBA - Backup and Restore” we provide a high-level overview of backups and restores when managing a MySQL environment. Included in the blog is an overview of different backup methodologies, overview of logical and physical backups, and some best practices and guidelines you can follow.

    The Impact of MySQL Storage Engines on Backups

    In the blog “The Choice of MySQL Storage Engine and its Impact on Backup Procedures” we discuss how the selection of different types of storage engines (like MyISAM, InnoDB, etc) can have an impact on your backup strategy.

    Building a Backup Strategy and Plan

    In our blog “mysqldump or Percona XtraBackup? Backup Strategies for MySQL Galera Cluster” we discuss the different options available to you when making your backup and restore plan with special focus on doing it in a way that does not affect performance.

    Making Sure You Perform a Good Backup

    In our blog “How to Perform Efficient Backups for MySQL and MariaDB” we discuss a number of ways to backup MySQL and MariaDB, each of which comes with pros and cons.

    Using ClusterControl for Backups

    In the blog “ClusterControl Tips & Tricks - Best Practices for Database Backups” we show how to effectively manage your backup plan using ClusterControl. With ClusterControl you can schedule logical or physical backups with failover handling and easily restore backups to bootstrap nodes or systems.

    Single Console for Your Entire Database Infrastructure
    Find out what else is new in ClusterControl

    Additional Blogs

    There are several more blogs that have been written over the years that can also aid you in ensuring your backups are performed successfully and efficiently. Here’s a list of them...

    Full Restore of a MySQL or MariaDB Galera Cluster from Backup

    Performing regular backups of your database cluster is imperative for high availability and disaster recovery. This blog post provides a series of best practices on how to fully restore a MySQL or MariaDB Galera Cluster from backup.

    Read the Blog

    What’s New in ClusterControl 1.4 - Backup Management

    This blog post covers the new backup features available in ClusterControl version 1.4.

    Read the Blog

    ClusterControl Tips & Tricks: Customizing your Database Backups

    ClusterControl follows some best practices to perform backups using mysqldump or Percona xtrabackup. Although these work for the majority of database workloads, you might still want to customize your backups. This blog shows you how.

    Read the Blog

    Architecting for Failure - Disaster Recovery of MySQL/MariaDB Galera Cluster

    Whether you use unbreakable private data centers or public cloud platforms, Disaster Recovery (DR) is indeed a key issue. This is not about copying your data to a backup site and being able to restore it, this is about business continuity and how fast you can recover services when disaster strikes.

    Read the Blog

    Using BitTorrent Sync to Transfer Database Backups Offsite

    BitTorrent Sync is a simple replication application providing encrypted bidirectional file transfers that can run behind NAT and is specifically designed to handle large files. By leveraging the simplicity of BitTorrent Sync, we can transfer backup files away from our cluster, enhancing backup availability and reducing the cost of broken backups, since you can regularly verify your backups off-site.

    Read the Blog

    How to Clone Your Database

    If you are managing a production database, chances are high that you’ve had to clone your database to a different server than the production server. The basic method of creating a clone is to restore a database from a recent backup onto a different database server. Other methods include replicating from a source database while it is up, in which case it is important the original database be unaffected by any cloning procedure.

    Read the Blog

    Not Using MySQL? Here are some resources we have to help with other database technologies…

    Become a MongoDB DBA: MongoDB Backups

    This is our fifth post in the “Become a MongoDB DBA” blog series - how do you make a good backup strategy for MongoDB, what tools are available and what you should watch out for.

    Read the Blog

    Become a MongoDB DBA: Recovering your Data

    This is our sixth post in the “Become a MongoDB DBA” blog series - how do you recover MongoDB using a backup.

    Read the Blog

    Become a PostgreSQL DBA - Logical & Physical PostgreSQL Backups

    Taking backups is one of the most important tasks of a DBA - it is crucial to the availability and integrity of the data. Part of our Become a PostgreSQL DBA series, this blog post covers some of the backup methods you can use with PostgreSQL.

    Read the Blog

    by Severalnines at November 10, 2017 10:18 AM

    November 09, 2017

    Peter Zaitsev

    MySQL and Linux Context Switches

    Context Switches

    In this blog post, I’ll look at MySQL and Linux context switches and what is the normal number per second for a database environment.

    You might have heard many times about the importance of looking at the number of context switches as an indicator that MySQL is suffering from internal contention issues. I often get asked what a “normal” or “acceptable” number is, and at what point you should worry about the number of context switches per second.

    First, let’s talk about what context switches are in Linux. This StackOverflow Thread provides a good discussion, with a lot of details, but basically it works like this:  

    The process (or thread, in MySQL’s case) is running its computations. Sooner or later, it has to do some blocking operation: disk IO, network IO, blocking on a mutex, or yielding. The execution switches to another process, and this is called a voluntary context switch. On the other hand, the process/thread may be preempted by the scheduler because it used up its allotted amount of CPU time (and other tasks need to run) or because a higher priority task needs to run. This is called an involuntary context switch. When all the processes in the system are added together and totaled, this is the system-wide number of context switches reported (using, for example, vmstat):

    root@nuc2:~# vmstat 10
    procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
    r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
    17  0      0 12935036 326152 2387388    0    0     0     5     0      1  9  0 91  0  0
    20  0      0 12933936 326152 2387384    0    0     0     3 32228 124791 77 22  1  0  0
    17  0      0 12933348 326152 2387364    0    0     0    11 33212 124575 78 22  1  0  0
    16  0      0 12933380 326152 2387364    0    0     0    78 32470 126100 78 22  1  0  0

    This is a global number. In many cases, however, it is better to look at it as context switches per CPU logical core. This is because cores execute tasks independently. As such, they have mostly independent causes for context switches. If you have a large number of cores, there can be quite a difference:

    MySQL Context Switches

    The number of context switches per second on this system looks high (at more than 1,000,000). Considering it has 56 logical cores, however, it is only about 30,000 per second per logical core (which is not too bad).
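    The system-wide counters do not tell you which process is responsible. Linux also exposes per-process (and per-thread) counters in /proc, which makes it easy to check whether a given mysqld process is switching voluntarily (waiting on IO or mutexes) or being preempted:

```shell
# Voluntary vs. involuntary context switches for one process;
# replace "self" with a mysqld PID, or look under /proc/<pid>/task/<tid>/
# for individual threads.
grep ctxt_switches /proc/self/status
```

    The two lines printed, voluntary_ctxt_switches and nonvoluntary_ctxt_switches, map directly to the voluntary and involuntary switches described above.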

    So how do we judge if the number of context switches is too high in your system? One answer is that it is too high if you’re wasting too much CPU on context switches. This brings up the question: how many context switches can the system handle if it is only doing context switches?

    It is easy to find this out!  

    Sysbench has a “threads” test designed specifically to measure this. For example:

    sysbench --thread-locks=128 --time=7200 --threads=1024 threads run

    Check the vmstat output or the Context Switches PMM graph:

    MySQL Context Switches 1

    We can see this system can handle up to 35 million context switches per second in total (or some 500K per logical CPU core on average).

    I don’t recommend using more than 10% of CPU resources on context switching, so I would try to keep the number of the context switches at no more than 50K per logical CPU core.

    Now let’s think about context switches from the other side: how many context switches do we expect to have at the very minimum for given load? Even if all the stars align and your query to MySQL doesn’t need any disk IO or context switches due to waiting for mutexes, you should expect at least two context switches: one to the client thread which processes the query and one for the query response sent to the client.    

    Using this logic, if we have 100,000 queries/sec we should expect 200,000 context switches at the very minimum.

    In the real world, though, I would not worry about contention being a big issue if you have less than ten context switches per query.

    It is worth noting that in MySQL not every contention results in a context switch. InnoDB implements its own mutexes and RW-locks, which often “spin” while waiting for a resource to become available. This wastes CPU time directly rather than causing a context switch.

    In summary:


    • Look at the number of context switches per logical core rather than the total for easier-to-compare numbers
    • Find out how many context switches your system can handle per second, and don’t get too concerned if your context switches are no more than 10% of that number
    • Think about the number of context switches per query: the minimum possible is two, and values less than 10 make contention an unlikely issue
    • Not every MySQL contention results in a high number of context switches

    by Peter Zaitsev at November 09, 2017 07:50 PM

    Jean-Jerome Schmidt

    HAProxy: All the Severalnines Resources

    Load balancers are an essential component in MySQL and MariaDB database high availability; especially when making topology changes transparent to applications and implementing read-write split functionality.

    HAProxy is free, open source software that provides a high availability load balancer and proxy server for TCP and HTTP-based applications that spreads requests across multiple servers.

    ClusterControl provides support for deployment, configuration and optimization of HAProxy as well as for other popular load balancing and caching technologies for MySQL and MariaDB databases.

    Here are our top resources for HAProxy to get you started with this widely used technology.


    MySQL Load Balancing with HAProxy - Tutorial

    We have recently updated our tutorial on MySQL Load Balancing with HAProxy. Read about deployment and configuration, monitoring, ongoing maintenance, health check methods, read-write splitting, redundancy with VIP and Keepalived and more.

    Read More

    On-Demand Webinars

    How to deploy and manage HAProxy, MaxScale or ProxySQL with ClusterControl

    In this webinar we talk about support for proxies for MySQL HA setups in ClusterControl: how they differ and what their pros and cons are. And we show you how you can easily deploy and manage HAProxy, MaxScale and ProxySQL from ClusterControl during a live demo.

    Watch the replay

    How To Set Up SQL Load Balancing with HAProxy

    In this webinar, we cover the concepts around the popular open-source HAProxy load balancer, and shows you how to use it with your SQL-based database clusters.

    Watch the replay

    Performance Tuning of HAProxy for Database Load Balancing

    This webinar discusses the performance tuning basics for HAProxy and explains how to take advantage of some of the new features in 1.5, which was released in June 2014 after 4 years of development work.

    Watch the replay

    Introducing the Severalnines MySQL© Replication Blueprint

    The Severalnines Blueprint for MySQL Replication includes all aspects of a MySQL Replication topology with the ins and outs of deployment, setting up replication, monitoring, upgrades, performing backups and managing high availability using proxies as ProxySQL, MaxScale and HAProxy. This webinar provides an in-depth walk-through of this blueprint and explains how to make best use of it.

    Watch the replay

    Top Blogs

    HAProxy Connections vs MySQL Connections - What You Should Know

    Max connections determines the maximum number of connections to the database server. This can be set on the database server, on the proxy in front of it, or on both. In this blog post, we’ll dive into HAProxy and MySQL maximum connections variables, and see how to get the best of both worlds.

    Read More

    SQL Load Balancing Benchmark - Comparing Performance of MaxScale vs HAProxy

    In a previous post, we gave you a quick overview of the MaxScale load balancer and walked through installation and configuration. We did some quick benchmarks using sysbench, a system performance benchmark that supports testing CPU, memory, IO, mutex and also MySQL performance. We will be sharing the results in this blog post.

    Read More

    Load balanced MySQL Galera setup - Manual Deployment vs ClusterControl

    Deploying a MySQL Galera Cluster with redundant load balancing takes a bit of time. This blog looks at how long it would take to do it manually vs using ClusterControl to perform the task.

    Read More

    Read-Write Splitting for Java Apps using Connector/J, MySQL Replication and HAProxy

    In this blog post, we will play around with Java and MySQL Replication to perform read-write splitting for Java Apps using Connector/J.

    Read More


    High availability read-write splitting with php-mysqlnd, MySQL Replication and HAProxy

    In this blog post, we explore the use of php-mysqlnd_ms with a PHP application (Wordpress) on a standard MySQL Replication backend.

    Read More

    Become a ClusterControl DBA: Making your DB components HA via Load Balancers

    There are various ways to retain high availability with databases. You can use Virtual IPs (VRRP) to manage host availability, you can use resource managers like Zookeeper and Etcd to (re)configure your applications or use load balancers/proxies to distribute the workload over all available hosts.

    Read More

    Wordpress Application Clustering using Kubernetes with HAProxy and Keepalived

    In this blog post, we’re going to play with Kubernetes application clustering and pods. We’ll use Wordpress as the application, with a single MySQL server. We will also have HAProxy and Keepalived to provide simple packet forwarding (for external network) with high availability capability.

    Read More

    How Galera Cluster Enables High Availability for High Traffic Websites

    This post gives an insight into how Galera can help to build HA websites.

    Read More

    by Severalnines at November 09, 2017 10:58 AM

    November 08, 2017

    MariaDB AB

    MariaDB Server 10.2 Now Available on Qualcomm Centriq™ 2400 Server Processor

    MariaDB Server 10.2 Now Available on Qualcomm Centriq™ 2400 Server Processor david_thompson_g Wed, 11/08/2017 - 00:35

    MariaDB Corporation is pleased to announce support for the 64-bit ARM Qualcomm Centriq™ 2400 server processor. The Centriq 2400 brings Qualcomm's ARM expertise to the data center server world, offering a high core count of 48 per physical CPU chip.

    MariaDB Server 10.2 for CentOS 7 and Ubuntu 16 is available now.

    MariaDB's architectural support for thread-per-connection helps MariaDB scale extremely well on the Qualcomm Centriq 2400 server processor, providing a nearly consistent increase in throughput up to the core count.

    MariaDB utilizes a custom fork of the widely available sysbench benchmarking utility for performance testing and evaluation. For the benchmarking test we use a data set with 1.2 million rows, either in a single table or distributed across 24 tables. Each benchmark thread executes transactions of 1000 SELECT statements, each fetching a single row based on the PRIMARY KEY. Database caches are tuned to be able to hold the data set in memory, so the benchmark does not depend on disk speed but tests mostly CPU, memory bandwidth and software scalability (the amount of serialized code in the path). My colleague Axel Schwenke ran the benchmark and found some very interesting results. Here we show the results at a system level; the blue bars indicate the test run where data is spread across 24 tables, and the red bars the single-table run:

    What you see is a near doubling in throughput as more query threads run all the way up to the core count (46 in our pre-production server) and then relatively flat throughput past that as threads increase.

    To show the near perfect scaling effect, we introduce a throughput-per-active-core metric. This is defined as system throughput divided by min(thread-count, core-count). When there are fewer threads than cores, some cores are idle and don’t contribute to the system throughput. But if there are more threads than cores, then some threads must be serialized by the system scheduler and we won’t get any more throughput because all cores are busy anyway. Here are the results:
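    The metric just described is easy to sketch in a few lines of Python (a hypothetical helper, not part of the benchmark suite); the throughput numbers below are illustrative, not measured results:

```python
def throughput_per_active_core(throughput, thread_count, core_count):
    """System throughput divided by the number of cores actually busy.

    With fewer threads than cores, only thread_count cores do work;
    with more threads than cores, all cores are busy and the extra
    threads are serialized by the scheduler.
    """
    return throughput / min(thread_count, core_count)

# Illustrative values for a 48-core system (not Axel's measurements):
print(throughput_per_active_core(1200, 24, 48))  # 24 busy cores -> 50.0
print(throughput_per_active_core(2400, 96, 48))  # capped at 48 cores -> 50.0
```

    Perfect scaling shows up as a flat line on this metric: doubling the threads up to the core count doubles the throughput, so throughput per active core stays constant.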

    We are very excited to see a CPU that scales out so well, combined with very low power and heat consumption, making it very well suited for dense deployment in the data center. We look forward to continuing to optimize MariaDB Server for the Qualcomm Centriq ARM64 architecture.


    by david_thompson_g at November 08, 2017 05:35 AM

    November 07, 2017

    Jean-Jerome Schmidt

    HAProxy Connections vs MySQL Connections - What You Should Know

    Having a load balancer or reverse proxy in front of your MySQL or MariaDB server does add a little bit of complexity to your database setup, which might lead to some things behaving differently. Theoretically, a load balancer which sits in front of MySQL servers (for example, an HAProxy in front of a Galera Cluster) should just act like a connection manager and distribute the connections to the backend servers according to some balancing algorithm. MySQL, on the other hand, has its own way of managing client connections. Ideally, we would need to configure these two components together so as to avoid unexpected behaviours, and narrow down the troubleshooting surface when debugging issues.

    If you have such a setup, it is important to understand these components as they can impact the overall performance of your database service. In this blog post, we will dive into MySQL's max_connections and HAProxy's maxconn options. Note that timeouts are another important parameter, but we are going to cover them in a separate post.

    MySQL's Max Connections

    The number of connections permitted to a MySQL server is controlled by the max_connections system variable. The default value is 151 (MySQL 5.7).

    To determine a good number for max_connections, the basic formulas are:


    **Variable innodb_additional_mem_pool_size is removed in MySQL 5.7.4+. If you are running in the older version, take this variable into account.
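    The formulas (shown as images in the original post) boil down to subtracting the global buffers from the available RAM and dividing by the per-thread buffers, which is exactly what the SQL statement later in this post computes. A rough sketch in Python, with hypothetical sizes in MB:

```python
def possible_max_connections(available_ram_mb, global_buffers_mb,
                             per_thread_buffers_mb):
    """Rough estimate of how many MySQL connections fit in available RAM.

    global_buffers  ~ innodb_buffer_pool_size + innodb_log_buffer_size
                      + query_cache_size + tmp_table_size + key_buffer_size
    per_thread      ~ read_buffer_size + read_rnd_buffer_size
                      + sort_buffer_size + thread_stack + join_buffer_size
                      + binlog_cache_size
    """
    return round((available_ram_mb - global_buffers_mb) / per_thread_buffers_mb)

# Hypothetical sizes in MB; real values come from SHOW VARIABLES:
print(possible_max_connections(1928, 1400, 2))  # -> 264
```

    This is only a sizing heuristic: it assumes every connection allocates its full set of per-thread buffers, which in practice only happens under worst-case workloads.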


    By using the above formulas, we can calculate a suitable max_connections value for this particular MySQL server. To start the process, stop all connections from clients and restart the MySQL server. Ensure you only have the minimum number of processes running at that particular moment. You can use 'mysqladmin' or 'SHOW PROCESSLIST' for this purpose:

    $ mysqladmin -uroot -p processlist
    +--------+------+-----------+------+---------+------+-------+------------------+----------+
    | Id     | User | Host      | db   | Command | Time | State | Info             | Progress |
    +--------+------+-----------+------+---------+------+-------+------------------+----------+
    | 232172 | root | localhost | NULL | Query   |    0 | NULL  | show processlist |    0.000 |
    +--------+------+-----------+------+---------+------+-------+------------------+----------+
    1 row in set (0.00 sec)

    From the above output, we can tell that only one user is connected to the MySQL server which is root. Then, retrieve the available RAM (in MB) of the host (look under 'available' column):

    $ free -m
                  total        used        free      shared  buff/cache   available
    Mem:           3778        1427         508         148        1842        1928
    Swap:          2047           4        2043

    For reference, the 'available' column gives an estimate of how much memory is available for starting new applications without swapping (only present in kernel 3.14+).

    Then, specify the available memory, 1928 MB in the following statement:

    mysql> SELECT ROUND((1928 - (ROUND((@@innodb_buffer_pool_size + @@innodb_log_buffer_size + @@query_cache_size + @@tmp_table_size + @@key_buffer_size) / 1024 / 1024))) / (ROUND(@@read_buffer_size + @@read_rnd_buffer_size + @@sort_buffer_size + @@thread_stack + @@join_buffer_size + @@binlog_cache_size) / 1024 / 1024)) AS 'Possible Max Connections';
    +--------------------------+
    | Possible Max Connections |
    +--------------------------+
    |                      265 |
    +--------------------------+


    From this example, we can have up to 265 MySQL connections simultaneously according to the available RAM the host has. It doesn't make sense to configure a higher value than that. Then, append the following line inside MySQL configuration file, under the [mysqld] directive:

    max_connections = 265

    Restart the MySQL service to apply the change. When the total simultaneous connections reaches 265, you would get a "Too many connections" error when trying to connect to the mysqld server. This means that all available connections are in use by other clients. MySQL actually permits max_connections+1 clients to connect. The extra connection is reserved for use by accounts that have the SUPER privilege. So if you face this error, you should try to access the server as a root user (or any other SUPER user) and look at the processlist to start the troubleshooting.

    HAProxy's Max Connections

    HAProxy has 3 types of max connections (maxconn) - global, defaults/listen and default-server. Assume an HAProxy instance configured with two listeners, one for multi-writer listening on port 3307 (connections are distributed to all backend MySQL servers) and another one is single-writer on port 3308 (connections are forwarded to a single MySQL server):

    global
        maxconn 2000 #[a]

    defaults
        maxconn 3 #[b]
    listen mysql_3307
        maxconn 8 #[c]
        balance leastconn
        default-server port 9200 maxqueue 10 weight 10 maxconn 4 #[d]
        server db1 check
        server db2 check
        server db3 check
    listen mysql_3308
        default-server port 9200 maxqueue 10 weight 10 maxconn 5 #[e]
        server db1 check
        server db2 check backup #[f]

    Let’s look at the meaning of some of the configuration lines:

    global.maxconn [a]

    The total number of concurrent connections that are allowed to connect to this HAProxy instance. Usually, this value is the highest of them all. In this case, HAProxy will accept a maximum of 2000 connections at a time and distribute them to all listeners defined in the HAProxy process, or worker (you can run multiple HAProxy processes using the nbproc option).

    HAProxy will stop accepting connections when this limit is reached. The "ulimit-n" parameter is automatically adjusted to this value. Since sockets are considered equivalent to files from the system perspective, the default file descriptors limit is rather small. You will probably want to raise the default limit by tuning the kernel for file descriptors.

    defaults.maxconn [b]

    The default maximum connections value for all listeners. It doesn't make sense for this value to be higher than global.maxconn.

    If the "maxconn" line is missing under a "listen" stanza (listen.maxconn), the listener obeys this value. In this case, the mysql_3308 listener will get a maximum of 3 connections at a time. To be safe, set this value equal to global.maxconn divided by the number of listeners. However, if you would like to prioritize some listeners to have more connections, use listen.maxconn instead.

    listen.maxconn [c]

    The maximum connections allowed for the corresponding listener. The listener takes precedence over defaults.maxconn if specified. It doesn't make sense for this value to be higher than global.maxconn.

    For a fair distribution of connections to the backend servers, as in the case of a multi-writer listener (mysql_3307), set this value to listen.default-server.maxconn multiplied by the number of backend servers. In this example, a better value would be 12 instead of 8 [c]. If we choose to stick with this configuration, db1 and db2 are expected to receive a maximum of 3 connections each, while db3 will receive a maximum of 2 connections (due to leastconn balancing), which amounts to 8 connections in total. None of them will hit the limit specified in [d].

    For the single-writer listener (mysql_3308), where connections should be allocated to one and only one backend server at a time, set this value to be the same as or higher than listen.default-server.maxconn.

    listen.default-server.maxconn [d][e]

    This is the maximum number of connections that every backend server can receive at a time. It doesn't make sense for this value to be higher than listen.maxconn or defaults.maxconn. This value should be lower than or equal to MySQL's max_connections variable. Otherwise, you risk exhausting the connections to the backend MySQL server, especially when MySQL's timeout variables are configured lower than HAProxy's timeouts.

    In this example, we've set each of the multi-writer Galera nodes to get a maximum of 4 connections at a time [d], while the single-writer Galera node will get a maximum of 3 connections at a time, due to the limit inherited from [b]. Since we specified "backup" [f] for the other node, the active node will get all 3 connections allocated to this listener.

    The above explanation can be illustrated in the following diagram:

    To sum up the connection distribution, db1 is expected to get a maximum of 6 connections (3 from 3307 + 3 from 3308). db2 will get 3 connections (unless db1 goes down, in which case it will get an additional 3), and db3 will stick to 2 connections regardless of topology changes in the cluster.
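    The cascade of limits above can be double-checked with a small Python sketch (a hypothetical helper, not part of HAProxy): the effective cap for a listener's backends is the tightest of the applicable maxconn values.

```python
def listener_cap(global_maxconn, listener_maxconn, per_server_maxconn, n_servers):
    """Effective number of connections a listener can push to its backends."""
    # The listener itself is capped by global and listener-level maxconn...
    cap = min(global_maxconn, listener_maxconn)
    # ...and the backends together can absorb at most this many:
    return min(cap, per_server_maxconn * n_servers)

# mysql_3307: global 2000, listen.maxconn 8, default-server maxconn 4, 3 servers
print(listener_cap(2000, 8, 4, 3))  # -> 8, listener limit [c] is the bottleneck
# mysql_3308: inherits defaults.maxconn 3, default-server maxconn 5, 1 active server
print(listener_cap(2000, 3, 5, 1))  # -> 3, defaults.maxconn [b] is the bottleneck
```

    This matches the figures in the text: mysql_3307 tops out at 8 connections across its three backends, and mysql_3308 at 3 on the single active node.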

    Connection Monitoring with ClusterControl

    With ClusterControl, you can monitor MySQL and HAProxy connection usage from the UI. The following screenshot provides a summary of the MySQL connection advisor (ClusterControl -> Performance -> Advisors) where it monitors the current and ever used MySQL connections for every server in the cluster:

    For HAProxy, ClusterControl integrates with HAProxy stats page to collect metrics. These are presented under the Nodes tab:

    From the above screenshot, we can tell that each backend server behind the multi-writer listener gets a maximum of 8 connections, and 4 concurrent sessions are running (highlighted in the top red square), while the single-writer listener is serving 2 connections and forwarding them to a single node.


    Configuring the maximum connections for HAProxy and the MySQL server is important to ensure good load distribution across our database servers, and to protect the MySQL servers from being overloaded or having their connections exhausted.

    by ashraf at November 07, 2017 11:27 AM

    November 02, 2017

    Peter Zaitsev

    MySQL vs. MariaDB: Reality Check

    In this blog, we’ll provide a comparison between MySQL and MariaDB (including Percona Server for MySQL).


    The goal of this blog post is to evaluate, at a higher level, MySQL, MariaDB and Percona Server for MySQL side-by-side to better inform the decision making process. It is largely an unofficial response to published comments from the MariaDB Corporation.

    It is worth noting that Percona Server for MySQL is a drop-in compatible branch of MySQL, where Percona contributes as much as possible upstream. MariaDB Server, on the other hand, is a fork of MySQL 5.5. They cherry-picked MySQL features, and don’t guarantee drop-in compatibility any longer.

    MySQL Percona Server for MySQL* MariaDB Server
    Protocols MySQL protocol over port 3306, X Protocol over port 33060 MySQL protocol over port 3306, X Protocol over port 33060 MySQL protocol, MariaDB Server extensions
    Community –
    Source Code
    Open Source Open Source Open Source
    Community – Development Open Source, contributions via signing the Oracle Contributor Agreement (OCA) Open Source Open Source, contributions via the new BSD license or signing the MariaDB Contributor Agreement (MCA)
    Community – Collaboration Mailing list, forums, bugs system Mailing list, forums, bugs system (Jira, Launchpad) Mailing list, bugs system (Jira), IRC channel
    Core –
    MySQL replication with GTID MySQL replication with GTID MariaDB Server replication, with own GTID, compatible only if MariaDB Server is a slave to MySQL, not vice versa
    Core –
    MySQL Router (GPLv2) ProxySQL (GPLv3) MariaDB MaxScale (Business Source License)
    Core –
    Standard Standard Standard, with extra engines like SPIDER/CONNECT that offer varying levels of support
    Tool –
    MySQL Workbench for Microsoft Windows, macOS, and Linux MySQL Workbench for Microsoft Windows, macOS, and Linux Webyog’s SQLYog for Microsoft Windows (MySQL Workbench notes an incompatible server)
    Tool –
    MySQL Enterprise Monitor Percona Monitoring & Management (PMM) (100% open source) Webyog’s Monyog
    Scalability –
    Client Connections
    MySQL Enterprise Threadpool Open Source Threadpool with support for priority tickets Open Source Threadpool
    Scalability –
    MySQL Group Replication MySQL Group Replication, Percona XtraDB Cluster (based on a further engineered Galera Cluster) MariaDB Enterprise Cluster (based on Galera Cluster)
    Security –
    Tablespace data-at-rest encryption. Amazon KMS, Oracle Vault Enterprise Edition Tablespace data-at-rest encryption with Keyring Vault plugin Tablespace and table data-at-rest encryption. Amazon KMS, binlog/redo/tmp file with Aria tablespace encryption
    Security –
    Data Masking
    ProxySQL data masking ProxySQL data masking MariaDB MaxScale data masking
    Security –
    MySQL Enterprise Firewall ProxySQL Firewall MariaDB MaxScale Firewall
    Security –
    MySQL Enterprise Audit Plugin Percona Audit Plugin (OSS) MariaDB Audit Plugin (OSS)
    Analytics No ClickHouse MariaDB ColumnStore
    SQL –
    Common Table Expressions
    In-development for MySQL 8.0 (now a release candidate) In-development for MySQL 8.0 (now a release candidate) Present in MariaDB Server 10.2
    SQL –
    Window Functions
    In-development for MySQL 8.0 (now a release candidate) In-development for MySQL 8.0 (now a release candidate) Present in MariaDB Server 10.2
    Temporal –
    Log-based rollback
    No No In development for MariaDB Server 10.3
    Temporal – system versioned tables No No In development for MariaDB Server 10.3
    JSON JSON Data type, 21 functions JSON Data type, 21 functions No JSON Data Type, 26 functions
    client connectors
    C (libmysqlclient), Java, ODBC, .NET, Node.js, Python, C++, mysqlnd for PHP C (libmysqlclient), Java, ODBC, .NET, Node.js, Python, C++, mysqlnd for PHP C (libmariadbclient), Java, ODBC
    Usability – CJK Language support Gb18030, ngram & MeCab for InnoDB full-text search Gb18030, ngram & MeCab for InnoDB full-text search No
    Monitoring – PERFORMANCE
    Thorough instrumentation in 5.7, sys schema included Thorough instrumentation in 5.7, sys schema included Instrumentation from MySQL 5.6, sys schema not included
    Security – Password authentication sha256_password (with caching_sha2_password in 8.0) sha256_password (with caching_sha2_password in 8.0) ed25519 (incompatible with sha256_password)
    Security –
    Secure out of the box
    validate_password on by default, to choose a strong password at the start validate_password on by default, to choose a strong password at the start No
    Usability – Syntax differences EXPLAIN FOR CONNECTION <thread_id> EXPLAIN FOR CONNECTION <thread_id> SHOW EXPLAIN FOR <thread_id>
    Optimiser –
    Optimiser Tracing
    Yes Yes No
    Optimiser –
    Optimiser Hints
    Yes Yes No
    DBA –
    Super readonly mode
    Yes Yes No
    Security – Password expiry Yes Yes No
    Security – Password last changed? Password lifetime? Yes Yes No
    Security – ACCOUNT LOCK/UNLOCK Yes Yes No
    Usability – Query Rewriting Yes Yes No
    GIS – GeoJSON &
    GeoHash functionality
    Yes Yes Incomplete
    Security – mysql_ssl_rsa_setup Yes Yes No (setup SSL connections manually)
    MySQL Utilities Yes Yes No
    Backup locks No (in development for 8.0) Yes No
    Usability – InnoDB memcached interface Yes Yes No

    *Note. Third-party software (such as ProxySQL and ClickHouse) used in conjunction with Percona Server for MySQL is not necessarily covered by Percona Support services.

    To get a higher level view of what Percona Server for MySQL offers compared to MySQL, please visit: Percona Server Feature Comparison. Read this for a higher level view of compatibility between MariaDB Server and MySQL written by MariaDB Corporation.

    Open Community

    MariaDB Server undoubtedly has an open community, with governance mixed between the MariaDB Foundation and MariaDB Corporation. There are open developer meetings on average about twice per year, two mailing lists (one for developers, one for users), an IRC channel and an open JIRA ticket system that logs bugs and feature requests.

    Percona Server for MySQL also has an open community. Developer meetings are not open to general contributors, but there is a mailing list, an IRC channel and two systems – Launchpad and JIRA – for logging bugs and feature requests.

    MySQL also has an open community where developer meetings are likewise not open to general contributors. There are many mailing lists, a few IRC channels and the MySQL bugs system. The worklogs are where the design for future releases happens, and these are opened up when their features are fully developed and the source code pushed.

    From a source code standpoint, MySQL pushes to GitHub only when a release is made, whereas development of Percona Server for MySQL and MariaDB Server happens in the open on GitHub.

    Feature development on MySQL continues in leaps and bounds, and Oracle has been an excellent steward of MySQL. Please refer to The Complete List of Features in 5.7, as well as The Unofficial MySQL 8 Optimiser Guide.

    Linux distributions have chosen MariaDB Server 5.5, and some have chosen MariaDB Server 10.0/10.1 when there was more backward compatibility to MySQL 5.5/5.6. It is the “default” MySQL in many Linux distributions (such as Red Hat Enterprise Linux, SUSE and Debian). However, Ubuntu still believes that when you ask for MySQL you should get it (and that is what Ubuntu ships).

    One of the main reasons Debian switched was due to the way Oracle publishes updates for security issues. They are released as a whole quarterly as Critical Patch Updates, without much detail about individual fixes. This is a policy that is unlikely to change, but has had no adverse effects on distribution.

    All projects actively embrace contributions from the open community. MariaDB Server does include contributions like the MyRocks engine developed at Facebook, but so does Percona Server for MySQL. Oracle accepts contributions from a long list of contributors, including Percona. Please see Licensing information for MySQL 5.7 as an example.

    A Shared Core Engine

    MariaDB Server has differed from MySQL since MySQL 5.5. This is one reason why you don’t get version numbers that follow the MySQL scheme. It is also worth noting that features are cherry-picked at merge time, because the source code has diverged so much since then.

    As the table below shows, it took Percona Server for MySQL over four months to get a stable 5.5 release based on MySQL 5.5, while it took MariaDB Server one year and four months to get a stable 5.5 release based on MySQL 5.5. Percona Server for MySQL 5.6 and 5.7 are based on their respective MySQL versions.

    Date               MySQL      Percona Server for MySQL   MariaDB Server
    3 December 2010    5.5.8 GA
    28 April 2011                 5.5.11-20.2 GA
    11 April 2012                                            5.5.23 GA
    5 February 2013    5.6.10 GA
    7 October 2013                5.6.13-61.0 GA
    31 March 2014                                            10.0.10 GA
    17 October 2015                                          10.1.8 GA
    21 October 2015    5.7.9 GA
    23 February 2016              5.7.10-3 GA
    23 May 2017                                              10.2.6 GA


    MySQL is currently at 8.0.3 Release Candidate, while MariaDB Server is at 10.3.2 Alpha as of this writing.

    MariaDB Server is by no means a drop-in replacement for MySQL. The risk of moving to MariaDB Server if you aren’t using newer MySQL features may be minimal, but the risk of moving from MariaDB Server back to MySQL is very real. Linux distributions like Debian already warn you of this.


    The differences go beyond just default configuration options. Some features, like time-delayed replication, present in MySQL since 2013, only made an appearance in MariaDB Server in 2017! (Refer to the MariaDB Server 10.2 Overview for more.) However, it is also worth noting that some features, such as multi-source replication, appeared in MariaDB Server 10.0 first, and only then came to MySQL 5.7.


    MySQL and MariaDB Server have a storage engine interface, and this is how you access all engines, including the favored InnoDB/Percona XtraDB. It is worth noting that Percona XtraDB was the default InnoDB replacement in MariaDB Server 5.1, 5.2, 5.3, 5.5, 10.0 and 10.1. But in MariaDB Server 10.2, the InnoDB of choice is upstream MySQL.

    Stock MySQL has provided several storage engines beyond just InnoDB (the default) and MyISAM. You can find out more information about 5.7 Supported Engines.

    Percona Server for MySQL includes a modified MEMORY storage engine, ships Percona XtraDB as the default InnoDB and also ships TokuDB and MyRocks (currently experimental). MyRocks is based on the RocksDB engine, and both are developed extensively at Facebook.

    MariaDB Server includes many storage engines, beyond the default InnoDB. MyISAM is modified with segmented key caches, the default temporary table storage engine is Aria (which is a crash-safe MyISAM), the FederatedX engine is a modified FEDERATED engine, and there are more: CONNECT, Mroonga, OQGRAPH, Sequence, SphinxSE, SPIDER, TokuDB and of course MyRocks.

    Storage engines have specific use cases, and have different levels of feature completeness. You should thoroughly evaluate a storage engine before choosing it. We believe that over 90% of installations are fine with just InnoDB or Percona XtraDB. Percona TokuDB is another engine that users who need compression could use. We naturally expect more usage in the MyRocks sphere going forward.


    MariaDB ColumnStore is the MariaDB solution to analytics and using a column-based store. It is a separate download and product, and not a traditional storage engine (yet). It is based on the now defunct InfiniDB product.

    At Percona, we are quite excited by ClickHouse. We also have plenty of content around it. There is no MySQL story around this.

    High Availability

    High Availability is an exciting topic in the MySQL world, considering the server itself has been around for over 22 years. There are many solutions out there, and some have evolved considerably.

    MySQL provides MySQL Cluster (NDBCLUSTER) (there is no equivalent in the MariaDB world). MySQL also provides group replication (similar to Galera Cluster). Combined with the proxy MySQL Router, and the mysqlsh for administration (part of the X Protocol/X Dev API), you can also get MySQL InnoDB Cluster.

    We benefit from the above at Percona, but also put lots of engineering work to make Percona XtraDB Cluster.

    MariaDB Server only provides Galera Cluster.


    While we won’t dwell on the proprietary MySQL Enterprise Firewall, note that MariaDB’s recommendation is the proprietary, non-open-source MariaDB MaxScale (it uses a Business Source License). We highly recommend the alternative, ProxySQL.

    When it comes to encryption, MariaDB Server implements Google patches to provide complete data at rest encryption. This supports InnoDB, XtraDB and Aria temporary tables. The log files can also be encrypted (not present in MySQL, which only allows tablespace encryption and not log file encryption).

    When it comes to attack prevention, ProxySQL should offer everything you need.

    MySQL Enterprise provides auditing, while MariaDB Server provides an audit plugin as well as an extension to the audit interface for user filtering. Percona Server for MySQL has an audit plugin that sticks to the MySQL API, yet provides user filtering and controls the ability to audit (since auditing is expensive). Streaming to syslog is supported by the audit plugins from Percona and MariaDB.

    Supporting Ecosystem and Tools

    Upgrading from MySQL to MariaDB Server should be a relatively simple process (as stated above). If you want to upgrade away from MariaDB Server to MySQL, you may face hassles. For tools, see the following table:

    Purpose MySQL Percona Server for MySQL MariaDB Server
    Monitoring MySQL Enterprise Monitor Percona Monitoring & Management (PMM) (100% open source) Webyog Monyog
    Backup MySQL Enterprise Backup Percona XtraBackup MariaDB Backup (fork of Percona XtraBackup)
    SQL Management MySQL Workbench MySQL Workbench Webyog SQLyog
    Load Balancing & Routing MySQL Router ProxySQL MariaDB MaxScale
    Database Firewall MySQL Enterprise Firewall ProxySQL MariaDB MaxScale


    Enterprise Database Compatibility

    MariaDB Server today has window functions and common table expressions (CTEs). These appeared in MariaDB Server 10.2. MySQL 8 is presently in release candidate status and also has similar functionality.

    Looking ahead, MariaDB Server 10.3 also includes an Oracle SQL_MODE and a partial PL/SQL parser. This is to aid migration from Oracle to MariaDB Server.

    MariaDB Server 10.2 also has “flashback”, developed at Alibaba, to help with log-based rollback using the binary log.


    Percona sees healthy competition in the MySQL ecosystem. We support all databases in the ecosystem: MySQL, MariaDB Server and Percona Server for MySQL. Our focus is to provide alternatives to proprietary parts of open source software. Percona has a strong operations focus on compatibility, application scalability, high availability, security and observability. We also support many additional tools within the ecosystem, and love integrating and contributing to open source code.

    For example, Percona Monitoring and Management (PMM) includes many open source tools like Prometheus, Consul, Grafana, Orchestrator and more. We have made the de facto open source hot backup solution for MySQL, MariaDB Server and Percona Server for MySQL (called Percona XtraBackup). We continue to maintain and extend useful tools for database engineers and administrators in Percona Toolkit. We make Percona XtraDB Cluster safe for deployment out of the box. We have invested in a write-optimized storage engine, TokuDB, and now continue to work with making MyRocks better.

    We look forward to supporting your deployments of MySQL or MariaDB Server, whichever option is right for you! If you need assistance on migrations between servers, or further information, don’t hesitate to contact your friendly Percona sales associate.

    by Colin Charles at November 02, 2017 05:55 PM

    November 01, 2017

    MariaDB AB

    MariaDB Server 10.2.10 now available

    MariaDB Server 10.2.10 now available dbart Wed, 11/01/2017 - 11:04

    The MariaDB project is pleased to announce the immediate availability of MariaDB Server 10.2.10. See the release notes and changelog for details.

    Download MariaDB Server 10.2.10

    Release Notes Changelog What is MariaDB 10.2?



    by dbart at November 01, 2017 03:04 PM

    Peter Zaitsev

    Percona Server for MongoDB 3.2.17-3.8 Is Now Available

    Percona announces the release of Percona Server for MongoDB 3.2.17-3.8 on October 31, 2017. Download the latest version from the Percona web site or the Percona Software Repositories.

    Percona Server for MongoDB is an enhanced, open-source, fully compatible, highly scalable, zero-maintenance downtime database that supports the MongoDB v3.2 protocol and drivers. It extends MongoDB with the MongoRocks, Percona Memory Engine and PerconaFT storage engines, as well as enterprise-grade features like External Authentication, Audit Logging, Profiling Rate Limiting and Hot Backup at no extra cost. The software requires no changes to MongoDB applications or code.

    NOTE: The PerconaFT storage engine is deprecated as of 3.2. It is no longer supported and isn’t available in higher version releases.

    This release is based on MongoDB 3.2.17 and does not include any additional changes.

    The Percona Server for MongoDB 3.2.17-3.8 release notes are available in the official documentation.

    by Alexey Zhebel at November 01, 2017 12:37 PM

    MariaDB Foundation

    MariaDB 10.2.10 and MariaDB 10.0.33 now available

    The MariaDB project is pleased to announce the availability of MariaDB 10.2.10 and MariaDB 10.0.33. See the release notes and changelogs for details. Download MariaDB 10.2.10 Release Notes Changelog What is MariaDB 10.2? MariaDB APT and YUM Repository Configuration Generator Download MariaDB 10.0.33 Release Notes Changelog What is MariaDB 10.0? MariaDB APT and YUM Repository […]


    by Ian Gilfillan at November 01, 2017 06:01 AM

    October 31, 2017

    Peter Zaitsev

    MySQL Dashboard Improvements in Percona Monitoring and Management 1.4.0

    In this blog post, I’ll walk through some of the improvements to the Percona Monitoring and Management (PMM) MySQL dashboard in release 1.4.0.

    As the part of Percona Monitoring and Management development, we’re constantly looking for better ways to visualize information and help you to spot and resolve problems faster. We’ve made some updates to the MySQL dashboard in the 1.4.0 release. You can see those improvements in action in our Percona Monitoring and Management Demo Site: check out the MySQL Overview and MySQL InnoDB Metrics dashboards.

    MySQL Client Thread Activity

    Percona Monitoring and Management 1

    One of the best ways to characterize a MySQL workload is to look at the number of MySQL server-client connections (Threads Connected). You should compare this number to how many of those threads are actually doing something on the server side (Threads Running), rather than just sitting idle waiting for a client to send the next request.

    MySQL can handle thousands of connected threads quite well. However, hundreds of threads running concurrently often increase query latency, and increased internal contention can make the situation much worse.

    The problem with those metrics is that they are extremely volatile: one second you might have a lot of threads connected and running, and the next second none. This is especially true when a stall at the MySQL level (or higher) causes pile-ups.

    To provide better insight, we now show Peak Threads Connected and Peak Threads Running to help easily spot such potential pile-ups, as well as Avg Threads Running. These stats allow you to look at a high number of threads connected and running and see whether these are just minor spikes (which tend to happen in many systems on a regular basis) or something more prolonged that warrants deeper investigation.

    To simplify it even further: Threads Running spiking for a few seconds is OK, but spikes persisting for 5-10 seconds or more are often signs of problems that are impacting users (or problems about to happen).
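    To make the peak-versus-average distinction concrete, here is a small shell sketch. The sample numbers are invented; in a real check they would come from sampling SHOW GLOBAL STATUS LIKE 'Threads_running' once per second.

```shell
#!/bin/sh
# Sketch of why peaks and averages are shown separately. The sample values
# are made up; in practice they would be per-second Threads_running readings.
samples="3 4 80 5 3 4 3 95 4 3"

peak=0; sum=0; n=0
for v in $samples; do
    [ "$v" -gt "$peak" ] && peak=$v   # track the highest reading
    sum=$((sum + v)); n=$((n + 1))
done
avg=$((sum / n))

echo "peak=${peak} avg=${avg}"
# A high peak with a low average points at short spikes rather than a
# sustained overload.
```

    Here the workload looks calm on average, yet the peak reveals two brief pile-ups that an average-only graph would hide.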

    InnoDB Logging Performance

    Percona Monitoring and Management 2

    Since I wrote a blog post about Choosing MySQL InnoDB Log File Size, I thought it would be great to check out how long the log file space would last (instead of just looking at how much log space is written per hour). Knowing how long the innodb_log_buffer_size lasts is also helpful for tuning this variable, in general.

    This graph shows you how much data is written to the InnoDB Log Files, which helps to understand your disk bandwidth consumption. It also tells you how long it will take to go through your combined Redo Log Space and InnoDB Log Buffer Size (at this rate).

    As I wrote in the blog post, there are a lot of considerations for choosing the InnoDB log file size, but having enough log space to accommodate all the changes for an hour is a good rule of thumb. As we can see, this system is close to full at around 50 minutes.
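    The "how long will the space last" estimate behind the graph boils down to simple arithmetic on the Innodb_os_log_written status counter. A sketch with made-up numbers (both counter samples and the 2 GB of combined redo space are assumptions):

```shell
#!/bin/sh
# Two hypothetical samples of Innodb_os_log_written (cumulative bytes
# written to the redo log), taken 60 seconds apart.
before=1073741824
after=1181116006
interval=60

# Assumed combined redo space: innodb_log_file_size * innodb_log_files_in_group
redo_space=$((2 * 1024 * 1024 * 1024))

rate=$(( (after - before) / interval ))   # bytes per second written to the log
minutes=$(( redo_space / rate / 60 ))     # time until the redo space is cycled

echo "redo write rate: ${rate} B/s, space lasts ~${minutes} min"
```

    By the one-hour rule of thumb above, a result of roughly 20 minutes would argue for larger log files.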

    When it comes to innodb_log_buffer_size, even if InnoDB is not configured to flush the log at every transaction commit, it is flushed every second by default. This means a buffer holding 10-15 seconds of writes is usually enough to accommodate spikes. This system has it set to about 40 seconds' worth (which is more than enough).

    InnoDB Read-Ahead

    Percona Monitoring and Management 3

    This graph helps you understand how InnoDB Read-Ahead is working out, and is a pretty advanced graph.

    In general, Innodb Read-Ahead is not very well understood. I think in most cases it is hard to tell if it is helping or hurting the current workload in its current configuration.

    The goal of Read-Ahead in any system (not just InnoDB) is to pre-fetch data before it is really needed, in order to reduce latency and improve performance. The risk, however, is pre-fetching data that isn't needed, which is wasteful.

    InnoDB has two Read-Ahead options: Linear Read-Ahead (designed to speed up workloads that have physically sequential data access) and Random Read-Ahead (designed to help workloads that tend to access the data in the same vicinity but not in a linear order).

    Due to potential overhead, only Linear Read-Ahead is enabled by default. You need to enable Random Read-Ahead separately if you want to determine its impact on your workload.

    Back to the graph in question: we show a number of pages pre-fetched by Linear and Random Read-Aheads to confirm if these are even in use with your workload. We show Number of Pages Fetched but Never Accessed (evicted without access) – shown as both the number of pages and as a percent of pages. If Fetched but Never Accessed is more than 30% or so, Read-Ahead might be producing more waste instead of helping your workload. It might need tuning.

    We also show the portion of IO requests served by InnoDB Read-Ahead, which can help you understand the share of IO resources it consumes.

    Due to the timing of how InnoDB increments counters, the percentages of IO used for Read-Ahead and of pages evicted without access show up better on larger-scale graphs.
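    Outside of PMM, the same "fetched but never accessed" ratio can be computed from two InnoDB status counters. A sketch with invented counter values:

```shell
#!/bin/sh
# Hypothetical values of two real status counters (from SHOW GLOBAL STATUS):
#   Innodb_buffer_pool_read_ahead          -- pages fetched by read-ahead
#   Innodb_buffer_pool_read_ahead_evicted  -- fetched, then evicted unused
read_ahead=100000
read_ahead_evicted=35000

waste_pct=$(( 100 * read_ahead_evicted / read_ahead ))
echo "fetched but never accessed: ${waste_pct}%"
# By the ~30% rule of thumb above, this workload's read-ahead likely needs tuning.
```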


    I hope you find these graphs helpful. We’ll continue making Percona Monitoring and Management more helpful for troubleshooting database systems and getting better performance!

    by Peter Zaitsev at October 31, 2017 08:08 PM

    Henrik Ingo

    impress.js HowTo: Slides over a background image

    A common and, IMO, cool way to create impress.js presentations is to use one large background image for the entire presentation and then lay out each slide over it. One of my first impress.js presentations was Selling Open Source 101 for OSCON. The presentation is set inside a picture of a woman selling all kinds of stuff in a bazaar.

    Next week I will present something about EC2 at the HighLoad++ conference, and my presentation is flying over some clouds, of course.

    read more

    by hingo at October 31, 2017 01:46 PM

    Jean-Jerome Schmidt

    How to Stop or Throttle SST Operation on a Galera Cluster

    State Snapshot Transfer (SST) is one of the two ways used by Galera to perform initial syncing when a node is joining a cluster, until the node is declared synced and part of the "primary component". Depending on the dataset size and workload, SST could be lightning fast, or an expensive operation that brings your database service to its knees.

    SST can be performed using 3 different methods:

    • mysqldump
    • rsync (or rsync_wan)
    • xtrabackup (or xtrabackup-v2, mariabackup)

    Most of the time, xtrabackup-v2 and mariabackup are the preferred options. We rarely see people running on rsync or mysqldump in production clusters.

    The Problem

    When SST is initiated, there are several processes triggered on the joiner node, which are executed by the "mysql" user:

    $ ps -fu mysql
    UID         PID   PPID  C STIME TTY          TIME CMD
    mysql    117814 129515  0 13:06 ?        00:00:00 /bin/bash -ue /usr//bin/wsrep_sst_xtrabackup-v2 --role donor --address --socket /var/lib/mysql/mysql.sock --datadir
    mysql    120036 117814 15 13:06 ?        00:00:06 innobackupex --no-version-check --tmpdir=/tmp/tmp.pMmzIlZJwa --user=backupuser --password=x xxxxxxxxxxxxxx --socket=/var/lib/mysql/mysql.sock --galera-inf
    mysql    120037 117814 19 13:06 ?        00:00:07 socat -u stdio TCP:
    mysql    129515      1  1 Oct27 ?        01:11:46 /usr/sbin/mysqld --wsrep_start_position=7ce0e31f-aa46-11e7-abda-56d6a5318485:4949331

    While on the donor node:

    mysql     43733      1 14 Oct16 ?        03:28:47 /usr/sbin/mysqld --wsrep-new-cluster --wsrep_start_position=7ce0e31f-aa46-11e7-abda-56d6a5318485:272891
    mysql     87092  43733  0 14:53 ?        00:00:00 /bin/bash -ue /usr//bin/wsrep_sst_xtrabackup-v2 --role donor --address --socket /var/lib/mysql/mysql.sock --datadir /var/lib/mysql/  --gtid 7ce0e31f-aa46-11e7-abda-56d6a5318485:2883115 --gtid-domain-id 0
    mysql     88826  87092 30 14:53 ?        00:00:05 innobackupex --no-version-check --tmpdir=/tmp/tmp.LDdWzbHkkW --user=backupuser --password=x xxxxxxxxxxxxxx --socket=/var/lib/mysql/mysql.sock --galera-info --stream=xbstream /tmp/tmp.oXDumYf392
    mysql     88827  87092 30 14:53 ?        00:00:05 socat -u stdio TCP:

    SST against a large dataset (hundreds of GBytes) is no fun. Depending on the hardware, network and workload, it may take hours to complete, and server resources may be saturated during the operation. Although throttling is supported in SST (only for xtrabackup and mariabackup) via the --rlimit and --use-memory options, we are still exposed to a degraded cluster when running short of a majority of active nodes, for example if you are unlucky enough to find yourself with only one out of three nodes running. Therefore, you are advised to perform SST during quiet hours. You can, however, avoid SST by taking some manual steps, as described in this blog post.

    Stopping an SST

    Stopping an SST needs to be done on both the donor and the joiner nodes. The joiner triggers SST after determining how big the gap is when comparing the local Galera seqno with the cluster's seqno, and then executes the wsrep_sst_{wsrep_sst_method} command. The chosen donor picks this up and starts streaming data to the joiner. A donor node has no way to refuse to serve a snapshot transfer once it has been selected, whether by Galera group communication or by the value of the wsrep_sst_donor variable. Once the syncing has started, there is no single command to stop the operation if you want to revert the decision.

    The basic principle when stopping an SST is to:

    • Make the joiner look dead from a Galera group communication point-of-view (shutdown, fence, block, reset, unplug cable, blacklist, etc)
    • Kill the SST processes on the donor

    One would think that killing the innobackupex process (kill -9 {innobackupex PID}) on the donor would be enough, but that is not the case. If you kill the SST processes on the donor (or joiner) without fencing off the joiner, Galera can still see the joiner as active and will mark the SST as incomplete, respawning a new set of processes to continue or start over. You will be back to square one. This is the expected behaviour of the /usr/bin/wsrep_sst_{method} script: it safeguards the SST operation, which is vulnerable to timeouts (e.g., when it is long-running and resource intensive).

    Let's look at an example. We have a crashed joiner node that we would like to rejoin the cluster. We would start by running the following command on the joiner:

    $ systemctl start mysql # or service mysql start

    A minute later, we found out that the operation was too heavy at that particular moment, and decided to postpone it until low-traffic hours. The most straightforward way to stop an xtrabackup-based SST is to simply shut down the joiner node and kill the SST-related processes on the donor node. Alternatively, you can block the incoming ports on the joiner by running the following iptables commands on the joiner:

    $ iptables -A INPUT -p tcp --dport 4444 -j DROP
    $ iptables -A INPUT -p tcp --dport 4567:4568 -j DROP

    Then on the donor, retrieve the PID of SST processes (list out the processes owned by "mysql" user):

    $ ps -u mysql
       PID TTY          TIME CMD
    117814 ?        00:00:00 wsrep_sst_xtrab
    120036 ?        00:00:06 innobackupex
    120037 ?        00:00:07 socat
    129515 ?        01:11:47 mysqld

    Finally, kill them all except the mysqld process (you must be extremely careful to NOT kill the mysqld process on the donor!):

    $ kill -9 117814 120036 120037
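    Selecting everything except mysqld can be scripted to reduce the risk of a typo. This is a sketch that runs against simulated output; in practice, replace the sample with real ps -u mysql -o pid=,comm= output from the donor:

```shell
#!/bin/sh
# Simulated "ps -u mysql -o pid=,comm=" output from the donor.
sample='117814 wsrep_sst_xtrab
120036 innobackupex
120037 socat
129515 mysqld'

# Keep every PID whose command is NOT mysqld -- never kill the donor's mysqld.
pids=$(printf '%s\n' "$sample" | awk '$2 != "mysqld" {print $1}')

# Print the command instead of running it, so it can be reviewed first.
echo kill -9 $pids
```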

    Then, on the donor MySQL error log, you should notice the following line appearing after ~100 seconds:

    2017-10-30 13:24:08 139722424837888 [Warning] WSREP: Could not find peer: 42b85e82-bd32-11e7-87ae-eff2b8dd2ea0
    2017-10-30 13:24:08 139722424837888 [Warning] WSREP: 1.0 ( State transfer to -1.-1 (left the group) failed: -32 (Broken pipe)

    At this point, the donor should return to the "synced" state as reported by wsrep_local_state_comment and the SST process is completely stopped. The donor is back to its operational state and is able to serve clients in full capacity.

    For the cleanup process on the joiner, you can simply flush the iptables chain:

    $ iptables -F

    Or simply remove the rules with -D flag:

    $ iptables -D INPUT -p tcp --dport 4444 -j DROP
    $ iptables -D INPUT -p tcp --dport 4567:4568 -j DROP

    A similar approach can be used with other SST methods like rsync, mariabackup and mysqldump.

    Throttling an SST (xtrabackup method only)

    Depending on how busy the donor is, it's a good approach to throttle the SST process so it won't impact the donor significantly. We've seen a number of cases where, during catastrophic failures, users were desperate to bring back a failed cluster as a single bootstrapped node and let the rest of the members catch up later. This reduces the downtime from the application side; however, it creates an additional burden on this "one-node cluster" while the remaining members are still down or recovering.

    Xtrabackup can be throttled with --throttle=<rate of IO/sec> to limit the number of IO operations per second if you are afraid it will saturate your disks, but this option only applies when running xtrabackup as a backup process, not as an SST operator. For SST, a similar option is available via the rlimit (rate limit) setting, which can be combined with --use-memory to limit RAM usage. By setting values under the [sst] directive inside the MySQL configuration file, we can ensure that the SST operation won't put too much load on the donor, even though it may take longer to complete. On the donor node, set the following:
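    The configuration snippet itself appears to have been lost from this post. As an illustrative sketch only (the values are assumptions; see the Percona XtraBackup SST documentation referenced below for the authoritative option names), the donor's my.cnf might contain something like:

```ini
# Illustrative values only -- tune for your hardware
[sst]
rlimit=128k
inno-apply-opts="--use-memory=200M"
```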


    More details on the Percona Xtrabackup SST documentation page.

    However, there is a catch. The process could be so slow that it never catches up with the transaction logs that InnoDB is writing, so SST might never complete. Generally, this situation is very uncommon, unless you really have a very write-intensive workload or you allocate very limited resources to SST.


    SST is critical but heavy, and could potentially be a long-running operation depending on the dataset size and network throughput between the nodes. Regardless of the consequences, there are still possibilities to stop the operation so we can have a better recovery plan at a better time.

    by ashraf at October 31, 2017 01:08 PM

    October 30, 2017

    MariaDB AB

    MariaDB Server 10.0.33 now available

    MariaDB Server 10.0.33 now available dbart Mon, 10/30/2017 - 14:12

    The MariaDB project is pleased to announce the immediate availability of MariaDB Server 10.0.33. See the release notes and changelog for details, and visit the downloads page to get it.

    Download MariaDB Server 10.0.33

    Release Notes Changelog What is MariaDB 10.0?



    by dbart at October 30, 2017 06:12 PM

    Peter Zaitsev

    Percona XtraDB Cluster 5.7.19-29.22-3 is now available

    Percona announces the release of Percona XtraDB Cluster 5.7.19-29.22-3 on October 27, 2017. Binaries are available from the downloads section or our software repositories.

    NOTE: You can also run Docker containers from the images in the Docker Hub repository.

    Percona XtraDB Cluster 5.7.19-29.22-3 is now the current release, based on the following:

    All Percona software is open-source and free.

    Fixed Bugs

    • Added access checks for DDL commands to make sure they do not get replicated if they failed without proper permissions. Previously, when a user tried to perform certain DDL actions that failed locally due to lack of privileges, the command could still be replicated to other nodes, because access checks were performed after replication. This vulnerability is identified as CVE-2017-15365.

    Help us improve our software quality by reporting any bugs you encounter using our bug tracking system. As always, thanks for your continued support of Percona!

    by Alexey Zhebel at October 30, 2017 12:29 PM

    Percona XtraDB Cluster 5.6.37-26.21-3 is Now Available

    Percona announces the release of Percona XtraDB Cluster 5.6.37-26.21-3 on October 27, 2017. Binaries are available from the downloads section or our software repositories.

    Percona XtraDB Cluster 5.6.37-26.21-3 is now the current release, based on the following:

    All Percona software is open-source and free.

    Fixed Bugs

    • Added access checks for DDL commands to make sure they do not get replicated if they failed without proper permissions. Previously, when a user tried to perform certain DDL actions that failed locally due to lack of privileges, the command could still be replicated to other nodes, because access checks were performed after replication. This vulnerability is identified as CVE-2017-15365.

    Help us improve our software quality by reporting any bugs you encounter using our bug tracking system. As always, thanks for your continued support of Percona!

    by Alexey Zhebel at October 30, 2017 12:27 PM

    October 27, 2017

    Peter Zaitsev

    This Week in Data with Colin Charles 12: Open Source Summit Europe and Open Source Entrepreneur Network

    Colin Charles

    Join Percona Chief Evangelist Colin Charles as he covers happenings, gives pointers and provides musings on the open source database community.

    This week was exciting from a Percona exposure standpoint. We were at Open Source Summit Europe. I gave two talks and participated in a panel, as the co-located event for the Open Source Entrepreneur Network happened on the last day as well. We had a booth, and it was great to hang out and talk with my colleagues Dorothée Wuest and Dimitri Vanoverbeke as well as all the attendees that popped by.




    I look forward to feedback/tips via e-mail at or on Twitter @bytebot.

    by Colin Charles at October 27, 2017 08:23 PM