Planet MariaDB

December 06, 2019


ClusterControl Takes the Spotlight as Top IT Management Software; Chosen as Rising Star of 2019 by B2B Review Platform

ClusterControl is all about delivering robust, open source database management for the IT management needs of our clients. This goal drives us every day, so much so that it recently led us to receive two awards from CompareCamp: the Rising Star of 2019 Award and the Great User Experience Award.

CompareCamp is a B2B Review Platform that delivers credible SaaS reviews and updated technology news from industry experts. Thousands of users rely on CompareCamp reviews that detail the pros and cons of software from different industries. 

ClusterControl was given the Great User Experience Award because it effectively boosts users’ productivity through highly secure tools for real-time monitoring, failure detection, load balancing, data migration, and automated recovery. Our dedicated features for node recovery, SSL encryption, and performance reporting received raves from experts.

We also received the Rising Star of 2019 Award on the strength of our first review on the platform and CompareCamp's recommendation of ClusterControl as IT management software.

To read the full review, please visit CompareCamp.

by fwlymburner at December 06, 2019 10:45 AM

December 05, 2019


How to Install ClusterControl to a Custom Location

ClusterControl consists of a number of components and one of the most important ones is the ClusterControl UI. This is the web application through which the user interacts with other backend ClusterControl components like the controller, notification, web-ssh and cloud modules. Each component is packaged independently under its own name, which makes it easier for fixes to be delivered to the end user. For more info on ClusterControl components, check out the documentation page.

In this blog post, we are going to look into ways to customize our ClusterControl installation, especially the ClusterControl UI, which by default is located under /var/www/html (the default document root of Apache). Note that it's recommended to host ClusterControl on a dedicated server where it can use all the default paths, which simplifies ClusterControl maintenance operations.

Installing ClusterControl

For a fresh installation, go to our Download page to get the installation link. Then, start installing ClusterControl using the installer script as root user:

$ whoami
$ wget
$ chmod 755 install-cc
$ ./install-cc

Follow the installation wizard accordingly and the script will install all dependencies, configure the ClusterControl components and start them up. The script configures Apache 2.4 and uses the package manager to install the ClusterControl UI, which by default is located under /var/www/html.


Once ClusterControl is installed in its default location, we can move the UI directories located under /var/www/html/clustercontrol and /var/www/html/cmon somewhere else. Let's prepare the new path first.

Suppose we want to move the UI components to a user directory under /home. Firstly, create the user. In this example, the user name is "cc":

$ useradd -m cc

The above command will automatically create a home directory for user "cc" under /home/cc. Then, create the necessary directories for Apache for this user: a directory called "logs" for Apache logs, "public_html" as the Apache document root for this user, and "www" as a symbolic link to public_html:

$ cd /home/cc
$ mkdir logs
$ mkdir public_html
$ ln -sf public_html www

Make sure all of them are owned by Apache:

$ chown apache:apache logs public_html # RHEL/CentOS
$ chown www-data:www-data logs public_html # Debian/Ubuntu

To allow the Apache process to access public_html under user cc, we have to allow global read (and traverse) permission on the home directory of user cc:

$ chmod 755 /home/cc

We are now good to move stuff.
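Before moving anything, it's worth verifying that every directory on the path down to public_html is world-traversable, since Apache needs execute permission on each path component. A quick sketch, demonstrated on a throwaway path; on a real system, point TARGET at /home/cc/public_html:

```shell
# Walk each path component upward and print its permission bits.
# TARGET is an assumed variable for this sketch.
TARGET="${TARGET:-$(mktemp -d)}"
p="$TARGET"
while [ "$p" != "/" ]; do
    stat -c '%A %n' "$p"       # e.g. drwxr-xr-x /home/cc
    p=$(dirname "$p")
done
```

Every directory printed should show at least an "x" in its last (other) permission triplet, as /home/cc does after the chmod 755 above.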

Customizing the Path

Stop ClusterControl related services and Apache:

$ systemctl stop httpd # RHEL/CentOS
$ systemctl stop apache2 # Debian/Ubuntu
$ systemctl stop cmon cmon-events cmon-ssh cmon-cloud

We basically have two options for moving the directory into the user's directory:

  1. Move everything from /var/www/html into /home/cc/public_html.
  2. Create a symbolic link from /var/www/html/clustercontrol to /home/cc/public_html (recommended).

If you opt for option #1, simply move the ClusterControl UI directories into the new path, /home/cc/public_html:

$ mv /var/www/html/clustercontrol /home/cc/public_html/
$ mv /var/www/html/cmon /home/cc/public_html/

Make sure the ownership is correct:

$ chown -R apache:apache /home/cc/public_html # RHEL/CentOS
$ chown -R www-data:www-data /home/cc/public_html # Debian/Ubuntu

However, there is a drawback: the ClusterControl UI package will always be extracted under /var/www/html. This means that if you upgrade the ClusterControl UI via the package manager, the new content will land under /var/www/html. Refer to the "Potential Issues" section further down for more details.

If you choose option #2, the recommended way, you just need to create a symlink (a link referencing another file or directory) under the user's public_html directory for both directories. When an upgrade happens, the DEB/RPM postinst script will replace the existing installation with the updated version under /var/www/html, and the symlinks will keep pointing at it. To create the symlinks:

$ ln -sf /var/www/html/clustercontrol /home/cc/public_html/clustercontrol
$ ln -sf /var/www/html/cmon /home/cc/public_html/cmon

One more step is required for option #2: we have to allow Apache to follow symbolic links pointing outside of the user's directory. Create a .htaccess file under /home/cc/public_html with the following line:

# /home/cc/public_html/.htaccess
Options +FollowSymLinks -SymLinksIfOwnerMatch
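Creating the file and sanity-checking it can be done in one step. A sketch, using a throwaway directory here; set HTDOCS to /home/cc/public_html on the real host:

```shell
# Create the .htaccess that lets Apache follow the symlinks.
# HTDOCS is an assumed variable for this sketch.
HTDOCS="${HTDOCS:-$(mktemp -d)}"
cat > "$HTDOCS/.htaccess" <<'EOF'
Options +FollowSymLinks -SymLinksIfOwnerMatch
EOF
cat "$HTDOCS/.htaccess"
```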

Open the Apache site configuration file at /etc/httpd/conf.d/s9s.conf (RHEL/CentOS) or /etc/apache2/sites-enabled/001-s9s.conf (Debian/Ubuntu) in your favourite text editor and modify it as below (pay attention to the lines marked with ##):

<VirtualHost *:80>
    ServerName  ## 

    DocumentRoot /home/cc/public_html  ##

    ErrorLog /home/cc/logs/error.log  ##
    CustomLog /home/cc/logs/access.log combined  ##

    # ClusterControl SSH & events
    RewriteEngine On
    RewriteRule ^/clustercontrol/ssh/term$ /clustercontrol/ssh/term/ [R=301]
    RewriteRule ^/clustercontrol/ssh/term/ws/(.*)$ ws://$1 [P,L]
    RewriteRule ^/clustercontrol/ssh/term/(.*)$$1 [P]
    RewriteRule ^/clustercontrol/sse/events/(.*)$$1 [P,L]

    # Main Directories
    <Directory />
            Options +FollowSymLinks
            AllowOverride All
    </Directory>

    <Directory /home/cc/public_html>  ##
            Options +Indexes +FollowSymLinks +MultiViews
            AllowOverride All
            Require all granted
    </Directory>
</VirtualHost>

Similar modifications apply to the HTTPS configuration at /etc/httpd/conf.d/ssl.conf (RHEL/CentOS) or /etc/apache2/sites-enabled/001-s9s-ssl.conf (Debian/Ubuntu). Again, pay attention to the lines marked with ##:

<IfModule mod_ssl.c>
        <VirtualHost _default_:443>

               ServerName  ##
               ServerAdmin ##

               DocumentRoot /home/cc/public_html  ##

                # ClusterControl SSH & events
                RewriteEngine On
                RewriteRule ^/clustercontrol/ssh/term$ /clustercontrol/ssh/term/ [R=301]
                RewriteRule ^/clustercontrol/ssh/term/ws/(.*)$ ws://$1 [P,L]
                RewriteRule ^/clustercontrol/ssh/term/(.*)$$1 [P]
                RewriteRule ^/clustercontrol/sse/events/(.*)$$1 [P,L]

                <Directory />
                        Options +FollowSymLinks
                        AllowOverride All
                </Directory>

                <Directory /home/cc/public_html>  ##
                        Options +Indexes +FollowSymLinks +MultiViews
                        AllowOverride All
                        Require all granted
                </Directory>

                SSLEngine on
                SSLCertificateFile /etc/pki/tls/certs/s9server.crt # RHEL/CentOS
                SSLCertificateKeyFile /etc/pki/tls/private/s9server.key # RHEL/CentOS
                SSLCertificateFile /etc/ssl/certs/s9server.crt # Debian/Ubuntu
                SSLCertificateKeyFile /etc/ssl/private/s9server.key # Debian/Ubuntu

                <FilesMatch "\.(cgi|shtml|phtml|php)$">
                                SSLOptions +StdEnvVars
                </FilesMatch>

                <Directory /usr/lib/cgi-bin>
                                SSLOptions +StdEnvVars
                </Directory>

                BrowserMatch "MSIE [17-9]" ssl-unclean-shutdown

        </VirtualHost>
</IfModule>

Restart everything:

$ systemctl restart httpd # RHEL/CentOS
$ systemctl restart apache2 # Debian/Ubuntu
$ systemctl restart cmon cmon-events cmon-ssh cmon-cloud

Given the IP address of the ClusterControl host, or a domain name that resolves to it, you can access the ClusterControl UI via one of the following URLs:

You should see the ClusterControl login page at this point. The customization is now complete.

Potential Issues

Since the package manager simply executes the post-installation script during a package upgrade, the content of the new ClusterControl UI package (the actual package name is clustercontrol.x86_64) will be extracted into /var/www/html (the path is hard-coded inside the post-installation script). The following is what would happen:

$ ls -al /home/cc/public_html # our current installation
$ ls -al /var/www/html # empty
$ yum upgrade clustercontrol -y
$ ls -al /var/www/html # new files are extracted here

Therefore, if you use the symlink method (option #2), you may skip the following additional steps.

To complete the upgrade process for option #1, you have to replace the existing installation under the custom path with the new one manually. First, perform the upgrade operation:

$ yum upgrade clustercontrol -y # RHEL/CentOS
$ apt upgrade clustercontrol -y # Debian/Ubuntu

Move the existing installation somewhere safe. We will need the old bootstrap.php file later on:

$ mv /home/cc/public_html/clustercontrol /home/cc/public_html/clustercontrol_old

Move the new installation from the default path /var/www/html into the user's document root:

$ mv /var/www/html/clustercontrol /home/cc/public_html

Move bootstrap.php from the old installation to the new one:

$ mv /home/cc/public_html/clustercontrol_old/bootstrap.php /home/cc/public_html/clustercontrol

Get the new version string from bootstrap.php.default:

$ grep CC_UI_VERSION /home/cc/public_html/clustercontrol/bootstrap.php.default
define('CC_UI_VERSION', '');

Update the CC_UI_VERSION value inside bootstrap.php to the new version string using your favourite text editor:

$ vim /home/cc/public_html/clustercontrol/bootstrap.php

Save the file and the upgrade is now complete.
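The version bump can also be scripted rather than edited by hand. A sketch using sed, demonstrated here on throwaway files with made-up version numbers; point CC_DIR at your real clustercontrol directory instead:

```shell
# Sketch: copy CC_UI_VERSION from bootstrap.php.default into bootstrap.php.
# CC_DIR and the version strings are illustrative assumptions.
CC_DIR=$(mktemp -d)
echo "define('CC_UI_VERSION', '1.7.4');" > "$CC_DIR/bootstrap.php.default"
echo "define('CC_UI_VERSION', '1.7.3');" > "$CC_DIR/bootstrap.php"

# Extract the new version string, then write it into bootstrap.php.
NEW_VER=$(sed -n "s/.*CC_UI_VERSION', '\([^']*\)'.*/\1/p" "$CC_DIR/bootstrap.php.default")
sed -i "s/\(CC_UI_VERSION', '\)[^']*/\1$NEW_VER/" "$CC_DIR/bootstrap.php"
grep CC_UI_VERSION "$CC_DIR/bootstrap.php"   # → define('CC_UI_VERSION', '1.7.4');
```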

That's it, folks. Happy customizing!


by ashraf at December 05, 2019 04:18 PM

December 04, 2019


Best Practices for Archiving Your Database in the Cloud

With the technology available today, there is no excuse for failing to recover your data due to a lack of backup policies, or a lack of understanding of how vital it is to take backups as part of your daily, weekly, or monthly routine. Database backups must be taken on a regular basis as part of your overall disaster recovery strategy.

The technology for handling backups has never been more efficient, and many best practices have been adopted (or bundled) as part of the database technologies and services that offer them.

To some extent, people still don’t understand how to store data backups efficiently, nor the difference between data backups and archived data.

Archiving your data provides many benefits, especially in terms of efficiency: storage costs, data retrieval speed, data facility expenses, and payroll for the skilled people who maintain your backup storage and its underlying hardware. In this blog, we'll look at the best practices for archiving your data in the cloud.

Data Backups vs Data Archives

For some folks in the data tech industry, these two topics are often confused, especially by newcomers.

Data backups are copies of your physical, raw data, stored locally or offsite, which can be accessed in case of emergency or for data recovery. They are used to restore data when it is lost, corrupted, or destroyed.

Archived data, on the other hand, is data (possibly backup data) that is no longer actively used or is less critical to your business needs, such as stagnant data; it is not obsolete and still has value. In other words, data to be archived is still important, but doesn’t need to be accessed or modified frequently (if at all). Its purposes can include:

  • Reducing consumption of primary storage, so the data can sit on lower-performance machines, since archived data doesn't have to be retrievable every day or immediately.
  • Keeping your data infrastructure cost-efficient to maintain.
  • Worrying less about ever-growing data, especially data that is old or changes infrequently.
  • Avoiding large expenses for maintaining backup appliances or the software integrated into the backup system.
  • Meeting regulatory standards like HIPAA, PCI-DSS, or GDPR that require legacy data to be kept.

For databases specifically, archiving has very promising benefits:

  • It helps reduce data complexity; when data grows drastically, archiving keeps the size of your working data set under control.
  • It helps your daily, weekly, or monthly backups perform optimally, since they no longer have to process old or un-useful data (not useless data, just data that isn't useful for daily or frequent needs).
  • It helps your queries perform efficiently, and optimizer results can be more consistent, since queries no longer have to scan large amounts of old data.
  • Data storage space can be managed and controlled according to your data retention policy.

A data archive facility does not necessarily need the same power and resources as your data backup storage. Tape drives, magnetic disks, or optical drives can be used for data archiving purposes, since the stored data will be accessed infrequently, or not any time soon, yet must remain accessible when needed.

Additionally, the people involved in data archival need to identify what archived data means. Data archives should hold data that is not reproducible; data that can be re-generated or self-generated is another matter. If the records stored in the database are the result of deterministic calculations and are predictably reproducible, they can be re-generated if needed and may be excluded from your data archival.

Data Retention Standards

It's true that pruning the records stored in your database and moving them to your archives has some great benefits. It doesn't mean, however, that you are free to do this at will; it depends on your business requirements. In fact, different countries have laws and regulations that you are required to follow (or at least implement). You will need to determine what archived data means for your business application, and which data is infrequently accessed.

For example, healthcare providers are commonly required (depending on the country of origin) to retain patients' information for long periods of time. In finance, the rules likewise depend on the specific country. Verify what data you need to retain, so you can safely prune the rest for archival purposes and then store it in a safe, secure place.

The Data Life-Cycle

Data backups and data archives are usually taken side by side through a backup life-cycle process. This life-cycle has to be defined in your backup policy. Most backup policies define...

  • the schedule on which backups are taken (daily, weekly, monthly),
  • whether each backup is full or incremental,
  • the format of the backup: whether it is compressed or stored in an archive file format,
  • whether the data is encrypted or not,
  • the primary location to store the backup (locally on the same machine or over the local network),
  • the secondary location to store the backup (cloud storage, or in a colo),
  • and the data retention: how old data may become before it reaches its end-of-life and is destroyed.
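The retention step at the end of that life-cycle can be as simple as a scheduled find over the archive directory. A sketch, demonstrated on throwaway files; ARCHIVE_DIR and the 90-day window are assumptions to adjust to your own policy:

```shell
# Sketch: delete archived backups older than RETENTION_DAYS.
# Demonstrated with two throwaway archives: one fresh, one 120 days old.
ARCHIVE_DIR=$(mktemp -d)
RETENTION_DAYS=90
touch "$ARCHIVE_DIR/fresh.tar.gz"
touch -d '120 days ago' "$ARCHIVE_DIR/stale.tar.gz"

# Remove anything past its retention window.
find "$ARCHIVE_DIR" -type f -name '*.tar.gz' -mtime +"$RETENTION_DAYS" -delete
ls "$ARCHIVE_DIR"   # → fresh.tar.gz
```

In practice this would run from cron, and as the next sections show, cloud life-cycle policies can do the same job without any script at all.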

What Applications Need Data Archiving?

While everyone can enjoy the benefits of data archiving, certain fields regularly practice this process for managing and maintaining their data.

Government institutions fall into this category. Security and public safety data (such as video surveillance covering threats to personal, residential, social, and business safety) must be retained. This type of data has to be stored securely for years to come for forensic and investigative purposes.

Digital media companies often have to store large amounts of content, and these files are often very large. Digital libraries also have to store tons of data for research or public information.

Healthcare providers, including insurers, are required to retain large amounts of information on their patients for many years. Such data can grow quickly, and it can affect the efficiency of the database when it's not maintained properly.

Cloud Storage Options For Your Archived Data

The top cloud companies are actively competing to offer you great features for storing your archived data in the cloud, starting with low prices and the flexibility of accessing your data off-site. Cloud storage is a useful and reliable off-site option for data backups and data archiving, especially because it's very cost-efficient: you don't need to maintain large amounts of hardware and storage services at your local or primary site, and you save on electricity bills as well.

These points matter because you might not need to access your archived data in real time, yet on certain occasions, especially when a recovery or investigation has to be done, you might need access to it urgently. Some vendors offer their customers access to their old data, but you may have to wait hours or days before they can provide access to download the archived data.

For example, AWS offers S3 Glacier, which provides great flexibility. You can store your data via S3, set up a life-cycle policy and define when your data reaches its end and will be destroyed. Check out the documentation on How Do I Create a Lifecycle Policy for an S3 Bucket?. The great thing about AWS S3 Glacier is that it is highly flexible. See the waterfall model below,

Image Courtesy of Amazon's Documentation "Transitioning Objects Using Amazon S3 Lifecycle".

At this level, you can store your backups in S3 and let the life-cycle process defined on that bucket handle data archival.
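Such a life-cycle can also be defined from the command line. Below is a sketch of a rule that transitions objects to Glacier after 30 days and expires them after a year; the bucket name, prefix, and day counts are assumptions, and actually applying it (the commented aws command) requires AWS CLI credentials:

```shell
# Sketch: write an S3 lifecycle rule and validate the JSON locally.
cat > /tmp/lifecycle.json <<'EOF'
{
  "Rules": [
    {
      "ID": "archive-then-expire",
      "Status": "Enabled",
      "Filter": {"Prefix": "backups/"},
      "Transitions": [{"Days": 30, "StorageClass": "GLACIER"}],
      "Expiration": {"Days": 365}
    }
  ]
}
EOF

# Applying it to a bucket (my-backup-bucket is a placeholder):
# aws s3api put-bucket-lifecycle-configuration \
#   --bucket my-backup-bucket --lifecycle-configuration file:///tmp/lifecycle.json

python3 -m json.tool /tmp/lifecycle.json > /dev/null && echo "lifecycle.json OK"
```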

If you're using GCP (Google Cloud Platform), they offer a similar approach. Check out their documentation on Object Lifecycle Management. GCP uses a TTL (Time-to-Live) approach for retaining objects stored in Cloud Storage. The great thing about the GCP offering is its archival Cloud Storage, with the Nearline and Coldline storage classes.

Coldline is ideal for data that is modified or accessed less than once a year, whereas Nearline suits more frequent access (modified around once a month, and possibly read multiple times throughout the year). Data stored under a life-cycle policy can still be accessed with sub-second latency.

With Microsoft Azure, the offering is plain and simple. Azure offers much the same as GCP and AWS, and lets you move ("rehydrate") your archived data into the hot or cool tiers. You may be able to prioritize a rehydration request when needed, but it comes at a higher price than a standard request. Check out their documentation on Rehydrate blob data from the archive tier.

Overall, the cloud provides a hassle-free way to store your archived data. You just need to define your requirements, and of course the costs involved, when determining which cloud to use.

Best Practices for Your Archived Data in the Cloud

Now that we have covered the differences between data backups and archived data (or data archives), and some of the top cloud vendors' offerings, let's look at the best practices to follow when archiving to the cloud.

  • Identify the type of data to be archived. As stated earlier, data backups are not archived data, though your data backups can become archives. Archived data is data that is stagnant, old, and infrequently accessed. Identify this data first, and tag or label it so you can find it once it's stored off-site.
  • Determine Data Access Frequency. Before anything is archived, determine how frequently you will need to access the archived data; prices differ according to how fast you need your data back. For example, Amazon S3 charges more for Expedited retrievals using provisioned capacity instead of on-demand, and likewise Microsoft Azure charges more to rehydrate archived data with high priority.
  • Ensure Multiple Copies Are Spread. Yes, you read that correctly: even for archived or stagnant data, you still need to ensure that your copies are highly available and highly durable when needed. The cloud vendors mentioned earlier offer SLAs that describe how they store data for efficiency and fast accessibility. When configuring your life-cycle policy or backup policy, make sure you store it in multiple regions or replicate your archived data to a different region. Most of these tech-giant cloud vendors back their archival cloud storage offerings with multiple zones, for high durability and scalability whenever data retrieval is requested.
  • Data Compliance. Ensure that data compliance and regulations are followed, and do so during the initial phase rather than later. Unless the data doesn't touch customer profiles and is just business-logic data and history, destroying it may be harmless, but it's better to keep everything in accord with regulations.
  • Provider standards. Choose the right cloud backup and data-retention provider. Walking the path of online data archiving and backup with an experienced service provider can save you from unrecoverable data loss. The top three cloud tech giants are obvious choices, but you're also free to choose other promising cloud vendors such as Alibaba, IBM, or Oracle Archive Storage. It can be best to try a provider out before making your final decision.

Data Archiving Tools and Software

Databases using MariaDB, MySQL, or Percona Server can benefit from pt-archiver. pt-archiver has been widely used for almost a decade and allows you to prune your data while archiving it. For example, removing orphan records can be done with:

pt-archiver --source h=host,D=db,t=child --purge \
  --where 'NOT EXISTS(SELECT * FROM parent WHERE col=child.col)'

or you can send the rows to a different host, such as an OLAP server:

pt-archiver --source h=oltp_server,D=test,t=tbl --dest h=olap_server \
  --file '/var/log/archive/%Y-%m-%d-%D.%t'                           \
  --where "1=1" --limit 1000 --commit-each

For PostgreSQL or TimescaleDB, you can use a CTE (Common Table Expression) to achieve this. For example, first swap in a fresh table in place of the old one:

CREATE TABLE public.user_info_new (LIKE public.user_info INCLUDING ALL);

ALTER TABLE public.user_info_new OWNER TO sysadmin;

GRANT select ON public.user_info_new TO read_only;

GRANT select, insert, update, delete ON public.user_info TO user1;

GRANT all ON public.user_info TO admin;

ALTER TABLE public.user_info INHERIT public.user_info_new;

BEGIN;

ALTER TABLE public.user_info RENAME TO user_info_old;

ALTER TABLE public.user_info_new RENAME TO user_info;

COMMIT;  -- (or ROLLBACK; if there's a problem)

Then, move rows in batches:

WITH row_batch AS (
    SELECT id FROM public.user_info_old
    WHERE updated_at >= '2016-10-18 00:00:00'::timestamp LIMIT 20000 ),
delete_rows AS (
    DELETE FROM public.user_info_old u USING row_batch b WHERE =
    RETURNING, u.account_id, u.created_at, u.updated_at, u.resource_id, u.notifier_id, u.notifier_type)
INSERT INTO public.user_info SELECT * FROM delete_rows;

Using CTEs with PostgreSQL might incur performance issues, so you might have to run this during non-peak hours. See this external blog for caveats of using CTEs with PostgreSQL.

For MongoDB, you can use mongodump with the --archive parameter, like below:

mongodump --archive=test.$(date +"%Y_%m_%d").archive --db=test

This will dump an archive file named test.<current-date>.archive.

Using ClusterControl for Data Archival

ClusterControl allows you to set a backup policy and upload data off-site to your desired cloud storage location. ClusterControl supports the top three clouds (AWS, GCP, and Microsoft Azure). Please check out our previous blog on Best Practices for Database Backups to learn more.

With ClusterControl, you can take a backup by first defining the backup policy, choosing the database, and archiving the table just like below...

Make sure that "Upload Backup to the cloud" is enabled or checked, just like above. Define the backup settings and set the retention,

Then define the cloud settings just like below.

For the selected bucket, ensure that you have set up lifecycle management. In this scenario, we're using AWS S3. To set up the lifecycle rule, select the bucket, then go to the Management tab just like below,

then setup the lifecycle rules as follows,

then ensure its transitions,

In the example above, we're ensuring the transition will go to Amazon S3 Glacier, which is our best choice to retain archived data.

Once you are done setting up, you're good to go to take the backup. Your archived data will follow the lifecycle you have set up within AWS, in this example. If you use GCP or Microsoft Azure, the process is the same: set up the backup along with its lifecycle.


Adopting best practices for archiving your data into the cloud can be cumbersome at the beginning; however, with the right set of tools or bundled software, implementing the process becomes much easier.


by Paul Namuag at December 04, 2019 09:16 PM

December 03, 2019

MariaDB Foundation

MariaDB 10.5.0 now available

The MariaDB Foundation is pleased to announce the availability of MariaDB 10.5.0, the first alpha release in the new MariaDB 10.5 development series. […]


by Ian Gilfillan at December 03, 2019 11:47 PM


How to Backup an Encrypted Database with Percona Server for MySQL 8.0

Production interruptions are nearly guaranteed to happen at some point. Knowing this, we plan backups, create standby recovery databases, and convert single instances into clusters.

Accepting the need for a proper recovery scenario, we must analyze the possible disaster timelines and failure scenarios, and implement the steps needed to bring your database up. Executing a planned outage can help you prepare for, diagnose, and recover from the next unplanned one. To mitigate the impact of downtime, organizations need an appropriate recovery plan, covering all factors required to bring services back to life.

Backup management is not as simple as just scheduling a backup job. There are many factors to consider, such as retention, storage, verification, whether the backups you are taking are physical or logical, and, what is easy to overlook, security.

Many organizations vary their approach to backups, maintaining a combination of server image backups (snapshots) and logical and physical backups stored in multiple locations. This avoids any local or regional disaster wiping out both the databases and the backups stored in the same data center.

We also want it to be secure: data and backups should be encrypted. But there are many implications when both options are in place. In this article, we will take a look at backup procedures for encrypted databases.

Encryption-at-Rest for Percona Server for MySQL 8.0

Starting with MySQL 5.7.11, the community version of MySQL began supporting InnoDB tablespace encryption, called Transparent Tablespace Encryption or Encryption-at-Rest.

The main difference from the enterprise version is the way the keys are stored: keys are not located in a secure vault, which is required for regulatory compliance. The same applies to Percona Server: starting with version 5.7.11, it is possible to encrypt InnoDB tablespaces. In Percona Server 8.0, encryption support, including for binary logs, has been greatly extended. Version 8.0 added (per the Percona 8.0 release documentation):

  • Temporary File Encryption
  • InnoDB Undo Tablespace Encryption
  • InnoDB System Tablespace Encryption
  • default_table_encryption =OFF/ON (General Tablespace Encryption)
  • table_encryption_privilege_check =OFF/ON (Verifying the Encryption Settings)
  • InnoDB redo log encryption (for master key encryption only) (Redo Log Encryption)
  • InnoDB merge file encryption (Verifying the Encryption Setting)
  • Percona parallel doublewrite buffer encryption (InnoDB Tablespace Encryption)

For those interested in migration from the MySQL Enterprise version to Percona: it is also possible to integrate with a HashiCorp Vault server via the keyring_vault plugin, matching the features available in Oracle’s MySQL Enterprise edition.

Data-at-rest encryption requires a keyring plugin. There are two options here: the keyring_file plugin and the keyring_vault plugin.

How to Enable Tablespace Encryption

To enable encryption, start your database with the --early-plugin-load option:

either by hand:

$ mysqld --early-plugin-load=""

or by modifying the configuration file:


Starting with Percona Server 8.0, two types of tablespaces can be encrypted: general tablespaces and the system tablespace. The system tablespace is controlled via the innodb_sys_tablespace_encrypt parameter. By default, the system tablespace is not encrypted, and if you already have one, it's not possible to convert it to an encrypted state; a new instance must be created (start an instance with the --bootstrap option).

General tablespaces support encryption of either all tables in the tablespace or none; it's not possible to run encryption in mixed mode. To create a tablespace with encryption, use the ENCRYPTION='Y/N' flag:


mysql> CREATE TABLESPACE severalnines ADD DATAFILE 'severalnines.ibd' ENCRYPTION='Y';

Backing up an Encrypted Database

When you have encrypted tablespaces, it's necessary to make the keyring file available to the xtrabackup command. To do that, specify the path to the keyring file as the value of the --keyring-file-data option:

$ xtrabackup --backup --target-dir=/u01/mysql/data/backup/ --user=root --keyring-file-data=/u01/secure_location/keyring_file

Make sure to store the keyring file in a secure location, and make sure to always have a backup of it. Xtrabackup will not copy the keyring file into the backup directory; to prepare the backup, you need to make a copy of the keyring file yourself.

Preparing the Backup

Once we have our backup, we should prepare it for recovery. Here you also need to specify --keyring-file-data:

$ xtrabackup --prepare --target-dir=/u01/mysql/data/backup/ --keyring-file-data=/u01/secure_location/keyring_file

The backup is now prepared and can be restored with the --copy-back option. In the case that the keyring has been rotated, you will need to restore the keyring (which was used to take and prepare the backup).

In order for xtrabackup to prepare the backup, it needs access to the keyring. Xtrabackup doesn’t talk directly to the MySQL server and doesn’t read the default my.cnf configuration file during prepare, so when using the keyring_vault plugin, specify the keyring settings via the command line:

$ xtrabackup --prepare --target-dir=/data/backup --keyring-vault-config=/etc/vault.cnf

The backup is now prepared and can be restored with the --copy-back option:

$ xtrabackup --copy-back --target-dir=/u01/backup/ --datadir=/u01/mysql/data/

Performing Incremental Backups

The process of taking incremental backups with InnoDB tablespace encryption is similar to taking the same incremental backups with an unencrypted tablespace.

To make an incremental backup, begin with a full backup. The xtrabackup binary writes a file called xtrabackup_checkpoints into the backup’s target directory. This file contains a line showing the to_lsn, which is the database’s LSN at the end of the backup.

First, you need to create a full backup with the following command:

$ xtrabackup --backup --target-dir=/data/backups/base --keyring-file-data=/var/lib/mysql-keyring/keyring
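After the full backup finishes, the xtrabackup_checkpoints file in the target directory might look like this (the LSN values here are illustrative, not from a real run):

```shell
$ cat /data/backups/base/xtrabackup_checkpoints
backup_type = full-backuped
from_lsn = 0
to_lsn = 1626007
last_lsn = 1626007
```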

Now that you have a full backup, you can make an incremental backup based on it. Use a command such as the following:

$ xtrabackup --backup --target-dir=/data/backups/inc1 \
--incremental-basedir=/data/backups/base \
--keyring-file-data=/var/lib/mysql-keyring/keyring

The /data/backups/inc1/ directory should now contain delta files, such as ibdata1.delta and test/test.ibd.delta.

The meaning should be self-evident. It’s now possible to use this directory as the base for yet another incremental backup:

$ xtrabackup --backup --target-dir=/data/backups/inc2 \
--incremental-basedir=/data/backups/inc1 \
--keyring-file-data=/var/lib/mysql-keyring/keyring

Preparing Incremental Backups

So far the process of backing up the database has been similar to a regular backup, except for the flag specifying the location of the keyring file. 

Unfortunately, the --prepare step for incremental backups is not the same as for normal backups.

In normal backups, two types of operations are performed to make the database consistent: committed transactions are replayed from the log file against the data files, and uncommitted transactions are rolled back. You must skip the rollback of uncommitted transactions when preparing a backup, because transactions that were uncommitted at the time of your backup may be in progress, and it’s likely that they will be committed in the next incremental backup. You should use the --apply-log-only option to prevent the rollback phase.

If you do not use the --apply-log-only option to prevent the rollback phase, then your incremental backups will be useless. After transactions have been rolled back, further incremental backups cannot be applied.

Beginning with the full backup you created, you can prepare it and then apply the incremental differences to it. Recall that you have the following backups:

  • /data/backups/base - the full (base) backup
  • /data/backups/inc1 - the first incremental backup
  • /data/backups/inc2 - the second incremental backup
To prepare the base backup, you need to run --prepare as usual, but prevent the rollback phase:

$ xtrabackup --prepare --apply-log-only --target-dir=/data/backups/base --keyring-file-data=/var/lib/mysql-keyring/keyring

To apply the first incremental backup to the full backup, you should use the following command:

$ xtrabackup --prepare --apply-log-only --target-dir=/data/backups/base \
--incremental-dir=/data/backups/inc1 \
--keyring-file-data=/var/lib/mysql-keyring/keyring

Note that if the keyring has been rotated between the base and incremental backups, you’ll need to use the keyring that was in use when the first incremental backup was taken.

Preparing the second incremental backup is a similar process:

$ xtrabackup --prepare --target-dir=/data/backups/base \
--incremental-dir=/data/backups/inc2 \
--keyring-file-data=/var/lib/mysql-keyring/keyring

Note: --apply-log-only should be used when merging all incrementals except the last one. That’s why the previous line doesn’t contain the --apply-log-only option. Even if --apply-log-only were used on the last step, the backup would still be consistent, but in that case the server would perform the rollback phase.
The last step is to restore the backup with the --copy-back option. In case the keyring has been rotated, you’ll need to restore the keyring which was used to take and prepare the backup.

While the described restore method works, it requires access to the same keyring that the server is using. That may not be possible if the backup is prepared on a different server or at a much later time, when keys in the keyring have been purged, or, in the case of a malfunction, when the keyring vault server is not available at all.

The --transition-key=<passphrase> option should be used to make it possible for xtrabackup to process the backup without access to the keyring vault server. In this case, xtrabackup derives the AES encryption key from the specified passphrase and will use it to encrypt tablespace keys of tablespaces that are being backed up.

Creating a Backup with a Passphrase

The following example illustrates how the backup can be created in this case:

$ xtrabackup --backup --user=root -p --target-dir=/data/backup \
--transition-key=MySecretKey

Restoring the Backup with a Generated Key

When restoring a backup you will need to generate a new master key. Here is the example for keyring_file:

$ xtrabackup --copy-back --target-dir=/data/backup --datadir=/data/mysql \
--transition-key=MySecretKey --generate-new-master-key \
--keyring-file-data=/var/lib/mysql-keyring/keyring

In case of keyring_vault, it will look like this:

$ xtrabackup --copy-back --target-dir=/data/backup --datadir=/data/mysql \
--transition-key=MySecretKey --generate-new-master-key \
--keyring-vault-config=/etc/vault.cnf

by Bart Oles at December 03, 2019 06:37 PM

December 02, 2019


Clustered Database Node Failure and its Impact on High Availability

A node crash can happen at any time; it is unavoidable in any real-world situation. Back when giant, standalone databases roamed the data world, each fall of such a titan created ripples of issues that spread across that world. Nowadays the data world has changed: few of the titans survived, and they were replaced by swarms of small, agile, clustered database instances that can adapt to ever-changing business requirements. 

One example of such a database is Galera Cluster, which is (typically) deployed as a cluster of nodes. What changes if one of the Galera nodes fails? How does this affect the availability of the cluster as a whole? In this blog post we will dig into this and explain the basics of Galera high availability.

Galera Cluster and Database High Availability

Galera Cluster is typically deployed in clusters of at least three nodes. This is because Galera uses a quorum mechanism to ensure that the cluster state is clear to all of the nodes and that automated fault handling can happen. For that, three nodes are required - more than 50% of the nodes have to be alive after a node’s crash in order for the cluster to be able to operate.

Galera Cluster

Let’s assume you have a three-node Galera Cluster, just as in the diagram above. If one node crashes, the situation changes into the following:

Node “3” is off, but nodes “1” and “2” constitute 66% of all nodes in the cluster. This means those two nodes can continue to operate and form a cluster. Node “3” (if it happens to be alive but cannot connect to the other nodes in the cluster) will account for 33% of the nodes in the cluster, thus it will cease to operate.

We hope this is now clear: three nodes are the minimum. With two nodes, each would be 50% of the nodes in the cluster, thus neither would have a majority - such a cluster does not provide HA. What if we added one more node?

Such setup allows also for one node to fail:

In such a case we have three (75%) nodes up and running, which is the majority. What would happen if two nodes failed?

Two nodes are up, two are down. Only 50% of the nodes are available; there is no majority, thus the cluster has to cease its operations. The minimal cluster size to support the failure of two nodes is five nodes:

In the case above, two nodes are off and three remain, which makes 60% available; thus the majority is reached and the cluster can operate.

To sum up, three nodes are the minimum cluster size to allow for one node to fail. A cluster should have an odd number of nodes - this is not a requirement, but as we have seen, increasing the cluster size from three to four did not make any difference to high availability - still only one failure at a time is allowed. To make the cluster more resilient and support two node failures at the same time, the cluster size has to be increased from three to five. If you want to increase the cluster's ability to handle failures even further, you have to add another two nodes.
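The quorum arithmetic above can be sketched in a few lines of shell (a minimal illustration of the majority rule, not anything Galera itself runs):

```shell
#!/bin/sh
# Does a cluster of n nodes keep quorum after f node failures?
# Quorum requires strictly more than 50% of the original nodes alive.
has_quorum() {
  n=$1; f=$2
  alive=$((n - f))
  if [ $((2 * alive)) -gt "$n" ]; then echo "yes"; else echo "no"; fi
}

has_quorum 3 1   # yes: 2 of 3 nodes (66%) remain
has_quorum 4 2   # no:  2 of 4 nodes is exactly 50%, not a majority
has_quorum 5 2   # yes: 3 of 5 nodes (60%) remain
```

This makes it easy to see why going from three to four nodes buys nothing: both sizes tolerate exactly one failure.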

Impact of Database Node Failure on the Cluster Load

In the previous section we discussed the basic math of high availability in Galera Cluster: one node can be off in a three-node cluster, two in a five-node cluster. This is a basic requirement for Galera. 

You also have to keep other aspects in mind. We’ll take a quick look at them now, starting with the load on the cluster. 

Let’s assume all nodes have been created equal: same configuration, same hardware, able to handle the same load. Having load on one node only doesn’t make much sense cost-wise on a three-node cluster (not to mention five-node clusters or larger). You can safely expect that if you invest in three or five Galera nodes, you want to utilize all of them. This is quite easy - load balancers can distribute the load across all Galera nodes for you. You can send the writes to one node and balance reads across all nodes in the cluster. This poses an additional threat you have to keep in mind: how will the load look if one node is taken out of the cluster? Let’s take a look at the following case of a five-node cluster.

We have five nodes, each handling 50% load. This is quite ok; nodes are fairly loaded yet they still have some capacity to accommodate unexpected spikes in the workload. As we discussed, such a cluster can handle up to two node failures. Ok, let’s see how this would look:

Two nodes are down; that’s ok, Galera can handle it. 100% of the load has to be redistributed across the three remaining nodes. This makes a total of 250% of the load distributed across three nodes. As a result, each of them will be running at 83% of its capacity. This may be acceptable, but 83% load on average means that the response time will increase, queries will take longer, and any spike in the workload will most likely cause serious issues. 
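The redistribution above is simple arithmetic; a small shell sketch (illustrative only) makes it easy to play with other cluster sizes and utilization levels:

```shell
#!/bin/sh
# Per-node utilization after `failed` of `n` nodes go down,
# assuming each node initially carries `load` percent.
per_node_load() {
  awk -v n="$1" -v failed="$2" -v load="$3" \
    'BEGIN { printf "%.0f%%\n", n * load / (n - failed) }'
}

per_node_load 5 0 50   # 50%  - all five nodes up
per_node_load 5 2 50   # 83%  - 250% of load spread over three nodes
per_node_load 5 3 50   # 125% - more than the two survivors can handle
```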

Will our five-node cluster (with 50% utilization of all nodes) really be able to handle the failure of two nodes? Well, not really. It will definitely not be as performant as the cluster before the crashes. It may survive, but its availability may be seriously affected by temporary spikes in the workload.

You also have to keep in mind one more thing - failed nodes will have to be rebuilt. Galera has an internal mechanism that allows it to provision nodes which join the cluster after a crash. It can either be IST, incremental state transfer, when one of the remaining nodes has the required data in its gcache. If not, a full data transfer will have to happen - all data will be transferred from one node (the donor) to the joining node. This process is called SST - state snapshot transfer. Both IST and SST require some resources: data has to be read from disk on the donor and then transferred over the network. IST is more lightweight; SST is much heavier as all the data has to be read from disk on the donor. No matter which method is used, some additional CPU cycles will be burnt. Will the 17% of free resources on the donor be enough to run the data transfer? It will depend on the hardware. Maybe, maybe not. What doesn’t help is that most of the proxies, by default, remove the donor node from the pool of nodes to send traffic to. This makes perfect sense - a node in “Donor/Desync” state may lag behind the rest of the cluster. 

When using Galera, which is a virtually synchronous cluster, we don’t expect nodes to lag - that could be a serious issue for the application. On the other hand, in our case, removing the donor from the pool of load-balanced nodes ensures that the cluster will be overloaded (250% of the load distributed across two nodes only comes to 125% of each node’s capacity, which is more than they can handle). This would make the cluster not available at all.


As you can see, high availability in the cluster is not just a matter of quorum calculation. You have to account for other factors like the workload, its change over time, and the handling of state transfers. When in doubt, test it yourself. We hope this short blog post helped you to understand that high availability is quite a tricky subject even when discussed based on only two variables - the number of nodes and each node’s capacity. Understanding this should help you design better and more reliable HA environments with Galera Cluster.

by krzysztof at December 02, 2019 08:40 PM

November 30, 2019

Valeriy Kravchuk

Fun with Bugs #90 - On MySQL Bug Reports I am Subscribed to, Part XXIV

The previous post in this series was published 3 months ago, and the last bug from it, Bug #96340, is already closed as fixed in the upcoming MySQL 8.0.19. I've picked up 50+ more bugs to follow since that time, so I think I should send a quick status update about interesting public MySQL bug reports that are still active.

As usual I concentrate mostly on InnoDB, replication and optimizer bugs. Here is the list, starting from the oldest:
  • Bug #96374  - "binlog rotation deadlock when innodb concurrency limit setted". This bug was reported by Jia Liu, who used gdb to show threads deadlock details. I admit that recently more bug reporters use gdb and sysbench with custom(ized) Lua scripts to prove the point, and I am happy to see this happening!
  • Bug #96378 - "Subquery with parameter is exponentially slower than hard-coded value". In my primitive test with user variables replaced by constants (on MariaDB 10.3.7) I get the same plan for the query, so I am not 100% sure that the analysis by my dear friend Sinisa Milivojevic was right and it's about optimization (and not comparing values with different collations, for example). But anyway, this problem reported by Jeff Johnson ended up as a verified feature request. Let's see what may happen to it next.
  • Bug #96379 - "First query successful, second - ERROR 1270 (HY000): Illegal mix of collations ". This really funny bug was reported by Владислав Сокол.
  • Bug #96400 - "MTS STOP SLAVE takes over a minute when master crashed during event logging". Nice bug report by Przemyslaw Malkowski from Percona, who used sysbench and dbdeployer to demonstrate the problem. Later Przemysław Skibiński (also from Percona) provided a patch to resolve the problem.
  • Bug #96412 - "Mess usages of latch meta data for InnoDB latches (mutex and rw_lock)". Fungo Wang had to make a detailed code analysis to get this bug verified. I am not sure why it ended up with severity S6 (Debug Builds) though.
  • Bug #96414 - "CREATE TABLE events in wrong order in a binary log.". This bug was reported by Iwo P. His test case to demonstrate the problem included a small source code modification, but (unlike with some other bug reports) this had NOT prevented accepting it as a true, verified bug. The bug does not affect MySQL 8.0.3+ thanks to WL#6049 "Meta-data locking for FOREIGN KEY tables" implemented there.
  • Bug #96472 - "Memory leak after 'innodb.alter_crash'". Yet another bug affecting only MySQL 5.7 and not MySQL 8.0. It was reported by Yura Sorokin from Percona.
  • Bug #96475 - "ALTER TABLE t IMPORT TABLESPACE blocks SELECT on I_S.tables.".  Clear and simple "How to repeat" instructions (using dbdeployer) by Jean-François Gagné. See also his related Bug #96477 - "FLUSH TABLE t FOR EXPORT or ALTER TABLE t2 IMPORT TABLESPACE broken in 8.0.17" for MySQL 8. The latter is a regression bug (without a regression tag), and I just do not get how the GA releases with such new bugs introduced may happen.
  • Bug #96504 - "Refine atomics and barriers for weak memory order platform". Detailed analysis, with links to code etc from Cai Yibo.
  • Bug #96525 - "Huge malloc when open file limit is high". Looks more like a systemd problem (in versions < 240) to me. Anyway, useful report from Andreas Hasenack.
  • Bug #96615 - "mysql server cannot handle write operations after set system time to the past". A lot of arguments were needed to get this verified, but Shangshang Yu was not going to give up. It's the first time I've seen gstack used in a bug report to get a stack trace quickly. It's part of the gdb RPM on CentOS 6+. I have to try it vs gdb and pstack one day and decide what is the easiest and most efficient way to get backtraces of all threads in production...
  • Bug #96637 - "Clone fails on just upgraded server from 5.7". I had not used MySQL 8 famous clone plugin yet in practice, but I already know that it has bugs. This bug was reported by Satya Bodapati, who also suggested a patch.
  • Bug #96644 - "Set read_only on a master waiting for semi-sync ACK blocked on global read lock". Yet another problem (documented limitation) report from Przemyslaw Malkowski. Not sure why it was not verified on MySQL 8.0. Without a workaround to set master to read only it is unsafe to use long rpl_semi_sync_master_timeout values, as we may end up with that long downtime.
  • Bug #96677 - ""SELECT ... INTO var_name FOR UPDATE" not working in MySQL 8". This regression bug was reported by Vinodh Krish. Some analysis and patch were later suggested by Zsolt Parragi.
  • Bug #96690 - "sql_require_primary_key should not apply to temporary tables". This bug was also reported by Przemyslaw Malkowski from Percona. It ended up as a verified feature request, but not everyone in community is happy with this. Let me quote:
    "[30 Aug 8:08] Jean-François Gagné
    Could we know what was the original severity of this bug as reported by Przemyslaw ? This is now hidden as it has been reclassified as S4 (Feature Request).

    From my point of view, this is actually a bug, not a feature request and it should be classified as S2. A perfectly working application would break for no reason when a temporary table does not have a Primary Key, so this is actually a big hurdle for using sql_require_primary_key, hence serious bug in the implementation of this otherwise very nice and useful feature.
That's all about bugs I've subscribed to in summer.
Winter is coming, so why not remember nice warm sunny days and the interesting MySQL bugs reported back then.
To summarize:
  1. We still see some strange "games" played during bug processing and a trend to decrease the severity of reports. I think this is a waste of time for both Oracle engineers and community bug reporters.
  2. I am still not sure if Oracle's QA does not use ASan or just ignore problems reported for MTR test cases. Anyway, Percona engineers do this for them, and report related bugs :)
  3. dbdeployer and sysbench are really popular among MySQL bug reporters recently!
  4. Importing of InnoDB tablespaces is broken in MySQL 8.0.17+ at least.
  5. There are many interesting MySQL bugs reported during the last 3 months, so I expect more posts in this series soon.

by Valerii Kravchuk ( at November 30, 2019 04:53 PM

Oli Sennhauser

Migration from MySQL 5.7 to MariaDB 10.4

Up to version 5.5, MariaDB and MySQL could be considered "the same" database. The official wording at the time was "drop-in replacement". But now, a few years later, times and features have changed, and the official wording has slightly changed to just "compatible".
FromDual recommends that you consider MariaDB 10.3 and MySQL 8.0 as completely different database products (with some common roots) nowadays. Thus you should work and act accordingly.

Because more and more FromDual customers are considering a migration from MySQL to MariaDB, we tested some migration paths to find the pitfalls. One upgrade of some test schemas led to the following warnings:

# mysql_upgrade --user=root
MariaDB upgrade detected
Phase 1/7: Checking and upgrading mysql database
Processing databases
mysql.columns_priv                                 OK
mysql.user                                         OK
Phase 2/7: Installing used storage engines
Checking for tables with unknown storage engine
Phase 3/7: Fixing views from mysql
Error    : Table 'performance_schema.memory_summary_by_host_by_event_name' doesn't exist
status   : Operation failed
Error    : Column count of mysql.proc is wrong. Expected 21, found 20. Created with MariaDB 50723, now running 100407. Please use mysql_upgrade to fix this error
error    : Corrupt
Error    : Table 'performance_schema.memory_summary_by_host_by_event_name' doesn't exist
Error    : View 'sys.x$host_summary' references invalid table(s) or column(s) or function(s) or definer/invoker of view lack rights to use them
error    : Corrupt
sys.x$waits_global_by_latency                      OK
Phase 4/7: Running 'mysql_fix_privilege_tables'
Phase 5/7: Fixing table and database names
Phase 6/7: Checking and upgrading tables
Processing databases
staging.sales                                      OK
Warning  : Row size too large (> 8126). Changing some columns to TEXT or BLOB or using ROW_FORMAT=DYNAMIC or ROW_FORMAT=COMPRESSED may help. In current row format, BLOB prefix of 768 bytes is stored inline.
status   : OK
Phase 7/7: Running 'FLUSH PRIVILEGES'

If you run the mysql_upgrade utility a 2nd time all issues are gone...

# mysql_upgrade --user=root --force

Some hints for upgrading

  • Make a backup first before you start!
  • Dropping the MySQL sys Schema before the upgrade and installing the MariaDB sys Schema again afterwards reduces noise a bit and leaves you with a working sys Schema again.
    The MariaDB sys Schema you can find at GitHub: FromDual / mariadb-sys .
  • It makes sense to read this document before you begin with the upgrade: MariaDB versus MySQL: Compatibility.


by Shinguz at November 30, 2019 01:17 PM

November 29, 2019

Federico Razzoli

The dangers of replication filters in MySQL

MySQL supports replication filters and binlog filters. These features are powerful, but dangerous. Here you'll find out the risks, and how to mitigate them.

by Federico Razzoli at November 29, 2019 11:43 AM


Automating MongoDB with SaltStack

Database deployment across multiple servers becomes more complex and time-consuming as new resources are added or changes are made. In addition, there is a likelihood of human error that may lead to catastrophic outcomes whenever the system is configured manually.  

A database deployment automation tool enables us to deploy a database across multiple servers, from development to production environments. The results of an automated deployment are reliable, more efficient, and predictable, and it also provides current state information about your nodes, which can be used to plan the resources you will need to add to your servers. With a well-managed deployment, the productivity of both development and operations teams improves, enabling the business to develop faster and accomplish more; thanks to easy, frequent deployments, the overall software setup will ultimately be better and function reliably for end users. 

MongoDB can be deployed manually, but the task becomes more and more cumbersome when you have to configure a cluster of many members hosted on different servers. We therefore need to resort to an automation tool that can save us the stress. Some of the available tools include Puppet, Chef, Ansible, and SaltStack.

The main benefits of deploying your MongoDB with any of these tools are:

  1. Time saving. Imagine having 50 nodes for your database and needing to update the MongoDB version on each. Going through the process manually would take ages. With an automation tool, you just need to write some instructions and issue a command to do the rest of the update for you. Developers will then have time to work on new features rather than fixing manual deployments.
  2. Reduced errors, hence customer satisfaction. New updates may introduce errors to a database system, especially if the configuration has to be done manually. With a tool like SaltStack, removing manual steps reduces human error, and frequent updates with new features will address customer needs, keeping the organization competitive.
  3. Lower configuration cost. With a deployment tool, anyone can deploy, even yourself, since the process itself is much easier. This eliminates the need for experts to do the work and reduces errors.

What is SaltStack

SaltStack is an open-source remote execution tool and a configuration management system developed in Python. 

The remote execution features are used to run commands on various machines in parallel with a flexible targeting system. If, for example, you have 3 server machines and you would like to install MongoDB on each, you can run the installation commands on these machines simultaneously from a master node. 

In terms of configuration management, a client-server interface is established to easily and securely transform the infrastructure components into the desired state.

SaltStack Architecture

The basic setup model for SaltStack is client-server, where the server can be referred to as the master and the clients as slaves (minions). The master issues commands or instructions as the controlling system; these are executed by the clients/minions, which are the controlled systems.

SaltStack Components

SaltStack is made up of the following components:

  1. Master: Responsible for issuing instructions to the minions and changing them to the desired state after execution.
  2. Minion: It is the controlled system which needs to be transformed into some desired state.
  3. Salt Grains: This is static data or metadata regarding the minion, constituting information like the model, serial number, memory capacity, and operating system. Grains are collected when the minion first connects to the master. They can be used for targeting a certain group of minions in relation to some aspect. For example, you can run a command stating: install MongoDB on all machines with a Windows operating system. 
  4. Execution Modules/instructions: These are Ad hoc commands issued to one or more target minions and are executed from the command line.
  5. Pillars: User-defined variables distributed among the minions. They are used for minion configuration, highly sensitive data, arbitrary data, and variables. Not all pillars are accessible to all minions; one can restrict which pillars are for a certain group of minions.
  6. State files: These are the core of the Salt State System (SLS) and represent the state in which the system should be. They are the equivalent of playbooks in Ansible, considering that they are also in YAML format, e.g.:

#/srv/salt/mongodbInstall.sls (file root)
install_mongodb:       # task id
  pkg.installed:       # state declaration
    - name: mongodb    # name of package to install
  7. Top file: Used to map groups of machines to the state files that should be applied to them, e.g. a top.sls in the file root:

#/srv/salt/top.sls
base:
  '*':
    - mongodbInstall

  8. Salt Proxy: This is a feature that enables controlling devices that cannot run a standard salt-minion. These include network gear with an API running on a proprietary OS, devices with CPU and memory limitations, or ones that cannot run minions due to security reasons. A Junos proxy has to be used for discovery, control, remote execution and state management of these devices.

SaltStack Installation

We can use the pip command to install SaltStack:

$ pip install salt

To confirm the installation, run the command $ salt --version and you should get something like salt 2019.2.2 (Fluorine)

Before connecting to the master, the minion requires a minimum configuration: the master IP address and a minion id, which the master will use to reference it. These configurations are done in the file /etc/salt/minion.

We can then run the master in various modes, i.e. as a daemon or in debug mode. For the daemon case you will have $ salt-master -d and for debug mode, $ salt-master -l debug. You will need to accept the minion’s key before starting it by running $ salt-key -a nameOfMinion. To list the available keys, run $ salt-key -l.

In the case of the minion, we can start it with $salt-minion -l debug.

For example, if we want to create a file on all the minions from the master, we can run the command:

$ salt "*" file.touch /tmp/salt_files/sample.text

All nodes will have a new sample.text file in the salt_files folder. The * option is used to refer to all minions. To target, for example, all minions whose id contains the string "minion", we use a glob expression as below: 

$ salt "minion*" file.touch /tmp/salt_files/sample.text

To see the metadata collected for a given minion, run:

$ salt 'minion1' grains.items

Setting up MongoDB with SaltStack

We can create a database called myAppdata using a setDatabase.sls file with the contents below: 


- service.mongodb.server.cluster



     mongodb_server_replica_set: myAppdata

     mongodb_myAppdata_password: myAppdataPasword

     mongodb_admin_password: cloudlab

     mongodb_shared_key: xxx





           enabled: true

           password: ${_param:mongodb_myAppdata_password}


           -  name: myAppdata

              password: ${_param:mongodb_myAppdata_password}

Starting a Single MongoDB Server 



    enabled: true



      port: 27017


      username: admin

      password: myAppdataPasword



        enabled: true

        encoding: 'utf8'


        - name: 'username'

          password: 'password'

Setting up a MongoDB Cluster with SaltStack



    enabled: true


      verbose: false

      logLevel: 1

      oplogLevel: 0


      user: admin

      password: myAppdataPasword

    master: mongo01


      - host:

        priority: 2

      - host:

      - host:

    replica_set: default

    shared_key: myAppdataPasword


Like ClusterControl, SaltStack is an automation tool that can be used to ease deployment and operations tasks. With an automation tool, there are fewer errors, reduced configuration time, and more reliable results.

by Onyancha Brian Henry at November 29, 2019 10:45 AM

November 28, 2019


How ClusterControl Performs Automatic Database Recovery and Failover

ClusterControl is programmed with a number of recovery algorithms to automatically respond to different types of common failures affecting your database systems. It understands different types of database topologies and database-related process management to help you determine the best way to recover the cluster. In a way, ClusterControl improves your database availability.

Some topology managers, like MHA, Orchestrator, and mysqlfailover, only cover cluster recovery; you have to handle node recovery yourself. ClusterControl supports recovery at both the cluster and node level.

Configuration Options

There are two recovery components supported by ClusterControl, namely:

  • Cluster - Attempt to recover a cluster to an operational state
  • Node - Attempt to recover a node to an operational state

These two components are the most important for making sure service availability is as high as possible. If you already have a topology manager on top of ClusterControl, you can disable the automatic recovery feature and let the other topology manager handle recovery for you. You have all the possibilities with ClusterControl. 

The automatic recovery feature can be enabled and disabled with a simple ON/OFF toggle, and it works for both cluster and node recovery. Green icons mean enabled and red icons mean disabled. The following screenshot shows where you can find it in the database cluster list:

There are 3 ClusterControl parameters that can be used to control the recovery behaviour. All parameters default to true (set with boolean integer 0 or 1):

  • enable_autorecovery - Enable cluster and node recovery. This parameter is the superset of enable_cluster_recovery and enable_node_recovery. If it's set to 0, the subset parameters will be turned off.
  • enable_cluster_recovery - ClusterControl will perform cluster recovery if enabled.
  • enable_node_recovery - ClusterControl will perform node recovery if enabled.
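For reference, these toggles live in the cmon configuration; a hypothetical snippet (the file path and cluster id are examples, adjust them to your installation) could look like:

```shell
# /etc/cmon.d/cmon_1.cnf (example path; adjust the cluster id)
enable_autorecovery=1       # superset: 0 turns off both settings below
enable_cluster_recovery=1   # let ClusterControl recover the cluster
enable_node_recovery=1      # let ClusterControl recover individual nodes
```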

Cluster recovery covers attempts to bring up the entire cluster topology. For example, a master-slave replication setup must have at least one master alive at any given time, regardless of the number of available slaves. ClusterControl attempts to correct the topology at least once for replication clusters, but indefinitely for multi-master replication like NDB Cluster and Galera Cluster.

Node recovery covers node-level issues, such as a node being stopped without ClusterControl's knowledge, e.g., via a system stop command from the SSH console, or a process being killed by the OOM killer.

Node Recovery

ClusterControl is able to recover a database node from intermittent failure by monitoring the process and the connectivity to the database nodes. For process monitoring, it works similarly to systemd: it makes sure the MySQL service is started and running unless you intentionally stopped it via the ClusterControl UI.

If the node comes back online, ClusterControl will establish a connection back to the database node and will perform the necessary actions. The following is what ClusterControl would do to recover a node:

  • It will wait up to 30 seconds for systemd/chkconfig/init to start the monitored services/processes
  • If the monitored services/processes are still down, ClusterControl will try to start the database service automatically.
  • If ClusterControl is unable to recover the monitored services/processes, an alarm will be raised.
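The three steps above can be sketched as a small decision routine (a hypothetical sketch, not ClusterControl's actual code; the callbacks are placeholders):

```python
import time

GRACE_PERIOD = 30  # seconds to let systemd/chkconfig/init bring the service back

def recover_node(is_running, start_service, raise_alarm, sleep=time.sleep):
    """Hypothetical sketch of the node-recovery flow described above.

    is_running:    callable returning True if the monitored process is up
    start_service: callable that attempts to start the database service
    raise_alarm:   callable invoked when recovery fails
    """
    # Step 1: wait for the init system to restart the service on its own
    sleep(GRACE_PERIOD)
    if is_running():
        return "recovered-by-init"
    # Step 2: try to start the database service ourselves
    start_service()
    if is_running():
        return "recovered-by-clustercontrol"
    # Step 3: give up and raise an alarm
    raise_alarm()
    return "alarm-raised"
```

The real controller also distinguishes a user-initiated shutdown (see the note below) from an unexpected one, and skips recovery in the former case.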

Note that if a database shutdown is initiated by the user, ClusterControl will not attempt to recover that particular node. It expects the user to start it again via the ClusterControl UI (Node -> Node Actions -> Start Node) or explicitly with an OS command.

The recovery includes all database-related services like ProxySQL, HAProxy, MaxScale, Keepalived, Prometheus exporters and garbd. Prometheus exporters deserve special attention: ClusterControl uses a program called "daemon" to daemonize the exporter process, and it connects to the exporter's listening port for health checks and verification. Thus, it's recommended to open the exporter ports from both the ClusterControl and the Prometheus server to avoid false alarms during recovery.
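A port-level health check like the one described can be reproduced in a few lines (a generic sketch, not ClusterControl's implementation; 9104 is mysqld_exporter's conventional port, adjust for your exporters):

```python
import socket

def exporter_reachable(host, port, timeout=3.0):
    """Return True if something is listening on the exporter's port,
    mirroring the kind of TCP health check described above."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, or timed out
        return False

# Example: exporter_reachable("db1.example.com", 9104)  -- hypothetical host
```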

Cluster Recovery

ClusterControl understands the database topology and follows best practices in performing the recovery. For a database cluster that comes with built-in fault tolerance like Galera Cluster, NDB Cluster and MongoDB Replicaset, the failover process is performed automatically by the database server via quorum calculation, heartbeat and role switching (if any). ClusterControl monitors the process, makes the necessary adjustments to the visualization, like reflecting the changes in the Topology view, and adjusts the monitoring and management components for the new role, e.g., a new primary node in a replica set.

For database technologies that do not have built-in fault tolerance with automatic recovery like MySQL/MariaDB Replication and PostgreSQL/TimescaleDB Streaming Replication, ClusterControl will perform the recovery procedures by following the best-practices provided by the database vendor. If the recovery fails, user intervention is required, and of course you will get an alarm notification regarding this.

In a mixed/hybrid topology, for example an asynchronous slave which is attached to a Galera Cluster or NDB Cluster, the node will be recovered by ClusterControl if cluster recovery is enabled.

Cluster recovery does not apply to a standalone MySQL server. However, it's recommended to turn on both node and cluster recovery for this cluster type in the ClusterControl UI.

MySQL/MariaDB Replication

ClusterControl supports recovery of the following MySQL/MariaDB replication setups:

  • Master-slave with MySQL GTID
  • Master-slave with MariaDB GTID
  • Master-slave without GTID (both MySQL and MariaDB)
  • Master-master with MySQL GTID
  • Master-master with MariaDB GTID
  • Asynchronous slave attached to a Galera Cluster

ClusterControl will respect the following parameters when performing cluster recovery:

  • enable_cluster_autorecovery
  • auto_manage_readonly
  • repl_password
  • repl_user
  • replication_auto_rebuild_slave
  • replication_check_binlog_filtration_bf_failover
  • replication_check_external_bf_failover
  • replication_failed_reslave_failover_script
  • replication_failover_blacklist
  • replication_failover_events
  • replication_failover_wait_to_apply_timeout
  • replication_failover_whitelist
  • replication_onfail_failover_script
  • replication_post_failover_script
  • replication_post_switchover_script
  • replication_post_unsuccessful_failover_script
  • replication_pre_failover_script
  • replication_pre_switchover_script
  • replication_skip_apply_missing_txs
  • replication_stop_on_error

For more details on each parameter, refer to the documentation page.

ClusterControl will obey the following rules when monitoring and managing a master-slave replication:

  • All nodes will be started with read_only=ON and super_read_only=ON (regardless of their role).
  • Only one master (read_only=OFF) is allowed to operate at any given time.
  • Rely on the MySQL variable report_host to map the topology.
  • If there are two or more nodes with read_only=OFF at the same time, ClusterControl will automatically set read_only=ON on both masters to protect them against accidental writes. User intervention is required to pick the actual master by disabling read-only mode: go to Nodes -> Node Actions -> Disable Readonly.
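The last rule can be sketched as follows (hypothetical pseudologic, not the actual cmon implementation):

```python
def enforce_single_writer(read_only_flags):
    """Sketch of the split-brain protection described above.

    read_only_flags maps hostname -> current read_only value (True = ON).
    If more than one node is writable, turn read_only back ON everywhere
    and leave the final choice of master to the user.
    """
    writers = [h for h, ro in read_only_flags.items() if not ro]
    if len(writers) > 1:
        for h in writers:
            read_only_flags[h] = True   # protect against accidental writes
        writers = []                    # user must pick the real master
    return read_only_flags, writers
```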

In case the active master goes down, ClusterControl will attempt to perform the master failover in the following order:

  1. After 3 seconds of master unreachability, ClusterControl will raise an alarm.
  2. Check slave availability; at least one of the slaves must be reachable by ClusterControl.
  3. Pick one of the slaves as the candidate master.
  4. ClusterControl will calculate the probability of errant transactions if GTID is enabled.
  5. If no errant transaction is detected, the chosen node will be promoted as the new master.
  6. Create and grant the replication user to be used by the slaves.
  7. Change master on all slaves that were pointing to the old master so they point to the newly promoted master.
  8. Start the slaves and enable read-only.
  9. Flush logs on all nodes.
  10. If the slave promotion fails, ClusterControl will abort the recovery job. User intervention or a cmon service restart is required to trigger the recovery job again.
  11. When the old master is available again, it will be started as read-only and will not be part of the replication. User intervention is required.
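Steps 2-3, together with the whitelist/blacklist parameters listed earlier, can be sketched like this (a simplified model; the real controller compares GTID sets and replication lag, not a single position number):

```python
def pick_failover_candidate(slaves, whitelist=None, blacklist=None):
    """Hypothetical sketch of candidate selection honoring
    replication_failover_whitelist / replication_failover_blacklist.

    slaves maps hostname -> a comparable replication position
    (higher = more up to date).
    """
    candidates = dict(slaves)
    if whitelist:   # whitelist takes precedence: only listed hosts qualify
        candidates = {h: p for h, p in candidates.items() if h in whitelist}
    elif blacklist:
        candidates = {h: p for h, p in candidates.items() if h not in blacklist}
    if not candidates:
        return None  # no eligible slave: the recovery job aborts, alarm raised
    # promote the most up-to-date remaining slave
    return max(candidates, key=candidates.get)
```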

At the same time, the relevant alarms will be raised.

Check out Introduction to Failover for MySQL Replication - the 101 Blog and Automatic Failover of MySQL Replication - New in ClusterControl 1.4 to get further information on how to configure and manage MySQL replication failover with ClusterControl.

PostgreSQL/TimescaleDB Streaming Replication

ClusterControl supports recovery of PostgreSQL/TimescaleDB streaming replication setups (a master with one or more standby nodes).

ClusterControl will respect the following parameters when performing cluster recovery:

  • enable_cluster_autorecovery
  • repl_password
  • repl_user
  • replication_auto_rebuild_slave
  • replication_failover_whitelist
  • replication_failover_blacklist

For more details on each parameter, refer to the documentation page.

ClusterControl will obey the following rules for managing and monitoring a PostgreSQL streaming replication setup:

  • wal_level is set to "replica" (or "hot_standby" depending on the PostgreSQL version).
  • Variable archive_mode is set to ON on the master.
  • A recovery.conf file is set up on the slave nodes, turning them into hot standbys with read-only enabled.
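As an illustration, the settings above map to configuration along these lines (hostnames and credentials are placeholders; note that recovery.conf applies to PostgreSQL versions before 12, which replaced it with standby.signal):

```ini
# postgresql.conf on the master
wal_level = replica          # 'hot_standby' on older PostgreSQL versions
archive_mode = on

# recovery.conf on a standby (PostgreSQL < 12)
standby_mode = 'on'
primary_conninfo = 'host=master.example.com user=repl_user password=...'
```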

If the active master goes down, ClusterControl will attempt to perform the cluster recovery in the following order:

  1. After 10 seconds of master unreachability, ClusterControl will raise an alarm.
  2. After 10 seconds of graceful waiting timeout, ClusterControl will initiate the master failover job.
  3. Sample the replayLocation and receiveLocation on all available nodes to determine the most advanced node.
  4. Promote the most advanced node as the new master.
  5. Stop slaves.
  6. Verify the synchronization state with pg_rewind.
  7. Restart the slaves with the new master.
  8. If the slave promotion fails, ClusterControl will abort the recovery job. User intervention or a cmon service restart is required to trigger the recovery job again.
  9. When old master is available again, it will be forced to shut down and will not be part of the replication. User intervention is required. See further down.
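Steps 3-4 boil down to comparing LSNs across nodes, which can be sketched like this (a simplified illustration; ClusterControl's actual implementation may differ):

```python
def parse_lsn(lsn):
    """Parse a PostgreSQL LSN like '0/3000060' into a comparable integer.
    The value before the slash is the high 32 bits, the value after it
    the low 32 bits, both in hex."""
    hi, lo = lsn.split("/")
    return (int(hi, 16) << 32) + int(lo, 16)

def most_advanced(nodes):
    """Pick the standby with the highest replay location, as in steps 3-4
    above. `nodes` maps hostname -> replayLocation string."""
    return max(nodes, key=lambda n: parse_lsn(nodes[n]))
```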

When the old master comes back online, if the PostgreSQL service is running, ClusterControl will force a shutdown of the PostgreSQL service. This is to protect the server from accidental writes, since it would be started without a recovery file (recovery.conf), which means it would be writable. You should expect the following lines to appear in postgresql-{day}.log:

2019-11-27 05:06:10.091 UTC [2392] LOG:  database system is ready to accept connections

2019-11-27 05:06:27.696 UTC [2392] LOG:  received fast shutdown request

2019-11-27 05:06:27.700 UTC [2392] LOG:  aborting any active transactions

2019-11-27 05:06:27.703 UTC [2766] FATAL:  terminating connection due to administrator command

2019-11-27 05:06:27.704 UTC [2758] FATAL:  terminating connection due to administrator command

2019-11-27 05:06:27.709 UTC [2392] LOG:  background worker "logical replication launcher" (PID 2419) exited with exit code 1

2019-11-27 05:06:27.709 UTC [2414] LOG:  shutting down

2019-11-27 05:06:27.735 UTC [2392] LOG:  database system is shut down

PostgreSQL was started after the server came back online around 05:06:10, but ClusterControl performed a fast shutdown 17 seconds later, around 05:06:27. If this is not the behavior you want, you can momentarily disable node recovery for this cluster.

Check out Automatic Failover of Postgres Replication and Failover for PostgreSQL Replication 101 to get further information on how to configure and manage PostgreSQL replication failover with ClusterControl.


ClusterControl automatic recovery understands database cluster topology and is able to recover a down or degraded cluster to a fully operational one, which improves database service uptime tremendously. Try ClusterControl now and achieve your nines in SLA and database availability. Don't know your nines? Check out this cool nines calculator.

by ashraf at November 28, 2019 10:45 AM

November 27, 2019


How to Avoid PostgreSQL Cloud Vendor Lock-in

Vendor lock-in is a well-known concept for database technologies. With cloud usage increasing, this lock-in has also expanded to include cloud providers. We can define vendor lock-in as a proprietary lock-in that makes a customer dependent on a vendor for their products or services. Sometimes this lock-in doesn’t mean that you can’t change the vendor/provider, but it could be an expensive or time-consuming task.

PostgreSQL, an open source database technology, doesn’t have the vendor lock-in problem in itself, but if you’re running your systems in the cloud, it’s likely you’ll need to cope with that issue at some time.

In this blog, we’ll share some tips about how to avoid PostgreSQL cloud lock-in and also look at how ClusterControl can help in avoiding it.

Tip #1: Check for Cloud Provider Limitations or Restrictions

Cloud providers generally offer a simple and friendly way (or even a tool) to migrate your data into the cloud. The problem is that when you want to leave them, it can be hard to find an easy way to migrate the data to another provider or to an on-prem setup. This task usually comes at a high cost (often based on the amount of traffic).

To avoid this issue, always check the cloud provider's documentation and limitations first, so you know which restrictions may be unavoidable when leaving.

Tip #2: Pre-Plan for a Cloud Provider Exit

The best recommendation we can give you is: don't wait until the last minute to work out how to leave your cloud provider. You should plan it long in advance so you know the best, fastest, and least expensive way to make your exit.

Because this plan most likely depends on your specific business requirements, it will differ depending on whether you can schedule maintenance windows and whether the company will accept any downtime. Planning beforehand will definitely save you a headache at the end of the day.

Tip #3: Avoid Using Any Exclusive Cloud Provider Products

A cloud provider's product will almost always run better than its open source counterpart, since it was designed and tested to run on the cloud provider's infrastructure; its performance will often be considerably better.

If you need to migrate your databases to another provider, you’ll have the technology lock-in problem as the cloud provider product is only available in the current cloud provider environment. This means you won’t be able to migrate easily. You can probably find a way to do it by generating a dump file (or another backup method), but you'll probably have a long downtime period (depending on the amount of data and technologies that you want to use).

If you are using Amazon RDS or Aurora, Azure SQL Database, or Google Cloud SQL (to focus on the most widely used cloud providers), you should consider checking the alternatives for migrating to an open source database. With this, we're not saying that you should migrate, but you should definitely have the option to do it if needed.

Tip #4: Store Your Backups with Another Cloud Provider

A good practice to decrease downtime, whether for migration or disaster recovery, is not only to store backups in the same place (for faster recovery), but also to store them with a different cloud provider or even on-prem.

By following this practice, when you need to restore or migrate your data, you only need to copy the data generated after the latest backup was taken. The amount of traffic and time will be considerably less than copying all the data, uncompressed, during the migration or failure event.

Tip #5: Use a Multi-Cloud or Hybrid Model

This is probably the best option if you want to avoid cloud lock-in. Storing the data in two or more places in real-time (or as close to real-time as you can get) allows you to migrate quickly with the least possible downtime. If you have a PostgreSQL cluster in one cloud provider and a PostgreSQL standby node in another, and you need to change providers, you can just promote the standby node and send the traffic to this new primary PostgreSQL node.

A similar concept is applied to the hybrid model. You can keep your production cluster in the cloud, and then you can create a standby cluster or database node on-prem, which generates a hybrid (cloud/on-prem) topology, and in case of failure or migration necessities, you can promote the standby node without any cloud lock-in as you’re using your own environment.

In this case, keep in mind that the cloud provider will probably charge you for outbound traffic, so under heavy traffic, keeping this method running could generate an excessive cost for the company.

How ClusterControl Can Help Avoid PostgreSQL Lock-in

In order to avoid PostgreSQL lock-in, you can also use ClusterControl to deploy (or import), manage, and monitor your database clusters. This way you won’t depend on a specific technology or provider to keep your systems up and running.

ClusterControl has a friendly and easy-to-use UI, so you don't need to use a cloud provider management console to manage your databases; you just need to log in, and you'll have an overview of all your database clusters in one system.

It comes in three different versions (including a free community version). You can still use ClusterControl (without some paid features) even if your license has expired, and it won't affect your database performance.

You can deploy different open source database engines from the same system; only SSH access and a privileged user are required to use it.

ClusterControl can also help you manage your backup system. From here, you can schedule a new backup using different backup methods (depending on the database engine), compress and encrypt backups, and verify them by restoring to a different node. You can also store backups in multiple locations at the same time (including the cloud).

The multi-cloud or hybrid implementation is easily doable with ClusterControl by using the Cluster-to-Cluster Replication or the Add Replication Slave feature. You only need to follow a simple wizard to deploy a new database node or cluster in a different place. 


As data is probably the most important asset of the company, you'll most likely want to keep it as tightly controlled as possible. Cloud lock-in doesn't help with this. If you're in a cloud lock-in scenario, it means you can't manage your data as you wish, and that could be a problem.

However, cloud lock-in is not always a problem. You may be running your whole system (databases, applications, etc.) with the same cloud provider using the provider's products (Amazon RDS or Aurora, Azure SQL Database, or Google Cloud SQL), not looking to migrate anything, and instead taking advantage of all the benefits the cloud provider offers. Avoiding cloud lock-in is not always a must, as it depends on each case.

We hope you enjoyed our blog sharing the most common ways to avoid a PostgreSQL cloud lock-in and how ClusterControl can help.

by Sebastian Insausti at November 27, 2019 08:21 PM

MariaDB Foundation

MariaDB Server’s continuous integration & testing available to community

How MariaDB Server is tested
MariaDB Foundation is committed to ensuring MariaDB Server has a thriving community of developers and contributors. A software project cannot be maintained without proper tests. […]

The post MariaDB Server’s continuous integration & testing available to community appeared first on

by Vicențiu Ciorbaru at November 27, 2019 06:13 AM

November 26, 2019


Comparing Percona XtraBackup to MySQL Enterprise Backup: Part One

When it comes to backups and data archiving, IT departments are often under stress to meet stringent service level agreements as well as deliver more robust backup procedures that would minimize the downtime, speed up the backup process, cost less, and meet tight security requirements.

There are multiple ways to take a backup of a MySQL database, but we can divide these methods into two groups - logical and physical.

Logical Backups contain data that is exported using SQL commands and stored in a file. This can be, for example, a set of SQL commands that, when executed, will restore the content of the database. With some modifications to the output file's syntax, you can also store your backup in CSV files.

Logical backups are easy to perform: with a simple one-liner, you can back up a single table, a single database, or all databases in the instance.

Unfortunately, logical backups have many limitations. They are usually slower than physical ones, due to the overhead of executing SQL commands to get the data out and then executing another set of SQL commands to load the data back into the database. They are also less flexible unless you write complex backup workflows involving multiple steps, they don't work well in parallel environments, they provide less security, and so on.

Physical Backups in MySQL World

MySQL Community Edition doesn't come with an online physical backup tool. You can either pay for the Enterprise version or use a third-party tool. The most popular third-party tool on the market is XtraBackup. These two are what we are going to compare in this blog article.

Percona XtraBackup is the very popular, open-source, MySQL/MariaDB hot backup software that performs non-blocking backups for InnoDB and XtraDB databases. It falls into the physical backup category, which consists of exact copies of the MySQL data directory and files underneath it.

One of the biggest advantages of XtraBackup is that it does not lock your database during the backup process. For large databases (100+ GB), it provides much better restoration time as compared to mysqldump. The restoration process involves preparing MySQL data from the backup files, before replacing or switching it with the current data directory on the target node.

Percona XtraBackup works by remembering the log sequence number (LSN) when it starts and then copies away the data files to another location. Copying data takes time, and if the files are changing, they reflect the state of the database at different points in time. At the same time, XtraBackup runs a background process that keeps an eye on the transaction log (aka redo log) files, and copies changes from it. This has to be done continually because the transaction logs are written in a round-robin fashion, and can be reused after a while. XtraBackup needs the transaction log records for every change to the data files since it began execution.
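Conceptually, the process can be modeled like this (a toy illustration of the idea, not XtraBackup's implementation):

```python
def hot_backup(datafiles, redo_log, start_lsn):
    """Toy model of the LSN-based hot backup described above.

    datafiles: dict of file name -> contents at copy time (possibly 'fuzzy',
               i.e. reflecting different points in time)
    redo_log:  list of (lsn, change) records, written round-robin by the server
    start_lsn: the log sequence number remembered when the backup began
    """
    copied = dict(datafiles)  # step 1: copy data files while the server runs
    # step 2: continually harvest redo records from start_lsn onward; the
    # later "prepare" phase replays them to make the copy consistent
    captured = [(lsn, change) for lsn, change in redo_log if lsn >= start_lsn]
    return copied, captured
```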

By using this tool you can:

  • Create hot InnoDB backups, that complete quickly and reliably, without pausing your database or adding load to the server
  • Make incremental backups
  • Move tables between MySQL servers on-line
  • Create new MySQL replication slaves easily
  • Stream compressed MySQL backups to another server
  • Save on disk space and network bandwidth

MySQL Enterprise Backup delivers hot, online, non-blocking backups on multiple platforms. It's not a free backup tool, but it offers a lot of features. The standard license costs $5,000 (but may vary depending on your agreement with Oracle).

Backup Process Supported Platforms

MySQL Enterprise

It runs on Linux, Windows, macOS and Solaris. Importantly, it can also write backups to tape, which is usually cheaper than writing to disk. Direct tape writes support integration with Veritas NetBackup, Tivoli Storage Manager, and EMC NetWorker.


XtraBackup runs only on Linux, which may undoubtedly be a show stopper for those running on Windows. A workaround is to replicate to a slave running on Linux and run backups from there.

Backup Process Main Differences

MySQL Enterprise Backup provides a rich set of backup and recovery features and functionality, including significant performance improvements over existing MySQL backup methods.

Oracle claims Enterprise Backup can be up to 49x faster than mysqldump. That, of course, may vary depending on your data, but there are many features to improve the backup process. Parallel backup is definitely one of the biggest differences between mysqldump and Enterprise Backup; it increases performance through multi-threaded processing. The most interesting feature, however, is compression.


It creates a backup in compressed format. For a regular backup, among all the storage engines supported by MySQL, only data files of the InnoDB format are compressed, and they bear the .ibz extension after compression. Similarly, for a single-image backup, only InnoDB data files inside the backup image are compressed. Binary log and relay log files are compressed and saved with the .bz extension when included in a compressed backup.

--compress-method=zlib|lz4|lzma|punch-hole   (default: lz4)



MySQL Backups with ClusterControl

ClusterControl allows you to schedule backups using XtraBackup and mysqldump. It can store the backup files locally on the node where the backup is taken, or stream them to the controller node and compress them on the fly. It does not support MySQL Enterprise Backup; however, with the extended features around mysqldump and XtraBackup, it may be a good option.

ClusterControl is the all-inclusive open source database management system for users with mixed environments. It provides advanced backup management functionality for MySQL or MariaDB.

ClusterControl Backup Repository

With ClusterControl you can:

  • Create backup policies
  • Monitor backup status, executions, and servers without backups
  • Execute backups and restores (including point-in-time recovery)
  • Control backup retention
  • Save backups in cloud storage
  • Validate backups (full test with the restore on the standalone server)
  • Encrypt backups
  • Compress backups
  • And many others
ClusterControl Backup Recovery


As a DBA, you need to make sure that the databases are backed up regularly and that appropriate recovery procedures are in place and tested. Both Percona XtraBackup and MySQL Enterprise Backup provide DBAs with a high-performance, online backup solution with data compression and encryption technology to ensure your data is protected in the event of downtime or an outage.

Backups should be planned according to the restoration requirements. Data loss can be full or partial; you do not always need to recover the whole dataset, and in some cases you might just want a partial recovery, restoring missing tables or rows. With their rich feature sets, both solutions would be a great replacement for mysqldump, which is still a very popular backup method. Having mysqldump is also important for partial recovery, where corrupted databases can be corrected by analyzing the contents of the dump. Binary logs allow us to achieve point-in-time recovery, e.g., up to right before the MySQL server went down.

That's all for part one. In the next part, we are going to test the performance of both solutions and run some real-world backup and recovery scenarios.


by Bart Oles at November 26, 2019 07:30 PM

MariaDB Foundation

MariaDB Foundation Endorses the SaveDotOrg Campaign to Protect the .org Domain

The MariaDB Foundation is proud to put its name behind the SaveDotOrg campaign. We urge the Internet Society (ISOC) to cancel the sale of the Public Interest Registry (PIR) to Ethos Capital. […]

The post MariaDB Foundation Endorses the SaveDotOrg Campaign to Protect the .org Domain appeared first on

by Ian Gilfillan at November 26, 2019 12:19 PM

November 25, 2019


Top Ten Reasons to Migrate from Oracle to PostgreSQL

Oracle Relational Database Management System (RDBMS) has been widely used by large organizations and is considered by far the most advanced database technology available on the market. It's typically the RDBMS most often compared with other database products, serving as the de facto standard of what a product should offer. It is ranked as the #1 RDBMS available on the market today.

PostgreSQL is ranked as the #4 RDBMS, but that doesn't mean there aren't advantages to migrating to PostgreSQL. PostgreSQL has been around since 1989 and was open-sourced in 1996. It won DBMS of the Year two years in a row, in 2017 and 2018. That indicates there are no signs of it slowing down in attracting large numbers of users and big organizations.

One of the reasons why PostgreSQL has attracted a lot of attention is that people are looking for an alternative to Oracle so they can cut their organization's high costs and escape vendor lock-in.

Moving away from a working and productive Oracle database can be a daunting task. Concerns such as the company's TCO (Total Cost of Ownership) are one of the reasons why companies delay their decision on whether or not to ditch Oracle.

In this blog we will take a look at some of the main reasons why companies are choosing to leave Oracle and migrate to PostgreSQL.

Reason One: It’s a True Open Source Project

PostgreSQL is open-source and is released under the PostgreSQL License, a liberal Open Source license, similar to the BSD or MIT licenses. Acquiring the product and support requires no fee. 

If you want to leverage the database software, you can get all the available features of the PostgreSQL database for free. PostgreSQL has more than 30 years of maturity in the database world and has been open source since 1996. It has enjoyed decades of developers working to create extensions. That, in itself, makes developers, institutions, and organizations choose PostgreSQL for enterprise applications, powering leading business and mobile applications.

Once again, organizations are waking up to the realization that open source database solutions like Postgres offer greater capacity, flexibility, and support that isn’t entirely dependent on any one company or developer. Postgres, like Linux before it, has been (and continues to be) engineered by dedicated users solving day-to-day business problems who choose to return their solutions to the community. Unlike a large developer like Oracle, which may have different motives of developing products that are profitable or support a narrow but lucrative market, the Postgres community is committed to developing the best possible tools for everyday relational database users.

PostgreSQL often carries out these tasks without adding too much complexity. Its design is focused strictly on handling the database, without wasting resources on managing additional IT environments through added features. That is one of the things consumers of this open-source software appreciate when migrating from Oracle to PostgreSQL. Spending hours studying the complexity of how an Oracle database functions, or how to optimize and tune it, often ends with expensive support. This lures institutions and organizations to find an alternative that is less of a headache on cost and brings profit and productivity. Please check out our previous blog about how well PostgreSQL's SQL syntax matches Oracle's.

Reason Two: No License and a Large Community

For users of the Oracle RDBMS platform, it's difficult to find any type of community support that is free or without a hefty fee. Institutions, organizations, and developers often end up finding alternative information online that can offer answers or solutions to their problems for free.

When using Oracle, it's difficult to decide on a specific product, or whether to go with Product Support, because (typically) a lot of money is involved. You might try a specific product to test it, and end up buying it, only to realize it can't help you. With PostgreSQL, the community is free and full of experienced experts who are happy to help you out with your problems.

You can subscribe to the PostgreSQL mailing lists to start reaching out to the community. Newbies and prodigies of PostgreSQL alike gather there to communicate, showcase, and share their solutions, technology, bugs, new findings, and even their emerging software. You may also ask for help on IRC by joining the #postgresql channel, or reach out to the community through Slack. There are a lot of options, and lots of open source organizations that can answer your questions.

For more details and information about where to start, check out the PostgreSQL community resources.

If you want to check out professional services for PostgreSQL, there are tons of options to choose from. The PostgreSQL website lists a large number of companies, some at affordable prices. Here at Severalnines, we also offer support for Postgres, either as part of the ClusterControl license or as DBA consultancy.

Reason Three:  Wide Support for SQL Conformance

PostgreSQL has always been keen to adapt and conform to SQL as the de facto standard for its language. The formal name of the SQL standard is ISO/IEC 9075 "Database Language SQL". Each successive revised version of the standard replaces the previous one, so claims of conformance to earlier versions have no official merit.

Oracle, unlike PostgreSQL, still carries keywords and operators that do not conform to ANSI-standard SQL (Structured Query Language). For example, the OUTER JOIN (+) operator can confuse DBAs who have not touched, or have only passing familiarity with, Oracle. PostgreSQL follows the ANSI-SQL standard for JOIN syntax, which makes it easy to move between it and other open-source RDBMS databases such as MySQL/Percona/MariaDB. 
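As a sketch, the same outer join written in Oracle's proprietary (+) notation versus ANSI JOIN syntax (the table and column names here are hypothetical):

```sql
-- Oracle-only (+) outer join notation (hypothetical tables):
SELECT e.name, d.dept_name
  FROM employees e, departments d
 WHERE e.dept_id = d.dept_id (+);

-- ANSI-standard equivalent, valid in PostgreSQL, MySQL, MariaDB, and Oracle:
SELECT e.name, d.dept_name
  FROM employees e
  LEFT JOIN departments d ON e.dept_id = d.dept_id;
```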

Another syntax difference that is very common in Oracle involves hierarchical queries. Oracle uses the non-standard START WITH..CONNECT BY syntax, while in SQL:1999, hierarchical queries are implemented by way of recursive common table expressions. For example, the queries below differ in syntax for the same hierarchical result.

In Oracle:

SELECT restaurant_name, city_name
  FROM restaurants rs
START WITH rs.city_name = 'TOKYO'
CONNECT BY PRIOR rs.restaurant_name = rs.city_name;

In PostgreSQL:

WITH RECURSIVE tmp AS (SELECT restaurant_name, city_name
                         FROM restaurants
                        WHERE city_name = 'TOKYO'
                        UNION ALL
                       SELECT m.restaurant_name, m.city_name
                         FROM restaurants m
                         JOIN tmp ON tmp.restaurant_name = m.city_name)
SELECT restaurant_name, city_name FROM tmp;

PostgreSQL takes a very similar approach to the other top open-source RDBMSs like MySQL and MariaDB.

According to the PostgreSQL manual, PostgreSQL development aims for conformance with the latest official version of the standard where such conformance does not contradict traditional features or common sense. Many of the features required by the SQL standard are supported, though sometimes with slightly differing syntax or function, and further moves towards conformance can be expected over time. This is, in fact, one of PostgreSQL's strengths: it is supported by and collaborated on by many different organizations, small and large, and its beauty lies in its conformance with what the standard pushes forward.

Reason Four: Query Parallelism

To be fair, PostgreSQL's query parallelism is not as rich as Oracle's parallel execution of SQL statements. Among Oracle's parallelism features are statement queuing with hints, the ability to set the degree of parallelism (DOP), parallel degree policies, and adaptive parallelism. 

PostgreSQL has a simpler degree of parallelism based on the supported plans, but that does not mean Oracle decisively edges out the open source PostgreSQL. 

PostgreSQL's parallelism has been constantly improving and is continuously enhanced by the community. When PostgreSQL 10 was released, it added more appeal to the public, especially with the improvements in parallelism support for merge join, bitmap heap scan, index scan and index-only scan, gather merge, etc. The improvements also added parallelism-related information to pg_stat_activity.

In PostgreSQL versions < 10, parallelism is disabled by default; to use it, you need to set the variable max_parallel_workers_per_gather to a value greater than zero. 

postgres=# \timing

Timing is on.

postgres=# explain analyze select * from imdb.movies where birthyear >= 1980 and birthyear <=2005;

                                                   QUERY PLAN                                                   


 Seq Scan on movies  (cost=0.00..215677.28 rows=41630 width=68) (actual time=0.013..522.520 rows=84473 loops=1)

   Filter: ((birthyear >= 1980) AND (birthyear <= 2005))

   Rows Removed by Filter: 8241546

 Planning time: 0.039 ms

 Execution time: 525.195 ms

(5 rows)

Time: 525.582 ms

postgres=# \o /dev/null 

postgres=#  select * from imdb.movies where birthyear >= 1980 and birthyear <=2005;

Time: 596.947 ms

The query plan reveals an actual time of around 522.5 ms, while the real query execution time comes to around 596.95 ms. Now let's enable parallelism:

postgres=# set max_parallel_workers_per_gather=2;

Time: 0.247 ms

postgres=# explain analyze select * from imdb.movies where birthyear >= 1980 and birthyear <=2005;

                                                          QUERY PLAN                                                           


 Gather  (cost=1000.00..147987.62 rows=41630 width=68) (actual time=0.172..339.258 rows=84473 loops=1)

   Workers Planned: 2

   Workers Launched: 2

   ->  Parallel Seq Scan on movies  (cost=0.00..142824.62 rows=17346 width=68) (actual time=0.029..264.980 rows=28158 loops=3)

         Filter: ((birthyear >= 1980) AND (birthyear <= 2005))

         Rows Removed by Filter: 2747182

 Planning time: 0.096 ms

 Execution time: 342.735 ms

(8 rows)

Time: 343.142 ms

postgres=# \o /dev/null

postgres=#  select * from imdb.movies where birthyear >= 1980 and birthyear <=2005;

Time: 346.020 ms

The query plan determines that the query should use parallelism, so it uses a Gather node. The actual time comes to 339 ms with 2 workers, and 264 ms per worker before the results are gathered by the plan. The real execution time of the query was 346 ms, very close to the actual time reported by the query plan. 

This just illustrates how fast and beneficial parallelism is in PostgreSQL. Although PostgreSQL has its own limits on when parallelism can occur, or when the query planner determines a serial plan is faster, that does not put the feature far behind Oracle's. PostgreSQL's parallelism is flexible and can be enabled and utilized correctly as long as your query meets the requirements for query parallelism.

Reason Five: Advanced JSON Support and is Always Improving

JSON support in PostgreSQL is on par with, and often ahead of, the other open source RDBMSs. Take a look at this external blog from LiveJournal, where PostgreSQL's JSON support is shown to be consistently more advanced than that of the other RDBMSs. PostgreSQL has a large number of JSON functions and features.

The JSON data type was introduced in PostgreSQL 9.2. Since then, it has seen a lot of significant enhancements, the major addition coming in PostgreSQL 9.4 with the jsonb data type. PostgreSQL offers two data types for storing JSON data: json and jsonb. jsonb is an advanced version of the JSON data type which stores JSON data in a decomposed binary format. This is the major enhancement that made a big difference in the way JSON data is searched and processed in PostgreSQL.

Oracle has extensive support for JSON as well. PostgreSQL likewise ships extensive support, along with functions that can be used for data retrieval, data formatting, or conditional operations that affect the output or the data stored in the database. Data stored with the jsonb data type has the added advantage of supporting GIN (Generalized Inverted Index) indexes, which can efficiently search for keys or key/value pairs occurring within a large number of jsonb documents.

PostgreSQL also provides extensions that implement TRANSFORM FOR TYPE for the jsonb type in its supported procedural languages. These extensions are jsonb_plperl and jsonb_plperlu for PL/Perl, and jsonb_plpythonu, jsonb_plpython2u, and jsonb_plpython3u for PL/Python. For example, to map jsonb values to Perl arrays and hashes, you can use the jsonb_plperl or jsonb_plperlu extension.
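As a minimal sketch (assuming PL/Perl and the jsonb_plperl contrib extension are installed), the transform lets a PL/Perl function receive a jsonb value as a native Perl structure instead of a JSON string:

```sql
CREATE EXTENSION IF NOT EXISTS plperl;
CREATE EXTENSION IF NOT EXISTS jsonb_plperl;

-- With the transform, the jsonb argument arrives as a Perl hash reference:
CREATE FUNCTION jsonb_keys_count(val jsonb) RETURNS int
LANGUAGE plperl
TRANSFORM FOR TYPE jsonb
AS $$
    my ($h) = @_;
    return scalar keys %$h;
$$;

SELECT jsonb_keys_count('{"a": 1, "b": 2}');  -- returns 2
```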

ArangoDB posted a benchmark comparing PostgreSQL's JSON performance with other JSON-supporting databases. Although it's an old blog post, it still showcases how PostgreSQL's JSON support performs compared to databases where JSON is the core feature of the database kernel. This gives PostgreSQL its own advantage, even though JSON is a secondary feature for it.

Reason Six: DBaaS Support By Major Cloud Vendors

PostgreSQL is widely supported as a DBaaS. These services come from Amazon, Microsoft with its Azure Database for PostgreSQL, and Google with its Cloud SQL for PostgreSQL.

In comparison, Oracle is only available as Amazon RDS for Oracle. The services offered by the major players start at an affordable price and are very flexible to set up in accordance with your needs. This helps institutions and organizations provision accordingly and get relief from the large costs tied up in the Oracle platform.

Reason Seven:  Better Handling of Massive Amounts of Data

Out of the box, PostgreSQL is not designed for analytical and data warehousing workloads; it is a row-oriented database. It does, however, have the capability to store large amounts of data, within the following limits:



Maximum Database Size         Unlimited

Maximum Table Size            32 TB

Maximum Row Size              1.6 TB

Maximum Field Size            1 GB

Maximum Rows per Table        Unlimited

Maximum Columns per Table     250-1600 depending on column types

Maximum Indexes per Table     Unlimited
The major benefit of PostgreSQL is that there are extensions that can be incorporated to handle large amounts of data. TimescaleDB and Citus Data's cstore_fdw are two of the extensions you can incorporate for a time-series database, for storing large data from mobile or IoT applications, or for data analytics and data warehousing. In fact, ClusterControl offers support for TimescaleDB, which makes it simple to deploy.
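For example, with the TimescaleDB extension installed, an ordinary table can be turned into a time-partitioned hypertable; a sketch (the table and column names are illustrative):

```sql
CREATE EXTENSION IF NOT EXISTS timescaledb;

CREATE TABLE conditions (
    time        timestamptz NOT NULL,
    device_id   int,
    temperature double precision
);

-- Convert the plain table into a hypertable partitioned by time:
SELECT create_hypertable('conditions', 'time');
```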

If you want to stay within the core features of PostgreSQL, you can store large amounts of semi-structured data using jsonb, for example a large number of documents (PDF, Word, spreadsheets) serialized into the jsonb data type. For geolocation applications and systems, you can use PostGIS.

Reason Eight: Scalability, High-Availability, Redundancy/Geo-Redundancy, and Fault-Tolerant Solutions on the Cheap

Oracle offers similar, and powerful, solutions such as Oracle Grid, Oracle Real Application Clusters (RAC), Oracle Clusterware, and Oracle Data Guard, to name a few. These technologies add to your rising costs and are unpredictably expensive to deploy and make stable, yet it's hard to ditch them once in place. Training must be arranged and skills developed for the people involved in the deployment and implementation process. 

PostgreSQL has massive support here and a lot of options to choose from. PostgreSQL includes streaming and logical replication built into the core package of the software. You can also set up synchronous replication for PostgreSQL to build a higher-availability cluster, while having a standby node process your read queries. For high availability, we suggest you read our blog Top PG Clustering High Availability (HA) Solutions for PostgreSQL, which covers a lot of great tools and technologies to choose from. 

There are enterprise offerings as well that provide high-availability, monitoring, and backup solutions. ClusterControl is one of these and comes at an affordable price compared to Oracle's solutions.

Reason Nine:  Support for Several Procedural Languages: PL/pgSQL, PL/Tcl, PL/Perl, and PL/Python.

Since version 9.4, PostgreSQL has a great feature where you can define a new procedural language of your choice. Not every programming language is supported, but a good number are. The base distribution includes PL/pgSQL, PL/Tcl, PL/Perl, and PL/Python. Externally maintained languages include:











  • PL/Java (Java)
  • PL/Lua (Lua)
  • PL/R (R)
  • PL/sh (Unix shell)
  • PL/v8 (JavaScript)




The great thing about this is that, unlike with Oracle, developers newly arriving at PostgreSQL can quickly deliver business logic in their application systems without first taking time to learn PL/SQL. PostgreSQL makes the environment easier and more efficient for developers. This nature of PostgreSQL contributes to why developers love PostgreSQL and have started shifting away from enterprise platform solutions to the open source environment.
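As a quick illustrative sketch (the function and its parameters are hypothetical), business logic can be expressed directly in PL/pgSQL, which ships with the base distribution:

```sql
-- A simple PL/pgSQL function computing an order total with tax:
CREATE FUNCTION order_total(price numeric, qty int, tax_rate numeric DEFAULT 0.08)
RETURNS numeric
LANGUAGE plpgsql
AS $$
BEGIN
    RETURN round(price * qty * (1 + tax_rate), 2);
END;
$$;

SELECT order_total(19.99, 3);  -- 64.77
```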

Reason Ten:  Flexible Indexes for Large and Textual Data (GIN, GiST, SP-GiST, and BRIN)

PostgreSQL has a huge advantage when it comes to index types that are beneficial for handling large data. Oracle has many index types that handle large data sets well, too, especially for full-text indexing. PostgreSQL's index types, however, are flexible enough to be matched to your purpose. For example, these index types are applicable to large data:

GIN - (Generalized Inverted Indexes) 

This type of index is applicable for jsonb, hstore, range, and arrays data type columns. It is useful when you have data types that contain multiple values in a single column. According to the PostgreSQL docs, “GIN is designed for handling cases where the items to be indexed are composite values, and the queries to be handled by the index need to search for element values that appear within the composite items. For example, the items could be documents, and the queries could be searches for documents containing specific words.”
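A minimal sketch of a GIN index on a jsonb column (the table name is illustrative); the default jsonb operator class supports the containment (@>) and existence (?) operators:

```sql
CREATE TABLE docs (id serial PRIMARY KEY, body jsonb);

-- GIN index over the whole jsonb document:
CREATE INDEX docs_body_gin ON docs USING gin (body);

-- This containment query can use the index to find matching documents:
SELECT id FROM docs WHERE body @> '{"status": "published"}';
```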

GiST - (Generalized Search Tree)

A height-balanced search tree that consists of node pages. The nodes consist of index rows. Each row of a leaf node (leaf row) generally contains some predicate (boolean expression) and a reference to a table row (TID). GiST indexes are best for geometric data types, e.g. when you want to see whether polygons contain some point: in one case a specific point may be contained within a box, while another point only exists within one polygon. The most common data types where you want to leverage GiST indexes are geometry types and text when dealing with full-text search.

In choosing which index type to use, GiST or GIN, consider these performance differences:

  • GIN index lookups are about three times faster than GiST
  • GIN indexes take about three times longer to build than GiST
  • GIN indexes are moderately slower to update than GiST indexes, but about 10 times slower if fast-update support was disabled
  • GIN indexes are two-to-three times larger than GiST indexes

As a rule of thumb, GIN indexes are best for static data because lookups are faster. For dynamic data, GiST indexes are faster to update.

SP-GiST - (Space Partitioned GiST) 

For larger datasets with natural but uneven clustering. This type of index leverages space-partitioning trees. SP-GiST indexes are most useful when your data has a natural clustering element to it and does not produce an equally balanced tree. A great example of this is phone numbers; in the US, they use the following format:

  • 3 digits for area code
  • 3 digits for prefix (historically related to a phone carrier’s switch)
  • 4 digits for line number

This means that you have some natural clustering around the first set of 3 digits and around the second set of 3 digits, then numbers may fan out in a more even distribution. But with phone numbers some area codes have a much higher saturation than others, so the tree may be very unbalanced. Because of that natural clustering up front and the unequal distribution, data like phone numbers can make a good case for SP-GiST.
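A sketch of an SP-GiST index over such text data (the table is illustrative; text columns are supported via the built-in SP-GiST operator class, and the ^@ starts-with operator requires PostgreSQL 11+):

```sql
CREATE TABLE phone_book (id serial PRIMARY KEY, phone text);

-- SP-GiST radix-tree index over the phone-number strings:
CREATE INDEX phone_spgist ON phone_book USING spgist (phone);

-- Prefix searches such as "all numbers in area code 415" can use it:
SELECT phone FROM phone_book WHERE phone ^@ '415';
```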

BRIN - (Block Range Index) 

For really large datasets that line up sequentially. A block range is a group of pages adjacent to each other, where summary information about all those pages is stored in Index. Block range indexes can focus on some similar use cases to SP-GiST in that they’re best when there is some natural ordering to the data, and the data tends to be very large. Have a billion record table especially if it’s time series data? BRIN may be able to help. If you’re querying against a large set of data that is naturally grouped together such as data for several zip codes (which then roll up to some city) BRIN helps to ensure that similar zip codes are located near each other on disk.

When you have very large datasets that are ordered such as dates or zip codes BRIN indexes allow you to skip or exclude a lot of the unnecessary data very quickly. BRIN additionally are maintained as smaller indexes relative to the overall data size making them a big win for when you have a large dataset.
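A sketch of a BRIN index on a naturally time-ordered table (the names are illustrative):

```sql
CREATE TABLE events (id bigserial, created_at timestamptz, payload text);

-- BRIN stores only per-block-range summaries, so it stays tiny even at billions of rows:
CREATE INDEX events_created_brin ON events USING brin (created_at);

-- Range scans over the naturally ordered column can skip most block ranges:
SELECT count(*) FROM events
 WHERE created_at >= '2019-01-01' AND created_at < '2019-02-01';
```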


PostgreSQL has some major advantages when competing against Oracle's enterprise platform and business solutions. It's easy to hail PostgreSQL as your go-to choice of open source RDBMS, as it is nearly as powerful as Oracle. 

Oracle is hard to beat (and that is a hard truth to accept) and it's also not easy to ditch the tech-giant’s enterprise platform. When systems provide you power and productive results, that could be a dilemma.

Sometimes, though, a decision has to be made, as continued over-investment in platform cost can outweigh the cost of your other business layers and priorities, which can affect progress. 

PostgreSQL and its surrounding solutions can be the choice to help you cut costs and relieve your budgetary problems, all with moderate to small changes.

by Paul Namuag at November 25, 2019 08:06 PM

November 22, 2019


An Overview of VACUUM Processing in PostgreSQL

PostgreSQL does not use an in-place update mechanism, so by the way the DELETE and UPDATE commands are designed:

  • Whenever DELETE operations are performed, it marks the existing tuple as DEAD instead of physically removing those tuples.
  • Similarly, whenever UPDATE operation is performed, it marks the corresponding existing tuple as DEAD and inserts a new tuple (i.e. UPDATE operation = DELETE + INSERT).

So each DELETE and UPDATE command will result in one DEAD tuple, which is never going to be used (unless there are parallel transactions). These dead tuples lead to unnecessary extra space usage even though the number of effective records is the same or smaller. This is also called space bloating in PostgreSQL. Since PostgreSQL is widely used for OLTP-style workloads with frequent INSERT, UPDATE and DELETE operations, many DEAD tuples will accumulate, with the corresponding consequences. PostgreSQL therefore needed a strong maintenance mechanism to deal with these DEAD tuples. VACUUM is the maintenance process that takes care of DEAD tuples, along with a few more activities useful for optimizing VACUUM operation. Let’s understand some terminology to be used later in this blog.
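The dead-tuple accumulation described above can be observed through the statistics collector; a sketch (the table name is illustrative):

```sql
CREATE TABLE t (id int);
INSERT INTO t SELECT generate_series(1, 1000);

-- Each UPDATE marks the old tuple DEAD and inserts a new version:
UPDATE t SET id = id + 1;

-- n_dead_tup should now report roughly 1000 dead tuples (until vacuum runs):
SELECT n_live_tup, n_dead_tup
  FROM pg_stat_user_tables
 WHERE relname = 't';
```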

Visibility Map

As the name implies, it maintains visibility info, tracking which pages contain only tuples that are known to be visible to all active transactions. One bit is used per page. If the bit is set to 1, all tuples of the corresponding page are visible to all transactions. If the bit is 0, the page may contain tuples that are not visible to every transaction, so it may still need vacuuming.

A visibility map is maintained for each table and is stored alongside the main relation, i.e. if the relation file node name is 12345, then the visibility map gets stored in the parallel file 12345_vm.
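The visibility map of a table can be inspected with the pg_visibility contrib extension; a sketch (the table name is illustrative):

```sql
CREATE EXTENSION IF NOT EXISTS pg_visibility;

-- One row per page: all_visible mirrors the visibility-map bit:
SELECT blkno, all_visible, all_frozen
  FROM pg_visibility_map('demo1');
```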

Free Space Map

It maintains free space info containing details about the available space in the relation. This is also stored in a file parallel to the relation's main file, i.e. if the relation file node name is 12345, then the free space map gets stored in the parallel file 12345_fsm.

Freeze Tuple

PostgreSQL uses 4 bytes for storing a transaction id, which means a maximum of about 2 billion transactions can be in the visible past before the counter wraps around. Now suppose some tuple still contains an old transaction id, say 100, while a new transaction after wraparound gets id 5. From transaction 5's point of view, transaction id 100 appears to lie in the future, so it won't be able to see the data added/modified by transaction 100, even though that data was actually committed in the past. To avoid this, the special transaction id FrozenTransactionId (equal to 2) is assigned. This special transaction id is always considered to be in the past and is visible to all transactions.


VACUUM's primary job is to reclaim storage space occupied by DEAD tuples. The reclaimed storage space is not given back to the operating system; rather, it is defragmented within the same page, so it is simply available for re-use by future data insertion within the same table. While a VACUUM operation is going on a particular table, other READ/WRITE operations can run concurrently on the same table, as an exclusive lock is not taken. If a table name is not specified, VACUUM is performed on all tables of the database. The VACUUM operation performs the below series of operations within a ShareUpdateExclusive lock:

  • Scan all pages of all tables (or specified table) of the database to get all dead tuples.
  • Freeze old tuples if required.
  • Remove the index tuple pointing to the respective DEAD tuples.
  • Remove the DEAD tuples of a page corresponding to a specific table and reallocate the live tuples in the page.
  • Update Free space Map (FSM) and Visibility Map (VM).
  • Truncate the last page if possible (if there were DEAD tuples which got freed).
  • Update all corresponding system tables.

As we can see from the above steps, it is clear that VACUUM is a very costly operation, as it needs to process all pages of the relation. So it is essential to skip pages which do not require vacuuming. Since the visibility map (VM) records which pages contain only tuples visible to all transactions, a page whose bit is set can be assumed not to need vacuuming, and hence can safely be skipped.

Since VACUUM traverses all pages and all their tuples anyway, it takes the opportunity to perform the other important task of freezing the qualifying tuples.


As discussed in the previous section, even though VACUUM removes all DEAD tuples and defragments the page for future use, it does not help in reducing the overall storage of the table, as space is not actually released to the operating system. Suppose a table tbl1 whose total storage has reached 1.5GB, of which 1GB is occupied by dead tuples; after VACUUM, approximately 1GB will be available for further tuple insertion, but the total storage will remain 1.5GB.

Full VACUUM solves this problem by actually freeing space and returning it back to the operating system. But this comes at a cost. Unlike VACUUM, FULL VACUUM does not allow parallel operations, as it takes an exclusive lock on the relation being FULL VACUUMed. Below are the steps:

  • Takes exclusive lock on the relation.
  • Create a parallel empty storage file.
  • Copy all live tuples from current storage to newly allocated storage.
  • Then free up the original storage.
  • Free up the lock.

As is clear from the steps, the table will then have only the storage required for the remaining data.


Instead of running VACUUM manually, PostgreSQL supports a daemon which automatically triggers VACUUM periodically. Every time the autovacuum launcher wakes up (by default every 1 minute, per autovacuum_naptime) it invokes multiple workers (depending on the configuration parameter autovacuum_max_workers).

Autovacuum workers perform VACUUM on their respective designated tables concurrently. Since VACUUM does not take any exclusive lock on tables, it has no (or minimal) impact on other database work.

The configuration of autovacuum should be based on the usage pattern of the database. It should not run too frequently (wasting worker wake-ups when there are few or no dead tuples) nor be delayed too much (letting a lot of dead tuples pile up and hence bloating the table).
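When the global defaults don't match a particular table's churn, autovacuum can also be tuned per table via storage parameters; a sketch (the table name and values are illustrative):

```sql
-- Vacuum this hot table when ~5% of rows are dead, instead of the 20% default:
ALTER TABLE orders SET (
    autovacuum_vacuum_scale_factor = 0.05,
    autovacuum_vacuum_threshold    = 200
);
```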


Ideally, a database application should be designed in a way that avoids the need for FULL VACUUM. As explained above, FULL VACUUM recreates the storage space and copies back the data, so if there are only a few dead tuples, the storage will be rewritten almost entirely just to reclaim a little space. Also, since FULL VACUUM takes an exclusive lock on the table, it blocks all operations on that table, so running FULL VACUUM can slow down the overall database.

In summary, FULL VACUUM should be avoided unless it is known that the majority of storage space is consumed by dead tuples. The PostgreSQL extension pg_freespacemap can be used to get a fair hint about free space.

Let’s see an example of the explained VACUUM process.

First, let’s create a table demo1:

postgres=# create table demo1(id int, id2 int);


And insert some data there:

postgres=# insert into demo1 values(generate_series(1,10000), generate_series(1,10000));

INSERT 0 10000

postgres=# SELECT count(*) as npages, round(100 * avg(avail)/8192 ,2) as average_freespace_ratio FROM pg_freespace('demo1');

 npages | average_freespace_ratio


  45 |                0.00

(1 row)

Now, let’s delete data:

postgres=# delete from demo1 where id%2=0;


And run a manual vacuum:

postgres=# vacuum demo1;


postgres=# SELECT count(*) as npages, round(100 * avg(avail)/8192 ,2) as average_freespace_ratio FROM pg_freespace('demo1');

 npages | average_freespace_ratio


  45 |               45.07

(1 row)

This freespace is now available to be reused by PostgreSQL, but if you want to release that space to the operating system, run:

postgres=# vacuum full demo1;


postgres=# SELECT count(*) as npages, round(100 * avg(avail)/8192 ,2) as average_freespace_ratio FROM pg_freespace('demo1');

 npages | average_freespace_ratio


  23 |                0.00

(1 row)


And this was a short example of how the VACUUM process works. Luckily, thanks to the auto vacuum process, most of the time and in a common PostgreSQL environment, you don’t need to think about this because it’s managed by the engine itself.

by Kumar Rajeev Rastogi at November 22, 2019 04:33 PM

November 21, 2019


MySQL InnoDB Cluster 8.0 - A Complete Operation Walk-through: Part Two

In the first part of this blog, we covered a deployment walkthrough of MySQL InnoDB Cluster with an example on how the applications can connect to the cluster via a dedicated read/write port.

In this operation walkthrough, we are going to show examples of how to monitor, manage and scale the InnoDB Cluster as part of ongoing cluster maintenance operations. We’ll use the same cluster that we deployed in the first part of the blog. The following diagram shows our architecture:

We have a three-node MySQL Group Replication and one application server running with MySQL router. All servers are running on Ubuntu 18.04 Bionic.

MySQL InnoDB Cluster Command Options

Before we move further with examples and explanations, it's good to know that you can get an explanation of each function of the cluster component by using the help() function, as shown below:

$ mysqlsh
MySQL|localhost:3306 ssl|JS> shell.connect("clusteradmin@db1:3306");
MySQL|db1:3306 ssl|JS> cluster = dba.getCluster();
MySQL|db1:3306 ssl|JS> cluster.help()

The following list shows the available functions on MySQL Shell 8.0.18, for MySQL Community Server 8.0.18:

  • addInstance(instance[, options])- Adds an Instance to the cluster.
  • checkInstanceState(instance)- Verifies the instance gtid state in relation to the cluster.
  • describe()- Describe the structure of the cluster.
  • disconnect()- Disconnects all internal sessions used by the cluster object.
  • dissolve([options])- Deactivates replication and unregisters the ReplicaSets from the cluster.
  • forceQuorumUsingPartitionOf(instance[, password])- Restores the cluster from quorum loss.
  • getName()- Retrieves the name of the cluster.
  • help([member])- Provides help about this class and it's members
  • options([options])- Lists the cluster configuration options.
  • rejoinInstance(instance[, options])- Rejoins an Instance to the cluster.
  • removeInstance(instance[, options])- Removes an Instance from the cluster.
  • rescan([options])- Rescans the cluster.
  • resetRecoveryAccountsPassword(options)- Reset the password of the recovery accounts of the cluster.
  • setInstanceOption(instance, option, value)- Changes the value of a configuration option in a Cluster member.
  • setOption(option, value)- Changes the value of a configuration option for the whole cluster.
  • setPrimaryInstance(instance)- Elects a specific cluster member as the new primary.
  • status([options])- Describe the status of the cluster.
  • switchToMultiPrimaryMode()- Switches the cluster to multi-primary mode.
  • switchToSinglePrimaryMode([instance])- Switches the cluster to single-primary mode.

We are going to look into most of the functions available to help us monitor, manage and scale the cluster.

Monitoring MySQL InnoDB Cluster Operations

Cluster Status

To check the cluster status, firstly use the MySQL shell command line and then connect as clusteradmin@{one-of-the-db-nodes}:

$ mysqlsh
MySQL|localhost:3306 ssl|JS> shell.connect("clusteradmin@db1:3306");

Then, create an object called "cluster" by retrieving it from the "dba" global object, which provides access to InnoDB cluster administration functions using the AdminAPI (check out the MySQL Shell API docs):

MySQL|db1:3306 ssl|JS> cluster = dba.getCluster();

Then, we can use the object name to call the API functions for "dba" object:

MySQL|db1:3306 ssl|JS> cluster.status()
{
    "clusterName": "my_innodb_cluster",
    "defaultReplicaSet": {
        "name": "default",
        "primary": "db1:3306",
        "ssl": "REQUIRED",
        "status": "OK",
        "statusText": "Cluster is ONLINE and can tolerate up to ONE failure.",
        "topology": {
            "db1:3306": {
                "address": "db1:3306",
                "mode": "R/W",
                "readReplicas": {},
                "replicationLag": null,
                "role": "HA",
                "status": "ONLINE",
                "version": "8.0.18"
            },
            "db2:3306": {
                "address": "db2:3306",
                "mode": "R/O",
                "readReplicas": {},
                "replicationLag": "00:00:09.061918",
                "role": "HA",
                "status": "ONLINE",
                "version": "8.0.18"
            },
            "db3:3306": {
                "address": "db3:3306",
                "mode": "R/O",
                "readReplicas": {},
                "replicationLag": "00:00:09.447804",
                "role": "HA",
                "status": "ONLINE",
                "version": "8.0.18"
            }
        },
        "topologyMode": "Single-Primary"
    },
    "groupInformationSourceMember": "db1:3306"
}

The output is pretty long but we can filter it out by using the map structure. For example, if we would like to view the replication lag for db3 only, we could do like the following:

MySQL|db1:3306 ssl|JS> cluster.status().defaultReplicaSet.topology["db3:3306"].replicationLag

Note that replication lag is something that will happen in group replication, depending on the write intensity of the primary member in the replica set and the group_replication_flow_control_* variables. We are not going to cover this topic in detail here. Check out this blog post to understand more about group replication performance and flow control.

Another similar function is describe(), but this one is a bit simpler. It describes the structure of the cluster, including all its information, ReplicaSets and Instances:

MySQL|db1:3306 ssl|JS> cluster.describe()
{
    "clusterName": "my_innodb_cluster",
    "defaultReplicaSet": {
        "name": "default",
        "topology": [
            {
                "address": "db1:3306",
                "label": "db1:3306",
                "role": "HA"
            },
            {
                "address": "db2:3306",
                "label": "db2:3306",
                "role": "HA"
            },
            {
                "address": "db3:3306",
                "label": "db3:3306",
                "role": "HA"
            }
        ],
        "topologyMode": "Single-Primary"
    }
}

Similarly, we can filter the JSON output using map structure:

MySQL|db1:3306 ssl|JS> cluster.describe().defaultReplicaSet.topologyMode

When the primary node went down (in this case, is db1), the output returned the following:

MySQL|db1:3306 ssl|JS> cluster.status()
{
    "clusterName": "my_innodb_cluster",
    "defaultReplicaSet": {
        "name": "default",
        "primary": "db2:3306",
        "ssl": "REQUIRED",
        "status": "OK_NO_TOLERANCE",
        "statusText": "Cluster is NOT tolerant to any failures. 1 member is not active",
        "topology": {
            "db1:3306": {
                "address": "db1:3306",
                "mode": "n/a",
                "readReplicas": {},
                "role": "HA",
                "shellConnectError": "MySQL Error 2013 (HY000): Lost connection to MySQL server at 'reading initial communication packet', system error: 104",
                "status": "(MISSING)"
            },
            "db2:3306": {
                "address": "db2:3306",
                "mode": "R/W",
                "readReplicas": {},
                "replicationLag": null,
                "role": "HA",
                "status": "ONLINE",
                "version": "8.0.18"
            },
            "db3:3306": {
                "address": "db3:3306",
                "mode": "R/O",
                "readReplicas": {},
                "replicationLag": null,
                "role": "HA",
                "status": "ONLINE",
                "version": "8.0.18"
            }
        },
        "topologyMode": "Single-Primary"
    },
    "groupInformationSourceMember": "db2:3306"
}

Pay attention to the status OK_NO_TOLERANCE: the cluster is still up and running, but it can't tolerate any further failure now that one out of three nodes is unavailable. The primary role has been taken over by db2 automatically, and database connections from the application will be rerouted to the correct node if they connect through MySQL Router. Once db1 comes back online, we should see the following status:
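As a side note, the fault-tolerance numbers in statusText follow simple majority arithmetic: Group Replication needs a majority of the group's members to keep a quorum. A small illustrative sketch (not part of the original walkthrough) shows why a 3-node cluster tolerates exactly one failure:

```python
def tolerable_failures(members: int) -> int:
    """Number of members that can fail while a strict majority remains."""
    majority = members // 2 + 1
    return members - majority

print(tolerable_failures(3))  # 3 nodes: tolerates 1 failure -> status OK
print(tolerable_failures(2))  # after one failure: tolerates 0 -> OK_NO_TOLERANCE
print(tolerable_failures(5))  # a 5-node group would tolerate 2 failures
```

This is why the status flips to OK_NO_TOLERANCE as soon as one of three members goes missing.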

MySQL|db1:3306 ssl|JS> cluster.status()
    "clusterName": "my_innodb_cluster",
    "defaultReplicaSet": {
        "name": "default",
        "primary": "db2:3306",
        "ssl": "REQUIRED",
        "status": "OK",
        "statusText": "Cluster is ONLINE and can tolerate up to ONE failure.",
        "topology": {
            "db1:3306": {
                "address": "db1:3306",
                "mode": "R/O",
                "readReplicas": {},
                "replicationLag": null,
                "role": "HA",
                "status": "ONLINE",
                "version": "8.0.18"
            "db2:3306": {
                "address": "db2:3306",
                "mode": "R/W",
                "readReplicas": {},
                "replicationLag": null,
                "role": "HA",
                "status": "ONLINE",
                "version": "8.0.18"
            "db3:3306": {
                "address": "db3:3306",
                "mode": "R/O",
                "readReplicas": {},
                "replicationLag": null,
                "role": "HA",
                "status": "ONLINE",
                "version": "8.0.18"
        "topologyMode": "Single-Primary"
    "groupInformationSourceMember": "db2:3306"

It shows that db1 is now available but serves as a secondary with read-only enabled. The primary role remains assigned to db2 until something goes wrong with that node, at which point the role automatically fails over to the next available node.

Check Instance State

We can check the state of a MySQL node before adding it to the cluster by using the checkInstanceState() function. It compares the GTIDs executed on the instance with the executed/purged GTIDs on the cluster to determine whether the instance is valid for the cluster.

The following shows the instance state of db3 when it was in standalone mode, before it became part of the cluster:

MySQL|db1:3306 ssl|JS> cluster.checkInstanceState("db3:3306")
Cluster.checkInstanceState: The instance 'db3:3306' is a standalone instance but is part of a different InnoDB Cluster (metadata exists, instance does not belong to that metadata, and Group Replication is not active).

If the node is already part of the cluster, you should get the following:

MySQL|db1:3306 ssl|JS> cluster.checkInstanceState("db3:3306")
Cluster.checkInstanceState: The instance 'db3:3306' already belongs to the ReplicaSet: 'default'.

Monitor Any "Queryable" State

With MySQL Shell, we can use the built-in \show and \watch commands to monitor any administrative query in real time. For example, we can get the real-time value of threads connected by using:

MySQL|db1:3306 ssl|JS> \show query SHOW STATUS LIKE '%thread%';

Or get the current MySQL processlist:

MySQL|db1:3306 ssl|JS> \show query SHOW FULL PROCESSLIST

We can then use the \watch command to run a report in the same way as \show, except that it refreshes the results at regular intervals until you cancel the command with Ctrl + C, as shown in the following examples:

MySQL|db1:3306 ssl|JS> \watch query SHOW STATUS LIKE '%thread%';
MySQL|db1:3306 ssl|JS> \watch query --interval=1 SHOW FULL PROCESSLIST

The default refresh interval is 2 seconds. You can change the value by using the --interval flag, specifying a value from 0.1 up to 86400.

MySQL InnoDB Cluster Management Operations

Primary Switchover

The primary instance is the node that can be considered the leader in a replication group, with the ability to perform read and write operations. Only one primary instance per cluster is allowed in single-primary topology mode. This is the recommended topology mode for Group Replication, as it protects against locking conflicts.

To perform a primary instance switchover, log in to one of the database nodes as the clusteradmin user and specify the database node that you want to promote using the setPrimaryInstance() function:

MySQL|db1:3306 ssl|JS> shell.connect("clusteradmin@db1:3306");
MySQL|db1:3306 ssl|JS> cluster.setPrimaryInstance("db1:3306");
Setting instance 'db1:3306' as the primary instance of cluster 'my_innodb_cluster'...

Instance 'db2:3306' was switched from PRIMARY to SECONDARY.
Instance 'db3:3306' remains SECONDARY.
Instance 'db1:3306' was switched from SECONDARY to PRIMARY.

WARNING: The cluster internal session is not the primary member anymore. For cluster management operations please obtain a fresh cluster handle using <Dba>.getCluster().

The instance 'db1:3306' was successfully elected as primary.

We just promoted db1 as the new primary, replacing db2, while db3 remains a secondary node.

Shutting Down the Cluster

The best way to shut down the cluster gracefully is to first stop the MySQL Router service (if it's running) on the application server:

$ myrouter/

The above step protects the cluster against accidental writes by the applications. Then shut down one database node at a time using the standard MySQL stop command, or perform a system shutdown as you wish:

$ systemctl stop mysql

Starting the Cluster After a Shutdown

If your cluster suffers a complete outage, or you want to start the cluster after a clean shutdown, you can ensure it is reconfigured correctly using the dba.rebootClusterFromCompleteOutage() function. It simply brings a cluster back ONLINE when all members are OFFLINE. In the event that a cluster has completely stopped, the instances must be started before the cluster can be started.

Thus, ensure all MySQL servers are started and running. On every database node, check whether the mysqld process is running:

$ ps -ef | grep -i mysql

Then, pick one database server to be the primary node and connect to it via MySQL shell:

MySQL|JS> shell.connect("clusteradmin@db1:3306");

Run the following command from that host to bring the cluster up:

MySQL|db1:3306 ssl|JS> cluster = dba.rebootClusterFromCompleteOutage()

You will be presented with the following questions:

After the above completes, you can verify the cluster status:

MySQL|db1:3306 ssl|JS> cluster.status()

At this point, db1 is the primary node and the writer. The rest will be secondary members. If you would like to start the cluster with db2 or db3 as the primary, use the shell.connect() function to connect to the corresponding node and run rebootClusterFromCompleteOutage() from that particular node.
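For example, to bring the cluster back with db2 as the primary instead, a sketch using the same clusteradmin account shown earlier would be:

```js
MySQL|JS> shell.connect("clusteradmin@db2:3306");
MySQL|db2:3306 ssl|JS> cluster = dba.rebootClusterFromCompleteOutage()
```

The node you run the reboot from becomes the primary, so connect to the member you want to promote first.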

You can then start the MySQL Router service (if it's not started) and let the application connect to the cluster again.

Setting Member and Cluster Options

To get the cluster-wide options, simply run:

MySQL|db1:3306 ssl|JS> cluster.options()

The above lists the global options for the replica set, as well as individual options per member in the cluster. To change an InnoDB Cluster configuration option on all members of the cluster, use the setOption() function. The supported options are:

  • clusterName: string value to define the cluster name.
  • exitStateAction: string value indicating the group replication exit state action.
  • memberWeight: integer value with a percentage weight for automatic primary election on failover.
  • failoverConsistency: string value indicating the consistency guarantees that the cluster provides.
  • consistency: string value indicating the consistency guarantees that the cluster provides.
  • expelTimeout: integer value to define the time period in seconds that cluster members should wait for a non-responding member before evicting it from the cluster.
  • autoRejoinTries: integer value to define the number of times an instance will attempt to rejoin the cluster after being expelled.
  • disableClone: boolean value used to disable the clone usage on the cluster.

Similar to the other functions, the output can be filtered using the map structure. The following command will only list the options for db2:

MySQL|db1:3306 ssl|JS> cluster.options().defaultReplicaSet.topology["db2:3306"]

You can also get the above list by using the help() function:

MySQL|db1:3306 ssl|JS>"setOption")

The following command shows an example to set an option called memberWeight to 60 (from 50) on all members:

MySQL|db1:3306 ssl|JS> cluster.setOption("memberWeight", 60)
Setting the value of 'memberWeight' to '60' in all ReplicaSet members ...

Successfully set the value of 'memberWeight' to '60' in the 'default' ReplicaSet.

We can also perform per-instance configuration management via MySQL Shell by using the setInstanceOption() function, passing the database host, the option name, and the value accordingly:

MySQL|db1:3306 ssl|JS> cluster = dba.getCluster()
MySQL|db1:3306 ssl|JS> cluster.setInstanceOption("db1:3306", "memberWeight", 90)

The supported options are:

  • exitStateAction: string value indicating the group replication exit state action.
  • memberWeight: integer value with a percentage weight for automatic primary election on failover.
  • autoRejoinTries: integer value to define the number of times an instance will attempt to rejoin the cluster after being expelled.
  • label: a string identifier of the instance.

Switching to Multi-Primary/Single-Primary Mode

By default, InnoDB Cluster is configured with single-primary, only one member capable of performing reads and writes at one given time. This is the safest and recommended way to run the cluster and suitable for most workloads. 

However, if the application logic can handle distributed writes, it's probably a good idea to switch to multi-primary mode, where all members in the cluster are able to process reads and writes at the same time. To switch from single-primary to multi-primary mode, simply use the switchToMultiPrimaryMode() function:

MySQL|db1:3306 ssl|JS> cluster.switchToMultiPrimaryMode()
Switching cluster 'my_innodb_cluster' to Multi-Primary mode...

Instance 'db2:3306' was switched from SECONDARY to PRIMARY.
Instance 'db3:3306' was switched from SECONDARY to PRIMARY.
Instance 'db1:3306' remains PRIMARY.

The cluster successfully switched to Multi-Primary mode.

Verify with:

MySQL|db1:3306 ssl|JS> cluster.status()
    "clusterName": "my_innodb_cluster",
    "defaultReplicaSet": {
        "name": "default",
        "ssl": "REQUIRED",
        "status": "OK",
        "statusText": "Cluster is ONLINE and can tolerate up to ONE failure.",
        "topology": {
            "db1:3306": {
                "address": "db1:3306",
                "mode": "R/W",
                "readReplicas": {},
                "replicationLag": null,
                "role": "HA",
                "status": "ONLINE",
                "version": "8.0.18"
            "db2:3306": {
                "address": "db2:3306",
                "mode": "R/W",
                "readReplicas": {},
                "replicationLag": null,
                "role": "HA",
                "status": "ONLINE",
                "version": "8.0.18"
            "db3:3306": {
                "address": "db3:3306",
                "mode": "R/W",
                "readReplicas": {},
                "replicationLag": null,
                "role": "HA",
                "status": "ONLINE",
                "version": "8.0.18"
        "topologyMode": "Multi-Primary"
    "groupInformationSourceMember": "db1:3306"

In multi-primary mode, all nodes are primary and able to process reads and writes. When sending a new connection via MySQL Router on the single-writer port (6446), the connection will be sent to only one node; in this example, db1:

(app-server)$ for i in {1..3}; do mysql -usbtest -p -h192.168.10.40 -P6446 -e 'select @@hostname, @@read_only, @@super_read_only'; done

| @@hostname | @@read_only | @@super_read_only |
| db1        | 0           | 0                 |

| @@hostname | @@read_only | @@super_read_only |
| db1        | 0           | 0                 |

| @@hostname | @@read_only | @@super_read_only |
| db1        | 0           | 0                 |

If the application connects to the multi-writer port (6447), the connection will be load balanced via a round-robin algorithm across all members:

(app-server)$ for i in {1..3}; do mysql -usbtest -ppassword -h192.168.10.40 -P6447 -e 'select @@hostname, @@read_only, @@super_read_only'; done

| @@hostname | @@read_only | @@super_read_only |
| db2        | 0           | 0                 |

| @@hostname | @@read_only | @@super_read_only |
| db3        | 0           | 0                 |

| @@hostname | @@read_only | @@super_read_only |
| db1        | 0           | 0                 |

As you can see from the output above, all nodes are capable of processing reads and writes with read_only = OFF. You can distribute safe writes to all members by connecting to the multi-writer port (6447), and send the conflicting or heavy writes to the single-writer port (6446).

To switch back to the single-primary mode, use the switchToSinglePrimaryMode() function and specify one member as the primary node. In this example, we chose db1:

MySQL|db1:3306 ssl|JS> cluster.switchToSinglePrimaryMode("db1:3306");

Switching cluster 'my_innodb_cluster' to Single-Primary mode...

Instance 'db2:3306' was switched from PRIMARY to SECONDARY.
Instance 'db3:3306' was switched from PRIMARY to SECONDARY.
Instance 'db1:3306' remains PRIMARY.

WARNING: Existing connections that expected a R/W connection must be disconnected, i.e. instances that became SECONDARY.

The cluster successfully switched to Single-Primary mode.

At this point, db1 is now the primary node configured with read-only disabled and the rest will be configured as secondary with read-only enabled.

MySQL InnoDB Cluster Scaling Operations

Scaling Up (Adding a New DB Node)

When adding a new instance, the node has to be provisioned before it's allowed to participate in the replication group. The provisioning process is handled automatically by MySQL. You can also check the instance state first, to see whether the node is valid to join the cluster, by using the checkInstanceState() function as previously explained.

To add a new DB node, use the addInstance() function and specify the host:

MySQL|db1:3306 ssl|JS> cluster.addInstance("db3:3306")

The following is what you would get when adding a new instance:

Verify the new cluster size with:

MySQL|db1:3306 ssl|JS> cluster.status() //or cluster.describe()

MySQL Router will automatically include the added node, db3 into the load balancing set.

Scaling Down (Removing a Node)

To remove a node, connect to any of the DB nodes except the one we are going to remove, and use the removeInstance() function with the database instance name:

MySQL|db1:3306 ssl|JS> shell.connect("clusteradmin@db1:3306");
MySQL|db1:3306 ssl|JS> cluster = dba.getCluster()
MySQL|db1:3306 ssl|JS> cluster.removeInstance("db3:3306")

The following is what you would get when removing an instance:

Verify the new cluster size with:

MySQL|db1:3306 ssl|JS> cluster.status() //or cluster.describe()

MySQL Router will automatically exclude the removed node, db3 from the load balancing set.

Adding a New Replication Slave

We can scale out the InnoDB Cluster with an asynchronous replication slave that replicates from any of the cluster nodes. A slave is loosely coupled to the cluster and can handle a heavy load without affecting the performance of the cluster. The slave can also be a live copy of the database for disaster recovery purposes. In multi-primary mode, you can use the slave as a dedicated MySQL read-only processor to scale out the read workload, perform analytics operations, or serve as a dedicated backup server.

On the slave server, download the latest APT config package, install it (choose MySQL 8.0 in the configuration wizard), install the APT key, update the package list, and install MySQL server and MySQL Shell:

$ wget
$ dpkg -i mysql-apt-config_0.8.14-1_all.deb
$ apt-key adv --recv-keys --keyserver 5072E1F5
$ apt-get update
$ apt-get -y install mysql-server mysql-shell

Modify the MySQL configuration file to prepare the server as a replication slave. Open the configuration file in a text editor:

$ vim /etc/mysql/mysql.conf.d/mysqld.cnf

And append the following lines:

server-id = 1044 # must be unique across all nodes
gtid-mode = ON
enforce-gtid-consistency = ON
log-slave-updates = OFF
read-only = ON
super-read-only = ON
expire-logs-days = 7

Restart MySQL server on the slave to apply the changes:

$ systemctl restart mysql

On one of the InnoDB Cluster servers (we chose db3), create a replication slave user, followed by a full MySQL dump:

$ mysql -uroot -p
mysql> CREATE USER 'repl_user'@'' IDENTIFIED BY 'password';
mysql> GRANT REPLICATION SLAVE ON *.* TO 'repl_user'@'';
mysql> exit
$ mysqldump -uroot -p --single-transaction --master-data=1 --all-databases --triggers --routines --events > dump.sql

Transfer the dump file from db3 to the slave:

$ scp dump.sql root@slave:~

And perform the restoration on the slave:

$ mysql -uroot -p < dump.sql

With --master-data=1, our MySQL dump file automatically configures the GTID executed and purged values. We can verify this with the following statement on the slave server after the restoration:

$ mysql -uroot -p
mysql> show global variables like '%gtid_%';
| Variable_name                    | Value                                        |
| binlog_gtid_simple_recovery      | ON                                           |
| enforce_gtid_consistency         | ON                                           |
| gtid_executed                    | d4790339-0694-11ea-8fd5-02f67042125d:1-45886 |
| gtid_executed_compression_period | 1000                                         |
| gtid_mode                        | ON                                           |
| gtid_owned                       |                                              |
| gtid_purged                      | d4790339-0694-11ea-8fd5-02f67042125d:1-45886 |

Looks good. We can then configure the replication link and start the replication threads on the slave:
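A minimal sketch of those statements, assuming the repl_user account created on db3 earlier and GTID-based auto-positioning (adjust the host and password to your environment):

```sql
CHANGE MASTER TO
  MASTER_HOST = 'db3',
  MASTER_PORT = 3306,
  MASTER_USER = 'repl_user',
  MASTER_PASSWORD = 'password',
  MASTER_AUTO_POSITION = 1;  -- use GTIDs instead of binlog file/position
START SLAVE;
```

With MASTER_AUTO_POSITION = 1, the slave asks the master for all transactions missing from its own gtid_executed set, which is why the dump's gtid_purged value had to be correct.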


Verify the replication state and ensure the following status return 'Yes':

mysql> show slave status\G
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes

At this point, our architecture is now looking like this:


Common Issues with MySQL InnoDB Clusters

Memory Exhaustion

When using MySQL Shell with MySQL 8.0, we were constantly getting the following error when the instances were configured with 1GB of RAM:

Can't create a new thread (errno 11); if you are not out of available memory, you can consult the manual for a possible OS-dependent bug (MySQL Error 1135)

Upgrading each host's RAM to 2GB solved the problem. Apparently, MySQL 8.0 components require more RAM to operate efficiently.

Lost Connection to MySQL Server

If the primary node goes down, you will probably see a "lost connection to MySQL server" error when trying to run a query in the current session:

MySQL|db1:3306 ssl|JS> cluster.status()
Cluster.status: Lost connection to MySQL server during query (MySQL Error 2013)

MySQL|db1:3306 ssl|JS> cluster.status()
Cluster.status: MySQL server has gone away (MySQL Error 2006)

The solution is to re-declare the object once more:

MySQL|db1:3306 ssl|JS> cluster = dba.getCluster()
MySQL|db1:3306 ssl|JS> cluster.status()

At this point, it will connect to the newly promoted primary node to retrieve the cluster status.

Node Eviction and Expulsion

If communication between nodes is interrupted, the problematic node will be evicted from the cluster without any delay, which is not good if you are running on an unstable network. This is what it looks like on db2 (the problematic node):

2019-11-14T07:07:59.344888Z 0 [ERROR] [MY-011505] [Repl] Plugin group_replication reported: 'Member was expelled from the group due to network failures, changing member status to ERROR.'
2019-11-14T07:07:59.371966Z 0 [ERROR] [MY-011712] [Repl] Plugin group_replication reported: 'The server was automatically set into read only mode after an error was detected.'

Meanwhile, db1 saw that db2 was offline:

2019-11-14T07:07:44.086021Z 0 [Warning] [MY-011493] [Repl] Plugin group_replication reported: 'Member with address db2:3306 has become unreachable.'
2019-11-14T07:07:46.087216Z 0 [Warning] [MY-011499] [Repl] Plugin group_replication reported: 'Members removed from the group: db2:3306'

To tolerate some delay in node eviction, we can set a higher timeout value before a node is expelled from the group. The default value is 0, which means expel immediately. Use the setOption() function to set the expelTimeout value:
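For example, to give an unreachable member 30 seconds before expulsion (an illustrative value; the option takes seconds):

```js
MySQL|db1:3306 ssl|JS> cluster.setOption("expelTimeout", 30)
```

Note the trade-off: a higher expelTimeout delays failure detection for the whole group.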

Thanks to Frédéric Descamps from Oracle who pointed this out:

Instead of relying on expelTimeout, it's recommended to set the autoRejoinTries option. The value represents the number of times an instance will attempt to rejoin the cluster after being expelled. A good number to start with is 3, meaning the expelled member will try to rejoin the cluster 3 times; after each unsuccessful auto-rejoin attempt, the member waits 5 minutes before the next try.

To set this value cluster-wide, we can use the setOption() function:

MySQL|db1:3306 ssl|JS> cluster.setOption("autoRejoinTries", 3)
WARNING: Each cluster member will only proceed according to its exitStateAction if auto-rejoin fails (i.e. all retry attempts are exhausted).

Setting the value of 'autoRejoinTries' to '3' in all ReplicaSet members ...

Successfully set the value of 'autoRejoinTries' to '3' in the 'default' ReplicaSet.



For MySQL InnoDB Cluster, most management and monitoring operations can be performed directly via MySQL Shell (available from MySQL 5.7.21 onward).

by ashraf at November 21, 2019 06:35 PM

November 20, 2019


PostgreSQL Deployment & Configuration with Puppet

Puppet is open source software for configuration management and deployment. Founded in 2005, it’s multi-platform and even has its own declarative language for configuration.

The tasks related to administration and maintenance of PostgreSQL (or other software, really) consist of daily, repetitive processes that require monitoring. This applies even to tasks operated by scripts or commands through a scheduling tool. The complexity of these tasks increases exponentially when they are executed on massive infrastructure. Using Puppet for these kinds of tasks can often solve such large-scale problems, as Puppet centralizes and automates the performance of these operations in a very agile way.

Puppet works in a client/server architecture: the configuration is performed on the master, and these operations are then distributed to and executed on all the clients (also known as nodes).

Typically running every 30 minutes, the agent nodes collect a set of information (type of processor, architecture, IP address, etc.), also called facts, then send it to the master, which checks whether there are any new configurations to apply.

These facts will allow the master to customize the same configuration for each node.

In a very simplistic way, Puppet is one of the most important DevOps tools available today. In this blog we will take a look at the following...

  • The Use Case for Puppet & PostgreSQL
  • Installing Puppet
  • Configuring & Programming Puppet
  • Configuring Puppet for PostgreSQL 

The installation and setup of Puppet (version 5.3.10) described below were performed in a set of hosts using CentOS 7.0 as operating system.

The Use Case for Puppet & PostgreSQL

Suppose that there is an issue in your firewall on the machines that host all your PostgreSQL servers, it would then be necessary to deny all outbound connections to PostgreSQL, and do it as soon as possible.

Puppet is the perfect tool for this situation, especially because speed and efficiency are essential. We'll talk about this example in the section "Configuring Puppet for PostgreSQL" by managing the parameter listen_addresses.

Installing Puppet

There are a set of common steps to perform either on master or agent hosts:

Step One

Update the /etc/hosts file with the host names and their IP addresses (agent, master, puppet).

Step Two

Adding the Puppet repositories on the system

$ sudo rpm -Uvh

For other operating systems or CentOS versions, the most appropriate repository can be found in Puppet, Inc. Yum Repositories.

Step Three

Configuration of NTP (Network Time Protocol) server

$ sudo yum -y install chrony

Step Four

chrony is used to synchronize the system clock from different NTP servers, keeping the time synchronized between the master and agent servers.

Once installed, chrony must be enabled and restarted:

$ sudo systemctl enable chronyd.service

$ sudo systemctl restart chronyd.service

Step Five

Disable the SELinux parameter

In the file /etc/sysconfig/selinux, the SELINUX (Security-Enhanced Linux) parameter must be disabled so that it does not restrict access on either host.


Step Six

Before the Puppet installation (on either master or agent), the firewall on these hosts must be configured accordingly:

$ sudo firewall-cmd --add-service=ntp --permanent

$ sudo firewall-cmd --reload

Installing the Puppet  Master

Once the package repository puppet5-release-5.0.0-1-el7.noarch.rpm is added to the system, the puppetserver installation can be done:

$ sudo yum install -y puppetserver

The max memory allocation parameter is an important setting to update in the /etc/sysconfig/puppetserver file: set it to 2GB (or to 1GB if the service doesn't start):

JAVA_ARGS="-Xms2g -Xmx2g"

In the configuration file /etc/puppetlabs/puppet/puppet.conf, it's necessary to add the following parameters:



certname =

server =

environment = production

runinterval = 1h

The puppetserver service uses port 8140 to listen for node requests, so it's necessary to ensure that this port is enabled:

$ sudo firewall-cmd --add-port=8140/tcp --permanent

$ sudo firewall-cmd --reload

Once all settings are made on the Puppet master, it's time to start the service:

$ sudo systemctl start puppetserver

$ sudo systemctl enable puppetserver

Installing the Puppet Agent

With the same package repository puppet5-release-5.0.0-1-el7.noarch.rpm added to the system, the puppet-agent installation can be performed right away:

$ sudo yum install -y puppet-agent

The puppet-agent configuration file /etc/puppetlabs/puppet/puppet.conf also needs to be updated with the following parameters:


certname =

server =

environment = production

runinterval = 1h

The next step is to register the agent node on the master host by executing the following command:

$ sudo /opt/puppetlabs/bin/puppet resource service puppet ensure=running enable=true

service { 'puppet':
  ensure => 'running',
  enable => 'true'
}


At this moment, on the master host, there is a pending request from the puppet agent to sign a certificate:

This must be signed by executing one of the following commands:

$ sudo /opt/puppetlabs/bin/puppet cert sign


$ sudo /opt/puppetlabs/bin/puppet cert sign --all

Finally (once the Puppet master has signed the certificate), it's time to apply the configurations to the agent by retrieving the catalog from the Puppet master:

$ sudo /opt/puppetlabs/bin/puppet agent --test

In this command, the --test parameter doesn't mean a dry run; the settings retrieved from the master will be applied to the local agent. To test/check the configurations from the master without applying them, the following command must be executed:

$ sudo /opt/puppetlabs/bin/puppet agent --noop

Configuring & Programming Puppet

Puppet uses a declarative programming approach, in which you specify what to do, not how to achieve it.

The most elementary piece of code on Puppet is the resource that specifies a system property such as command, service, file, directory, user or package.

Below is the syntax of a resource that creates a user:

user { 'admin_postgresql':
  ensure => present,
  uid    => '1000',
  gid    => '1000',
  home   => '/home/admin/postresql'
}
Different resources can be joined in a class (also known as a manifest), a file with the "pp" extension (which stands for Puppet Program). In turn, several manifests and data (such as facts, files, and templates) compose a module. All these logical hierarchies and rules are represented in the diagram below:

The purpose of each module is to contain all the manifests needed to execute single tasks in a modular way. On the other hand, the concept of a class isn't the same as in object-oriented programming languages; in Puppet, it works as an aggregator of resources.

This file organization follows a specific directory structure, in which the purpose of each folder is the following:

  • manifests: Puppet code
  • files: static files to be copied to nodes
  • templates: template files to be copied to managed nodes (can be customized with variables)
  • examples: manifest showing how to use the module

Classes (manifests) can be used by other classes, as shown in the example below: the manifest init.pp in dev_accounts uses the manifest groups from the accounts module.

class dev_accounts {
  $rootgroup = $osfamily ? {
    'Debian' => 'sudo',
    'RedHat' => 'wheel',
    default  => warning('This distribution is not supported by the Accounts module'),
  }

  include accounts::groups

  user { 'username':
    ensure     => present,
    home       => '/home/admin/postresql',
    shell      => '/bin/bash',
    managehome => true,
    gid        => 'admin_db',
    groups     => "$rootgroup",
    password   => '$1$7URTNNqb$65ca6wPFDvixURc/MMg7O1'
  }
}

In the next section, we'll show you how to generate the contents of the examples folder, as well as the commands to test and publish each module.

Configuring Puppet for PostgreSQL

Before presenting the several configuration examples to deploy and maintain a PostgreSQL database, it's necessary to install the PostgreSQL Puppet module (on the server host) to use all of its functionality:

$ sudo /opt/puppetlabs/bin/puppet module install puppetlabs-postgresql

Currently, thousands of modules ready to use on Puppet are available on the public module repository Puppet Forge.

Step One

Configure and deploy a new PostgreSQL instance. Here is all the programming and configuration necessary to install a new PostgreSQL instance on all nodes.

The first step is to create a new module structure directory as shared previously:

$ cd /etc/puppetlabs/code/environments/production/modules

$ mkdir db_postgresql_admin

$ cd db_postgresql_admin; mkdir {examples,files,manifests,templates}

Then, in the manifest file manifests/init.pp, you need to include the postgresql::server class provided by the installed module:

class db_postgresql_admin {

  include postgresql::server
}
To check the syntax of the manifest, it's a good practice to execute the following command:

$ sudo /opt/puppetlabs/bin/puppet parser validate init.pp

If nothing is returned, the syntax is correct.

To show how to use this module, create a new manifest file init.pp in the examples folder with the following content:

include db_postgresql_admin

The example must be tested against the master catalog (here with --noop, so nothing is actually changed):

$ sudo /opt/puppetlabs/bin/puppet apply --modulepath=/etc/puppetlabs/code/environments/production/modules --noop init.pp

Finally, it’s necessary to define which modules each node has access to in the file “/etc/puppetlabs/code/environments/production/manifests/site.pp”:

node 'node1.example.com', 'node2.example.com' {  # hypothetical hostnames

 include db_postgresql_admin
}
Or a default configuration for all nodes:

node default {

 include db_postgresql_admin
}
Usually the nodes check the master catalog every 30 minutes; nevertheless, this check can be forced on the node side with the following command:

$ /opt/puppetlabs/bin/puppet agent -t

Or, if the purpose is to simulate the differences between the master configuration and the current node settings, you can use the --noop parameter (no operation):

$ /opt/puppetlabs/bin/puppet agent -t --noop

Step Two

Update the PostgreSQL instance to listen on all interfaces. The previous installation leaves the instance in a very restrictive mode: it only allows connections on localhost, as can be confirmed by the addresses associated with port 5432 (the PostgreSQL default):

$ sudo netstat -ntlp|grep 5432

tcp        0 0* LISTEN   3237/postgres       

tcp6       0 0 ::1:5432                :::* LISTEN   3237/postgres       

In order to allow listening on all interfaces, it’s necessary to have the following content in the file /etc/puppetlabs/code/environments/production/modules/db_postgresql_admin/manifests/init.pp:

class db_postgresql_admin {

  class { 'postgresql::server':
    listen_addresses => '*', # listen on all interfaces
  }
}

In the example above, the postgresql::server class is declared with the parameter listen_addresses set to '*', which means all interfaces.

Now port 5432 is associated with all interfaces, which can be confirmed by the listening address/port:

$ sudo netstat -ntlp|grep 5432

tcp        0      0*               LISTEN   1232/postgres

tcp6       0 0 :::5432                 :::* LISTEN   1232/postgres  
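The same can be confirmed from inside PostgreSQL, reusing the pg_settings query shown earlier (a transcript sketch; the output assumes the '*' setting is active):

```
postgres=# SELECT setting FROM pg_settings WHERE name = 'listen_addresses';
 setting
---------
 *
(1 row)
```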

To put back the initial setting (only allow database connections from localhost), the listen_addresses parameter must be set to “localhost”, or to a list of hosts, if desired:

listen_addresses = ',,localhost'

To retrieve the new configuration from the master host, all that is needed is to request it on the node:

$ /opt/puppetlabs/bin/puppet agent -t

Step Three

Create a PostgreSQL Database. The PostgreSQL instance can be created with a new database, as well as a new user (with password) to use this database and a rule in the pg_hba.conf file to allow the database connection for this new user:

class db_postgresql_admin {

  class { 'postgresql::server':
    listen_addresses => '*', # listen on all interfaces
  }

  postgresql::server::db { 'nines_blog_db':
    user     => 'severalnines',
    password => postgresql_password('severalnines', 'passwd12'),
  }

  postgresql::server::pg_hba_rule { 'Authentication for severalnines':
    description => 'Open access to severalnines',
    type        => 'local',
    database    => 'nines_blog_db',
    user        => 'severalnines',
    address     => '',
    auth_method => 'md5',
  }
}

This last resource has the name “Authentication for severalnines”, and the pg_hba.conf file will have one additional rule:

# Rule Name: Authentication for severalnines

# Description: Open access for severalnines

# Order: 150

local   nines_blog_db   severalnines    md5

To retrieve the new configuration from the master host, all that is needed is to request it on the node:

$ /opt/puppetlabs/bin/puppet agent -t

Step Four

Create a Read-Only User. To create a new user with read-only privileges, the following resources need to be added to the previous manifest:

postgresql::server::role { 'nines_reader':
  createdb      => false,
  createrole    => false,
  superuser     => false,
  password_hash => postgresql_password('nines_reader', 'passwd13'),
}

postgresql::server::pg_hba_rule { 'Authentication for nines_reader':
  description => 'Open access to nines_reader',
  type        => 'host',
  database    => 'nines_blog_db',
  user        => 'nines_reader',
  address     => '',
  auth_method => 'md5',
}
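Note that the role resource above only creates the login role; by itself it does not make the role read-only. A typical follow-up (a hand-written sketch, not generated by the module) is to grant the read-only privileges explicitly from psql:

```
nines_blog_db=# GRANT CONNECT ON DATABASE nines_blog_db TO nines_reader;
nines_blog_db=# GRANT USAGE ON SCHEMA public TO nines_reader;
nines_blog_db=# GRANT SELECT ON ALL TABLES IN SCHEMA public TO nines_reader;
nines_blog_db=# ALTER DEFAULT PRIVILEGES IN SCHEMA public GRANT SELECT ON TABLES TO nines_reader;
```

The last statement ensures that tables created later are readable by the role as well.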
To retrieve the new configuration from the master host, all that is needed is to request it on the node:

$ /opt/puppetlabs/bin/puppet agent -t


In this blog post, we showed you the basic steps to deploy and start configuring your PostgreSQL database in an automated and customized way on several nodes (which could even be virtual machines).

This type of automation can help you to become more effective than doing it manually, and PostgreSQL configuration can easily be performed using several of the classes available in the Puppet Forge repository.

by Hugo Dias at November 20, 2019 08:17 PM

November 19, 2019


Converting from Asynchronous to Synchronous Replication in PostgreSQL

High Availability is a requirement for just about every company around the world using PostgreSQL. It is well known that PostgreSQL uses Streaming Replication as the replication method. PostgreSQL Streaming Replication is asynchronous by default, so it is possible to have some transactions committed in the primary node which have not yet been replicated to the standby server. This means there is the possibility of some potential data loss.

This delay in the commit process is supposed to be very small... if the standby server is powerful enough to keep up with the load. If this small data loss risk is not acceptable in the company, you can also use synchronous replication instead of the default.

In synchronous replication, each commit of a write transaction will wait until the confirmation that the commit has been written to the write-ahead log on disk of both the primary and standby server.

This method minimizes the possibility of data loss. For data loss to occur you would need both the primary and the standby to fail at the same time.

The disadvantage of this method is the same as for all synchronous methods: the response time for each write transaction increases, due to the need to wait for all the confirmations that the transaction was committed. Luckily, read-only transactions are not affected by this; only write transactions are.

In this blog, I’ll show you how to install a PostgreSQL cluster from scratch and convert the asynchronous replication (the default) to synchronous replication. I’ll also show you how to roll back if the response time is not acceptable, as you can easily go back to the previous state. You will see how to deploy, configure, and monitor PostgreSQL synchronous replication easily with ClusterControl, using only one tool for the entire process.

Installing a PostgreSQL Cluster

Let’s start by installing and configuring asynchronous PostgreSQL replication, the usual replication mode used in a PostgreSQL cluster. We will use PostgreSQL 11 on CentOS 7.

PostgreSQL Installation

Following the PostgreSQL official installation guide, this task is pretty simple.

First, install the repository:

$ yum install

Install the PostgreSQL client and server packages:

$ yum install postgresql11 postgresql11-server

Initialize the database:

$ /usr/pgsql-11/bin/postgresql-11-setup initdb

$ systemctl enable postgresql-11

$ systemctl start postgresql-11

On the standby node, you can avoid the last command (start the database service) as you will restore a binary backup to create the streaming replication.

Now, let’s see the configuration required by an asynchronous PostgreSQL replication.

Configuring Asynchronous PostgreSQL Replication

Primary Node Setup

In the PostgreSQL primary node, you must use the following basic configuration to create an Async replication. The files that will be modified are postgresql.conf and pg_hba.conf. In general, they are in the data directory (/var/lib/pgsql/11/data/) but you can confirm it on the database side:

postgres=# SELECT setting FROM pg_settings WHERE name = 'data_directory';

        setting
------------------------
 /var/lib/pgsql/11/data
(1 row)


Change or add the following parameters in the postgresql.conf configuration file.

Here you need to add the IP address(es) to listen on. The default value is 'localhost'; for this example, we’ll use '*' to listen on all IP addresses on the server.

listen_addresses = '*' 

Set the server port to listen on (5432 by default).

port = 5432 

Determine how much information is written to the WALs. The possible values are minimal, replica, or logical. The hot_standby value is mapped to replica and it is used to keep the compatibility with previous versions.

wal_level = hot_standby 

Set the max number of walsender processes, which manage the connection with a standby server.

max_wal_senders = 16

Set the minimum amount of WAL files to be kept in the pg_wal directory.

wal_keep_segments = 32
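Taken together, the changes above amount to the following postgresql.conf fragment:

```
listen_addresses = '*'
port = 5432
wal_level = hot_standby
max_wal_senders = 16
wal_keep_segments = 32
```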

Changing these parameters requires a database service restart.

$ systemctl restart postgresql-11


Change or add the following parameters in the pg_hba.conf configuration file.

# TYPE  DATABASE        USER ADDRESS                 METHOD

host  replication  replication_user  IP_STANDBY_NODE/32  md5

host  replication  replication_user  IP_PRIMARY_NODE/32  md5

As you can see, here you need to add the user access permissions. The first column is the connection type, which can be host or local. Then you need to specify the database (replication), user, source IP address, and authentication method. Changing this file requires a database service reload.

$ systemctl reload postgresql-11

You should add this configuration in both primary and standby nodes, as you will need it if the standby node is promoted to master in case of failure.

Now, you must create a replication user.

Replication Role

The ROLE (user) must have the REPLICATION privilege to be used in streaming replication.
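The creation statement itself is not shown above; a typical form, run as a superuser on the primary (replication_user and PASSWORD are the same placeholders used throughout this post), would be:

```
postgres=# CREATE ROLE replication_user WITH LOGIN REPLICATION ENCRYPTED PASSWORD 'PASSWORD';
```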



After configuring the corresponding files and the user creation, you need to create a consistent backup from the primary node and restore it on the standby node.

Standby Node Setup

On the standby node, go to the /var/lib/pgsql/11/ directory and move or remove the current datadir:

$ cd /var/lib/pgsql/11/

$ mv data data.bk

Then, run the pg_basebackup command to get the current primary datadir and assign the correct owner (postgres):

$ pg_basebackup -h IP_PRIMARY_NODE -D /var/lib/pgsql/11/data/ -P -U replication_user --wal-method=stream

$ chown -R postgres.postgres data

Now, you must use the following basic configuration to create an Async replication. The file that will be modified is postgresql.conf, and you need to create a new recovery.conf file. Both will be located in /var/lib/pgsql/11/.


Specify that this server will be a standby server. If it is on, the server will continue recovering by fetching new WAL segments when the end of archived WAL is reached.

standby_mode = 'on'

Specify a connection string to be used for the standby server to connect to the primary node.

primary_conninfo = 'host=IP_PRIMARY_NODE port=5432 user=replication_user password=PASSWORD'

Specify recovering into a particular timeline. The default is to recover along the same timeline that was current when the base backup was taken. Setting this to “latest” recovers to the latest timeline found in the archive.

recovery_target_timeline = 'latest'

Specify a trigger file whose presence ends recovery in the standby. 

trigger_file = '/tmp/failover_5432.trigger'
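Put together, the standby’s recovery.conf amounts to this fragment (placeholders as above):

```
standby_mode = 'on'
primary_conninfo = 'host=IP_PRIMARY_NODE port=5432 user=replication_user password=PASSWORD'
recovery_target_timeline = 'latest'
trigger_file = '/tmp/failover_5432.trigger'
```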


Change or add the following parameters in the postgresql.conf configuration file.

Determine how much information is written to the WALs. The possible values are minimal, replica, or logical. The hot_standby value is mapped to replica and it is used to keep the compatibility with previous versions. Changing this value requires a service restart.

wal_level = hot_standby

Allow the queries during recovery. Changing this value requires a service restart.

hot_standby = on

Starting Standby Node

Now you have all the required configuration in place, you just need to start the database service on the standby node.

$  systemctl start postgresql-11

And check the database logs in /var/lib/pgsql/11/data/log/. You should have something like this:

2019-11-18 20:23:57.440 UTC [1131] LOG:  entering standby mode

2019-11-18 20:23:57.447 UTC [1131] LOG:  redo starts at 0/3000028

2019-11-18 20:23:57.449 UTC [1131] LOG:  consistent recovery state reached at 0/30000F8

2019-11-18 20:23:57.449 UTC [1129] LOG:  database system is ready to accept read only connections

2019-11-18 20:23:57.457 UTC [1135] LOG:  started streaming WAL from primary at 0/4000000 on timeline 1

You can also check the replication status in the primary node by running the following query:

postgres=# SELECT pid,usename,application_name,state,sync_state FROM pg_stat_replication;

 pid  |     usename      | application_name |   state   | sync_state
------+------------------+------------------+-----------+------------
 1467 | replication_user | walreceiver      | streaming | async

(1 row)

As you can see, we are using an async replication.

Converting Asynchronous PostgreSQL Replication to Synchronous Replication

Now, it’s time to convert this async replication to a sync one, and for this, you will need to configure both the primary and the standby node.

Primary Node

In the PostgreSQL primary node, you must use this basic configuration in addition to the previous async configuration.


Specify a list of standby servers that can support synchronous replication. This standby server name is the application_name setting in the standby’s recovery.conf file.

synchronous_standby_names = 'pgsql_0_node_0'

Specifies whether transaction commit will wait for WAL records to be written to disk before the command returns a “success” indication to the client. The valid values are on, remote_apply, remote_write, local, and off. The default value is on.

synchronous_commit = on

Standby Node Setup 

In the PostgreSQL standby node, you need to change the recovery.conf file, adding the application_name value to the primary_conninfo parameter.


standby_mode = 'on'

primary_conninfo = 'application_name=pgsql_0_node_0 host=IP_PRIMARY_NODE port=5432 user=replication_user password=PASSWORD'

recovery_target_timeline = 'latest'

trigger_file = '/tmp/failover_5432.trigger'

Restart the database service in both the primary and in the standby nodes:

$ service postgresql-11 restart

Now, you should have your sync streaming replication up and running:

postgres=# SELECT pid,usename,application_name,state,sync_state FROM pg_stat_replication;

 pid  |     usename      | application_name |   state   | sync_state
------+------------------+------------------+-----------+------------
 1561 | replication_user | pgsql_0_node_0   | streaming | sync

(1 row)

Rollback from Synchronous to Asynchronous PostgreSQL Replication

If you need to go back to asynchronous PostgreSQL replication, you just need to roll back the changes performed in the postgresql.conf file on the primary node:


#synchronous_standby_names = 'pgsql_0_node_0'

#synchronous_commit = on

And restart the database service.

$ service postgresql-11 restart

So now, you should have asynchronous replication again.

postgres=# SELECT pid,usename,application_name,state,sync_state FROM pg_stat_replication;

 pid  |     usename      | application_name |   state   | sync_state
------+------------------+------------------+-----------+------------
 1625 | replication_user | pgsql_0_node_0   | streaming | async

(1 row)

How to Deploy a PostgreSQL Synchronous Replication Using ClusterControl

With ClusterControl you can perform the deployment, configuration, and monitoring tasks all-in-one from the same job and you will be able to manage it from the same UI.

We will assume that you have ClusterControl installed and it can access the database nodes via SSH. For more information about how to configure the ClusterControl access please refer to our official documentation.

Go to ClusterControl and use the “Deploy” option to create a new PostgreSQL cluster.

When selecting PostgreSQL, you must specify User, Key, or Password and a port to connect by SSH to our servers. You also need a name for your new cluster and if you want ClusterControl to install the corresponding software and configurations for you.

After setting up the SSH access information, you must enter the data to access your database. You can also specify which repository to use.

In the next step, you need to add your servers to the cluster that you are going to create. When adding your servers, you can enter IP or hostname. 

And finally, in the last step, you can choose the replication method, which can be asynchronous or synchronous replication.

That’s it. You can monitor the job status in the ClusterControl activity section.

And when this job finishes, you will have your PostgreSQL synchronous cluster installed, configured and monitored by ClusterControl.


As we mentioned at the beginning of this blog, High Availability is a requirement for all companies, so you should know the available options to achieve it for each technology in use. For PostgreSQL, you can use synchronous streaming replication as the safest way to implement it, but this method doesn’t work for all environments and workloads. 

Be careful with the latency generated by waiting for the confirmation of each transaction, which could become a problem rather than a High Availability solution.


by Sebastian Insausti at November 19, 2019 08:11 PM

MariaDB Foundation

2019 MariaDB Developers Unconference Shanghai Presentations

The 2019 Shanghai MariaDB Developers Unconference is being hosted by Microsoft Shanghai, from 19 November. Slides will be added to this post as they become available. […]

The post 2019 MariaDB Developers Unconference Shanghai Presentations appeared first on

by Ian Gilfillan at November 19, 2019 09:43 AM

November 18, 2019


Database Load Balancing in the Cloud - MySQL Master Failover with ProxySQL 2.0: Part Two (Seamless Failover)

In the previous blog we showed you how to set up an environment in Amazon AWS EC2 that consists of a Percona Server 8.0 Replication Cluster (in Master - Slave topology). We deployed ProxySQL and we configured our application (Sysbench). 

We also used ClusterControl to make the deployment easier, faster and more stable. This is the environment we ended up with...

This is how it looks in ClusterControl:

In this blog post we are going to review the requirements and show you how, in this setup, you can seamlessly perform master switches.

Seamless Master Switch with ProxySQL 2.0

We are going to benefit from ProxySQL’s ability to queue connections if there are no nodes available in a hostgroup. ProxySQL utilizes hostgroups to differentiate between backend nodes with different roles. You can see the configuration in the screenshot below.

In our case we have two hostgroups - hostgroup 10 contains writers (the master) and hostgroup 20 contains slaves (and it may also contain the master, depending on the configuration). As you may know, ProxySQL uses an SQL interface for configuration. ClusterControl exposes most of the configuration options in the UI, but some settings cannot be set up via ClusterControl (or they are configured automatically by ClusterControl). One such setting is how ProxySQL should detect and configure backend nodes in a replication environment.

mysql> SELECT * FROM mysql_replication_hostgroups;

+------------------+------------------+------------+-------------+
| writer_hostgroup | reader_hostgroup | check_type | comment     |
+------------------+------------------+------------+-------------+
| 10               | 20               | read_only  | host groups |
+------------------+------------------+------------+-------------+
1 row in set (0.00 sec)

Configuration stored in the mysql_replication_hostgroups table defines if and how ProxySQL will automatically assign the master and slaves to the correct hostgroups. In short, the configuration above tells ProxySQL to assign writers to HG10 and readers to HG20. Whether a node is a writer or a reader is determined by the state of the read_only variable. If read_only is enabled, the node is marked as a reader and assigned to HG20; if not, it is marked as a writer and assigned to HG10. On top of that we have the mysql-monitor_writer_is_also_reader variable, which determines whether the writer should also show up in the readers’ hostgroup. In our case it is set to ‘true’, thus our writer (master) is also a part of HG20.

ProxySQL does not manage backend nodes, but it does access them and check their state, including the state of the read_only variable. This is done by the monitoring user, which was configured by ClusterControl according to your input at ProxySQL deployment time. If the state of the variable changes, ProxySQL will reassign the node to the proper hostgroup, based on the value of the read_only variable and on the mysql-monitor_writer_is_also_reader setting in ProxySQL.

Here enters ClusterControl. ClusterControl monitors the state of the cluster. Should the master become unavailable, failover will occur. It is more complex than that, and we explained this process in detail in one of our earlier blogs. What is important for us is that, as long as it is safe, ClusterControl will execute the failover and, in the process, reconfigure the read_only variables on the old and new master. ProxySQL will see the change and modify its hostgroups accordingly. This will also happen in the case of a regular slave promotion, which can easily be executed from ClusterControl by starting this job:

The final outcome will be that the new master is promoted and assigned to HG10 in ProxySQL, while the old master is reconfigured as a slave (and becomes part of HG20 in ProxySQL). The process of the master change may take a while depending on the environment, application and traffic (it is even possible to failover in 11 seconds, as my colleague has tested). During this time the database (master) will not be reachable in ProxySQL. This leads to some problems. For starters, the application will receive errors from the database and the user experience will suffer - no one likes to see errors. Luckily, under some circumstances, we can reduce the impact. The requirement for this is that the application does not use (at all, or at that particular time) multi-statement transactions. This is quite expected - if you have a multi-statement transaction (so, BEGIN; … ; COMMIT;) you cannot move it from server to server, because it would no longer be a transaction. In such cases the only safe way is to roll back the transaction and start once more on the new master. Prepared statements are also a no-no: they are prepared on a particular host (the master) and do not exist on the slaves, so once a slave is promoted to the new master, it cannot execute prepared statements that were prepared on the old master. On the other hand, if you run only auto-committed, single-statement transactions, you can benefit from the feature we are going to describe below.

One of the great features ProxySQL has is the ability to queue incoming transactions if they are directed to a hostgroup that does not have any nodes available. This is defined by the following two variables:

ClusterControl increases them to 20 seconds, allowing even quite long failovers to complete without any error being sent to the application.
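The current values can be inspected from the ProxySQL admin interface; the settings assumed here, mysql-connect_timeout_server and mysql-connect_timeout_server_max, are ProxySQL's standard connection-timeout variables (in milliseconds):

```
Admin> SELECT variable_name, variable_value FROM global_variables
       WHERE variable_name LIKE 'mysql-connect_timeout_server%';
```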

Testing the Seamless Master Switch

We are going to run the test in our environment. As the application we are going to use SysBench started as:

while true ; do sysbench /root/sysbench/src/lua/oltp_read_write.lua --threads=4 --events=0 --time=3600 --reconnect=1 --mysql-socket=/tmp/proxysql.sock --mysql-user=sbtest --mysql-password=sbtest --tables=32 --report-interval=1 --skip-trx=on --table-size=100000 --db-ps-mode=disable --rate=5 run ; done

Basically, we will run sysbench in a loop (in case an error shows up). We will run it in 4 threads, and threads will reconnect after every transaction. There will be no multi-statement transactions and we will not use prepared statements. Then we will trigger the master switch by promoting a slave in the ClusterControl UI. This is how the master switch looks from the application standpoint:

[ 560s ] thds: 4 tps: 5.00 qps: 90.00 (r/w/o: 70.00/20.00/0.00) lat (ms,95%): 18.95 err/s: 0.00 reconn/s: 5.00

[ 560s ] queue length: 0, concurrency: 0

[ 561s ] thds: 4 tps: 5.00 qps: 90.00 (r/w/o: 70.00/20.00/0.00) lat (ms,95%): 17.01 err/s: 0.00 reconn/s: 5.00

[ 561s ] queue length: 0, concurrency: 0

[ 562s ] thds: 4 tps: 7.00 qps: 126.00 (r/w/o: 98.00/28.00/0.00) lat (ms,95%): 28.67 err/s: 0.00 reconn/s: 7.00

[ 562s ] queue length: 0, concurrency: 0

[ 563s ] thds: 4 tps: 3.00 qps: 68.00 (r/w/o: 56.00/12.00/0.00) lat (ms,95%): 17.95 err/s: 0.00 reconn/s: 3.00

[ 563s ] queue length: 0, concurrency: 1

We can see that the queries are being executed with low latency.

[ 564s ] thds: 4 tps: 0.00 qps: 42.00 (r/w/o: 42.00/0.00/0.00) lat (ms,95%): 0.00 err/s: 0.00 reconn/s: 0.00

[ 564s ] queue length: 1, concurrency: 4

Then the queries paused - you can see this by the latency being zero and transactions per second being equal to zero as well.

[ 565s ] thds: 4 tps: 0.00 qps: 0.00 (r/w/o: 0.00/0.00/0.00) lat (ms,95%): 0.00 err/s: 0.00 reconn/s: 0.00

[ 565s ] queue length: 5, concurrency: 4

[ 566s ] thds: 4 tps: 0.00 qps: 0.00 (r/w/o: 0.00/0.00/0.00) lat (ms,95%): 0.00 err/s: 0.00 reconn/s: 0.00

[ 566s ] queue length: 15, concurrency: 4

Two seconds in, the queue is growing; still no response coming from the database.

[ 567s ] thds: 4 tps: 20.00 qps: 367.93 (r/w/o: 279.95/87.98/0.00) lat (ms,95%): 3639.94 err/s: 0.00 reconn/s: 20.00

[ 567s ] queue length: 1, concurrency: 4

After three seconds the application was finally able to reach the database again. You can see the traffic is now non-zero and the queue length has been reduced. You can see latency around 3.6 seconds - this is how long the queries had been paused.

[ 568s ] thds: 4 tps: 10.00 qps: 116.04 (r/w/o: 84.03/32.01/0.00) lat (ms,95%): 539.71 err/s: 0.00 reconn/s: 10.00

[ 568s ] queue length: 0, concurrency: 0

[ 569s ] thds: 4 tps: 4.00 qps: 72.00 (r/w/o: 56.00/16.00/0.00) lat (ms,95%): 16.12 err/s: 0.00 reconn/s: 4.00

[ 569s ] queue length: 0, concurrency: 0

[ 570s ] thds: 4 tps: 8.00 qps: 144.01 (r/w/o: 112.00/32.00/0.00) lat (ms,95%): 24.83 err/s: 0.00 reconn/s: 8.00

[ 570s ] queue length: 0, concurrency: 0

[ 571s ] thds: 4 tps: 5.00 qps: 98.99 (r/w/o: 78.99/20.00/0.00) lat (ms,95%): 21.50 err/s: 0.00 reconn/s: 5.00

[ 571s ] queue length: 0, concurrency: 1

[ 572s ] thds: 4 tps: 5.00 qps: 80.98 (r/w/o: 60.99/20.00/0.00) lat (ms,95%): 17.95 err/s: 0.00 reconn/s: 5.00

[ 572s ] queue length: 0, concurrency: 0

[ 573s ] thds: 4 tps: 2.00 qps: 36.01 (r/w/o: 28.01/8.00/0.00) lat (ms,95%): 14.46 err/s: 0.00 reconn/s: 2.00

[ 573s ] queue length: 0, concurrency: 0

Everything is stable again; the total impact of the master switch was a 3.6 second increase in latency and no traffic hitting the database for 3.6 seconds. Other than that, the master switch was transparent to the application. Of course, whether it will be 3.6 seconds or more depends on the environment, traffic and so on, but as long as the master switch can be performed under 20 seconds, no error will be returned to the application.


As you can see, with ClusterControl and ProxySQL 2.0 you are just a couple of clicks from achieving a seamless failover and master switch for your MySQL Replication clusters.

by krzysztof at November 18, 2019 04:20 PM

November 17, 2019

Valeriy Kravchuk

Dynamic Tracing of MySQL Server with Timestamps Using gdb

Some time ago I wanted a customer to trace some MariaDB function executions and make sure that when a function is executed I get both the timestamp of execution and some of the arguments printed into some log file. Our InnoDB guru in MariaDB, Marko Mäkelä, suggested using gdb: set a breakpoint on the function and use its command ... end syntax to print whatever we needed, and log the output.

Adding a timestamp to each breakpoint "command" execution was the next step, and for this I've suggested using the plain "shell date" command. Redirecting its output to the same file gdb uses to log everything indeed did the trick. Let me document it here for readers and myself. I'll consider the example of logging SQL queries I've used in the previous blog post on dynamic tracing with perf, and use recent Percona Server 5.7.28-31 on Ubuntu 16.04 for tests.

Skipping the details already explained in that previous post, let's assume I want to set a breakpoint at the dispatch_command() function and print the value of com_data->com_query.query every time it is hit. This is what I did with gdb after attaching it to the proper mysqld process:
[New LWP 6562]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/".
0x00007fc563d8774d in poll () at ../sysdeps/unix/syscall-template.S:84
84      ../sysdeps/unix/syscall-template.S: No such file or directory.
(gdb) set height 0
(gdb) set log on
Copying output to gdb.txt.
(gdb) b dispatch_command
Breakpoint 1 at 0xbe9660: file /mnt/workspace/percona-server-5.7-debian-binary-rocks-new/label_exp/min-xenial-x64/test/percona-server-5.7-5.7.28-31/sql/, line 1254.
(gdb) command 1
Type commands for breakpoint(s) 1, one per line.
End with a line saying just "end".
>shell date >> ./gdb.txt
>p com_data->com_query.query
>end
(gdb) continue

Key detail here is to use command and refer to the breakpoint by number, 1, and then append the output of the date command to the gdb.txt file in the current directory that is used by default to log the gdb output.
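The same setup can also be kept in a small gdb command file and attached non-interactively; this is a sketch, with trace_queries.gdb as an arbitrary file name:

```
# trace_queries.gdb (hypothetical file name)
set height 0
set logging on
break dispatch_command
commands
  shell date >> ./gdb.txt
  p com_data->com_query.query
  continue
end
continue
```

You would then attach it with something like: $ sudo gdb -p $(pidof mysqld) -x trace_queries.gdb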

Then in another shell I executed the following:
openxs@ao756:~$ mysql -uroot test
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 2
Server version: 5.7.28-31-log Percona Server (GPL), Release '31', Revision 'd14ef86'

Copyright (c) 2009-2019 Percona LLC and/or its affiliates
Copyright (c) 2000, 2019, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> select 1;
+---+
| 1 |
+---+
| 1 |
+---+
1 row in set (0.04 sec)

mysql> select 2;
+---+
| 2 |
+---+
| 2 |
+---+
1 row in set (0.04 sec)

mysql> shutdown;
Query OK, 0 rows affected (0.04 sec)

mysql> quit

I had the output in the gdb window, but what's more important is that I've got it in the gdb.txt file as well. The content looked as follows:
openxs@ao756:~$ ls -l gdb.txt
-rw-r--r-- 1 root root 7123 лис 17 19:21 gdb.txt
openxs@ao756:~$ cat gdb.txt
Breakpoint 1 at 0xbe9660: file /mnt/workspace/percona-server-5.7-debian-binary-rocks-new/label_exp/min-xenial-x64/test/percona-server-5.7-5.7.28-31/sql/, line 1254.
Type commands for breakpoint(s) 1, one per line.
End with a line saying just "end".
[New Thread 0x7fc566411700 (LWP 6611)]
[Switching to Thread 0x7fc566411700 (LWP 6611)]

Thread 29 "mysqld" hit Breakpoint 1, dispatch_command (thd=thd@entry=
    0x7fc548631000, com_data=com_data@entry=0x7fc566410da0, command=COM_QUERY)
    at /mnt/workspace/percona-server-5.7-debian-binary-rocks-new/label_exp/min-xenial-x64/test/percona-server-5.7-5.7.28-31/sql/
1254    /mnt/workspace/percona-server-5.7-debian-binary-rocks-new/label_exp/min-xenial-x64/test/percona-server-5.7-5.7.28-31/sql/ No such file or directory.
неділя, 17 листопада 2019 19:20:47 +0200
$1 = 0x7fc54865b021 "show databases"
Thread 29 "mysqld" hit Breakpoint 1, dispatch_command (
    thd=thd@entry=0x7fc548631000, com_data=com_data@entry=0x7fc566410da0,
    at /mnt/workspace/percona-server-5.7-debian-binary-rocks-new/label_exp/min-xenial-x64/test/percona-server-5.7-5.7.28-31/sql/
1254    in /mnt/workspace/percona-server-5.7-debian-binary-rocks-new/label_exp/min-xenial-x64/test/percona-server-5.7-5.7.28-31/sql/
Sunday, November 17, 2019 19:20:47 +0200
$2 = 0x7fc54865b021 "show tables"
Thread 29 "mysqld" hit Breakpoint 1, dispatch_command (
    thd=thd@entry=0x7fc548631000, com_data=com_data@entry=0x7fc566410da0,
    at /mnt/workspace/percona-server-5.7-debian-binary-rocks-new/label_exp/min-xenial-x64/test/percona-server-5.7-5.7.28-31/sql/
1254    in /mnt/workspace/percona-server-5.7-debian-binary-rocks-new/label_exp/min-xenial-x64/test/percona-server-5.7-5.7.28-31/sql/
Sunday, November 17, 2019 19:20:47 +0200
$9 = 0x7fc54865b021 "select @@version_comment limit 1"

Thread 29 "mysqld" hit Breakpoint 1, dispatch_command (
    thd=thd@entry=0x7fc548631000, com_data=com_data@entry=0x7fc566410da0,
    at /mnt/workspace/percona-server-5.7-debian-binary-rocks-new/label_exp/min-xenial-x64/test/percona-server-5.7-5.7.28-31/sql/
1254    in /mnt/workspace/percona-server-5.7-debian-binary-rocks-new/label_exp/min-xenial-x64/test/percona-server-5.7-5.7.28-31/sql/
Sunday, November 17, 2019 19:21:01 +0200
$10 = 0x7fc54865b021 "select 1"

Thread 29 "mysqld" hit Breakpoint 1, dispatch_command (
    thd=thd@entry=0x7fc548631000, com_data=com_data@entry=0x7fc566410da0,
    at /mnt/workspace/percona-server-5.7-debian-binary-rocks-new/label_exp/min-xenial-x64/test/percona-server-5.7-5.7.28-31/sql/
1254    in /mnt/workspace/percona-server-5.7-debian-binary-rocks-new/label_exp/min-xenial-x64/test/percona-server-5.7-5.7.28-31/sql/
Sunday, November 17, 2019 19:21:08 +0200
$11 = 0x7fc54865b021 "select 2"

Thread 29 "mysqld" hit Breakpoint 1, dispatch_command (
    thd=thd@entry=0x7fc548631000, com_data=com_data@entry=0x7fc566410da0,
    at /mnt/workspace/percona-server-5.7-debian-binary-rocks-new/label_exp/min-xenial-x64/test/percona-server-5.7-5.7.28-31/sql/
1254    in /mnt/workspace/percona-server-5.7-debian-binary-rocks-new/label_exp/min-xenial-x64/test/percona-server-5.7-5.7.28-31/sql/
Sunday, November 17, 2019 19:21:12 +0200
$12 = 0x7fc54865b021 "shutdown"
[Thread 0x7fc566411700 (LWP 6611) exited]

Thread 1 "mysqld" received signal SIGUSR1, User defined signal 1.
[Switching to Thread 0x7fc56645b780 (LWP 6525)]
0x00007fc563d8774d in poll () at ../sysdeps/unix/syscall-template.S:84
84      ../sysdeps/unix/syscall-template.S: No such file or directory.
Detaching from program: /usr/sbin/mysqld, process 6525
That's it. For basic tracing of any function calls in MySQL or MariaDB server (or any binary with debuginfo available), including timestamps and any arguments, variables or expressions printed, you do not strictly need anything but gdb. With different format options for the date command you can format the timestamp any way you need; for example, you can remove the day and month names, and get nanosecond precision if you prefer:
openxs@ao756:~$ for i in `seq 1 20`; do sleep 0.1; date +'%H:%M:%S.%N'; done
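The whole session above can be scripted. Here is a minimal sketch of a gdb command file implementing the same trace; the breakpoint location, the printed expression and the attach command are assumptions based on the session above, not an exact reproduction of the author's setup:

```shell
# Generate a gdb command file that, like the session above, logs every
# dispatch_command() call with a timestamp and the query text to gdb.txt.
cat > mysqld-trace.gdb <<'EOF'
set pagination off
set logging file gdb.txt
set logging on
break dispatch_command
commands
  shell date +'%H:%M:%S.%N'
  print com_data->com_query.query
  continue
end
continue
EOF
# Attach to a running server with something like:
#   sudo gdb -x mysqld-trace.gdb -p "$(pidof mysqld)"
```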
* * * 
Sometimes even basic tools allow you to get useful results. This photo was made with my dumb Nokia phone while running, just a stop for a moment. A nice and useful result, IMHO, same as with the gdb trick discussed here.
To summarize: for basic dynamic tracing with timestamps and arbitrary information printed, you can use a command-line debugger like gdb. In some cases this simple approach is quite useful.

This kind of tracing comes with a cost, both in terms of performance impact (awful) and additional steps to parse the output (we do not really care about all those breakpoint details, we just need a timestamp and some values printed). eBPF and related dynamic tracing tools (bcc trace and bpftrace) may help with both problems, as I am going to demonstrate in upcoming blog posts. Stay tuned!

by Valerii Kravchuk ( at November 17, 2019 06:37 PM

November 15, 2019


Tips for Migrating from MySQL Replication to MySQL Galera Cluster 4.0

We have previously blogged about What’s New in MySQL Galera Cluster 4.0, Handling Large Transactions with Streaming Replication and MariaDB 10.4 and presented some guides about using the new Streaming Replication feature in a part 1 & part 2 series.

Moving your database technology from MySQL Replication to MySQL Galera Cluster requires you to have the right skills and an understanding of what you are doing to be successful. In this blog we’ll share some tips for migrating from a MySQL Replication setup to MySQL Galera Cluster 4.0 one.

The Differences Between MySQL Replication and Galera Cluster

If you're not yet familiar with Galera, we suggest you go over our Galera Cluster for MySQL Tutorial. Galera Cluster uses a whole different level of replication, based on synchronous replication, in contrast to MySQL Replication, which uses asynchronous replication (but can also be configured to achieve semi-synchronous replication). 

Galera Cluster also supports multi-master replication. It is capable of unconstrained parallel applying (i.e., “parallel replication”), multicast replication, and automatic node provisioning. 

The primary focus of Galera Cluster is data consistency, whereas MySQL Replication is prone to data inconsistency (which can be avoided with best practices and proper configuration, such as enforcing read-only on the slaves to avoid unwanted writes).

Transactions received by Galera are either applied to every node or not at all. Each node certifies the replicated write-set in the applier queue (transaction commits); the write-set also includes information on all of the locks that were held by the database during the transaction. Once no conflicting locks are identified, the write-set is applied. Up to this point the transaction is considered committed, and each node continues to apply it to the tablespace. Unlike asynchronous replication, this approach is called virtually synchronous replication: the writes and commits happen in a logically synchronous mode, but the actual writing and committing to the tablespace happens independently, and thus asynchronously, on each node.

Unlike MySQL Replication, Galera Cluster is a true multi-master, multi-threaded setup, a pure hot-standby, with no need for master failover or read-write splitting. However, migrating to Galera Cluster is not an automatic answer to your problems. Galera Cluster supports only InnoDB, so there could be design modifications if you are using the MyISAM or Memory storage engines. 

Converting Non-InnoDB Tables to InnoDB

Galera Cluster does allow you to use MyISAM, but this is not what Galera Cluster was designed for. Galera Cluster is designed to strictly implement data consistency across all of the nodes in the cluster, and this requires a strongly ACID-compliant database engine. InnoDB is an engine with strong capabilities in this area, and it is recommended that you use InnoDB, especially when dealing with transactions.
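A quick way to find and convert such tables is via information_schema; a sketch (the schema and table names in the ALTER are hypothetical, and rebuilding a large table may take a while):

```sql
-- List user tables that are not InnoDB
SELECT table_schema, table_name, engine
FROM information_schema.tables
WHERE table_type = 'BASE TABLE'
  AND engine <> 'InnoDB'
  AND table_schema NOT IN ('mysql', 'information_schema', 'performance_schema', 'sys');

-- Convert one of them (hypothetical name); this rebuilds the table
ALTER TABLE mydb.mytable ENGINE = InnoDB;
```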

If you're using ClusterControl, you can easily check your database instance(s) for any MyISAM tables via the Performance Advisors. You can find this under the Performance → Advisors tab. For example,

If you require MyISAM and MEMORY tables, you can still use them, but make sure that data does not need to be replicated. Use them for read-only data, and use "START TRANSACTION READ ONLY" wherever appropriate.

Adding Primary Keys To your InnoDB Tables

Since Galera Cluster only supports InnoDB, it is very important that all of your tables have a clustered index (also called a primary key or unique key). To get the best performance from queries, inserts, and other database operations, you must define every table with a unique key, since InnoDB uses the clustered index to optimize the most common lookup and DML operations for each table. This helps avoid the long-running queries within the cluster that can slow down write/read operations.
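You can also list InnoDB tables lacking a primary key yourself via information_schema; a sketch (adjust the schema filter to your environment):

```sql
-- Base tables that have no PRIMARY KEY constraint
SELECT t.table_schema, t.table_name
FROM information_schema.tables t
LEFT JOIN information_schema.table_constraints c
       ON c.table_schema = t.table_schema
      AND c.table_name = t.table_name
      AND c.constraint_type = 'PRIMARY KEY'
WHERE t.table_type = 'BASE TABLE'
  AND c.constraint_type IS NULL
  AND t.table_schema NOT IN ('mysql', 'information_schema', 'performance_schema', 'sys');
```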

In ClusterControl, there are advisors which can notify you of this. For example, in your MySQL Replication master/slave cluster, you'll get an alarm or can see it in the list of advisors. The example screenshot below reveals the tables that have no primary key:

Identify a Master (or Active-Writer) Node

Galera Cluster is purely a true multi-master replication setup. However, that doesn't mean you're free to write to whichever node you like. When writing to different nodes and a conflicting transaction is detected, you'll get into a deadlock issue just like below:

2019-11-14T21:14:03.797546Z 12 [Note] [MY-011825] [InnoDB] *** Priority TRANSACTION:

TRANSACTION 728431, ACTIVE 0 sec starting index read

mysql tables in use 1, locked 1

MySQL thread id 12, OS thread handle 140504401893120, query id 1414279 Applying batch of row changes (update)

2019-11-14T21:14:03.797696Z 12 [Note] [MY-011825] [InnoDB] *** Victim TRANSACTION:

TRANSACTION 728426, ACTIVE 3 sec updating or deleting

mysql tables in use 1, locked 1

, undo log entries 11409

MySQL thread id 57, OS thread handle 140504353195776, query id 1414228 localhost root updating

update sbtest1_success set k=k+1 where id > 1000 and id < 100000

2019-11-14T21:14:03.797709Z 12 [Note] [MY-011825] [InnoDB] *** WAITING FOR THIS LOCK TO BE GRANTED:

RECORD LOCKS space id 1663 page no 11 n bits 144 index PRIMARY of table `sbtest`.`sbtest1_success` trx id 728426 lock_mode X

Record lock, heap no 1 PHYSICAL RECORD: n_fields 1; compact format; info bits 0

 0: len 8; hex 73757072656d756d; asc supremum;;

The problem with multiple nodes writing without an identified active-writer node is that you'll end up with issues like the above, which are very common problems I've seen when using Galera Cluster and writing to multiple nodes at the same time. In order to avoid this, you can use a single-master setup approach:

From the documentation,

To relax flow control, you might use the settings below:

wsrep_provider_options = "gcs.fc_limit = 256; gcs.fc_factor = 0.99; gcs.fc_master_slave = YES"

The above requires a server restart since fc_master_slave is not dynamic.

Enable Debugging Mode For Logging Conflicts or Deadlocks

Debugging or tracing issues with your Galera Cluster is very important. Locks in Galera are implemented differently compared to MySQL Replication. Galera uses optimistic locking when dealing with transactions cluster-wide, whereas MySQL Replication uses only pessimistic locking, which doesn't know whether the same or a conflicting transaction is being executed on a co-master in a multi-master setup. Galera still uses pessimistic locking on the local node, since locking there is managed by InnoDB, the supported storage engine, but it uses optimistic locking when it goes to the other nodes. This means that no checks are made with other nodes in the cluster when local locks are attained (pessimistic locking); Galera assumes that, once the transaction passes the commit phase within the storage engine and the other nodes are informed, everything will be okay and no conflicts will arise.

In practice, it's best to enable wsrep_log_conflicts. This will log the details of conflicting MDL as well as InnoDB locks in the cluster. The variable can be set dynamically, but there is a caveat once it is enabled: it will verbosely populate your error-log file, and can fill up your disk if the error-log file grows too large.
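A quick sketch of enabling this (the provider option is an optional extra level of certification-failure detail, per the Galera documentation):

```sql
-- Enable logging of cluster-wide conflicts; dynamic, no restart needed
SET GLOBAL wsrep_log_conflicts = ON;

-- Optional: ask the Galera provider for verbose certification-failure details
SET GLOBAL wsrep_provider_options = 'cert.log_conflicts=YES';

-- Verify
SHOW GLOBAL VARIABLES LIKE 'wsrep_log_conflicts';
```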

Be Careful With Your DDL Queries

With MySQL Replication, running an ALTER statement affects only incoming connections that need to access or reference the table targeted by the ALTER. It can also affect slaves if the table is large and can introduce slave lag, but writes to your master won't be blocked as long as your queries do not conflict with the current ALTER. This is entirely not the case when running DDL statements such as ALTER with Galera Cluster. ALTER statements can bring problems such as the cluster getting stuck due to a cluster-wide lock, or flow control throttling replication while some nodes are recovering from large writes.

In some situations, you might end up with downtime for your Galera Cluster if the table is very large and is a primary, vital table to your application. However, this can often be done without downtime. As Rick James pointed out in his blog, you can follow the recommendations below:


  • Rolling Schema Upgrade (RSU) = manually do one node (offline) at a time
  • Total Order Isolation (TOI) = Galera synchronizes the DDL so that it is done at the same time (in the replication sequence) on all nodes

Caution: Since there is no way to synchronize the clients with the DDL, you must make sure that the clients are happy with either the old or the new schema. Otherwise, you will probably need to take down the entire cluster while simultaneously switching over both the schema and the client code.

A "fast" DDL may as well be done via TOI. This is a tentative list of such:

  • ALTER to change DEFAULT
  • ALTER to change definition of ENUM or SET (see caveats in manual)
  • Certain PARTITION ALTERs that are fast.
  • DROP INDEX (other than PRIMARY KEY)
  • Other ALTERs on 'small' tables.
  • With 5.6 and especially 5.7 having a lot of ALTER ALGORITHM=INPLACE cases, check which ALTERs should be done which way.

Otherwise, use RSU. Do the following separately for each node:

SET GLOBAL wsrep_OSU_method='RSU';

This also takes the node out of the cluster.

SET GLOBAL wsrep_OSU_method='TOI';

This puts the node back in, leading to a resync (hopefully a quick IST, not a slow SST).
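Putting the per-node steps together, the RSU flow looks roughly like this (the ALTER is a hypothetical example; run the whole sequence on each node in turn):

```sql
-- Run on one node at a time
SET GLOBAL wsrep_OSU_method = 'RSU';  -- desync: the DDL below is NOT replicated
ALTER TABLE mydb.mytable ADD COLUMN notes TEXT;  -- hypothetical DDL
SET GLOBAL wsrep_OSU_method = 'TOI';  -- rejoin; the node resyncs via IST (or SST)
```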

Preserve the Consistency Of Your Cluster

Galera Cluster does not support replication filters such as binlog_do_db or binlog_ignore_db, since Galera does not rely on binary logging. It relies on a ring-buffer file called GCache, which stores write-sets that are replicated along the cluster. There is no way to apply filtering that would leave database nodes in an inconsistent state. 

Galera, on the other hand, strictly implements data consistency within the cluster. Still, inconsistencies can occur, where rows or records cannot be found. For example, setting the variable wsrep_OSU_method to either RSU or TOI for your DDL ALTER statements might bring inconsistent behavior. Check this external blog from Percona discussing inconsistency with Galera with TOI vs RSU.

Setting wsrep_on=OFF and subsequently running DML or DDL queries can be dangerous to your cluster. You must also review your stored procedures, triggers, functions, events, or views to check whether their results depend on a node's state or environment. When certain nodes become inconsistent, it can potentially bring the entire cluster down. Once Galera detects inconsistent behavior, the offending node will attempt to leave the cluster and terminate. Hence, it's possible for all of the nodes to become inconsistent, leaving you in a state of dilemma. 

If a Galera Cluster node experiences a crash, especially during a high-traffic period, it's better not to start the node right away. Instead, perform a full SST or bring up a new instance as soon as possible, or once the traffic goes down. The crashed node could otherwise exhibit inconsistent behavior due to corrupted data. 

Segregate Large Transactions and Determine Whether to Use Streaming Replication 

Let's get straight on this one. One of the biggest feature changes in Galera Cluster 4.0 is Streaming Replication. Before Galera Cluster 4.0, transactions were limited to < 2GiB, typically controlled by the variables wsrep_max_ws_rows and wsrep_max_ws_size. Since Galera Cluster 4.0, you are able to send transactions > 2GiB, but you must determine how large the fragments processed during replication have to be. This is set per session, and the only variables you need to take care of are wsrep_trx_fragment_unit and wsrep_trx_fragment_size. Disabling Streaming Replication is as simple as setting wsrep_trx_fragment_size = 0. Take note that replicating a large transaction also incurs overhead on the slave nodes (nodes that are replicating against the current active-writer/master node), since logs will be written to the wsrep_streaming_log table in the mysql database.

Another thing to add: since you're dealing with large transactions, a transaction might take some time to finish, so setting the variable innodb_lock_wait_timeout high enough must also be taken into account. Set it per session, larger than the time you estimate the transaction will need to finish; otherwise it will hit the timeout.
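A per-session sketch of the settings above (the fragment size and timeout are example values, not recommendations):

```sql
-- Replicate the transaction in fragments while it is still in progress
SET SESSION wsrep_trx_fragment_unit = 'bytes';   -- or 'rows' / 'statements'
SET SESSION wsrep_trx_fragment_size = 10485760;  -- 10MiB fragments
SET SESSION innodb_lock_wait_timeout = 600;      -- give the big transaction room to finish

-- ... run the large transaction here ...

SET SESSION wsrep_trx_fragment_size = 0;         -- 0 turns streaming replication off
```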

We recommend you read this previous blog about streaming replication in action.

Replicating GRANTs Statements

GRANT and related operations act on the MyISAM/Aria tables in the `mysql` database. The GRANT statements themselves will be replicated, but the underlying tables will not. This means that INSERT INTO mysql.user ... will not be replicated, because the table is MyISAM.

However, the above might no longer be true as of Percona XtraDB Cluster (PXC) 8.0 (currently experimental), since the mysql schema tables have been converted to InnoDB, whilst in MariaDB 10.4 some of the tables are still in Aria format and others are in CSV or InnoDB. You should determine what version and provider of Galera you have, but it is best to avoid DML statements referencing the mysql schema. Otherwise, you might end up with unexpected results, unless you're sure that this is PXC 8.0.
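In other words, prefer account-management DDL over direct DML on the mysql schema; a sketch (the user name and privileges are hypothetical):

```sql
-- Replicated cluster-wide as statements:
CREATE USER 'app'@'%' IDENTIFIED BY 's3cret';
GRANT SELECT, INSERT ON mydb.* TO 'app'@'%';

-- NOT replicated where mysql.user is MyISAM/Aria -- avoid:
-- INSERT INTO mysql.user ...
```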


Galera Cluster does not support XA Transactions, since XA Transactions handle rollbacks and commits differently. LOCK/UNLOCK TABLES or GET_LOCK/RELEASE_LOCK statements are dangerous to apply or use with Galera. You might experience a crash, or locks that are not killable and stay locked. For example,

---TRANSACTION 728448, ACTIVE (PREPARED) 13356 sec

mysql tables in use 2, locked 2

3 lock struct(s), heap size 1136, 5 row lock(s), undo log entries 5

MySQL thread id 67, OS thread handle 140504353195776, query id 1798932 localhost root wsrep: write set replicated and certified (13)

insert into sbtest1(k,c,pad) select k,c,pad from sbtest1_success limit 5

This transaction had already been unlocked and even killed, but to no avail. We suggest that you redesign your application client and get rid of these functions when migrating to Galera Cluster.

Network Stability is a MUST!!!

Galera Cluster can work even with an inter-WAN or inter-geo topology without any issues (check this blog about implementing inter-geo topology with Galera). However, if your network connectivity between nodes is not stable, or goes down intermittently for unpredictable periods, it can be problematic for the cluster. It's best to have the cluster running in a private, local network where all of the nodes are connected. When designating a node for disaster recovery in a different region or geography, plan to create a separate cluster there.  You may start by reading our previous blog, Using MySQL Galera Cluster Replication to Create a Geo-Distributed Cluster: Part One, as it can help you decide on your Galera Cluster topology.

Another thing to add about investing in your network hardware: it would be problematic if your network transfer rate is low while rebuilding an instance via IST, or worse via SST, especially if your data set is massive. The network transfer can take long hours, and that might affect the stability of your cluster, especially in a 3-node cluster where 2 nodes are unavailable because they are a donor and a joiner. Take note that during the SST phase, the donor and joiner nodes cannot be used until the joiner is finally able to sync with the primary cluster.

In previous versions of Galera, the State Snapshot Transfer (SST) donor was selected at random. In Galera 4, donor selection is much improved: it can choose the right donor within the cluster, favouring a donor that can provide an Incremental State Transfer (IST), or picking a donor in the same segment. Alternatively, you can set the wsrep_sst_donor variable to the donor you would like to always pick.

Backup Your Data and Do Rigid Testing During Migration and Before Production

Once you are suited up and have decided to try to migrate your data to Galera Cluster 4.0, make sure you always have your backup prepared. If you have tried ClusterControl, taking backups is easy.

Ensure that you are migrating to the right version of InnoDB, and do not forget to always apply and run mysql_upgrade before doing the test. Ensure that all your tests pass with the results MySQL Replication offered you. Most likely, there's no difference in the InnoDB storage engine you're using in a MySQL Replication cluster versus a MySQL Galera Cluster, as long as the recommendations and tips above have been applied and prepared beforehand.


Migrating to Galera Cluster 4.0 might not be the database technology solution you had in mind. However, nothing is stopping you from utilizing Galera Cluster 4.0, as long as its specific requirements can be prepared, set up, and provided. Galera Cluster 4.0 has become a very powerful and viable choice, especially for a highly-available platform and solution. We also suggest that you read these external blogs about Galera Caveats or the Limitations of Galera Cluster, or this manual from MariaDB.

by Paul Namuag at November 15, 2019 07:34 PM

November 14, 2019


MySQL InnoDB Cluster 8.0 - A Complete Deployment Walk-Through: Part One

MySQL InnoDB Cluster consists of 3 components:

  • MySQL Group Replication (a group of database server which replicates to each other with fault tolerance).
  • MySQL Router (query router to the healthy database nodes)
  • MySQL Shell (helper, client, configuration tool)

In the first part of this walkthrough, we are going to deploy a MySQL InnoDB Cluster. There are a number of hands-on tutorials available online, but this walkthrough covers all the necessary steps/commands to install and run the cluster in one place. We will be covering monitoring, management and scaling operations, as well as some gotchas when dealing with MySQL InnoDB Cluster, in the second part of this blog post.

The following diagram illustrates our post-deployment architecture:

We are going to deploy a total of 4 nodes: a three-node MySQL Group Replication setup and one MySQL Router node co-located with the application server. All servers are running on Ubuntu 18.04 Bionic.

Installing MySQL

The following steps should be performed on all database nodes db1, db2 and db3.

Firstly, we have to do some host mapping. This is crucial if you want to use hostnames as the host identifiers in InnoDB Cluster, and it is the recommended way to do so. Map all hosts as follows inside /etc/hosts:

$ vi /etc/hosts   router apps   db1 db1.local   db2 db2.local   db3 db3.local       localhost localhost.localdomain

Stop and disable AppArmor:

$ service apparmor stop
$ service apparmor teardown
$ systemctl disable apparmor

Download the latest APT config package from the MySQL APT repository website. At the time of this writing, the latest one is dated 15-Oct-2019, which is mysql-apt-config_0.8.14-1_all.deb:

$ wget

Install the package and configure it for "mysql-8.0":

$ dpkg -i mysql-apt-config_0.8.14-1_all.deb

Install the GPG key:

$ apt-key adv --recv-keys --keyserver 5072E1F5

Update the repolist:

$ apt-get update

Install the MySQL server and MySQL Shell packages:

$ apt-get -y install mysql-server mysql-shell

You will be presented with the following configuration wizards:

  1. Set a root password - Specify a strong password for the MySQL root user.
  2. Set the authentication method - Choose "Use Legacy Authentication Method (Retain MySQL 5.x Compatibility)"

MySQL should have been installed at this point. Verify with:

$ systemctl status mysql

Ensure you get an "active (running)" state.

Preparing the Server for InnoDB Cluster

The following steps should be performed on all database nodes db1, db2 and db3.

Configure the MySQL server to support Group Replication. The easiest and recommended way to do this is to use the new MySQL Shell:

$ mysqlsh

Authenticate as the local root user and follow the configuration wizard accordingly as shown in the example below:

MySQL  JS > dba.configureLocalInstance("root@localhost:3306");

Once authenticated, you should get a number of questions like the following:

Respond to those questions with the following answers:

  • Pick 2 - Create a new admin account for InnoDB cluster with minimal required grants
  • Account Name: clusteradmin@%
  • Password: mys3cret&&
  • Confirm password: mys3cret&&
  • Do you want to perform the required configuration changes?: y
  • Do you want to restart the instance after configuring it?: y

Don't forget to repeat the above on all database nodes. At this point, the MySQL daemon should be listening on all IP addresses and Group Replication is enabled. We can now proceed to create the cluster.

Creating the Cluster

Now we are ready to create a cluster. On db1, connect as cluster admin from MySQL Shell:

MySQL|JS> shell.connect('clusteradmin@db1:3306');
Creating a session to 'clusteradmin@db1:3306'
Please provide the password for 'clusteradmin@db1:3306': ***********
Save password for 'clusteradmin@db1:3306'? [Y]es/[N]o/Ne[v]er (default No): Y
Fetching schema names for autocompletion... Press ^C to stop.
Your MySQL connection id is 9
Server version: 8.0.18 MySQL Community Server - GPL
No default schema selected; type \use <schema> to set one.

You should be connected as clusteradmin@db1 (you can tell by looking at the prompt string before '>'). We can now create a new cluster:

MySQL|db1:3306 ssl|JS> cluster = dba.createCluster('my_innodb_cluster');

Check the cluster status:

MySQL|db1:3306 ssl|JS> cluster.status()
{
    "clusterName": "my_innodb_cluster",
    "defaultReplicaSet": {
        "name": "default",
        "primary": "db1:3306",
        "ssl": "REQUIRED",
        "status": "OK_NO_TOLERANCE",
        "statusText": "Cluster is NOT tolerant to any failures.",
        "topology": {
            "db1:3306": {
                "address": "db1:3306",
                "mode": "R/W",
                "readReplicas": {},
                "replicationLag": null,
                "role": "HA",
                "status": "ONLINE",
                "version": "8.0.18"
            }
        },
        "topologyMode": "Single-Primary"
    },
    "groupInformationSourceMember": "db1:3306"
}

At this point, only db1 is part of the cluster. The default topology mode is Single-Primary, similar to a replica set concept where only one node is a writer at a time. The remaining nodes in the cluster will be readers. 

Pay attention to the cluster status, which says OK_NO_TOLERANCE, with further explanation under the statusText key. With a single node, the cluster provides no fault tolerance. A minimum of 3 nodes is required in order to automate primary node failover. We are going to look into this later.

Now add the second node, db2 and accept the default recovery method, "Clone":

MySQL|db1:3306 ssl|JS> cluster.addInstance('clusteradmin@db2:3306');

The following screenshot shows the initialization progress of db2 after we executed the above command. The syncing operation is performed automatically by MySQL:

Check the cluster and db2 status:

MySQL|db1:3306 ssl|JS> cluster.status()
{
    "clusterName": "my_innodb_cluster",
    "defaultReplicaSet": {
        "name": "default",
        "primary": "db1:3306",
        "ssl": "REQUIRED",
        "status": "OK_NO_TOLERANCE",
        "statusText": "Cluster is NOT tolerant to any failures.",
        "topology": {
            "db1:3306": {
                "address": "db1:3306",
                "mode": "R/W",
                "readReplicas": {},
                "replicationLag": null,
                "role": "HA",
                "status": "ONLINE",
                "version": "8.0.18"
            },
            "db2:3306": {
                "address": "db2:3306",
                "mode": "R/O",
                "readReplicas": {},
                "replicationLag": null,
                "role": "HA",
                "status": "ONLINE",
                "version": "8.0.18"
            }
        },
        "topologyMode": "Single-Primary"
    },
    "groupInformationSourceMember": "db1:3306"
}

At this point, we have two nodes in the cluster, db1 and db2. The status is still showing OK_NO_TOLERANCE with further explanation under statusText value. As stated above, MySQL Group Replication requires at least 3 nodes in a cluster for fault tolerance. That's why we have to add the third node as shown next.

Add the last node, db3 and accept the default recovery method, "Clone" similar to db2:

MySQL|db1:3306 ssl|JS> cluster.addInstance('clusteradmin@db3:3306');

The following screenshot shows the initialization progress of db3 after we executed the above command. The syncing operation is performed automatically by MySQL:

Check the cluster and db3 status:

MySQL|db1:3306 ssl|JS> cluster.status()
{
    "clusterName": "my_innodb_cluster",
    "defaultReplicaSet": {
        "name": "default",
        "primary": "db1:3306",
        "ssl": "REQUIRED",
        "status": "OK",
        "statusText": "Cluster is ONLINE and can tolerate up to ONE failure.",
        "topology": {
            "db1:3306": {
                "address": "db1:3306",
                "mode": "R/W",
                "readReplicas": {},
                "replicationLag": null,
                "role": "HA",
                "status": "ONLINE",
                "version": "8.0.18"
            },
            "db2:3306": {
                "address": "db2:3306",
                "mode": "R/O",
                "readReplicas": {},
                "replicationLag": null,
                "role": "HA",
                "status": "ONLINE",
                "version": "8.0.18"
            },
            "db3:3306": {
                "address": "db3:3306",
                "mode": "R/O",
                "readReplicas": {},
                "replicationLag": null,
                "role": "HA",
                "status": "ONLINE",
                "version": "8.0.18"
            }
        },
        "topologyMode": "Single-Primary"
    },
    "groupInformationSourceMember": "db1:3306"
}

Now the cluster looks good, where the status is OK and the cluster can tolerate up to one failure node at one time. The primary node is db1 where it shows "primary": "db1:3306" and "mode": "R/W", while other nodes are in "R/O" state. If you check the read_only and super_read_only values on RO nodes, both are showing as true.
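You can confirm this directly on any node; a quick check (run it against each node in turn):

```sql
SELECT @@hostname, @@read_only, @@super_read_only;
-- On the primary both flags are 0; on the secondaries both are 1.
```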

Our MySQL Group Replication deployment is now complete and in sync.

Deploying the Router

On the app server that we are going to run our application, make sure the host mapping is correct:

$ vim /etc/hosts   router apps   db1 db1.local   db2 db2.local   db3 db3.local       localhost localhost.localdomain

Stop and disable AppArmor:

$ service apparmor stop
$ service apparmor teardown
$ systemctl disable apparmor

Then install MySQL repository package, similar to what we have done when performing database installation:

$ wget
$ dpkg -i mysql-apt-config_0.8.14-1_all.deb

Add GPG key:

$ apt-key adv --recv-keys --keyserver 5072E1F5

Update the repo list:

$ apt-get update

Install MySQL router and client:

$ apt-get -y install mysql-router mysql-client

MySQL Router is now installed under /usr/bin/mysqlrouter. MySQL Router provides a bootstrap flag to automatically configure the router operation with a MySQL InnoDB Cluster. What we need to do is specify a URI string to one of the database nodes, as the InnoDB cluster admin user (clusteradmin). 

To simplify the configuration, we will run the mysqlrouter process as root user:

$ mysqlrouter --bootstrap clusteradmin@db1:3306 --directory myrouter --user=root

Here is what we should get after specifying the password for clusteradmin user:

The bootstrap command generates the router configuration file at /root/myrouter/mysqlrouter.conf. Now we can start the mysqlrouter daemon with the following command from the current directory:

$ myrouter/

Verify if the anticipated ports are listening correctly:

$ netstat -tulpn | grep mysql
tcp        0 0  * LISTEN   14726/mysqlrouter
tcp        0 0  * LISTEN   14726/mysqlrouter
tcp        0 0 * LISTEN   14726/mysqlrouter
tcp        0 0 * LISTEN   14726/mysqlrouter

Now our application can use port 6446 for read/write and 6447 for read-only MySQL connections.
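As a simple illustration of this split, an application can pick the router endpoint based on the workload type. The helper below is hypothetical; only the router host 192.168.10.40 and ports 6446/6447 come from this setup:

```python
# Hypothetical helper: pick the MySQL Router endpoint for a connection,
# routing writes to 6446 and reads to 6447 (values from this walkthrough).
ROUTER_HOST = "192.168.10.40"
RW_PORT, RO_PORT = 6446, 6447

def router_endpoint(read_only=False):
    """Return (host, port) for a read/write or read-only connection."""
    return (ROUTER_HOST, RO_PORT if read_only else RW_PORT)

print(router_endpoint())                # ('192.168.10.40', 6446)
print(router_endpoint(read_only=True))  # ('192.168.10.40', 6447)
```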

Connecting to the Cluster

Let's create a database user on the master node. On db1, connect to the MySQL server via MySQL shell:

$ mysqlsh root@localhost:3306

Switch from Javascript mode to SQL mode:

MySQL|localhost:3306 ssl|JS> \sql

Switching to SQL mode... Commands end with ;

Create a database:

MySQL|localhost:3306 ssl|SQL> CREATE DATABASE sbtest;

Create a database user:

MySQL|localhost:3306 ssl|SQL> CREATE USER sbtest@'%' IDENTIFIED BY 'password';

Grant the user to the database:

MySQL|localhost:3306 ssl|SQL> GRANT ALL PRIVILEGES ON sbtest.* TO sbtest@'%';

Now our database and user are ready. Let's install sysbench to generate some test data. On the app server, do:

$ apt -y install sysbench mysql-client

Now we can test on the app server to connect to the MySQL server via MySQL router. For write connection, connect to port 6446 of the router host:

$ mysql -usbtest -p -h192.168.10.40 -P6446 -e 'select user(), @@hostname, @@read_only, @@super_read_only'
| user()        | @@hostname | @@read_only | @@super_read_only |
| sbtest@router | db1        | 0           | 0                 |

For read-only connection, connect to port 6447 of the router host:

$ mysql -usbtest -p -h192.168.10.40 -P6447 -e 'select user(), @@hostname, @@read_only, @@super_read_only'
| user()        | @@hostname | @@read_only | @@super_read_only |
| sbtest@router | db3        | 1           | 1                 |

Looks good. We can now generate some test data with sysbench. On the app server, generate 20 tables with 100,000 rows per table by connecting to port 6446 of the router host:

$ sysbench \
/usr/share/sysbench/oltp_common.lua \
--db-driver=mysql \
--mysql-user=sbtest \
--mysql-db=sbtest \
--mysql-password=password \
--mysql-port=6446 \
--mysql-host= \
--tables=20 \
--table-size=100000 \

To perform a simple read-write test on port 6446 for 300 seconds, run:

$ sysbench \
/usr/share/sysbench/oltp_read_write.lua \
--report-interval=2 \
--threads=8 \
--time=300 \
--db-driver=mysql \
--mysql-host= \
--mysql-port=6446 \
--mysql-user=sbtest \
--mysql-db=sbtest \
--mysql-password=password \
--tables=20 \
--table-size=100000 \
run

For read-only workloads, we can send the MySQL connection to port 6447:

$ sysbench \
/usr/share/sysbench/oltp_read_only.lua \
--report-interval=2 \
--threads=1 \
--time=300 \
--db-driver=mysql \
--mysql-host= \
--mysql-port=6447 \
--mysql-user=sbtest \
--mysql-db=sbtest \
--mysql-password=password \
--tables=20 \
--table-size=100000 \
run
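The sysbench invocations above differ only in the Lua script, a few flags, and the trailing subcommand. As an illustration, a hypothetical helper can assemble the command line from those parameters (the host value is a placeholder, since the real address is environment-specific):

```python
# Hypothetical helper assembling a sysbench command line; the flags
# mirror the invocations shown above. ROUTER_HOST is a placeholder.
def sysbench_cmd(script, action, host="ROUTER_HOST", port=6446,
                 threads=8, seconds=300, tables=20, table_size=100000):
    """Build the argv list for one of the sysbench invocations above."""
    return [
        "sysbench", f"/usr/share/sysbench/{script}",
        "--db-driver=mysql",
        f"--mysql-host={host}", f"--mysql-port={port}",
        "--mysql-user=sbtest", "--mysql-db=sbtest",
        "--mysql-password=password",
        f"--threads={threads}", f"--time={seconds}",
        f"--tables={tables}", f"--table-size={table_size}",
        action,  # e.g. "prepare" to load data, "run" to benchmark
    ]

cmd = sysbench_cmd("oltp_read_only.lua", "run", port=6447, threads=1)
print(" ".join(cmd))
```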


That's it. Our MySQL InnoDB Cluster setup is now complete with all of its components running and tested. In the second part, we are going to look into management, monitoring and scaling operations of the cluster as well as solutions to a number of common problems when dealing with MySQL InnoDB Cluster. Stay tuned!


by ashraf at November 14, 2019 04:54 PM

November 13, 2019


How to Configure Cluster-to-Cluster Replication for PostgreSQL

As we recently announced, ClusterControl 1.7.4 has a new feature called Cluster-to-Cluster Replication. It allows you to run replication between two autonomous clusters. For more detailed information, please refer to the above-mentioned announcement.

We will take a look at how to use this new feature for an existing PostgreSQL cluster. For this task, we will assume you have ClusterControl installed and the Master Cluster was deployed using it.

Requirements for the Master Cluster

There are some requirements for the Master Cluster to make it work:

  • PostgreSQL 9.6 or later.
  • There must be a PostgreSQL server with the ClusterControl role 'Master'.
  • When setting up the Slave Cluster the Admin credentials must be identical to the Master Cluster.

Preparing the Master Cluster

The Master Cluster needs to meet the requirements mentioned above.

Regarding the first requirement, make sure you are using a supported PostgreSQL version in the Master Cluster and choose the same one for the Slave Cluster.

$ psql

postgres=# select version();

PostgreSQL 11.5 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-36), 64-bit

If you need to assign the master role to a specific node, you can do it from the ClusterControl UI. Go to ClusterControl -> Select Master Cluster -> Nodes -> Select the Node -> Node Actions -> Promote Slave.

And finally, during the Slave Cluster creation, you must use the same admin credentials that you are using currently in the Master Cluster. You will see where to add it in the following section.

Creating the Slave Cluster from the ClusterControl UI

To create a new Slave Cluster, go to ClusterControl -> Select Cluster -> Cluster Actions -> Create Slave Cluster.

The Slave Cluster will be created by streaming data from the current Master Cluster.

In this section, you must also choose the master node of the current cluster from which the data will be replicated.

When you go to the next step, you must specify User, Key or Password and port to connect by SSH to your servers. You also need a name for your Slave Cluster and if you want ClusterControl to install the corresponding software and configurations for you.

After setting up the SSH access information, you must define the database version, datadir, port, and admin credentials. As it will use streaming replication, make sure you use the same database version, and as we mentioned earlier, the credentials must be the same used by the Master Cluster. You can also specify which repository to use.

In this step, you need to add the server to the new Slave Cluster. For this task, you can enter both IP Address or Hostname of the database node.

You can monitor the status of the creation of your new Slave Cluster from the ClusterControl activity monitor. Once the task is finished, you can see the cluster in the main ClusterControl screen.

Managing Cluster-to-Cluster Replication Using the ClusterControl UI

Now you have your Cluster-to-Cluster Replication up and running, there are different actions to perform on this topology using ClusterControl.

Rebuilding a Slave Cluster

To rebuild a Slave Cluster, go to ClusterControl -> Select Slave Cluster -> Nodes -> Choose the Node connected to the Master Cluster -> Node Actions -> Rebuild Replication Slave.

ClusterControl will perform the following steps:

  • Stop PostgreSQL Server
  • Remove content from its datadir
  • Stream a backup from the Master to the Slave using pg_basebackup
  • Start the Slave

Stop/Start Replication Slave

Stopping and starting replication in PostgreSQL actually means pausing and resuming it, but we use these terms to be consistent with the other database technologies we support.

This function will be available to use from the ClusterControl UI soon. This action will use the pg_wal_replay_pause and pg_wal_replay_resume PostgreSQL functions to perform this task.

Meanwhile, you can use a workaround to stop and start the replication slave by simply stopping and starting the database node using ClusterControl.

Go to ClusterControl -> Select Slave Cluster -> Nodes -> Choose the Node -> Node Actions -> Stop Node/Start Node. This action will stop/start the database service directly.

Managing Cluster-to-Cluster Replication Using the ClusterControl CLI

In the previous section, you were able to see how to manage a Cluster-to-Cluster Replication using the ClusterControl UI. Now, let’s see how to do it by using the command line. 

Note: As we mentioned at the beginning of this blog, we will assume you have ClusterControl installed and the Master Cluster was deployed using it.

Create the Slave Cluster

First, let’s see an example command to create a Slave Cluster by using the ClusterControl CLI:

$ s9s cluster --create --cluster-name=PostgreSQL1rep --cluster-type=postgresql --provider-version=11 --nodes=""  --os-user=root --os-key-file=/root/.ssh/id_rsa --db-admin=admin --db-admin-passwd=********* --vendor=postgres --remote-cluster-id=21 --log

Now that the slave creation process is running, let's look at each parameter used:

  • Cluster: To list and manipulate clusters.
  • Create: Create and install a new cluster.
  • Cluster-name: The name of the new Slave Cluster.
  • Cluster-type: The type of cluster to install.
  • Provider-version: The software version.
  • Nodes: List of the new nodes in the Slave Cluster.
  • Os-user: The user name for the SSH commands.
  • Os-key-file: The key file to use for SSH connection.
  • Db-admin: The database admin user name.
  • Db-admin-passwd: The password for the database admin.
  • Remote-cluster-id: Master Cluster ID for the Cluster-to-Cluster Replication.
  • Log: Wait and monitor job messages.
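To illustrate how this command line is put together, here is a hypothetical Python helper (not part of the s9s toolchain) that assembles the argument list from the parameters described above; sensitive or environment-specific values are omitted:

```python
# Hypothetical helper: build an `s9s cluster --create` argument list
# from a dict of long options (option names copied from the command above).
def s9s_create_cluster(params):
    """Flags with value True become bare options, e.g. --log."""
    argv = ["s9s", "cluster", "--create"]
    for key, value in params.items():
        argv.append(f"--{key}" if value is True else f"--{key}={value}")
    return argv

argv = s9s_create_cluster({
    "cluster-name": "PostgreSQL1rep",
    "cluster-type": "postgresql",
    "provider-version": 11,
    "os-user": "root",
    "os-key-file": "/root/.ssh/id_rsa",
    "db-admin": "admin",
    "vendor": "postgres",
    "remote-cluster-id": 21,
    "log": True,
})
print(" ".join(argv))
```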

Using the --log flag, you will be able to see the logs in real time:

Verifying job parameters. Checking ssh/sudo. Checking if host already exists in another cluster.

Checking job arguments.

Found top level master node:

Verifying nodes.

Checking nodes that those aren't in another cluster.

Checking SSH connectivity and sudo. Checking ssh/sudo.

Checking OS system tools.

Installing software.

Detected centos (core 7.5.1804).

Data directory was not specified. Using directory '/var/lib/pgsql/11/data'. Configuring host and installing packages if neccessary.


Cluster 26 is running.

Generated & set RPC authentication token.

Rebuilding a Slave Cluster

You can rebuild a Slave Cluster using the following command:

$ s9s replication --stage --master="" --slave="" --cluster-id=26 --remote-cluster-id=21 --log

The parameters are:

  • Replication: To monitor and control data replication.
  • Stage: Stage/Rebuild a Replication Slave.
  • Master: The replication master in the master cluster.
  • Slave: The replication slave in the slave cluster.
  • Cluster-id: The Slave Cluster ID.
  • Remote-cluster-id: The Master Cluster ID.
  • Log: Wait and monitor job messages.

The job log should be similar to this one:

Rebuild replication slave from master

Remote cluster id = 21 Checking size of '/var/lib/pgsql/11/data'. /var/lib/pgsql/11/data size is 201.13 MiB. Checking free space in '/var/lib/pgsql/11/data'. /var/lib/pgsql/11/data has 28.78 GiB free space. Verifying PostgreSQL version. Verifying the timescaledb-postgresql-11 installation. Package timescaledb-postgresql-11 is not installed.

Setting up replication>

Collecting server variables. Using the pg_hba.conf contents for the slave. Will copy the postmaster.opts to the slave. Updating slave configuration.

Writing file ''. GRANT new node on members to do pg_basebackup. granting Stopping slave. Stopping PostgreSQL node. waiting for server to shut down.... done

server stopped

… waiting for server to start....2019-11-12 15:51:11.767 UTC [8005] LOG:  listening on IPv4 address "", port 5432

2019-11-12 15:51:11.767 UTC [8005] LOG:  listening on IPv6 address "::", port 5432

2019-11-12 15:51:11.769 UTC [8005] LOG:  listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"

2019-11-12 15:51:11.774 UTC [8005] LOG:  listening on Unix socket "/tmp/.s.PGSQL.5432"

2019-11-12 15:51:11.798 UTC [8005] LOG:  redirecting log output to logging collector process

2019-11-12 15:51:11.798 UTC [8005] HINT:  Future log output will appear in directory "log".


server started Grant cluster members on the new node (for failover).

Grant connect access for new host in cluster.

Adding grant on Waiting until the service is started.

Replication slave job finished.

Stop/Start Replication Slave

As we mentioned in the UI section, stopping and starting replication in PostgreSQL means pausing and resuming it, but we use these terms to stay consistent with other technologies.

You can stop replicating data from the Master Cluster in this way:

$ s9s replication --stop --slave="" --cluster-id=26 --log

You will see this: Pausing recovery of the slave. Successfully paused recovery on the slave using select pg_wal_replay_pause().

And now, you can start it again:

$ s9s replication --start --slave="" --cluster-id=26 --log

So, you will see: Resuming recovery on the slave. Collecting replication statistics. Slave resumed recovery successfully using select pg_wal_replay_resume().

Now, let’s check the used parameters.

  • Replication: To monitor and control data replication.
  • Stop/Start: To make the slave stop/start replicating.
  • Slave: The replication slave node.
  • Cluster-id: The ID of the cluster in which the slave node is.
  • Log: Wait and monitor job messages.


This new ClusterControl feature will allow you to quickly set up replication between different PostgreSQL clusters, and manage the setup in an easy and friendly way. The Severalnines dev team is working on enhancing this feature, so any ideas or suggestions would be very welcome.

by Sebastian Insausti at November 13, 2019 04:04 PM

November 12, 2019


Avoiding Database Vendor Lock-In for MySQL or MariaDB

Vendor lock-in is defined as "Proprietary lock-in or customer lock-in, which makes a customer dependent on a vendor for their products and services; unable to use another vendor without substantial cost" (wikipedia). Undeniably for many software companies that would be the desired business model. But is it good for their customers?

Proprietary databases have great support for migrations from other popular database software solutions. However, migrating to them would just cause another vendor lock-in. Is open source, then, the solution?

Due to the limitations open source had years back, many chose expensive proprietary database solutions; for many, open source simply was not an option.

Over the years, however, open-source databases have gained the enterprise support and maturity needed to run critical and complex transactional systems.

With recent versions, databases like Percona Server and MariaDB have added great new features, both compatibility improvements and enterprise necessities like 24/7 support, security, auditing, clustering, online backup, and fast restore. All of this has made the migration process more accessible than ever before.

Migration may be a wise move, but it comes with risk. Whether you plan to migrate from a proprietary database to open source manually or with the help of a commercial tool to automate the entire migration process, you need to know all the possible bottlenecks, the methods involved, and how to validate the results.

Changing the database system is also an excellent time to consider further vendor lock-in risks. During the migration process, you should think about how to avoid being locked into a particular technology. In this article, we focus on the leading aspects of vendor lock-in for MySQL and MariaDB.

Avoiding Lock-in for Database Monitoring

Users of open source databases often have to use a mixture of tools and homegrown scripts to monitor their production database environments. However, homegrown scripts are hard to maintain and to keep up to date with new database features.

Fortunately, there are many interesting free monitoring tools for MySQL/MariaDB. The free tools most often recommended by DBAs are PMM, Zabbix, ClusterControl Community Edition, and the Nagios MySQL plugin, although only PMM and ClusterControl are dedicated database solutions.

Percona Monitoring and Management (PMM) is a fully open-source solution for managing MySQL platform performance and tuning query performance. PMM is an on-premises solution that retains all of your performance and query data inside the confines of your environment. You can find the PMM demo under the below link.

PMM by Percona

Traditional server monitoring tools are not built for modern distributed database architectures. Most production databases today run in some high availability setup - from more straightforward master-slave replication to multi-master clusters fronted by redundant load balancers. Operations teams deal with dozens, often hundreds of services that make up the database environment.

Free Database Monitoring from ClusterControl

Having multiple database systems means your organization will become more agile on the development side and allows the choice to the developers, but it also imposes additional knowledge on the operations side. Extending your infrastructure from only MySQL to deploying other storage backends like MongoDB and PostgreSQL, implies you also have to monitor, manage, and scale them. As every storage backend excels at different use cases, this also means you have to reinvent the wheel for every one of them.

ClusterControl: Replication Setup

ClusterControl was designed to address modern, highly distributed database setups based on replication or clustering. It shows the status of the entire cluster solution, though it can just as well be used for a single instance. ClusterControl exposes many advanced metrics, along with built-in advisors that help you understand them. You can find the ClusterControl demo under the below link.

ClusterControl: Advisors

Avoiding Lock-in for Database Backup Solutions

There are multiple ways to take backups, but which method fits your specific needs? How do you implement point-in-time recovery?

If you are migrating from Oracle or SQL Server, we recommend the XtraBackup tool from Percona or the similar Mariabackup from MariaDB.

Percona XtraBackup is the most popular, open-source, MySQL/MariaDB hot backup software that performs non-blocking backups for InnoDB and XtraDB databases. It falls into the physical backup category, which consists of exact copies of the MySQL data directory and files underneath it.

XtraBackup does not lock your database during the backup process. For large databases (100+ GB), it provides much better restoration time as compared to mysqldump. The restoration process involves preparing MySQL data from the backup files, before replacing or switching it with the current data directory on the target node.

Avoiding Lock-in for Database High Availability and Scalability

It is said that if you are not designing for failure, then you are heading for a crash. How do you create a database system from the ground up to withstand failure? This can be a challenge as failures happen in many different ways, sometimes in ways that would be hard to imagine. It is a consequence of the complexity of today's database environments.

Clustering is an expensive feature of databases like Oracle and SQL Server. It requires extra licenses. 

Galera Cluster is a mainstream option for high availability MySQL and MariaDB. And though it has established itself as a credible replacement for traditional MySQL master-slave architectures, it is not a drop-in replacement.

Galera Cluster is a synchronous active-active database clustering technology for MySQL and MariaDB. Galera Cluster differs from Oracle's MySQL Cluster (NDB). MariaDB Cluster is based on the multi-master replication plugin provided by Codership.

While the Galera Cluster has some characteristics that make it unsuitable for specific use cases, most applications can still be adapted to run on it.

The benefits are clear: multi-master InnoDB setup with built-in failover and read scalability.

Avoiding Lock-in for Database Load Balancing

Proxies are building blocks of high availability setups for MySQL. They can detect failed nodes and route queries to hosts that are still available. If your master failed and you had to promote one of your slaves, proxies will detect such topology changes and route your traffic accordingly.

More advanced proxies can do much more, such as route traffic based on precise query rules, cache queries, or mirror them. They can be even used to implement different types of sharding.
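Rule-based routing of this kind can be pictured as an ordered list of regular-expression rules evaluated top-down, first match wins. The sketch below is purely illustrative; the rule format and destination names are assumptions for this sketch, not ProxySQL's actual configuration schema:

```python
import re

# Illustrative only: ordered query rules, first match wins.
# Destinations ("writer"/"reader") are assumptions for this sketch.
RULES = [
    (r"^SELECT .* FOR UPDATE", "writer"),  # locking reads stay on the writer
    (r"^SELECT", "reader"),                # plain reads can be offloaded
    (r".*", "writer"),                     # everything else (DML/DDL)
]

def route(query):
    """Return the destination for a query based on the first matching rule."""
    for pattern, destination in RULES:
        if re.match(pattern, query, re.IGNORECASE):
            return destination
    return "writer"

print(route("SELECT * FROM t1"))             # reader
print(route("UPDATE t1 SET a = 1"))          # writer
print(route("select c from t1 for update"))  # writer
```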

The most useful ones are ProxySQL, HAProxy, and MaxScale (limited free usage).

ClusterControl: ProxySQL Load Balancing

Avoiding Lock-in When Migrating to the Cloud

In the last ten years, many businesses have moved to cloud-based technology to avoid the budgetary limitations for data centers and agile software development. Utilizing the cloud enables your company and applications to profit from the cost-savings and versatility that originate with cloud computing.

While cloud solutions offer companies many benefits, it still introduces some risks. For example, vendor lock-in is as high in the cloud as it was in the data center.

As more companies run their workloads in the cloud, cloud database services are increasingly being used to manage data. One of the advantages of using a cloud database service instead of maintaining your database is that it reduces the management overhead. Database services from the leading cloud vendors share many similarities, but they have individual characteristics that may make them well-, or ill-suited to your workload.

ClusterControl: Deploy various database systems in the cloud

The Database Hosting Hybrid Model

As more enterprises are moving to the cloud, the hybrid model is actually becoming more popular. The hybrid model is seen as a safe model for many businesses. 

In fact, it's challenging to do a heart transplant and port everything over immediately. Many companies do a slow migration that usually takes a year, or perhaps never completes. The move should be made at an acceptable pace.

The hybrid model will not only allow you to build a highly available, scalable system; due to its nature, it is also a great first step toward avoiding lock-in. By design, your systems will work in a kind of mixed mode.

An example of such an architecture could be a cluster that operates in an in-house data center with its copy located in the cloud.

ClusterControl: Cluster to Cluster Replication


Migrating from a proprietary database to open source can come with several benefits: a lower cost of ownership, access to and use of an open-source database engine, and tight integration with the web. Open source has much to offer and, due to its nature, is a great option to avoid vendor lock-in.


by Bart Oles at November 12, 2019 08:47 PM

November 11, 2019


9 ClusterControl Features You Won't Find in Other Database Management Tools

Ensuring smooth operations of your production databases is not a trivial task, and there are a number of tools and utilities to help with the job. There are tools available for monitoring health, server performance, analyzing queries, deployments, managing failover, upgrades, and the list goes on. ClusterControl as a management and monitoring platform for your database infrastructure stands out with its ability to manage the full lifecycle from deployment to monitoring, ongoing management and scaling. 

Although ClusterControl offers important features like automatic database failover, encryption in-transit/at-rest, backup management, point-in-time recovery, Prometheus integration, and database scaling, these can be found in other enterprise management/monitoring tools on the market. However, there are some features that you won't find that easily. In this blog post, we'll present 9 features that you won't find in any other management and monitoring tools on the market (at the time of this writing).

Backup Verification

Any backup is literally not a backup until you know it can be recovered - by actually verifying that it can be recovered. ClusterControl allows a backup to be verified after it has been taken by spinning up a new server and testing the restore. Verifying a backup is a critical process to make sure you meet your Recovery Point Objective (RPO) policy in the event of disaster recovery. The verification process performs the restoration on a new standalone host (where ClusterControl will install the necessary database packages before restoring) or on a server dedicated to backup verification.

To configure backup verification, simply select an existing backup and click on Restore. There will be an option to Restore and Verify:

Database Backup Verification

Then, simply specify the IP address of the server that you would want to restore and verify:

Database Backup Verification

Make sure the specified host is accessible via passwordless SSH beforehand. You also have a handful of options underneath for the provisioning process. You can also shut down the verification server after restoration to save costs and resources once the backup has been verified. ClusterControl will look at the restoration process exit code and examine the restore log to determine whether the verification succeeded or failed.
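The verification logic just described (exit code plus a scan of the restore log) can be sketched as follows; the error marker strings are assumptions for illustration, not ClusterControl's actual log format:

```python
# Illustrative sketch of backup-verification logic: a restore counts as
# verified only if the restore process exited cleanly AND its log contains
# no error markers. Marker strings are assumptions, not ClusterControl's.
ERROR_MARKERS = ("ERROR", "FATAL", "corrupt")

def backup_verified(exit_code, restore_log):
    """Return True when the restore exited cleanly and the log looks clean."""
    if exit_code != 0:
        return False
    return not any(marker in line
                   for line in restore_log.splitlines()
                   for marker in ERROR_MARKERS)

print(backup_verified(0, "restore completed OK"))            # True
print(backup_verified(1, "restore completed OK"))            # False
print(backup_verified(0, "ERROR: page checksum mismatch"))   # False
```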

Simplifying ProxySQL Management Through a GUI

Many would agree that having a graphical user interface is more efficient and less prone to human error when configuring a system. ProxySQL is a part of the critical database layer (although it sits on top of it) and must be visible enough to DBA's eyes to spot common problems and issues. ClusterControl provides a comprehensive graphical user interface for ProxySQL.

ProxySQL instances can be deployed on fresh hosts, or existing ones can be imported into ClusterControl. ClusterControl can configure ProxySQL to be integrated with a virtual IP address (provided by Keepalived) for single endpoint access to the database servers. It also provides monitoring insight to the key ProxySQL components like Queries Backend, Slow Queries, Top Queries, Query Hits, and a bunch of other monitoring stats. The following is a screenshot showing how to add a new query rule:

ProxySQL Management GUI

If you were adding a very complex query rule, you would be more comfortable doing it via the graphical user interface. Every field has a tooltip to assist you when filling in the Query Rule form. When adding or modifying any ProxySQL configuration, ClusterControl will make sure the changes are made to runtime, and saved onto disk for persistency.

ClusterControl 1.7.4 now supports both ProxySQL 1.x and ProxySQL 2.x.

Operational Reports

Operational Reports are a set of summary reports of your database infrastructure that can be generated on-the-fly or can be scheduled to be sent to different recipients. These reports consist of different checks and address various day-to-day DBA tasks. The idea behind ClusterControl operational reporting is to put all of the most relevant data into a single document which can be quickly analyzed in order to get a clear understanding of the status of the databases and its processes.

With ClusterControl you can schedule cross-cluster environment reports like Daily System Report, Package Upgrade Report, Schema Change Report as well as Backups and Availability. These reports will help you to keep your environment secure and operational. You will also see recommendations on how to fix gaps. Reports can be addressed to SysOps, DevOps or even managers who would like to get regular status updates about a given system’s health. 

The following is a sample of daily operational report sent to your mailbox in regards to availability:

Database Operational Report

We have covered this in detail in this blog post, An Overview of Database Operational Reporting in ClusterControl.

Resync a Slave via Backup

ClusterControl allows staging a slave (whether a new or a broken one) via the latest full or incremental backup. It may not sound very exciting, but this feature is huge if you have large datasets of 100GB and above. The common practice when resyncing a slave is to stream a backup of the current master, which takes time depending on the database size. This adds an additional burden to the master, which may jeopardize its performance.

To resync a slave via backup, pick the slave node under Nodes page and go to Node Actions -> Rebuild Replication Slave -> Rebuild from a backup. Only PITR-compatible backup will be listed in the dropdown:

Rebuild Database Replication Slave

Resyncing a slave from a backup will not bring any additional overhead to the master; ClusterControl extracts and streams the backup from the backup storage location onto the slave and eventually configures the replication link between the slave and the master. The slave will catch up with the master once the replication link is established. The master is untouched during the whole process, and you can monitor the whole progress under Activity -> Jobs.

Bootstrap a Galera Cluster

Galera Cluster is a very popular choice when implementing high availability for MySQL or MariaDB, but the wrong management commands can lead to disastrous consequences. Take a look at this blog post on how to bootstrap a Galera Cluster under different conditions. It illustrates that bootstrapping a Galera Cluster involves many variables and must be performed with extreme care; otherwise, you may lose data or cause a split-brain. ClusterControl understands the database topology and knows exactly what to do in order to bootstrap a database cluster properly. To bootstrap a cluster via ClusterControl, click on Cluster Actions -> Bootstrap Cluster:

Bootstrap a Galera Cluster

You will have the option to let ClusterControl pick the right bootstrap node automatically, or perform an initial bootstrap where you pick one of the database nodes from the list to become the reference node and wipe out the MySQL datadir on the joiner nodes to force SST from the bootstrapped node. If the bootstrapping process fails, ClusterControl will pull up the MySQL error log.

If you would like to perform a manual bootstrap, you can also use "Find Most Advanced Node" feature and perform the cluster bootstrap operation on the most advanced node reported by ClusterControl.

Centralized Configuration and Logging

ClusterControl pulls a number of important configuration and logging files and displays them in a tree structure within ClusterControl. A centralized view of these files is key to efficiently understanding and troubleshooting distributed database setups. The traditional way of tailing/grepping these files is long gone with ClusterControl. The following screenshot shows ClusterControl's configuration file manager which listed out all related configuration files for this cluster in one single view (with syntax highlighting, of course):

Centralized Database Configuration and Logging

ClusterControl eliminates the repetitiveness when changing a configuration option of a database cluster. Changing a configuration option on multiple nodes can be performed via a single interface and will be applied to the database node accordingly. When you click on "Change/Set Parameter", you can select the database instances that you would want to change and specify the configuration group, parameter and value:

Centralized Database Configuration and Logging

You can add a new parameter to the configuration file or modify an existing one. The parameter will be applied to the chosen database nodes' runtime and written to the configuration file if it passes the variable validation process. Some variables might require a server restart, which ClusterControl will then advise.
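The flow described above (validate the parameter, apply it at runtime where possible, persist it to the configuration file, and advise a restart when needed) can be sketched like this; the variable lists and messages are assumptions for illustration only:

```python
# Illustrative sketch of applying a configuration parameter across nodes.
# The variable classifications below are assumptions for this sketch.
DYNAMIC = {"max_connections", "slow_query_log"}   # changeable at runtime
RESTART_REQUIRED = {"innodb_log_file_size"}       # needs a server restart

def set_parameter(nodes, name, value):
    """Validate a parameter, then report what would happen on each node."""
    if name not in DYNAMIC | RESTART_REQUIRED:
        raise ValueError(f"unknown parameter: {name}")
    results = {}
    for node in nodes:
        # In a real system this would run SET GLOBAL and rewrite my.cnf;
        # here we only report the outcome per node.
        results[node] = ("applied at runtime + saved to config"
                         if name in DYNAMIC
                         else "saved to config; restart advised")
    return results

print(set_parameter(["db1", "db2"], "max_connections", 500))
```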

Database Cluster Cloning

With ClusterControl, you can quickly clone an existing MySQL Galera Cluster so you have an exact copy of the dataset on the other cluster. ClusterControl performs the cloning operation online, without any locking or bringing downtime to the existing cluster. It's like a cluster scale out operation except both clusters are independent from each other after the syncing completes. The cloned cluster does not necessarily need to be as the same cluster size as the existing one. We could start with a “one-node cluster”, and scale it out with more database nodes at a later stage.

Database Cluster Cloning

Another similar feature offered by ClusterControl is "Create Cluster from Backup". This feature was introduced in ClusterControl 1.7.1, specifically for Galera Cluster and PostgreSQL clusters where one can create a new cluster from the existing backup. Contrary to cluster cloning, this operation does not bring additional load to the source cluster with the tradeoff that the cloned cluster will not be in the same state as the source cluster.

We have covered this topic in detail in this blog post, How to Create a Clone of Your MySQL or PostgreSQL Database Cluster.

Restore Physical Backup

Most database management tools allow backing up a database, but only a handful of them support restoration, and usually only from logical backups. ClusterControl supports full restoration not only of logical backups, but also of physical backups, whether full or incremental. Restoring a physical backup requires a number of critical steps (especially for incremental backups), which basically involve preparing the backup, copying the prepared data into the data directory, assigning the correct permissions/ownership, and starting up the nodes in the correct order to maintain data consistency across all members of the cluster. ClusterControl performs all of these operations automatically.
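As a mental model of what is being automated, the sequence looks roughly like the plan below. This is only a sketch: the command names are illustrative of an xtrabackup-style restore, the paths and service name are placeholders, and the code builds the plan rather than executing anything.

```python
# Ordered steps behind a physical-backup restore, modeled on an
# xtrabackup-style workflow; paths and service name are placeholders.
def restore_plan(backup_dir, datadir="/var/lib/mysql"):
    return [
        ["xtrabackup", "--prepare", f"--target-dir={backup_dir}"],
        ["xtrabackup", "--copy-back", f"--target-dir={backup_dir}",
         f"--datadir={datadir}"],
        ["chown", "-R", "mysql:mysql", datadir],   # fix ownership
        ["systemctl", "start", "mysql"],           # start node(s) in order
    ]

for step in restore_plan("/backups/full-2019-11-08"):
    print(" ".join(step))
```

For an incremental chain, the prepare phase would be repeated once per incremental backup before the copy-back step — which is exactly the kind of ordering mistake ClusterControl removes from the operator's hands.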

You can also restore a physical backup onto another node that is not part of a cluster. In ClusterControl, the option for this is called "Create Cluster from Backup". You can start with a “one-node cluster” to test out the restoration process on another server or to copy out your database cluster to another location.

ClusterControl also supports restoring an external backup, i.e., a backup that was not taken through ClusterControl. You just need to upload the backup to the ClusterControl server and specify the physical path to the backup file when restoring. ClusterControl will take care of the rest.

Cluster-to-Cluster Replication

This is a new feature introduced in ClusterControl 1.7.4. ClusterControl can now handle and monitor cluster-to-cluster replication, which basically extends asynchronous database replication between multiple cluster sets in multiple geographical locations. A cluster can be set as a master cluster (an active cluster which processes reads/writes) and the slave cluster can be set as a read-only cluster (a standby cluster which can also process reads). ClusterControl supports asynchronous cluster-to-cluster replication for Galera Cluster (the binary log must be enabled) and also master-slave replication for PostgreSQL Streaming Replication.

To create a new cluster that replicates from another cluster, go to Cluster Actions -> Create Slave Cluster:

Cluster-to-Cluster Replication

The result of the above deployment is presented clearly on the Database Cluster List dashboard:

Cluster-to-Cluster Replication

The slave cluster is automatically configured as read-only, replicating from the primary cluster and acting as a standby cluster. If disaster strikes the primary cluster and you want to activate the secondary site, simply pick the "Disable Readonly" menu item available under the Nodes -> Node Actions dropdown to promote it to an active cluster.

by ashraf at November 11, 2019 10:45 AM

November 10, 2019

Colin Charles

Database Tab Sweep

I miss a proper database related newsletter for busy people. There’s so much happening in the space, from tech, to licensing, and even usage. Anyway, quick tab sweep.

Paul Vallée (of Pythian fame) has been working on Tehama for some time, and now he gets to do it full time, as a PE firm bought control of Pythian's services business. Pythian has more than 350 employees and 250 customers, and has raised capital before. More at Ottawa's Pythian spins out software platform Tehama.

Database leaks data on most of Ecuador’s citizens, including 6.7 million children – ElasticSearch.

Percona has launched Percona Distribution for PostgreSQL 11. This means they have servers for MySQL, MongoDB, and now PostgreSQL. Looks very much like a packaged server with tools from 3rd parties (source).

Severalnines has launched Backup Ninja, an agent-based SaaS service to backup popular databases in the cloud. Backup.Ninja (cool URL) supports MySQL (and variants), MongoDB, PostgreSQL and TimeScale. No pricing available, but it is free for 30 days.

Comparing Database Types: How Database Types Evolved to Meet Different Needs

New In PostgreSQL 12: Generated Columns – anyone doing a comparison with MariaDB Server or MySQL?

Migration Complete – Amazon’s Consumer Business Just Turned off its Final Oracle Database – a huge deal as they migrated 75 petabytes of internal data to DynamoDB, Aurora, RDS and Redshift. Amazon, powered by AWS, and a big win for open source (a lot of these services are built-on open source).

MongoDB and Alibaba Cloud Launch New Partnership – I see this as a win for the SSPL relicense. It is far too costly to maintain a drop-in compatible fork, in a single company (Hi Amazon DocumentDB!). Maybe if the PostgreSQL layer gets open sourced, there is a chance, but otherwise, all good news for Alibaba and MongoDB.

MySQL 8.0.18 brings hash join, EXPLAIN ANALYZE, and more interestingly, HashiCorp Vault support for MySQL Keyring. (Percona has an open source variant).

by Colin Charles at November 10, 2019 02:22 PM

Valeriy Kravchuk

Time in Performance Schema

I've seen questions like this:
"Is there a way to know when (date and time) the last statement captured in ... was actually ran?"
more than once in some discussions (and customer issues) related to Performance Schema. I've seen answers provided by my colleagues and me after some limited testing. I've also noticed statements that it may not be possible.

Indeed, examples of wall clock date and time in the output of queries from the performance_schema are rare (and usually come from tables in the information_schema). The sys.format_time() function converts time to a nice, human-readable format, but it still remains relative - it is not the date and time when something recorded in performance_schema happened.

In this post I'd like to document the answer I've seen and have in mind (and the steps to get it), to save time for readers and myself when faced with similar questions in the future. I'll also show the problem with this answer that I've noticed after testing for more than a few minutes.

Let's start with a simple setup of the testing environment. In my case it is good old MariaDB 10.3.7 running on this netbook under Windows. First, let's check if Performance Schema is enabled:
 MariaDB [test]> select version(), @@performance_schema;
| version()          | @@performance_schema |
| 10.3.7-MariaDB-log |                    1 |
1 row in set (0.233 sec)
Then let's enable recording of time for everything and enable all consumers:
MariaDB [test]> update performance_schema.setup_instruments set enabled='yes', timed='yes';
Query OK, 459 rows affected (0.440 sec)
Rows matched: 707  Changed: 459  Warnings: 0

MariaDB [test]> update performance_schema.setup_consumers set enabled='yes';
Query OK, 8 rows affected (0.027 sec)
Rows matched: 12  Changed: 8  Warnings: 0
Now we can expect recently executed statements to be recorded, like this:
MariaDB [test]> select now(), event_id, timer_start, timer_end, sql_text from performance_schema.events_statements_current\G
*************************** 1. row ***************************
      now(): 2019-11-03 17:42:51
   event_id: 46
timer_start: 22468653162059216
  timer_end: 22468697203533224
   sql_text: select now(), event_id, timer_start, timer_end, sql_text from performance_schema.events_statements_current
1 row in set (0.045 sec)
Good, but how can we get the real time when the statement was executed (like now() reports)? We all know from the fine manual that timer_start and timer_end values are in "picoseconds", so we can easily convert them into seconds (or whatever units we prefer):
MariaDB [test]> select now(), event_id, timer_start/1000000000000, sql_text from performance_schema.events_statements_current\G
*************************** 1. row ***************************
                    now(): 2019-11-03 17:54:02
                 event_id: 69
timer_start/1000000000000: 23138.8159
                 sql_text: select now(), event_id, timer_start/1000000000000, sql_text from performance_schema.events_statements_current
1 row in set (0.049 sec)
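The division itself is easy to get wrong by a few zeros; the same conversion can be sketched outside SQL in plain Python, using the timer_start value captured in the first query output above:

```python
# Performance Schema timers are expressed in "picoseconds": 10**12 units
# per second, counted from an arbitrary origin near server startup,
# not from the Unix epoch.
PICOSECONDS_PER_SECOND = 10**12

def timer_to_seconds(timer_value):
    """Convert a raw Performance Schema timer value to seconds."""
    return timer_value / PICOSECONDS_PER_SECOND

# timer_start captured in the earlier events_statements_current output:
print(round(timer_to_seconds(22468653162059216), 4))  # -> 22468.6532
```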
One might assume this value is relative to server startup time, and indeed we can expect that the timer in Performance Schema is initialized at some very early stage of startup. But how do we get the date and time of server startup with an SQL statement?

This also seems to be easy, as we have a global status variable called Uptime, measured in seconds. Depending on the fork and version used, we can get the value of Uptime either from the Performance Schema (in MySQL 5.7+) or from the Information Schema (in MariaDB and older MySQL versions). For example:
MariaDB [test]> select variable_value from information_schema.global_status where variable_name = 'Uptime';
| variable_value |
| 23801          |
1 row in set (0.006 sec)
So, server startup time is easy to get with the date_sub() function:
MariaDB [test]> select @start := date_sub(now(), interval (select variable_value from information_schema.global_status where variable_name = 'Uptime') second) as start;
| start                      |
| 2019-11-03 11:28:18.000000 |
1 row in set (0.007 sec)
In the error log of MariaDB server I see:
2019-11-03 11:28:18 0 [Note] mysqld (mysqld 10.3.7-MariaDB-log) starting as process 5636 ...
So, I am sure the result is correct. Now, if we use date_add() to add the timer value (converted to seconds) to the server startup time, we can get the desired answer: the date and time when the statement recorded in performance_schema was really executed:
MariaDB [test]> select event_id, @ts := date_add(@start, interval timer_start/1000000000000 second) as ts, sql_text, now(), timediff(now(), @ts) from performance_schema.events_statements_current\G
*************************** 1. row ***************************
            event_id: 657
                  ts: 2019-11-03 18:24:00.501654
            sql_text: select event_id, @ts := date_add(@start, interval timer_start/1000000000000 second) as ts, sql_text, now(), timediff(now(), @ts) from performance_schema.events_statements_current
               now(): 2019-11-03 18:24:05
timediff(now(), @ts): 00:00:04.498346
1 row in set (0.002 sec)
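The whole derivation can be replayed in a few lines of Python. The uptime and timer values below are illustrative numbers close to the session above, not taken from a live server:

```python
from datetime import datetime, timedelta

PICOSECONDS_PER_SECOND = 10**12

def event_timestamp(now, uptime_seconds, timer_start_ps):
    """Approximate the wall-clock time of a Performance Schema event:
    server start = now() - Uptime, event time = start + timer_start."""
    server_start = now - timedelta(seconds=uptime_seconds)
    return server_start + timedelta(
        seconds=timer_start_ps / PICOSECONDS_PER_SECOND)

# Illustrative values close to the session above:
now = datetime(2019, 11, 3, 18, 24, 5)           # what now() returned
ts = event_timestamp(now, uptime_seconds=24947,  # ~6h55m of uptime
                     timer_start_ps=24942501654000000)
print(ts)  # a few seconds before now(), as in the SQL output above
```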
I was almost ready to publish this blog post a week ago, before paying more attention to the result (which used to be perfectly correct in earlier simple tests) and executing a variation of the statement presented above. The problem I noticed is that when the Uptime of the server is not just a few minutes (as it often happens in quick test environments), but hours or days, the timestamp that we get for a recent event from the performance_schema using the suggested approach may differ notably from the current timestamp (we see a 4.5+ seconds difference highlighted above). Moreover, this difference seems to fluctuate:
MariaDB [test]> select event_id, @ts := date_add(@start, interval timer_start/1000000000000 second) as ts, sql_text, now(), timediff(now(), @ts) from performance_schema.events_statements_current\G
*************************** 1. row ***************************
            event_id: 682
                  ts: 2019-11-03 18:24:01.877763
            sql_text: select event_id, @ts := date_add(@start, interval timer_start/1000000000000 second) as ts, sql_text, now(), timediff(now(), @ts) from performance_schema.events_statements_current
               now(): 2019-11-03 18:24:07
timediff(now(), @ts): 00:00:05.122237
1 row in set (0.002 sec)
and tends to grow with Uptime. This makes the entire idea of converting the timer_start and timer_end Performance Schema "counters" from "picoseconds" questionable and unreliable for precise real-timestamp matching and comparison with other timestamp information sources in production.
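To put the observed offset in perspective, we can treat it as a linear frequency error. That linearity is an assumption on my part (the fluctuation above shows it is not perfectly linear), but it gives a rough order of magnitude:

```python
# Rough, back-of-the-envelope drift rate for the session above, assuming
# the ~4.5 s offset accumulated roughly linearly over ~7 hours of uptime.
uptime_seconds = 6 * 3600 + 55 * 60 + 47   # 24947 s, from the startup time
observed_drift_seconds = 4.5               # timediff(now(), @ts) above

drift_ppm = observed_drift_seconds / uptime_seconds * 1_000_000
print(f"~{drift_ppm:.0f} ppm")  # -> ~180 ppm, i.e. roughly 15 s per day
```

Even a small frequency error like this accumulates quickly, which is why the approach breaks down on servers with long uptimes.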
Same as with this photo of sunset at Brighton taken with my Nokia dumb phone back in June, I do not see a clear picture of time measurement in Performance Schema...

After spending some more time thinking about this, I decided to involve the MySQL team somehow and created the feature request, Bug #97558 - "Add function (like sys.format_time) to convert TIMER_START to timestamp", which ended up quickly "Verified" (so I have a small hope that I had not missed anything really obvious - correct me if I am wrong). I'd be happy to see further comments there and, surely, the function I asked about implemented. But I feel there is some internal problem with this, and some new feature on the server side may be needed to take the "drift" of time in Performance Schema into account.

There is also a list of currently open questions that I may try to answer in followup posts:
  1. Is the problem of time drift I noticed MariaDB 10.3.7-specific, or are recent MySQL 5.7.x and 8.0.x also affected?
  2. Is this difference in time growing monotonically with time or really fluctuating?
  3. When exactly does the Performance Schema time "counter" start, and where is that in the code?
  4. Are there any other, better or at least more precise and reliable ways to get timestamps of some specific events that happen during MySQL server work? I truly suspect that gdb and especially dynamic tracing on Linux with tools like bpftrace may give us more reliable results...

by Valerii Kravchuk at November 10, 2019 01:44 PM

November 08, 2019

MariaDB Foundation

MariaDB 10.4.10, 10.3.20, 10.2.29 and 10.1.43 Now Available

The MariaDB Foundation is pleased to announce the availability of MariaDB 10.4.10, MariaDB 10.3.20, MariaDB 10.2.29 and MariaDB 10.1.43 the latest stable releases in their respective series. […]

The post MariaDB 10.4.10, 10.3.20, 10.2.29 and 10.1.43 Now Available appeared first on

by Ian Gilfillan at November 08, 2019 06:41 PM


How to Configure a Cluster-to-Cluster Replication for Percona XtraDB Cluster or MariaDB Cluster

In a previous blog, we announced a new ClusterControl 1.7.4 feature called Cluster-to-Cluster Replication. It automates the entire process of setting up a DR cluster off your primary cluster, with replication in between. For more detailed information please refer to the above-mentioned blog entry.

Now in this blog, we will take a look at how to configure this new feature for an existing cluster. For this task, we will assume you have ClusterControl installed and the Master Cluster was deployed using it.

Requirements for the Master Cluster

There are some requirements for the Master Cluster to make it work:

  • Percona XtraDB Cluster version 5.6.x and later, or MariaDB Galera Cluster version 10.x and later.
  • GTID enabled.
  • Binary Logging enabled on at least one database node.
  • The backup credentials must be the same across the Master Cluster and Slave Cluster. 

Preparing the Master Cluster

The Master Cluster needs to be prepared to use this new feature. It requires configuration from both ClusterControl and Database side.

ClusterControl Configuration

In the database node, check the backup user credentials stored in /etc/my.cnf.d/secrets-backup.cnf (for RedHat-based OSes) or in /etc/mysql/secrets-backup.cnf (for Debian-based OSes).

$ cat /etc/my.cnf.d/secrets-backup.cnf

# Security credentials for backup.

In the ClusterControl node, edit the /etc/cmon.d/cmon_ID.cnf configuration file (where ID is the Cluster ID Number) and make sure it contains the same credentials stored in secrets-backup.cnf.

$ cat /etc/cmon.d/cmon_8.cnf

Any change on this file requires a cmon service restart:

$ service cmon restart

Check the database replication parameters to make sure that you have GTID and Binary Logging enabled.

Database Configuration

In the database node, check the file /etc/my.cnf (for RedHat-based OSes) or /etc/mysql/my.cnf (for Debian-based OSes) to see the configuration related to the replication process.

Percona XtraDB:

$ cat /etc/my.cnf

log_bin = /var/lib/mysql-binlog/binlog
log_slave_updates = ON
gtid_mode = ON
enforce_gtid_consistency = true
relay_log = relay-log
expire_logs_days = 7

MariaDB Galera Cluster:

$ cat /etc/my.cnf

log_bin = /var/lib/mysql-binlog/binlog
log_slave_updates = ON
relay_log = relay-log
expire_logs_days = 7
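If you prefer to check these settings programmatically rather than by eye, the key = value style used above parses cleanly with Python's configparser. This is only a sketch with sample content modeled on the Percona settings shown above; the [mysqld] section header is my assumption, and a real my.cnf may contain bare options without "=", which would need a more forgiving parser:

```python
import configparser

# Sample content modeled on the Percona-style settings above; the
# [mysqld] section header is assumed for illustration.
SAMPLE_MY_CNF = """
[mysqld]
log_bin = /var/lib/mysql-binlog/binlog
log_slave_updates = ON
gtid_mode = ON
enforce_gtid_consistency = true
"""

cfg = configparser.ConfigParser(allow_no_value=True)
cfg.read_string(SAMPLE_MY_CNF)

# The options Cluster-to-Cluster Replication depends on:
required = {"log_bin", "log_slave_updates", "gtid_mode"}
missing = sorted(required - set(cfg["mysqld"]))
print("missing options:", missing)  # -> missing options: []
```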

Instead of checking the configuration files, you can verify that it's enabled in the ClusterControl UI. Go to ClusterControl -> Select Cluster -> Nodes. There you should see something like this:

The “Master” role shown on the first node means that Binary Logging is enabled.

Enabling Binary Logging

If you don’t have the binary logging enabled, go to ClusterControl -> Select Cluster -> Nodes -> Node Actions -> Enable Binary Logging.

Then, you must specify the binary log retention, and the path to store it. You should also specify if you want ClusterControl to restart the database node after configuring it, or if you prefer to restart it by yourself.

Keep in mind that Enabling Binary Logging always requires a restart of the database service.

Creating the Slave Cluster from the ClusterControl GUI

To create a new Slave Cluster, go to ClusterControl -> Select Cluster -> Cluster Actions -> Create Slave Cluster.

The Slave Cluster can be created by streaming data from the current Master Cluster or by using an existing backup. 

In this section, you must also choose the master node of the current cluster from which the data will be replicated.

When you go to the next step, you must specify the user, key or password, and port to connect by SSH to your servers. You also need a name for your Slave Cluster, and to choose whether you want ClusterControl to install the corresponding software and configurations for you.

After setting up the SSH access information, you must define the database vendor and version, datadir, database port, and the admin password. Make sure you use the same vendor/version and credentials as used by the Master Cluster. You can also specify which repository to use.

In this step, you need to add servers to the new Slave Cluster. For this task, you can enter both IP Address or Hostname of each database node.

You can monitor the status of the creation of your new Slave Cluster from the ClusterControl activity monitor. Once the task is finished, you can see the cluster in the main ClusterControl screen.

Managing Cluster-to-Cluster Replication Using the ClusterControl GUI

Now that you have your Cluster-to-Cluster Replication up and running, there are different actions you can perform on this topology using ClusterControl.

Configure Active-Active Clusters

As you can see, by default the Slave Cluster is set up in Read-Only mode. It's possible to disable the Read-Only flag on the nodes one by one from the ClusterControl UI, but keep in mind that Active-Active clustering is only recommended if applications touch disjoint data sets on either cluster, since MySQL/MariaDB doesn't offer any Conflict Detection or Resolution.

To disable the Read-Only mode, go to ClusterControl -> Select Slave Cluster -> Nodes. In this section, select each node and use the Disable Read-Only option.
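Because neither engine detects or resolves conflicts, the safest Active-Active pattern is to make the data split deterministic at the application layer. A sketch of one common approach follows; the cluster names and the tenant-based routing rule are hypothetical, purely for illustration:

```python
# Route each tenant's writes to exactly one cluster, so the two sides of
# the Active-Active pair never modify the same rows (no conflict
# detection exists to save you if they do).
CLUSTERS = ("cluster_a", "cluster_b")

def cluster_for_tenant(tenant_id: int) -> str:
    """Deterministic, stable routing: a tenant always writes to one side."""
    return CLUSTERS[tenant_id % len(CLUSTERS)]

print(cluster_for_tenant(4))  # -> cluster_a
print(cluster_for_tenant(7))  # -> cluster_b
```

Any stable partitioning key (tenant, region, shard range) works; the essential property is that no row is ever writable from both clusters at once.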

Rebuilding a Slave Cluster

To avoid inconsistencies, if you want to rebuild a Slave Cluster, it must be a Read-Only cluster; this means that all nodes must be in Read-Only mode.

Go to ClusterControl -> Select Slave Cluster -> Nodes -> Choose the Node connected to the Master Cluster -> Node Actions -> Rebuild Replication Slave.

Topology Changes

If you have the following topology:

Suppose that, for some reason, you want to change the replication node in the Master Cluster. It's possible to change the master node used by the Slave Cluster to another master node in the Master Cluster.

To be considered as a master node, it must have the binary logging enabled.

Go to ClusterControl -> Select Slave Cluster -> Nodes -> Choose the Node connected to the Master Cluster -> Node Actions -> Stop Slave/Start Slave.

Stop/Start Replication Slave

You can stop and start replication slaves in an easy way using ClusterControl.

Go to ClusterControl -> Select Slave Cluster -> Nodes -> Choose the Node connected to the Master Cluster -> Node Actions -> Stop Slave/Start Slave.

Reset Replication Slave

Using this action, you can reset the replication process with RESET SLAVE or RESET SLAVE ALL. The difference between them is that RESET SLAVE doesn't change any replication parameters such as master host, port, and credentials, while RESET SLAVE ALL removes all of the replication configuration. With the latter command, the Cluster-to-Cluster Replication link will be destroyed.

Before using this feature, you must stop the replication process (please refer to the previous feature).

Go to ClusterControl -> Select Slave Cluster -> Nodes -> Choose the Node connected to the Master Cluster -> Node Actions -> Reset Slave/Reset Slave All.

Managing Cluster-to-Cluster Replication Using the ClusterControl CLI

In the previous section, you were able to see how to manage a Cluster-to-Cluster Replication using the ClusterControl UI. Now, let’s see how to do it by using the command line. 

Note: As we mentioned at the beginning of this blog, we will assume you have ClusterControl installed and the Master Cluster was deployed using it.

Create the Slave Cluster

First, let’s see an example command to create a Slave Cluster by using the ClusterControl CLI:

$ s9s cluster --create --cluster-name=Galera1rep --cluster-type=galera  --provider-version=10.4 --nodes=";;"  --os-user=root --os-key-file=/root/.ssh/id_rsa --db-admin=root --db-admin-passwd=xxxxxxxx --vendor=mariadb --remote-cluster-id=11 --log

Now that the create slave job is running, let's look at each parameter used:

  • Cluster: To list and manipulate clusters.
  • Create: Create and install a new cluster.
  • Cluster-name: The name of the new Slave Cluster.
  • Cluster-type: The type of cluster to install.
  • Provider-version: The software version.
  • Nodes: List of the new nodes in the Slave Cluster.
  • Os-user: The user name for the SSH commands.
  • Os-key-file: The key file to use for SSH connection.
  • Db-admin: The database admin user name.
  • Db-admin-passwd: The password for the database admin.
  • Remote-cluster-id: Master Cluster ID for the Cluster-to-Cluster Replication.
  • Log: Wait and monitor job messages.

Using the --log flag, you will be able to see the logs in real time:

Verifying job parameters.

Checking ssh/sudo on 3 hosts.

All 3 hosts are accessible by SSH. Checking if host already exists in another cluster. Checking if host already exists in another cluster. Checking if host already exists in another cluster. Binary logging is enabled. Binary logging is enabled.

Creating the cluster with the following:

wsrep_cluster_address = 'gcomm://,,'

Calling job: setupServer( Checking OS information.


Caching config files.

Job finished, all the nodes have been added successfully.

Configure Active-Active Clusters

As you saw earlier, you can disable the Read-Only mode in the new cluster by disabling it on each node, so let's see how to do it from the command line.

$ s9s node --set-read-write --nodes="" --cluster-id=16 --log

Let’s see each parameter:

  • Node: To handle nodes.
  • Set-read-write: Set the node to Read-Write mode.
  • Nodes: The node to be changed.
  • Cluster-id: The ID of the cluster in which the node is.

Then, you will see: Setting read_only=OFF.

Rebuilding a Slave Cluster

You can rebuild a Slave Cluster using the following command:

$ s9s replication --stage --master="" --slave="" --cluster-id=19 --remote-cluster-id=11 --log

The parameters are:

  • Replication: To monitor and control data replication.
  • Stage: Stage/Rebuild a Replication Slave.
  • Master: The replication master in the master cluster.
  • Slave: The replication slave in the slave cluster.
  • Cluster-id: The Slave Cluster ID.
  • Remote-cluster-id: The Master Cluster ID.
  • Log: Wait and monitor job messages.

The job log should be similar to this one:

Rebuild replication slave from master

Remote cluster id = 11

Shutting down Galera Cluster. Stopping node. Stopping mysqld (timeout=60, force stop after timeout=true). Stopping MySQL service. All processes stopped. Stopped node. Stopping node. Stopping mysqld (timeout=60, force stop after timeout=true). Stopping MySQL service. All processes stopped.

… Changing master to Changed master to Starting slave. Collecting replication statistics. Started slave successfully. Starting node

Writing file ''.

Writing file ''.

Writing file ''.

Topology Changes

You can change your topology by using another node in the Master Cluster from which to replicate the data. For example, you can run:

$ s9s replication --failover --master="" --slave="" --cluster-id=10 --remote-cluster-id=9 --log

Let’s check the used parameters.

  • Replication: To monitor and control data replication.
  • Failover: Take the role of master from a failed/old master.
  • Master: The new replication master in the Master Cluster.
  • Slave: The replication slave in the Slave Cluster.
  • Cluster-id: The ID of the Slave Cluster.
  • Remote-Cluster-id: The ID of the Master Cluster.
  • Log: Wait and monitor job messages.

You will see this log: belongs to cluster id 9. Changing master to My master is Sanity checking replication master '[cid:9]' to be used by '[cid:139814070386698]'. Executing GRANT REPLICATION SLAVE ON *.* TO 'cmon_replication'@''.

Setting up link between and Stopping slave. Successfully stopped slave. Setting up replication using MariaDB GTID:> Changing Master using master_use_gtid=slave_pos. Changing master to Changed master to Starting slave. Collecting replication statistics. Started slave successfully. Flushing logs to update 'SHOW SLAVE HOSTS'

Stop/Start Replication Slave

You can stop replicating the data from the Master Cluster in this way:

$ s9s replication --stop --slave="" --cluster-id=19 --log

You will see this: Ensuring the datadir '/var/lib/mysql' exists and is owned by 'mysql'. Stopping slave. Successfully stopped slave.

And now, you can start it again:

$ s9s replication --start --slave="" --cluster-id=19 --log

So, you will see: Ensuring the datadir '/var/lib/mysql' exists and is owned by 'mysql'. Starting slave. Collecting replication statistics. Started slave successfully.

Now, let’s check the used parameters.

  • Replication: To monitor and control data replication.
  • Stop/Start: To make the slave stop/start replicating.
  • Slave: The replication slave node.
  • Cluster-id: The ID of the cluster in which the slave node is.
  • Log: Wait and monitor job messages.

Reset Replication Slave

Using this command, you can reset the replication process using RESET SLAVE or RESET SLAVE ALL. For more information about this command, please check the usage of this in the previous ClusterControl UI section.

Before using this feature, you must stop the replication process (please refer to the previous command).

RESET SLAVE:
$ s9s replication --reset  --slave="" --cluster-id=19 --log

The log should be like: Ensuring the datadir '/var/lib/mysql' exists and is owned by 'mysql'. Executing 'RESET SLAVE'. Command 'RESET SLAVE' succeeded.

RESET SLAVE ALL:
$ s9s replication --reset --force  --slave="" --cluster-id=19 --log

And this log should be: Ensuring the datadir '/var/lib/mysql' exists and is owned by 'mysql'. Executing 'RESET SLAVE /*!50500 ALL */'. Command 'RESET SLAVE /*!50500 ALL */' succeeded.

Let’s see the used parameters for both RESET SLAVE and RESET SLAVE ALL.

  • Replication: To monitor and control data replication.
  • Reset: Reset the slave node.
  • Force: Using this flag you will use the RESET SLAVE ALL command on the slave node.
  • Slave: The replication slave node.
  • Cluster-id: The Slave Cluster ID.
  • Log: Wait and monitor job messages.


Conclusion

This new ClusterControl feature allows you to create Cluster-to-Cluster Replication quickly and manage it in an easy and friendly way. This setup will improve your database/cluster topology, and it is useful for a Disaster Recovery Plan, a testing environment, and even more options mentioned in the overview blog.

by Sebastian Insausti at November 08, 2019 10:45 AM

November 07, 2019


An Overview of Cluster-to-Cluster Replication

Nowadays, it's pretty common to have a database replicated to another server/datacenter, and in some cases it's a must. There are different reasons to replicate your databases to a totally separate environment:

  • Migrate to another datacenter.
  • Upgrade (hardware/software) requirements.
  • Maintain a fully synced operational system in a Disaster Recovery (DR) site that can take over at any time.
  • Keep a slave database as part of a lower cost DR Plan.
  • For geo-location requirements (data needs to be locally in a specific country).
  • Have a testing environment.
  • Troubleshooting purpose.
  • Reporting database.

And, there are different ways to perform this replication task:

  • Backup/Restore: Backing up a production database and restoring it in a new server/environment is the classic way to do this, but it is also an old-fashioned way, as you won't keep your data up-to-date and you need to wait for each restore process whenever you need recent data. If you have a cluster (master-slave, multi-master) and you want to recreate it, you should restore the initial backup and then recreate the rest of the nodes, which could be a time-consuming task.
  • Clone Cluster: It is similar to the previous option, but the backup and restore process covers the whole cluster, not just one specific database server. In this way, you can clone the entire cluster in the same task and you don't need to recreate the rest of the nodes manually. This method still has the problem of keeping data up-to-date between clones.
  • Replication: This approach includes the backup/restore option, but after the initial restore, the replication process keeps your data synchronized with the master node. However, if you have a database cluster, you still need to restore the backup to one node and recreate all the other nodes manually.

In this blog, we will look at a new ClusterControl 1.7.4 feature that allows you to use a mix of the methods mentioned earlier to improve this task.

What is Cluster-to-Cluster Replication?

Replication between two clusters is not the same thing as extending a cluster to run across two datacenters. When setting up replication between two clusters, we actually have two separate systems that can operate autonomously. Replication is used to keep them in sync, so that the slave system has an up-to-date state and can take over.

From ClusterControl 1.7.4, it is possible to create a new cluster by directly cloning a running source cluster, or by using a recent backup of the source cluster.

After cloning the cluster, you will have a Slave Cluster (SC) receiving data, and a Master Cluster (MC) sending changes to the slave one.

Database Cluster to Cluster Replication

ClusterControl supports Cluster-to-Cluster Replication for the following cluster types:

Cluster-to-Cluster Replication for Percona XtraDB / MariaDB Galera Cluster

For MySQL-based engines, GTID is required to use this feature, and asynchronous replication between the Master and Slave Cluster will be used.

There are a couple of actions to perform in order to prepare the current cluster for this job. First, at least one node in the current cluster must have binary logs enabled. Then, you need to add the backup user configured on the database node to the ClusterControl configuration file, which will be used for management tasks. All these actions can be performed using the ClusterControl UI or ClusterControl CLI.

Now you are ready to create the Percona XtraDB/MariaDB Galera Cluster-to-Cluster replication. When the job is finished, you will have:

  • One node in the Slave Cluster will replicate from one node in the Master Cluster.
  • The replication will be bi-directional between the clusters.
  • All nodes in the Slave Cluster will be read-only by default. It is possible to disable the read-only flag on the nodes one by one. 
  • Active-Active clustering is only recommended if applications are only touching disjoint data sets on either Cluster since the engine doesn’t offer any Conflict Detection or Resolution.

From both ClusterControl UI or ClusterControl CLI, you will be able to:

  • Create this Replication Cluster.
  • Enable the Active-Active configuration.
  • Change the Cluster Topology.
  • Rebuild a Replication Cluster.
  • Stop/Start a Replication Slave.
  • Reset Replication Slave (currently only implemented in the ClusterControl CLI).

Notes

  • The backup user must be added manually in the ClusterControl configuration file.
  • The backup user credentials must be the same in both the current and new cluster.
  • The MySQL root password specified when creating the Slave Cluster must be the same as the root password used on the Master Cluster.

Known Limitations

  • Automatic Failover is not supported yet. If the master fails, then it is the responsibility of the administrator to failover to another master.
  • It is only possible to “RESET” a replication slave from the ClusterControl CLI as it’s not implemented in the ClusterControl UI yet.
  • It is only possible to Rebuild a Cluster that is in read-only mode. All nodes in a Cluster must be read-only to count as read-only Cluster.

Cluster-to-Cluster Replication for PostgreSQL

ClusterControl Cluster-to-Cluster Replication is supported on PostgreSQL using streaming replication.

As a requirement, there must be a PostgreSQL server with the ClusterControl role 'master', and when you set up the Slave Cluster, the Admin credentials must be identical to those of the Master Cluster.
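As a quick illustration (a generic PostgreSQL query, not specific to ClusterControl), once the Slave Cluster is attached you can verify streaming replication on the master with:

```sql
-- On the PostgreSQL master: list connected standbys and their replication state.
SELECT client_addr, state, sent_lsn, replay_lsn
FROM pg_stat_replication;
```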

Now you are ready to create the PostgreSQL Cluster-to-Cluster replication. When the job is finished, you will have:

  • One node in the Slave Cluster will replicate from one node in the Master Cluster.
  • The replication will be unidirectional between the clusters.
  • The node in the Slave Cluster will be read-only.

Database Cluster to Cluster Replication for PostgreSQL

From either the ClusterControl UI or the ClusterControl CLI, you will be able to:

  • Create this Replication Cluster.
  • Rebuild a Replication Cluster.
  • Stop/Start a Replication Slave.

Notes

  • The Admin Credentials must be identical in the Master and Slave Cluster.

Known Limitations

  • The max size of the Slave Cluster is one node.
  • The Slave Cluster cannot be staged from a backup.
  • Topology changes are not supported.
  • Only unidirectional replication is supported.


With this new ClusterControl feature, you no longer need to perform each step of creating cluster replication separately or manually, which saves you time and effort. Give it a try!

by Sebastian Insausti at November 07, 2019 10:45 AM

November 06, 2019

Henrik Ingo


Announcing ClusterControl 1.7.4: Cluster-to-Cluster Replication - Ultimate Disaster Recovery

We’re excited to announce the 1.7.4 release of ClusterControl - the only database management system you’ll ever need to take control of your open source database infrastructure. 

In this release we launch a new function that could be the ultimate way to minimize RTO as part of your disaster recovery strategy. Cluster-to-Cluster Replication for MySQL and PostgreSQL lets you build out a clone of your entire database infrastructure and deploy it to a secondary data center, while keeping both synced. This ensures you always have an available, up-to-date database setup ready to switch over to should disaster strike.  

In addition, we are also announcing support for the new MariaDB 10.4 / Galera Cluster 4.x, as well as support for ProxySQL 2.0, the latest release of the industry-leading MySQL load balancer.

Lastly, we continue our commitment to PostgreSQL by releasing new user management functions, giving you complete control over who can access or administer your postgres setup.

Release Highlights

Cluster-to-Cluster Database Replication

  • Asynchronous MySQL Replication Between MySQL Galera Clusters.
  • Streaming Replication Between PostgreSQL Clusters.
  • Ability to Build Clusters from a Backup or by Streaming Directly from a Master Cluster.

Added Support for MariaDB 10.4 & Galera Cluster 4.x

  • Deployment, Configuration, Monitoring and Management of the Newest Version of Galera Cluster Technology, initially released by MariaDB
    • New Streaming Replication Ability
    • New Support for Dealing with Long Running & Large Transactions
    • New Backup Locks for SST

Added Support for ProxySQL 2.0

  • Deployment and Configuration of the Newest Version of the Best MySQL Load Balancer on the Market
    • Native Support for Galera Cluster
    • Enables Causal Reads Using GTID
    • New Support for SSL Frontend Connections
    • Query Caching Improvements

New User Management Functions for PostgreSQL Clusters

  • Take full control of who is able to access your PostgreSQL database.

View Release Details and Resources

Release Details

Cluster-to-Cluster Replication

Either streaming from your master or built from a backup, the new Cluster-to-Cluster Replication function in ClusterControl lets you create a complete disaster recovery database system in another data center, which you can then easily fail over to should something go wrong, during maintenance, or during a major outage.

In addition to disaster recovery, this function also allows you to create a copy of your database infrastructure (in just a couple of clicks) which you can use to test upgrades, patches, or to try some database performance enhancements.

You can also use this function to deploy an analytics or reporting setup, allowing you to separate your reporting load from your OLTP traffic.

Cluster to Cluster Replication

PostgreSQL User Management

You now have the ability to add or remove user access to your PostgreSQL setup. With the simple interface, you can set permissions or restrictions at the individual user level. It also provides a view of all defined users who have access to the database, along with their respective permissions. For tips on best practices around PostgreSQL user management you can check out this blog.

MariaDB 10.4 / Galera Cluster 4.x Support

In an effort to boost performance for long running or large transactions, MariaDB & Codership have partnered to add Streaming Replication to the new MariaDB Cluster 10.4. This addition solves many challenges that this technology has previously experienced with these types of transactions. There are three new system tables added to the release to support this new function as well as new synchronisation functions.  You can read more about what’s included in this release here.

Deploy MariaDB Cluster 10.4

by fwlymburner at November 06, 2019 10:45 AM

November 05, 2019


Tips for Migrating from Proprietary to Open Source Databases

Back in the day proprietary databases were the only acceptable options. 

“No one ever got fired for buying from Oracle/Microsoft/IBM” was the saying. 

Huge, monolithic databases used for every single purpose, plus paid support - that’s how the database landscape looked in the 90s and early 00s. Sure, open source databases were there, but they were treated like “toy databases,” suitable for a small website, a blog maybe, or a very small e-shop. No one sane would use them for anything critical. 

Things have changed over time and open source databases have matured, with more being created every year. We now see specialization, allowing users to pick the best option for a given workload - time series, analytical, columnstore, NoSQL, relational, key-value - you can pick whatever database you need and, usually, there are numerous options to pick from. This has made open source more and more popular in the database world. Companies are now looking at the bills from their proprietary database vendors and wondering if they can reduce them a bit by adopting a free open source database.

As usual, there are pros and cons. What should you consider before implementing an open source database? In this blog post we will share some tips to keep in mind when planning a migration to open source databases.

Start Small and Expand

If your organisation does not use open source databases, it most likely doesn’t have experience in managing them either. While doable through external consultants, it is probably not a good idea to migrate your whole environment to open source databases in one go. 

The better approach, instead, is to start with small projects. Maybe you need to build some sort of internal tool or website, or perhaps a monitoring tool - that’s a great opportunity to use open source databases and gain experience using them in real-world environments. This will let you learn the ins and outs of a database - how to diagnose it, how to track its performance, how to tune it to improve its performance. 

On top of that, you will learn more about the high availability options and implementations. Working with the database also helps you understand the differences between the proprietary database that you use and the open source databases that you implemented for smaller projects. In time you should see the open source database footprint in your organization increase significantly, along with the experience of your teams. 

At some point, after collecting enough experience running open source databases, you may decide to pull the lever and start a migration project. Another, also quite likely, option is that you gradually move your operations to open source databases on a project-by-project basis. Eventually your existing applications that use a proprietary RDBMS will become deprecated and, in time, be replaced by a new iteration of the software that relies on open source databases.

Pick the Right Database for the Job

We already mentioned that open source databases are typically quite specialised. You can pick a datastore suitable for different uses - time series, document store, key-value store, columnar store, text search. This, combined with the previous tip, lets you pick exactly the type of database that your small project requires. Let’s say you want to build a monitoring stack for your environment. 

You may want to look into time series databases like Prometheus or TimeScaleDB to power your monitoring application. Maybe you want to implement some sort of analytical/big data solution. You would have to design an ETL process to extract the data from your main RDBMS and load it into the open source solution. There are numerous options for the datastore. Depending on the requirements and data, you can choose from a wide variety of databases, for example ClickHouse, Cassandra, Hive and many more.

What about moving some parts of the application from the proprietary RDBMS to an open source solution?

There are options for that as well. Datastores like PostgreSQL or MariaDB have a great many features that ease migration from datastores like Oracle. They come with, for example, support for PL/SQL and other SQL features available in Oracle. This is a great help - less code has to be converted as part of the migration from Oracle, it is more likely that your stored procedures and functions will work mostly out of the box, and you won’t have to spend a significant amount of time rewriting, testing and debugging the code that represents the core logic of your application.
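As a small illustration (a hypothetical snippet, not taken from this post; table and column names are invented), MariaDB 10.3 and later can parse Oracle-style PL/SQL when the session runs in Oracle compatibility mode:

```sql
-- Hypothetical example: switch the session to Oracle compatibility mode,
-- then define a procedure using PL/SQL-style syntax.
SET SQL_MODE = 'ORACLE';

DELIMITER //
CREATE PROCEDURE raise_salary(p_id IN INT, p_pct IN NUMBER) AS
BEGIN
  UPDATE employees
     SET salary = salary * (1 + p_pct / 100)
   WHERE id = p_id;
END;
//
DELIMITER ;
```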

You should also not forget about the resources you already have in your team. Maybe someone has experience with the more popular open source datastores? If those match your requirements and fit your environment, you may be able to utilize existing knowledge in your team and ease the transition into the open source world.

Consider Getting Support

Migrating to datastores that you don’t have much experience with is always tricky. Even if you proceed in small steps, you may still be caught out by surprising behavior, unexpected situations, bugs, or simply operational situations that you are not familiar with. 

Open source databases typically have great community support - mailing lists, forums, Slack channels. Usually, when you ask, someone will attempt to help you. This may not be enough in some cases. If so, you should look for paid support. There are numerous ways this can be arranged. 

First of all, not all open source projects are made equal. Larger projects, like PostgreSQL or MySQL, have developed an ecosystem of consulting companies that employ experts who can do the consulting for you. If such an option, for whatever reason, is not available or feasible, you can always reach out to the project maintainers.

It is very common that the company which develops the datastore will be happy to help, for a price.

Independent consulting companies have a significant advantage - they are independent, and they will propose a solution based on their experience, not based on the datastore and tools they develop. Vendors, well, they will usually push their own solutions and environment. On the other hand, vendors are quite interested in increasing the adoption of their datastore. If you are migrating from a proprietary datastore, they may have seen that situation several times before, so they should be able to provide you with good practices, help you design the migration process and so on. For an inexperienced company, access to experts in the field of migration can be a great asset.

You may find access to external support useful not only during the migration phase. Dealing with a new datastore is always tricky, and learning all the aspects of its operations is time consuming. Even after migration you may still benefit from having someone you can call, ask and, what's most important, learn from. Training, remote DBA services - all of these are common options in the world of open source databases, especially for the larger, more established projects.

Find a System or Tool to Help

Open source databases come with a variety of tools, some more complex than others. Some add functionality (high availability, backups, monitoring), some are designed to make database management easier. 

It is important to leverage them (consulting contracts may be useful here to introduce you to the most important tooling for the open source datastore of your choice) as they can significantly increase your operational velocity and the performance and stability of the whole setup. 

Some tools are free, some require a license to operate; you should look into the pool and pick what suits you the most. We are talking here about load balancers, tools to manage replication topology, virtual IP management tools, clustering solutions, and more or less dedicated monitoring and observability platforms that may track anything from typical database performance metrics, through smart predictions based on the data, to specialized analysis of query performance. 

The open source world is where such tools thrive - this also has pros and cons. It’s easy to find “a” tool; it’s hard to find “the” tool. You can find numerous projects on GitHub, each doing almost the same as the others. Which one to choose is the hard part. That’s why, ideally, you would have a helping hand in the form of a support contract, or you can rely on management platforms that help you manage multiple aspects of operations on open source databases, including helping you standardize on and pick the correct tools.

There are also platforms like ClusterControl, which was designed to help non-experienced people fully grasp the power of open source databases. ClusterControl supports multiple types of open source datastores: MySQL and its flavors, PostgreSQL, TimeScaleDB and MongoDB. It provides a unified user interface to access management functions for those datastores, making it easy to deploy and manage them. Instead of spending time testing different tools and solutions, with ClusterControl you can easily deploy a highly available environment, schedule backups and keep track of the metrics in the system. 

Tools like ClusterControl reduce the load on your team, increasing their ability to understand what is happening in a database they are not yet as familiar with as they would like to be.

Take Your Time

What’s good to keep in mind is that there is no need to rush. The stability of your environment and the well-being of your users is paramount. Take your time to run the tests, verify all the aspects of your application. Verify that you have proper high availability and disaster recovery processes in place. 

Only when you are 100% sure you are good to switch is it time to pull the lever and make the move to open source databases.


by krzysztof at November 05, 2019 06:43 PM

November 04, 2019


Building a Hot Standby on Amazon AWS Using MariaDB Cluster

Galera Cluster 4.0 was first released as part of MariaDB 10.4, and there are a lot of significant improvements in this release. The most impressive feature is Streaming Replication, which is designed to handle the following problems.

  • Problems with long transactions
  • Problems with large transactions
  • Problems with hot-spots in tables

In a previous blog, we took a deep dive into the new Streaming Replication feature in a two-part series (Part 1 and Part 2). Part of this new feature in Galera 4.0 are the new system tables, which are very useful for querying and checking the Galera Cluster nodes as well as the logs that have been processed by Streaming Replication. 
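To give a flavor of both (a hedged sketch using the Galera 4 names; the fragment size here is an arbitrary example value):

```sql
-- Enable Streaming Replication for the current session only:
-- replicate the running transaction in fragments of 10,000 rows.
SET SESSION wsrep_trx_fragment_unit = 'rows';
SET SESSION wsrep_trx_fragment_size = 10000;

-- Inspect the new Galera 4 system tables in the mysql schema.
SELECT * FROM mysql.wsrep_cluster;          -- cluster identity
SELECT * FROM mysql.wsrep_cluster_members;  -- current cluster members
SELECT * FROM mysql.wsrep_streaming_log;    -- fragments of in-flight streamed transactions
```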

In previous blogs, we also showed you an Easy Way to Deploy a MySQL Galera Cluster on AWS and how to Deploy a MySQL Galera Cluster 4.0 onto Amazon AWS EC2.

Percona hasn't released a GA version of Percona XtraDB Cluster (PXC) 8.0 yet, as some features are still under development, such as the MySQL wsrep function WSREP_SYNC_WAIT_UPTO_GTID, which appears to be absent (at least in PXC version 8.0.15-5-27dev.4.2). Yet, when PXC 8.0 is released, it will be packed with great features such as...

  • Improved cluster resilience
  • Cloud-friendly cluster
  • Improved packaging
  • Encryption support
  • Atomic DDL

While we're waiting for the release of PXC 8.0 GA, we'll cover in this blog how you can create a Hot Standby Node on Amazon AWS for Galera Cluster 4.0 using MariaDB.

What is a Hot Standby?

A hot standby is a common term in computing, especially in highly distributed systems. It's a method of redundancy in which one system runs simultaneously with an identical primary system. When failure happens on the primary node, the hot standby immediately takes over, replacing the primary system. Data is mirrored to both systems in real time.

For database systems, a hot standby server is usually the second node after the primary master, running on resources as powerful as the master's. This secondary node has to be as stable as the primary master to function correctly. 

It also serves as a data recovery node if the master node or the entire cluster goes down. The hot standby node will replace the failing node or cluster while continuously serving the demand from the clients.

In Galera Cluster, all servers that are part of the cluster can serve as standby nodes. However, if the region or the entire cluster goes down, how will you cope with that? Creating a standby node outside the specific region or network of your cluster is one option here. 

In the following section, we'll show you how to create a standby node on AWS EC2 using MariaDB.

Deploying a Hot Standby On Amazon AWS

Previously, we showed you how to create a Galera Cluster on AWS. You might want to read Deploying MySQL Galera Cluster 4.0 onto Amazon AWS EC2 if you are new to Galera 4.0.

Your hot standby node can be deployed as another Galera Cluster using synchronous replication (check the blog Zero Downtime Network Migration With MySQL Galera Cluster Using Relay Node) or as an asynchronous MySQL/MariaDB node. In this blog, we'll set up and deploy the hot standby node replicating asynchronously from one of the Galera nodes.

The Galera Cluster Setup

In this sample setup, we deployed a 3-node cluster using MariaDB version 10.4.8. The cluster is deployed in the US East (Ohio) region and the topology is shown below:

We'll use server as the master for our asynchronous slave which will serve as the standby node.

Setting up your EC2 Instance for Hot Standby Node

In the AWS console, go to EC2 found under the Compute section and click Launch Instance to create an EC2 instance just like below.

We'll create this instance in the US West (Oregon) region. For your OS, you can choose whichever server you like (I prefer Ubuntu 18.04) and choose the instance type based on your target workload. For this example I will use t2.micro since it doesn't require any sophisticated setup and it's only for this sample deployment.

As we mentioned earlier, it's best that your hot standby node be located in a different region, not collocated within the same region as the cluster. So in case the regional data center goes down or suffers a network outage, your hot standby can be your failover target when things go bad. 

Before we continue: in AWS, each region has its own Virtual Private Cloud (VPC) and its own network. In order to communicate with the Galera cluster nodes, we must first define a VPC Peering connection so the nodes can communicate within the Amazon infrastructure and do not need to go outside the network, which would just add overhead and security concerns. 

First, go to the VPC where your hot standby node shall reside, then go to Peering Connections. There you need to specify the VPC of your standby node and the Galera cluster VPC. In the example below, I have us-west-2 interconnecting with us-east-2.

Once created, you'll see an entry under your Peering Connections. However, you still need to accept the request from the Galera cluster VPC, which is on us-east-2 in this example. See below.

Once accepted, do not forget to add the CIDR to the routing table. See this external blog post on VPC Peering for how to do that after the peering is established.
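The same console steps can be sketched with the AWS CLI (all VPC, route table and peering IDs below, as well as the CIDR, are placeholders):

```shell
# Request peering from the standby node's VPC (us-west-2) to the cluster VPC (us-east-2).
aws ec2 create-vpc-peering-connection \
    --region us-west-2 \
    --vpc-id vpc-11111111 \
    --peer-vpc-id vpc-22222222 \
    --peer-region us-east-2

# Accept the request on the cluster side.
aws ec2 accept-vpc-peering-connection \
    --region us-east-2 \
    --vpc-peering-connection-id pcx-33333333

# Add a route to the peer CIDR in each VPC's route table (shown for one side).
aws ec2 create-route \
    --region us-west-2 \
    --route-table-id rtb-44444444 \
    --destination-cidr-block 172.31.0.0/16 \
    --vpc-peering-connection-id pcx-33333333
```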

Now, let's go back and continue creating the EC2 node. Make sure your Security Group has the correct rules and that the required ports are opened. Check the firewall settings manual for more information about this. For this setup, I just set All Traffic to be accepted since this is just a test. See below.

Make sure when creating your instance that you have set the correct VPC and defined the proper subnet. You can check this blog in case you need some help with that. 

Setting up the MariaDB Async Slave

Step One

First we need to set up the repository, add the repo keys and update the package list in the repository cache.

$ vi /etc/apt/sources.list.d/mariadb.list

$ apt-key adv --recv-keys --keyserver hkp:// 0xF1656F24C74CD1D8

$ apt update

Step Two

Install the MariaDB packages and the required binaries.

$ apt-get install mariadb-backup  mariadb-client mariadb-client-10.4 libmariadb3 libdbd-mysql-perl mariadb-client-core-10.4 mariadb-common mariadb-server-10.4 mariadb-server-core-10.4 mysql-common

Step Three

Now, let's take a backup using xbstream to stream the files over the network from one of the nodes in our Galera Cluster.

## Wipe the datadir of the freshly installed MariaDB on your hot standby node.

$ systemctl stop mariadb

$ rm -rf /var/lib/mysql/*

## Then on the hot standby node, run this on the terminal,

$ socat -u tcp-listen:9999,reuseaddr stdout 2>/tmp/netcat.log | mbstream -x -C /var/lib/mysql

## Then on the target master, i.e. one of the nodes in your Galera Cluster (which is the node in this example), run this on the terminal,

$ mariabackup  --backup --target-dir=/tmp --stream=xbstream | socat - TCP4:

where the target address is the IP of the hot standby node.

Step Four

Prepare your MySQL configuration file. Since this is Ubuntu, I am editing /etc/mysql/my.cnf, using the following sample my.cnf taken from our ClusterControl template:

# log_output = FILE

# Slow logging

innodb_data_file_path = ibdata1:100M:autoextend

## You may want to tune the below depending on number of cores and disk sub
# innodb_file_format = barracuda
innodb_flush_method = O_DIRECT
# innodb_locks_unsafe_for_binlog = 1

## avoid statistics update when doing e.g show tables
# collation_server = utf8_unicode_ci
# init_connect = 'SET NAMES utf8'
# character_set_server = utf8

key_buffer_size = 24M
tmp_table_size = 64M
max_heap_table_size = 64M
max_allowed_packet = 512M
# sort_buffer_size = 256K
# read_buffer_size = 256K
# read_rnd_buffer_size = 512K
# myisam_sort_buffer_size = 8M

query_cache_type = 0
query_cache_size = 0

# 5.6 backwards compatibility (FIXME)
# explicit_defaults_for_timestamp = 1

performance_schema = OFF
performance-schema-max-mutex-classes = 0
performance-schema-max-mutex-instances = 0

# default_character_set = utf8

# default_character_set = utf8

max_allowed_packet = 512M
# default_character_set = utf8

# log_error = /var/log/mysqld.log
# datadir = /var/lib/mysql
Of course, you can change this according to your setup and requirements.

Step Five

Prepare the backup from step three, i.e. the finished backup that is now on the hot standby node, by running the command below.

$ mariabackup --prepare --target-dir=/var/lib/mysql

Step Six

Set the ownership of the datadir on the hot standby node.

$ chown -R mysql.mysql /var/lib/mysql

Step Seven

Now, start the MariaDB instance

$  systemctl start mariadb

Step Eight

Lastly, we need to set up the asynchronous replication.

## Create the replication user on the master node, i.e. the node in the Galera cluster

MariaDB [(none)]> CREATE USER 'cmon_replication'@'' IDENTIFIED BY 'PahqTuS1uRIWYKIN';

Query OK, 0 rows affected (0.866 sec)

MariaDB [(none)]> GRANT REPLICATION SLAVE, REPLICATION CLIENT ON *.* TO 'cmon_replication'@'';

Query OK, 0 rows affected (0.127 sec)

## Get the GTID slave position from xtrabackup_binlog_info as follows,

$  cat /var/lib/mysql/xtrabackup_binlog_info

binlog.000002   71131632 1000-1000-120454

##  Then setup the slave replication as follows,

MariaDB [(none)]> SET GLOBAL gtid_slave_pos='1000-1000-120454';

Query OK, 0 rows affected (0.053 sec)

MariaDB [(none)]> CHANGE MASTER TO MASTER_HOST='', MASTER_USER='cmon_replication', master_password='PahqTuS1uRIWYKIN', MASTER_USE_GTID = slave_pos;

## Start the replication slave,

MariaDB [(none)]> START SLAVE;

## Now, check the slave status,

MariaDB [(none)]> show slave status \G

*************************** 1. row ***************************
                Slave_IO_State: Waiting for master to send event
                   Master_User: cmon_replication
                   Master_Port: 3306
                 Connect_Retry: 60
           Read_Master_Log_Pos: 4
                Relay_Log_File: relay-bin.000001
                 Relay_Log_Pos: 4
              Slave_IO_Running: Yes
             Slave_SQL_Running: Yes
                    Last_Errno: 0
                  Skip_Counter: 0
           Exec_Master_Log_Pos: 4
               Relay_Log_Space: 256
               Until_Condition: None
                 Until_Log_Pos: 0
            Master_SSL_Allowed: No
         Seconds_Behind_Master: 0
 Master_SSL_Verify_Server_Cert: No
                 Last_IO_Errno: 0
                Last_SQL_Errno: 0
              Master_Server_Id: 1000
                    Using_Gtid: Slave_Pos
                   Gtid_IO_Pos: 1000-1000-120454
                 Parallel_Mode: conservative
                     SQL_Delay: 0
           SQL_Remaining_Delay: NULL
       Slave_SQL_Running_State: Slave has read all relay log; waiting for the slave I/O thread to update it
              Slave_DDL_Groups: 0
Slave_Non_Transactional_Groups: 0
    Slave_Transactional_Groups: 0
1 row in set (0.000 sec)

Adding Your Hot Standby Node To ClusterControl

If you are using ClusterControl, it's easy to monitor your database server's health. To add this node as a slave, select the Galera cluster you have, then go to the selection button shown below and choose Add Replication Slave:

Click Add Replication Slave and choose to add an existing slave, as shown below.

Our topology looks promising.

As you might notice, the node serving as our hot standby has a different CIDR prefix, 172.32.%, in us-west-2 (Oregon), while the other nodes use 172.31.%, located in us-east-2 (Ohio). They're in totally different regions, so in case a network failure occurs on your Galera nodes, you can fail over to your hot standby node.


Building a Hot Standby on Amazon AWS is easy and straightforward. All you need to do is determine your capacity requirements and your networking topology, security, and the protocols that need to be set up. 

Using VPC Peering helps speed up communication between different regions without going over the public internet, so the connection stays within the Amazon network infrastructure. 

Using asynchronous replication with one slave is, of course, not enough, but this blog serves as a foundation for how you can get started. You can now easily create another cluster to which the asynchronous slave replicates and build another series of Galera Clusters serving as your Disaster Recovery nodes, or you can use the gmcast.segment variable in Galera to replicate synchronously, just like what we covered in this blog.
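For reference, the gmcast.segment approach boils down to a one-line provider option (a sketch, not a full WAN setup; the segment number is arbitrary):

```ini
# my.cnf fragment - give the nodes in the second data center a different
# segment number, so Galera batches and relays WAN traffic between segments.
[mysqld]
wsrep_provider_options = "gmcast.segment=1"
```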

by Paul Namuag at November 04, 2019 08:15 PM

MariaDB Foundation

MariaDB 10.4.9, 10.3.19 and 10.2.28, 10.1.42 and 5.5.66 Now Available

Note that 10.4.9, 10.3.19, 10.2.28 and 10.1.42 contain a critical bug, MDEV-20987, and should not be used. Releases fixing the issue will be available shortly. […]

The post MariaDB 10.4.9, 10.3.19 and 10.2.28, 10.1.42 and 5.5.66 Now Available appeared first on

by Ian Gilfillan at November 04, 2019 08:36 AM

November 01, 2019


Database Load Balancing in the Cloud - MySQL Master Failover with ProxySQL 2.0: Part One (Deployment)

The cloud provides very flexible environments to work with. You can easily scale up and down by adding or removing nodes. If there’s a need, you can easily create a clone of your environment, which can be used for processes like upgrades, load tests and disaster recovery. The main problem you have to deal with is that applications have to connect to the databases in some way, and flexible setups can be tricky for databases - especially with master-slave setups. Luckily, there are some options to make this process easier. 

One way is to utilize a database proxy. There are several proxies to pick from, but in this blog post we will use ProxySQL, a well known proxy available for MySQL and MariaDB. We are going to show how you can use it to efficiently move traffic between MySQL nodes without visible impact for the application. We are also going to explain some limitations and drawbacks of this approach.

Initial Cloud Setup

First, let’s discuss the setup. We will use AWS EC2 instances for our environment. As we are only testing, we don’t really care about high availability beyond what we want to prove possible - seamless master changes. Therefore we will use a single application node and a single ProxySQL node. As per good practice, we will collocate ProxySQL on the application node, and the application will be configured to connect to ProxySQL through a Unix socket. This will reduce the overhead related to TCP connections and increase security - traffic from the application to the proxy will not leave the local instance, leaving only the ProxySQL -> MySQL connection to encrypt. Again, as this is a simple test, we will not set up SSL. In production environments you will want to do that, even if you use a VPC.

The environment will look like in the diagram below:

As the application, we will use Sysbench - a synthetic benchmark program for MySQL. It has an option to disable and enable the use of transactions, which we will use to demonstrate how ProxySQL handles them.
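A sysbench invocation along these lines could drive the test (the credentials, table counts and socket path are placeholder assumptions; --skip_trx is the switch that toggles transactions in the oltp scripts):

```shell
# Run the OLTP read/write workload through ProxySQL's local Unix socket.
sysbench oltp_read_write \
    --mysql-socket=/tmp/proxysql.sock \
    --mysql-user=sbtest \
    --mysql-password=sbpass \
    --tables=8 \
    --table-size=100000 \
    --skip_trx=off \
    --time=300 \
    run
```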

Installing a MySQL Replication Cluster Using ClusterControl

To make the deployment fast and efficient, we are going to use ClusterControl to deploy the MySQL replication setup for us. Installing ClusterControl requires just a couple of steps. We won’t go into details here, but you should open our website, register, and the installation should be pretty much straightforward. Please keep in mind that you need to set up passwordless SSH between the ClusterControl instance and all the nodes that we will be managing with it.

Once ClusterControl has been installed, you can log in. You will be presented with a deployment wizard:

As we already have instances running in the cloud, we will just go with the “Deploy” option. We will be presented with the following screen:

We will pick MySQL Replication as the cluster type, and we need to provide connectivity details. The connection can use the root user, or a sudo user with or without a password.

In the next step, we have to make some decisions. We will use Percona Server for MySQL in its latest version. We also have to define a password for the root user on the nodes we will deploy.

In the final step we have to define a topology - we will go with what we proposed at the beginning - a master and three slaves.

ClusterControl will start the deployment - we can track it in the Activity tab, as shown on the screenshot above.

Once the deployment has completed, we can see the cluster in the cluster list:

Installing ProxySQL 2.0 Using ClusterControl

The next step will be to deploy ProxySQL. ClusterControl can do this for us.

We can do this in Manage -> Load Balancer.

As we are just testing things, we are going to reuse the ClusterControl instance for ProxySQL and Sysbench. In real life you would probably want to use your “real” application server. If you can’t find it in the drop-down, you can always enter the server address (IP or hostname) by hand.

We also want to define credentials for monitoring and administrative user. We also double-checked that ProxySQL 2.0 will be deployed (you can always change it to 1.4.x if you need).

On the bottom part of the wizard we will define the user which will be created in both MySQL and ProxySQL. If you have an existing application, you probably want to use an existing user. If your application uses numerous users, you can always import the rest of them later, after ProxySQL has been deployed.

We want to ensure that all the MySQL instances will be configured in ProxySQL. We will use explicit transactions, so we set the switch accordingly. That’s all we need to do - the rest is to click on the “Deploy ProxySQL” button and let ClusterControl do its thing.

When the installation is completed, ProxySQL will show up on the list of nodes in the cluster. As you can see on the screenshot above, it already detected the topology and distributed nodes across reader and writer hostgroups.
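If you prefer to verify this from the command line, the hostgroup assignment can also be read from the ProxySQL admin interface - a sketch assuming the default admin port (6032); the credentials are whatever you set in the wizard:

```shell
# Sketch: ask ProxySQL's admin interface how backends are spread across hostgroups.
ADMIN_QUERY='SELECT hostgroup_id, hostname, port, status FROM runtime_mysql_servers;'
echo "$ADMIN_QUERY"
# mysql -u admin -padmin -h 127.0.0.1 -P 6032 -e "$ADMIN_QUERY"   # run on the ProxySQL node
```

The runtime_mysql_servers table reflects ProxySQL's in-memory configuration, so it shows what the proxy is actually using right now.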

Installing Sysbench

The final step will be to create our “application” by installing Sysbench. The process is fairly simple. First, we have to install the prerequisites - libraries and tools required to compile Sysbench:

root@ip-10-0-0-115:~# apt install git automake libtool make libssl-dev pkg-config libmysqlclient-dev

Then we want to clone the sysbench repository:

root@ip-10-0-0-115:~# git clone

Finally we want to compile and install Sysbench:

root@ip-10-0-0-115:~# cd sysbench/

root@ip-10-0-0-115:~/sysbench# ./autogen.sh && ./configure && make && make install

That’s it - Sysbench has been installed. We now need to generate some data. For that, we first need to create a schema. We will connect to the local ProxySQL and, through it, create a ‘sbtest’ schema on the master. Please note we used a Unix socket for the connection with ProxySQL.

root@ip-10-0-0-115:~/sysbench# mysql -S /tmp/proxysql.sock -u sbtest -psbtest

mysql> CREATE DATABASE sbtest;

Query OK, 1 row affected (0.01 sec)

Now we can use sysbench to populate the database with data. Again, we use the Unix socket for the connection with the proxy:

root@ip-10-0-0-115:~# sysbench /root/sysbench/src/lua/oltp_read_write.lua --threads=4 --events=0 --time=3600 --mysql-socket=/tmp/proxysql.sock --mysql-user=sbtest --mysql-password=sbtest --tables=32 --report-interval=1 --skip-trx=on --table-size=100000 --db-ps-mode=disable prepare

Once the data is ready, we can proceed to our tests. 
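With the tables in place, the actual load is generated by the same invocation with "run" as the final argument - a sketch that simply mirrors the prepare flags above:

```shell
# Sketch: the benchmark run phase reuses the prepare flags; "run" replaces "prepare".
CMD='sysbench /root/sysbench/src/lua/oltp_read_write.lua --threads=4 --events=0 --time=3600 --mysql-socket=/tmp/proxysql.sock --mysql-user=sbtest --mysql-password=sbtest --tables=32 --report-interval=1 --skip-trx=on --table-size=100000 --db-ps-mode=disable run'
echo "$CMD"
# eval "$CMD"   # uncomment on the application node
```

With --time=3600 and --report-interval=1 this keeps a steady stream of traffic flowing through ProxySQL for an hour, printing per-second statistics.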


In the second part of this blog, we will discuss ProxySQL’s handling of connections, failover and its settings that can help us to manage the master switch in a way that will be the least intrusive to the application.

by krzysztof at November 01, 2019 05:54 PM


Use MySQL Without a Password (and Still be Secure)


Some say that the best password is the one you don’t have to remember. That’s possible with MySQL, thanks to the auth_socket plugin and its MariaDB version unix_socket.

Neither of these plugins is new, and some words have been written about the auth_socket on this blog before, for example: how to change passwords in MySQL 5.7 when using plugin: auth_socket. But while reviewing what’s new with MariaDB 10.4, I saw that the unix_socket now comes installed by default and is one of the authentication methods (one of them because in MariaDB 10.4 a single user can have more than one authentication plugin, as explained in the Authentication from MariaDB 10.4 document).
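To illustrate the stacking, in MariaDB 10.4 a single account can be given both plugins in one statement - a sketch only, with a placeholder password:

```shell
# MariaDB 10.4 sketch: root authenticates via unix_socket, with a password fallback.
# The password below is a placeholder.
SQL="ALTER USER 'root'@'localhost' IDENTIFIED VIA unix_socket OR mysql_native_password USING PASSWORD('placeholder');"
echo "$SQL"
# mysql -e "$SQL"   # run on a MariaDB 10.4 server
```

With this in place, a local root shell gets in via the socket, while remote or scripted clients can still present the password.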

As already mentioned, this is not news; even when one installs MySQL using the .deb packages maintained by the Debian team, the root user is created so that it uses socket authentication. This is true for both MySQL and MariaDB:

root@app:~# apt-cache show mysql-server-5.7 | grep -i maintainers
Original-Maintainer: Debian MySQL Maintainers <>
Original-Maintainer: Debian MySQL Maintainers <>

Using the Debian packages of MySQL, the root is authenticated as follows:

root@app:~# whoami
root@app:~# mysql
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 4
Server version: 5.7.27-0ubuntu0.16.04.1 (Ubuntu)

Copyright (c) 2000, 2019, Oracle and/or its affiliates. All rights reserved.
Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> select user, host, plugin, authentication_string from mysql.user where user = 'root';
| user | host      | plugin | authentication_string |
| root | localhost | auth_socket |                       |
1 row in set (0.01 sec)

Same for the MariaDB .deb package:

10.0.38-MariaDB-0ubuntu0.16.04.1 Ubuntu 16.04

MariaDB [(none)]> show grants;
| Grants for root@localhost                                                                      |
| GRANT PROXY ON ''@'%' TO 'root'@'localhost' WITH GRANT OPTION                                  |
2 rows in set (0.00 sec)

For Percona Server, the .deb packages from the official Percona Repo are also setting the root user authentication to auth_socket. Here is an example of Percona Server for MySQL 8.0.16-7 and Ubuntu 16.04:

root@app:~# whoami
root@app:~# mysql
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 9
Server version: 8.0.16-7 Percona Server (GPL), Release '7', Revision '613e312'

Copyright (c) 2009-2019 Percona LLC and/or its affiliates
Copyright (c) 2000, 2019, Oracle and/or its affiliates. All rights reserved.
Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> select user, host, plugin, authentication_string from mysql.user where user ='root';
| user | host      | plugin | authentication_string |
| root | localhost | auth_socket |                       |
1 row in set (0.00 sec)

So, what’s the magic? The plugin checks that the Linux user matches the MySQL user using the SO_PEERCRED socket option to obtain information about the user running the client program. Thus, the plugin can be used only on systems that support the SO_PEERCRED option, such as Linux. The SO_PEERCRED socket option allows retrieving the uid of the process that is connected to the socket. It is then able to get the user name associated with that uid.

Here’s an example with the user “vagrant”:

vagrant@mysql1:~$ whoami
vagrant@mysql1:~$ mysql
ERROR 1698 (28000): Access denied for user 'vagrant'@'localhost'

Since no user “vagrant” exists in MySQL, the access is denied. Let’s create the user and try again:

MariaDB [(none)]> GRANT ALL PRIVILEGES ON *.* TO 'vagrant'@'localhost' IDENTIFIED VIA unix_socket;
Query OK, 0 rows affected (0.00 sec)

vagrant@mysql1:~$ mysql
Welcome to the MariaDB monitor.  Commands end with ; or \g.
Your MariaDB connection id is 45
Server version: 10.0.38-MariaDB-0ubuntu0.16.04.1 Ubuntu 16.04
Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others.
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

MariaDB [(none)]> show grants;
| Grants for vagrant@localhost                                                    |
| GRANT ALL PRIVILEGES ON *.* TO 'vagrant'@'localhost' IDENTIFIED VIA unix_socket |
1 row in set (0.00 sec)


Now, what about a non-Debian distro, where this is not the default? Let’s try Percona Server for MySQL 8 installed on CentOS 7:

mysql> show variables like '%version%comment';
| Variable_name   | Value                                   |
| version_comment | Percona Server (GPL), Release 7, Revision 613e312 |
1 row in set (0.01 sec)

mysql> CREATE USER 'percona'@'localhost' IDENTIFIED WITH auth_socket;
ERROR 1524 (HY000): Plugin 'auth_socket' is not loaded

Failed. What is missing? The plugin is not loaded:

mysql> pager grep socket
PAGER set to 'grep socket'
mysql> show plugins;
47 rows in set (0.00 sec)

Let’s add the plugin in runtime:

mysql> nopager
PAGER set to stdout
mysql> INSTALL PLUGIN auth_socket SONAME 'auth_socket.so';
Query OK, 0 rows affected (0.00 sec)

mysql> pager grep socket; show plugins;
PAGER set to 'grep socket'
| auth_socket                     | ACTIVE | AUTHENTICATION | | GPL     |
48 rows in set (0.00 sec)

We got all we need now. Let’s try again:

mysql> CREATE USER 'percona'@'localhost' IDENTIFIED WITH auth_socket;
Query OK, 0 rows affected (0.01 sec)
mysql> GRANT ALL PRIVILEGES ON *.* TO 'percona'@'localhost';
Query OK, 0 rows affected (0.01 sec)

And now we can log in as the OS user “percona”.

[percona@ip-192-168-1-111 ~]$ whoami
[percona@ip-192-168-1-111 ~]$ mysql -upercona
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 19
Server version: 8.0.16-7 Percona Server (GPL), Release 7, Revision 613e312

Copyright (c) 2009-2019 Percona LLC and/or its affiliates
Copyright (c) 2000, 2019, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> select user, host, plugin, authentication_string from mysql.user where user ='percona';
| user    | host   | plugin   | authentication_string |
| percona | localhost | auth_socket |                       |
1 row in set (0.00 sec)

Success again!

Question: Can I try to log in as the user percona from another OS user?

[percona@ip-192-168-1-111 ~]$ logout
[root@ip-192-168-1-111 ~]# mysql -upercona
ERROR 1698 (28000): Access denied for user 'percona'@'localhost'

No, you can’t.


MySQL is flexible in several aspects, one of them being the authentication methods. As we saw in this post, one can achieve access without passwords by relying on OS users. This is helpful in several scenarios - to mention just one: when migrating from RDS/Aurora to regular MySQL while using IAM Database Authentication to keep accessing the database without passwords.

by Daniel Guzmán Burgos at November 01, 2019 01:28 PM

October 31, 2019


Mobile Alerts & Notifications For Your Database Using Telegram

One of the great features of Telegram is its bot platform. Users can interact with bots by sending them messages, commands and inline requests and it can be controlled by using HTTPS requests to Telegram's bot API. A bot allows automated systems and servers to send telegram messages to users. Quite often, it can be useful to send stuff to yourself.

In this blog post, we are going to show you how to send push notifications to your mobile via Telegram from the database server. This is a very useful trick to get notified in real-time when a problem happens, especially when the issue occurs randomly and you can't see a pattern. The basic idea is to detect something and send a push notification via Telegram. Telegram apps can be downloaded from the Apple App Store and Google Play Store, as well as in desktop versions for Windows, Mac and Linux. In our example, we’ll show you how you can get notified on your cell phone in case a query takes more than 30 seconds to complete.

Create a Detection Script

First, we have to create a detection script. This script will query something, check a condition, and produce output if the condition is met. Suppose we have a MySQL database server and we would like to detect all queries that have been running for more than 30 seconds by scanning the MySQL processlist every 10 seconds. We will utilize information_schema for this purpose, where we can get the needed result using an SQL query.

The following bash script should do the job:



#!/bin/bash
# Detect queries running longer than QUERY_TIME seconds, checking every INTERVAL seconds.
QUERY_TIME=30
INTERVAL=10

while true; do
        OUTPUT=$(mysql -A -Bse "SELECT * FROM information_schema.processlist WHERE command = 'Query' AND time > $QUERY_TIME\G")
        if [[ ! -z $OUTPUT ]]; then
                echo "Date: $(date)"
                echo "$OUTPUT"
        fi
        sleep $INTERVAL
done

Create an option file so we can automate the user login via the "mysql" client:

$ vim ~/.my.cnf
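The option file holds the client credentials so the script can log in unattended - a minimal example (user and password are placeholders for your own account):

```ini
[client]
user=root
password=yourpassword
```

Make sure the file is readable only by its owner (chmod 600), since it contains a plain-text password.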

Before running the script, give it execute permission:

$ chmod 755
$ ./

Now, run a query that would take more than 30 seconds to complete on the database node. You should see the following output:

Date: Thu Oct 31 03:13:14 UTC 2019
*************************** 1. row ***************************
             ID: 4987
           USER: root
           HOST: localhost
             DB: sbtest2
        COMMAND: Query
           TIME: 38
          STATE: Altering table
           INFO: ALTER TABLE sbtest1 FORCE
        TIME_MS: 38030.478
          STAGE: 0
      MAX_STAGE: 0
       PROGRESS: 0.000
    MEMORY_USED: 84696
       QUERY_ID: 101661
            TID: 6838

Ctrl+C to exit. The output verbosity of the captured query is informative enough. We want the above output to be pushed to our mobile phone immediately.

Create a Telegram Bot and Channel

Now let's create a telegram bot and channel. To create a Telegram bot:

  1. Download, install and open the Telegram apps (registration is required)
  2. Search for the "botfather" telegram bot (he’s the one that’ll assist you with creating and managing your bot).
  3. You can start with "/help" to see all the possible commands.
  4. Click on or type "/newbot" to create a new bot.
  5. Give it a name. Here we are going to use "Long Query Detector"
  6. Give the bot a username. It must end with the word "bot". Here we are going to use "my_long_query_detector_bot".

Example as in the following screenshot (via Telegram Desktop app):

You will get a token to access the HTTP API (the token has been partially masked). Keep it safe for the next section.

Now we have to create a channel. This is the destination of the push notifications. Subscribers to this channel can read and write messages. From the Telegram application, go to "New Channel" and enter the required information:

Next, choose "Public Channel" and write down a unique URL (this is the channel ID as well) for the channel. Here we use "long_query_detector":

Click SAVE and proceed to add channel members. In the search box, look up "my_long_query_detector_bot" and invite it into the channel:

Channel created. If you clicked on the Channel Info, you should see the following details:

There are 2 members in the channel, the bot and us. The channel name is long-query-detector while the channel ID is long_query_detector. Notification is turned on. We can now proceed to send messages to this channel.

Install Telegram Script

Get the Telegram script and put it into the environment path of the database node:

$ git clone
$ cd
$ cp telegram /usr/bin/

Now we can test to push a notification to the channel (token has been partially masked):

$ telegram -t 10****8477:AAFduz0qz******************FiVcbnzE -c @long_query_detector "test"

You should get the test message by @my_long_query_detector_bot, as shown in the following screenshot:
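The telegram helper is essentially a wrapper around the Bot API mentioned at the start; the same test message could be sent with a single raw HTTPS call - a sketch with a placeholder token:

```shell
# Sketch: the raw Bot API request behind the helper; TOKEN is a placeholder.
TOKEN="123456:ABC-DEF_placeholder"
CHANNEL="@long_query_detector"
API_URL="https://api.telegram.org/bot${TOKEN}/sendMessage"
echo "$API_URL"
# curl -s "$API_URL" -d chat_id="$CHANNEL" -d text="test"   # requires network access
```

This is handy for debugging: if the raw curl call works but the helper does not, the problem is in the script configuration rather than the bot or channel.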

Looks good. Now we have to save the token somewhere inside the server where the telegram script would recognize. Create a file called "" under the HOME directory of the user and append the following line (token has been partially masked):

$ vim ~/

We can now push a new notification without the token and channel ID parameters in the command:

$ telegram "test again"

You should get a new message in the channel. Now we can integrate Telegram with our detection script.

Push it Out

Make some amendments to our detection script:


#!/bin/bash
# Same detection loop, now pushing the captured output to Telegram.
QUERY_TIME=30
INTERVAL=10
OUTPUT_FILE=/tmp/long_query.out   # temporary file for the captured output (path is an example)

while true; do
        OUTPUT=$(mysql -A -Bse "SELECT * FROM information_schema.processlist WHERE command = 'Query' AND time > $QUERY_TIME\G")
        if [[ ! -z $OUTPUT ]]; then
                echo "Date: $(date)" > $OUTPUT_FILE
                echo "$OUTPUT" >> $OUTPUT_FILE
                cat $OUTPUT_FILE | telegram -
        fi
        sleep $INTERVAL
done


Pay attention to the amended lines, where we added output redirection to a file and then send the contents of that file to the Telegram channel. Now, we are ready to start the detection script in the background:

$ nohup ./ &

Let's try by executing the following statement on the database server (to force table rebuilding for a 5 million-rows table):

MariaDB> ALTER TABLE sbtest2.sbtest1 FORCE;

Wait for a maximum of 40 seconds (30 seconds long query time + 10 seconds interval) and you should see the following push notification coming in:

Cool! You can now sit back and wait knowing that queries exceeding 30 seconds will be logged into this channel and you will get notified.

ClusterControl and Telegram Integration

All ClusterControl notifications can be pushed to Telegram by using the ClusterControl integration module, configurable at ClusterControl -> Integrations -> 3rd party Notifications -> Add New Integrations. In step 1, we have to choose one of the supported integration services before proceeding to step 2. For Telegram, you have to specify the required bot token and the channel ID with an "@" prefix:

Click on the "Test" button where ClusterControl will send out a test notification to the channel for verification. Then, proceed to step 3 to select a cluster (or you can pick multiple clusters) and the event (or you can pick multiple events) that will be triggered by ClusterControl:

Save the setting and you are good. The triggered events will be pushed to the configured Telegram channel. Here is an example of what you would get in the channel if something went wrong:

That's it for now. Happy notifying!

by ashraf at October 31, 2019 06:47 PM

October 30, 2019


PostgreSQL Backup Method Features in AWS S3

Amazon released S3 in early 2006 and the first tool enabling PostgreSQL backup scripts to upload data in the cloud — s3cmd — was born just shy of a year later. By 2010 (according to my Google search skills) Open BI blogged about it. It is then safe to say that some PostgreSQL DBAs have been backing up data to AWS S3 for as long as 9 years. But how? And what has changed in that time? While s3cmd is still referenced by some in the context of known PostgreSQL backup tools, the methods have seen changes allowing for better integration with either the filesystem or PostgreSQL native backup options in order to achieve the desired recovery objectives, RTO and RPO.

Why Amazon S3

As pointed out throughout the Amazon S3 documentation (S3 FAQs being a very good starting point) the advantages of using the S3 service are:

  • 99.999999999% (eleven nines) durability
  • unlimited data storage
  • low costs (even lower when combined with BitTorrent)
  • inbound network traffic free of charge
  • only outbound network traffic is billable

AWS S3 CLI Gotchas

The AWS S3 CLI toolkit provides all the tools needed for transferring data in and out of the S3 storage, so why not use those tools? The answer lies in the Amazon S3 implementation details which include measures for handling the limitations and constraints related to object storage:

As an example refer to the aws s3 cp help page:

--expected-size (string) This argument specifies the expected size of a stream in terms of bytes. Note that this argument is needed only when a stream is being uploaded to s3 and the size is larger than 5GB. Failure to include this argument under these conditions may result in a  failed upload due to too many parts in upload.

Avoiding those pitfalls requires in-depth knowledge of the S3 ecosystem which is what the purpose-built PostgreSQL and S3 backup tools are trying to achieve.
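To make the quoted option concrete, here is how a streamed dump could be pushed through the CLI - a sketch with a placeholder bucket name, sizing the stream up front so multipart chunking works for dumps over 5GB:

```shell
# Sketch: stream a dump straight to S3; --expected-size lets the CLI pick
# multipart chunk sizes for streams larger than 5GB. Bucket name is a placeholder.
EXPECTED_SIZE=$((20 * 1024 * 1024 * 1024))   # rough upper bound for the dump, here 20 GiB
echo "pg_dump -d test | gzip -c | aws s3 cp - s3://my-backup-bucket/test.pg_dump.gz --expected-size $EXPECTED_SIZE"
# (drop the echo and quotes to actually run it against a real bucket)
```

Overestimating the size is harmless; underestimating it can exceed the 10,000-part limit and fail the upload mid-stream.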

PostgreSQL Native Backup Tools With Amazon S3 Support

S3 integration is provided by some of the well-known backup tools, implementing the PostgreSQL native backup features.


BarmanS3 is implemented as Barman Hook Scripts. It relies on the AWS CLI, without addressing the recommendations and limitations listed above. The simple setup makes it a good candidate for small installations. Development is somewhat stalled (the last update was about a year ago), making this product mainly a choice for those already using Barman in their environments.

S3 Dumps

S3dumps is an active project, implemented using Amazon’s Python library Boto3. Installation is easily performed via pip. Although relying on the Amazon S3 Python SDK, a search of the source code for regex keywords such as multi.*part or storage.*class doesn’t reveal any of the advanced S3 features, such as multipart transfers.


pgBackRest implements S3 as a repository option. This is one of the well-known PostgreSQL backup tools, providing a feature-rich set of backup options such as parallel backup and restore, encryption, and tablespaces support. It is mostly C code, which provides the speed and throughput we are looking for; however, when it comes to interacting with the AWS S3 API, this comes at the price of the additional work required for implementing the S3 storage features. Recent versions implement S3 multipart upload.


WAL-G, announced 2 years ago, is being actively maintained. This rock-solid PostgreSQL backup tool implements storage classes, but not multipart upload (searching the code for CreateMultipartUpload didn’t find any occurrence).


pghoard was released about 3 years ago. It is a performant and feature-rich PostgreSQL backup tool with support for S3 multipart transfers. It doesn’t offer any of the other S3 features such as storage class and object lifecycle management.

S3 as a local filesystem

Being able to access S3 storage as a local filesystem is a highly desired feature, as it opens up the possibility of using the PostgreSQL native backup tools.

For Linux environments, Amazon offers two options: NFS and iSCSI. They take advantage of the AWS Storage Gateway.


A locally mounted NFS share is provided by the AWS Storage Gateway File service. According to the link we need to create a File Gateway.

AWS Storage Gateway: Creating a File gateway

At the Select host platform screen select Amazon EC2 and click the Launch instance button to start the EC2 wizard for creating the instance.

Now, just out of this Sysadmin’s curiosity, let’s inspect the AMI used by the wizard as it gives us an interesting perspective on some of the AWS internal pieces. With the image ID known ami-0bab9d6dffb52fef5 let’s look at details:

AWS Storage Gateway: Client instance details

As shown above, the AMI name is aws-thinstaller — so what is a “thinstaller”? Internet searches reveal that Thinstaller is an IBM Lenovo software configuration management tool for Microsoft products, referenced first in this 2008 blog, and later in a Lenovo forum post and a school district request for service. I had no way of knowing that one, as my Windows sysadmin job had ended 3 years earlier. So was this AMI built with the Thinstaller product? To make matters even more confusing, the AMI operating system is listed as “Other Linux”, which can be confirmed by SSH-ing into the system as admin.

A wizard gotcha: despite the EC2 firewall setup instructions my browser was timing out when connecting to the storage gateway. Allowing port 80 is documented at Port Requirements — we could argue that the wizard should either list all required ports, or link to documentation, however in the spirit of the cloud, the answer is “automate” with tools such as CloudFormation.

AWS Storage Gateway: File gateway EC2 setup wizard

The wizard also suggests starting with an xlarge instance.

AWS Storage Gateway: File gateway EC2 instance details

Once the storage gateway is ready, configure the NFS share by clicking the Create file share button in the Gateway menu:

AWS Storage Gateway: Gateway screen

Once the NFS share is ready, follow the instructions to mount the filesystem:

AWS Storage Gateway: File gateway EC2 setup wizard

In the above screenshot, note that the mount command references the instance private IP address. To mount from a public host just use the instance public address as shown in the EC2 instance details above.

The wizard will not block if the S3 bucket doesn’t exist at the time of creating the file share, however, once the S3 bucket is created we need to restart the instance, otherwise, the mount command fails with:

[root@omiday ~]# mount -t nfs -o nolock,hard /mnt

mount.nfs: mounting failed, reason given by server: No such file or directory

Verify that the share has been made available:

[root@omiday ~]# df -h /mnt

Filesystem                            Size  Used Avail Use% Mounted on
                                      8.0E     0  8.0E   0% /mnt

Now let’s run a quick test:

postgres@[local]:54311 postgres# \l+ test

                                                List of databases

Name |  Owner | Encoding |   Collate | Ctype | Access privileges |  Size | Tablespace | Description


test | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 |                   | 2763 MB | pg_default |

(1 row)

[root@omiday ~]# date ; time pg_dump -d test | gzip -c >/mnt/test.pg_dump.gz ; date

Sun 27 Oct 2019 06:06:24 PM PDT

real    0m29.807s

user    0m15.909s

sys     0m2.040s

Sun 27 Oct 2019 06:06:54 PM PDT

Note that the Last modified timestamp on the S3 bucket is about a minute later, which as mentioned earlier has to do with the Amazon S3 data consistency model.

AWS Storage Gateway: S3 bucket contents after pg_dump

Here’s a more exhaustive test:

~ $ for q in {0..20} ; do touch /mnt/touched-at-$(date +%Y%m%d%H%M%S) ; sleep 1 ; done

~ $ aws s3 ls s3://s9s-postgresql-backup | nl

    1      2019-10-27 19:50:40          0 touched-at-20191027194957

    2      2019-10-27 19:50:40          0 touched-at-20191027194958

    3      2019-10-27 19:50:40          0 touched-at-20191027195000

    4      2019-10-27 19:50:40          0 touched-at-20191027195001

    5      2019-10-27 19:50:40          0 touched-at-20191027195002

    6      2019-10-27 19:50:40          0 touched-at-20191027195004

    7      2019-10-27 19:50:40          0 touched-at-20191027195005

    8      2019-10-27 19:50:40          0 touched-at-20191027195007

    9      2019-10-27 19:50:40          0 touched-at-20191027195008

   10      2019-10-27 19:51:10          0 touched-at-20191027195009

   11      2019-10-27 19:51:10          0 touched-at-20191027195011

   12      2019-10-27 19:51:10          0 touched-at-20191027195012

   13      2019-10-27 19:51:10          0 touched-at-20191027195013

   14      2019-10-27 19:51:10          0 touched-at-20191027195014

   15      2019-10-27 19:51:10          0 touched-at-20191027195016

   16      2019-10-27 19:51:10          0 touched-at-20191027195017

   17      2019-10-27 19:51:10          0 touched-at-20191027195018

   18      2019-10-27 19:51:10          0 touched-at-20191027195020

   19      2019-10-27 19:51:10          0 touched-at-20191027195021

   20      2019-10-27 19:51:10          0 touched-at-20191027195022

   21      2019-10-27 19:51:10          0 touched-at-20191027195024

Another issue worth mentioning: after playing with various configurations, creating and destroying gateways and shares, at some point when attempting to activate a File gateway, I was getting an Internal Error:

AWS Storage Gateway: browser Internal Error on gateway activation

The command line gives some more details, although not pointing to any issue:

~$ curl -sv ""

*   Trying


* Connected to ( port 80 (#0)

> GET /?gatewayType=FILE_S3&activationRegion=us-east-1 HTTP/1.1

> Host:

> User-Agent: curl/7.65.3

> Accept: */*


* Mark bundle as not supporting multiuse

< HTTP/1.1 500 Internal Server Error

< Date: Mon, 28 Oct 2019 06:33:30 GMT

< Content-type: text/html

< Content-length: 14


* Connection #0 to host left intact

Internal Error~ $

This forum post pointed out that my issue may have something to do with the VPC Endpoint I had created. My fix was deleting the VPC endpoint I had set up during various iSCSI trial and error runs.

While S3 encrypts data at rest, the NFS wire traffic is plain text. To wit, here’s a tcpdump packet dump:

23:47:12.225273 IP > Flags [P.], seq 2665:3377, ack 2929, win 501, options [nop,nop,TS val 1899459538 ecr 38013066], length 712: NFS request xid 3511704119 708 getattr fh 0,2/53

E...8.@.@.......k.......        ...c..............


..............&...........]....................# inittab is no longer used.




# Ctrl-Alt-Delete is handled by /usr/lib/systemd/system/


# systemd uses 'targets' instead of runlevels. By default, there are two main targets:


# analogous to runlevel 3

# analogous to runlevel 5


# To view current default target, run:

# systemctl get-default


# To set a default target, run:

# systemctl set-default

.....   .........0..

23:47:12.331592 IP > Flags [P.], seq 2929:3109, ack 3377, win 514, options [nop,nop,TS val 38013174 ecr 1899459538], length 180: NFS reply xid 3511704119 reply ok 176 getattr NON 4 ids 0/33554432 sz -2138196387
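The capture above can be reproduced with something along these lines (the interface name is an assumption; 2049 is the standard NFS port):

```shell
# Sketch: print NFS traffic in ASCII to show it crosses the wire unencrypted.
# Interface name is an assumption - adjust to your NFS client.
CAPTURE="tcpdump -i eth0 -A -s 0 port 2049"
echo "$CAPTURE"
# $CAPTURE   # requires root on the NFS client
```

The -A flag prints each packet payload in ASCII, which is how the /etc/inittab contents above ended up readable in the dump.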

Until this IETF draft is approved, the only secure option for connecting from outside AWS is by using a VPN tunnel. This complicates the setup, making the on-premise NFS option less appealing than the FUSE-based tools I’m going to discuss a bit later.


This option is provided by the AWS Storage Gateway Volume service. Once the service is configured head to the Linux iSCSI client setup section.

The advantage of using iSCSI over NFS is the ability to take advantage of the Amazon cloud-native backup, cloning, and snapshot services. For details and step-by-step instructions, follow the links to AWS Backup, Volume Cloning, and EBS Snapshots.

While there are plenty of advantages, there is an important restriction that will likely throw off many users: it is not possible to access the gateway via its public IP address. So, just as the NFS option, this requirement adds complexity to the setup.

Despite the clear limitation and convinced that I will not be able to complete this setup, I still wanted to get a feeling of how it’s done. The wizard redirects to an AWS Marketplace configuration screen.

AWS Storage Gateway: Volume gateway Amazon EC2 setup

Note that the Marketplace wizard creates a secondary disk, however it is not big enough, and therefore we still need to add the two required volumes as indicated by the host setup instructions. If the storage requirements aren’t met, the wizard will block at the local disks configuration screen:

AWS Storage Gateway: Volume gateway local disks configuration

Here’s a glimpse of Amazon Marketplace configuration screen:

AWS Storage Gateway: Volume gateway Amazon Marketplace --- Software details

There is a text interface accessible via SSH (log in as user sguser) which provides basic network troubleshooting tools and other configuration options which cannot be performed via the web GUI:

~ $ ssh

Warning: Permanently added ',' (ECDSA) to the list of known hosts.

'screen.xterm-256color': unknown terminal type.

      AWS Storage Gateway Configuration


      ##  Currently connected network adapters:


      ##  eth0:


      1: SOCKS Proxy Configuration

      2: Test Network Connectivity

      3: Gateway Console

      4: View System Resource Check (0 Errors)

      0: Stop AWS Storage Gateway

      Press "x" to exit session

      Enter command:



In this category, I have listed the FUSE based tools that provide a more complete S3 compatibility compared to the PostgreSQL backup tools, and in contrast with Amazon Storage Gateway, allow data transfers from an on-premise host to Amazon S3 without additional configuration. Such a setup could provide S3 storage as a local filesystem that PostgreSQL backup tools can use in order to take advantage of features such as parallel pg_dump.


s3fs-fuse is written in C++, a language supported by the Amazon S3 SDK toolkit, and as such is well suited for implementing advanced S3 features such as multipart uploads, caching, S3 storage class, server-side encryption, and region selection. It is also highly POSIX compatible.

The application is included with my Fedora 30, making the installation straightforward.
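As a rough sketch (the bucket name, credentials, and mount point below are placeholders, not values from a real environment), installing and mounting with s3fs-fuse looks like this:

```shell
# Install the package (Fedora; the package name may differ on other distros):
sudo dnf install -y s3fs-fuse

# s3fs reads credentials from a file in ACCESS_KEY_ID:SECRET_ACCESS_KEY format:
echo "AKIAEXAMPLE:wJalrEXAMPLEKEY" > ~/.passwd-s3fs
chmod 600 ~/.passwd-s3fs

# Mount the bucket on a local directory:
mkdir -p ~/mnt/s9s
s3fs s9s-postgresql-backup ~/mnt/s9s -o passwd_file=${HOME}/.passwd-s3fs
```

Once mounted, the bucket behaves like a regular directory, so the pg_dump test below works unchanged.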

To test:

~/mnt/s9s $ time pg_dump -d test | gzip -c >test.pg_dump-$(date +%Y%m%d-%H%M%S).gz

real    0m35.761s

user    0m16.122s

sys     0m2.228s

~/mnt/s9s $ aws s3 ls s3://s9s-postgresql-backup

2019-10-28 03:16:03   79110010 test.pg_dump-20191028-031535.gz

Note that the speed is somewhat slower than using the Amazon Storage Gateway with the NFS option. It does make up for the lower performance by providing a highly POSIX compatible filesystem.


S3QL provides S3 features such as storage classes and server-side encryption. The many features are described in the exhaustive S3QL Documentation; however, if you are looking for multipart upload, it is not mentioned anywhere. This is because S3QL implements its own file-splitting algorithm in order to provide the de-duplication feature. All files are broken into 10 MB blocks.
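The effect of fixed-size block de-duplication can be illustrated with standard shell tools (using 1 KiB blocks here for brevity, instead of S3QL’s 10 MB):

```shell
# Create a 4 KiB file consisting of four identical 1 KiB zero-filled blocks,
# split it into fixed-size blocks, and count the distinct block hashes --
# a de-duplicating store like S3QL would keep only one copy of the block.
cd "$(mktemp -d)"
head -c 4096 /dev/zero > sample.bin
split -b 1024 sample.bin blk_
sha256sum blk_* | awk '{print $1}' | sort -u | wc -l   # prints 1
```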

The installation on a Red Hat based system is straightforward: install the required RPM dependencies via yum.

Then install the Python dependencies using pip3:

pip-3.6 install setuptools cryptography defusedxml apsw dugong pytest requests llfuse==1.3.6

A notable characteristic of this tool is the S3QL filesystem created on top of the S3 bucket.


goofys is an option when performance trumps POSIX compliance. Its goals are the opposite of s3fs-fuse. The focus on speed is also reflected in the distribution model: for Linux, there are pre-built binaries. Once downloaded, run:

~/temp/goofys $ ./goofys s9s-postgresql-backup ~/mnt/s9s/

And backup:

~/mnt/s9s $ time pg_dump -d test | gzip -c >test.pg_dump-$(date +%Y%m%d-%H%M%S).gz

real    0m27.427s

user    0m15.962s

sys     0m2.169s

~/mnt/s9s $ aws s3 ls s3://s9s-postgresql-backup

2019-10-28 04:29:05   79110010 test.pg_dump-20191028-042902.gz

Note that the object creation time on S3 is only 3 seconds away from the file timestamp.


ObjectFS appears to have been maintained until about 6 months ago. A check for multipart upload reveals that it is not implemented. From the author’s research paper we learn that the system is still in development, and since the paper was released in 2019 I thought it worth mentioning.

S3 Clients

As mentioned earlier, in order to use the AWS S3 CLI, we need to take into consideration several aspects specific to object storage in general, and Amazon S3 in particular. If the only requirement is the ability to transfer data in and out of the S3 storage, then a tool that closely follows the Amazon S3 recommendations can do the job.

s3cmd is one of the tools that stood the test of time. This 2010 Open BI blog talks about it, at a time when S3 was the new kid on the block.

Notable features:

  • server-side encryption
  • automatic multipart uploads
  • bandwidth throttling

Head to the S3cmd: FAQ and Knowledge Base for more information.


The options available for backing up a PostgreSQL cluster to Amazon S3 differ in the data transfer methods and how they align with the Amazon S3 strategies.

AWS Storage Gateway complements Amazon’s S3 object storage, at the cost of increased complexity and the additional knowledge required to get the most out of this service. For example, selecting the correct number of disks requires careful planning, and a good grasp of Amazon’s S3-related costs is a must in order to minimize operational costs.

While applicable to any cloud storage, not only Amazon S3, the decision to store data in a public cloud has security implications. Amazon S3 provides encryption for data at rest and data in transit, but with no zero-knowledge guarantee. Organizations wishing to have full control over their data should implement client-side encryption and store the encryption keys outside their AWS infrastructure.

For commercial alternatives to mapping S3 to a local filesystem, it’s worth checking out the products from ObjectiveFS or NetApp.

Lastly, organizations seeking to develop their own backup tools, either by building on the foundation provided by the many open source applications or by starting from zero, should consider using the S3 compatibility test made available by the Ceph project.

by Viorel Tabara at October 30, 2019 08:41 PM

October 29, 2019


An Overview of pgModeler for PostgreSQL

When a project is being designed, the first thing to think about is what its purpose will be... what is the best solution, and what are the alternatives.  In software engineering, everything is done to serve data, whether it's a graphical interface or business logic, so it's no wonder the best starting point might be database planning.

The official documentation of a database can be very complicated, whatever technology it may be. Using the best concepts for a specific situation is not an easy task.

pgModeler is the program you can use to increase your productivity with PostgreSQL. It's free, works on Windows, Mac, or Linux and provides a way to work with DDL commands through a rich interface built on top of SVG.

pgModeler Logo


Installation is very simple: just download it from the site and run the file. Some operating systems already include pgModeler in their repositories, which is an alternative to downloading it.

pgModeler setup wizard found on the installation file (.run .exe .dmg).

pgModeler is an open source solution, and you can find it on GitHub, where new releases are published. 

It has a paid version option, where you can support the project and use the latest features, for example, compatibility with the latest versions of PostgreSQL.

If you need a desktop entry, check out the following. The file can be named pgmodeler.desktop and placed in /usr/share/applications/, but don’t forget to copy the logo presented in this blog, saving it as /etc/pgmodeler/pgmodeler_logo.png.

[Desktop Entry]
Type=Application
Name=pgModeler
GenericName=PostgreSQL Database Modeler
Comment=Program with nice Qt interface for visual modeling PostgreSQL on Entity Relationship Diagram
# The Exec and Icon values are assumptions -- adjust to your installation:
Exec=pgmodeler
Icon=/etc/pgmodeler/pgmodeler_logo.png

Graphical Interface

Information technology curricula, including college courses, contain data modeling disciplines, with UML as the standard for project design and documentation.

The graphical interface of pgModeler lets you work with a kind of diagram specific to databases, the Entity Relationship Diagram (ERD), seamlessly reproducing what you’ve built inside your PostgreSQL cluster.

Several languages are available:

  • English (en_US);
  • Spanish (es_ES);
  • French (fr_FR);
  • Dutch (nl_NL);
  • Portuguese (pt_BR); and
  • Chinese (zh_CN).

Printing what you’ve built is also available, and customizations are possible in the appearance, changing the font and colors of schemas, tables, relationships, etc.


The features of pgModeler are simply tools to help you navigate between logical and physical models.

A logical model is the diagram. You can use it to transform your customer’s idea into a well-documented project that another person can understand in the future and modify.

The physical model is the script, the SQL code. PostgreSQL understands it, and so does pgModeler.

Through its reverse engineering algorithm, you can connect to your PostgreSQL cluster and look at your existing domain model from a different perspective, or build it first and then create the domain model by executing the script generated from what you’ve built in the diagram.

The goal between pgModeler and PostgreSQL.

Entity Relationship Diagram

Once you understand its purpose, let’s see what a diagram looks like for a very simple project where you can visualize the relationship between the tables customer and film, named rental.

Note the lines between the tables; they are easy to see and, most importantly, to understand. Primary and foreign keys are the starting points for visualizing the relationships, and the cardinality is shown at their edges.

Entity Relationship Diagram containing three tables.

The constraints representing the keys can be seen as pk, fk, and even NOT NULL as nn, in green at the right of each table. The schema is named store, and the picture above has been generated by the program itself.

Earlier we saw that diagrams are the logical model, which can be applied to a PostgreSQL cluster. In order to apply it, a connection must be established; for this example, I created a cluster running inside a Docker container.

Configuring the connection between pgModeler and PostgreSQL.

Now with the database connection configured and tested, exporting is easy. Security concerns must be considered at this point, like establishing SSL with your cluster.

In the following, pgModeler creates the store schema inside a completely new database named blog_db, as I wanted, not forgetting to mention the new role with login permission.

Creating the domain model in the cluster.

Exporting process successfully ended! – Ok, there is a mistake, but it has been successfully ended, for sure.

Verifying that the domain model has been successfully initialized by pgModeler:
postgres@716f75fdea56:~$ psql -U thiago -w -d blog_db;
psql (10.10 (Debian 10.10-1.pgdg90+1))
Type "help" for help.
blog_db=> set search_path to store;
blog_db=> \dt
        List of relations
 Schema |   Name   | Type  | Owner
--------+----------+-------+--------
 store  | customer | table | thiago
 store  | film     | table | thiago
 store  | rental   | table | thiago
(3 rows)

blog_db=> \du
                                   List of roles
 Role name |                         Attributes                         | Member of
-----------+------------------------------------------------------------+-----------
 postgres  | Superuser, Create role, Create DB, Replication, Bypass RLS | {}
 thiago    |                                                            | {}


Domain models are also known as mini worlds, and you’ll rarely see the same one applied to different projects. pgModeler can help you focus on what is really important, avoiding wasted time on SQL syntax.

by thiagolopes at October 29, 2019 02:54 PM

October 28, 2019


Deploying a Highly Available Nextcloud with MySQL Galera Cluster and GlusterFS

Nextcloud is an open source file sync and share application that offers free, secure, and easily accessible cloud file storage, as well as a number of tools that extend its feature set. It's very similar to the popular Dropbox, iCloud and Google Drive but unlike Dropbox, Nextcloud does not offer off-premises file storage hosting. 

Nextcloud Logo

In this blog post, we are going to deploy a highly available setup for our private "Dropbox" infrastructure using Nextcloud, GlusterFS, Percona XtraDB Cluster (MySQL Galera Cluster), and ProxySQL, with ClusterControl as the automation tool to manage and monitor the database and load balancer tiers.

Note: You can also use MariaDB Cluster, which uses the same underlying replication library as in Percona XtraDB Cluster. From a load balancer perspective, ProxySQL behaves similarly to MaxScale in that it can understand the SQL traffic and has fine-grained control on how traffic is routed. 

Database Architecture for Nextcloud

In this blog post, we used a total of 6 nodes.

  • 2 x proxy servers 
  • 3 x database + application servers
  • 1 x controller server (ClusterControl)

The following diagram illustrates our final setup:

Highly Available MySQL Nextcloud Database Architecture

For Percona XtraDB Cluster, a minimum of 3 nodes is required for a solid multi-master replication. Nextcloud applications are co-located within the database servers, thus GlusterFS has to be configured on those hosts as well. 

The load balancer tier consists of 2 nodes for redundancy purposes. We will use ClusterControl to deploy the database tier and the load balancer tiers. All servers are running on CentOS 7 with the following /etc/hosts definition on every node: nextcloud1 db1 nextcloud2 db2 nextcloud3 db3 vip db proxy1 lb1 proxysql1 proxy2 lb2 proxysql2

Note that GlusterFS and MySQL are highly intensive processes. If you are following this setup (GlusterFS and MySQL resides in a single server), ensure you have decent hardware specs for the servers.

Nextcloud Database Deployment

We will start with database deployment for our three-node Percona XtraDB Cluster using ClusterControl. Install ClusterControl, then set up passwordless SSH to all nodes that will be managed by ClusterControl (3 PXC + 2 proxies). On the ClusterControl node, do:

$ whoami


$ ssh-copy-id

$ ssh-copy-id

$ ssh-copy-id

$ ssh-copy-id

$ ssh-copy-id

**Enter the root password for the respective host when prompted.

Open a web browser and go to https://{ClusterControl-IP-address}/clustercontrol and create a super user. Then go to Deploy -> MySQL Galera. Follow the deployment wizard accordingly. At the second stage 'Define MySQL Servers', pick Percona XtraDB 5.7 and specify the IP address for every database node. Make sure you get a green tick after entering the database node details, as shown below:

Deploy a Nextcloud Database Cluster

Click "Deploy" to start the deployment. The database cluster will be ready in 15~20 minutes. You can follow the deployment progress at Activity -> Jobs -> Create Cluster -> Full Job Details. The cluster will be listed under Database Cluster dashboard once deployed.

We can now proceed to database load balancer deployment.

Nextcloud Database Load Balancer Deployment

Nextcloud is recommended to run on a single-writer setup, where writes will be processed by one master at a time, and the reads can be distributed to other nodes. We can use ProxySQL 2.0 to achieve this configuration since it can route the write queries to a single master. 

To deploy a ProxySQL, click on Cluster Actions > Add Load Balancer > ProxySQL > Deploy ProxySQL. Enter the required information as highlighted by the red arrows:

Deploy ProxySQL for Nextcloud

Fill in all necessary details as highlighted by the arrows above. The server address is the lb1 server. Further down, we specify the ProxySQL admin and monitoring users' passwords. Then include all MySQL servers in the load balancing set and choose "No" in the Implicit Transactions section. Click "Deploy ProxySQL" to start the deployment.

Repeat the same steps as above for the secondary load balancer, lb2 (but change the "Server Address" to lb2's IP address). Otherwise, we would have no redundancy in this layer.

Our ProxySQL nodes are now installed and configured with two host groups for Galera Cluster. One for the single-master group (hostgroup 10), where all connections will be forwarded to one Galera node (this is useful to prevent multi-master deadlocks) and the multi-master group (hostgroup 20) for all read-only workloads which will be balanced to all backend MySQL servers.

Next, we need to deploy a virtual IP address to provide a single endpoint for our ProxySQL nodes, so your application will not need to define two different ProxySQL hosts. This also provides automatic failover capability, because the virtual IP address will be taken over by the backup ProxySQL node in case something goes wrong with the primary ProxySQL node.

Go to ClusterControl -> Manage -> Load Balancers -> Keepalived -> Deploy Keepalived. Pick "ProxySQL" as the load balancer type and choose two distinct ProxySQL servers from the dropdown. Then specify the virtual IP address as well as the network interface that it will listen to, as shown in the following example:

Deploy Keepalived & ProxySQL for Nextcloud

Once the deployment completes, you should see the following details on the cluster's summary bar:

Nextcloud Database Cluster in ClusterControl

Finally, create a new database for our application by going to ClusterControl -> Manage -> Schemas and Users -> Create Database and specify "nextcloud". ClusterControl will create this database on every Galera node. Our load balancer tier is now complete.

GlusterFS Deployment for Nextcloud

The following steps should be performed on nextcloud1, nextcloud2, nextcloud3 unless otherwise specified.

Step One

It's recommended to have a separate disk for GlusterFS storage, so we are going to add an additional disk as /dev/sdb and create a new partition:

$ fdisk /dev/sdb

Follow the fdisk partition creation wizard by pressing the following key:

n > p > Enter > Enter > Enter > w

Step Two

Verify if /dev/sdb1 has been created:

$ fdisk -l /dev/sdb1

Disk /dev/sdb1: 8588 MB, 8588886016 bytes, 16775168 sectors

Units = sectors of 1 * 512 = 512 bytes

Sector size (logical/physical): 512 bytes / 512 bytes

I/O size (minimum/optimal): 512 bytes / 512 bytes

Step Three

Format the partition with XFS:

$ mkfs.xfs /dev/sdb1

Step Four

Mount the partition on /glusterfs:

$ mkdir /glusterfs

$ mount /dev/sdb1 /glusterfs

Verify that all nodes have the following layout:

$ lsblk


sda      8:0 0 40G  0 disk

└─sda1   8:1 0 40G  0 part /

sdb      8:16 0   8G 0 disk

└─sdb1   8:17 0   8G 0 part /glusterfs

Step Five

Create a subdirectory called brick under /glusterfs:

$ mkdir /glusterfs/brick

Step Six

For application redundancy, we can use GlusterFS for file replication between the hosts. Firstly, install GlusterFS repository for CentOS:

$ yum install centos-release-gluster -y

$ yum install epel-release -y

Step Seven

Install GlusterFS server

$ yum install glusterfs-server -y

Step Eight

Enable and start gluster daemon:

$ systemctl enable glusterd

$ systemctl start glusterd

Step Nine

On nextcloud1, probe the other nodes:

(nextcloud1)$ gluster peer probe

(nextcloud1)$ gluster peer probe

You can verify the peer status with the following command:

(nextcloud1)$ gluster peer status

Number of Peers: 2


Uuid: f9d2928a-6b64-455a-9e0e-654a1ebbc320

State: Peer in Cluster (Connected)


Uuid: 100b7778-459d-4c48-9ea8-bb8fe33d9493

State: Peer in Cluster (Connected)

Step Ten

On nextcloud1, create a replicated volume on probed nodes:

(nextcloud1)$ gluster volume create rep-volume replica 3

volume create: rep-volume: success: please start the volume to access data

Step Eleven

Start the replicated volume on nextcloud1:

(nextcloud1)$ gluster volume start rep-volume

volume start: rep-volume: success

Verify the replicated volume and processes are online:

$ gluster volume status

Status of volume: rep-volume

Gluster process                             TCP Port RDMA Port Online Pid


Brick         49152 0 Y 32570

Brick         49152 0 Y 27175

Brick         49152 0 Y 25799

Self-heal Daemon on localhost               N/A N/A Y 32591

Self-heal Daemon on            N/A N/A Y 27196

Self-heal Daemon on            N/A N/A Y 25820

Task Status of Volume rep-volume


There are no active volume tasks

Step Twelve

Mount the replicated volume on /var/www/html. Create the directory:

$ mkdir -p /var/www/html

Step Thirteen

Add the following lines to /etc/fstab to allow auto-mount:

/dev/sdb1 /glusterfs xfs defaults 0 0

localhost:/rep-volume /var/www/html   glusterfs defaults,_netdev 0 0
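Each fstab entry must have exactly six whitespace-separated fields (device, mount point, filesystem type, options, dump, pass). A quick sanity check on the two new lines, written to a sample file here rather than to the real /etc/fstab:

```shell
cd "$(mktemp -d)"
cat > fstab.sample <<'EOF'
/dev/sdb1 /glusterfs xfs defaults 0 0
localhost:/rep-volume /var/www/html glusterfs defaults,_netdev 0 0
EOF
# Exit non-zero if any entry does not have exactly six fields:
awk 'NF != 6 {exit 1}' fstab.sample && echo "fstab entries OK"
```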

Step Fourteen

Mount the GlusterFS to /var/www/html:

$ mount -a

And verify with:

$ mount | grep gluster

/dev/sdb1 on /glusterfs type xfs (rw,relatime,seclabel,attr2,inode64,noquota)

localhost:/rep-volume on /var/www/html type fuse.glusterfs (rw,relatime,user_id=0,group_id=0,default_permissions,allow_other,max_read=131072)

The replicated volume is now ready and mounted in every node. We can now proceed to deploy the application.

Nextcloud Application Deployment

The following steps should be performed on nextcloud1, nextcloud2 and nextcloud3 unless otherwise specified.

Nextcloud requires PHP 7.2 or later and, for the CentOS distribution, we have to enable a number of repositories like EPEL and Remi to simplify the installation process.

Step One

If SELinux is enabled, disable it first:

$ setenforce 0

$ sed -i 's/^SELINUX=.*/SELINUX=permissive/g' /etc/selinux/config

You can also run Nextcloud with SELinux enabled by following this guide.

Step Two

Install Nextcloud requirements and enable Remi repository for PHP 7.2:

$ yum install -y epel-release yum-utils unzip curl wget bash-completion policycoreutils-python mlocate bzip2

$ yum install -y

$ yum-config-manager --enable remi-php72

Step Three

Install Nextcloud dependencies, mostly Apache and PHP 7.2 related packages:

$ yum install -y httpd php72-php php72-php-gd php72-php-intl php72-php-mbstring php72-php-mysqlnd php72-php-opcache php72-php-pecl-redis php72-php-pecl-apcu php72-php-pecl-imagick php72-php-xml php72-php-pecl-zip

Step Four

Enable Apache and start it up:

$ systemctl enable httpd.service

$ systemctl start httpd.service

Step Five

Make a symbolic link for PHP to use PHP 7.2 binary:

$ ln -sf /bin/php72 /bin/php

Step Six

On nextcloud1, download Nextcloud Server from here and extract it:

$ wget

$ unzip nextcloud*

Step Seven

On nextcloud1, copy the directory into /var/www/html and assign correct ownership:

$ cp -Rf nextcloud /var/www/html

$ chown -Rf apache:apache /var/www/html

**Note the copying process into /var/www/html is going to take some time due to GlusterFS volume replication.

Step Eight

Before we proceed to the installation wizard, we have to set the pxc_strict_mode variable to a value other than "ENFORCING" (the default). This is because the Nextcloud database import will create a number of tables without a primary key defined, which is not recommended on Galera Cluster. This is explained in further detail under the Tuning section below.

To change the configuration with ClusterControl, simply go to Manage -> Configurations -> Change/Set Parameters:

Change Set Parameters - ClusterControl

Choose all database instances from the list, and enter:

  • Group: MYSQLD
  • Parameter: pxc_strict_mode
  • New Value: PERMISSIVE

ClusterControl will perform the necessary changes on every database node automatically. If the value can be changed during runtime, it will be effective immediately. ClusterControl also configures the value inside the MySQL configuration file for persistency. You should see the following result:

Change Set Parameter - ClusterControl

Step Nine

Now we are ready to configure our Nextcloud installation. Open the browser, go to nextcloud1's HTTP server, and you will be presented with the following configuration wizard:

Nextcloud Account Setup

Configure the "Storage & database" section with the following value:

  • Data folder: /var/www/html/nextcloud/data
  • Configure the database: MySQL/MariaDB
  • Username: nextcloud
  • Password: (the password for user nextcloud)
  • Database: nextcloud
  • Host: (The virtual IP address with ProxySQL port)

Click "Finish Setup" to start the configuration process. Wait until it finishes and you will be redirected to the Nextcloud dashboard for user "admin". The installation is now complete. The next section provides some tuning tips to run efficiently with Galera Cluster.

Nextcloud Database Tuning

Primary Key

Having a primary key on every table is vital for Galera Cluster write-set replication. For a relatively big table without a primary key, a large update or delete transaction would completely block your cluster for a very long time. To avoid any quirks and edge cases, simply make sure that all tables use the InnoDB storage engine with an explicit primary key (a unique key does not count).

The default installation of Nextcloud will create a bunch of tables under the specified database and some of them do not comply with this rule. To check if the tables are compatible with Galera, we can run the following statement:

mysql> SELECT DISTINCT CONCAT(t.table_schema,'.',t.table_name) AS tbl,
              t.engine,
              IF(ISNULL(c.constraint_name),'NOPK','') AS nopk,
              IF(s.index_type = 'FULLTEXT','FULLTEXT','') AS ftidx,
              IF(s.index_type = 'SPATIAL','SPATIAL','') AS gisidx
       FROM information_schema.tables AS t
       LEFT JOIN information_schema.key_column_usage AS c
              ON (t.table_schema = c.constraint_schema AND t.table_name = c.table_name
                  AND c.constraint_name = 'PRIMARY')
       LEFT JOIN information_schema.statistics AS s
              ON (t.table_schema = s.table_schema AND t.table_name = s.table_name
                  AND s.index_type IN ('FULLTEXT','SPATIAL'))
       WHERE t.table_schema NOT IN ('information_schema','performance_schema','mysql')
         AND t.table_type = 'BASE TABLE'
         AND (t.engine <> 'InnoDB' OR c.constraint_name IS NULL
              OR s.index_type IN ('FULLTEXT','SPATIAL'))
       ORDER BY t.table_schema, t.table_name;


+---------------------------------------+--------+------+-------+--------+
| tbl                                   | engine | nopk | ftidx | gisidx |
+---------------------------------------+--------+------+-------+--------+
| nextcloud.oc_collres_accesscache      | InnoDB | NOPK |       |        |
| nextcloud.oc_collres_resources        | InnoDB | NOPK |       |        |
| nextcloud.oc_comments_read_markers    | InnoDB | NOPK |       |        |
| nextcloud.oc_federated_reshares       | InnoDB | NOPK |       |        |
| nextcloud.oc_filecache_extended       | InnoDB | NOPK |       |        |
| nextcloud.oc_notifications_pushtokens | InnoDB | NOPK |       |        |
| nextcloud.oc_systemtag_object_mapping | InnoDB | NOPK |       |        |
+---------------------------------------+--------+------+-------+--------+

The above output shows there are 7 tables that do not have a primary key defined. To fix this, simply add a primary key with an auto-increment column. Run the following commands on one of the database servers, for example nextcloud1:

(nextcloud1)$ mysql -uroot -p

mysql> ALTER TABLE nextcloud.oc_collres_accesscache ADD COLUMN `id` INT PRIMARY KEY AUTO_INCREMENT;

mysql> ALTER TABLE nextcloud.oc_collres_resources ADD COLUMN `id` INT PRIMARY KEY AUTO_INCREMENT;

mysql> ALTER TABLE nextcloud.oc_comments_read_markers ADD COLUMN `id` INT PRIMARY KEY AUTO_INCREMENT;

mysql> ALTER TABLE nextcloud.oc_federated_reshares ADD COLUMN `id` INT PRIMARY KEY AUTO_INCREMENT;

mysql> ALTER TABLE nextcloud.oc_filecache_extended ADD COLUMN `id` INT PRIMARY KEY AUTO_INCREMENT;

mysql> ALTER TABLE nextcloud.oc_notifications_pushtokens ADD COLUMN `id` INT PRIMARY KEY AUTO_INCREMENT;

mysql> ALTER TABLE nextcloud.oc_systemtag_object_mapping ADD COLUMN `id` INT PRIMARY KEY AUTO_INCREMENT;
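The seven statements above follow the same pattern, so they can also be generated with a shell loop and piped into the mysql client (table names taken from the query output earlier):

```shell
# Emit one ALTER TABLE statement per primary-key-less table; pipe the
# output into: mysql -uroot -p
for t in oc_collres_accesscache oc_collres_resources oc_comments_read_markers \
         oc_federated_reshares oc_filecache_extended \
         oc_notifications_pushtokens oc_systemtag_object_mapping; do
  echo "ALTER TABLE nextcloud.${t} ADD COLUMN \`id\` INT PRIMARY KEY AUTO_INCREMENT;"
done
```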

Once the above modifications have been applied, we can set pxc_strict_mode back to the recommended value, "ENFORCING". Repeat step #8 of the "Application Deployment" section with the corresponding value.

READ-COMMITTED Isolation Level

The transaction isolation level recommended by Nextcloud is READ-COMMITTED, while Galera Cluster defaults to the stricter REPEATABLE-READ isolation level. Using READ-COMMITTED can avoid data loss under high-load scenarios (e.g. by using the sync client with many clients/users and many parallel operations).

To modify the transaction level, go to ClusterControl -> Manage -> Configurations -> Change/Set Parameter and specify the following:

Nextcloud Change Set Parameter - ClusterControl

Click "Proceed" and ClusterControl will apply the configuration changes immediately. No database restart is required.

Multi-Instance Nextcloud

Since we performed the installation on nextcloud1 when accessing the URL, its IP address is automatically added to the 'trusted_domains' variable inside Nextcloud. When you try to access another server, for example the secondary server, you will see an error saying that the host is not authorized and must be added to the trusted_domains variable.

Therefore, add all the hosts IP address under "trusted_domain" array inside /var/www/html/nextcloud/config/config.php, as example below:

  'trusted_domains' =>

  array (

    0 => '',

    1 => '',

    2 => ''
  ),

The above configuration allows users to access all three application servers via the following URLs:

Note: You can add a load balancer tier on top of these three Nextcloud instances to achieve high availability for the application tier by using HTTP reverse proxies available in the market like HAProxy or nginx. That is out of the scope of this blog post.

Using Redis for File Locking

Nextcloud’s Transactional File Locking mechanism locks files to avoid file corruption during normal operation. It's recommended to install Redis to take care of transactional file locking (this is enabled by default), which will offload the database cluster from handling this heavy job.

To install Redis, simply:

$ yum install -y redis

$ systemctl enable redis.service

$ systemctl start redis.service

Append the following lines inside /var/www/html/nextcloud/config/config.php:

  'filelocking.enabled' => true,

  'memcache.locking' => '\OC\Memcache\Redis',

  'redis' => array(

     'host' => '',

     'port' => 6379,

     'timeout' => 0.0,
  ),

For more details, check out this documentation, Transactional File Locking.


Nextcloud can be configured to be a scalable and highly available file-hosting service to cater to your private file sharing demands. In this blog, we showed how you can bring redundancy into the Nextcloud, file system, and database layers.


by ashraf at October 28, 2019 03:46 PM

October 25, 2019


A Guide to MySQL Galera Cluster Streaming Replication: Part Two

In the first part of this blog we provided an overview of the new Streaming Replication feature in MySQL Galera Cluster. In this blog we will show you how to enable it and take a look at the results.

Enabling Streaming Replication

It is highly recommended that you enable Streaming Replication at the session level for the specific transactions that interact with your application/client.

As stated in the previous blog, Galera logs its write-sets to the wsrep_streaming_log table in the mysql database. This has the potential to create a performance bottleneck, especially when a rollback is needed. This doesn't mean that you can’t use Streaming Replication; it just means you need to design your application client efficiently when using it, so you’ll get better performance. Still, it's best to have Streaming Replication for dealing with and cutting down large transactions.

Enabling Streaming Replication requires you to define the replication unit and number of units to use in forming the transaction fragments. Two parameters control these variables: wsrep_trx_fragment_unit and wsrep_trx_fragment_size.

Below is an example of how to set these two parameters:

SET SESSION wsrep_trx_fragment_unit='statements';

SET SESSION wsrep_trx_fragment_size=3;

In this example, the fragment is set to three statements. For every three statements from a transaction, the node will generate, replicate, and certify a fragment.

You can choose between a few replication units when forming fragments:

  • bytes - This defines the fragment size in bytes.
  • rows - This defines the fragment size as the number of rows the fragment updates.
  • statements - This defines the fragment size as the number of statements in a fragment.

Choose the replication unit and fragment size that best suits the specific operation you want to run.
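The fragment count is simple arithmetic: with the unit set to statements and a fragment size of 3, a 10-statement transaction would replicate in four fragments, three full ones plus the remainder at commit. A quick sketch of that ceiling division:

```shell
# Illustrative arithmetic only: how many fragments a transaction generates
# for a given wsrep_trx_fragment_size (unit = statements).
statements=10
fragment_size=3
fragments=$(( (statements + fragment_size - 1) / fragment_size ))
echo "$fragments"   # prints 4
```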

Streaming Replication In Action

As discussed in our other blog on handling large transactions in MariaDB 10.4, we tested how Streaming Replication performed when enabled, based on these criteria...

  1. Baseline, set global wsrep_trx_fragment_size=0;
  2. set global wsrep_trx_fragment_unit='rows'; set global wsrep_trx_fragment_size=1;
  3. set global wsrep_trx_fragment_unit='statements'; set global wsrep_trx_fragment_size=1;
  4. set global wsrep_trx_fragment_unit='statements'; set global wsrep_trx_fragment_size=5;

And the results were:

Transactions: 82.91 per sec., queries: 1658.27 per sec. (100%)

Transactions: 54.72 per sec., queries: 1094.43 per sec. (66%)

Transactions: 54.76 per sec., queries: 1095.18 per sec. (66%)

Transactions: 70.93 per sec., queries: 1418.55 per sec. (86%)

For this example we're using Percona XtraDB Cluster 8.0.15 straight from their testing branch using the Percona-XtraDB-Cluster_8.0.15.5-27dev.4.2_Linux.x86_64.ssl102.tar.gz build. 

We then tried a 3-node Galera cluster with hosts info below:

testnode11 =

testnode12 =

testnode13 =

We pre-populated a table from my sysbench database and tried to delete a very large number of rows. 

root@testnode11[sbtest]#> select count(*) from sbtest1;


+----------+
| count(*) |
+----------+
| 12608218 |
+----------+
1 row in set (25.55 sec)

At first, running without Streaming Replication,

root@testnode12[sbtest]#> select @@wsrep_trx_fragment_unit, @@wsrep_trx_fragment_size,  @@innodb_lock_wait_timeout;


+---------------------------+---------------------------+----------------------------+
| @@wsrep_trx_fragment_unit | @@wsrep_trx_fragment_size | @@innodb_lock_wait_timeout |
+---------------------------+---------------------------+----------------------------+
| bytes                     |                         0 |                      50000 |
+---------------------------+---------------------------+----------------------------+
1 row in set (0.00 sec)

Then run,

root@testnode11[sbtest]#> delete from sbtest1 where id >= 2000000;

However, we ended up getting a rollback...

---TRANSACTION 648910, ACTIVE 573 sec rollback

mysql tables in use 1, locked 1

ROLLING BACK 164858 lock struct(s), heap size 18637008, 12199395 row lock(s), undo log entries 11961589

MySQL thread id 183, OS thread handle 140041167468288, query id 79286 localhost root wsrep: replicating and certifying write set(-1)

delete from sbtest1 where id >= 2000000

Using the ClusterControl Dashboards to get an overview of any indication of flow control: since the transaction runs solely on the master (active-writer) node until commit time, there is no indication of any flow control activity:

ClusterControl Galera Cluster Overview

In case you’re wondering, the current version of ClusterControl does not yet have direct support for PXC 8.0 with Galera Cluster 4 (as it is still experimental). You can, however, try to import it... but it needs minor tweaks to make your Dashboards work correctly. 

Back to the query process. It failed as it rolled back!

root@testnode11[sbtest]#> delete from sbtest1 where id >= 2000000;

ERROR 1180 (HY000): Got error 5 - 'Transaction size exceed set threshold' during COMMIT

regardless of the wsrep_max_ws_rows or wsrep_max_ws_size,

root@testnode11[sbtest]#> select @@global.wsrep_max_ws_rows, @@global.wsrep_max_ws_size/(1024*1024*1024);


+----------------------------+---------------------------------------------+
| @@global.wsrep_max_ws_rows | @@global.wsrep_max_ws_size/(1024*1024*1024) |
+----------------------------+---------------------------------------------+
|                          0 |                                      2.0000 |
+----------------------------+---------------------------------------------+
1 row in set (0.00 sec)

It did, eventually, reach the threshold.

During this time the system table mysql.wsrep_streaming_log is empty, which indicates that Streaming Replication is not happening or enabled,

root@testnode12[sbtest]#> select count(*) from mysql.wsrep_streaming_log;


+----------+
| count(*) |
+----------+
|        0 |
+----------+
1 row in set (0.01 sec)

root@testnode13[sbtest]#> select count(*) from mysql.wsrep_streaming_log;


+----------+
| count(*) |
+----------+
|        0 |
+----------+
1 row in set (0.00 sec)

and that is verified on the other 2 nodes (testnode12 and testnode13).

Now, let's try again with Streaming Replication enabled,

root@testnode11[sbtest]#> select @@wsrep_trx_fragment_unit, @@wsrep_trx_fragment_size, @@innodb_lock_wait_timeout;


+---------------------------+---------------------------+----------------------------+
| @@wsrep_trx_fragment_unit | @@wsrep_trx_fragment_size | @@innodb_lock_wait_timeout |
+---------------------------+---------------------------+----------------------------+
| bytes                     |                         0 |                      50000 |
+---------------------------+---------------------------+----------------------------+
1 row in set (0.00 sec)

root@testnode11[sbtest]#> set wsrep_trx_fragment_unit='rows'; set wsrep_trx_fragment_size=100; 

Query OK, 0 rows affected (0.00 sec)

Query OK, 0 rows affected (0.00 sec)

root@testnode11[sbtest]#> select @@wsrep_trx_fragment_unit, @@wsrep_trx_fragment_size, @@innodb_lock_wait_timeout;


+---------------------------+---------------------------+----------------------------+
| @@wsrep_trx_fragment_unit | @@wsrep_trx_fragment_size | @@innodb_lock_wait_timeout |
+---------------------------+---------------------------+----------------------------+
| rows                      |                       100 |                      50000 |
+---------------------------+---------------------------+----------------------------+
1 row in set (0.00 sec)

What to Expect When Galera Cluster Streaming Replication is Enabled? 

When the query is executed on testnode11,

root@testnode11[sbtest]#> delete from sbtest1 where id >= 2000000;

What happens is that it fragments the transaction piece by piece, depending on the value set for the wsrep_trx_fragment_size variable. Let's check this in the other nodes:

Host testnode12

root@testnode12[sbtest]#> pager sed -n '/TRANSACTIONS/,/FILE I\/O/p'; show engine innodb status\G nopager; show global status like 'wsrep%flow%'; select count(*) from mysql.wsrep_streaming_log;

PAGER set to 'sed -n '/TRANSACTIONS/,/FILE I\/O/p''



Trx id counter 567148

Purge done for trx's n:o < 566636 undo n:o < 0 state: running but idle

History list length 44




---TRANSACTION 421740651985200, not started

0 lock struct(s), heap size 1136, 0 row lock(s)

---TRANSACTION 553661, ACTIVE 190 sec

18393 lock struct(s), heap size 2089168, 1342600 row lock(s), undo log entries 1342600

MySQL thread id 898, OS thread handle 140266050008832, query id 216824 wsrep: applied write set (-1)



1 row in set (0.08 sec)

PAGER set to stdout


+----------------------------------+--------------+
| Variable_name                    | Value        |
+----------------------------------+--------------+
| wsrep_flow_control_paused_ns     | 211197844753 |
| wsrep_flow_control_paused        | 0.133786     |
| wsrep_flow_control_sent          | 633          |
| wsrep_flow_control_recv          | 878          |
| wsrep_flow_control_interval      | [ 173, 173 ] |
| wsrep_flow_control_interval_low  | 173          |
| wsrep_flow_control_interval_high | 173          |
| wsrep_flow_control_status        | OFF          |
+----------------------------------+--------------+
8 rows in set (0.00 sec)


+----------+
| count(*) |
+----------+
|    13429 |
+----------+
1 row in set (0.04 sec)


Host testnode13

root@testnode13[sbtest]#> pager sed -n '/TRANSACTIONS/,/FILE I\/O/p'; show engine innodb status\G nopager; show global status like 'wsrep%flow%'; select count(*) from mysql.wsrep_streaming_log;

PAGER set to 'sed -n '/TRANSACTIONS/,/FILE I\/O/p''



Trx id counter 568523

Purge done for trx's n:o < 567824 undo n:o < 0 state: running but idle

History list length 23




---TRANSACTION 552701, ACTIVE 216 sec

21587 lock struct(s), heap size 2449616, 1575700 row lock(s), undo log entries 1575700

MySQL thread id 936, OS thread handle 140188019226368, query id 600980 wsrep: applied write set (-1)



1 row in set (0.28 sec)

PAGER set to stdout


+----------------------------------+--------------+
| Variable_name                    | Value        |
+----------------------------------+--------------+
| wsrep_flow_control_paused_ns     | 210755642443 |
| wsrep_flow_control_paused        | 0.0231273    |
| wsrep_flow_control_sent          | 1653         |
| wsrep_flow_control_recv          | 3857         |
| wsrep_flow_control_interval      | [ 173, 173 ] |
| wsrep_flow_control_interval_low  | 173          |
| wsrep_flow_control_interval_high | 173          |
| wsrep_flow_control_status        | OFF          |
+----------------------------------+--------------+
8 rows in set (0.01 sec)


+----------+
| count(*) |
+----------+
|    15758 |
+----------+
1 row in set (0.03 sec)

Noticeably, the flow control just kicked in!

ClusterControl Galera Cluster Overview

And the WSREP send/receive queues have been active as well:

ClusterControl Galera Overview
Host testnode12
ClusterControl Galera Overview
Host testnode13

Now, let's take a closer look at the result from the mysql.wsrep_streaming_log table,

root@testnode11[sbtest]#> pager sed -n '/TRANSACTIONS/,/FILE I\/O/p'|tail -8; show engine innodb status\G nopager;

PAGER set to 'sed -n '/TRANSACTIONS/,/FILE I\/O/p'|tail -8'

MySQL thread id 134822, OS thread handle 140041167468288, query id 0 System lock

---TRANSACTION 649008, ACTIVE 481 sec

mysql tables in use 1, locked 1

53104 lock struct(s), heap size 6004944, 3929602 row lock(s), undo log entries 3876500

MySQL thread id 183, OS thread handle 140041167468288, query id 105367 localhost root updating

delete from sbtest1 where id >= 2000000



1 row in set (0.01 sec)

then taking the result of,

root@testnode12[sbtest]#> select count(*) from mysql.wsrep_streaming_log;


+----------+
| count(*) |
+----------+
|    38899 |
+----------+
1 row in set (0.40 sec)

This tells us how many fragments have been replicated using Streaming Replication. Now, let's do some basic math:

root@testnode12[sbtest]#> select 3876500/38899.0;


+-----------------+
| 3876500/38899.0 |
+-----------------+
|         99.6555 |
+-----------------+
1 row in set (0.03 sec)

I'm taking the undo log entries from the SHOW ENGINE INNODB STATUS\G result and dividing them by the total count of mysql.wsrep_streaming_log records. As set earlier, I defined wsrep_trx_fragment_size=100, so the result of roughly 100 rows per fragment shows how the replicated log is currently being processed by Galera.

It’s important to take note at what Streaming Replication is trying to achieve... "the node breaks the transaction into fragments, then certifies and replicates them on the slaves while the transaction is still in progress. Once certified, the fragment can no longer be aborted by conflicting transactions."

The fragments are treated as transactions which are passed to the remaining nodes within the cluster; each fragment is certified and its write-set applied. This means that once your large transaction has been certified or prioritized, all incoming connections that could possibly cause a deadlock will need to wait until the transaction finishes.

Now, the verdict on deleting a huge number of rows? 

root@testnode11[sbtest]#> delete from sbtest1 where id >= 2000000;

Query OK, 12034538 rows affected (30 min 36.96 sec)

It finishes successfully without any failure!

What does it look like on the other nodes? On testnode12,

root@testnode12[sbtest]#> pager sed -n '/TRANSACTIONS/,/FILE I\/O/p'|tail -8; show engine innodb status\G nopager; show global status like 'wsrep%flow%'; select count(*) from mysql.wsrep_streaming_log;

PAGER set to 'sed -n '/TRANSACTIONS/,/FILE I\/O/p'|tail -8'

0 lock struct(s), heap size 1136, 0 row lock(s)

---TRANSACTION 421740651985200, not started

0 lock struct(s), heap size 1136, 0 row lock(s)


165631 lock struct(s), heap size 18735312, 12154883 row lock(s), undo log entries 12154883

MySQL thread id 898, OS thread handle 140266050008832, query id 341835 wsrep: preparing to commit write set(215510)



1 row in set (0.46 sec)

PAGER set to stdout


+----------------------------------+--------------+
| Variable_name                    | Value        |
+----------------------------------+--------------+
| wsrep_flow_control_paused_ns     | 290832524304 |
| wsrep_flow_control_paused        | 0            |
| wsrep_flow_control_sent          | 0            |
| wsrep_flow_control_recv          | 0            |
| wsrep_flow_control_interval      | [ 173, 173 ] |
| wsrep_flow_control_interval_low  | 173          |
| wsrep_flow_control_interval_high | 173          |
| wsrep_flow_control_status        | OFF          |
+----------------------------------+--------------+
8 rows in set (0.53 sec)


+----------+
| count(*) |
+----------+
|   120345 |
+----------+
1 row in set (0.88 sec)

It stops at a total of 120345 fragments, and if we do the math again on the last captured undo log entries (the undo log entries match those on the master),

root@testnode12[sbtest]#> select 12154883/120345.0;

+-------------------+
| 12154883/120345.0 |
+-------------------+
|          101.0003 |
+-------------------+
1 row in set (0.00 sec)

So we had a total of 120345 transactions being fragmented to delete 12034538 rows.

Once you're done using Streaming Replication, do not forget to disable it, as it will otherwise keep logging huge transactions and add a lot of performance overhead to your cluster. To disable it, just run

root@testnode11[sbtest]#> set wsrep_trx_fragment_size=0;

Query OK, 0 rows affected (0.04 sec)


With Streaming Replication enabled, it's important to identify how large your fragment size should be and which unit to choose (bytes, rows, statements). 

It is also very important to run it at the session level and, of course, to identify when you actually need to use Streaming Replication. 

While performing these tests, deleting a large number of rows from a huge table with Streaming Replication enabled noticeably caused high peaks of disk and CPU utilization. RAM was more stable, but this could be because the statement we performed does not cause much memory contention. 

It’s safe to say that Streaming Replication can cause performance bottlenecks when dealing with large records, so it should be used with proper judgment and care. 

Lastly, if you are using Streaming Replication, do not forget to disable it in the current session once you are done, to avoid unwanted problems.


by Paul Namuag at October 25, 2019 09:45 AM

October 24, 2019


A Guide to MySQL Galera Cluster Streaming Replication: Part One

Streaming Replication is a new feature which was introduced with the 4.0 release of Galera Cluster. Galera uses replication synchronously across the entire cluster, but before this release write-sets greater than 2GB were not supported. Streaming Replication allows you to now replicate large write-sets, which is perfect for bulk inserts or loading data to your database.

In a previous blog we wrote about Handling Large Transactions with Streaming Replication and MariaDB 10.4, but as of writing this blog Codership had not yet released their version of the new Galera Cluster. Percona has, however, released their experimental binary version of Percona XtraDB Cluster 8.0 which highlights the following features...

  • Streaming Replication supporting large transactions

  • The synchronization functions allow action coordination (wsrep_last_seen_gtid, wsrep_last_written_gtid, wsrep_sync_wait_upto_gtid)

  • More granular and improved error logging. wsrep_debug is now a multi-valued variable to assist in controlling the logging, and logging messages have been significantly improved.

  • Some DML and DDL errors on a replicating node can either be ignored or suppressed. Use the wsrep_ignore_apply_errors variable to configure.

  • Multiple system tables help you find out more about the state of the cluster.

  • The wsrep infrastructure of Galera 4 is more robust than that of Galera 3. It features a faster execution of code with better state handling, improved predictability, and error handling.

What's New With Galera Cluster 4.0?

The New Streaming Replication Feature

With Streaming Replication, transactions are replicated gradually in small fragments during transaction processing (i.e. before actual commit, we replicate a number of small size fragments). Replicated fragments are then applied in slave threads, preserving the transaction’s state in all cluster nodes. Fragments hold locks in all nodes and cannot be conflicted later.

Galera SystemTables 

Database Administrators and clients with access to the MySQL database may read these tables, but they cannot modify them as the database itself will make any modifications needed. If your server doesn’t have these tables, it may be that your server is using an older version of Galera Cluster.

#> show tables from mysql like 'wsrep%';


+--------------------------+
| Tables_in_mysql (wsrep%) |
+--------------------------+
| wsrep_cluster            |
| wsrep_cluster_members    |
| wsrep_streaming_log      |
+--------------------------+
3 rows in set (0.12 sec)

New Synchronization Functions 

This version introduces a series of SQL functions for use in wsrep synchronization operations. You can use them to obtain the Global Transaction ID which is based on either the last write or last seen transaction. You can also set the node to wait for a specific GTID to replicate and apply, before initiating the next transaction.
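As a rough sketch of how these functions fit together (the GTID literal and the timeout below are purely illustrative, and exact signatures may vary by build):

```sql
-- On the node that performed the write: get the GTID of this session's last write
SELECT WSREP_LAST_WRITTEN_GTID();

-- On any node: get the GTID of the last transaction this node has seen
SELECT WSREP_LAST_SEEN_GTID();

-- On another node: wait (here up to 5 seconds) until the given GTID has been
-- replicated and applied before running the next statement
SELECT WSREP_SYNC_WAIT_UPTO_GTID('dc61eb1e-3da3-11e9-9c5d-8e9850d9b6d4:23', 5);
```

This pattern lets an application read its own writes from a different node than the one it wrote to.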

Intelligent Donor Selection

Some understated features that have been present since Galera 3.x include intelligent donor selection and cluster crash recovery. These were originally planned for Galera 4, but made it into earlier releases largely due to customer requirements. When it comes to donor node selection in Galera 3, the State Snapshot Transfer (SST) donor was selected at random. However with Galera 4, you get a much more intelligent choice when it comes to choosing a donor, as it will favour a donor that can provide an Incremental State Transfer (IST), or pick a donor in the same segment. As a Database Administrator, you can force this via setting wsrep_sst_donor.
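A minimal sketch of forcing the donor, assuming hypothetical node names like those used elsewhere in this feed:

```ini
# my.cnf on the joining node: prefer testnode12 as SST/IST donor,
# fall back to testnode13 if the first choice is unavailable.
[mysqld]
wsrep_sst_donor=testnode12,testnode13
```

Listing several candidates keeps the preference while still allowing a fallback.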

Why Use MySQL Galera Cluster Streaming Replication?

Long-Running Transactions

Galera's problems and limitations have always revolved around how it handles long-running transactions, which often caused the entire cluster to slow down due to large write-sets being replicated. Its flow control often kicks in, causing the writes to slow down or even terminating the process in order to revert the cluster back to its normal state. This is a pretty common issue with previous versions of Galera Cluster.

Codership advises to use Streaming Replication for your long-running transactions to mitigate these situations. Once the node replicates and certifies a fragment, it is no longer possible for other transactions to abort it.

Large Transactions

This is very helpful when loading data into your reporting or analytics systems. Bulk inserts, deletes, updates, or using the LOAD DATA statement to load large quantities of data can fall into this category, although it depends on how you manage your data for retrieval or storage. You must take into account that Streaming Replication has its limitations, such as the fact that certification keys are generated from record locks. 

Without Streaming Replication, updating a large number of records would result in a conflict and the whole transaction would have to be rolled back. Slaves replicating large transactions are also subject to flow control: once the threshold is hit, the entire cluster slows down processing writes and relaxes the intake of incoming transactions from the synchronous replication until the write-set is manageable, at which point replication continues normally. Check this external blog by Percona to help you understand more about flow control within Galera.

With Streaming Replication, the node begins to replicate the data with each transaction fragment, rather than waiting for the commit. This means that there's no way for any conflicting transactions running on the other nodes to abort it, since the cluster has already certified the write-set for that particular fragment. The cluster is free to apply and commit other concurrent transactions without blocking, and to process large transactions with minimal impact on the cluster.

Hot Records/Hot Spots

Hot records or rows are those rows in your table that get updated constantly. They could be the most visited and highest-traffic data in your entire database (e.g. news feeds, counters such as the number of visits, or logs). With Streaming Replication, you can force critical updates to the entire cluster. 

As noted by the Galera Team at Codership

“Running a transaction in this way effectively locks the hot record on all nodes, preventing other transactions from modifying the row. It also increases the chances that the transaction will commit successfully and that the client in turn will receive the desired outcome.”

This comes with limitations, as successful commits are still not guaranteed to be persistent and consistent. Without Streaming Replication, you end up with a high chance of rollbacks, which adds overhead for the end user experiencing this issue from the application's perspective.

Things to Consider When Using Streaming Replication

  • Certification keys are generated from record locks, therefore they don’t cover gap locks or next-key locks. If the transaction takes a gap lock, it is possible that a transaction executed on another node will apply a write-set which encounters the gap lock and will abort the streaming transaction.
  • When enabling Streaming Replication, write-set logs are written to the wsrep_streaming_log table in the mysql system database to preserve persistence in case a crash occurs, so this table is used during recovery. In case of excessive logging and elevated replication overhead, Streaming Replication will cause a degraded transaction throughput rate. This could be a performance bottleneck when peak load is reached. As such, it’s recommended that you only enable Streaming Replication at the session level, and then only for transactions that would not run correctly without it.
  • The best use case is to use Streaming Replication for cutting down large transactions.
  • Set the fragment size to ~10K rows.
  • Fragment variables are session variables and can be set dynamically.
  • An intelligent application can turn Streaming Replication on/off on an as-needed basis.
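Putting the considerations above together, a minimal session-level pattern could look like this sketch (the table name and predicate are illustrative):

```sql
-- Enable Streaming Replication only for this session, at ~10K rows per fragment
SET SESSION wsrep_trx_fragment_unit='rows';
SET SESSION wsrep_trx_fragment_size=10000;

-- Run the large transaction
DELETE FROM sbtest1 WHERE id >= 2000000;

-- Disable it again as soon as the large transaction is done
SET SESSION wsrep_trx_fragment_size=0;
```

Keeping the enable/disable pair tightly around the large statement avoids logging normal traffic to wsrep_streaming_log.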


Thanks for reading, in part two we will discuss how to enable Galera Cluster Streaming Replication and what the results could look like for your setup.


by Paul Namuag at October 24, 2019 05:05 PM

October 23, 2019


Factors to Consider When Choosing MongoDB for Big Data Applications

Technology advancements have brought about advantages that business organizations need to exploit for maximum profit value and reduced operational cost. Data has been the backbone of these technological advancements, from which sophisticated procedures are derived towards achieving specific goals. As technology advances, more data is brought into systems. Besides, as a business grows, there is more data involved, and the serving system needs to process data fast, be reliable in storage, and offer optimal security for this data. MongoDB is one of the systems that can be trusted to achieve these factors.

Big Data refers to massive data that is fast-changing, can be quickly accessed and highly available for addressing needs efficiently. Business organizations tend to cross-examine available database setups that would provide the best performance as time goes by and consequently realize some value from Big Data. 

For instance, online markets observe client web clicks, purchasing power and then use the derived data in suggesting other goods as a way of advertising or use the data in pricing. Robots learn through machine learning and the process obviously involves a lot of data being collected because the robot would have to keep what it has learned in memory for later usage. To keep this kind of complex data with traditional database software is considered impractical. 

Characteristics of Big Data

In software systems, we consider Big Data in terms of size, speed of access, and the data types involved. This can be reduced to 3 parameters: 

  1. Volume
  2. Velocity
  3. Variety


Volume is the size of the Big Data involved and ranges from gigabytes to terabytes or more. On a daily basis, big companies ingest terabytes of data from their daily operations. For instance, a telecommunication company would like to keep a record of calls made since the beginning of their operation, messages sent, and how long each call took. On a daily basis, there are a lot of these activities taking place, hence resulting in a lot of data. The data can then be used in statistical analysis, decision making, and tariff planning.


Consider platforms such as Forex trading that need real time updates to all connected client machines and display new stock exchange updates in real time. This dictates that the serving database should be quite fast in processing such data with little latency in mind. Some online games involving players from different world locations collect a lot of data from user clicks, drags and other gestures then relaying them between millions of devices in microseconds. The database system involved needs to be quick enough to do all these in real time.


Data can be categorized in different types ranging from, numbers, strings, date, objects, arrays, binary data, code, geospatial data, and regular expressions just to mention a few. An optimal database system should provide functions in place to enhance the manipulation of this data without incurring additional procedures from the client side. For example, MongoDB provides the geolocation operations for usage while fetching locations near to the coordinates provided in the query. This capability cannot be achieved with traditional databases since they were only designed to address small data volume structures, fewer updates, and some consistent data structures. Besides, one will need additional operations in achieving some specific goal, in the case of traditional databases. 

MongoDB can also be run on multiple servers, making it inexpensive and virtually limitless, contrary to traditional databases that are only designed to run on a single server.

Factors to Consider When Choosing MongoDB for Big Data

Big Data brings about enterprise advantage when it is highly managed through improved processing power. When selecting a database system, one should consider some factors regarding the kind of data you will be dealing with and whether the system you are selecting provides that capability. In this blog, we are going to discuss the advantages MongoDB offers for Big Data in comparison with Hadoop in some cases.

  • A rich query language for dynamic querying
  • Data embedding
  • High availability
  • Indexing and Scalability
  • Efficient storage engine and Memory handling
  • Data consistency and integrity

Rich Query Language for Dynamic Querying

MongoDB is best suited for Big Data where the resulting data needs further manipulation for the desired output. Some of the powerful resources are CRUD operations, the aggregation framework, text search, and the Map-Reduce feature. Within the aggregation framework, MongoDB has extra geolocation functionality that enables one to do many things with geospatial data. For example, by creating a 2dsphere index, you can fetch locations within a defined radius by just providing the latitude and longitude coordinates. Referring to the telecommunication example above, the company may use the Map-Reduce feature or the aggregation framework to group calls from a given location, calculating the average call time on a daily basis for its users, or perform other operations. Check the example below.

Let’s have a location collection with the data

{ name: "KE",loc: { type: "Point", coordinates: [ -73.97, 40.77 ] }, category: "Parks"}

{ name: "UG",loc: { type: "Point", coordinates: [ -45.97, 40.57 ] }, category: "Parks"}

{ name: "TZ",loc: { type: "Point", coordinates: [ -73.27, 34.43 ] }, category: "Parks"}

{ name: "SA",loc: { type: "Point", coordinates: [ -67.97, 40.77 ] }, category: "Parks"}

We can then find data for locations near [ -73.00, 40.00 ], within a distance of 10 km (the maxDistance of 10000 is expressed in meters for GeoJSON points), using the aggregation framework with the query below:

db.places.aggregate( [
   {
      $geoNear: {
         near: { type: "Point", coordinates: [ -73.00, 40.00 ] },
         spherical: true,
         query: { category: "Parks" },
         distanceField: "calcDistance",
         maxDistance: 10000
      }
   }
] )
The Map-Reduce operation is also available in Hadoop, but it is only suitable for simple requests. The iterative process for Big Data using Map-Reduce in Hadoop is quite slow compared to MongoDB. The reason is that iterative tasks require many map and reduce processes before completion. In the process, multiple files are generated between the map and reduce tasks, making it quite unusable for advanced analysis. MongoDB introduced the aggregation pipeline framework to curb this setback, and it is the most used in the recent past.

Data Embedding

MongoDB is document-based, with the ability to nest related data inside a single field, which is termed embedding. Embedding comes with the advantage of minimal queries being issued for a single document, since the document itself can hold a lot of data. In relational databases, where one might have many tables, you have to issue multiple queries to the database for the same purpose.
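A small mongo-shell sketch of embedding (the collection and field names are hypothetical): one customer document holds data that would span three tables in a relational schema, so a single query returns all of it.

```javascript
// Insert one document that embeds addresses and orders as arrays
db.customers.insertOne({
  name: "Jane Doe",
  addresses: [
    { type: "home", city: "Nairobi" },
    { type: "work", city: "Mombasa" }
  ],
  orders: [
    { sku: "A-100", qty: 2, total: 25.50 }
  ]
});

// One query fetches the customer together with all embedded data
db.customers.findOne({ name: "Jane Doe" });
```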

High Availability

Replication of data across multiple hosts and servers is now possible with MongoDB, unlike relational DBMSs where replication is restricted to a single server. This is advantageous in that data is highly available in different locations and users can be efficiently served by the closest server. Besides, recovery from a breakdown is easily achieved thanks to the journaling feature in MongoDB, which creates checkpoints from which the restoration process can be referenced.

Indexing and Scalability

Primary and secondary indexing in MongoDB comes with plenty of merits. Indexing makes queries execute faster, which is a consideration needed for Big Data, as we discussed under the velocity characteristic. Indexing can also be used in creating shards. Shards can be defined as sub-collections that contain data that has been distributed into groups using a shard key. When a query is issued, the shard key is used to determine where to look among the available shards. If there were no shards, the process would take quite long for Big Data, since all the documents would have to be examined and the process might even time out before users got what they wanted. But with sharding, the amount of data to be fetched is reduced, consequently reducing the latency of waiting for a query to return.
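As a sketch of how a shard key is declared (database, collection, and key names are hypothetical; these commands are run against a mongos router):

```javascript
// Allow the database to be sharded, then distribute the collection
// across shards using a hashed shard key for even data distribution
sh.enableSharding("bigdata");
sh.shardCollection("bigdata.events", { userId: "hashed" });
```

Queries that include userId can then be routed to a single shard instead of broadcast to all of them.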

Efficient Storage Engine and Memory Handling

Recent MongoDB versions set WiredTiger as the default storage engine, which is capable of handling multiple workloads. This storage engine has plenty of advantages for Big Data, as described in this article. The engine has features such as compression and checkpointing, and it promotes multiple simultaneous write operations through document-level concurrency. Big Data means many users, and the document-level concurrency feature allows many users to edit the database simultaneously without incurring any performance setback. MongoDB is developed in C++, making it good for memory handling.

Data Consistency and Integrity

The JSON validator tool is another feature available in MongoDB to ensure data integrity and consistency. It is used to ensure invalid data does not get into the database. For example, if there is a field called age, it will always expect an Integer value. The JSON validator will always check that a string or any other data type is not submitted for storage to the database for this field. This also ensures that all documents have values of the same data type for this field, hence data consistency. MongoDB also offers backup and restoration features such that, in case of failure, one can get back to the desired state.
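The age example above can be sketched with MongoDB's schema validation mechanism ($jsonSchema); the collection name is hypothetical:

```javascript
// Reject any document where "age" is missing or is not an integer
db.createCollection("users", {
  validator: {
    $jsonSchema: {
      bsonType: "object",
      required: ["age"],
      properties: {
        age: { bsonType: "int", description: "age must be an integer" }
      }
    }
  }
});
```

With this in place, an insert such as { age: "twenty" } is refused with a document validation error.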


MongoDB handles real-time data analysis in the most efficient way hence suitable for Big Data. For instance, geospatial indexing enables an analysis of GPS data in real time. 

Besides the basic security configuration, MongoDB has an extra JSON data validation tool for ensuring only valid data get into the database. Since the database is document based and fields have been embedded, very few queries can be issued to the database to fetch a lot of data. This makes it ideal for usage when Big Data is concerned.

by Onyancha Brian Henry at October 23, 2019 06:28 PM

October 22, 2019


An Overview of MongoDB and Load Balancing

Database load balancing distributes concurrent client requests to multiple database servers to reduce the amount of load on any single server. This can improve the performance of your database drastically. Fortunately, MongoDB can handle multiple client's requests to read and write the same data simultaneously by default. It uses some concurrency control mechanisms and locking protocols to ensure data consistency at all times. 

In this way, MongoDB also ensures that all the clients get a consistent view of data at any time. Because of this built-in feature of handling requests from multiple clients, you don't have to worry about adding an external load balancer on top of your MongoDB servers. However, if you still want to improve the performance of your database using load balancing, here are some ways to achieve that.

MongoDB Vertical Scaling

In simple terms, vertical scaling means adding more resources to your server to handle the load. Like most database systems, MongoDB benefits from more RAM and I/O capacity. This is the simplest way to boost MongoDB performance without spreading the load across multiple servers. Vertical scaling of a MongoDB database typically means increasing CPU or disk capacity and increasing throughput (I/O operations). With more resources, your MongoDB server becomes more capable of handling multiple clients' requests, and thus balances the load on your database better.

The downside of this approach is the technical limit on how many resources can be added to a single system; cloud providers also cap the available hardware configurations. The other disadvantage is the single point of failure: all your data is stored on one system, and a failure there can lead to permanent data loss.

MongoDB Horizontal Scaling

Horizontal scaling refers to dividing your database into chunks and storing them on multiple servers. The main advantage of this approach is that you can add servers on the fly to increase database performance, with zero downtime. MongoDB provides horizontal scaling through sharding. Sharding gives MongoDB additional capacity to distribute the write load across multiple servers (shards). Here, each shard can be seen as one independent database, and the collection of all the shards can be viewed as one big logical database. Sharding enables MongoDB to distribute the data across multiple servers to handle concurrent client requests efficiently, and hence increases your database's read and write throughput.
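The routing idea behind sharding can be illustrated outside of MongoDB. Below is a minimal Python sketch of hash-based shard selection; the hash function and shard count are arbitrary assumptions for illustration, not MongoDB's actual hashing scheme:

```python
import hashlib

def pick_shard(shard_key_value, num_shards):
    """Route a document to a shard by hashing its shard key.

    A simplified stand-in for MongoDB's hashed sharding: the same key
    always maps to the same shard, and keys spread across all shards.
    """
    digest = hashlib.md5(str(shard_key_value).encode()).hexdigest()
    return int(digest, 16) % num_shards

# The same shard key always routes to the same shard...
assert pick_shard("user42", 3) == pick_shard("user42", 3)

# ...while many distinct keys spread across all shards.
print(sorted({pick_shard(i, 3) for i in range(1000)}))
```

The deterministic mapping is what lets a query router find a document's shard without consulting every server.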

MongoDB Sharding

A shard can be a single mongod instance or a replica set that holds a subset of the mongo sharded database. You can deploy each shard as a replica set to ensure high availability of data and redundancy.

Illustration of two MongoDB shards holding whole collection and subset of a collection

As you can see in the above image, shard 1 holds a subset of collection1 and the whole of collection2, whereas shard 2 contains only the other subset of collection1. You can access each shard using the mongos instance. For example, if you connect to the shard1 instance, you will be able to see/access only a subset of collection1.


Mongos is the query router that provides access to the sharded cluster for client applications. You can have multiple mongos instances for better load balancing; for example, in your production cluster you can run one mongos instance for each application server. Here you can use an external load balancer to redirect each application server's requests to the appropriate mongos instance. When adding such a configuration to your production setup, make sure that any given client always connects to the same mongos instance, as some mongo resources, such as cursors, are specific to a mongos instance.
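One common way to get that client-to-mongos affinity is source-IP stickiness in the external load balancer. A hedged HAProxy sketch (the addresses, ports, and names are placeholders, not from the original post):

```
frontend mongos_front
    bind *:27017
    mode tcp
    default_backend mongos_back

backend mongos_back
    mode tcp
    balance source          # same client IP -> same mongos, preserving cursors
    server mongos1 10.0.0.11:27017 check
    server mongos2 10.0.0.12:27017 check
```

`balance source` hashes the client IP, so a given application server keeps hitting the same mongos instance as long as the backend set does not change.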

Config Servers

Config servers store the configuration settings and metadata about your cluster. From MongoDB version 3.4, you have to deploy config servers as a replica set. If you are enabling sharding in a production environment, then it is mandatory to use three separate config servers, each on different machines.

You can follow this guide to convert your replica set cluster into a sharded cluster. Here is the sample illustration of sharded production cluster:

MongoDB sharded cluster in production

MongoDB Load Balancing Using Replication

Sometimes MongoDB replication can be used to handle more traffic from clients and to reduce the load on the primary server. To do so, you can instruct clients to read from secondaries instead of the primary server. This can reduce the amount of load on the primary server as all the read requests coming from clients will be handled by secondary servers, and the primary server will only take care of write requests.

Following is the command to set the read preference to secondary:

db.getMongo().setReadPref('secondary')

You can also specify some tags to target specific secondaries while handling the read queries.

db.getMongo().setReadPref(
   "secondary", [
      { "datacenter": "APAC" },
      { "region": "East" },
      {}
   ]
)


Here, MongoDB will try to find secondary nodes with the datacenter tag value APAC. If found, Mongo will serve the read requests from all the secondaries tagged datacenter: “APAC”. If not found, Mongo will try to find secondaries with the tag region: “East”. If still no secondaries are found, then {} works as the default case and Mongo will serve the requests from any eligible secondary.
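The fallback behaviour described above can be sketched in plain Python. The tag data is illustrative, and this mimics, rather than calls, the driver's server-selection logic:

```python
def select_secondary(secondaries, tag_sets):
    """Return the secondaries matching the first tag set that matches any node.

    An empty tag set {} matches every node, mirroring MongoDB's default case.
    """
    for tags in tag_sets:
        matched = [s for s in secondaries
                   if all(s["tags"].get(k) == v for k, v in tags.items())]
        if matched:
            return matched
    return []

nodes = [
    {"host": "sec1", "tags": {"datacenter": "APAC", "region": "East"}},
    {"host": "sec2", "tags": {"datacenter": "EU", "region": "East"}},
]

# The first tag set that matches wins: only the APAC node serves the read.
print(select_secondary(nodes, [{"datacenter": "APAC"}, {"region": "East"}, {}]))
```

If no APAC secondary existed, the second tag set would match both nodes; if none of the tag sets matched, the trailing `{}` would admit any eligible secondary.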

However, this approach to load balancing is not advisable for increasing read throughput, because any read preference mode other than primary can return old data when there have been recent writes on the primary server. The primary server takes some time to handle write requests and propagate the changes to the secondary servers; during this window, a read of the same data from a secondary can return stale results, as the secondary is not yet in sync with the primary. You can use this approach if your application is heavy on read operations compared to writes.


As MongoDB can handle concurrent requests by itself, there is no need to add a load balancer to your MongoDB cluster. To load balance client requests, you can choose either vertical or horizontal scaling, as it is not advisable to use secondaries to scale out your read and write operations. Vertical scaling can hit technical limits, as discussed above, so it is suitable for small-scale applications. For big applications, horizontal scaling through sharding is the best approach for load balancing read and write operations.

by Akash Kathiriya at October 22, 2019 05:48 PM

MariaDB Foundation

Howto run MariaDB on a Chromebook

Chromebooks are a breed of very portable and easy-to-use laptops with a Linux-based operating system from Google and hardware available from many manufacturers. Chromebooks are very popular in the educational sector due to their low price and ease of use, as they require next to no administration and “just work” […]

by Otto Kekäläinen at October 22, 2019 07:47 AM

October 21, 2019


What's New in PostgreSQL 12

On October 3rd 2019 a new version of the world's most advanced open source database was released. PostgreSQL 12 is now available, with notable improvements to query performance (particularly over larger data sets) and overall space utilization, among other important features. 

In this blog we’ll take a look at these new features and show you how to get and install this new PostgreSQL 12 version. We’ll also explore some considerations to take into account when upgrading.

PostgreSQL 12 Features and Improvements

Let’s start by mentioning some of the most important features and improvements of this new PostgreSQL version.


  • There is an optimization to space utilization and read/write performance for B-Tree indexes.
  • Reduction of WAL overhead for the creation of GiST, GIN, and SP-GiST indexes.
  • You can perform K-nearest neighbor queries with the distance operator (<->) using SP-GiST indexes.
  • Rebuild indexes without blocking writes to an index via the REINDEX CONCURRENTLY command, allowing users to avoid downtime scenarios for lengthy index rebuilds.
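For example, an index can now be rebuilt without taking a write-blocking lock (the index name is illustrative):

```sql
-- PostgreSQL 12: rebuild an index while writes continue
REINDEX INDEX CONCURRENTLY demoidx;
```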


  • There are improvements over queries on partitioned tables, particularly for tables with thousands of partitions that only need to retrieve data from a limited subset.
  • Performance improvements for adding data to partitioned tables with INSERT and COPY.
  • You will be able to attach a new partition to a table without blocking queries.


  • You can now run queries over JSON documents using JSON path expressions defined in the SQL/JSON standard and they can utilize the existing indexing mechanisms for documents stored in the JSONB format to efficiently retrieve data.
  • WITH queries can now be automatically inlined by PostgreSQL 12 (if it is not recursive, does not have any side-effects, and is only referenced once in a later part of a query), which in turn can help increase the performance of many existing queries.
  • Introduces "generated columns." This type of column computes its value from the contents of other columns in the same table. Storing the computed value on disk is also supported. 
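A short sketch of a generated column; the table and column names are our own illustration, and the syntax is the PostgreSQL 12 form (which requires STORED):

```sql
CREATE TABLE order_items (
    quantity   int,
    unit_price numeric,
    -- computed from the other columns; STORED persists the value on disk
    total      numeric GENERATED ALWAYS AS (quantity * unit_price) STORED
);
```

Inserting `(3, 2.50)` would store `7.50` in `total` automatically; the column cannot be written to directly.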


  • PostgreSQL 12 extends its support of ICU collations by allowing users to define "nondeterministic collations" that can, for example, allow case-insensitive or accent-insensitive comparisons.


  • Introduces both client and server-side encryption for authentication over GSSAPI interfaces.
  • The PostgreSQL service is able to discover LDAP servers if it is compiled with OpenLDAP.
  • Multi-factor authentication, using the clientcert=verify-full option and an additional authentication method configured in the pg_hba.conf file.

If you want to take advantage of these new features and improvements, you can go to the download page and get the latest PostgreSQL version. If you require an HA setup, here is a blog that shows you how to install and configure PostgreSQL for HA.

How to Install PostgreSQL 12

For this example, we are going to use CentOS7 as the operating system. So, we need to go to the RedHat based OS download site and install the corresponding version.

$ yum install

It will install the PostgreSQL repository with stable, testing, and source packages.

$ head /etc/yum.repos.d/pgdg-redhat-all.repo

# PGDG Red Hat Enterprise Linux / CentOS stable repositories:


name=PostgreSQL 12 for RHEL/CentOS $releasever - $basearch






Then, install the client and server PostgreSQL12 packages. It will install some python dependencies.

$ yum install postgresql12 postgresql12-server

Now, you can initialize your new PostgreSQL 12 database.

$ /usr/pgsql-12/bin/postgresql-12-setup initdb

Initializing database ... OK

And enable/start the PostgreSQL service.

$ systemctl enable postgresql-12

Created symlink from /etc/systemd/system/ to /usr/lib/systemd/system/postgresql-12.service.

$ systemctl start postgresql-12

And that’s it. You have the new PostgreSQL version up and running.

$ psql

psql (12.0)

Type "help" for help.

postgres=# select version();



 PostgreSQL 12.0 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-39), 64-bit

(1 row)

Now that you have installed the latest PostgreSQL version, you can migrate your data to this new database node.

Upgrading to PostgreSQL 12

If you want to upgrade your current PostgreSQL version to this new one, you have three main options to perform this task.

  • pg_dump: It’s a logical backup tool that allows you to dump your data and restore it in the new PostgreSQL version. Here you will have a downtime period that varies according to your data size. You need to stop the system or prevent new data on the master node, run pg_dump, move the generated dump to the new database node and restore it. During this time, you can’t write to your master PostgreSQL database, to avoid data inconsistency.
  • pg_upgrade: It’s a PostgreSQL tool to upgrade your PostgreSQL version in-place. It could be dangerous in a production environment, and we don’t recommend this method in that case. Using this method you will have downtime too, but it will probably be considerably less than with the pg_dump method.
  • Logical Replication: Since PostgreSQL 10 you have been able to use this replication method, which allows you to perform major version upgrades with zero (or almost zero) downtime. Here, you add a standby node running the latest PostgreSQL version, and when the replication is up-to-date, you perform a failover to promote the new PostgreSQL node. 
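A minimal sketch of the logical replication route; the publication, subscription, host, and user names are placeholders, and the target node must already hold the schema:

```sql
-- On the current (old-version) primary:
CREATE PUBLICATION upgrade_pub FOR ALL TABLES;

-- On the new PostgreSQL 12 node:
CREATE SUBSCRIPTION upgrade_sub
    CONNECTION 'host=old-primary dbname=mydb user=repuser'
    PUBLICATION upgrade_pub;
```

Once the subscriber has caught up, application traffic can be switched over to the PostgreSQL 12 node.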

Considerations Before Upgrading to PostgreSQL 12

In general, for any upgrade process, with any technology, there are several points to take into account. Let’s look at some of the main ones.

  • Data types abstime, reltime, and tinterval were removed.
  • The recovery.conf settings have been moved into the postgresql.conf file, and recovery.conf is no longer used; if this file exists, the server will not start. The recovery.signal and standby.signal files are now used to switch into non-primary mode. The trigger_file setting has been renamed to promote_trigger_file and the standby_mode setting has been removed.
  • Multiple conflicting recovery_target specifications are no longer allowed.
  • pg_restore now requires the specification of “-f -” to send the dump contents to standard output.
  • The maximum index entry length is reduced by eight bytes in B-Tree indexes to improve the handling of duplicate entries. A REINDEX operation on an index pg_upgrade'd from a previous version could fail.
  • DROP IF EXISTS FUNCTION/PROCEDURE/AGGREGATE/ROUTINE generates an error if no argument list is supplied and there are multiple matching objects.

For more detailed information about the new PostgreSQL 12 features and consideration before migrating to it, you can refer to the Official Release Notes web page.

by Sebastian Insausti at October 21, 2019 06:50 PM

October 17, 2019


Deploying MySQL Galera Cluster 4.0 onto Amazon AWS EC2

Galera Cluster is one of the most popular high availability solutions for MySQL. It is a virtually synchronous cluster, which helps to keep the replication lag under control. Thanks to flow control, a Galera cluster can throttle itself and allow more loaded nodes to catch up with the rest of the cluster. The recent release of Galera 4 brought new features and improvements, which we covered in a blog post on MariaDB 10.4 Galera Cluster and a blog post discussing existing and upcoming features of Galera 4.

How does Galera 4 fare when used in Amazon EC2? As you probably know, Amazon offers Relational Database Services, which are designed to give users an easy way to deploy a highly available MySQL database. My colleague, Ashraf Sharif, compared failover times for RDS MySQL and RDS Aurora in his blog post. Failover times for Aurora look really great, but there are caveats. First of all, you are forced to use RDS; you cannot deploy Aurora on instances you manage. If the features and options available in Aurora are not enough for you, your only option is to deploy something on your own. Enter Galera. Galera, unlike Aurora, is not a proprietary black box; on the contrary, it is open source software that can be used freely on all supported environments. You can install Galera Cluster on AWS Elastic Compute Cloud (EC2) and, through that, build a highly available environment where failover is almost instant: as soon as you detect a node’s failure, you can reconnect to another Galera node. How does one deploy Galera 4 in EC2? In this blog post we will take a look and provide a step-by-step guide showing the simplest way of accomplishing that.

Deploying a Galera 4 Cluster on EC2

The first step is to create the environment which we will use for our Galera cluster. We will go with Ubuntu 18.04 LTS virtual machines.

We will go with t2.medium instance size for the purpose of this blog post. You should scale your instances based on the expected load.

We are going to deploy three nodes in the cluster. Why three? We have a blog that explains how Galera maintains high availability.

We are going to configure storage for those instances.

We will also pick a proper security group for the nodes. Again, in our case the security group is quite open. You should ensure the access is limited as much as possible - only nodes which have to access the databases should be allowed to connect to them.

Finally, we either pick an existing key pair or create a new one. After this step our three instances will be launched.

Once they are up, we can connect to them via SSH and start configuring the database.

We decided to go with a ‘node1, node2, node3’ naming convention, therefore we had to edit /etc/hosts on all nodes and list them alongside their respective local IPs. We also made the change in /etc/hostname to use the new name for each node. When this is done, we can start setting up our Galera cluster. At the time of writing, the only vendor that provides a GA version of Galera 4 is MariaDB, with its 10.4 release, therefore we are going to use MariaDB 10.4 for our cluster. We are going to proceed with the installation using the suggestions and guides from the MariaDB website.

Deploying a MariaDB 10.4 Galera Cluster

We will start with preparing repositories:


bash ./mariadb_repo_setup

We downloaded the script, which is intended to set up the repositories, and ran it to make sure everything is set up properly. This configured the repositories to use the latest MariaDB version, which, at the time of writing, is 10.4.

root@node1:~# apt update

Hit:1 bionic InRelease

Hit:2 bionic-updates InRelease

Hit:3 bionic-backports InRelease

Hit:4 bionic InRelease

Ign:5 bionic InRelease

Hit:6 bionic InRelease

Hit:7 bionic Release

Hit:8 bionic-security InRelease

Reading package lists... Done

Building dependency tree

Reading state information... Done

4 packages can be upgraded. Run 'apt list --upgradable' to see them.

As you can see, repositories for MariaDB 10.4 and MaxScale 2.4 have been configured. Now we can proceed and install MariaDB. We will do it step by step, node by node. MariaDB provides a guide on how you should install and configure the cluster.

We need to install packages:

apt-get install mariadb-server mariadb-client galera-4 mariadb-backup

This command installs all the packages required for MariaDB 10.4 Galera to run. MariaDB creates a set of configuration files, and we will add a new one to contain all the required settings. By default it will be included at the end of the configuration, so all previous settings for the variables we set will be overwritten. Ideally, you would afterwards edit the existing configuration files to remove the settings we put in galera.cnf, to avoid confusion about where a given setting is configured.

root@node1:~# cat /etc/mysql/conf.d/galera.cnf






# Galera cluster configuration




wsrep_cluster_name="Galera4 cluster"



# Cluster node configuration



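For reference, a minimal working galera.cnf for MariaDB 10.4 typically looks like the sketch below. The provider path is the usual Ubuntu location; the IP addresses and node name are placeholders to adapt per node, and the SST credentials match the user created later in this post:

```ini
[galera]
# Galera cluster configuration
wsrep_on=ON
wsrep_provider=/usr/lib/galera/libgalera_smm.so
wsrep_cluster_address="gcomm://10.0.0.101,10.0.0.102,10.0.0.103"
wsrep_cluster_name="Galera4 cluster"
binlog_format=row
default_storage_engine=InnoDB
innodb_autoinc_lock_mode=2

# Cluster node configuration
wsrep_node_address="10.0.0.101"
wsrep_node_name="node1"
wsrep_sst_method=mariabackup
wsrep_sst_auth=sstuser:pa55
```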
When configuration is ready, we can start.

root@node1:~# galera_new_cluster

This should bootstrap the new cluster on the first node. Next we should proceed with similar steps on the remaining nodes: install the required packages and configure them, keeping in mind that the local IP differs per node, so we have to change the galera.cnf file accordingly.

When the configuration files are ready, we have to create a user which will be used for the Snapshot State Transfer (SST):

MariaDB [(none)]> CREATE USER 'sstuser'@'localhost' IDENTIFIED BY 'pa55';

Query OK, 0 rows affected (0.022 sec)

MariaDB [(none)]> GRANT PROCESS, RELOAD, LOCK TABLES, REPLICATION CLIENT ON *.* TO 'sstuser'@'localhost';

Query OK, 0 rows affected (0.022 sec)

We should do that on the first node. The remaining nodes will join the cluster and receive a full state snapshot, so the user will be transferred to them. Now the only thing left to do is to start the remaining nodes:

root@node2:~# service mysql start

root@node3:~# service mysql start

and verify that the cluster has indeed been formed:

MariaDB [(none)]> show global status like 'wsrep_cluster_size';

+--------------------+-------+
| Variable_name      | Value |
+--------------------+-------+
| wsrep_cluster_size | 3     |
+--------------------+-------+

1 row in set (0.001 sec)

All is good, the cluster is up and it consists of three Galera nodes. We managed to deploy MariaDB 10.4 Galera Cluster on Amazon EC2.


by krzysztof at October 17, 2019 05:41 PM

October 16, 2019


Announcing the Beta Launch of Backup Ninja

Backup Ninja Logo

Severalnines is excited to launch our newest product, Backup Ninja. Currently in beta, Backup Ninja is a simple, secure, and cost-effective SaaS service you can use to back up the world’s most popular open source databases, locally or in the cloud. It easily connects to your database server through the “bartender” agent, allowing the service to manage the storage of fully-encrypted backups locally or on the cloud storage provider of your choosing.

Backup Ninja Dashboard

With Backup Ninja you can back up your databases locally, in the cloud, or to a combination of multiple locations to ensure there is always a good backup should disaster strike. It lets you go from homegrown, custom scripts that need upkeep to 'scriptless' peace-of-mind in minutes. It helps keep your data safe from data corruption on your production server, or from malicious ransomware attacks.

Backup Ninja Beta

Because we are still in the early phases, using Backup Ninja is free at this time. You will, however, be able to transfer it to a paid account once we’re ready to begin charging. This means you can use Backup Ninja to power your database backups at no charge and easily transition to a paid plan, with no obligation, if you choose.

How to Test Backup Ninja

As this is a new product you will undoubtedly encounter bugs in your travels. We encourage our first wave of users to “poke” the product and let us know how it performed.

Our main goal with this beta launch is to validate that we are able to provide a service that solves a real problem for users who currently maintain their own backup scripts and tools. 

Here are the key things we are hoping you will help us test…

  • Register for the service
  • Verify your account via link from the welcome email
  • Install our agent w/o issues
  • Add one or more DB servers that need to be backed up
  • Create a backup schedule which stores backups locally (on the server)
  • Create a backup schedule which stores backups locally and on your favorite cloud provider (multiple locations)
  • Be able to be up and running within 10 minutes of registering with our service

Other things to do...

  • Edit, start and resume backup configurations
  • Upgrade agents
  • Delete / uninstall agent(s)
  • Re-install agent(s)
  • Change / reset your password
  • Add servers 
  • Remove servers

Where to Report Your Findings

We have created a simple Google Form for you to log your issues which we will then transfer into our systems. We encourage you to share your test results, report any bugs, or just give us feedback on what we could do to make Backup Ninja even better!

The Next Steps

Severalnines is continuing to add new features, functions and databases to the platform. Coupled with your feedback, we plan to emerge from beta with a product that will allow you to build a quick and simple backup management plan to ensure you are protected should your databases become unavailable.

Join the Beta!


by fwlymburner at October 16, 2019 03:45 PM

Federico Razzoli

Foreign Key bugs in MySQL and MariaDB

Foreign keys are a controversial topic. MySQL and MariaDB implementation has several bugs and limitations, that are discussed here.

by Federico Razzoli at October 16, 2019 11:17 AM

October 15, 2019


An Overview of Various Auxiliary Plan Nodes in PostgreSQL

All modern database systems support a Query Optimizer module to automatically identify the most efficient strategy for executing SQL queries. The efficient strategy is called a “Plan” and it is measured in terms of cost, which is directly proportional to “Query Execution/Response Time”.  The plan is represented as a tree output by the Query Optimizer. The plan tree nodes can be broadly divided into the following 3 categories:

  • Scan Nodes: As explained in my previous blog “An Overview of the Various Scan Methods in PostgreSQL”, it indicates the way a base table data needs to be fetched.
  • Join Nodes: As explained in my previous blog “An Overview of the JOIN Methods in PostgreSQL”, it indicates how two tables need to be joined together to get the result of two tables.
  • Materialization Nodes: Also called Auxiliary nodes. The previous two kinds of nodes relate to how to fetch data from a base table and how to join data retrieved from two tables. The nodes in this category are applied on top of the retrieved data in order to further analyze or prepare it, e.g. sorting the data, aggregating the data, etc.

Consider a simple query example such as...

SELECT * FROM TBL1, TBL2 where TBL1.ID > TBL2.ID order by TBL1.ID;

Suppose the plan generated for this query is as below:

So here one auxiliary node, “Sort”, is added on top of the join result to sort the data in the required order.

Some of the auxiliary nodes generated by the PostgreSQL query optimizer are as below:

  • Sort
  • Aggregate
  • Group By Aggregate
  • Limit
  • Unique
  • LockRows
  • SetOp

Let’s understand each one of these nodes.

Sort

As the name suggests, this node is added to a plan tree whenever sorted data is needed. Sorted data can be required explicitly or implicitly, as in the two cases below:

First, the user scenario may require sorted data as output. In this case, the Sort node sits on top of the whole data retrieval, including all other processing.

postgres=# CREATE TABLE demotable (num numeric, id int);


postgres=# INSERT INTO demotable SELECT random() * 1000, generate_series(1, 10000);

INSERT 0 10000

postgres=# analyze;


postgres=# explain select * from demotable order by num;

                           QUERY PLAN


 Sort  (cost=819.39..844.39 rows=10000 width=15)

   Sort Key: num

   ->  Seq Scan on demotable  (cost=0.00..155.00 rows=10000 width=15)

(3 rows)

Note: Even though the user required the final output in sorted order, a Sort node may not be added to the final plan if there is an index on the corresponding table and sorting column. In this case, the planner may choose an index scan, which results in implicitly sorted data. For example, let’s create an index on the above example and see the result:

postgres=# CREATE INDEX demoidx ON demotable(num);


postgres=# explain select * from demotable order by num;

                                QUERY PLAN


 Index Scan using demoidx on demotable  (cost=0.29..534.28 rows=10000 width=15)

(1 row)

Second, as explained in my previous blog An Overview of the JOIN Methods in PostgreSQL, a Merge Join requires both tables' data to be sorted before joining. So it may happen that Merge Join is found to be cheaper than any other join method, even with the additional cost of sorting. In this case, a Sort node is added between the join and the scan of the table, so that sorted records are passed on to the join method.
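The reason sorted inputs matter is visible in a toy merge join. Below is a Python sketch of an inner equi-join over two pre-sorted key lists (not PostgreSQL's code, just the idea; duplicate keys are ignored for brevity):

```python
def merge_join(left, right):
    """Inner equi-join of two ascending-sorted lists of distinct keys.

    Each side is scanned exactly once with two advancing cursors,
    which is why Merge Join wants its inputs sorted up front.
    """
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] < right[j]:
            i += 1          # left key too small: advance left cursor
        elif left[i] > right[j]:
            j += 1          # right key too small: advance right cursor
        else:
            out.append(left[i])  # keys match: emit a joined row
            i += 1
            j += 1
    return out

print(merge_join([1, 3, 5, 7], [2, 3, 4, 5]))
```

The single forward pass over each input is what makes the sort worth paying for on large joins.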

postgres=# create table demo1(id int, id2 int);


postgres=# insert into demo1 values(generate_series(1,1000), generate_series(1,1000));

INSERT 0 1000

postgres=# create table demo2(id int, id2 int);


postgres=# create index demoidx2 on demo2(id);


postgres=# insert into demo2 values(generate_series(1,100000), generate_series(1,100000));

INSERT 0 100000

postgres=# analyze;


postgres=# explain select * from demo1, demo2 where demo1.id = demo2.id;

                                  QUERY PLAN


 Merge Join  (cost=65.18..109.82 rows=1000 width=16)

   Merge Cond: (demo2.id = demo1.id)

   ->  Index Scan using demoidx2 on demo2  (cost=0.29..3050.29 rows=100000 width=8)

   ->  Sort  (cost=64.83..67.33 rows=1000 width=8)

      Sort Key: demo1.id

      ->  Seq Scan on demo1  (cost=0.00..15.00 rows=1000 width=8)

(6 rows)

Aggregate

An Aggregate node is added to a plan tree when an aggregate function is used to compute a single result from multiple input rows. Some commonly used aggregate functions are COUNT, SUM, AVG (AVERAGE), MAX (MAXIMUM) and MIN (MINIMUM).

An aggregate node can come on top of a base relation scan or (and) on a join of relations. Example:

postgres=# explain select count(*) from demo1;

                       QUERY PLAN


 Aggregate  (cost=17.50..17.51 rows=1 width=8)

   ->  Seq Scan on demo1  (cost=0.00..15.00 rows=1000 width=0)

(2 rows)

postgres=# explain select sum(demo1.id) from demo1, demo2 where demo1.id = demo2.id;

                                       QUERY PLAN


 Aggregate  (cost=112.32..112.33 rows=1 width=8)

   ->  Merge Join  (cost=65.18..109.82 rows=1000 width=4)

      Merge Cond: (demo2.id = demo1.id)

      ->  Index Only Scan using demoidx2 on demo2  (cost=0.29..3050.29 rows=100000 width=4)

      ->  Sort  (cost=64.83..67.33 rows=1000 width=4)

            Sort Key: demo1.id

            ->  Seq Scan on demo1  (cost=0.00..15.00 rows=1000 width=4)

HashAggregate / GroupAggregate

These nodes are extensions of the “Aggregate” node. If aggregate functions are used to combine multiple input rows per group, then one of these nodes is added to the plan tree. So if the query uses an aggregate function along with a GROUP BY clause, either a HashAggregate or a GroupAggregate node will be added to the plan tree.

Since PostgreSQL uses a Cost Based Optimizer to generate an optimal plan tree, it is almost impossible to guess in advance which of these nodes will be used. But let’s understand when and how each gets used.

HashAggregate

HashAggregate works by building a hash table of the data in order to group it. So HashAggregate may be used for a group-level aggregate when the aggregation happens on an unsorted data set.

postgres=# explain select count(*) from demo1 group by id2;

                       QUERY PLAN


 HashAggregate  (cost=20.00..30.00 rows=1000 width=12)

   Group Key: id2

   ->  Seq Scan on demo1  (cost=0.00..15.00 rows=1000 width=4)

(3 rows)

Here the demo1 table schema and data are as per the example shown in the previous section. Since there are only 1000 rows to group, the resources required to build a hash table are less than the cost of sorting, so the query planner chooses HashAggregate.

GroupAggregate

GroupAggregate works on sorted data, so it does not require any additional data structure. GroupAggregate may be used for a group-level aggregate when the aggregation is on a sorted data set. To group sorted data, it can either sort explicitly (by adding a Sort node) or operate on data fetched via an index, in which case the data is implicitly sorted.

postgres=# explain select count(*) from demo2 group by id2;

                            QUERY PLAN


 GroupAggregate  (cost=9747.82..11497.82 rows=100000 width=12)

   Group Key: id2

   ->  Sort  (cost=9747.82..9997.82 rows=100000 width=4)

      Sort Key: id2

      ->  Seq Scan on demo2  (cost=0.00..1443.00 rows=100000 width=4)

(5 rows) 

Here the demo2 table schema and data are as per the example shown in the previous section. Since there are 100000 rows to group, the resources required to build a hash table might exceed the cost of sorting, so the query planner chooses GroupAggregate. Observe that the records selected from the “demo2” table are explicitly sorted, for which there is a node added in the plan tree.
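The trade-off between the two strategies can be sketched in Python: hashing groups unsorted input in one pass, while the sort-based approach aggregates runs of equal keys in sorted input. This is a simplification of the planner's actual choice, but the two strategies produce identical results:

```python
from collections import defaultdict
from itertools import groupby

def hash_aggregate(rows):
    """count(*) GROUP BY key via a hash table; input order does not matter."""
    counts = defaultdict(int)
    for key in rows:
        counts[key] += 1
    return dict(counts)

def group_aggregate(rows):
    """count(*) GROUP BY key over sorted input; no hash table needed,
    since equal keys sit next to each other after sorting."""
    return {key: sum(1 for _ in grp) for key, grp in groupby(sorted(rows))}

data = ["a", "b", "a", "c", "b", "a"]
assert hash_aggregate(data) == group_aggregate(data) == {"a": 3, "b": 2, "c": 1}
```

Which one is cheaper depends on the data: a hash table costs memory proportional to the number of groups, while sorting costs time proportional to the full input size, mirroring the planner's choice in the examples above.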

See another example below, where the data is already retrieved sorted thanks to an index scan:

postgres=# create index idx1 on demo1(id);


postgres=# explain select sum(id2), id from demo1 where id=1 group by id;

                            QUERY PLAN


 GroupAggregate  (cost=0.28..8.31 rows=1 width=12)

   Group Key: id

   ->  Index Scan using idx1 on demo1  (cost=0.28..8.29 rows=1 width=8)

      Index Cond: (id = 1)

(4 rows) 

See one more example below: even though it has an Index Scan, it still needs to sort explicitly, because the indexed column and the grouping column are not the same, so the data must still be sorted as per the grouping column.

postgres=# explain select sum(id), id2 from demo1 where id=1 group by id2;

                               QUERY PLAN


 GroupAggregate  (cost=8.30..8.32 rows=1 width=12)

   Group Key: id2

   ->  Sort  (cost=8.30..8.31 rows=1 width=8)

      Sort Key: id2

      ->  Index Scan using idx1 on demo1  (cost=0.28..8.29 rows=1 width=8)

            Index Cond: (id = 1)

(6 rows)

Note: GroupAggregate/HashAggregate can be used for many other indirect queries even when the query contains no aggregation with GROUP BY. It depends on how the planner interprets the query. E.g. say we need to get the distinct values from a table; this can be seen as grouping by the corresponding column and then taking one value from each group.

postgres=# explain select distinct(id) from demo1;

                       QUERY PLAN


 HashAggregate  (cost=17.50..27.50 rows=1000 width=4)

   Group Key: id

   ->  Seq Scan on demo1  (cost=0.00..15.00 rows=1000 width=4)

(3 rows)

So here HashAggregate gets used even though the query contains no explicit aggregation or GROUP BY.


Limit

Limit nodes get added to the plan tree if the “LIMIT/OFFSET” clause is used in a SELECT query. This clause limits the number of rows returned and optionally provides an offset from which to start reading data. Examples below:

postgres=# explain select * from demo1 offset 10;

                       QUERY PLAN


 Limit  (cost=0.15..15.00 rows=990 width=8)

   ->  Seq Scan on demo1  (cost=0.00..15.00 rows=1000 width=8)

(2 rows)

postgres=# explain select * from demo1 limit 10;

                       QUERY PLAN


 Limit  (cost=0.00..0.15 rows=10 width=8)

   ->  Seq Scan on demo1  (cost=0.00..15.00 rows=1000 width=8)

(2 rows)

postgres=# explain select * from demo1 offset 5 limit 10;

                       QUERY PLAN


 Limit  (cost=0.07..0.22 rows=10 width=8)

   ->  Seq Scan on demo1  (cost=0.00..15.00 rows=1000 width=8)

(2 rows)
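The Limit node's behavior can be sketched as follows: it pulls only offset + limit rows from its child before stopping, which is why `limit 10` above is costed far below a full scan. This is an illustrative Python sketch, not PostgreSQL internals:

```python
from itertools import islice

def seq_scan():
    """Child node: lazily yields rows, like a Seq Scan."""
    for row in range(1000):
        yield row

def limit_node(child, offset=0, limit=None):
    """Pull at most offset + limit rows from the child, discarding the first offset rows."""
    stop = None if limit is None else offset + limit
    return islice(child, offset, stop)

print(list(limit_node(seq_scan(), offset=5, limit=10)))  # [5, 6, 7, 8, 9, 10, 11, 12, 13, 14]
```

Because the child is consumed lazily, the scan stops as soon as the limit is satisfied.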


Unique

This node gets selected in order to fetch distinct values from the underlying result. Note that, depending on the query, selectivity and other resource information, distinct values can also be retrieved using HashAggregate/GroupAggregate without a Unique node. Example:

postgres=# explain select distinct(id) from demo2 where id<100;

                                 QUERY PLAN


 Unique  (cost=0.29..10.27 rows=99 width=4)

   ->  Index Only Scan using demoidx2 on demo2  (cost=0.29..10.03 rows=99 width=4)

      Index Cond: (id < 100)

(3 rows)


LockRows

PostgreSQL provides functionality to lock all selected rows. Rows can be locked in “Shared” mode or “Exclusive” mode depending on the “FOR SHARE” and “FOR UPDATE” clauses respectively. A new node, “LockRows”, gets added to the plan tree to achieve this operation.

postgres=# explain select * from demo1 for update;

                        QUERY PLAN


 LockRows  (cost=0.00..25.00 rows=1000 width=14)

   ->  Seq Scan on demo1  (cost=0.00..15.00 rows=1000 width=14)

(2 rows)

postgres=# explain select * from demo1 for share;

                        QUERY PLAN


 LockRows  (cost=0.00..25.00 rows=1000 width=14)

   ->  Seq Scan on demo1  (cost=0.00..15.00 rows=1000 width=14)

(2 rows)


SetOp

PostgreSQL provides functionality to combine the results of two or more queries. Just as a type of Join node gets selected to join two tables, a type of SetOp node gets selected to combine the results of two or more queries. For example, consider a table of employees with their id, name, age and salary, as below:

postgres=# create table emp(id int, name char(20), age int, salary int);


postgres=# insert into emp values(1,'a', 30,100);


postgres=# insert into emp values(2,'b', 31,90);


postgres=# insert into emp values(3,'c', 40,105);


postgres=# insert into emp values(4,'d', 20,80);


Now let’s get employees with age more than 25 years:

postgres=# select * from emp where age > 25;

 id |         name         | age | salary
----+----------------------+-----+--------
  1 | a                    |  30 |    100
  2 | b                    |  31 |     90
  3 | c                    |  40 |    105
(3 rows)

Now let’s get employees with a salary greater than 95:

postgres=# select * from emp where salary > 95;

 id |         name         | age | salary
----+----------------------+-----+--------
  1 | a                    |  30 |    100
  3 | c                    |  40 |    105
(2 rows)

Now, in order to get employees with age more than 25 years and a salary greater than 95, we can write the below intersect query:

postgres=# explain select * from emp where age>25 intersect select * from emp where salary > 95;

                                QUERY PLAN


 HashSetOp Intersect  (cost=0.00..72.90 rows=185 width=40)

   ->  Append  (cost=0.00..64.44 rows=846 width=40)

      ->  Subquery Scan on "*SELECT* 1"  (cost=0.00..30.11 rows=423 width=40)

            ->  Seq Scan on emp  (cost=0.00..25.88 rows=423 width=36)

                  Filter: (age > 25)

      -> Subquery Scan on "*SELECT* 2"  (cost=0.00..30.11 rows=423 width=40)

            ->  Seq Scan on emp emp_1  (cost=0.00..25.88 rows=423 width=36)

                  Filter: (salary > 95)

(8 rows) 

So here, a new kind of node, HashSetOp, is added to evaluate the intersect of these two individual queries.

Note that two other new kinds of node are added here:


Append

This node gets added to combine the multiple result sets into one.

Subquery Scan

This node gets added to evaluate any subquery. In the above plan, the subquery is added to evaluate one additional constant column value which indicates which input set contributed a specific row.

HashSetOp works using a hash of the underlying results, but the query optimizer can also generate a sort-based SetOp operation, denoted as “SetOp”.

Note: It is possible to achieve the same result with a single query, but intersect is used here just for an easy demonstration.
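For reference, the single-query equivalent would be the following (equivalent assuming emp contains no duplicate rows, since INTERSECT also removes duplicates):

```sql
SELECT * FROM emp WHERE age > 25 AND salary > 95;
```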


All PostgreSQL nodes are useful and get selected based on the nature of the query, the data, etc. Many clauses map one-to-one to nodes. For some clauses there are multiple node options, which get decided based on cost calculations over the underlying data.


by Kumar Rajeev Rastogi at October 15, 2019 05:36 PM

October 14, 2019


MySQL Cloud Backup and Restore Scenarios Using Microsoft Azure

Backups are a very important part of your database operations, as your business must be secured when catastrophe strikes. When that time comes (and it will), your Recovery Point Objective (RPO) and Recovery Time Objective (RTO) should be predefined, as this is how fast you can recover from the incident which occurred. 

Most organizations vary their approach to backups, trying to have a combination of server image backups (snapshots), logical and physical backups. These backups are then stored in multiple locations, so as to avoid any local or regional disasters.  It also means that the data can be restored in the shortest amount of time, avoiding major downtime which can impact your company's business. 

Hosting your database with a cloud provider, such as Microsoft Azure (which we will discuss in this blog), is not an exception, you still need to prepare and define your disaster recovery policy.

Like other public cloud offerings, Microsoft Azure (Azure) offers an approach for backups that is practical, cost-effective, and designed to provide you with recovery options. Microsoft Azure backup solutions are easy to configure and operate, using either Azure Backup or the Recovery Services vault (if you are operating your database on virtual machines).

If you want a managed database in the cloud, Azure offers Azure Database for MySQL. This should be used only if you do not want to operate and manage the MySQL database yourself. This service offers a rich solution for backup which allows you to create a backup of your database instance, either from a local region or through a geo-redundant location. This can be useful for data recovery. You may even be able to restore a node from a specific period of time, which is useful in achieving point-in-time recovery. This can be done with just one click.

In this blog, we will cover all of these backup and restore scenarios using a MySQL database on the Microsoft Azure cloud.

Performing Backups on a Virtual Machine on Azure

Unfortunately, Microsoft Azure does not offer a MySQL-specific backup type solution (e.g. MySQL Enterprise Backup, Percona XtraBackup, or MariaDB's Mariabackup). 

Upon creation of your Virtual Machine (using the portal), you can set up a process to back up your VM using the Recovery Services vault. This will guard you from any incident, disaster, or catastrophe, and the stored data can be encrypted. Adding encryption is optional and, though recommended by Azure, it comes with a price. You can take a look at their Azure Backup Pricing page for more details.

To create and setup a backup, go to the left panel and click All Resources → Compute → Virtual Machine. Now set the parameters required in the text fields. Once you are on that page, go to the Management tab and scroll down below. You'll be able to see how you can setup or create the backup. See the screenshot below:

Create a Virtual Machine - Azure

Then setup your backup policy based on your backup requirements. Just hit the Create New link in the Backup policy text field to create a new policy. See below:

Define Backup Policy - Azure

You can configure your backup policy with weekly, monthly, and yearly retention. 

Once you have your backup configured, you can check that you have a backup enabled on that particular virtual machine you have just created. See the screenshot below:

Backup Settings - Azure

Restore and Recover Your Virtual Machine on Azure

Designing your recovery in Azure depends on the kind of policy and requirements your application has. It also depends on whether your RTO and RPO must be low, or even invisible to the user, in case of an incident or during maintenance. You may set up your virtual machine with an availability set or in a different availability zone to achieve a higher recovery rate. 

You may also setup a disaster recovery for your VM to replicate your virtual machines to another Azure region for business continuity and disaster recovery needs. However, this might not be a good idea for your organization as it comes with a high cost. If in place, Azure offers you an option to restore or create a virtual machine from the backup created. 

For example, during the creation of your virtual machine, you can go to Disks tab, then go to Data Disks. You can create or attach an existing disk where you can attach the snapshot you have available. See the screenshot below for which you'll be able to choose from snapshot or storage blob:

Create a New Disk - Azure

 You may also restore on a specific point in time just like in the screenshot below:

Set Restore Point - Azure

Restoring in Azure can be done in different ways, but it uses the same resources you have already created.

For example, if you have created a snapshot or a disk image stored in an Azure Storage blob, you can use that resource when creating a new VM, as long as it's compatible and available. Additionally, you may even be able to do some file recovery, aside from restoring a VM, just like in the screenshot below:

File Recovery - Azure

During File Recovery, you may be able to choose from a specific recovery point, as well as download a script to browse and recover files. This is very helpful when you need only a specific file but not the whole system or disk volume.

Restoring from backup on an existing VM takes about three minutes. However, restoring from backup to spawn a new VM takes twelve minutes. This, however, could depend on the size of your VM and the network bandwidth available in Azure. The good thing is that, when restoring, it will provide you with details of what has been completed and how much time is remaining. For example, see the screenshot below:

Recovery Job Status - Azure

Backups for Azure Database For MySQL

Azure Database for MySQL is a fully-managed database service by Microsoft Azure. This service offers a very flexible and convenient way to setup your backup and restore capabilities.

Upon creation of your MySQL server instance, you can then setup backup retention and create your backup redundancy options; either locally redundant (local region) or geo-redundant (on a different region). Azure will provide you the estimated cost you would be charged for a month. See a sample screenshot below:

Pricing Calculator - Azure

Keep in mind that geo-redundant backup options are only available on General Purpose and Memory Optimized types of compute nodes. It's not available on a Basic compute node, but you can have your redundancy in the local region (i.e. within the availability zones available).

Once you have a master setup, it's easy to create a replica by going to Azure Database for MySQL servers → Select your MySQL instance → Replication → and click Add Replica. Your replica can be used as the source or restore target when needed. 

Keep in mind that in Azure, when you stop the replication between the master and a replica, the change is permanent and irreversible, as it makes the replica a standalone server. A replica created through Azure is a managed instance, so you cannot stop and start its replication threads the way you would with a normal master-slave replication; you can only restart the node. If you created the replica manually, by either restoring from the master or from a backup (e.g. via a point-in-time recovery), then you'll be able to stop/start the replication threads or set up a slave lag if needed.

Restoring Your Azure Database For MySQL From A Backup

Restoring is very easy and quick using the Azure portal. You can just hit the restore button with your MySQL instance node and just follow the UI as shown in the screenshot below:

Restoring Your Azure Database For MySQL From A Backup

Then you can select a period of time and create/spawn a new instance based on this backup captured:

Restore - Azure Database For MySQL

Once you have the node available, it will not yet be a replica of the master. You need to set this up manually, with easy steps, using their available stored procedures:

CALL mysql.az_replication_change_master('<master_host>', '<master_user>', '<master_password>', 3306, '<master_log_file>', <master_log_pos>, '<master_ssl_ca>');


master_host: hostname of the master server

master_user: username for the master server

master_password: password for the master server

master_log_file: binary log file name from running show master status

master_log_pos: binary log position from running show master status

master_ssl_ca: CA certificate’s context. If not using SSL, pass in empty string.
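As a purely illustrative example (the hostname, user, password and binlog coordinates below are placeholder values, not real ones), a filled-in call might look like:

```sql
CALL mysql.az_replication_change_master(
    'master.example.com',   -- master_host
    'repl_user',            -- master_user
    'example_password',     -- master_password
    3306,
    'mysql-bin.000002',     -- from SHOW MASTER STATUS
    120,                    -- from SHOW MASTER STATUS
    '');                    -- empty string when SSL is not used
```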

Then you can start the replication threads as follows,

CALL mysql.az_replication_start;

or you can stop the replication threads as follows,

CALL mysql.az_replication_stop;

or you can remove the master as,

CALL mysql.az_replication_remove_master;

or skip SQL thread errors as,

CALL mysql.az_replication_skip_counter;

As mentioned earlier, when a replica is created using Microsoft Azure via the Add Replica feature of a MySQL instance, these specific stored procedures aren't available. However, the mysql.az_replication_restart procedure will be available, since you are not allowed to stop or start the replication threads of an Azure-managed replica. So the example we have above was restored from a master: it holds a full copy of the master but acts as a single node, and needs manual setup to become a replica of an existing master.

Additionally, when you have a replica that you set up manually, you will not be able to see it under Azure Database for MySQL servers → Select your MySQL instance → Replication, since you created the replication manually.

Alternative Cloud and Restore Backup Solutions

There are certain scenarios where you want to have full-access when taking a full backup of your MySQL database in the cloud. To do this you can create your own script or use open-source technologies. With these you can control how the data in your MySQL database should be backed up and precisely how it should be stored. 

You can also leverage Azure Command Line Interface (CLI) to create your custom automation. For example, you can create a snapshot using the following command with Azure CLI:

az snapshot create -g myResourceGroup --source "$osDiskId" --name osDisk-backup

or create your MySQL server replica with the following command:

az mysql server replica create --name mydemoreplicaserver --source-server mydemoserver --resource-group myresourcegroup

Alternatively, you can also leverage an enterprise tool that features ways to take your backup with restore options. Using open-source technologies or 3rd party tools requires knowledge and skills to leverage and create your own implementation. Here's the list you can leverage:

  • ClusterControl - While we may be a little biased, ClusterControl offers the ability to manage physical and logical backups of your MySQL database using battle-tested, open-source technologies (PXB, Mariabackup, and mydumper). It supports MySQL, Percona, MariaDB, and Galera databases. You can easily create your backup policy and store your database backups on any cloud (AWS, GCP, or Azure). Please note that the free version of ClusterControl does not include the backup features.
  • LVM Snapshots - You can use LVM to take a snapshot of your logical volume. This is only applicable to your VM since it requires access to block-level storage. Use this tool with caution, as it can leave your database node unresponsive while the backup is running.
  • Percona XtraBackup (PXB) - An open source technology from Percona. With PXB, you can create a physical backup copy of your MySQL database. You can also do a hot-backup with PXB for InnoDB storage engine but it's recommended to run this on a slave or non-busy MySQL db server. This is only applicable for your VM instance since it requires binary or file access to the database server itself.
  • Mariabackup - Same with PXB, it's an open-source technology forked from PXB but is maintained by MariaDB. Specifically, if your database is using MariaDB, you should use Mariabackup in order to avoid incompatibility issues with tablespaces.
  • mydumper/myloader - These backup tools create logical backup copies of your MySQL database. You can use them with your Azure Database for MySQL, though I haven't verified how well this works for the backup and restore procedure.
  • mysqldump - It's a logical backup tool which is very useful when you need to back up and dump (or restore) a specific table or database to another instance. This is commonly used by DBAs, but you need to pay attention to your disk space, as logical backup copies are huge compared to physical backups.
  • MySQL Enterprise Backup - It delivers hot, online, non-blocking backups on multiple platforms including Linux, Windows, Mac & Solaris. It's not a free backup tool but offers a lot of features.
  • rsync - It's a fast and extraordinarily versatile file copying tool. It can copy locally, to/from another host over any remote shell, or to/from a remote rsync daemon. It offers a large number of options that control every aspect of its behavior and permit very flexible specification of the set of files to be copied. Mostly in Linux systems, rsync is installed as part of the OS package.

by Paul Namuag at October 14, 2019 09:45 AM

October 11, 2019


Securing MongoDB from External Injection Attacks

MongoDB security is not fully-guaranteed by simply configuring authentication certificates or encrypting the data. Some attackers will “go the extra mile” by playing with the received parameters in HTTP requests which are used as part of the database’s query process. 

SQL databases are the most vulnerable to this type of attack, but external injection is also possible in NoSQL DBMSs such as MongoDB. In most cases, external injections happen as a result of an unsafe concatenation of strings when creating queries.

What is an External Injection Attack?

Code injection is basically integrating unvalidated data (unmitigated vector) into a vulnerable program which when executed, leads to disastrous access to your database; threatening its safety. 

When unsanitized variables are passed into a MongoDB query, they break the document query structure and are sometimes executed as JavaScript code itself. This is often the case when passing props directly from the body-parser module on a Node.js server. An attacker can therefore easily insert a JS object where you’d expect a string or number, thereby getting unwanted results or manipulating your data. 

Consider the data below in a student's collection.

{username:'John Doc', email:'', age:20},

{username:'Rafael Silver', email:'', age:30},

{username:'Kevin Smith', email:'', age:22},

{username:'Pauline Wagu', email:'', age:23}

Let’s say your program has to fetch all students whose age is equal to 20; you would write code like this:

app.get('/:age', function (req, res) {
    db.collection('students').find({ age: req.params.age }).toArray(function (err, students) {
        res.send(students);
    });
});

You will have submitted a JSON object in your http request as 

{age: 20}

This will return, as expected, all students whose age is equal to 20, in this case only {username:'John Doc', email:'', age:20}.

Now let’s say an attacker submits an object instead of a number, i.e. {'$gt': 0}.

The resulting query will be:

db.collection('students').find({ age: { '$gt': 0 } }); which is a valid query that, upon execution, will return all students in the collection. The attacker then has the chance to act on your data according to their malicious intentions. In most cases, an attacker injects a custom object that contains MongoDB commands enabling them to access your documents without the proper procedure.

Some MongoDB commands execute Javascript code within the database engine, a potential risk for your data. Some of these commands are ‘$where’, ‘$group’ and ‘mapReduce’. For versions before MongoDB 2.4, Js code has access to the db object from within the query.
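A simple first line of defense, before the sanitizing modules discussed below, is to coerce the request parameter to the expected type before building the query. This is an illustrative sketch; the helper name is hypothetical:

```javascript
// Hypothetical helper: coerce the request parameter to a number before querying.
function toSafeAge(param) {
    const age = Number(param);
    if (!Number.isFinite(age)) {
        throw new Error('invalid age parameter');
    }
    return age;
}

console.log(toSafeAge('20'));        // 20 - safe to pass into find({ age: ... })
try {
    toSafeAge({ '$gt': 0 });         // an injected object is rejected
} catch (e) {
    console.log(e.message);          // 'invalid age parameter'
}
```

Because `Number({...})` evaluates to NaN, any injected object fails validation instead of reaching the query engine.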

MongoDB Native Protections

MongoDB utilizes BSON (Binary JSON) for both its queries and documents, but in some instances it can accept unserialized JSON and JS expressions (such as the ones mentioned above). Most of the data passed to the server is in string format and can be fed directly into a MongoDB query. MongoDB does not parse this data, thereby avoiding the potential risks that result from directly integrated parameters. 

If an API involves encoding data in a formatted text and that text needs to be parsed, it has the potential of creating disagreement between the server’s caller and the database’s callee on how that string is going to be parsed. If the data is accidentally misinterpreted as metadata the scenario can potentially pose security threats to your data.

Examples of MongoDB External Injections and How to Handle Them

 Let’s consider the data below in a students collection.

{username:'John Doc', password: ‘16djfhg’, email:'', age:20},

{username:'Rafael Silver',password: ‘djh’, email:'', age:30},

{username:'Kevin Smith', password: ‘16dj’, email:'', age:22},

{username:'Pauline Wagu', password: ‘g6yj’, email:'', age:23}

Injection Using the $ne (not equal) Operator

If I want to return the document with the username and password supplied in a request, the code will be:

app.post('/students', function (req, res) {

    var query = {
        username: req.body.username,
        password: req.body.password
    };

    db.collection('students').findOne(query, function (err, student) {
        res.send(student);
    });
});
If we receive the request below

POST https://localhost/students HTTP/1.1

Content-Type: application/json


{
    "username": {"$ne": null},
    "password": {"$ne": null}
}


In this case the query will return the first student, since the condition “not equal to null” matches every document. This is not the expected result.

To solve this, you can use:

The mongo-sanitize module, which strips out any key that starts with ‘$’ before it is passed into the MongoDB query engine.

Install the module first  

npm install mongo-sanitize

var sanitize = require('mongo-sanitize');

var query = {
    username: sanitize(req.body.username),
    password: sanitize(req.body.password)
};


You can also use mongoose to validate your schema fields, so that if a field expects a string and receives an object, the query will throw an error. In our case above, the null value would be converted into the string “”, which has no impact.
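To make the behavior concrete, the core idea behind mongo-sanitize can be sketched in a few lines (a simplified approximation, not the module's actual source):

```javascript
// Minimal sketch of what mongo-sanitize does: recursively drop any key beginning with '$'.
function sanitizeSketch(value) {
    if (value instanceof Object) {
        for (const key in value) {
            if (/^\$/.test(key)) {
                delete value[key];       // strip operator keys like $ne, $gt, $where
            } else {
                sanitizeSketch(value[key]);
            }
        }
    }
    return value;
}

console.log(sanitizeSketch({ username: 'admin', password: { $ne: null } }));
// { username: 'admin', password: {} }
```

With the `$ne` key removed, the injected query no longer matches every document.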

Injection Using the $where Operator

This is one of the most dangerous operators. It allows a string to be evaluated inside the server itself. For example, to fetch students whose age is above a value Y, the query will be:

var query = {
    $where: "this.age > " + req.body.age
};

db.collection('students').findOne(query, function (err, student) {
    res.send(student);
});

Using the sanitize module won’t help in this case: if the attacker submits ‘0; return true’, the resulting clause evaluates to true for every document, so the query returns all the students rather than those whose age is greater than the given value. Another possible payload is “'; return '' == '”, which again makes the query return all students rather than only those matching the clause.

The $where clause should be greatly avoided. Besides the outlined setback it also reduces performance because it is not optimized to use indexes.

There is also a real possibility of passing a function in the $where clause that references a variable which is not accessible in the MongoDB scope, which may crash your application. I.e.

var query = {
    $where: function () {
        return this.age > setValue; // setValue is not defined in MongoDB's scope
    }
};



You can also use the $eq, $lt, $lte, $gt, $gte operators instead.
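For example, the $where query above can be rewritten with a comparison operator on a validated number; `buildAgeQuery` is a hypothetical helper, not a library function:

```javascript
// Build a safe comparison-operator query instead of string-concatenated $where JS.
function buildAgeQuery(param) {
    const age = Number(param);
    if (!Number.isFinite(age)) {
        throw new Error('invalid age');
    }
    return { age: { $gt: age } };
}

console.log(JSON.stringify(buildAgeQuery('25')));  // {"age":{"$gt":25}}
```

The operator form is both injection-resistant and able to use indexes, unlike $where.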

Protecting Yourself from MongoDB External Injection

Here are three things you can do to keep yourself protected...

  1. Validate user data.  Looking back at how the $where expression can be used to access your data, it is advisable to always validate what users send to your server.
  2. Use the JSON validator concept to validate your schema together with the mongoose module.
  3. Design your queries such that Js code does not have full access to your database code.


External injections are also possible with MongoDB. They are often associated with unvalidated user data getting into MongoDB queries. It is always important to detect and prevent NoSQL injection by testing any data your server may receive. If neglected, this can threaten the safety of user data. The most important procedure is to validate your data at all involved layers.

by Onyancha Brian Henry at October 11, 2019 09:45 AM

October 10, 2019


Using MySQL Galera Cluster Replication to Create a Geo-Distributed Cluster: Part Two

In the previous blog in the series we discussed the pros and cons of using Galera Cluster to create geo-distributed cluster. In this post we will design a Galera-based geo-distributed cluster and we will show how you can deploy all the required pieces using ClusterControl.

Designing a Geo-Distributed Galera Cluster

We will start with explaining the environment we want to build. We will use three remote data centers, connected via Wide Area Network (WAN). Each datacenter will receive writes from local application servers. Reads will also be only local. This is intended to avoid unnecessary traffic crossing the WAN. 

For this setup the connectivity is in place and secured, but we won’t describe exactly how this can be achieved. There are numerous methods to secure the connectivity starting from proprietary hardware and software solutions through OpenVPN and ending up on SSH tunnels. 

We will use ProxySQL as a loadbalancer. ProxySQL will be deployed locally in each datacenter. It will also route traffic only to the local nodes. Remote nodes can always be added manually and we will explain cases where this might be a good solution. Application can be configured to connect to one of the local ProxySQL nodes using round-robin algorithm. We can as well use Keepalived and Virtual IP to route the traffic towards the single ProxySQL node, as long as a single ProxySQL node would be able to handle all of the traffic. 

Another possible solution is to collocate ProxySQL with the application nodes and configure the application to connect to the proxy on localhost. This approach works quite well under the assumption that it is unlikely for ProxySQL to become unavailable while the application on the same node keeps working. Typically what we see is either a node failure or a network failure, which affects both ProxySQL and the application at the same time.

Geo-Distributed MySQL Galera Cluster with ProxySQL

The diagram above shows the version of the environment, where ProxySQL is collocated on the same node as the application. ProxySQL is configured to distribute the workload across all Galera nodes in the local datacenter. One of those nodes would be picked as a node to send the writes to while SELECTs would be distributed across all nodes. Having one dedicated writer node in a datacenter helps to reduce the number of possible certification conflicts, leading to, typically, better performance. To reduce this even further we would have to start sending the traffic over the WAN connection, which is not ideal as the bandwidth utilization would significantly increase. Right now, with segments in place, only two copies of the writeset are being sent across datacenters - one per DC.

The main concern with Galera Cluster geo-distributed deployments is latency. This is something you always have to test prior launching the environment. Am I ok with the commit time? At every commit certification has to happen so writesets have to be sent and certified on all nodes in the cluster, including remote ones. It may be that the high latency will deem the setup unsuitable for your application. In that case you may find multiple Galera clusters connected via asynchronous replication more suitable. This would be a topic for another blog post though.

Deploying a Geo-Distributed Galera Cluster Using ClusterControl

To clarify things, we will show here what a deployment may look like. We won’t use an actual multi-DC setup; everything will be deployed in a local lab. We assume that the latency is acceptable and the whole setup is viable. What is great about ClusterControl is that it is infrastructure-agnostic. It doesn’t care whether the nodes are close to each other, located in the same datacenter, or distributed across multiple cloud providers. As long as there is SSH connectivity from the ClusterControl instance to all of the nodes, the deployment process looks exactly the same. That’s why we can show it to you step by step using just a local lab.

Installing ClusterControl

First, you have to install ClusterControl. You can download it for free. After registering, you should access the page with guide to download and install ClusterControl. It is as simple as running a shell script. Once you have ClusterControl installed, you will be presented with a form to create an administrative user:

Installing ClusterControl

Once you fill it, you will be presented with a Welcome screen and access to deployment wizards:

ClusterControl Welcome Screen

We’ll go with deploy. This will open a deployment wizard:

ClusterControl Deployment Wizard

We will pick MySQL Galera. We have to pass SSH connectivity details - either root or a sudo user is supported. In the next step we define the servers in the cluster.

Deploy Database Cluster

We are going to deploy three nodes in one of the data centers. Then we will be able to extend the cluster, configuring new nodes in different segments. For now all we have to do is to click on “Deploy” and watch ClusterControl deploying the Galera cluster.

Cluster List - ClusterControl

Our first three nodes are up and running, we can now proceed to adding additional nodes in other datacenters.

Add a Database Node - ClusterControl

You can do that from the action menu, as shown on the screenshot above.

Add a Database Node - ClusterControl

Here we can add additional nodes, one at a time. Importantly, you should change the Galera segment to a non-zero value (0 is used for the initial three nodes).

After a while we end up with all nine nodes, distributed across three segments.

ClusterControl Geo-Distributed Database Nodes

Now, we have to deploy the proxy layer. We will use ProxySQL for that. You can deploy it in ClusterControl via Manage -> Load Balancer:

Add a Load Balancer - ClusterControl

This opens a deployment form:

Deploy Load Balancer - ClusterControl

First, we have to decide where to deploy ProxySQL. We will use the existing Galera nodes, but you can type anything in the field, so it is perfectly possible to deploy ProxySQL on top of the application nodes. In addition, you have to pass access credentials for the administrative and monitoring user.

Deploy Load Balancer - ClusterControl

Then we have to either pick one of the existing users in MySQL or create one right now. We also want to ensure that ProxySQL is configured to use only the Galera nodes located in the same datacenter.

When you have one ProxySQL ready in the datacenter, you can use it as a source of the configuration:

Deploy ProxySQL - ClusterControl

This has to be repeated for every application server that you have in all datacenters. Then the application has to be configured to connect to the local ProxySQL instance, ideally over the Unix socket. This gives the best performance and the lowest latency.
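For illustration, a client-side configuration that points an application at the local ProxySQL instance over a Unix socket might look like the snippet below. This is a hypothetical example: the socket path shown is ProxySQL's usual default, but yours may differ.

```ini
# Hypothetical application client configuration.
# /tmp/proxysql.sock is ProxySQL's usual default MySQL-interface socket;
# adjust it if your mysql-interfaces setting is different.
[client]
socket = /tmp/proxysql.sock
```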

Reducing Latency - ClusterControl

After the last ProxySQL is deployed, our environment is ready. Application nodes connect to local ProxySQL. Each ProxySQL is configured to work with Galera nodes in the same datacenter:

ProxySQL Server Setup - ClusterControl


We hope this two-part series has helped you to understand the strengths and weaknesses of geo-distributed Galera Clusters and how ClusterControl makes it very easy to deploy and manage such a cluster.

by krzysztof at October 10, 2019 09:45 AM

October 09, 2019

Valeriy Kravchuk

Dynamic Tracing of MariaDB Server With bpftrace - Basic Example

Unlike the previous post, this one is not just a comment on some slides from the "Tracing and Profiling MySQL" talk at Percona Live Europe 2019. I am going to add the details that were missing there (as I was in a hurry and had forgotten to copy/paste proper outputs while testing). I am going to show how to add a dynamic probe with the "latest and greatest" bpftrace tool.

The goal is the same as before - try to add dynamic probe(s) to trace query execution. More specifically, to capture the text of the queries executed by clients of the MySQL server. As bpftrace requires a new kernel and just does not work on Ubuntu 16.04, for the demonstration I use my Fedora 29 box with kernel 5.2.x and, for a change, get queries from Fedora's own MariaDB 10.3.17 installed from rpm packages there.

I have both bcc and bpftrace installed also from packages:
[openxs@fc29 ~]$ rpm -qa | grep bcc
[openxs@fc29 ~]$ rpm -qa | grep bpf
[openxs@fc29 ~]$
You can check the fine manual for the details, but even the basic -h option provides enough to start, as long as you already know some terms and probe syntax:
[openxs@fc29 ~]$ bpftrace -h
    bpftrace [options] filename
    bpftrace [options] -e 'program'

    -B MODE        output buffering mode ('line', 'full', or 'none')
    -d             debug info dry run
    -dd            verbose debug info dry run
    -e 'program'   execute this program
    -h             show this help message
    -l [search]    list probes
    -p PID         enable USDT probes on PID
    -c 'CMD'       run CMD and enable USDT probes on resulting process
    -v             verbose messages
    --version      bpftrace version

    BPFTRACE_STRLEN           [default: 64] bytes on BPF stack per str()
    BPFTRACE_NO_CPP_DEMANGLE  [default: 0] disable C++ symbol demangling

bpftrace -l '*sleep*'
    list probes containing "sleep"
bpftrace -e 'kprobe:do_nanosleep { printf("PID %d sleeping...\n", pid); }'
    trace processes calling sleep
bpftrace -e 'tracepoint:raw_syscalls:sys_enter { @[comm] = count(); }'
    count syscalls by process name
In our case we need to define a uprobe for the proper mysqld binary and trace the dispatch_command() function. Before we start, note that the parameters of dispatch_command() in MariaDB 10.3 are not the same as in the Percona Server 5.7 I've used in the previous post. Basically, this function starts as follows in sql/
   1570 bool dispatch_command(enum enum_server_command command, THD *thd,
   1571                       char* packet, uint packet_length, bool is_com_multi,
   1572                       bool is_next_command)
Note the third argument, packet. If the first argument, command, is COM_QUERY, then packet contains the query text (as a zero-terminated string) for sure (it's also true for many other commands, but let me skip the details for now). That's why we'll use the third argument in our uprobe to capture the SQL text.
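Since bpftrace numbers probe arguments from zero, it helps to write the mapping down explicitly. The sketch below (plain Python, purely illustrative, not part of the tracing itself) encodes the parameter order shown above:

```python
# Positional mapping of dispatch_command() parameters to bpftrace argN
# variables; bpftrace counts probe arguments from zero.
ARGS = ["command", "thd", "packet", "packet_length",
        "is_com_multi", "is_next_command"]

def bpftrace_arg(param_name):
    """Return the bpftrace variable name for a given parameter."""
    return f"arg{ARGS.index(param_name)}"

print(bpftrace_arg("packet"))  # arg2 - the query text lives here
```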

Now, let's start the service and check the exact full path name of the mysqld binary:
[openxs@fc29 ~]$ sudo service mariadb start
[sudo] password for openxs:
Redirecting to /bin/systemctl start mariadb.service
[openxs@fc29 ~]$ ps aux | grep mysqld
mysql     9109  6.2  1.2 1699252 104108 ?      Ssl  09:30   0:00 /usr/libexec/mysqld --basedir=/usr
openxs    9175  0.0  0.0 215744   892 pts/0    S+   09:30   0:00 grep --color=auto mysqld
The first naive attempt to add the probe after a cursory reading of the documentation and checking a few examples may look like this:
[openxs@fc29 ~]$ sudo bpftrace -e 'uprobe:/usr/libexec/mysqld:dispatch_command { printf("%s\n", str(arg2)); }'
Attaching 1 probe...
Could not resolve symbol: /usr/libexec/mysqld:dispatch_command
It seems that in my case, unlike perf, bpftrace is not "aware" of C++ names or of symbolic information in a separate -debuginfo package. So, I need the mangled name:
[openxs@fc29 ~]$ nm -na /usr/libexec/mysqld | grep dispatch_command
nm: /usr/libexec/mysqld: no symbols
[openxs@fc29 ~]$ nm -na /home/openxs/dbs/maria10.3/bin/mysqld | grep dispatch_command
00000000004a1eef t _Z16dispatch_command19enum_server_commandP3THDPcjbb.cold.344
00000000005c5190 T _Z16dispatch_command19enum_server_commandP3THDPcjbb
00000000005c5190 t _Z16dispatch_command19enum_server_commandP3THDPcjbb.localalias.256
Surely there are no symbols in the binary from the Fedora package, so I checked the binary (of the same version) that I had built myself (as usual) and assumed that neither the parameters nor the mangling approach could be different. So, the next attempt to add a dynamic probe would look as follows:
[openxs@fc29 ~]$ sudo bpftrace -e 'uprobe:/usr/libexec/mysqld:_Z16dispatch_command19enum_server_commandP3THDPcjbb { printf("%s\n", str(arg2)); }'
Attaching 1 probe...
show databases
show tables
select @@version_comment limit 1
select user, host from mysql.user
It worked and you see above the output I've got for the following session:
[openxs@fc29 ~]$ mysql -uroot test
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Welcome to the MariaDB monitor.  Commands end with ; or \g.
Your MariaDB connection id is 19
Server version: 10.3.17-MariaDB MariaDB Server

Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

MariaDB [test]> select user, host from mysql.user;
+------+-----------+
| user | host      |
+------+-----------+
| root |           |
| root | ::1       |
| root | fc29      |
| root | localhost |
+------+-----------+
4 rows in set (0.000 sec)
You can see some SQL statements generated when the mysql command line client connects, as well as the packet value from some packet other than COM_QUERY ("t1"), probably. My probe had not even tried to check the other parameters besides the supposed query text.

Now, the probe is defined on uprobe:/usr/libexec/mysqld:_Z16dispatch_command19enum_server_commandP3THDPcjbb - I've just used the long, mangled version of the function name and the full path name to the binary, and defined a dynamic probe (uprobe). There is no filter, and the action for the probe is defined as { printf("%s\n", str(arg2)); } - that is, I print the third argument (they are numbered starting from zero: arg0, arg1, arg2, ...) as a zero-terminated string. Without the str() built-in function I'd get just a pointer that could be printed as an (unsigned) long integer, u64.

Basically, that's all. We have a quick and dirty way to capture all queries. No timing or anything, but it all depends on the probe action, which can use numerous built-in variables and functions.

More "advanced" use of bpftrace, a lame attempt to capture the time to execute a query, may look like this:
[openxs@fc29 ~]$ sudo bpftrace -e 'uprobe:/usr/libexec/mysqld:_Z16dispatch_command19enum_server_commandP3THDPcjbb { @sql = str(arg2); @start[@sql] = nsecs; }
uretprobe:/usr/libexec/mysqld:_Z16dispatch_command19enum_server_commandP3THDPcjbb /@start[@sql] != 0/ { printf("%s : %u64 ms\n", @sql, (nsecs - @start[@sql])/1000000); } '

Attaching 2 probes...
select sleep(3) : 300064 ms
select sleep(1) : 100064 ms

@sql: select sleep(1)

@start[select sleep(3)]: 10666558704666
@start[select sleep(1)]: 10685614895675

[openxs@fc29 ~]$
In this case I try to store the start time into an associative array, with the query text as an "index" and the start time (in nanoseconds) as the value. Then I calculate the difference from the current nsecs value upon function return, in a separate uretprobe. I've used global variables: @sql for the query text, and @start[] for the array. It even seems to work well for a single-threaded load, based on the above. But as soon as I try to use multiple concurrent threads:
[openxs@fc29 ~]$ for i in `seq 1 4`; do mysql -uroot test -e"select sleep($i)" & done
it becomes clear that global variables are really global and my outputs are all wrong.
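The failure mode is easy to reproduce outside bpftrace: with one shared start-time variable, whichever query starts last overwrites the value, so earlier queries get the wrong duration. A small Python sketch (illustrative, with made-up timestamps) contrasts the broken global-variable bookkeeping with per-thread-id bookkeeping:

```python
# Two bookkeeping strategies for "time since query start".
# events: (thread_id, action, timestamp_ns)
events = [
    (101, "start", 1_000), (102, "start", 1_500),  # both queries begin
    (101, "end",   4_000), (102, "end",   9_500),
]

# Strategy 1: one global start time (what the first probe did).
globl, wrong = None, {}
for tid, action, ts in events:
    if action == "start":
        globl = ts                      # clobbers the previous start!
    else:
        wrong[tid] = ts - globl

# Strategy 2: start times keyed by thread id (the fixed probe).
starts, right = {}, {}
for tid, action, ts in events:
    if action == "start":
        starts[tid] = ts
    else:
        right[tid] = ts - starts[tid]

print(wrong)  # {101: 2500, 102: 8000}  <- thread 101's duration is wrong
print(right)  # {101: 3000, 102: 8000}
```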

A bit better version may look like this:
[openxs@fc29 ~]$ sudo bpftrace -e 'uprobe:/usr/libexec/mysqld:_Z16dispatch_command19enum_server_commandP3THDPcjbb { @sql[tid] = str(arg2); @start[tid] = nsecs; }
uretprobe:/usr/libexec/mysqld:_Z16dispatch_command19enum_server_commandP3THDPcjbb /@start[tid] != 0/ { printf("%s : %u64 %u64 ms\n", @sql[tid], tid, (nsecs - @start[tid])/1000000); } '
Attaching 2 probes...
select sleep(1) : 1120764 100064 ms
 : 1120764 064 ms
select sleep(2) : 949064 200064 ms
 : 949064 064 ms
select sleep(3) : 1120864 300064 ms
 : 1120864 064 ms
select sleep(4) : 1120664 400064 ms
 : 1120664 064 ms
select sleep(1) : 1120764 100064 ms
 : 1120764 064 ms
select sleep(2) : 949064 200064 ms
 : 949064 064 ms
select sleep(3) : 1120664 300064 ms
 : 1120664 064 ms
select sleep(4) : 1120864 400064 ms
 : 1120864 064 ms


@start[11207]: 13609305005933
@start[9490]: 13610305621499
@start[11206]: 13611305753596
@start[11208]: 13612305235313

[openxs@fc29 ~]$
The output is for both sequential and concurrent execution of queries. I've used two associative arrays, @sql[] for queries and @start[] for start times, both indexed by tid - a built-in variable for the thread id, which should not change, at least until we use a pool of threads... You can also see that the tool by default outputs the content of all global associative arrays at the end, unless we free them explicitly.
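A side note on the odd-looking numbers above: in the format string "%u64", only "%u" is a conversion specifier, so the literal characters "64" are printed right after the value. That is why sleep(3), which takes about 3000 ms, shows up as "300064 ms" (and thread id 11207 as "1120764"). The same effect can be reproduced with Python's %-formatting:

```python
# Why "select sleep(3)" printed "300064 ms": in "%u64" only "%u" is a
# conversion; the "64" is literal text appended after the number.
elapsed_ns = 3_000_000_000            # ~3 s between probe entry and return
elapsed_ms = elapsed_ns // 1_000_000  # 3000 ms
printed = "%u64" % elapsed_ms         # %-formatting treats "64" as literal text
print(printed)  # 300064
```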

* * *
This image of bpftrace internals is taken from Brendan Gregg's post.
bpftrace commands may be much more complicated and it may make sense to store them in a separate file. The tool is near ideal for "quick and dirty" tests, and one day I'll write a much more complete post with much better examples. But having a way to capture, filter and summarize queries in kernel space and send only the relevant results to user space, all that in a safe manner, is really cool!

by Valerii Kravchuk at October 09, 2019 04:19 PM


Using MySQL Galera Cluster Replication to Create a Geo-Distributed Cluster: Part One

It is quite common to see databases distributed across multiple geographical locations. One scenario for this type of setup is disaster recovery, where your standby data center is located in a separate location from your main datacenter. It might also be required so that the databases are located closer to the users.

The main challenge in achieving this setup is designing the database in a way that reduces the chance of issues related to network partitioning. One of the solutions might be to use Galera Cluster instead of regular asynchronous (or semi-synchronous) replication. In this blog we will discuss the pros and cons of this approach. This is the first part in a series of two blogs. In the second part we will design the geo-distributed Galera Cluster and see how ClusterControl can help us deploy such an environment.

Why Galera Cluster Instead of  Asynchronous Replication for Geo-Distributed Clusters?

Let’s consider the main differences between Galera and regular replication. Regular replication provides you with just one node to write to; this means that every write from a remote datacenter would have to be sent over the Wide Area Network (WAN) to reach the master. It also means that all proxies located in the remote datacenter have to be able to monitor the whole topology, spanning across all data centers involved, as they have to be able to tell which node is currently the master.

This leads to a number of problems. First, multiple connections have to be established across the WAN; this adds latency and slows down any checks that the proxy may be running. In addition, this adds unnecessary overhead on the proxies and databases. Most of the time you are interested only in routing traffic to the local database nodes. The only exception is the master, and only because of this are proxies forced to watch the whole infrastructure rather than just the part located in the local datacenter. Of course, you can try to overcome this by using proxies to route only SELECTs, while using some other method (a dedicated hostname for the master, managed by DNS) to point the application to the master, but this adds unnecessary levels of complexity and moving parts, which could seriously impact your ability to handle multiple node and network failures without losing data consistency.

Galera Cluster can support multiple writers. Latency is also a factor: as all nodes in the Galera cluster have to coordinate and communicate to certify writesets, it can even be the reason you may decide not to use Galera when latency is too high. Latency is also an issue in replication clusters - there it affects only writes from the remote data centers, while connections from the datacenter where the master is located would benefit from low-latency commits.

With MySQL Replication you also have to keep the worst case scenario in mind and ensure that the application is OK with delayed writes. The master can always change, and you cannot be sure that you will be writing to a local node all the time.

Another difference between replication and Galera Cluster is the handling of replication lag. Geo-distributed clusters can be seriously affected by lag: latency and the limited throughput of the WAN connection will impact the ability of a replicated cluster to keep up with the replication. Please keep in mind that replication generates one-to-all traffic.

Geo-Distributed Galera Cluster

All slaves have to receive the whole replication traffic - the amount of data you have to send to remote slaves over the WAN increases with every remote slave that you add. This may easily result in WAN link saturation, especially if you do plenty of modifications and the WAN link doesn’t have good throughput. As you can see in the diagram above, with three data centers and three nodes in each of them, the master has to send 6x the replication traffic over the WAN connection.

With Galera Cluster, things are slightly different. For starters, Galera uses flow control to keep the nodes in sync. If one of the nodes starts to lag behind, it has the ability to ask the rest of the cluster to slow down and let it catch up. Sure, this reduces the performance of the whole cluster, but it is still better than not really being able to use slaves for SELECTs because they tend to lag from time to time - in such cases the results you get might be outdated and incorrect.
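A toy model of this trade-off (illustrative numbers only, not a benchmark): under flow control the cluster settles near its slowest member's rate, whereas an asynchronous master keeps writing at full speed while the slow slave accumulates lag.

```python
# Toy model: per-node sustainable write throughput in transactions/sec.
node_tps = {"dc1-node": 5000, "dc2-node": 5000, "wan-lagged-node": 1200}

# Galera flow control: the lagging node asks the cluster to slow down,
# so the whole cluster settles near the slowest node's rate.
galera_cluster_tps = min(node_tps.values())

# Asynchronous replication: the master keeps writing at full speed,
# but the slow slave accumulates replication lag instead.
async_master_tps = node_tps["dc1-node"]

print(galera_cluster_tps)  # 1200 -- consistent reads, lower throughput
print(async_master_tps)    # 5000 -- full speed, but stale reads on the laggard
```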

Geo-Distributed Galera Cluster

Another feature of Galera Cluster which can significantly improve its performance when used over a WAN is segments. By default Galera uses all-to-all communication and every writeset is sent by the node to all other nodes in the cluster. This behavior can be changed using segments. Segments allow users to split a Galera cluster into several parts. Each segment may contain multiple nodes, and it elects one of them as a relay node. Such a node receives writesets from other segments and redistributes them across the Galera nodes local to the segment. As a result, as you can see in the diagram above, it is possible to reduce the replication traffic going over the WAN three times - just two “replicas” of the replication stream are sent over the WAN: one per datacenter, compared to one per slave in MySQL Replication.
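The traffic numbers from the two diagrams can be turned into a quick back-of-the-envelope calculation (a sketch assuming three datacenters with three nodes each, as in this example):

```python
datacenters, nodes_per_dc = 3, 3

# Asynchronous replication: the master streams to every remote slave
# individually, so WAN copies = all nodes outside the master's DC.
async_wan_streams = (datacenters - 1) * nodes_per_dc  # 6

# Galera with segments: one relay node per remote segment receives the
# writesets and redistributes them locally, so WAN copies = remote DCs.
galera_wan_streams = datacenters - 1  # 2

print(async_wan_streams, galera_wan_streams)  # 6 2
```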

Galera Cluster Network Partitioning Handling

Where Galera Cluster shines is in the handling of network partitioning. Galera Cluster constantly monitors the state of the nodes in the cluster. Every node attempts to connect with its peers and exchange the state of the cluster. If a subset of nodes is not reachable, Galera attempts to relay the communication, so if there is a way to reach those nodes, they will be reached.

Galera Cluster Network Partitioning Handling

An example can be seen in the diagram above: DC 1 lost connectivity with DC2, but DC2 and DC3 can connect. In this case one of the nodes in DC3 will be used to relay data from DC1 to DC2, ensuring that the intra-cluster communication can be maintained.

Galera Cluster Network Partitioning Handling

Galera Cluster is able to take actions based on the state of the cluster. It implements quorum - a majority of the nodes have to be available in order for the cluster to be able to operate. If a node gets disconnected from the cluster and cannot reach any other node, it will cease to operate.

As can be seen in the diagram above, there’s a partial loss of network communication in DC1 and the affected node is removed from the cluster, ensuring that the application will not access outdated data.

Galera Cluster Network Partitioning Handling

This is also true on a larger scale. Here DC1 got all of its communication cut off. As a result, the whole datacenter has been removed from the cluster and none of its nodes will serve traffic. The rest of the cluster maintained a majority (6 out of 9 nodes are available) and reconfigured itself to keep the connection between DC 2 and DC3. In the diagram above we assumed the write hits a node in DC2, but please keep in mind that Galera is capable of running with multiple writers.
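The quorum rule described above is a simple majority check; a minimal sketch (assuming equal node weights - Galera also supports weighted quorum):

```python
def has_quorum(reachable, cluster_size):
    """A partition may keep operating only if it sees a strict majority
    of the nodes (equal weights assumed for this sketch)."""
    return reachable > cluster_size / 2

# DC1 (3 nodes) cut off from a 9-node cluster:
print(has_quorum(6, 9))  # True  -- DC2 + DC3 keep serving traffic
print(has_quorum(3, 9))  # False -- the isolated DC1 stops operating
print(has_quorum(1, 9))  # False -- a lone disconnected node ceases to operate
```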

MySQL Replication does not have any kind of cluster awareness, making it problematic to handle network issues. It cannot shut itself down upon losing connection with other nodes. There is no easy way of preventing the old master from showing up after the network split.

The only possibilities are limited to the proxy layer or even higher. You have to design a system which would try to understand the state of the cluster and take the necessary actions. One possible way is to use cluster-aware tools like Orchestrator and then run scripts that would check the state of the Orchestrator RAFT cluster and, based on this state, take the required actions on the database layer. This is far from ideal, because any action taken on a layer higher than the database adds additional latency: it makes it possible for the issue to show up and data consistency to be compromised before the correct action can be taken. Galera, on the other hand, takes actions on the database level, ensuring the fastest reaction possible.

by krzysztof at October 09, 2019 09:45 AM

October 08, 2019


How to Create a Clone of Your MySQL or PostgreSQL Database Cluster

If you are managing a production database, chances are high that you’ve had to clone your database to a server other than the production server. The basic method of creating a clone is to restore a database from a recent backup onto another database server. Another method is to replicate from a source database while it is still running, in which case it is important that the original database be unaffected by any cloning procedure.

Why Would You Need to Clone a Database?

A cloned database cluster is useful in a number of scenarios:

  • Troubleshoot your cloned production cluster in the safety of your test environment while performing destructive operations on the database.
  • Patch/upgrade test of a cloned database to validate the upgrade process before applying it to the production cluster.
  • Validate backup & recovery of a production cluster using a cloned cluster.
  • Validate or test new applications on a cloned production cluster before deploying it on the live production cluster.
  • Quickly clone the database for audit or information compliance requirements for example by quarter or year end where the content of the database must not be changed.
  • A reporting database can be created at intervals in order to avoid data changes during the report generations.
  • Migrate a database to new servers, new deployment environment or a new data center.

When running your database infrastructure in the cloud, the cost of owning a host (a shared or dedicated virtual machine) is significantly lower compared to the traditional way of renting space in a datacenter or owning a physical server. Furthermore, most cloud deployments can be automated easily via provider APIs, client software and scripting. Therefore, cloning a cluster can be a common way to duplicate your deployment environment, for example from dev to staging to production or vice versa.

We haven't seen this feature offered by anyone else in the market, thus it is our privilege to showcase how it works with ClusterControl.

Cloning a MySQL Galera Cluster

One of the cool features in ClusterControl is that it allows you to quickly clone an existing MySQL Galera Cluster so you have an exact copy of the dataset on the other cluster. ClusterControl performs the cloning operation online, without any locking or bringing downtime to the existing cluster. It's like a cluster scale-out operation, except both clusters are independent of each other after the syncing completes. The cloned cluster does not necessarily need to be the same cluster size as the existing one. We could start with a one-node cluster and scale it out with more database nodes at a later stage.

In this example, we have a cluster called "Staging" that we want to clone as another cluster called "Production". The premise is that the staging cluster already stores the necessary data that is going to be in production soon. The production cluster consists of another 3 nodes, with production specs.

The following diagram summarizes the final architecture we want to achieve:

How to Clone Your Database - ClusterControl

The first thing to do is to set up passwordless SSH from the ClusterControl server to the production servers. On the ClusterControl server, run the following:

$ whoami


$ ssh-copy-id root@prod1.local

$ ssh-copy-id root@prod2.local

$ ssh-copy-id root@prod3.local

Enter the root password of the target server if prompted.

From ClusterControl database cluster list, click on the Cluster Action button and choose Clone Cluster. The following wizard will appear:

Clone Cluster - ClusterControl

Specify the IP addresses or hostnames of the new cluster and make sure you get the green tick icon next to every specified host. The green icon means ClusterControl is able to connect to the host via passwordless SSH. Click on the "Clone Cluster" button to start the deployment.

The deployment steps are:

  1. Create a new cluster consisting of one node.
  2. Sync the new one-node cluster via SST. The donor is one of the source servers.
  3. The remaining new nodes will join the cluster after the donor of the cloned cluster is synced with the cluster.

A new MySQL Galera Cluster will be listed under the ClusterControl cluster dashboard once the deployment job completes.

Note that cluster cloning only clones the database servers and not the whole stack of the cluster. This means that other supporting components related to the cluster, like load balancers, virtual IP addresses, the Galera arbitrator or asynchronous slaves, are not going to be cloned by ClusterControl. Nevertheless, if you would like an exact copy of your existing database infrastructure, you can achieve that with ClusterControl by deploying those components separately after the database cloning operation completes.

Creating a Database Cluster from a Backup

Another similar feature offered by ClusterControl is "Create Cluster from Backup". This feature was introduced in ClusterControl 1.7.1, specifically for Galera Cluster and PostgreSQL clusters, where one can create a new cluster from an existing backup. Contrary to cluster cloning, this operation does not bring additional load to the source cluster, with the tradeoff that the cloned cluster will not be at the same current state as the source cluster.

In order to create a cluster from a backup, you must have a working backup created. For Galera Cluster all backup methods are supported, while for PostgreSQL only pgbackrest is not supported for new cluster deployment. From ClusterControl, a backup can be created or scheduled easily under ClusterControl -> Backups -> Create Backup. From the list of created backups, click on Restore Backup, choose the backup from the list and choose "Create Cluster from Backup" from the restoration options:

Restore Backup with ClusterControl

In this example, we are going to deploy a new PostgreSQL Streaming Replication cluster for the staging environment, based on an existing backup we have from the production cluster. The following diagram illustrates the final architecture:

Database Backup Restoration with ClusterControl

The first thing to do is to set up passwordless SSH from the ClusterControl server to the production servers. On the ClusterControl server, run the following:

$ whoami


$ ssh-copy-id root@prod1.local

$ ssh-copy-id root@prod2.local

$ ssh-copy-id root@prod3.local

When you choose Create Cluster from Backup, ClusterControl will open a deployment wizard dialog to assist you in setting up the new cluster:

Create Cluster from Backup - ClusterControl

A new PostgreSQL Streaming Replication instance will be created from the selected backup, which will be used as the base dataset for the new cluster. The selected backup must be accessible from the nodes in the new cluster, or stored on the ClusterControl host.

Clicking on "Continue" will open the standard database cluster deployment wizard:

Create Database Cluster from Backup - ClusterControl

Note that the root/admin user password for this cluster must be the same as the PostgreSQL admin/root password included in the backup. Follow the configuration wizard accordingly and ClusterControl will then perform the deployment in the following order:

  1. Install the necessary software and dependencies on all PostgreSQL nodes.
  2. Start the first node.
  3. Stream and restore the backup on the first node.
  4. Configure and add the rest of the nodes.

A new PostgreSQL Replication Cluster will be listed under the ClusterControl cluster dashboard once the deployment job completes.


ClusterControl allows you to clone or copy a database cluster to multiple environments with just a few clicks. You can download it for free today. Happy cloning!

by ashraf at October 08, 2019 09:45 AM

October 07, 2019

MariaDB Foundation

MariaDB Server University Program

The demand for DBAs, developers and software engineers knowledgeable in MariaDB Server is high. The supply isn’t.
This is something we plan to fix with the MariaDB Server University Program, in which we are now inviting universities to participate, and users of MariaDB Server to sponsor. […]

The post MariaDB Server University Program appeared first on

by Kaj Arnö at October 07, 2019 01:16 PM

Press Release: MariaDB Server University Program Launch

Indonesia to Lead World Wide University Database Education Initiative 
Yogyakarta, Indonesia, 6 Sep 2019: MariaDB Foundation and APTISI (the Association of Private Higher Education Institutions Indonesia) collaborate to launch the MariaDB Server University Programme, providing free education material for universities across Indonesia and worldwide.  […]

The post Press Release: MariaDB Server University Program Launch appeared first on

by Kaj Arnö at October 07, 2019 01:14 PM


Tips for Storing PostgreSQL Backups on Google Cloud (GCP)

All companies nowadays have (or should have) a Disaster Recovery Plan (DRP) to prevent data loss in the case of failure, built according to an acceptable Recovery Point Objective (RPO) for the business.

A backup is a basic start in any DRP, but to guarantee backup usability a single backup is just not enough. The best practice is to store the backup files in three different places: one stored locally on the database server (for faster recovery), another one on a centralized backup server, and the last one in the cloud. For this last step, you should choose a stable and robust cloud provider to make sure your data is stored correctly and is accessible at any time.

In this blog, we will take a look at one of the most famous cloud providers, Google Cloud Platform (GCP), and how to use it to store your PostgreSQL backups in the cloud.

About Google Cloud

Google Cloud offers a wide range of products for your workload. Let’s look at some of them and how they are related to storing PostgreSQL backups in the cloud.

  • Cloud Storage: It allows for world-wide storage and retrieval of any amount of data at any time. You can use Cloud Storage for a range of scenarios including serving website content, storing data for archival and disaster recovery, or distributing large data objects to users via direct download.
  • Cloud SQL: It’s a fully managed database service that makes it easy to set up, maintain, manage, and administer your relational PostgreSQL, MySQL, and SQL Server databases in the cloud.
  • Compute Engine: It delivers virtual machines running in Google Cloud with support for scaling from single instances to global, load-balanced cloud computing. Compute Engine's VMs boot quickly, come with high-performance persistent and local disk options, and deliver consistent performance. 

Storing Backups on Google Cloud

If you’re running your PostgreSQL database on Google Cloud with Cloud SQL, you can back it up directly from the Google Cloud Platform; however, it’s not necessary to run it there to store your PostgreSQL backups.

Google Cloud Platform

Google Cloud Storage

Similar to the well-known Amazon S3 product, if you’re not running your PostgreSQL database with Cloud SQL, this is the most commonly used option to store backups or files in Google Cloud. It’s accessible from the Google Cloud Platform, in the Getting Started section or under the Storage left menu. With Cloud Storage, you can even easily transfer your S3 content here using the Transfer feature.

How to Use Google Cloud Storage

First, you need to create a new Bucket to store your data, so go to Google Cloud Platform -> Storage -> Create Bucket

Name Your Bucket - Google Cloud

In the first step, you just need to add a new bucket name.

Choose Where to Store Your Data - Google Cloud

In the next step, you can specify the location type (multi-region by default) and the location place.

Choose Storage Class - Google Cloud

Then, you can change the storage class from Standard (the default option) to Nearline or Coldline.

Access - Google Cloud

And then, you can change the access control.

Advanced Setting - Google Cloud

Finally, you have some optional settings like encryption or retention policy.

Now that you have your new bucket created, let's see how to use it.

Using the GSutil Tool

GSutil is a Python application that lets you access Cloud Storage from the command line. It allows you to perform different bucket and object management tasks. Let’s see how to install it on CentOS 7 and how to upload a backup using it.

Download Cloud SDK:

$ curl | bash

Restart your shell:

$ exec -l $SHELL

Run gcloud init and configure the tool:

$ gcloud init

This command will ask you to login to your Google Cloud account by accessing a URL and adding an authentication code.

Now that the tool is installed and configured, let’s upload a backup to the bucket.

First, let’s list the buckets we’ve created:

[root@PG1bkp ~]# gsutil ls


And to copy your PostgreSQL backup (or another file), run:

[root@PG1bkp ~]# gsutil cp /root/backups/BACKUP-3/base.tar.gz gs://pgbackups1/new_backup/

Copying file:///root/backups/BACKUP-3/base.tar.gz [Content-Type=application/x-tar]...

| [1 files][  4.9 MiB/ 4.9 MiB]

Operation completed over 1 objects/4.9 MiB.

The destination bucket must exist. 

And then, you can list the contents of the new_backup directory, to check the file uploaded:

[root@PG1bkp ~]# gsutil ls -r gs://pgbackups1/new_backup/*



For more information about the GSutil usage, you can check the official documentation.
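If you script these uploads, the same gsutil cp invocation can be assembled programmatically before handing it to the shell. A minimal Python sketch (gsutil_cp_command is a hypothetical helper; the bucket and paths are the ones from the example above):

```python
import shlex

def gsutil_cp_command(local_path, bucket, prefix=""):
    """Build the argv for copying a local backup into a Cloud Storage bucket.

    Hypothetical helper wrapping the `gsutil cp` call shown above, so it can
    be reused from backup scripts (e.g. via subprocess.run).
    """
    dest = f"gs://{bucket}/{prefix}"
    return ["gsutil", "cp", local_path, dest]

cmd = gsutil_cp_command("/root/backups/BACKUP-3/base.tar.gz",
                        "pgbackups1", "new_backup/")
print(" ".join(shlex.quote(part) for part in cmd))
# gsutil cp /root/backups/BACKUP-3/base.tar.gz gs://pgbackups1/new_backup/
```

Keeping the command as a list of arguments avoids shell-quoting surprises when backup file names contain spaces.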

Google Cloud SQL

If you want to centralize all the environment (database + backups) into Google Cloud, you have available this Cloud SQL product. In this way, you will have your PostgreSQL database running on Google Cloud and you can also manage the backups from the same platform. It’s accessible from the Google Cloud Platform, in the Getting started section or under the Storage left menu.

How to Use Google Cloud SQL

To create a new PostgreSQL instance, go to Google Cloud Platform -> SQL -> Create Instance

Google Cloud SQL - Create Instance

Here you can choose between MySQL and PostgreSQL as the database engine. For this blog, let’s create a PostgreSQL instance.

Google Cloud SQL - Instance Info

Now, you need to add an instance ID, password, location and PostgreSQL version (9.6 or 11).

Google Cloud SQL - Configuration Options

You also have some configuration options, such as enabling a public IP address, choosing the machine type and storage, and configuring backups. 

When the Cloud SQL instance is created, you can select it and you will see an overview of this new instance.

PostgreSQL on Google Cloud SQL

And you can go to the Backups section to manage your PostgreSQL backups. 

Google Cloud SQL Backups

To reduce storage costs, backups work incrementally. Each backup stores only the changes to your data since the previous backup.

Google Cloud Compute Engine

Similar to Amazon EC2, this way of storing information in the cloud is more expensive and time-consuming than Cloud Storage, but you will have full control over the backup storage environment. It’s also accessible from the Google Cloud Platform, in the Getting started section or under the Compute left menu.

How to Use a Google Cloud Compute Engine

To create a new virtual machine, go to Google Cloud Platform -> Compute Engine -> Create Instance

Google Cloud - Create Compute Instance

Here you need to add an instance name, region, and zone where to create it. Also, you need to specify the machine configuration according to your hardware and usage requirements, and the disk size and operating system to use for the new virtual machine. 

Google Cloud Compute Engine

When the instance is ready, you can store the backups here, for example, sending them via SSH or FTP using the external IP address. Let’s look at an example with rsync and another with the scp Linux command.

To connect via SSH to the new virtual machine, make sure you have added your SSH key in the virtual machine configuration.

[root@PG1bkp ~]# rsync -avzP -e "ssh -i /home/sinsausti/.ssh/id_rsa" /root/backups/BACKUP-3/base.tar.gz sinsausti@

sending incremental file list


      5,155,420 100%    1.86MB/s 0:00:02 (xfr#1, to-chk=0/1)

sent 4,719,597 bytes  received 35 bytes 629,284.27 bytes/sec

total size is 5,155,420  speedup is 1.09

[root@PG1bkp ~]#

[root@PG1bkp ~]# scp -i /home/sinsausti/.ssh/id_rsa /root/backups/BACKUP-5/base.tar.gz sinsausti@

base.tar.gz                                                                                                                                                             100% 2905KB 968.2KB/s 00:03

[root@PG1bkp ~]#

You can easily embed this into a script to perform an automatic backup process or use this product with an external system like ClusterControl to manage your backups.
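As a sketch of such a script, the rsync call above could be built in Python before being passed to subprocess.run (rsync_push_command is a hypothetical helper; the external IP 203.0.113.10 is a placeholder, since the real address was not shown):

```python
import shlex

def rsync_push_command(backup_path, user, host, ssh_key, dest_dir):
    """Hypothetical helper: builds the rsync invocation from the example
    above so a cron job can push each new backup to the Compute Engine VM."""
    ssh_opt = "ssh -i {}".format(ssh_key)
    return ["rsync", "-avzP", "-e", ssh_opt,
            backup_path, "{}@{}:{}".format(user, host, dest_dir)]

cmd = rsync_push_command("/root/backups/BACKUP-3/base.tar.gz",
                         "sinsausti", "203.0.113.10",
                         "/home/sinsausti/.ssh/id_rsa", "/home/sinsausti/backups/")
print(" ".join(shlex.quote(p) for p in cmd))
```

From here, `subprocess.run(cmd, check=True)` would execute the transfer and raise if rsync fails, which is the behavior you want in an unattended backup job.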

Managing Your Backups with ClusterControl

In the same way that you can centralize the management for both database and backup from the same platform by using Cloud SQL, you can use ClusterControl for several management tasks related to your PostgreSQL database.

ClusterControl is a comprehensive management system for open source databases that automates deployment and management functions, as well as health and performance monitoring. ClusterControl supports deployment, management, monitoring and scaling for different database technologies and environments. So, you can, for example, create your virtual machine instance on Google Cloud, and deploy/import your database service with ClusterControl.

ClusterControl PostgreSQL

Creating a Backup

For this task, go to ClusterControl -> Select Cluster -> Backup -> Create Backup.

ClusterControl - Create Backup

You can create a new backup or configure a scheduled one. For our example, we will create a single backup instantly.

ClusterControl - Choose Backup Method

You must choose one method, the server from which the backup will be taken, and where you want to store the backup. You can also upload your backup to the cloud (AWS, Google or Azure) by enabling the corresponding button.

ClusterControl - Backup Configuration

Then specify the use of compression, the compression level, encryption and retention period for your backup.

ClusterControl - Cloud Credentials for Backup

If you enabled the upload backup to the cloud option, you will see a section to specify the cloud provider (in this case Google Cloud) and the credentials (ClusterControl -> Integrations -> Cloud Providers). For Google Cloud, it uses Cloud Storage, so you must select a Bucket or even create a new one to store your backups.

ClusterControl Backup Management

In the backup section, you can see the progress of the backup, and information like method, size, location, and more.


Google Cloud may be a good option to store your PostgreSQL backups, and it offers different products for this purpose. It’s not, however, necessary to have your PostgreSQL databases running there, as you can use it solely as a storage location. 

The gsutil tool is a nice product for managing your Cloud Storage data from the command line; it’s easy to use and fast. 

You can also combine Google Cloud and ClusterControl to improve your PostgreSQL high availability environment and monitoring system. If you want to know more about PostgreSQL on Google Cloud you can check our deep dive blog post.

by Sebastian Insausti at October 07, 2019 09:45 AM

October 06, 2019

Valeriy Kravchuk

Dynamic Tracing of MySQL Server With perf probe - Basic Example

I am going to write a series of blog posts based on my talks and experiences at Percona Live Europe 2019. The first one would be a kind of extended comment for a couple of slides from the "Tracing and Profiling MySQL" talk.

We can surely wait until Performance Schema instruments every other line of code or at least every important stage and wait in every storage engine we care about, but there is no real need for that. If you run any version of MySQL under Linux with more or less recent kernel (anything newer than 4.1 is good enough, in general), you can easily use dynamic tracing for any application (at least if there is symbolic information for the binaries), any time. As Brendan Gregg put it here:
"One benefit of dynamic tracing is that it can be enabled on a live system without restarting anything. You can take an already-running kernel or application and then begin dynamic instrumentation, which (safely) patches instructions in memory to add instrumentation. That means there is zero overhead or tax for this feature until you begin using it. One moment your binary is running unmodified and at full speed, and the next, it's running some extra instrumentation instructions that you dynamically added. Those instructions should eventually be removed once you've finished using your session of dynamic tracing."
One of the ways to use dynamic tracing (and one that has been supported for a long time) is the perf profiler and its probe command. In the simplest case that I am going to illustrate here, a probe is defined for a function in the code and refers to it by name. You can refer to a local variable, function parameter, or local data structure member by name in the probe, and record their values along with other probe data.

For a simple example let me consider recent Percona Server 5.7.x running on recent Ubuntu 16.04 with kernel 4.4.x. Let's assume I want to trace all calls to the dispatch_command() function and record every query every connection processes that way.

Skipping the details for now, let's assume I've found out (with gdb in my case, but it can be code review as well) that when this function is called I can see the query user wants to execute in the com_data structure passed via a pointer to the function:
(gdb) p com_data->com_query.query
$4 = 0x7fb0dba8d021 "select 2"
Based on this information, and having the -dbg package also installed for Percona Server, I can add a probe dynamically any time using the following simple command (the --add option is assumed by default):
openxs@ao756:~$ sudo perf probe -x /usr/sbin/mysqld 'dispatch_command com_data->com_query.query:string'
Added new event:
  probe_mysqld:dispatch_command (on dispatch_command in /usr/sbin/mysqld with query=com_data->com_query.query:string)

You can now use it in all perf tools, such as:

        perf record -e probe_mysqld:dispatch_command -aR sleep 1
In this probe I refer to the specific binary with the -x option and full path name, and to the function in that binary by name, and I say that I'd like to record the value of com_data->com_query.query as a zero-terminated string. Now I can use any variation of the perf record command (with the -F option to define sampling frequency, the -g option to capture stack traces, etc., see more here) and my probe will be one of the events captured.

For this simple example of tracing I'll use the -e option to capture only the events related to the probe I defined. The probe name in this simple case by default consists of the binary name, a colon (':') separator, and the function name. I'll use the -R option to collect raw sample records. I've also added the -a option to collect samples on all CPUs. You can see the hint for a possible command in the output above. 

So, I can record related events with default frequency as follows:
openxs@ao756:~$ sudo perf record -e 'probe_mysqld:dispatch_command*' -aR
^C[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.676 MB (3 samples) ]
I let it work for some time in the foreground and then pressed Ctrl-C to stop collecting. Now I can check raw sample records with perf script command:
openxs@ao756:~$ sudo perf script >/tmp/queries.txt
openxs@ao756:~$ cat /tmp/queries.txt
          mysqld 31340 [001]  3888.849079: probe_mysqld:dispatch_command: (be9250) query="select 100"
          mysqld 31340 [001]  3891.648739: probe_mysqld:dispatch_command: (be9250) query="select user, host from mysql.user"
          mysqld 31340 [001]  3895.890141: probe_mysqld:dispatch_command: (be9250) query="select 2"
This is the detailed trace, with additional information (exact text of the query executed) added as requested. Output also included PID of the binary, CPU the sample was taken from and a timestamp.
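Once traces like these are collected, post-processing them is straightforward. A hedged Python sketch (parse_trace_line is a hypothetical helper; the regex assumes the exact `perf script` output layout shown above, which may vary between perf versions):

```python
import re

# Matches lines like:
#   mysqld 31340 [001]  3888.849079: probe_mysqld:dispatch_command: (be9250) query="select 100"
LINE_RE = re.compile(
    r'^\s*(?P<comm>\S+)\s+(?P<pid>\d+)\s+\[(?P<cpu>\d+)\]\s+'
    r'(?P<ts>[\d.]+):\s+(?P<event>\S+):\s+\(\w+\)\s+query="(?P<query>.*)"$'
)

def parse_trace_line(line):
    """Pull the pid, timestamp, and query text out of one perf script line."""
    m = LINE_RE.match(line)
    if not m:
        return None
    return int(m.group("pid")), float(m.group("ts")), m.group("query")

sample = '          mysqld 31340 [001]  3888.849079: probe_mysqld:dispatch_command: (be9250) query="select 100"'
print(parse_trace_line(sample))
# (31340, 3888.849079, 'select 100')
```

Feeding /tmp/queries.txt through this line by line would give you a machine-readable query log for further analysis.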

When I am done with tracing, I can delete the probe with --del option referring it by name:
openxs@ao756:~$ sudo perf probe --del dispatch_command
Removed event: probe_mysqld:dispatch_command
The (small, more on that later) overhead for tracing was added dynamically, only for the exact information I needed and only for the period of tracing. After the dynamic probe is removed we have exactly the same binary as originally started running with zero extra overhead. Now do this with Performance Schema :)

* * *
Slides are available at

More details on the way other tools mentioned during the talk can be used by MySQL DBAs are coming soon in this blog. Stay tuned!

by Valerii Kravchuk at October 06, 2019 05:59 PM

October 03, 2019


Tips for Storing MongoDB Backups in the Cloud

When it comes to backups and data archiving, IT departments are under pressure to meet stricter service level agreements, deliver more custom reports, and adhere to expanding compliance requirements while continuing to manage daily archive and backup tasks. Without a doubt, your database server stores some of your enterprise’s most valuable information. Guaranteeing reliable database backups to prevent data loss in the event of an accident or hardware failure is a critical checkbox.

But how do you make it truly disaster-ready when all of your data is in a single data center, or in data centers that are in nearby geolocations? Moreover, whether it is a 24x7 highly loaded server or a low-transaction-volume environment, you need to make backups a seamless procedure that doesn’t disrupt the performance of the server in a production environment.

In this blog, we are going to review MongoDB backup to the cloud. The cloud has changed the data backup industry. Because of its affordable price point, smaller businesses have an offsite solution that backs up all of their data.

We will show you how to perform safe MongoDB backups using MongoDB’s own tools, as well as other methods that you can use to extend your database disaster recovery options.

If your server or backup destination is located in an exposed infrastructure like a public cloud, hosting provider or connected through an untrusted WAN network, you need to think about additional actions in your backup policy. There are a few different ways to perform database backups for MongoDB, and depending on the type of backup, recovery time, size, and infrastructure options will vary. Since many of the cloud storage solutions are simply storage with different API front ends, any backup solution can be performed with a bit of scripting. So what are the options we have to make the process smooth and secure?

MongoDB Backup Encryption

Security should be at the center of everything IT teams do. It is always a good idea to enforce encryption to enhance the security of backup data. A simple use case to implement encryption is where you want to push the backup to offsite backup storage located in the public cloud.

When creating an encrypted backup, one thing to keep in mind is that it usually takes more time to recover. The backup has to be decrypted before any recovery activities. With a big dataset, this could introduce some delays to the RTO.

On the other hand, if you are using the private keys for encryption, make sure to store the key in a safe place. If the private key is missing, the backup will be useless and unrecoverable. If the key is stolen, all created backups that use the same key would be compromised as they are no longer secured. You can use popular GnuPG or OpenSSL to generate private or public keys.

To encrypt a mongodump backup using GnuPG, generate a private key and follow the wizard accordingly:

$ gpg --gen-key

Create a plain mongodump backup as usual:

$ mongodump --db db1 --gzip --archive=/tmp/db1.tar.gz
Encrypt the dump file and remove the older plain backup:
$ gpg --encrypt -r '' db1.tar.gz

$ rm -f db1.tar.gz
GnuPG will automatically append the .gpg extension to the encrypted file. To decrypt, simply run the gpg command with the --decrypt flag:

$ gpg --output db1.tar.gz --decrypt db1.tar.gz.gpg
To create an encrypted mongodump using OpenSSL, one has to generate a private key and a public key:
openssl req -x509 -nodes -newkey rsa:2048 -keyout dump.priv.pem -out

This private key (dump.priv.pem) must be kept in a safe place for future decryption. For Mongodump, an encrypted backup can be created by piping the content to openssl, for example

mongodump --db db1 --gzip --archive | openssl smime -encrypt -binary -text -aes256 -out database.sql.enc -outform DER
To decrypt, simply use the private key (dump.priv.pem) alongside the -decrypt flag:

openssl smime -decrypt -in database.sql.enc -binary -inform DER -inkey dump.priv.pem -out db1.tar.gz

MongoDB Backup Compression

Within the database cloud backup world, compression is one of your best friends. It can not only save storage space, but it can also significantly reduce the time required to download/upload data.

In addition to archiving, MongoDB supports compression using gzip. This is exposed through the “--gzip” command-line option in both mongodump and mongorestore. Compression works both for backups created using the directory and the archive mode, and reduces disk space usage.

Normally, a MongoDB dump can have the best compression rates as it is a flat text file. Depending on the compression tool and ratio, a compressed mongodump can be up to 6 times smaller than the original backup size. To compress the backup, you can pipe the mongodump output to a compression tool and redirect it to a destination file.

Having a compressed backup could save you up to 50% of the original backup size, depending on the dataset. 

mongodump --db country --gzip --archive=country.archive
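As a quick illustration of why dump-like text compresses so well, here is a minimal Python sketch using the stdlib gzip module (the sample data and the resulting ratio are purely illustrative; real ratios depend on your dataset):

```python
import gzip

# Repetitive, dump-like text compresses very well.
data = b"INSERT INTO users VALUES (1, 'alice', 'alice@example.com');\n" * 5000
compressed = gzip.compress(data)

ratio = len(data) / len(compressed)
print(f"original={len(data)}B compressed={len(compressed)}B ratio={ratio:.0f}x")
assert gzip.decompress(compressed) == data  # round-trip integrity check
```

The round-trip check at the end is worth keeping in any real backup script: a compressed backup that cannot be decompressed is no backup at all.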

Limiting Network Throughput

A great option for cloud backups is to limit the network streaming bandwidth (Mb/s) when making a backup. You can achieve that with the pv tool. The pv utility comes with the data modifier option -L RATE, --rate-limit RATE, which limits the transfer to a maximum of RATE bytes per second. The example below restricts it to 2 MB/s.

$ pv -q -L 2m
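The idea behind such rate limiting can be sketched in a few lines of Python (a simplified loop for illustration only; it is not a replacement for pv):

```python
import io
import time

def throttled_copy(src, dst, rate_limit_bps, chunk_size=64 * 1024):
    """Copy src to dst, sleeping as needed so throughput stays near rate_limit_bps."""
    start = time.monotonic()
    sent = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            return sent
        dst.write(chunk)
        sent += len(chunk)
        # Sleep until wall-clock time catches up with the target rate.
        target_elapsed = sent / rate_limit_bps
        elapsed = time.monotonic() - start
        if target_elapsed > elapsed:
            time.sleep(target_elapsed - elapsed)

src = io.BytesIO(b"x" * 300_000)
dst = io.BytesIO()
copied = throttled_copy(src, dst, rate_limit_bps=2 * 1024 * 1024)  # ~2 MB/s
print(copied)  # 300000
```

This keeps a backup transfer from saturating the production server's uplink, which is the same motivation as the pv -L option above.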

Transferring MongoDB Backups to the Cloud

Now that your backup is compressed and encrypted, it is ready for transfer.

Google Cloud

The gsutil command-line tool is used to manage, monitor and use your storage buckets on Google Cloud Storage. If you have already installed the gcloud utility, you already have gsutil installed. Otherwise, follow the instructions for your Linux distribution from here.

To install the gcloud CLI you can follow the below procedure:

curl | bash
Restart your shell:
exec -l $SHELL
Run gcloud init to initialize the gcloud environment:
gcloud init
With the gsutil command line tool installed and authenticated, create a regional storage bucket named mongodb-backups-storage in your current project.
gsutil mb -c regional -l europe-west1 gs://mongodb-backups-storage/

Creating gs://mongodb-backups-storage/...

Amazon S3

If you are not using RDS to host your databases, it is very probable that you are doing your own backups. Amazon’s AWS platform, S3 (Amazon Simple Storage Service) is a data storage service that can be used to store database backups or other business-critical files. Whether it’s an Amazon EC2 instance or your on-prem environment, you can use the service to secure your data.

While backups can be uploaded through the web interface, the dedicated s3 command line interface can be used to do it from the command line and through backup automation scripts. If backups are to be kept for a very long time, and recovery time isn’t a concern, backups can be transferred to Amazon Glacier service, providing much cheaper long-term storage. Files (amazon objects) are logically stored in a huge flat container named bucket. S3 presents a REST interface to its internals. You can use this API to perform CRUD operations on buckets and objects, as well as to change permissions and configurations on both.

The primary distribution method for the AWS CLI on Linux, Windows, and macOS is pip, a package manager for Python. Instructions can be found here.

aws s3 cp severalnines.sql s3://severalnine-sbucket/MongoDB_backups
By default, S3 provides eleven 9s of object durability. It means that if you store 10,000,000 objects in it, you can on average expect to lose a single object once every 10,000 years. The way S3 achieves that impressive number of 9s is by replicating the object automatically in multiple Availability Zones, which we’ll talk about in another post. Amazon has regional data centers all around the world.
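The arithmetic behind an eleven-9s durability figure can be checked directly. Assuming an annual durability of 99.999999999%, storing ten million objects works out to roughly one lost object every ten thousand years:

```python
annual_durability = 0.99999999999          # eleven 9s
annual_loss_probability = 1 - annual_durability

objects_stored = 10_000_000
expected_losses_per_year = objects_stored * annual_loss_probability
years_per_lost_object = 1 / expected_losses_per_year

print(round(years_per_lost_object))  # 10000
```

In other words, durability that high makes storage-layer loss a negligible risk compared to operational mistakes, which is why backup retention and testing still matter.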

Microsoft Azure Storage

Microsoft’s public cloud platform, Azure, has storage options with its command-line interface. Information can be found here. The open-source, cross-platform Azure CLI provides a set of commands for working with the Azure platform. It gives much of the functionality seen in the Azure portal, including rich data access.

The installation of the Azure CLI is fairly simple; you can find instructions here. Below you can find how to transfer your backup to Microsoft storage.

az storage blob upload --container-name severalnines --file severalnines.gz.tar --name severalnines_backup

Hybrid Storage for MongoDB Backups

With the growing public and private cloud storage industry, we have a new category called hybrid storage. This technology allows files to be stored locally, with changes automatically synced to a remote copy in the cloud. Such an approach comes from the need to have recent backups stored locally for fast restore (lower RTO), as well as to meet business continuity objectives.

The important aspect of efficient resource usage is to have separate backup retentions. Data that is stored locally, on redundant disk drives would be kept for a shorter period while cloud backup storage would be held for a longer time. Many times the requirement for longer backup retention comes from legal obligations for different industries (like telecoms having to store connection metadata).

Cloud providers like Google Cloud Services, Microsoft Azure and Amazon S3 each offer virtually unlimited storage, decreasing local space needs. It allows you to retain your backup files longer, for as long as you would like and not have concerns around local disk space.

ClusterControl Backup Management - Hybrid Storage

When scheduling a backup with ClusterControl, each backup method is configurable with a set of options on how you want the backup to be executed. The most important for hybrid cloud storage would be:

  • Network throttling
  • Encryption with the built-in key management
  • Compression
  • The retention period for the local backups
  • The retention period for the cloud backups
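The separate local and cloud retention periods can be sketched as a small Python helper (expired_backups is a hypothetical function; the periods and dates are illustrative):

```python
from datetime import date, timedelta

def expired_backups(backup_dates, today, local_days=7, cloud_days=365):
    """Return backups past local retention and past cloud retention, separately."""
    local_cutoff = today - timedelta(days=local_days)
    cloud_cutoff = today - timedelta(days=cloud_days)
    purge_local = [d for d in backup_dates if d < local_cutoff]
    purge_cloud = [d for d in backup_dates if d < cloud_cutoff]
    return purge_local, purge_cloud

today = date(2019, 10, 3)
backups = [date(2019, 10, 1), date(2019, 9, 20), date(2018, 9, 1)]
local, cloud = expired_backups(backups, today)
print(len(local), len(cloud))  # 2 1
```

A backup older than the local window is purged from disk but kept in the cloud; only backups older than the cloud window are deleted entirely.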
ClusterControl Backup and Restore

ClusterControl provides advanced backup features for the cloud: parallel compression, network bandwidth limiting, encryption, and more. Your company can take advantage of cloud scalability and pay-as-you-go pricing for growing storage needs. You can design a backup strategy to provide both local copies in the datacenter for immediate restoration, and a seamless gateway to cloud storage services from AWS, Google and Azure.

ClusterControl with upload to backup
ClusterControl Encryption

Advanced TLS and AES 256-bit encryption and compression features support secure backups that take up significantly less space in the cloud.

by Bart Oles at October 03, 2019 05:29 PM

October 02, 2019

Federico Razzoli

Case sensitivity in MySQL and MariaDB queries

Maybe you’re wondering why in MySQL/MariaDB 'string' seems to be the same as 'STRING'. Or maybe that’s not the case for you, but you would like to make a case insensitive search. This article explains how to write a case ...

by Federico Razzoli at October 02, 2019 11:30 AM


Creating a PostgreSQL Replication Setup on Debian / Ubuntu

PostgreSQL can work separately on multiple machines with the same data structure, making the persistence layer of the application more resilient and prepared for some unexpected event that might compromise the continuity of the service.

The idea behind this is to improve the system response time by distributing the requests in a “Round Robin” network where each node present is a cluster. In this type of setup it is not important which node the requests are delivered to for processing, as the response will always be the same.

In this blog, we will explain how to replicate a PostgreSQL cluster using the tools provided in the program installation. The version used is PostgreSQL 11.5, the current stable,  generally-available version for the operating system Debian Buster. For the examples in this blog it is assumed that you are already familiar with Linux.

PostgreSQL Programs

Inside the directory /usr/bin/ are the programs responsible for managing the cluster.

# 1. Lists the files contained in the directory
# 2. Filters the elements that contain 'pg_' in the name
ls /usr/bin/ | grep pg_

Activities conducted through these programs can be performed sequentially, or even in combination with other programs. Running a block of these activities through a single command is possible thanks to a Linux program found in the same directory, called make.

To list the clusters present, use the pg_lsclusters program. You can also use make to run it; make depends on a file named Makefile, which needs to be in the same directory where the command is run.

# 1. The current directory is checked
pwd

# 2. Creates a directory
mkdir ~/Documents/Severalnines/

# 3. Change to the chosen directory
cd ~/Documents/Severalnines/

# 4. Create the file Makefile
touch Makefile

# 5. Open the file for editing

The definition of a block is shown below, having as its name ls, and a single program to be run, pg_lsclusters.

# 1. Block name
ls:
# 2. Program to be executed
	pg_lsclusters

The file Makefile can contain multiple blocks, and each can run as many programs as you need, and even receive parameters. It is imperative that the lines belonging to an execution block are indented with tabs instead of spaces.

The use of make to run the pg_lsclusters program is accomplished by using the make ls command.

# 1. Executes pg_lsclusters
make ls

The result obtained in a recent PostgreSQL installation brings a single cluster called main, allocated on port 5432 of the operating system. When the pg_createcluster program is used, a new port is allocated to the new cluster created, having the value 5432 as the starting point, until another is found in ascending order.
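The port selection behavior described above can be illustrated with a tiny Python sketch (this mimics the described behavior; it is not pg_createcluster's actual implementation):

```python
def next_cluster_port(used_ports, start=5432):
    """Return the first free port, counting up from 5432."""
    port = start
    while port in used_ports:
        port += 1
    return port

print(next_cluster_port({5432}))        # 5433: 'main' already holds 5432
print(next_cluster_port({5432, 5433}))  # 5434
```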

Write Ahead Logging (WAL)

This replication procedure consists of making a backup of a working cluster which is continuing to receive updates. If this is done on the same machine, however, many of the benefits brought by this technique are lost.

Scaling a system horizontally ensures greater availability of the service, as if any hardware problems occur, it wouldn’t make much difference as there are other machines ready to take on the workload.

WAL is the term used for the complex internal algorithm in PostgreSQL that ensures the integrity of the transactions made on the system. However, only a single cluster must have the responsibility of accessing it with write permission.

The architecture now has three distinct types of clusters:

  1. A primary with responsibility for writing to WAL;
  2. A replica ready to take over the primary role;
  3. Miscellaneous other replicas with WAL reading duty.

Write operations are any activities that are intended to modify the data structure, either by entering new elements, or updating and deleting existing records.

PostgreSQL Cluster Configuration

Each cluster has two directories, one containing its configuration files and another with the transaction logs. These are located in /etc/postgresql/11/$(cluster) and /var/lib/postgresql/11/$(cluster), respectively (where $(cluster) is the name of the cluster).

The file postgresql.conf is created immediately after the cluster has been created by running the program pg_createcluster, and the properties can be modified for the customization of a cluster.

Editing this file directly is not recommended, because it contains almost all properties with their values commented out (the symbol # at the beginning of each line), along with several other commented lines containing instructions for changing the property values.

Adding another file containing the desired changes is possible; simply edit a single property named include, replacing the default value #include = '' with include = 'postgresql.replication.conf'.

Before you start the cluster, you need the presence of the file postgresql.replication.conf in the same directory where you find the original configuration file, called postgresql.conf.

# 1. Block name
create:
# 2. Creates the cluster
	pg_createcluster 11 $(cluster) -- --data-checksums
# 3. Copies the file to the directory
	cp postgresql.replication.conf /etc/postgresql/11/$(cluster)/
# 4. A value is assigned to the property
	sed -i "s|^#include = ''|include = 'postgresql.replication.conf'|g" /etc/postgresql/11/$(cluster)/postgresql.conf

The use of --data-checksums in the creation of the cluster adds a greater level of integrity to the data, costing a bit of performance but being very important in order to avoid corruption of the files when transferred from one cluster to another.

The procedures described above can be reused for other clusters, simply passing a value to $(cluster) as a parameter in the execution of the program make.

# 1. Executes the block 'create' by passing a parameter
sudo make create cluster=primary

Now that a brief automation of the tasks has been established, what remains to be done is the definition of the file postgresql.replication.conf according to the need for each cluster.

Replication on PostgreSQL

Two ways to replicate a cluster are possible: one involves the entire cluster (called Streaming Replication), and the other can be partial or complete (called Logical Replication).

The settings that must be specified for a cluster fall into four main categories:

  • Master Server
  • Standby Servers
  • Sending Servers
  • Subscribers

As we saw earlier, WAL is a file that contains the transactions that are made on the cluster, and the replication is the transmission of these files from one cluster to another.

Inside the settings present in the file postgresql.conf, we can see properties that define the behavior of the cluster in relation to the WAL files, such as the size of those files.

# default values
max_wal_size = 1GB
min_wal_size = 80MB

Another important property is max_wal_senders. Belonging to a cluster with the Sending Servers characteristic, it is the number of processes responsible for sending these files to other clusters, and it must always be set higher than the number of clusters that depend on receiving them.

WAL files can be retained for transmission to a cluster that connects late, or that has had problems receiving them and needs files older than its current position; the property wal_keep_segments specifies how many WAL file segments a cluster should keep for this purpose.

A Replication Slot is a feature that allows the cluster to store the WAL files needed to provide another cluster with all the records; the max_replication_slots option sets how many slots are allowed.

# default values
max_wal_senders = 10
wal_keep_segments = 0
max_replication_slots = 10
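Putting these properties together, a postgresql.replication.conf for a primary acting as a sending server for three replicas might look like this (the values are illustrative, not defaults):

```
# Hypothetical sending-server settings for a primary with 3 replicas
max_wal_senders = 5          # always more than the number of dependent clusters
wal_keep_segments = 64       # keep 64 x 16 MB segments for late-connecting replicas
max_replication_slots = 5
```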

When the intention is to outsource the storage of these WAL files, another method of processing these files can be used, called Continuous Archiving.

Continuous Archiving

This concept allows you to direct the WAL files to a specific location, using a Linux program, and two variables representing the path of the file, and its name, such as %p, and %f, respectively.

This property is disabled by default, but enabling it is straightforward and relieves the cluster of the responsibility of storing such important files; the settings can be added to the file postgresql.replication.conf.

# 1. Creates a directory
mkdir ~/Documents/Severalnines/Archiving

# 2. Implementation on postgresql.replication.conf
archive_mode = on
archive_command = 'cp %p ~/Documents/Severalnines/Archiving/%f'

# 3. Starts the cluster
sudo systemctl start postgresql@11-primary
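To see what the archive_command actually does, the %p/%f substitution can be simulated by hand outside PostgreSQL. All paths and the segment name below are made up for illustration only:

```shell
# Simulate archive_command = 'cp %p <archive_dir>/%f' manually.
# %p expands to the WAL segment's path, %f to its bare file name.
# The directories and segment name here are illustrative, not real cluster paths.
mkdir -p /tmp/pgwal_demo/pg_wal /tmp/pgwal_demo/archiving
echo "fake WAL contents" > /tmp/pgwal_demo/pg_wal/000000010000000000000001

# PostgreSQL would expand the command to something like:
cp /tmp/pgwal_demo/pg_wal/000000010000000000000001 \
   /tmp/pgwal_demo/archiving/000000010000000000000001

ls /tmp/pgwal_demo/archiving
```

On a live cluster, `SELECT pg_switch_wal();` forces the current segment to be closed and archived, which is a convenient way to test the command.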

After the cluster initialization, some properties may need to be modified, which can require a cluster restart. However, some properties can simply be reloaded, without the need for a full reboot of the cluster.

Information on such subjects can be obtained through the comments present in the file postgresql.conf, appearing as # (note: change requires restart).
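The same information can be queried from the pg_settings view on a running cluster: properties with context 'postmaster' require a restart, while those with context 'sighup' only need a reload:

```sql
-- 'postmaster' means a restart is required; 'sighup' means a reload suffices
SELECT name, context
FROM pg_settings
WHERE name IN ('archive_mode', 'archive_command');
```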

If a restart is required, a simple way to perform it is with the Linux program systemctl, used previously to start the cluster, this time with the restart option.

When a full reboot is not required, the cluster can reload its own properties through a query run against it. If multiple clusters are running on the same machine, however, a parameter must be passed containing the port on which the target cluster listens.

# Reload without restarting
sudo -H -u postgres psql -c 'SELECT pg_reload_conf();' -p 5433

In the example above, the property archive_mode requires a reboot, while archive_command does not. After this brief introduction to this subject, let’s look at how a replica cluster can backup these archived WAL files, using Point In Time Recovery (PITR).

PostgreSQL Replication Point-In-Time Recovery

This suggestive name allows a cluster to be restored to its state at a certain point in time. This is controlled by recovery target properties: recovery_target_time expects a value in timestamp format, such as 2019-08-22 12:05 GMT, while recovery_target_timeline can be assigned latest, indicating recovery up to the last existing record.
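Put together, a recovery.conf aimed at a specific point in time could look like the sketch below; the timestamp is hypothetical, and the archive path is the one used earlier in this post:

```
# Sketch of a recovery.conf for Point In Time Recovery (timestamp is hypothetical)
restore_command = 'cp ~/Documents/Severalnines/Archiving/%f %p'
recovery_target_time = '2019-08-22 12:05 GMT'

# or, to replay every available record instead:
# recovery_target_timeline = 'latest'
```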

When the program pg_basebackup runs, it copies the directory containing one cluster's data to another location. It accepts several parameters, one of them being -R, which creates a file named recovery.conf inside the copied directory (note that this is not the directory holding the other configuration files seen previously, such as postgresql.conf).

The file recovery.conf stores the parameters passed to pg_basebackup, and its existence is essential to the Streaming Replication implementation, because it is there that the reverse of the Continuous Archiving operation can be configured.

# 1. Block name
# 2. Removes the current data directory
rm -rf /var/lib/postgresql/11/$(replica)
# 3. Connects to primary cluster as user postgres
# 4. Copies the entire data directory
# 5. Creates the file recovery.conf
pg_basebackup -U postgres -d postgresql://localhost:$(primaryPort) -D /var/lib/postgresql/11/$(replica) -P -R
# 6. Inserts the restore_command property and its value
echo "restore_command = 'cp ~/Documents/Severalnines/Archiving/%f %p'" >> /var/lib/postgresql/11/$(replica)/recovery.conf
# 7. The same is done with recovery_target_timeline
echo "recovery_target_timeline = 'latest'" >> /var/lib/postgresql/11/$(replica)/recovery.conf

The replicate block above must be run by the operating system's postgres user, in order to avoid ownership conflicts over the cluster data between its owner, postgres, and the user root.

The replica cluster is now set up; all that remains is to start it for replication to begin. The replica's process, called pg_walreceiver, then interacts with the primary's process, called pg_walsender, over a TCP connection.

# 1. Executes the block ‘replicate’ by passing two parameters
sudo -H -u postgres make replicate replica=replica primaryPort=5433
# 2. Starts the cluster replica
sudo systemctl start postgresql@11-replica

The health of this replication model, called Streaming Replication, can be verified with a query run on the primary cluster.

# 1. Checks the Streaming Replication created
sudo -H -u postgres psql -x -c 'select * from pg_stat_replication;' -p 5433
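On the replica side, the complementary view pg_stat_wal_receiver reports the state of the pg_walreceiver process; this query must be run against a running replica:

```sql
-- Run on the replica: state of the WAL receiver and where it connects
SELECT status, sender_host, sender_port
FROM pg_stat_wal_receiver;
```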


In this blog, we showed how to set up asynchronous Streaming Replication between two PostgreSQL clusters. Remember, though, that vulnerabilities exist in the code above; for example, using the postgres user for such tasks is not recommended.

Cluster replication provides several benefits when used in the right way, and offers easy access for the APIs that interact with the clusters.