Planet MariaDB

October 17, 2019


Deploying MySQL Galera Cluster 4.0 onto Amazon AWS EC2

Galera Cluster is one of the most popular high availability solutions for MySQL. It is a virtually synchronous cluster, which helps keep replication lag under control. Thanks to flow control, a Galera cluster can throttle itself and allow the more heavily loaded nodes to catch up with the rest of the cluster. The recent release of Galera 4 brought new features and improvements. We covered them in a blog post about MariaDB 10.4 Galera Cluster and a blog post discussing existing and upcoming features of Galera 4.

How does Galera 4 fare when used in Amazon EC2? As you probably know, Amazon offers the Relational Database Service (RDS), which is designed to provide users with an easy way to deploy a highly available MySQL database. My colleague, Ashraf Sharif, compared failover times for RDS MySQL and RDS Aurora in his blog post. Failover times for Aurora look really great, but there are caveats. First of all, you are forced to use RDS; you cannot deploy Aurora on instances you manage. If the existing features and options available in Aurora are not enough for you, your only option is to deploy something on your own. Enter Galera. Galera, unlike Aurora, is not a proprietary black box. On the contrary, it is open source software which can be used freely in all supported environments. You can install Galera Cluster on AWS Elastic Compute Cloud (EC2) and, through that, build a highly available environment where failover is almost instant: as soon as you detect a node's failure, you can reconnect to another Galera node. How does one deploy Galera 4 in EC2? In this blog post we will take a look at it and provide you with a step-by-step guide showing the simplest way of accomplishing that.

Deploying a Galera 4 Cluster on EC2

The first step is to create an environment which we will use for our Galera cluster. We will go with Ubuntu 18.04 LTS virtual machines.


We will go with t2.medium instance size for the purpose of this blog post. You should scale your instances based on the expected load.


We are going to deploy three nodes in the cluster. Why three? We have a blog that explains how Galera maintains high availability.


We are going to configure storage for those instances.


We will also pick a proper security group for the nodes. Again, in our case the security group is quite open. You should ensure access is limited as much as possible - only nodes which have to access the databases should be allowed to connect to them.
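For Galera specifically, the security group has to allow not only the MySQL port but also the ports Galera uses for replication between the nodes. The defaults are:

```text
3306/tcp  MySQL client connections (and SST via mysqldump)
4567/tcp  Galera group communication (plus 4567/udp if multicast replication is used)
4568/tcp  Incremental State Transfer (IST)
4444/tcp  State Snapshot Transfer (SST)
```

Limiting these ports to the cluster nodes themselves is usually enough.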


Finally, we either pick an existing key pair or create a new one. After this step our three instances will be launched.


Once they are up, we can connect to them via SSH and start configuring the database.

We decided to go with a ‘node1, node2, node3’ naming convention, therefore we had to edit /etc/hosts on all nodes and list the nodes alongside their respective local IPs. We also made the change in /etc/hostname to use the new name on each node. When this is done, we can start setting up our Galera cluster. At the time of writing, the only vendor that provides a GA version of Galera 4 is MariaDB with its 10.4 release, therefore we are going to use MariaDB 10.4 for our cluster. We are going to proceed with the installation using the suggestions and guides from the MariaDB website.
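For example, the /etc/hosts entries may look as below; the private IPs are placeholders, so use the addresses actually assigned to your instances:

```text
# /etc/hosts (the same on all three nodes); IPs are placeholders
10.0.0.101 node1
10.0.0.102 node2
10.0.0.103 node3
```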

Deploying a MariaDB 10.4 Galera Cluster

We will start with preparing repositories:


root@node1:~# wget https://downloads.mariadb.com/MariaDB/mariadb_repo_setup

root@node1:~# bash ./mariadb_repo_setup

We downloaded the script which sets up the repositories and ran it to make sure everything was configured properly. This configured the repositories to use the latest MariaDB version which, at the time of writing, is 10.4.

root@node1:~# apt update

...

Reading package lists... Done

Building dependency tree

Reading state information... Done

4 packages can be upgraded. Run 'apt list --upgradable' to see them.

The repositories for MariaDB 10.4 and MaxScale 2.4 have now been configured. We can proceed and install MariaDB. We will do it step by step, node by node. MariaDB provides a guide on how to install and configure the cluster.

We need to install packages:

apt-get install mariadb-server mariadb-client galera-4 mariadb-backup

This command installs all the packages required for MariaDB 10.4 Galera to run. MariaDB creates a set of configuration files, and we will add a new one which will contain all the required settings. By default it is included at the end of the configuration, so all previous settings for the variables we set will be overwritten. Ideally, afterwards, you would edit the existing configuration files to remove the settings we put in galera.cnf, to avoid confusion about where a given setting is configured.

A minimal galera.cnf, following the MariaDB deployment guide, may look like below (the node address is a placeholder for the node's local IP):

root@node1:~# cat /etc/mysql/conf.d/galera.cnf
[mysqld]
binlog_format=ROW
default_storage_engine=InnoDB
innodb_autoinc_lock_mode=2
bind-address=0.0.0.0

# Galera cluster configuration
wsrep_on=ON
wsrep_provider=/usr/lib/galera/libgalera_smm.so
wsrep_cluster_address="gcomm://node1,node2,node3"
wsrep_cluster_name="Galera4 cluster"
wsrep_sst_method=mariabackup
wsrep_sst_auth=sstuser:pa55

# Cluster node configuration
wsrep_node_address="<this node's local IP>"
wsrep_node_name="node1"
When configuration is ready, we can start.

root@node1:~# galera_new_cluster

This should bootstrap the new cluster on the first node. Next we proceed with similar steps on the remaining nodes: install the required packages and configure them, keeping in mind that the local IP changes, so we have to adjust the galera.cnf file accordingly.
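For example, on node2 only the node-specific settings in galera.cnf differ (the address below is a placeholder for node2's local IP):

```ini
# Cluster node configuration on node2
wsrep_node_address="<node2 local IP>"
wsrep_node_name="node2"
```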

When the configuration files are ready, we have to create a user which will be used for the State Snapshot Transfer (SST):

MariaDB [(none)]> CREATE USER 'sstuser'@'localhost' IDENTIFIED BY 'pa55';

Query OK, 0 rows affected (0.022 sec)

MariaDB [(none)]> GRANT PROCESS, RELOAD, LOCK TABLES, REPLICATION CLIENT ON *.* TO 'sstuser'@'localhost';

Query OK, 0 rows affected (0.022 sec)

We should do that on the first node; the remaining nodes will join the cluster and receive a full state snapshot, so the user will be transferred to them. Now the only thing left to do is to start the remaining nodes:

root@node2:~# service mysql start

root@node3:~# service mysql start

and verify that cluster indeed has been formed:

MariaDB [(none)]> show global status like 'wsrep_cluster_size';
+--------------------+-------+
| Variable_name      | Value |
+--------------------+-------+
| wsrep_cluster_size | 3     |
+--------------------+-------+
1 row in set (0.001 sec)

All is good, the cluster is up and it consists of three Galera nodes. We managed to deploy MariaDB 10.4 Galera Cluster on Amazon EC2.


by krzysztof at October 17, 2019 05:41 PM

October 16, 2019


Announcing the Beta Launch of Backup Ninja


Severalnines is excited to launch our newest product, Backup Ninja. Currently in beta, Backup Ninja is a simple, secure, and cost-effective SaaS service you can use to back up the world’s most popular open source databases, locally or in the cloud. It easily connects to your database server through the “bartender” agent, allowing the service to manage the storage of fully-encrypted backups locally or on the cloud storage provider of your choosing.

Backup Ninja Dashboard

With Backup Ninja you can back up your databases locally, in the cloud, or to a combination of multiple locations to ensure there is always a good backup should disaster strike. It lets you go from homegrown, custom scripts that need upkeep to 'scriptless' peace of mind in minutes. It helps keep your data safe from data corruption on your production server, or from malicious ransomware attacks.

Backup Ninja Beta

Because we are still in the early phases, using Backup Ninja is free at this time. You will, however, be able to transfer it to a paid account once we’re ready to begin charging. This means you can use Backup Ninja to power your database backups at no charge and easily transition to a paid plan, with no obligation, if you choose.

How to Test Backup Ninja

As this is a new product you will undoubtedly encounter bugs in your travels. We encourage our first wave of users to “poke” the product and let us know how it performed.

Our main goal with this beta launch is to validate that we are able to provide a service that solves a real problem for users who currently maintain their own backup scripts and tools. 

Here are the key things we are hoping you will help us test…

  • Register for the service
  • Verify your account via link from the welcome email
  • Install our agent w/o issues
  • Add one or more DB servers that need to be backed up
  • Create a backup schedule which stores backups locally (on the server)
  • Create a backup schedule which stores backups locally and on your favorite cloud provider (multiple locations)
  • Be able to be up and running within 10 minutes of registering with our service

Other things to do...

  • Edit, start and resume backup configurations
  • Upgrade agents
  • Delete / uninstall agent(s)
  • Re-install agent(s)
  • Change / reset your password
  • Add servers 
  • Remove servers

Where to Report Your Findings

We have created a simple Google Form for you to log your issues which we will then transfer into our systems. We encourage you to share your test results, report any bugs, or just give us feedback on what we could do to make Backup Ninja even better!

The Next Steps

Severalnines is continuing to add new features, functions and databases to the platform. Coupled with your feedback, we plan to emerge from beta with a product that will allow you to build a quick and simple backup management plan to ensure you are protected should your databases become unavailable.

Join the Beta!


by fwlymburner at October 16, 2019 03:45 PM

Federico Razzoli

Foreign Key bugs in MySQL and MariaDB

Foreign keys are a controversial topic. The MySQL and MariaDB implementations have several bugs and limitations, which are discussed here.

by Federico Razzoli at October 16, 2019 11:17 AM

October 15, 2019


An Overview of Various Auxiliary Plan Nodes in PostgreSQL

All modern database systems support a Query Optimizer module to automatically identify the most efficient strategy for executing SQL queries. The efficient strategy is called a “Plan”, and it is measured in terms of cost, which is directly proportional to the query execution/response time. The plan is represented as a tree output by the Query Optimizer. The plan tree nodes can be broadly divided into the following 3 categories:

  • Scan Nodes: As explained in my previous blog “An Overview of the Various Scan Methods in PostgreSQL”, these indicate the way a base table's data is fetched.
  • Join Nodes: As explained in my previous blog “An Overview of the JOIN Methods in PostgreSQL”, these indicate how two tables are joined together to produce a combined result.
  • Materialization Nodes: Also called auxiliary nodes. While the previous two kinds of nodes deal with fetching data from a base table and joining data retrieved from two tables, the nodes in this category are applied on top of the retrieved data in order to further analyze or prepare it, e.g. sorting the data, aggregating data, etc.

Consider a simple query example such as...

SELECT * FROM TBL1, TBL2 WHERE TBL1.ID > TBL2.ID ORDER BY TBL2.ID;

Suppose the plan generated for this query is as below:
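A plan of that shape would look roughly like the following sketch (node types and layout are illustrative, not actual EXPLAIN output):

```text
Sort
  Sort Key: tbl2.id
  ->  Nested Loop
        Join Filter: (tbl1.id > tbl2.id)
        ->  Seq Scan on tbl1
        ->  Seq Scan on tbl2
```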

So here one auxiliary node “Sort” is added on top of the result of join to sort the data in the required order.

Some of the auxiliary nodes generated by the PostgreSQL query optimizer are as below:

  • Sort
  • Aggregate
  • Group By Aggregate
  • Limit
  • Unique
  • LockRows
  • SetOp

Let’s understand each one of these nodes.


Sort

As the name suggests, this node is added to the plan tree whenever sorted data is needed. Sorted data can be required explicitly or implicitly, as in the below two cases:

First, the user explicitly requires sorted data as output. In this case, the Sort node can sit on top of the whole data retrieval, including all other processing.

postgres=# CREATE TABLE demotable (num numeric, id int);


postgres=# INSERT INTO demotable SELECT random() * 1000, generate_series(1, 10000);

INSERT 0 10000

postgres=# analyze;


postgres=# explain select * from demotable order by num;

                           QUERY PLAN


 Sort  (cost=819.39..844.39 rows=10000 width=15)

   Sort Key: num

   ->  Seq Scan on demotable  (cost=0.00..155.00 rows=10000 width=15)

(3 rows)

Note: Even though the user requires the final output in sorted order, a Sort node may not be added to the final plan if there is an index on the corresponding table and sorting column. In this case, the planner may choose an index scan, which results in implicitly sorted data. For example, let’s create an index on the above example and see the result:

postgres=# CREATE INDEX demoidx ON demotable(num);


postgres=# explain select * from demotable order by num;

                                QUERY PLAN


 Index Scan using demoidx on demotable  (cost=0.29..534.28 rows=10000 width=15)

(1 row)

As explained in my previous blog An Overview of the JOIN Methods in PostgreSQL, Merge Join requires both tables' data to be sorted before joining. So it may happen that Merge Join is found to be cheaper than any other join method, even with the additional cost of sorting. In this case, a Sort node is added between the join and the scan of the table, so that sorted records are passed on to the join method.

postgres=# create table demo1(id int, id2 int);


postgres=# insert into demo1 values(generate_series(1,1000), generate_series(1,1000));

INSERT 0 1000

postgres=# create table demo2(id int, id2 int);


postgres=# create index demoidx2 on demo2(id);


postgres=# insert into demo2 values(generate_series(1,100000), generate_series(1,100000));

INSERT 0 100000

postgres=# analyze;


postgres=# explain select * from demo1, demo2 where demo1.id=demo2.id;
                                  QUERY PLAN
------------------------------------------------------------------------------------
 Merge Join  (cost=65.18..109.82 rows=1000 width=16)
   Merge Cond: (demo2.id = demo1.id)
   ->  Index Scan using demoidx2 on demo2  (cost=0.29..3050.29 rows=100000 width=8)
   ->  Sort  (cost=64.83..67.33 rows=1000 width=8)
         Sort Key: demo1.id
         ->  Seq Scan on demo1  (cost=0.00..15.00 rows=1000 width=8)
(6 rows)


Aggregate

An Aggregate node is added to the plan tree if an aggregate function is used to compute a single result from multiple input rows. Some commonly used aggregate functions are COUNT, SUM, AVG (average), MAX (maximum) and MIN (minimum).

An Aggregate node can sit on top of a base relation scan and/or on a join of relations. Example:

postgres=# explain select count(*) from demo1;

                       QUERY PLAN


 Aggregate  (cost=17.50..17.51 rows=1 width=8)

   ->  Seq Scan on demo1  (cost=0.00..15.00 rows=1000 width=0)

(2 rows)

postgres=# explain select sum(demo2.id) from demo1, demo2 where demo1.id=demo2.id;
                                       QUERY PLAN
---------------------------------------------------------------------------------------------
 Aggregate  (cost=112.32..112.33 rows=1 width=8)
   ->  Merge Join  (cost=65.18..109.82 rows=1000 width=4)
         Merge Cond: (demo2.id = demo1.id)
         ->  Index Only Scan using demoidx2 on demo2  (cost=0.29..3050.29 rows=100000 width=4)
         ->  Sort  (cost=64.83..67.33 rows=1000 width=4)
               Sort Key: demo1.id
               ->  Seq Scan on demo1  (cost=0.00..15.00 rows=1000 width=4)

HashAggregate / GroupAggregate

These kinds of nodes are extensions of the “Aggregate” node. If aggregate functions are used to combine multiple input rows per group, these nodes are added to the plan tree. So if the query uses any aggregate function along with a GROUP BY clause, either a HashAggregate or a GroupAggregate node will be added to the plan tree.

Since PostgreSQL uses a Cost Based Optimizer to generate an optimal plan tree, it is almost impossible to guess which of these nodes will be used. But let’s understand when and how each gets used.


HashAggregate

HashAggregate works by building a hash table of the data in order to group it. So HashAggregate may be used for group-level aggregation if the aggregation is happening on an unsorted data set.

postgres=# explain select count(*) from demo1 group by id2;

                       QUERY PLAN


 HashAggregate  (cost=20.00..30.00 rows=1000 width=12)

   Group Key: id2

   ->  Seq Scan on demo1  (cost=0.00..15.00 rows=1000 width=4)

(3 rows)

Here the demo1 table schema and data are as per the example shown in the previous section. Since there are only 1000 rows to group, the resources required to build a hash table are less than the cost of sorting, so the query planner chooses HashAggregate.


GroupAggregate

GroupAggregate works on sorted data, so it does not require any additional data structure. GroupAggregate may be used for group-level aggregation if the aggregation is on a sorted data set. To group sorted data, it can either sort explicitly (by adding a Sort node) or work on data fetched via an index, in which case the data is implicitly sorted.

postgres=# explain select count(*) from demo2 group by id2;

                            QUERY PLAN


 GroupAggregate  (cost=9747.82..11497.82 rows=100000 width=12)

   Group Key: id2

   ->  Sort  (cost=9747.82..9997.82 rows=100000 width=4)

      Sort Key: id2

      ->  Seq Scan on demo2  (cost=0.00..1443.00 rows=100000 width=4)

(5 rows) 

Here the demo2 table schema and data are as per the example shown in the previous section. Since there are 100,000 rows to group, the resources required to build a hash table might be costlier than the cost of sorting, so the query planner chooses GroupAggregate. Observe that the records selected from the “demo2” table are explicitly sorted, for which a Sort node is added to the plan tree.

See below another example, where the data is already retrieved sorted because of an index scan:

postgres=# create index idx1 on demo1(id);


postgres=# explain select sum(id2), id from demo1 where id=1 group by id;

                            QUERY PLAN


 GroupAggregate  (cost=0.28..8.31 rows=1 width=12)

   Group Key: id

   ->  Index Scan using idx1 on demo1  (cost=0.28..8.29 rows=1 width=8)

      Index Cond: (id = 1)

(4 rows) 

See below one more example. Even though it has an Index Scan, it still needs to sort explicitly, because the indexed column and the grouping column are not the same, so the data must still be sorted by the grouping column.

postgres=# explain select sum(id), id2 from demo1 where id=1 group by id2;

                               QUERY PLAN


 GroupAggregate  (cost=8.30..8.32 rows=1 width=12)

   Group Key: id2

   ->  Sort  (cost=8.30..8.31 rows=1 width=8)

      Sort Key: id2

      ->  Index Scan using idx1 on demo1  (cost=0.28..8.29 rows=1 width=8)

            Index Cond: (id = 1)

(6 rows)

Note: GroupAggregate/HashAggregate can be used for many other queries even if aggregation with GROUP BY is not present in the query; it depends on how the planner interprets the query. E.g., say we need to get the distinct values from a table; this can be seen as grouping by the corresponding column and then taking one value from each group.

postgres=# explain select distinct(id) from demo1;

                       QUERY PLAN


 HashAggregate  (cost=17.50..27.50 rows=1000 width=4)

   Group Key: id

   ->  Seq Scan on demo1  (cost=0.00..15.00 rows=1000 width=4)

(3 rows)

So here HashAggregate gets used even though there is no aggregation and group by involved.


Limit

A Limit node is added to the plan tree if the LIMIT/OFFSET clause is used in a SELECT query. This clause is used to limit the number of rows returned and optionally to provide an offset from which to start reading data. Examples below:

postgres=# explain select * from demo1 offset 10;

                       QUERY PLAN


 Limit  (cost=0.15..15.00 rows=990 width=8)

   ->  Seq Scan on demo1  (cost=0.00..15.00 rows=1000 width=8)

(2 rows)

postgres=# explain select * from demo1 limit 10;

                       QUERY PLAN


 Limit  (cost=0.00..0.15 rows=10 width=8)

   ->  Seq Scan on demo1  (cost=0.00..15.00 rows=1000 width=8)

(2 rows)

postgres=# explain select * from demo1 offset 5 limit 10;

                       QUERY PLAN


 Limit  (cost=0.07..0.22 rows=10 width=8)

   ->  Seq Scan on demo1  (cost=0.00..15.00 rows=1000 width=8)

(2 rows)


Unique

This node is selected in order to get the distinct values from the underlying result. Note that, depending on the query, selectivity and other resource info, the distinct values can also be retrieved using HashAggregate/GroupAggregate, without using a Unique node. Example:

postgres=# explain select distinct(id) from demo2 where id<100;

                                 QUERY PLAN


 Unique  (cost=0.29..10.27 rows=99 width=4)

   ->  Index Only Scan using demoidx2 on demo2  (cost=0.29..10.03 rows=99 width=4)

      Index Cond: (id < 100)

(3 rows)


LockRows

PostgreSQL provides functionality to lock all selected rows. Rows can be locked in “shared” mode or “exclusive” mode using the FOR SHARE and FOR UPDATE clauses respectively. A new node, LockRows, is added to the plan tree to achieve this.

postgres=# explain select * from demo1 for update;

                        QUERY PLAN


 LockRows  (cost=0.00..25.00 rows=1000 width=14)

   ->  Seq Scan on demo1  (cost=0.00..15.00 rows=1000 width=14)

(2 rows)

postgres=# explain select * from demo1 for share;

                        QUERY PLAN


 LockRows  (cost=0.00..25.00 rows=1000 width=14)

   ->  Seq Scan on demo1  (cost=0.00..15.00 rows=1000 width=14)

(2 rows)


SetOp

PostgreSQL provides functionality to combine the results of two or more queries. Just as a type of Join node is selected to join two tables, a type of SetOp node is selected to combine the results of two or more queries. For example, consider a table of employees with their id, name, age and salary, as below:

postgres=# create table emp(id int, name char(20), age int, salary int);


postgres=# insert into emp values(1,'a', 30,100);


postgres=# insert into emp values(2,'b', 31,90);


postgres=# insert into emp values(3,'c', 40,105);


postgres=# insert into emp values(4,'d', 20,80);


Now let’s get employees with age more than 25 years:

postgres=# select * from emp where age > 25;

 id |         name         | age | salary
----+----------------------+-----+--------
  1 | a                    |  30 |    100
  2 | b                    |  31 |     90
  3 | c                    |  40 |    105
(3 rows)

Now let’s get employees with salary more than 95:

postgres=# select * from emp where salary > 95;

 id |         name         | age | salary
----+----------------------+-----+--------
  1 | a                    |  30 |    100
  3 | c                    |  40 |    105
(2 rows)

Now, in order to get employees with age more than 25 years and salary more than 95, we can write the below INTERSECT query:

postgres=# explain select * from emp where age>25 intersect select * from emp where salary > 95;

                                QUERY PLAN


 HashSetOp Intersect  (cost=0.00..72.90 rows=185 width=40)

   ->  Append  (cost=0.00..64.44 rows=846 width=40)

      ->  Subquery Scan on "*SELECT* 1"  (cost=0.00..30.11 rows=423 width=40)

            ->  Seq Scan on emp  (cost=0.00..25.88 rows=423 width=36)

                  Filter: (age > 25)

      -> Subquery Scan on "*SELECT* 2"  (cost=0.00..30.11 rows=423 width=40)

            ->  Seq Scan on emp emp_1  (cost=0.00..25.88 rows=423 width=36)

                  Filter: (salary > 95)

(8 rows) 

So here, a new kind of node HashSetOp is added to evaluate the intersect of these two individual queries.

Note that two other new kinds of node are added here:


Append

This node is added to combine multiple result sets into one.

Subquery Scan

This node gets added to evaluate any subquery. In the above plan, the subquery is added to evaluate one additional constant column value which indicates which input set contributed a specific row.

HashSetOp works using a hash of the underlying result, but it is also possible for the query optimizer to generate a sort-based SetOp operation. The sort-based node is denoted as “SetOp”.
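As a sketch (assuming the emp table from above; the exact plan depends on data volume and PostgreSQL version), disabling hash aggregation should push the planner toward the sort-based variant:

```sql
-- Hypothetical session: discourage the hash-based strategy
SET enable_hashagg = off;
EXPLAIN SELECT * FROM emp WHERE age > 25
INTERSECT
SELECT * FROM emp WHERE salary > 95;
-- The plan should now show a "SetOp Intersect" node fed by sorted input
```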

Note: It is possible to achieve the same result as shown in the above result with a single query but here it is shown using intersect just for an easy demonstration.


Conclusion

All of these PostgreSQL nodes are useful and get selected based on the nature of the query, the data, etc. Many clauses map one-to-one to nodes. For some clauses there are multiple node options, which get decided based on cost calculations over the underlying data.


by Kumar Rajeev Rastogi at October 15, 2019 05:36 PM

October 14, 2019


MySQL Cloud Backup and Restore Scenarios Using Microsoft Azure

Backups are a very important part of your database operations, as your business must be secured when catastrophe strikes. When that time comes (and it will), your Recovery Point Objective (RPO) and Recovery Time Objective (RTO) should be predefined, as this is how fast you can recover from the incident which occurred. 

Most organizations vary their approach to backups, trying to have a combination of server image backups (snapshots), logical and physical backups. These backups are then stored in multiple locations, so as to avoid any local or regional disasters.  It also means that the data can be restored in the shortest amount of time, avoiding major downtime which can impact your company's business. 

Hosting your database with a cloud provider, such as Microsoft Azure (which we will discuss in this blog), is not an exception, you still need to prepare and define your disaster recovery policy.

Like other public cloud offerings, Microsoft Azure (Azure) offers an approach to backups that is practical, cost-effective, and designed to provide you with recovery options. Microsoft Azure backup solutions are easy to configure and operate using Azure Backup or the Recovery Services vault (if you are operating your database on virtual machines).

If you want a managed database in the cloud, Azure offers Azure Database for MySQL. This should be used only if you do not want to operate and manage the MySQL database yourself. This service offers a rich backup solution which allows you to create a backup of your database instance, either in the local region or in a geo-redundant location, which can be useful for data recovery. You may even be able to restore a node to a specific point in time, which is useful in achieving point-in-time recovery. This can be done with just one click.

In this blog, we will cover all of these backup and restore scenarios using a MySQL database on the Microsoft Azure cloud.

Performing Backups on a Virtual Machine on Azure

Unfortunately, Microsoft Azure does not offer a MySQL-specific backup solution (e.g. MySQL Enterprise Backup, Percona XtraBackup, or MariaDB's Mariabackup).
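You can still schedule your own logical backups on the VM. As a minimal sketch (the path and schedule are placeholders, and credentials are assumed to live in ~/.my.cnf):

```text
# Hypothetical crontab entry: nightly logical backup at 02:00
0 2 * * * mysqldump --single-transaction --all-databases | gzip > /var/backups/mysql/all-$(date +\%F).sql.gz
```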

Upon creation of your Virtual Machine (using the portal), you can set up a process to back up your VM using the Recovery Services vault. This will guard you against any incident, disaster, or catastrophe, and the data stored is encrypted by default. Encrypting the VM itself is optional and, though recommended by Azure, it comes with a price. You can take a look at the Azure Backup Pricing page for more details.

To create and setup a backup, go to the left panel and click All Resources → Compute → Virtual Machine. Now set the parameters required in the text fields. Once you are on that page, go to the Management tab and scroll down below. You'll be able to see how you can setup or create the backup. See the screenshot below:

Create a Virtual Machine - Azure

Then setup your backup policy based on your backup requirements. Just hit the Create New link in the Backup policy text field to create a new policy. See below:

Define Backup Policy - Azure

You can configure your backup policy with weekly, monthly, and yearly retention.

Once you have your backup configured, you can check that you have a backup enabled on that particular virtual machine you have just created. See the screenshot below:

Backup Settings - Azure

Restore and Recover Your Virtual Machine on Azure

Designing your recovery in Azure depends on the policies and requirements of your application. It also depends on whether the RTO and RPO must be low, or whether incidents and maintenance must be invisible to the user. You may set up your virtual machine in an availability set or in a different availability zone to achieve a higher recovery rate.

You may also set up disaster recovery for your VM, replicating your virtual machines to another Azure region for business continuity and disaster recovery needs. However, this might not be a good fit for your organization, as it comes with a high cost. If it is in place, Azure offers you an option to restore or create a virtual machine from the backup created.

For example, during the creation of your virtual machine, you can go to the Disks tab, then to Data Disks. You can create a new disk or attach an existing one based on an available snapshot. See the screenshot below, in which you can choose between a snapshot and a storage blob:

Create a New Disk - Azure

You may also restore to a specific point in time, just like in the screenshot below:

Set Restore Point - Azure

Restoring in Azure can be done in different ways, but it uses the same resources you have already created.

For example, if you have a snapshot or a disk image stored in an Azure Storage blob, you can use that resource when creating a new VM, as long as it's compatible and available. Additionally, you may even be able to do file recovery, aside from restoring a whole VM, just like in the screenshot below:

File Recovery - Azure

During File Recovery, you may be able to choose a specific recovery point, as well as download a script to browse and recover files. This is very helpful when you only need a specific file and not the whole system or disk volume.

Restoring from backup on an existing VM takes about three minutes, while restoring from backup to spawn a new VM takes about twelve minutes. This could, however, depend on the size of your VM and the network bandwidth available in Azure. The good thing is that, when restoring, it provides you with details of what has been completed and how much time is remaining. For example, see the screenshot below:

Recovery Job Status - Azure

Backups for Azure Database For MySQL

Azure Database for MySQL is a fully-managed database service by Microsoft Azure. This service offers a very flexible and convenient way to setup your backup and restore capabilities.

Upon creation of your MySQL server instance, you can set up backup retention and choose your backup redundancy option: either locally redundant (local region) or geo-redundant (a different region). Azure will provide you with the estimated cost you would be charged per month. See a sample screenshot below:

Pricing Calculator - Azure

Keep in mind that geo-redundant backup options are only available on the General Purpose and Memory Optimized compute tiers. They are not available on the Basic tier, but you can have redundancy within the local region (i.e. across the available availability zones).

Once you have a master set up, it's easy to create a replica by going to Azure Database for MySQL servers → Select your MySQL instance → Replication → and clicking Add Replica. Your replica can be used as the source or restore target when needed.

Keep in mind that in Azure, when you stop replication between the master and a replica, this is permanent and irreversible, as it makes the replica a standalone server. A replica created through Microsoft Azure is a managed instance, so you cannot stop and start the replication threads the way you would on a normal master-slave setup; you can do a restart, and that's all. If you created the replica manually, by either restoring from the master or from a backup (e.g. via point-in-time recovery), then you will be able to stop/start the replication threads or set up a slave lag if needed.

Restoring Your Azure Database For MySQL From A Backup

Restoring is very easy and quick using the Azure portal. You can just hit the restore button on your MySQL instance node and follow the UI as shown in the screenshot below:

Restoring Your Azure Database For MySQL From A Backup

Then you can select a point in time and create/spawn a new instance based on the captured backup:

Restore - Azure Database For MySQL

Once the node is available, it will not yet be a replica of the master. You need to set this up manually, which is easy using the stored procedures Azure provides:

CALL mysql.az_replication_change_master('<master_host>', '<master_user>', '<master_password>', 3306, '<master_log_file>', <master_log_pos>, '<master_ssl_ca>');


master_host: hostname of the master server

master_user: username for the master server

master_password: password for the master server

master_log_file: binary log file name from running show master status

master_log_pos: binary log position from running show master status

master_ssl_ca: CA certificate’s context. If not using SSL, pass in an empty string.

Then you can start the replication threads as follows,

CALL mysql.az_replication_start;

or you can stop the replication threads as follows,

CALL mysql.az_replication_stop;

or you can remove the master as,

CALL mysql.az_replication_remove_master;

or skip SQL thread errors as,

CALL mysql.az_replication_skip_counter;

As mentioned earlier, when a replica is created using Microsoft Azure's Add Replica feature under a MySQL instance, these specific stored procedures aren't available; only the mysql.az_replication_restart procedure is, since you are not allowed to stop or start the replication threads of an Azure-managed replica. So the example above applies to a node restored from a master backup: it holds a full copy of the master's data but acts as a standalone node and needs manual setup to become a replica of an existing master.

Additionally, a replica that you set up manually will not show up under Azure Database for MySQL servers → Select your MySQL instance → Replication, since you created or set up the replication yourself.

Alternative Cloud and Restore Backup Solutions

There are certain scenarios where you want full access when taking a full backup of your MySQL database in the cloud. To do this you can create your own script or use open-source technologies, which let you control how the data in your MySQL database is backed up and precisely how it is stored. 

You can also leverage Azure Command Line Interface (CLI) to create your custom automation. For example, you can create a snapshot using the following command with Azure CLI:

az snapshot create -g myResourceGroup --source "$osDiskId" --name osDisk-backup

or create your MySQL server replica with the following command:

az mysql server replica create --name mydemoreplicaserver --source-server mydemoserver --resource-group myresourcegroup

Alternatively, you can also use an enterprise tool that offers backup and restore options. Using open-source technologies or third-party tools requires the knowledge and skills to build your own implementation. Here's a list of tools you can use:

  • ClusterControl - While we may be a little biased, ClusterControl offers the ability to manage physical and logical backups of your MySQL database using battle-tested, open-source technologies (PXB, Mariabackup, and mydumper). It supports MySQL, Percona Server, MariaDB, and Galera-based databases. You can easily create your backup policy and store your database backups on any cloud (AWS, GCP, or Azure). Please note that the free version of ClusterControl does not include the backup features.
  • LVM Snapshots - You can use LVM to take a snapshot of your logical volume. This is only applicable to your VM since it requires access to block-level storage. Use this tool with caution, since it can render your database node unresponsive while the backup is running.
  • Percona XtraBackup (PXB) - An open-source technology from Percona. With PXB, you can create a physical backup copy of your MySQL database. You can also do a hot backup with PXB for the InnoDB storage engine, but it's recommended to run it on a slave or an otherwise non-busy MySQL server. This is only applicable to your VM instance since it requires binary or file access to the database server itself.
  • Mariabackup - Like PXB, it's an open-source technology, forked from PXB but maintained by MariaDB. In particular, if your database uses MariaDB, you should use Mariabackup in order to avoid incompatibility issues with tablespaces.
  • mydumper/myloader - These backup tools create logical backup copies of your MySQL database. You can use them with your Azure Database for MySQL, though I haven't tested how well they work for the backup and restore procedure.
  • mysqldump - It's a logical backup tool which is very useful when you need to back up and dump (or restore) a specific table or database to another instance. It is commonly used by DBAs, but you need to pay attention to your disk space, as logical backup copies are huge compared to physical backups.
  • MySQL Enterprise Backup - It delivers hot, online, non-blocking backups on multiple platforms including Linux, Windows, Mac & Solaris. It's not a free backup tool but offers a lot of features.
  • rsync - It's a fast and extraordinarily versatile file copying tool. It can copy locally, to/from another host over any remote shell, or to/from a remote rsync daemon. It offers a large number of options that control every aspect of its behavior and permit very flexible specification of the set of files to be copied. Mostly in Linux systems, rsync is installed as part of the OS package.

by Paul Namuag at October 14, 2019 09:45 AM

October 11, 2019


Securing MongoDB from External Injection Attacks

MongoDB security is not fully-guaranteed by simply configuring authentication certificates or encrypting the data. Some attackers will “go the extra mile” by playing with the received parameters in HTTP requests which are used as part of the database’s query process. 

SQL databases are the most vulnerable to this type of attack, but external injection is also possible in NoSQL DBMSs such as MongoDB. In most cases, external injections happen as a result of an unsafe concatenation of strings when creating queries.

What is an External Injection Attack?

Code injection is basically the integration of unvalidated data (an unmitigated vector) into a vulnerable program which, when executed, leads to disastrous access to your database, threatening its safety. 

When unsanitized variables are passed into a MongoDB query, they break the document query structure and are sometimes executed as JavaScript code themselves. This is often the case when passing props directly from the body-parser module in a Node.js server. An attacker can therefore easily insert a JS object where you’d expect a string or number, thereby getting unwanted results or manipulating your data. 
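To make this concrete, here is a minimal sketch (plain Node.js, no web framework involved, with a hypothetical payload) of how a parsed JSON body can carry an object where the code expects a number:

```javascript
// A JSON body parsed by body-parser is essentially JSON.parse output, so a
// field the code expects to be a number can silently arrive as an object.
const body = JSON.parse('{"age": {"$gt": 0}}');

console.log(typeof body.age); // 'object', not the 'number' the code expects
```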

Consider the data below in a student's collection.

{username:'John Doc', email:'', age:20},

{username:'Rafael Silver', email:'', age:30},

{username:'Kevin Smith', email:'', age:22},

{username:'Pauline Wagu', email:'', age:23}

Let’s say your program has to fetch all students whose age is equal to 20; you would write code like this:

app.get('/:age', function(req, res) {
    db.collection('students').find({age: req.params.age});
});


You will have submitted a JSON object in your HTTP request such as 

{age: 20}

This will return all students whose age equals 20 as the expected result; in this case, only {username:'John Doc', email:'', age:20}

Now let’s say an attacker submits an object instead of a number, i.e. {"$gt": 0}:

The resulting query will be:

db.collection('students').find({age: {'$gt': 0}}); which is a valid query that, upon execution, will return all students in that collection. The attacker then has a chance to act on your data according to their malicious intentions. In most cases, an attacker injects a custom object that contains MongoDB operators that enable them to access your documents without the proper procedure.
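To see why the injected object matches everything, here is a self-contained sketch that mimics the matching logic in plain JavaScript, with an in-memory array standing in for the collection (no real MongoDB driver is used, and the toy matcher covers only the operator in this example):

```javascript
// In-memory stand-in for the students collection from above.
const students = [
  {username: 'John Doc', age: 20},
  {username: 'Rafael Silver', age: 30},
  {username: 'Kevin Smith', age: 22},
  {username: 'Pauline Wagu', age: 23},
];

// Toy matcher: a plain value must be strictly equal; an object is treated as
// a set of operators, which is exactly what makes the injection possible.
function matches(value, condition) {
  if (condition !== null && typeof condition === 'object') {
    return Object.entries(condition).every(([op, operand]) => {
      if (op === '$gt') return value > operand;
      throw new Error('unsupported operator: ' + op);
    });
  }
  return value === condition;
}

const find = (filter) =>
  students.filter(doc => Object.entries(filter).every(([k, c]) => matches(doc[k], c)));

console.log(find({age: 20}).length);       // intended query: 1 match
console.log(find({age: {$gt: 0}}).length); // injected query: 4 matches
```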

Some MongoDB commands execute JavaScript code within the database engine, a potential risk for your data. Some of these commands are ‘$where’, ‘$group’ and ‘mapReduce’. For versions before MongoDB 2.4, JS code had access to the db object from within the query.

MongoDB Native Protections

MongoDB utilizes BSON (Binary JSON) for both its queries and documents, but in some instances it can accept unserialized JSON and JS expressions (such as the ones mentioned above). Most of the data passed to the server is in string format and can be fed directly into a MongoDB query. MongoDB does not parse incoming data as text, thereby avoiding the potential risks that come from integrating parameters directly. 

If an API involves encoding data in a formatted text and that text needs to be parsed, it has the potential of creating disagreement between the server’s caller and the database’s callee on how that string is going to be parsed. If the data is accidentally misinterpreted as metadata the scenario can potentially pose security threats to your data.

Examples of MongoDB External Injections and How to Handle Them

 Let’s consider the data below in a students collection.

{username:'John Doc', password: ‘16djfhg’, email:'', age:20},

{username:'Rafael Silver',password: ‘djh’, email:'', age:30},

{username:'Kevin Smith', password: ‘16dj’, email:'', age:22},

{username:'Pauline Wagu', password: ‘g6yj’, email:'', age:23}

Injection Using the $ne (not equal) Operator

If I want to return the document with the username and password supplied in a request, the code will be:

app.post('/students', function (req, res) {
    var query = {
        username: req.body.username,
        password: req.body.password
    };

    db.collection('students').findOne(query, function (err, student) {
        // ...
    });
});

If we receive the request below

POST https://localhost/students HTTP/1.1

Content-Type: application/json


{
    "username": {"$ne": null},
    "password": {"$ne": null}
}


The query will return the first student in this case, since his username and password are not null. This is not the expected result.

To solve this, you can use:

the mongo-sanitize module, which stops any key that starts with ‘$’ from being passed into the MongoDB query engine.

Install the module first:

npm install mongo-sanitize

var sanitize = require('mongo-sanitize');

var query = {
    username: sanitize(req.body.username),
    password: sanitize(req.body.password)
};

Alternatively, you can use mongoose to validate your schema fields, so that if a field expects a string and receives an object, the query will throw an error. In our case above, the null value will be converted into the string “”, which has no impact.
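The core idea behind mongo-sanitize can be sketched in a few lines: recursively strip any key that begins with ‘$’ before the value reaches the query engine. The following is a simplified reimplementation for illustration only; prefer the real module in production:

```javascript
// Simplified sketch of what mongo-sanitize does: drop '$'-prefixed keys
// recursively, so operator objects are defused before reaching the query.
function sanitize(value) {
  if (value !== null && typeof value === 'object') {
    for (const key of Object.keys(value)) {
      if (key.startsWith('$')) {
        delete value[key];
      } else {
        sanitize(value[key]);
      }
    }
  }
  return value;
}

const malicious = {username: {$ne: null}, password: {$ne: null}};
console.log(JSON.stringify(sanitize(malicious))); // {"username":{},"password":{}}
```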

Injection Using the $where Operator

This is one of the most dangerous operators. It allows a string to be evaluated inside the server itself. For example, to fetch students whose age is above a value Y, the query would be:

var query = {
    $where: "this.age > " + req.body.age
};

db.collection('students').findOne(query, function (err, student) {
    // ...
});

Using the sanitize module won’t help in this case: if the attacker submits ‘0; return true’, the resulting clause becomes ‘this.age > 0; return true’ and the query will return all students rather than those whose age is greater than the given value. Other possible payloads are ‘\’; return \’\’ == \’’ or ‘; return ‘’ == ‘’. These queries return all students rather than only those matching the clause.

The $where clause should be avoided whenever possible. Besides the setback outlined above, it also reduces performance because it cannot use indexes.

There is also the possibility of passing a function in the $where clause whose variables are not accessible in the MongoDB scope, which may result in your application crashing. For example:

var query = {
    $where: function() {
        return this.age > setValue // setValue is not defined in the MongoDB scope
    }
};

You can also use the $eq, $lt, $lte, $gt, $gte operators instead.
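Instead of concatenating user input into a $where string, the input can be validated and turned into an operator query. The helper below sketches this approach (buildAgeQuery is a hypothetical name, not part of any driver):

```javascript
// Cast the input to a number and build an operator query instead of $where,
// so no user-controlled string is ever evaluated as code.
function buildAgeQuery(input) {
  const age = Number(input);
  if (!Number.isFinite(age)) {
    throw new Error('age must be a number');
  }
  return {age: {$gt: age}};
}

console.log(JSON.stringify(buildAgeQuery('25'))); // {"age":{"$gt":25}}

try {
  buildAgeQuery('0; return true'); // the injection attempt from above
} catch (e) {
  console.log('rejected: ' + e.message);
}
```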

Protecting Yourself from MongoDB External Injection

Here are three things you can do to keep yourself protected...

  1. Validate user data.  Looking back at how the $where expression can be used to access your data, it is advisable to always validate what users send to your server.
  2. Use the JSON validator concept to validate your schema together with the mongoose module.
  3. Design your queries such that Js code does not have full access to your database code.


External injections are also possible in MongoDB. They are often associated with unvalidated user data getting into MongoDB queries. It is always important to detect and prevent NoSQL injection by testing any data that your server may receive. If neglected, this can threaten the safety of user data. The most important measure is to validate your data at all involved layers.

by Onyancha Brian Henry at October 11, 2019 09:45 AM

October 10, 2019


Using MySQL Galera Cluster Replication to Create a Geo-Distributed Cluster: Part Two

In the previous blog in the series we discussed the pros and cons of using Galera Cluster to create a geo-distributed cluster. In this post we will design a Galera-based geo-distributed cluster and show how you can deploy all the required pieces using ClusterControl.

Designing a Geo-Distributed Galera Cluster

We will start by explaining the environment we want to build. We will use three remote data centers, connected via a Wide Area Network (WAN). Each datacenter will receive writes from local application servers. Reads will also be local only. This is intended to avoid unnecessary traffic crossing the WAN. 

For this setup we assume the connectivity is in place and secured, but we won’t describe exactly how this can be achieved. There are numerous methods to secure the connectivity, from proprietary hardware and software solutions through OpenVPN to SSH tunnels. 

We will use ProxySQL as a load balancer. ProxySQL will be deployed locally in each datacenter and will route traffic only to the local nodes. Remote nodes can always be added manually, and we will explain cases where this might be a good solution. The application can be configured to connect to one of the local ProxySQL nodes using a round-robin algorithm. Alternatively, we can use Keepalived and a Virtual IP to route the traffic towards a single ProxySQL node, as long as that single node is able to handle all of the traffic. 

Another possible solution is to collocate ProxySQL with the application nodes and configure the application to connect to the proxy on localhost. This approach works quite well under the assumption that ProxySQL is unlikely to become unavailable while the application on the same node keeps working fine. Typically what we see is either a node failure or a network failure, which would affect both ProxySQL and the application at the same time.

Geo-Distributed MySQL Galera Cluster with ProxySQL

The diagram above shows the version of the environment where ProxySQL is collocated on the same node as the application. ProxySQL is configured to distribute the workload across all Galera nodes in the local datacenter. One of those nodes is picked as the node to send writes to, while SELECTs are distributed across all nodes. Having one dedicated writer node in a datacenter helps to reduce the number of possible certification conflicts, which typically leads to better performance. To reduce conflicts even further we would have to designate a single writer for the whole cluster and start sending write traffic over the WAN connection, which is not ideal as bandwidth utilization would significantly increase. Right now, with segments in place, only two copies of each writeset are sent across datacenters: one per remote DC.

The main concern with geo-distributed Galera Cluster deployments is latency. This is something you always have to test prior to launching the environment. Are you ok with the commit time? At every commit, certification has to happen, so writesets have to be sent to and certified on all nodes in the cluster, including remote ones. It may be that the high latency makes the setup unsuitable for your application. In that case you may find multiple Galera clusters connected via asynchronous replication more suitable. That would be a topic for another blog post though.

Deploying a Geo-Distributed Galera Cluster Using ClusterControl

To clarify things, we will show here what a deployment may look like. We won’t use an actual multi-DC setup; everything will be deployed in a local lab. We assume that the latency is acceptable and the whole setup is viable. What is great about ClusterControl is that it is infrastructure-agnostic. It doesn’t care if the nodes are close to each other, located in the same datacenter, or distributed across multiple cloud providers. As long as there is SSH connectivity from the ClusterControl instance to all of the nodes, the deployment process looks exactly the same. That’s why we can show it to you step by step using just a local lab.

Installing ClusterControl

First, you have to install ClusterControl. You can download it for free. After registering, you will reach a page with a guide to download and install ClusterControl. It is as simple as running a shell script. Once you have ClusterControl installed, you will be presented with a form to create an administrative user:

Installing ClusterControl

Once you fill it, you will be presented with a Welcome screen and access to deployment wizards:

ClusterControl Welcome Screen

We’ll go with deploy. This will open a deployment wizard:

ClusterControl Deployment Wizard

We will pick MySQL Galera. We have to provide SSH connectivity details; either the root user or a sudo user is supported. In the next step we define the servers in the cluster.

Deploy Database Cluster

We are going to deploy three nodes in one of the data centers. Then we will be able to extend the cluster, configuring new nodes in different segments. For now all we have to do is to click on “Deploy” and watch ClusterControl deploying the Galera cluster.

Cluster List - ClusterControl

Our first three nodes are up and running; we can now proceed to adding additional nodes in other datacenters.

Add a Database Node - ClusterControl

You can do that from the action menu, as shown on the screenshot above.

Add a Database Node - ClusterControl

Here we can add additional nodes, one at a time. Importantly, you should change the Galera segment to a non-zero value (0 is used for the initial three nodes).

After a while we end up with all nine nodes, distributed across three segments.

ClusterControl Geo-Distributed Database Nodes

Now, we have to deploy the proxy layer. We will use ProxySQL for that. You can deploy it in ClusterControl via Manage -> Load Balancer:

Add a Load Balancer - ClusterControl

This opens a deployment field:

Deploy Load Balancer - ClusterControl

First, we have to decide where to deploy ProxySQL. We will use existing Galera nodes, but you can type anything in the field, so it is perfectly possible to deploy ProxySQL on top of the application nodes. In addition, you have to provide access credentials for the administrative and monitoring users.

Deploy Load Balancer - ClusterControl

Then we have to either pick one of the existing users in MySQL or create one right now. We also want to ensure that ProxySQL is configured to use only the Galera nodes located in the same datacenter.

When you have one ProxySQL ready in the datacenter, you can use it as a source of the configuration:

Deploy ProxySQL - ClusterControl

This has to be repeated for every application server in all datacenters. Then the application has to be configured to connect to the local ProxySQL instance, ideally over a Unix socket, which gives the best performance and the lowest latency.

Reducing Latency - ClusterControl

After the last ProxySQL is deployed, our environment is ready. Application nodes connect to local ProxySQL. Each ProxySQL is configured to work with Galera nodes in the same datacenter:

ProxySQL Server Setup - ClusterControl


We hope this two-part series has helped you understand the strengths and weaknesses of geo-distributed Galera Clusters and how ClusterControl makes it easy to deploy and manage such a cluster.

by krzysztof at October 10, 2019 09:45 AM

October 09, 2019

Valeriy Kravchuk

Dynamic Tracing of MariaDB Server With bpftrace - Basic Example

Unlike the previous post, this one is not just a comment on some slides from the "Tracing and Profiling MySQL" talk at Percona Live Europe 2019. I am going to add the details that were missing there (as I was in a hurry and had forgotten to copy/paste proper outputs while testing). I am going to show how to add a dynamic probe with the "latest and greatest" bpftrace tool.

The goal is the same as before: try to add dynamic probe(s) to trace query execution. More specifically, to capture the text of the queries executed by clients of the MySQL server. As bpftrace requires a recent kernel and just does not work on Ubuntu 16.04, for the demonstration I use my Fedora 29 box with kernel 5.2.x and, for a change, get queries from Fedora's own MariaDB 10.3.17, installed there from rpm packages.

I have both bcc and bpftrace installed also from packages:
[openxs@fc29 ~]$ rpm -qa | grep bcc
[openxs@fc29 ~]$ rpm -qa | grep bpf
[openxs@fc29 ~]$
You can check the fine manual for the details, but even the basic -h option provides enough to get started, as long as you already know some terms and the probe syntax:
[openxs@fc29 ~]$ bpftrace -h
    bpftrace [options] filename
    bpftrace [options] -e 'program'

    -B MODE        output buffering mode ('line', 'full', or 'none')
    -d             debug info dry run
    -dd            verbose debug info dry run
    -e 'program'   execute this program
    -h             show this help message
    -l [search]    list probes
    -p PID         enable USDT probes on PID
    -c 'CMD'       run CMD and enable USDT probes on resulting process
    -v             verbose messages
    --version      bpftrace version

    BPFTRACE_STRLEN           [default: 64] bytes on BPF stack per str()
    BPFTRACE_NO_CPP_DEMANGLE  [default: 0] disable C++ symbol demangling

bpftrace -l '*sleep*'
    list probes containing "sleep"
bpftrace -e 'kprobe:do_nanosleep { printf("PID %d sleeping...\n", pid); }'
    trace processes calling sleep
bpftrace -e 'tracepoint:raw_syscalls:sys_enter { @[comm] = count(); }'
    count syscalls by process name
In our case we need to define a uprobe for the proper mysqld binary and trace the dispatch_command() function. Before we start, note that the parameters of dispatch_command() in MariaDB 10.3 are not the same as in the Percona Server 5.7 I used in the previous post. Basically, this function starts as follows in sql/
   1570 bool dispatch_command(enum enum_server_command command, THD *thd,
   1571                       char* packet, uint packet_length, bool is_com_multi,
   1572                       bool is_next_command)
Note the third argument, packet. If the first argument, command, is COM_QUERY, then packet contains the query text (as a zero-terminated string) for sure (it's also true for many other commands, but let me skip the details for now). That's why we'll use the third argument in our uprobe to capture the SQL text.

Now, let's start the service and check the exact full path name of the mysqld binary:
[openxs@fc29 ~]$ sudo service mariadb start
[sudo] password for openxs:
Redirecting to /bin/systemctl start mariadb.service
[openxs@fc29 ~]$ ps aux | grep mysqld
mysql     9109  6.2  1.2 1699252 104108 ?      Ssl  09:30   0:00 /usr/libexec/mysqld --basedir=/usr
openxs    9175  0.0  0.0 215744   892 pts/0    S+   09:30   0:00 grep --color=auto mysqld
The first naive attempt to add the probe, after a cursory reading of the documentation and checking a few examples, may look like this:
[openxs@fc29 ~]$ sudo bpftrace -e 'uprobe:/usr/libexec/mysqld:dispatch_command { printf("%s\n", str(arg2)); }'
Attaching 1 probe...
Could not resolve symbol: /usr/libexec/mysqld:dispatch_command
It seems that in my case, unlike with perf, bpftrace is not "aware" of C++ names or of the symbolic information in a separate -debuginfo package. So, I need the mangled name:
[openxs@fc29 ~]$ nm -na /usr/libexec/mysqld | grep dispatch_command
nm: /usr/libexec/mysqld: no symbols
[openxs@fc29 ~]$ nm -na /home/openxs/dbs/maria10.3/bin/mysqld | grep dispatch_command
00000000004a1eef t _Z16dispatch_command19enum_server_commandP3THDPcjbb.cold.344
00000000005c5190 T _Z16dispatch_command19enum_server_commandP3THDPcjbb
00000000005c5190 t _Z16dispatch_command19enum_server_commandP3THDPcjbb.localalias.256
Surely there are no symbols in the binary from the Fedora package, so I checked a binary (of the same version) that I had built myself (as usual) and assumed that neither the parameters nor the mangling approach could be different. So, the next attempt to add a dynamic probe looks as follows:
[openxs@fc29 ~]$ sudo bpftrace -e 'uprobe:/usr/libexec/mysqld:_Z16dispatch_command19enum_server_commandP3THDPcjbb { printf("%s\n", str(arg2)); }'
Attaching 1 probe...
show databases
show tables
select @@version_comment limit 1
select user, host from mysql.user
It worked, and above you see the output I got for the following session:
[openxs@fc29 ~]$ mysql -uroot test
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Welcome to the MariaDB monitor.  Commands end with ; or \g.
Your MariaDB connection id is 19
Server version: 10.3.17-MariaDB MariaDB Server

Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

MariaDB [test]> select user, host from mysql.user;
+------+-----------+
| user | host      |
+------+-----------+
| root | |
| root | ::1       |
| root | fc29      |
| root | localhost |
+------+-----------+
4 rows in set (0.000 sec)
You can see some SQL statements generated when the mysql command line client connects, as well as a packet value ("t1") from some command other than COM_QUERY, probably. My probe did not even try to check the other parameters besides the supposed query text.

Now, the probe is defined on uprobe:/usr/libexec/mysqld:_Z16dispatch_command19enum_server_commandP3THDPcjbb: I've just used the long, mangled version of the function name and the full path name to the binary, and defined a dynamic probe (uprobe). There is no filter, and the action for the probe is defined as { printf("%s\n", str(arg2)); }; that is, I print the third argument (they are numbered starting from zero: arg0, arg1, arg2, ...) as a zero-terminated string. Without the str() built-in function I'd get just a pointer that could be printed as an (unsigned) long integer, u64.

Basically, that's all. We have a quick and dirty way to capture all queries. No timing or anything, but it all depends on probe action that can use numerous built in variables and functions.

More "advanced" use of bpftrace, a lame attempt to capture time to execute query, may look like this:
[openxs@fc29 ~]$ sudo bpftrace -e 'uprobe:/usr/libexec/mysqld:_Z16dispatch_command19enum_server_commandP3THDPcjbb { @sql = str(arg2); @start[@sql] = nsecs; }
uretprobe:/usr/libexec/mysqld:_Z16dispatch_command19enum_server_commandP3THDPcjbb /@start[@sql] != 0/ { printf("%s : %u64 ms\n", @sql, (nsecs - @start[@sql])/1000000); } '

Attaching 2 probes...
select sleep(3) : 300064 ms
select sleep(1) : 100064 ms

@sql: select sleep(1)

@start[select sleep(3)]: 10666558704666
@start[select sleep(1)]: 10685614895675

[openxs@fc29 ~]$
In this case I try to store the probe start time in an associative array, with the query text as the "index" and the start time (in nanoseconds) as the value. Then I calculate the difference from the current nsecs value upon function return, in a separate uretprobe. I've used global variables: @sql for the query text and @start[] for the array. Based on the above, it even seems to work well for a single-threaded load. But as soon as I try to use multiple concurrent threads:
[openxs@fc29 ~]$ for i in `seq 1 4`; do mysql -uroot test -e"select sleep($i)" & done
it becomes clear that global variables are really global and my outputs are all wrong.

A bit better version may look like this:
[openxs@fc29 ~]$ sudo bpftrace -e 'uprobe:/usr/libexec/mysqld:_Z16dispatch_command19enum_server_commandP3THDPcjbb { @sql[tid] = str(arg2); @start[tid] = nsecs; }
uretprobe:/usr/libexec/mysqld:_Z16dispatch_command19enum_server_commandP3THDPcjbb /@start[tid] != 0/ { printf("%s : %u64 %u64 ms\n", @sql[tid], tid, (nsecs - @start[tid])/1000000); } '
Attaching 2 probes...
select sleep(1) : 1120764 100064 ms
 : 1120764 064 ms
select sleep(2) : 949064 200064 ms
 : 949064 064 ms
select sleep(3) : 1120864 300064 ms
 : 1120864 064 ms
select sleep(4) : 1120664 400064 ms
 : 1120664 064 ms
select sleep(1) : 1120764 100064 ms
 : 1120764 064 ms
select sleep(2) : 949064 200064 ms
 : 949064 064 ms
select sleep(3) : 1120664 300064 ms
 : 1120664 064 ms
select sleep(4) : 1120864 400064 ms
 : 1120864 064 ms


@start[11207]: 13609305005933
@start[9490]: 13610305621499
@start[11206]: 13611305753596
@start[11208]: 13612305235313

[openxs@fc29 ~]$
The output is for both sequential and concurrent execution of queries. I've used two associative arrays: @sql[] for queries and @start[] for start times, both indexed by tid, a built-in variable for the thread id, which should not change, at least until we use a pool of threads... You can also see that the tool by default outputs the content of all global associative arrays at the end, unless we free them explicitly.

* * *
This image of bpftrace internals is taken from Brendan Gregg's post
bpftrace commands may be much more complicated, and it may make sense to store them in a separate file. The tool is near ideal for "quick and dirty" tests, and one day I'll write a way more complete post with much better examples. But having a way to capture, filter and summarize queries in kernel space and send only the relevant results to user space, all in a safe manner, is really cool!

by Valerii Kravchuk ( at October 09, 2019 04:19 PM


Using MySQL Galera Cluster Replication to Create a Geo-Distributed Cluster: Part One

It is quite common to see databases distributed across multiple geographical locations. One scenario for doing this type of setup is disaster recovery, where your standby data center is located in a separate location from your main datacenter. It might also be required so that the databases are located closer to the users. 

The main challenge in achieving this setup is designing the database in a way that reduces the chance of issues related to network partitioning. One of the solutions might be to use Galera Cluster instead of regular asynchronous (or semi-synchronous) replication. In this blog we will discuss the pros and cons of this approach. This is the first part in a series of two blogs. In the second part we will design the geo-distributed Galera Cluster and see how ClusterControl can help us deploy such an environment.

Why Galera Cluster Instead of  Asynchronous Replication for Geo-Distributed Clusters?

Let’s consider the main differences between Galera and regular replication. Regular replication provides you with just one node to write to, which means that every write from a remote datacenter has to be sent over the Wide Area Network (WAN) to reach the master. It also means that all proxies located in a remote datacenter have to be able to monitor the whole topology, spanning all data centers involved, as they have to be able to tell which node is currently the master. 

This leads to a number of problems. First, multiple connections have to be established across the WAN, which adds latency and slows down any checks that the proxy may be running. In addition, it adds unnecessary overhead on the proxies and databases. Most of the time you are interested only in routing traffic to the local database nodes. The only exception is the master, and only because of this are proxies forced to watch the whole infrastructure rather than just the part located in the local datacenter. Of course, you can try to overcome this by using proxies to route only SELECTs, while using some other method (a dedicated hostname for the master, managed by DNS) to point the application to the master, but this adds unnecessary levels of complexity and moving parts, which could seriously impact your ability to handle multiple node and network failures without losing data consistency.

Galera Cluster, on the other hand, can support multiple writers. Latency is still a factor: all nodes in the Galera cluster have to coordinate and communicate to certify writesets, and if the latency is too high it may even be a reason not to use Galera at all. Latency is an issue in replication clusters too, but there it affects only writes from the remote data centers, while connections from the datacenter where the master is located benefit from low-latency commits.

With MySQL Replication you also have to keep the worst-case scenario in mind and ensure that the application is ok with delayed writes. The master can change at any time, so you cannot be sure you will always be writing to a local node.

Another difference between replication and Galera Cluster is the handling of replication lag. Geo-distributed clusters can be seriously affected by it: latency and the limited throughput of the WAN connection both impact the ability of a replicated cluster to keep up. Please keep in mind that replication generates one-to-all traffic.

Geo-Distributed Galera Cluster

All slaves have to receive the whole replication traffic - the amount of data you have to send to remote slaves over the WAN increases with every remote slave you add. This may easily saturate the WAN link, especially if you do plenty of modifications and the WAN link doesn’t have good throughput. As you can see on the diagram above, with three data centers and three nodes in each of them, the master has to send 6x the replication traffic over the WAN connection.
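The 6x figure above is easy to sanity-check with a bit of shell arithmetic (the counts below are just the example numbers from the diagram):

```shell
# Hypothetical sizing taken from the diagram: 3 datacenters, 3 nodes each.
N_DC=3
NODES_PER_DC=3
# With MySQL Replication, every remote slave needs its own copy of the stream.
wan_streams=$(( (N_DC - 1) * NODES_PER_DC ))
echo "WAN replication streams from the master: $wan_streams"
```

Adding a fourth datacenter of the same size would push this to 9 streams, which is why the WAN link tends to become the bottleneck first.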

With Galera Cluster things are slightly different. For starters, Galera uses flow control to keep the nodes in sync. If one of the nodes starts to lag behind, it can ask the rest of the cluster to slow down and let it catch up. Sure, this reduces the performance of the whole cluster, but it is still better than not being able to use slaves for SELECTs because they tend to lag from time to time - in such cases the results you get might be outdated and incorrect.

Geo-Distributed Galera Cluster

Another feature of Galera Cluster which can significantly improve its performance when used over a WAN is segments. By default Galera uses all-to-all communication and every writeset is sent by the node to all other nodes in the cluster. This behavior can be changed using segments. Segments allow users to split a Galera cluster into several parts. Each segment may contain multiple nodes and it elects one of them as a relay node. Such a node receives writesets from other segments and redistributes them across the Galera nodes local to the segment. As a result, as you can see on the diagram above, it is possible to reduce the replication traffic going over the WAN threefold - just two “replicas” of the replication stream are sent over the WAN: one per datacenter, compared to one per slave in MySQL Replication.
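Segments are assigned through the wsrep provider options; a minimal configuration sketch (the segment number 2 is an illustrative example - pick one number per datacenter and use it on every node there):

```ini
# my.cnf fragment on every node in the second datacenter (hypothetical example)
[mysqld]
wsrep_provider_options="gmcast.segment=2"
```

Nodes sharing a gmcast.segment value form one segment, so with three datacenters you would typically use segments 1, 2 and 3.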

Galera Cluster Network Partitioning Handling

Where Galera Cluster shines is in handling network partitioning. Galera Cluster constantly monitors the state of the nodes in the cluster. Every node attempts to connect with its peers and exchange the state of the cluster. If a subset of nodes is not reachable, Galera attempts to relay the communication, so if there is a way to reach those nodes, they will be reached.

Galera Cluster Network Partitioning Handling

An example can be seen on the diagram above: DC1 lost connectivity with DC2, but DC2 and DC3 can connect. In this case one of the nodes in DC3 will be used to relay data from DC1 to DC2, ensuring that the intra-cluster communication can be maintained.

Galera Cluster Network Partitioning Handling

Galera Cluster is able to take action based on the state of the cluster. It implements quorum - a majority of the nodes have to be available in order for the cluster to be able to operate. If a node gets disconnected from the cluster and cannot reach any other node, it will cease to operate.

As can be seen on the diagram above, there’s a partial loss of network communication in DC1 and the affected node is removed from the cluster, ensuring that the application will not access outdated data.

Galera Cluster Network Partitioning Handling

This is also true on a larger scale. Here DC1 got all of its communication cut off. As a result, the whole datacenter has been removed from the cluster and none of its nodes will serve traffic. The rest of the cluster maintained a majority (6 out of 9 nodes are available) and reconfigured itself to keep the connection between DC2 and DC3. In the diagram above we assumed the write hits a node in DC2, but please keep in mind that Galera is capable of running with multiple writers.
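The majority rule used in the example above (6 of 9 nodes) can be expressed in a couple of lines of shell:

```shell
total_nodes=9
reachable_nodes=6
# A partition stays operational (Primary) only with a strict majority of nodes.
if [ $(( 2 * reachable_nodes )) -gt "$total_nodes" ]; then
  state="Primary"
else
  state="Non-Primary"
fi
echo "Partition with $reachable_nodes of $total_nodes nodes: $state"
```

With an even split (for example 3 of 6 after losing one of two datacenters) neither side holds a strict majority, which is why odd node counts or an arbitrator are recommended.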

MySQL Replication does not have any kind of cluster awareness, making it problematic to handle network issues. It cannot shut itself down upon losing connection with other nodes. There is no easy way of preventing the old master from showing up after a network split.

The only possibilities are limited to the proxy layer or even higher. You have to design a system which tries to understand the state of the cluster and takes the necessary actions. One possible way is to use cluster-aware tools like Orchestrator and run scripts that check the state of the Orchestrator RAFT cluster and, based on this state, take the required actions on the database layer. This is far from ideal because any action taken on a layer higher than the database adds additional latency: it makes it possible for the issue to show up and compromise data consistency before the correct action can be taken. Galera, on the other hand, takes action at the database level, ensuring the fastest possible reaction.

by krzysztof at October 09, 2019 09:45 AM

October 08, 2019


How to Create a Clone of Your MySQL or PostgreSQL Database Cluster

If you are managing a production database, chances are high that you’ve had to clone your database to a server other than the production server. The basic method of creating a clone is to restore a database from a recent backup onto another database server. Another method is to replicate from a source database while it is still running, in which case it is important that the original database be unaffected by any cloning procedure.

Why Would You Need to Clone a Database?

A cloned database cluster is useful in a number of scenarios:

  • Troubleshoot your cloned production cluster in the safety of your test environment while performing destructive operations on the database.
  • Patch/upgrade test of a cloned database to validate the upgrade process before applying it to the production cluster.
  • Validate backup & recovery of a production cluster using a cloned cluster.
  • Validate or test new applications on a cloned production cluster before deploying them on the live production cluster.
  • Quickly clone the database for audit or information compliance requirements, for example at quarter or year end, when the content of the database must not be changed.
  • A reporting database can be created at intervals in order to avoid data changes during the report generations.
  • Migrate a database to new servers, new deployment environment or a new data center.

When running your database infrastructure in the cloud, the cost of owning a host (shared or dedicated virtual machine) is significantly lower compared to the traditional way of renting space in a datacenter or owning a physical server. Furthermore, most cloud deployments can be automated easily via provider APIs, client software and scripting. Therefore, cloning a cluster can be a common way to duplicate your deployment environment, for example from dev to staging to production or vice versa.

We haven't seen this feature offered by anyone in the market, so it is our privilege to showcase how it works with ClusterControl.

Cloning a MySQL Galera Cluster

One of the cool features in ClusterControl is that it allows you to quickly clone an existing MySQL Galera Cluster, so you have an exact copy of the dataset on the other cluster. ClusterControl performs the cloning operation online, without any locking or downtime for the existing cluster. It's like a cluster scale-out operation, except that both clusters are independent of each other after the syncing completes. The cloned cluster does not necessarily need to be the same size as the existing one. We could start with a one-node cluster and scale it out with more database nodes at a later stage.

In this example, we have a cluster called "Staging" that we want to clone as another cluster called "Production". The premise is that the staging cluster already stores the data that is going to be in production soon. The production cluster consists of another 3 nodes, with production specs.

The following diagram summarizes final architecture of what we want to achieve:

How to Clone Your Database - ClusterControl

The first thing to do is to set up a passwordless SSH from ClusterControl server to the production servers. On ClusterControl server run the following:

$ whoami


$ ssh-copy-id root@prod1.local

$ ssh-copy-id root@prod2.local

$ ssh-copy-id root@prod3.local

Enter the root password of the target server if prompted.

From ClusterControl database cluster list, click on the Cluster Action button and choose Clone Cluster. The following wizard will appear:

Clone Cluster - ClusterControl

Specify the IP addresses or hostnames of the new cluster and make sure you see the green tick icon next to every specified host. The green icon means ClusterControl is able to connect to the host via passwordless SSH. Click on the "Clone Cluster" button to start the deployment.

The deployment steps are:

  1. Create a new cluster consisting of one node.
  2. Sync the new one-node cluster via SST. The donor is one of the source servers.
  3. The remaining new nodes join the cluster after the donor of the cloned cluster has synced.

A new MySQL Galera Cluster will be listed under the ClusterControl cluster dashboard once the deployment job completes.

Note that cluster cloning only clones the database servers, not the whole stack of the cluster. This means that other supporting components related to the cluster, like load balancers, virtual IP addresses, the Galera arbitrator or asynchronous slaves, are not cloned by ClusterControl. Nevertheless, if you would like an exact copy of your existing database infrastructure, you can achieve that with ClusterControl by deploying those components separately after the database cloning operation completes.

Creating a Database Cluster from a Backup

Another similar feature offered by ClusterControl is "Create Cluster from Backup". This feature was introduced in ClusterControl 1.7.1, specifically for Galera Cluster and PostgreSQL clusters, where one can create a new cluster from an existing backup. Contrary to cluster cloning, this operation does not put additional load on the source cluster, with the tradeoff that the cloned cluster will not be at the same state as the source cluster.

In order to create a cluster from a backup, you must have a working backup. For Galera Cluster, all backup methods are supported, while for PostgreSQL all methods except pgbackrest are supported for new cluster deployment. From ClusterControl, a backup can be created or scheduled easily under ClusterControl -> Backups -> Create Backup. From the list of created backups, click on Restore Backup, choose the backup from the list and choose "Create Cluster from Backup" from the restoration options:

Restore Backup with ClusterControl

In this example, we are going to deploy a new PostgreSQL Streaming Replication cluster for staging environment, based on the existing backup we have in the production cluster. The following diagram illustrates the final architecture:

Database Backup Restoration with ClusterControl

The first thing to do is to set up a passwordless SSH from ClusterControl server to the production servers. On ClusterControl server run the following:

$ whoami


$ ssh-copy-id root@prod1.local

$ ssh-copy-id root@prod2.local

$ ssh-copy-id root@prod3.local

When you choose Create Cluster From Backup, ClusterControl will open a deployment wizard dialog to assist you on setting up the new cluster:

Create Cluster from Backup - ClusterControl

A new PostgreSQL Streaming Replication instance will be created from the selected backup, which will be used as the base dataset for the new cluster. The selected backup must be accessible from the nodes in the new cluster, or stored on the ClusterControl host.

Clicking on "Continue" will open the standard database cluster deployment wizard:

Create Database Cluster from Backup - ClusterControl

Note that the root/admin user password for this cluster must be the same as the PostgreSQL admin/root password included in the backup. Follow the configuration wizard accordingly and ClusterControl will then perform the deployment in the following order:

  1. Install the necessary software and dependencies on all PostgreSQL nodes.
  2. Start the first node.
  3. Stream and restore backup on the first node.
  4. Configure and add the rest of the nodes.

A new PostgreSQL Replication Cluster will be listed under the ClusterControl cluster dashboard once the deployment job completes.


ClusterControl allows you to clone or copy a database cluster to multiple environments with just a few clicks. You can download it for free today. Happy cloning!

by ashraf at October 08, 2019 09:45 AM

October 07, 2019

MariaDB Foundation

MariaDB Server University Program

The demand for DBAs, developers and software engineers knowledgeable in MariaDB Server is high. The supply isn’t. This is something we plan to fix with the MariaDB Server University Program, which we are now inviting universities to participate in, and users of MariaDB Server to sponsor. Mind the Gap Closing the gap between supply […]

The post MariaDB Server University Program appeared first on

by Kaj Arnö at October 07, 2019 01:16 PM

Press Release: MariaDB Server University Program Launch

Indonesia to Lead World Wide University Database Education Initiative  Yogyakarta, Indonesia, 6 Sep 2019: MariaDB Foundation and APTISI (the Association of Private Higher Education Institutions Indonesia) collaborate to launch the MariaDB Server University Programme, providing free education material for universities across Indonesia and worldwide.  The need for database education in Indonesia With millions of university […]

The post Press Release: MariaDB Server University Program Launch appeared first on

by Kaj Arnö at October 07, 2019 01:14 PM


Tips for Storing PostgreSQL Backups on Google Cloud (GCP)

All companies nowadays have (or should have) a Disaster Recovery Plan (DRP) to prevent data loss in the case of failure, built according to an acceptable Recovery Point Objective (RPO) for the business.

A backup is a basic element of any DRP, but to guarantee backup usability a single backup is just not enough. The best practice is to store the backup files in three different places: one stored locally on the database server (for faster recovery), another one on a centralized backup server, and the last one in the cloud. For this last step, you should choose a stable and robust cloud provider to make sure your data is stored correctly and is accessible at any time.

In this blog, we will take a look at one of the most famous cloud providers, Google Cloud Platform (GCP) and how to use it to store your PostgreSQL backups in the cloud.

About Google Cloud

Google Cloud offers a wide range of products for your workload. Let’s look at some of them and how they are related to storing PostgreSQL backups in the cloud.

  • Cloud Storage: It allows for world-wide storage and retrieval of any amount of data at any time. You can use Cloud Storage for a range of scenarios including serving website content, storing data for archival and disaster recovery, or distributing large data objects to users via direct download.
  • Cloud SQL: It’s a fully managed database service that makes it easy to set up, maintain, manage, and administer your relational PostgreSQL, MySQL, and SQL Server databases in the cloud.
  • Compute Engine: It delivers virtual machines running in Google Cloud with support to scaling from single instances to global, load-balanced cloud computing. Compute Engine's VMs boot quickly, come with high-performance persistent and local disk options, and deliver consistent performance. 

Storing Backups on Google Cloud

If you’re running your PostgreSQL database on Google Cloud with Cloud SQL you can back it up directly from the Google Cloud Platform; however, your database doesn’t need to run there for you to store your PostgreSQL backups in Google Cloud.

Google Cloud Platform

Google Cloud Storage

Similar to the well-known Amazon S3 product, if you’re not running your PostgreSQL database with Cloud SQL, this is the most commonly used option to store backups or files in Google Cloud. It’s accessible from the Google Cloud Platform, in the Getting Started section or under the Storage left menu. With Cloud Storage, you can even easily transfer your S3 content here using the Transfer feature.

How to Use Google Cloud Storage

First, you need to create a new Bucket to store your data, so go to Google Cloud Platform -> Storage -> Create Bucket

Name Your Bucket - Google Cloud

In the first step, you just need to add a new bucket name.

Choose Where to Store Your Data - Google Cloud

In the next step, you can specify the location type (multi-region by default) and the location place.

Choose Storage Class - Google Cloud

Then, you can change the storage class from standard (default option) to nearline or coldline.

Access - Google Cloud

And then, you can change the control access.

Advanced Setting - Google Cloud

Finally, you have some optional settings like encryption or retention policy.

Now that you have your new bucket created, let's see how to use it.

Using the GSutil Tool

GSutil is a Python application that lets you access Cloud Storage from the command line. It allows you to perform different bucket and object management tasks. Let’s see how to install it on CentOS 7 and how to upload a backup using it.

Download Cloud SDK:

$ curl | bash

Restart your shell:

$ exec -l $SHELL

Run gcloud init and configure the tool:

$ gcloud init

This command will ask you to login to your Google Cloud account by accessing a URL and adding an authentication code.

Now that you have the tool installed and configured, let’s upload a backup to the bucket.

First, let’s check the buckets we have created:

[root@PG1bkp ~]# gsutil ls


And to copy your PostgreSQL backup (or another file), run:

[root@PG1bkp ~]# gsutil cp /root/backups/BACKUP-3/base.tar.gz gs://pgbackups1/new_backup/

Copying file:///root/backups/BACKUP-3/base.tar.gz [Content-Type=application/x-tar]...

| [1 files][  4.9 MiB/ 4.9 MiB]

Operation completed over 1 objects/4.9 MiB.

The destination bucket must exist. 

Then you can list the contents of the new_backup directory to check the uploaded file:

[root@PG1bkp ~]# gsutil ls -r gs://pgbackups1/new_backup/*



For more information about the GSutil usage, you can check the official documentation.
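Because gsutil is a plain command-line tool, uploads like the one above are easy to automate; a hypothetical crontab entry reusing the bucket and path from the example (the schedule and paths are illustrative only):

```
# upload the latest base backup to Cloud Storage every night at 02:00
0 2 * * * gsutil cp /root/backups/BACKUP-3/base.tar.gz gs://pgbackups1/new_backup/
```

In practice you would point the source path at your backup tool's latest output rather than a fixed directory.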

Google Cloud SQL

If you want to centralize your whole environment (database + backups) in Google Cloud, the Cloud SQL product is available for this. This way, you will have your PostgreSQL database running on Google Cloud and you can also manage the backups from the same platform. It’s accessible from the Google Cloud Platform, in the Getting started section or under the Storage left menu.

How to Use Google Cloud SQL

To create a new PostgreSQL instance, go to Google Cloud Platform -> SQL -> Create Instance

Google Cloud SQL - Create Instance

Here you can choose between MySQL and PostgreSQL as the database engine. For this blog, let’s create a PostgreSQL instance.

Google Cloud SQL - Instance Info

Now, you need to add an instance ID, password, location and PostgreSQL version (9.6 or 11).

Google Cloud SQL - Configuration Options

You also have some configuration options, like enabling a public IP address, choosing the machine type and storage, configuring backups, etc.

When the Cloud SQL instance is created, you can select it and you will see an overview of this new instance.

PostgreSQL on Google Cloud SQL

And you can go to the Backups section to manage your PostgreSQL backups. 

Google Cloud SQL Backups

To reduce storage costs, backups work incrementally. Each backup stores only the changes to your data since the previous backup.

Google Cloud Compute Engine

Similar to Amazon EC2, this way of storing information in the cloud is more expensive and time-consuming than Cloud Storage, but you will have full control over the backup storage environment. It’s also accessible from the Google Cloud Platform, in the Getting started section or under the Compute left menu.

How to Use a Google Cloud Compute Engine

To create a new virtual machine, go to Google Cloud Platform -> Compute Engine -> Create Instance

Google Cloud - Create Compute Instance

Here you need to add an instance name, region, and zone where to create it. Also, you need to specify the machine configuration according to your hardware and usage requirements, and the disk size and operating system to use for the new virtual machine. 

Google Cloud Compute Engine

When the instance is ready, you can store the backups there, for example by sending them via SSH or FTP using the external IP address. Let’s look at an example with rsync and another one with the scp Linux command.

To connect via SSH to the new virtual machine, make sure you have added your SSH key in the virtual machine configuration.

[root@PG1bkp ~]# rsync -avzP -e "ssh -i /home/sinsausti/.ssh/id_rsa" /root/backups/BACKUP-3/base.tar.gz sinsausti@

sending incremental file list


      5,155,420 100%    1.86MB/s 0:00:02 (xfr#1, to-chk=0/1)

sent 4,719,597 bytes  received 35 bytes 629,284.27 bytes/sec

total size is 5,155,420  speedup is 1.09

[root@PG1bkp ~]#

[root@PG1bkp ~]# scp -i /home/sinsausti/.ssh/id_rsa /root/backups/BACKUP-5/base.tar.gz sinsausti@

base.tar.gz                                                                                                                                                             100% 2905KB 968.2KB/s 00:03

[root@PG1bkp ~]#

You can easily embed this into a script to perform an automatic backup process or use this product with an external system like ClusterControl to manage your backups.
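Embedding this into a script can be as simple as a cron job around the rsync command shown above (VM_EXTERNAL_IP is a placeholder for the instance's external address; the schedule and paths are illustrative only):

```
# push the nightly backup to the Compute Engine instance at 01:30
30 1 * * * rsync -avz -e "ssh -i /home/sinsausti/.ssh/id_rsa" /root/backups/BACKUP-3/base.tar.gz sinsausti@VM_EXTERNAL_IP:/backups/
```

Since cron runs non-interactively, the SSH key must be usable without a passphrase prompt (for example via an agent or an unencrypted deploy key restricted to this task).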

Managing Your Backups with ClusterControl

In the same way that you can centralize the management for both database and backup from the same platform by using Cloud SQL, you can use ClusterControl for several management tasks related to your PostgreSQL database.

ClusterControl is a comprehensive management system for open source databases that automates deployment and management functions, as well as health and performance monitoring. ClusterControl supports deployment, management, monitoring and scaling for different database technologies and environments. So you can, for example, create your virtual machine instance on Google Cloud, and deploy/import your database service with ClusterControl.

ClusterControl PostgreSQL

Creating a Backup

For this task, go to ClusterControl -> Select Cluster -> Backup -> Create Backup.

ClusterControl - Create Backup

You can create a new backup or configure a scheduled one. For our example, we will create a single backup instantly.

ClusterControl - Choose Backup Method

You must choose one method, the server from which the backup will be taken, and where you want to store the backup. You can also upload your backup to the cloud (AWS, Google or Azure) by enabling the corresponding button.

ClusterControl - Backup Configuration

Then specify the use of compression, the compression level, encryption and retention period for your backup.

ClusterControl - Cloud Credentials for Backup

If you enabled the upload backup to the cloud option, you will see a section to specify the cloud provider (in this case Google Cloud) and the credentials (ClusterControl -> Integrations -> Cloud Providers). For Google Cloud, it uses Cloud Storage, so you must select a Bucket or even create a new one to store your backups.

ClusterControl Backup Management

In the backup section, you can see the progress of the backup, and information like the method, size, location, and more.


Google Cloud may be a good option to store your PostgreSQL backups, and it offers different products for doing this. It’s not, however, necessary to have your PostgreSQL databases running there, as you can use it only as a storage location.

The GSutil tool is a nice product for managing your Cloud Storage data from the command line; it's easy to use and fast.

You can also combine Google Cloud and ClusterControl to improve your PostgreSQL high availability environment and monitoring system. If you want to know more about PostgreSQL on Google Cloud you can check our deep dive blog post.

by Sebastian Insausti at October 07, 2019 09:45 AM

October 06, 2019

Valeriy Kravchuk

Dynamic Tracing of MySQL Server With perf probe - Basic Example

I am going to write a series of blog posts based on my talks and experiences at Percona Live Europe 2019. The first one is a kind of extended comment on a couple of slides from the "Tracing and Profiling MySQL" talk.

We can surely wait until Performance Schema instruments every other line of code or at least every important stage and wait in every storage engine we care about, but there is no real need for that. If you run any version of MySQL under Linux with more or less recent kernel (anything newer than 4.1 is good enough, in general), you can easily use dynamic tracing for any application (at least if there is symbolic information for the binaries), any time. As Brendan Gregg put it here:
"One benefit of dynamic tracing is that it can be enabled on a live system without restarting anything. You can take an already-running kernel or application and then begin dynamic instrumentation, which (safely) patches instructions in memory to add instrumentation. That means there is zero overhead or tax for this feature until you begin using it. One moment your binary is running unmodified and at full speed, and the next, it's running some extra instrumentation instructions that you dynamically added. Those instructions should eventually be removed once you've finished using your session of dynamic tracing."
One of the ways to use dynamic tracing (supported for a long time) is the perf profiler and its probe command. In the simplest case, which I am going to illustrate here, a probe is defined for a function in the code and refers to it by name. You can refer to the name of a local variable, a function parameter, a local data structure member etc. in the probe, and record their values along with other probe data.

For a simple example let me consider recent Percona Server 5.7.x running on recent Ubuntu 16.04 with kernel 4.4.x. Let's assume I want to trace all calls to the dispatch_command() function and record every query every connection processes that way.

Skipping the details for now, let's assume I've found out (with gdb in my case, but it can be code review as well) that when this function is called I can see the query user wants to execute in the com_data structure passed via a pointer to the function:
(gdb) p com_data->com_query.query
$4 = 0x7fb0dba8d021 "select 2"
Based on this information, and having the -dbg package installed for Percona Server, I can add a probe dynamically at any time using the following simple command (the --add option is assumed by default):
openxs@ao756:~$ sudo perf probe -x /usr/sbin/mysqld 'dispatch_command com_data->com_query.query:string'
Added new event:
  probe_mysqld:dispatch_command (on dispatch_command in /usr/sbin/mysqld with query=com_data->com_query.query:string)

You can now use it in all perf tools, such as:

        perf record -e probe_mysqld:dispatch_command -aR sleep 1
In this probe I refer to the specific binary with the -x option and full path name, and to the function in that binary by name, and I say that I'd like to record the value of com_data->com_query.query as a zero-terminated string. Now I can use any variation of the perf record command (with the -F option to define sampling frequency, the -g option to capture stack traces etc., see more here) and my probe will be one of the events captured.

For this simple example of tracing I'll use the -e option to capture only the events related to the probe I defined. The probe name in this simple case by default consists of the binary name, a colon (':') separator and the function name. I'll use the -R option to collect raw sample records. I've also added the -a option to collect samples on all CPUs. You can see the hint for a possible command in the output above.

So, I can record related events with default frequency as follows:
openxs@ao756:~$ sudo perf record -e 'probe_mysqld:dispatch_command*' -aR
^C[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.676 MB (3 samples) ]
I let it work for some time in the foreground and then pressed Ctrl-C to stop collecting. Now I can check raw sample records with perf script command:
openxs@ao756:~$ sudo perf script >/tmp/queries.txt
openxs@ao756:~$ cat /tmp/queries.txt
          mysqld 31340 [001]  3888.849079: probe_mysqld:dispatch_command: (be9250) query="select 100"
          mysqld 31340 [001]  3891.648739: probe_mysqld:dispatch_command: (be9250) query="select user, host from mysql.user"
          mysqld 31340 [001]  3895.890141: probe_mysqld:dispatch_command: (be9250) query="select 2"
This is the detailed trace, with the additional information (the exact text of the query executed) added as requested. The output also includes the PID of the binary, the CPU the sample was taken on, and a timestamp.

When I am done with tracing, I can delete the probe with the --del option, referring to it by name:
openxs@ao756:~$ sudo perf probe --del dispatch_command
Removed event: probe_mysqld:dispatch_command
The (small, more on that later) overhead of tracing was added dynamically, only for the exact information I needed and only for the period of tracing. After the dynamic probe is removed we have exactly the same binary as originally started, running with zero extra overhead. Now do this with Performance Schema :)

* * *
Slides are available at

More details on the way other tools mentioned during the talk can be used by MySQL DBAs are coming soon in this blog. Stay tuned!

by Valerii Kravchuk at October 06, 2019 05:59 PM

October 03, 2019


Tips for Storing MongoDB Backups in the Cloud

When it comes to backups and data archiving, IT departments are under pressure to meet stricter service level agreements, deliver more custom reports, and adhere to expanding compliance requirements, while continuing to manage daily archive and backup tasks. Without a doubt, your database server stores some of your enterprise’s most valuable information. Guaranteeing reliable database backups to prevent data loss in the event of an accident or hardware failure is a critical checkbox.

But how do you make it truly disaster-proof when all of your data is in a single data center, or in data centers that are geographically close to each other? Moreover, whether it is a 24x7 highly loaded server or a low-transaction-volume environment, you need backups to be a seamless procedure that does not disrupt the performance of the server in a production environment.

In this blog, we are going to review MongoDB backups to the cloud. The cloud has changed the data backup industry: because of its affordable price point, even smaller businesses can have an offsite solution that backs up all of their data.

We will show you how to perform safe MongoDB backups using mongo services as well as other methods that you can use to extend your database disaster recovery options.

If your server or backup destination is located in an exposed infrastructure like a public cloud, hosting provider or connected through an untrusted WAN network, you need to think about additional actions in your backup policy. There are a few different ways to perform database backups for MongoDB, and depending on the type of backup, recovery time, size, and infrastructure options will vary. Since many of the cloud storage solutions are simply storage with different API front ends, any backup solution can be performed with a bit of scripting. So what are the options we have to make the process smooth and secure?

MongoDB Backup Encryption

Security should be at the center of every action IT teams take. It is always a good idea to enforce encryption to enhance the security of backup data. A simple use case for encryption is when you want to push the backup to offsite backup storage located in the public cloud.

When creating an encrypted backup, one thing to keep in mind is that it usually takes more time to recover. The backup has to be decrypted before any recovery activities. With a big dataset, this could introduce some delays to the RTO.

On the other hand, if you are using the private keys for encryption, make sure to store the key in a safe place. If the private key is missing, the backup will be useless and unrecoverable. If the key is stolen, all created backups that use the same key would be compromised as they are no longer secured. You can use popular GnuPG or OpenSSL to generate private or public keys.

To encrypt a mongodump backup using GnuPG, generate a private key and follow the wizard accordingly:

$ gpg --gen-key

Create a plain mongodump backup as usual:

$ mongodump --db db1 --gzip --archive=/tmp/db1.tar.gz
Encrypt the dump file and remove the older plain backup:

$ gpg --encrypt -r '' /tmp/db1.tar.gz

$ rm -f /tmp/db1.tar.gz
GnuPG will automatically append a .gpg extension to the encrypted file. To decrypt, simply run the gpg command with the --decrypt flag:

$ gpg --output /tmp/db1.tar.gz --decrypt /tmp/db1.tar.gz.gpg
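The whole GnuPG round trip can be exercised end-to-end on a throwaway file. The sketch below is an illustration with invented inputs: it uses symmetric mode (--symmetric with a passphrase) instead of the public-key recipient used above, purely to avoid interactive key generation, and a temporary GNUPGHOME so it does not touch your real keyring:

```shell
# Symmetric-mode GnuPG round trip; passphrase and file contents are made up.
export GNUPGHOME=$(mktemp -d)            # throwaway keyring directory
echo "pretend this is db1.tar.gz" > /tmp/db1.tar.gz
gpg --batch --yes --pinentry-mode loopback --passphrase demo123 \
    --symmetric /tmp/db1.tar.gz          # writes /tmp/db1.tar.gz.gpg
rm -f /tmp/db1.tar.gz                    # keep only the encrypted copy
gpg --batch --yes --pinentry-mode loopback --passphrase demo123 \
    --output /tmp/db1.tar.gz --decrypt /tmp/db1.tar.gz.gpg
cat /tmp/db1.tar.gz
```

The same flow applies to a real dump file; with public-key encryption you would swap --symmetric for --encrypt -r with your recipient.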
To create an encrypted mongodump using OpenSSL, one has to generate a private key and a public key:

openssl req -x509 -nodes -newkey rsa:2048 -keyout dump.priv.pem -out

This private key (dump.priv.pem) must be kept in a safe place for future decryption. For mongodump, an encrypted backup can be created by piping the content to openssl, for example:

mongodump --db db1 --gzip --archive | openssl smime -encrypt -binary -text -aes256 -out database.sql.enc -outform DER
To decrypt, simply use the private key (dump.priv.pem) alongside the -decrypt flag:

openssl smime -decrypt -in database.sql.enc -binary -inform DER -inkey dump.priv.pem -out db1.tar.gz
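For reference, here is the same OpenSSL S/MIME flow as a complete, self-contained round trip on a stand-in file. The certificate subject (/CN=backup-demo) and the public certificate name dump.pub.pem are assumptions added so the commands run non-interactively:

```shell
# Generate a throwaway RSA key pair (private key + self-signed certificate).
openssl req -x509 -nodes -newkey rsa:2048 -keyout /tmp/dump.priv.pem \
    -out /tmp/dump.pub.pem -subj "/CN=backup-demo" 2>/dev/null
echo "pretend this is a mongodump archive" > /tmp/db1.tar.gz
# Encrypt with the public certificate...
openssl smime -encrypt -binary -aes256 -in /tmp/db1.tar.gz \
    -out /tmp/database.sql.enc -outform DER /tmp/dump.pub.pem
# ...and decrypt with the private key.
openssl smime -decrypt -in /tmp/database.sql.enc -binary -inform DER \
    -inkey /tmp/dump.priv.pem -out /tmp/db1.restored.tar.gz
cmp /tmp/db1.tar.gz /tmp/db1.restored.tar.gz && echo "round trip OK"
```

In a real pipeline, the mongodump output replaces the stand-in file as the input to the smime -encrypt step.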

MongoDB Backup Compression

Within the database cloud backup world, compression is one of your best friends. It can not only save storage space, but it can also significantly reduce the time required to download/upload data.

MongoDB’s dump tools also support compression using gzip. This is exposed through the “--gzip” command-line option in both mongodump and mongorestore. Compression works both for backups created using the directory mode and the archive mode, and reduces disk space usage.

Normally, a logical MongoDB dump achieves good compression rates. Depending on the compression tool and ratio, a compressed mongodump can be up to 6 times smaller than the original backup size. To compress the backup, you can pipe the mongodump output to a compression tool and redirect it to a destination file.

Having a compressed backup could save you up to 50% of the original backup size, depending on the dataset. 

mongodump --db country --gzip --archive=country.archive
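To get a feel for the ratios involved, you can measure gzip on a repetitive, JSON-like stand-in for a dump (the sample data below is invented for the purpose):

```shell
# Build a highly compressible sample "dump" of 10,000 identical documents.
yes '{"country": "France", "population": 67000000}' | head -n 10000 > /tmp/sample.json
gzip -kf /tmp/sample.json                 # -k keeps the original for comparison
orig=$(wc -c < /tmp/sample.json)
comp=$(wc -c < /tmp/sample.json.gz)
echo "original: $orig bytes, gzipped: $comp bytes"
```

Real dump data compresses far less dramatically than identical lines, so treat the result as an upper bound rather than a typical ratio.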

Limiting Network Throughput

A great option for cloud backups is to limit the network streaming bandwidth (Mb/s) when making a backup. You can achieve that with the pv tool. The pv utility comes with the data modifier option -L RATE, --rate-limit RATE, which limits the transfer to a maximum of RATE bytes per second. The example below restricts the transfer to 2MB/s.

$ mongodump --gzip --archive | pv -q -L 2m > /tmp/db1.archive.gz

Transferring MongoDB Backups to the Cloud

Now that your backup is compressed and secured (encrypted), it is ready for transfer.

Google Cloud

The gsutil command-line tool is used to manage, monitor and use your storage buckets on Google Cloud Storage. If you have already installed the gcloud utility, you already have gsutil installed. Otherwise, follow the instructions for your Linux distribution from here.

To install the gcloud CLI you can follow the below procedure:

curl | bash
Restart your shell:
exec -l $SHELL
Run gcloud init to initialize the gcloud environment:
gcloud init
With the gsutil command line tool installed and authenticated, create a regional storage bucket named mongodb-backups-storage in your current project.

gsutil mb -c regional -l europe-west1 gs://mongodb-backups-storage/

Creating gs://mongodb-backups-storage/...

Amazon S3

If you are not using RDS to host your databases, it is very probable that you are doing your own backups. Amazon’s AWS platform offers S3 (Amazon Simple Storage Service), a data storage service that can be used to store database backups or other business-critical files. Whether it’s an Amazon EC2 instance or your on-premises environment, you can use the service to secure your data.

While backups can be uploaded through the web interface, the dedicated s3 command line interface can be used to do it from the command line and through backup automation scripts. If backups are to be kept for a very long time and recovery time isn’t a concern, they can be transferred to the Amazon Glacier service, providing much cheaper long-term storage. Files (Amazon objects) are logically stored in a huge flat container called a bucket. S3 presents a REST interface to its internals. You can use this API to perform CRUD operations on buckets and objects, as well as to change permissions and configurations on both.

The primary distribution method for the AWS CLI on Linux, Windows, and macOS is pip, a package manager for Python. Instructions can be found here.

aws s3 cp severalnines.sql s3://severalnines-bucket/MongoDB_backups
By default, S3 provides eleven 9s of object durability (99.999999999%). It means that if you store one billion objects in it, you can expect to lose about one object every 100 years on average. The way S3 achieves such an impressive number of 9s is by replicating each object automatically across multiple Availability Zones, which we’ll talk about in another post. Amazon has regional data centers all around the world.
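That durability figure can be sanity-checked with a quick back-of-the-envelope calculation; the one-billion object count is just the example from the text:

```shell
# Eleven 9s durability means an expected annual loss rate of 1e-11 per object.
awk 'BEGIN {
  objects   = 1000000000           # one billion stored objects
  loss_rate = 0.00000000001        # 1 - 0.99999999999
  lost = objects * loss_rate       # expected objects lost per year
  printf "expected losses: %.2f per year (about one object every %.0f years)\n",
         lost, 1 / lost
}'
```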

Microsoft Azure Storage

Microsoft’s public cloud platform, Azure, has storage options accessible through its command line interface. Information can be found here. The open-source, cross-platform Azure CLI provides a set of commands for working with the Azure platform. It gives much of the functionality seen in the Azure portal, including rich data access.

The installation of the Azure CLI is fairly simple; you can find instructions here. Below you can find how to transfer your backup to Microsoft storage.

az storage blob upload --container-name severalnines --file severalnines.gz.tar --name severalnines_backup

Hybrid Storage for MongoDB Backups

With the growing public and private cloud storage industry, we have a new category called hybrid storage. This technology allows files to be stored locally, with changes automatically synced to a remote copy in the cloud. Such an approach comes from the need to have recent backups stored locally for fast restore (lower RTO), as well as from business continuity objectives.

An important aspect of efficient resource usage is having separate backup retention periods. Data stored locally, on redundant disk drives, would be kept for a shorter period, while cloud backup storage would be held for a longer time. Many times the requirement for longer backup retention comes from legal obligations for different industries (like telecoms having to store connection metadata).

Cloud providers like Google Cloud Services, Microsoft Azure and Amazon S3 each offer virtually unlimited storage, decreasing local space needs. It allows you to retain your backup files for as long as you would like, without concerns about local disk space.

ClusterControl Backup Management - Hybrid Storage

When scheduling a backup with ClusterControl, each of the backup methods is configurable with a set of options on how you want the backup to be executed. The most important for hybrid cloud storage would be:

  • Network throttling
  • Encryption with the built-in key management
  • Compression
  • The retention period for the local backups
  • The retention period for the cloud backups

ClusterControl offers advanced backup features for the cloud: parallel compression, network bandwidth limits, encryption, and more. Your company can take advantage of cloud scalability and pay-as-you-go pricing for growing storage needs. You can design a backup strategy to provide both local copies in the datacenter for immediate restoration, and a seamless gateway to cloud storage services from AWS, Google and Azure.


Advanced TLS and AES 256-bit encryption and compression features support secure backups that take up significantly less space in the cloud.

by Bart Oles at October 03, 2019 05:29 PM

October 02, 2019

Federico Razzoli

Case sensitivity in MySQL and MariaDB queries

Maybe you’re wondering why in MySQL/MariaDB 'string' seems to be the same as 'STRING'. Or maybe that’s not the case for you, but you would like to make a case insensitive search. This article explains how to write a case ...

by Federico Razzoli at October 02, 2019 11:30 AM


Creating a PostgreSQL Replication Setup on Debian / Ubuntu

PostgreSQL can work separately on multiple machines with the same data structure, making the persistence layer of the application more resilient and prepared for some unexpected event that might compromise the continuity of the service.

The idea behind this is to improve the system’s response time by distributing the requests in a “Round Robin” network where each node present is a cluster. In this type of setup it does not matter which node a request is delivered to for processing, as the response will always be the same.

In this blog, we will explain how to replicate a PostgreSQL cluster using the tools provided with the program installation. The version used is PostgreSQL 11.5, the current stable, generally-available version for the operating system Debian Buster. For the examples in this blog it is assumed that you are already familiar with Linux.

PostgreSQL Programs

Inside the directory /usr/bin/ are the programs responsible for managing the cluster.

# 1. Lists the files contained in the directory
# 2. Filters the elements that contain 'pg_' in the name
ls /usr/bin/ | grep pg_

Activities conducted through these programs can be performed sequentially, or even in combination with other programs. Running a block of these activities through a single command is possible thanks to a Linux program found in the same directory, called make.

To list the clusters present use the pg_lsclusters program. You can also use make to run it. Its work depends on a file named Makefile, which needs to be in the same directory where the command will run.

# 1. Check the current directory
pwd

# 2. Creates a directory
mkdir ~/Documents/Severalnines/

# 3. Enroute to the chosen directory
cd ~/Documents/Severalnines/

# 4. Create the file Makefile
touch Makefile

# 5. Open the file for editing
nano Makefile

The definition of a block is shown below, having as its name ls, and a single program to be run, pg_lsclusters.

# 1. Block name
ls:
# 2. Program to be executed
	pg_lsclusters

The file Makefile can contain multiple blocks, each of which can run as many programs as you need and even receive parameters. It is imperative that the lines belonging to a block of execution are indented with tabs instead of spaces.

The use of make to run the pg_lsclusters program is accomplished by using the make ls command.

# 1. Executes pg_lsclusters
make ls

The result obtained in a recent PostgreSQL installation shows a single cluster called main, allocated on port 5432 of the operating system. When the pg_createcluster program is used, a new port is allocated to each cluster created, starting from 5432 and counting upward until a free port is found.

Write Ahead Logging (WAL)

This replication procedure consists of making a backup of a working cluster which is continuing to receive updates. If this is done on the same machine, however, many of the benefits brought by this technique are lost.

Scaling a system horizontally ensures greater availability of the service, as if any hardware problems occur, it wouldn’t make much difference as there are other machines ready to take on the workload.

WAL is the term used for a complex algorithm internal to PostgreSQL that ensures the integrity of the transactions made on the system. However, only a single cluster may have the responsibility of accessing it with write permission.

The architecture now has three distinct types of clusters:

  1. A primary with responsibility for writing to WAL;
  2. A replica ready to take over the primary role;
  3. Miscellaneous other replicas with WAL reading duty.

Write operations are any activities that are intended to modify the data structure, either by entering new elements, or updating and deleting existing records.

PostgreSQL Cluster Configuration

Each cluster has two directories, one containing its configuration files and another with the transaction logs. These are located in /etc/postgresql/11/$(cluster) and /var/lib/postgresql/11/$(cluster), respectively (where $(cluster) is the name of the cluster).

The file postgresql.conf is created immediately after the cluster has been created by running the program pg_createcluster, and the properties can be modified for the customization of a cluster.

Editing this file directly is not recommended, because it contains almost all of the properties. Their values are commented out, with the symbol # at the beginning of each line, and several other commented lines contain instructions for changing the property values.

Adding another file containing the desired changes is possible; simply edit a single property named include, replacing the default value #include = '' with include = 'postgresql.replication.conf'.

Before you start the cluster, you need the presence of the file postgresql.replication.conf in the same directory where you find the original configuration file, called postgresql.conf.

# 1. Block name
create:
# 2. Creates the cluster
	pg_createcluster 11 $(cluster) -- --data-checksums
# 3. Copies the file to the directory
	cp postgresql.replication.conf /etc/postgresql/11/$(cluster)/
# 4. A value is assigned to the property
	sed -i "s|^#include = ''|include = 'postgresql.replication.conf'|g" /etc/postgresql/11/$(cluster)/postgresql.conf

The use of --data-checksums in the creation of the cluster adds a greater level of integrity to the data, costing a bit of performance but being very important in order to avoid corruption of the files when transferred from one cluster to another.

The procedures described above can be reused for other clusters, simply passing a value to $(cluster) as a parameter in the execution of the program make.

# 1. Executes the block 'create' by passing a parameter
sudo make create cluster=primary
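The way make expands such parameters can be seen in isolation with a toy Makefile, using echo as a stand-in for the PostgreSQL programs (the directory and names below are invented for the demo):

```shell
mkdir -p /tmp/make-demo
cd /tmp/make-demo
# Recipe lines must be indented with a real TAB character (the \t below).
printf 'create:\n\techo "creating cluster: $(cluster)"\n' > Makefile
# 'cluster=primary' on the command line sets the make variable $(cluster).
make -s create cluster=primary
# prints: creating cluster: primary
```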

Now that a brief automation of the tasks has been established, what remains to be done is the definition of the file postgresql.replication.conf according to the need for each cluster.

Replication on PostgreSQL

Two ways to replicate a cluster are possible: one involves the entire cluster (and is called Streaming Replication), while the other can be partial or complete (and is called Logical Replication).

The settings that must be specified for a cluster fall into four main categories:

  • Master Server
  • Standby Servers
  • Sending Servers
  • Subscribers

As we saw earlier, the WAL files contain the transactions that are made on the cluster, and replication is the transmission of these files from one cluster to another.

Inside the settings present in the file postgresql.conf, we can see properties that define the behavior of the cluster in relation to the WAL files, such as the size of those files.

# default values
max_wal_size = 1GB
min_wal_size = 80MB

Another important property is max_wal_senders. Belonging to a cluster with the Sending Servers characteristic, it sets the number of processes responsible for sending these files to other clusters, and its value must always be greater than the number of clusters that depend on receiving them.

WAL files can be stored for transmission to a cluster that connects late, or that has had some problem receiving them and needs older files relative to the current time. The property wal_keep_segments specifies how many WAL file segments are to be maintained by a cluster.

A Replication Slot is a feature that allows the cluster to store the WAL files needed to provide another cluster with all the records; the max_replication_slots property sets how many such slots may exist.

# default values
max_wal_senders = 10
wal_keep_segments = 0
max_replication_slots = 10

When the intention is to outsource the storage of these WAL files, another method of processing these files can be used, called Continuous Archiving.

Continuous Archiving

This concept allows you to direct the WAL files to a specific location using a Linux command and two placeholders representing the path of the file and its name: %p and %f, respectively.

This property is disabled by default, but its use can be easily implemented, withdrawing from the cluster the responsibility of storing such important files; it can be added to the file postgresql.replication.conf.

# 1. Creates a directory
mkdir ~/Documents/Severalnines/Archiving

# 2. Implementation on postgresql.replication.conf
archive_mode = on
archive_command = 'cp %p ~/Documents/Severalnines/Archiving/%f'

# 3. Starts the cluster
sudo systemctl start postgresql@11-primary
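What archive_command actually runs can be illustrated outside PostgreSQL: before invoking the command, the server substitutes %p with the path of the finished WAL segment and %f with its bare file name. The segment name and contents below are stand-ins:

```shell
archive_dir=/tmp/archiving-demo
mkdir -p "$archive_dir"
p=/tmp/000000010000000000000001          # stand-in for a WAL segment path (%p)
f=$(basename "$p")                       # its bare file name (%f)
echo "fake WAL contents" > "$p"
cp "$p" "$archive_dir/$f"                # the expanded 'cp %p .../%f' command
ls "$archive_dir"
```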

After the cluster initialization, some properties might need to be modified, and a cluster restart could be required. However, some properties can only be reloaded, without the need for a full reboot of a cluster.

Information on such subjects can be obtained through the comments present in the file postgresql.conf, appearing as # (note: change requires restart).

If this is the case, a simple way to resolve it is with the Linux program systemctl, used previously to start the cluster, simply replacing the start option with restart.

When a full restart is not required, the cluster can reload its properties through a query run within itself. However, if multiple clusters are running on the same machine, you must pass a parameter containing the port the cluster is allocated on the operating system.

# Reload without restarting
sudo -H -u postgres psql -c 'SELECT pg_reload_conf();' -p 5433

In the example above, the property archive_mode requires a reboot, while archive_command does not. After this brief introduction to this subject, let’s look at how a replica cluster can backup these archived WAL files, using Point In Time Recovery (PITR).

PostgreSQL Replication Point-In-Time Recovery

This suggestive name allows a cluster to be taken back to its state at a certain point in time. This is done through properties such as recovery_target_time, which expects a value in date format, such as 2019-08-22 12:05 GMT, and recovery_target_timeline, which accepts the assignment latest, informing the need for a recovery up to the last existing record.

The program pg_basebackup, when it runs, makes a copy of the data directory of a cluster to another location. It accepts multiple parameters, one of them being -R, which creates a file named recovery.conf within the copied directory; this is not the same directory that contains the other configuration files previously seen, such as postgresql.conf.

The file recovery.conf stores the parameters passed in the execution of the program pg_basebackup, and its existence is essential to the Streaming Replication implementation, because it is within it that the reverse operation to the Continuous Archiving can be performed.

# 1. Block name
replicate:
# 2. Removes the current data directory
	rm -rf /var/lib/postgresql/11/$(replica)
# 3. Connects to the primary cluster as the user postgres
# 4. Copies the entire data directory
# 5. Creates the file recovery.conf
	pg_basebackup -U postgres -d postgresql://localhost:$(primaryPort) -D /var/lib/postgresql/11/$(replica) -P -R
# 6. Inserts the restore_command property and its value
	echo "restore_command = 'cp ~/Documents/Severalnines/Archiving/%f %p'" >> /var/lib/postgresql/11/$(replica)/recovery.conf
# 7. The same is done with recovery_target_timeline
	echo "recovery_target_timeline = 'latest'" >> /var/lib/postgresql/11/$(replica)/recovery.conf

The replicate block specified above needs to be run by the operating system’s postgres user, in order to avoid potential conflicts over the ownership of the cluster data between postgres and the root user.

The replica cluster is now in place; all that remains is to start it for the replication to begin, with the replica cluster’s process, called pg_walreceiver, interacting with the primary cluster’s pg_walsender process over a TCP connection.

# 1. Executes the block 'replicate' by passing two parameters
sudo -H -u postgres make replicate replica=replica primaryPort=5433
# 2. Starts the cluster replica
sudo systemctl start postgresql@11-replica

Verification of the health of this replication model, called Streaming Replication, is performed by a query that is run on the primary cluster.

# 1. Checks the Streaming Replication created
sudo -H -u postgres psql -x -c 'select * from pg_stat_replication;' -p 5433


In this blog, we showed how to set up asynchronous Streaming Replication between two PostgreSQL clusters. Remember though, vulnerabilities exist in the steps above; for example, using the postgres user to perform such a task is not recommended.

The replication of a cluster provides several benefits when it is used in the correct way, and the APIs for interacting with the clusters are easy to access.


by Severalnines at October 02, 2019 09:45 AM

Colin Charles

Percona Live Europe Amsterdam Day 1 notes

Percona Live Europe Amsterdam Day 1 was a bunch of fun, especially since I didn’t have to give a talk or anything since my tutorial was over on Day 0.

At lunch, I realised that there are a lot more fringe events happening around Percona Live… and if you’ve seen how people do “tech weeks”, maybe this is what the event ends up being – a show, plus plenty of focused satellite events. FOSDEM in the open source world totally gets this, and best of all, also lists fringe events (see example from 2019).

So, Thursday evening gets a few fringe events, a relatively short train ride away:

Anyway, what was Day 1 like? Keynotes started the morning, and I did make a Twitter thread. It is clear that there is a lot of talk amongst companies that make open source software, and companies in the ecosystem that somehow also derive great value from it. Some look at this as the great cloud vendors vs open source software vendors debate, but this isn’t necessarily always the case – we’ve seen this as Percona’s model too. And we’ve seen cloud companies contribute back (again, just like Percona). Guess this is a topic for a different post, because there are always two sides to this situation…

It is also clear that people want permissive open source licenses over anything source available. If you’re a CxO looking for software, it would be almost irresponsible to be using critical software that is just source available with a proprietary license. After all, what happens when the company decides to ask for more money? (Companies change ownership too, you know).

It is probably clear the best strategies are the “multi” (or hybrid) strategies. Multiple databases, multiple clouds, and going all in on open source to avoid vendor lock-in. Of course, don’t forget that open source software also can have “vendor lock-in” – always look at the health metrics of a project, vs. a product. We’re lucky in the MySQL ecosystem that we have not just the excellent work of Oracle, but also MariaDB Corporation / MariaDB Foundation and also Percona.

MySQL 8.0 adoption is taking off, with about 26% of the users on it. Those on MySQL 5.6 still seem to be on it, and there has been a decrease in 5.7 use to grow that 8.0 pie. It isn’t clear how these stats are generated (since there is no “phone home” functionality in MySQL; also the MariaDB Server variant doesn’t get as much usage as one would like), but maybe it is via download numbers?

Anyone paying any attention to MySQL 8 will know that they have switched to a “continuous delivery model”, also known as, you get new features in every point release. So the latest 8.0.18 gets EXPLAIN ANALYZE, and while we can’t try it yet (not released, and the documentation isn’t updated), I expect it will be fairly soon. I am eager to try this, because MariaDB Server has had ANALYZE since 10.1 (GA – Oct 2015). And it wasn’t long ago that MySQL received CHECK constraints support (8.0.16). Also the CLONE plugin in 8.0.17 warrants some checking/usage!

Besides all the hallway chats and meetings I did manage to get into a few sessions… Rakuten Intelligence talked about their usage of ProxySQL, and one thing was interesting with regard to their future plans slide – they do consider group replication but they wonder what would replace their custom HA software? But most importantly they wonder if it is stable and which companies have successfully deployed it, because they don’t want to be the first. Question from the floor about Galera Cluster came up, and they said they had one app that required XA support – looks like something to consider once Galera 4 is fully baked!

The PXC–8 talk was also chock full of information, delivered excellently, and something to try soon (it wasn’t quite available yesterday, but today I see a release announcement: Experimental Binary of Percona XtraDB Cluster 8.0).

I enjoyed the OpenCorporates use case at the end too: from the fact that, for them, being on-premise would be cheaper than the cloud, to how they use ProxySQL, the Galera Cluster branch Percona XtraDB Cluster (PXC), and ZFS. ZFS is not the most common filesystem for MySQL deployments, so it was interesting to see what could be achieved.

Then there was the party and boy, did they outdo themselves. We had a menu, multi-course meal with wine pairings, and a lot of good conversation. A night wouldn’t be complete without some Salmiakkikossu, and Monty sent some over for us to enjoy.

Food at the Hilton has been great too (something I would never really want to say, considering I’m not a fan of the hotel chain) – even the coffee breaks are well catered for. I think maybe this has been the best Percona Live in terms of catering, and I’ve been to a lot of them (maybe all…). I have to give much kudos to Bronwyn and Lorraine at Percona for the impeccable organisation. The WiFi works a charm as well. On towards Day 2!

by Colin Charles at October 02, 2019 06:51 AM

October 01, 2019


What’s New in MySQL Galera Cluster 4.0

MySQL Galera Cluster 4.0 is the new kid on the database block with very interesting new features. Currently it is available only as a part of MariaDB 10.4 but in the future it will work as well with MySQL 5.6, 5.7 and 8.0. In this blog post we would like to go over some of the new features that came along with Galera Cluster 4.0.

Galera Cluster Streaming Replication

The most important new feature in this release is streaming replication. So far the certification process for the Galera Cluster worked in a way that whole transactions had to be certified after they completed. 

This process was not ideal in several scenarios...

  1. Hotspots in tables, rows which are very frequently updated on multiple nodes - hundreds of fast transactions running on multiple nodes, modifying the same set of rows result in frequent deadlocks and rollback of transactions
  2. Long running transactions - if a transaction takes significant time to complete, this seriously increases chances that some other transaction, in the meantime, on another node, may modify some of the rows that were also updated by the long transaction. This resulted in a deadlock during certification and one of the transactions having to be rolled back.
  3. Large transactions - if a transaction modifies a significant number of rows, it is likely that another transaction, at the same time, on a different node, will modify one of the rows already modified by the large transaction. This results in a deadlock during certification and one of the transactions has to be rolled back. In addition to this, large transactions will take additional time to be processed, sent to all nodes in the cluster and certified. This is not an ideal situation as it adds delay to commits and slows down the whole cluster.

Luckily, streaming replication can solve these problems. The main difference is that the certification happens in chunks, so there is no need to wait for the whole transaction to complete. As a result, even if a transaction is large or long, the majority (or all, depending on the settings we will discuss in a moment) of its rows are locked on all of the nodes as it progresses, preventing other queries from modifying them.

MySQL Galera Cluster Streaming Replication Options

There are two configuration options for streaming replication: 

wsrep_trx_fragment_size

This tells how big a fragment should be (by default it is set to 0, which means that streaming replication is disabled).

wsrep_trx_fragment_unit

This tells what the fragment really is. By default it is bytes, but it can also be ‘statements’ or ‘rows’. 

Those variables can (and should) be set on the session level, making it possible for the user to decide which particular query should be replicated using streaming replication. Setting the unit to ‘statements’ and the size to 1 allows, for example, using streaming replication just for a single query that updates a hotspot.

You can configure Galera 4.0 to certify every row that you have modified and grab the locks on all of the nodes while doing so. This makes streaming replication great at solving problems with frequent deadlocks which, prior to Galera 4.0, were possible to solve only by redirecting all writes to a single node.
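As a sketch of how this looks in practice, the snippet below writes the session-level statements to a file that could then be fed to a client. The table hotspot and its columns are hypothetical; the wsrep_trx_fragment_* variable names are from the Galera 4 documentation:

```shell
cat <<'SQL' > /tmp/streaming_update.sql
SET SESSION wsrep_trx_fragment_unit = 'statements';
SET SESSION wsrep_trx_fragment_size = 1;   -- replicate after every statement
UPDATE hotspot SET counter = counter + 1 WHERE id = 1;
SET SESSION wsrep_trx_fragment_size = 0;   -- switch streaming replication off again
SQL
cat /tmp/streaming_update.sql
```

Against a Galera 4 node one could run it with, for example, mysql mydb < /tmp/streaming_update.sql.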

WSREP Tables

Galera 4.0 introduces several tables, which will help to monitor the state of the cluster:

  • wsrep_cluster
  • wsrep_cluster_members
  • wsrep_streaming_log

All of them are located in the ‘mysql’ schema. wsrep_cluster will provide insight into the state of the cluster. wsrep_cluster_members will give you information about the nodes that are part of the cluster. wsrep_streaming_log helps to track the state of the streaming replication.

Galera Cluster Upcoming Features

Codership, the company behind Galera, isn’t done yet. We were able to get a preview of the roadmap from CEO Seppo Jaakola, given at Percona Live earlier this year. Apparently, we are going to see features like XA transaction support and gcache encryption. This is really good news. 

Support for XA transactions will be possible thanks to streaming replication. In short, XA transactions are distributed transactions which can run across multiple nodes. They utilize two-phase commit, which requires first acquiring all the locks needed to run the transaction on all of the nodes and then, once that is done, committing the changes. In previous versions Galera had no means to lock resources on remote nodes; with streaming replication this has changed.

Gcache is a file which stores writesets. Its contents are sent to joiner nodes which ask for a data transfer. If all required data is stored in the gcache, the joiner will receive just the missing transactions in a process called Incremental State Transfer (IST). If the gcache does not contain all required data, a State Snapshot Transfer (SST) will be required and the whole dataset will have to be transferred to the joining node.
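The gcache size is configurable; a larger gcache increases the chance that a rejoining node can use IST instead of a full SST. A my.cnf sketch (the 2G value is an arbitrary example, not a recommendation):

```ini
[mysqld]
# A larger gcache keeps more writesets available for IST.
wsrep_provider_options="gcache.size=2G"
```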

Gcache contains information about recent changes, therefore it’s great to see its contents encrypted for better security. With stricter security standards being introduced through more and more regulations, it is crucial that software becomes better at achieving compliance.


We are definitely looking forward to seeing how Galera Cluster 4.0 will work out on databases other than MariaDB. Being able to deploy MySQL 5.7 or 8.0 with Galera Cluster will be really great. After all, Galera is one of the most widely tested synchronous replication solutions available on the market.

by krzysztof at October 01, 2019 09:45 AM

September 30, 2019

Colin Charles

ProxySQL Technology Day Ghent 2019

Just delivered a tutorial on MariaDB Server 10.4. Decided to take a closer look at the schedule for Percona Live Europe Amsterdam 2019 and one thing is clear: feels like there should also be a ProxySQL tutorial, largely because at mine, I noticed like 20% of the folk saying they use it.

Seems like there are 2 talks about it though, one about real world usage on Oct 1, and one about firewall usage with AWS, given by Marco Tusa on Oct 2.

Which led me to the ProxySQL Technology Day 2019 in Ghent, Belgium, October 3 2019, a 2-hour train ride away from Amsterdam Schiphol (the airport stop). It is easy to grab a ticket at Schiphol Plaza; first class is about €20 more per way than second class, and a good spot to stay could be the Ibis Budget Dampoort (or the Marriott Ghent). Credit card payments accepted naturally, and I’m sure you can also do this online. Didn’t take me longer than five minutes to get all this settled.

So, the ProxySQL Technology Day is free, seems extremely focused and frankly is refreshing because you just learn about one thing! I feel like the MySQL world misses out on this tremendously as we lost the users conference… Interesting to see if this happens more in our ecosystem!

by Colin Charles at September 30, 2019 02:09 PM


How to Troubleshoot MySQL Database Issues

As soon as you start running a database server and your usage grows, you are exposed to many types of technical problems, performance degradation, and database malfunctions.  Each of these could lead to much bigger problems, such as catastrophic failure or data loss. It’s like a chain reaction, where one thing can lead to another, causing more and more issues. Proactive countermeasures must be performed in order for you to have a stable environment as long as possible.

In this blog post, we are going to look at a bunch of cool features offered by ClusterControl that can greatly help us troubleshoot and fix our MySQL database issues when they happen.

Database Alarms and Notifications

For all undesired events, ClusterControl logs everything under Alarms, accessible via the Activity (Top Menu) of the ClusterControl page. This is commonly the first step in troubleshooting when something goes wrong. From this page, we can get an idea of what is actually going on with our database cluster:

ClusterControl Database Alarms

The above screenshot shows an example of a server unreachable event, with severity CRITICAL, detected by two components, Network and Node. If you have configured the email notifications setting, you should get a copy of these alarms in your mailbox. 

When clicking on the “Full Alarm Details,” you can get the important details of the alarm like hostname, timestamp, cluster name and so on. It also provides the next recommended step to take. You can also send out this alarm as an email to other recipients configured under the Email Notification Settings. 

You may also opt to silence an alarm by clicking the “Ignore Alarm” button, and it will not appear in the list again. Ignoring an alarm might be useful if you have a low severity alarm and know how to handle or work around it. For example, ClusterControl might detect a duplicate index in your database which, in some cases, is actually needed by your legacy applications.

By looking at this page, we can obtain an immediate understanding of what is going on with our database cluster and what to do next to solve the problem. In this case, one of the database nodes went down and became unreachable via SSH from the ClusterControl host. Even a beginner SysAdmin would now know what to do next if this alarm appears.

Centralized Database Log Files

This is where we can drill down into what was wrong with our database server. Under ClusterControl -> Logs -> System Logs, you can see all log files related to the database cluster. For a MySQL-based database cluster, ClusterControl pulls the ProxySQL log, the MySQL error log, and the backup logs:

ClusterControl System Logs

Click on "Refresh Log" to retrieve the latest log from all hosts that are accessible at that particular time. If a node is unreachable, ClusterControl will still show the last collected log, since this information is stored inside the CMON database. By default ClusterControl retrieves the system logs every 10 minutes, configurable under Settings -> Log Interval.

ClusterControl will trigger the job to pull the latest log from each server, as shown in the following "Collect Logs" job:

ClusterControl Database Job Details

A centralized view of log files allows us to understand more quickly what went wrong. For a database cluster, which commonly involves multiple nodes and tiers, this feature greatly improves log reading, as a SysAdmin can compare these logs side-by-side and pinpoint critical events, reducing the total troubleshooting time.

Web SSH Console

ClusterControl provides a web-based SSH console so you can access the DB server directly via the ClusterControl UI (as the SSH user is configured to connect to the database hosts). From here, we can gather much more information which allows us to fix the problem even faster. Everyone knows when a database issue hits the production system, every second of downtime counts.

To access the SSH console via web, simply pick the nodes under Nodes -> Node Actions -> SSH Console, or simply click on the gear icon for a shortcut:

ClusterControl Web SSH Console Access

Due to the security concerns this feature might impose, especially in a multi-user or multi-tenant environment, you can disable it by editing /var/www/html/clustercontrol/bootstrap.php on the ClusterControl server and setting the following constant to false:

define('SSH_ENABLED', false);

Refresh the ClusterControl UI page to load the new changes.

Database Performance Issues

Apart from monitoring and trending features, ClusterControl proactively sends you various alarms and advisors related to database performance, for example:

  • Excessive usage - Resource that passes certain thresholds like CPU, memory, swap usage and disk space.
  • Cluster degradation - Cluster and network partitioning.
  • System time drift - Time difference among all nodes in the cluster (including ClusterControl node).
  • Various other MySQL related advisors:
    • Replication - replication lag, binlog expiration, location and growth
    • Galera - SST method, scan GRA logfile, cluster address checker
    • Schema check - Non-transactional table existence on Galera Cluster.
    • Connections - Threads connected ratio
    • InnoDB - Dirty pages ratio, InnoDB log file growth
    • Slow queries - By default ClusterControl will raise an alarm if it finds a query running for more than 30 seconds. This is of course configurable under Settings -> Runtime Configuration -> Long Query.
    • Deadlocks - InnoDB transactions deadlock and Galera deadlock.
    • Indexes - Duplicate keys, tables without primary keys.

Check out the Advisors page under Performance -> Advisors to get the details of things that can be improved as suggested by ClusterControl. For every advisor, it provides justifications and advice as shown in the following example for "Checking Disk Space Usage" advisor:

ClusterControl Disk Space Usage Check

When a performance issue occurs you will get a "Warning" (yellow) or "Critical" (red) status on these advisors. Further tuning is commonly required to overcome the problem. Advisors raise alarms, which means users will get a copy of these alarms in their mailbox if Email Notifications are configured accordingly. For every alarm raised by ClusterControl or its advisors, users will also get an email when the alarm has been cleared. These are pre-configured within ClusterControl and require no initial configuration. Further customization is always possible under Manage -> Developer Studio. You can check out this blog post on how to write your own advisor.

ClusterControl also provides a dedicated page for database performance under ClusterControl -> Performance. It provides all sorts of database insights following best practices, like a centralized view of DB Status, Variables, InnoDB Status, Schema Analyzer, and Transaction Logs. These are pretty self-explanatory and straightforward to understand.

For query performance, you can inspect Top Queries and Query Outliers, where ClusterControl highlights queries that performed significantly differently from their average. We have covered this topic in detail in the blog post MySQL Query Performance Tuning.

Database Error Reports

ClusterControl comes with an error report generator tool, to collect debugging information about your database cluster to help understand the current situation and status. To generate an error report, simply go to ClusterControl -> Logs -> Error Reports -> Create Error Report:

ClusterControl Database Error Reports

The generated error report can be downloaded from this page once ready. It comes as a tarball (tar.gz) which you may attach to a support request. Since a support ticket has a 10MB file size limit, if the tarball is bigger than that you can upload it to a cloud drive and share only the download link (with proper permissions) with us. You may remove it later once we have received the file. You can also generate the error report via the command line, as explained in the Error Report documentation page.

In the event of an outage, we highly recommend that you generate multiple error reports during and right after the outage. Those reports will be very useful to try to understand what went wrong, the consequences of the outage, and to verify that the cluster is in-fact back to operational status after a disastrous event.


ClusterControl proactive monitoring, together with a set of troubleshooting features, provides an efficient platform for users to troubleshoot any kind of MySQL database issue. Long gone is the legacy way of troubleshooting where one had to open multiple SSH sessions to access multiple hosts and execute multiple commands repeatedly in order to pinpoint the root cause.

If the above mentioned features are not helping you solve the problem or troubleshoot the database issue, you can always contact the Severalnines Support Team to back you up. Our 24/7/365 dedicated technical experts are available to attend to your request at any time. Our average first reply time is usually less than 30 minutes.

by ashraf at September 30, 2019 09:48 AM

September 27, 2019


Failover & Failback for PostgreSQL on Microsoft Azure

It’s pretty common to use the cloud to store your data or as a failover option in the case of master failure. There are several cloud providers which allow you to store, manage, retrieve, and manipulate data via a cloud platform accessible over the internet. Each cloud provider has its own product offerings and unique features, each with a different cost model.

Microsoft Azure is one of these cloud providers. In this blog, we’ll take a look at what features Microsoft Azure offers for primary storage, as a disaster recovery site, and specifically look at how it handles a mixed PostgreSQL database environment.

Deploying a PostgreSQL Database Instance on Microsoft Azure

Before performing this task, you need to decide how you will use this instance and which Azure product is best for you. There are two basic ways to deploy a PostgreSQL instance on Microsoft Azure.

Microsoft Azure Marketplace
  1. Azure Database for PostgreSQL: A managed service that you can use to run, manage, and scale highly-available PostgreSQL databases in the cloud. It’s available in two deployment options: Single Server and Hyperscale.
  2. Virtual Machine: Provides an on-demand, high-scale, secure, virtualized infrastructure. It has support for Ubuntu Server, RedHat Enterprise Linux, SUSE Linux Enterprise Server, CentOS, Debian, and Windows Server and it allows you to develop, test, run applications, and extend your datacenter in just a few seconds.

For this blog we will take a look at both how to create an Azure Database for PostgreSQL and how to use an Azure Virtual Machine, from the Microsoft Azure Portal.

Deploying Azure Database for PostgreSQL

If you go to your Azure Portal -> Create a Resource -> Databases -> Azure Database for PostgreSQL, you’ll be able to choose between Single Server or Hyperscale. For this blog, we’ll use a Single Server, as the Hyperscale option is in preview and doesn’t offer an SLA yet.

Azure Database for PostgreSQL

Here you need to add some information about your new PostgreSQL instance; such as subscription, server name, user credentials, and location. You can also choose which PostgreSQL version to use (9.5, 9.6, 10 or 11 versions are currently available) and the virtual hardware to run it (Compute + Storage).

Azure Database for PostgreSQL

When you specify the hardware, you’ll see the estimated price in real-time. This is really useful to avoid a big surprise next month. After this step, you just have to confirm the resource configuration and wait a couple of minutes until Azure finishes the creation job.

When you have the new resource created, you can go to All Resources to see the resource options available.

Azure Database for PostgreSQL

In the created resource options, you can go to Replication to enable it and replicate from the master server to up to five replicas. You should also check the Connection Security section to enable or disable external access. To find the access information, visit the resource Overview section.

$ psql -h -U severalnines@pg1blog postgres
Password for user severalnines@pg1blog:
psql (11.5, server 11.4)
SSL connection (protocol: TLSv1.2, cipher: ECDHE-RSA-AES256-GCM-SHA384, bits: 256, compression: off)
Type "help" for help.


Failover on Azure Database for PostgreSQL

Unfortunately, automated failover between master and replica servers is not available. If you delete the master instance, however, Azure will automatically perform a failover process to promote the replica.

Failover on Azure Database for PostgreSQL

There is an option to perform this failover task manually, which requires you to stop the replica and configure the new endpoint in your application to point to the new master. The replica will be promoted and delinked from the master. There is no way to relink this replica to your master again.

Deploying PostgreSQL on Azure Virtual Machine

If you go to your Azure Portal -> Create a Resource -> Compute -> Virtual Machine, you’ll open the Create a virtual machine section where you can specify different configurations for your new Azure Virtual Machine.

Azure Create a Virtual Machine

In the basic tab, you must specify the Azure subscription, Region, Availability options, Operating System, Server Size, access credentials (username/password or SSH Key), and inbound firewall rules.

Azure Create a Virtual Machine - Disk Options

In the disk tab, you must specify the storage (type and size) for your new virtual machine. The disk type can be Standard HDD, Standard SSD, or Premium SSD. The last one is recommended for high IOPS workloads.

Azure Create a Virtual Machine - Networking

In the networking tab, you can specify the virtual network, public IP address, and the allowed inbound ports. You can also add this new virtual machine behind an existing Azure load balancing solution.

Azure Create a Virtual Machine Management Settings

In the next tab, we have some management options, like monitoring and backups. 

Azure Create a Virtual Machine - Advanced Settings

And finally, in the advanced tab, we can add extensions, cloud-init, or host groups.

After reviewing the previous options and confirming them, you’ll have your new virtual machine created and accessible from the Azure Portal. In the Resource -> Overview section, you can see the virtual machine access information (Public/Private IP Address).

Azure Resource Overview

Now, you can access it via SSH and install the PostgreSQL database using ClusterControl.

$ ssh
Last login: Mon Sep 23 21:33:27 2019
[sinsausti@vm1test ~]$

You can check this link to see the steps to perform the PostgreSQL deployment with ClusterControl.

PostgreSQL Failover on Azure Virtual Machine

Disaster recovery is a Virtual Machine feature under the Operations section that allows you to replicate your environment in another Azure region. To enable it, you need to choose the target region. In the advanced tab, you can modify the specific target details; such as virtual network, storage settings, and replication settings.

Microsoft Azure Disaster Recovery

When the disaster recovery is enabled, you’ll be able to check the replication status, test the failover process, or manually failover to it.

Microsoft Azure Disaster Recovery

Enabling this allows you to have a failover option in the case of failure. This, however, will be a failover for the entire environment and not just the database service.

An Improved PostgreSQL Failover Process for Microsoft Azure

As you have SSH access, you can improve this failover process by importing the virtual machine (or even deploying the PostgreSQL database) with ClusterControl.

If you’re managing the database nodes with ClusterControl (and if the “Auto Recovery” option is ON) in the case of master failure, ClusterControl will promote the most advanced slave (if it is not blacklisted) to master as well as notify you of the problem. It also automatically fails over the rest of the slaves to replicate from the new master.

With ClusterControl, you can also deploy a mixed environment with some nodes in the cloud and other nodes on-prem. You can also add load balancers to your topology to improve your high availability environment. You can find more information about this topic here.


Azure has a lot of features and products to offer for an enterprise-level solution. During these tests, however, the main issue I found was that creation and failover times were too lengthy for most application needs.

If you need a fast failover and recovery, you should improve the availability of the environment by using a load balancer, or an external system like ClusterControl, to decrease downtime. For more detailed information about running PostgreSQL on Microsoft Azure you can take a look at our deep dive blog.

by Sebastian Insausti at September 27, 2019 03:28 PM

September 26, 2019


Comparing DBaaS Failover Solutions to Manual Recovery Setups

We have recently written several blogs covering how different cloud providers handle database failover. We compared failover performance in Amazon Aurora, Amazon RDS and ClusterControl, tested the failover behavior in Amazon RDS, and also on Google Cloud Platform. While those services provide great options when it comes to failover, they may not be right for every application.

In this blog post we will spend a bit of time analysing the pros and cons of using the DBaaS solutions compared with designing an environment manually or by using a database management platform, like ClusterControl.

Implementing High Availability Databases with Managed Solutions

The primary reason to use existing solutions is ease of use. You can deploy a highly available solution with automated failover in just a couple of clicks. There’s no need for combining different tools together, managing the databases by hand, deploying tools, writing scripts, designing the monitoring, or any other database management operations. Everything is already in place. This can seriously reduce the learning curve and requires less experience to set up a highly-available environment for the databases; allowing basically everyone to deploy such setups.

In most of the cases with these solutions, the failover process is executed within a reasonable time. It may be blazing fast as with Amazon Aurora or somewhat slower as with Google Cloud Platform SQL nodes. For the majority of the cases, these types of results are acceptable. 

The bottom line: if you can accept 30 - 60 seconds of downtime, you should be OK using any of the DBaaS platforms.

The Downside of Using a Managed Solution for HA

While DBaaS solutions are simple to use, they also come with some serious drawbacks. For starters, there is always a vendor lock-in component to consider. Once you deploy a cluster in Amazon Web Services it is quite tricky to migrate out of that provider. There are no easy methods to download the full dataset through a physical backup. With most providers, only manually executed logical backups are available. Sure, there are always options to achieve this, but it is typically a complex, time-consuming process, which still may require some downtime after all.

Using a provider like Amazon RDS also comes with limitations. Some actions which would be very simple to accomplish in environments deployed in a fully user-controlled manner (e.g. AWS EC2) cannot be easily performed. Some of these limitations have already been covered in other blogs, but to summarize: no DBaaS service gives you the same level of flexibility as regular MySQL GTID-based replication. You can promote any slave, you can re-slave every node off any other... virtually every action is possible. With tools like RDS you face design-induced limitations you cannot bypass.

Another problem is the ability to understand performance details. When you design your own highly available setup, you become knowledgeable about potential performance issues that may show up. On the other hand, RDS and similar environments are pretty much “black boxes.” Yes, we have learned that Amazon RDS uses DRBD to create a shadow copy of the master, and we know that Aurora uses shared, replicated storage to implement very fast failovers. But that’s just general knowledge. We cannot tell what the performance implications of those solutions are beyond what we might casually notice. What are the common issues associated with them? How stable are those solutions? Only the developers behind the solution know for sure.

What is the Alternative to DBaaS Solutions?

You may wonder, is there an alternative to DBaaS? After all, it is so convenient to run the managed service where you can access most of the typical actions via UI. You can create and restore backups, failover is handled automatically for you. The environment is easy-to-use which can be compelling for companies who do not have dedicated and experienced staff for dealing with databases.

ClusterControl provides a great alternative to cloud-based DBaaS services. It provides you with a graphical user interface, which can be used to deploy, manage, and monitor open source databases. 

In a couple of clicks you can easily deploy a highly-available database cluster, with automated failover (faster than most of the DBaaS offerings), backup management, advanced monitoring, and other features like integration with external tools (e.g. Slack or PagerDuty) or upgrade management. All this while completely avoiding vendor lock-in.

ClusterControl doesn’t care where your databases are located as long as it can connect to them using SSH. You can have setups in the cloud, on-prem, or in a mixed environment across multiple cloud providers. As long as connectivity is there, ClusterControl will be able to manage the environment. Utilizing the solutions you want (rather than ones you are unfamiliar with) allows you to take full control over the environment at any point in time.

Whatever setup you deploy with ClusterControl, you can easily manage it in a more traditional, manual, or scripted way. ClusterControl even provides you with a command line interface, which lets you incorporate tasks executed by ClusterControl into your shell scripts. You have all the control you want - nothing is a black box, and every piece of the environment is built from open source solutions combined together and deployed by ClusterControl.
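For example, the s9s command line client that ships with ClusterControl can drive the same operations as the UI; a quick sketch (the cluster name is a placeholder):

```shell
# List all clusters known to this ClusterControl instance.
s9s cluster --list --long

# Perform a rolling restart of a cluster named "galera-prod" (placeholder name).
s9s cluster --rolling-restart --cluster-name="galera-prod" --wait
```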

Let’s take a look at how easily you can deploy a MySQL Replication cluster using ClusterControl. Let’s assume you have the environment prepared with ClusterControl installed on one instance and all other nodes accessible via SSH from ClusterControl host.

ClusterControl Deployment Wizard

We will start with picking the “Deploy” wizard.

ClusterControl Deployment Wizard

In the first step we have to define how ClusterControl should connect to the nodes on which databases are to be deployed. Both root access and sudo (with or without a password) are supported.

ClusterControl Deployment Wizard

Then, we want to pick a vendor and version, and set the password for the administrative user of our MySQL database.

ClusterControl Deployment Wizard

Finally, we want to define the topology for our new cluster. As you can see, this is already quite a complex setup, unlike anything you can deploy using AWS RDS or a GCP SQL node.

ClusterControl Jobs

All we have to do now is wait for the process to complete. ClusterControl will do its best to understand the environment it is deploying to and install the required set of packages, including the database itself.

ClusterControl Cluster List

Once the cluster is up-and-running, you can proceed with deploying the proxy layer (which will provide your application with a single point of entry into the database layer). This is more or less what happens behind the scenes with DBaaS, where you also have endpoints to connect to the database cluster. It is quite common to use a single endpoint for writes and multiple endpoints for reaching particular replicas.

Database Cluster Topology

Here we will use ProxySQL, which will do the dirty work for us - it will understand the topology, send writes only to the master, and load balance read-only queries across all replicas that we have.

To deploy ProxySQL we will go to Manage -> Load Balancers.

Add Database Load Balancer ClusterControl

We have to fill in all required fields: hosts to deploy on and credentials for the administrative and monitoring users. We may also import existing users from MySQL into ProxySQL or create new ones. All the details about ProxySQL can easily be found in multiple posts in our blog section.

We want at least two ProxySQL nodes deployed to ensure high availability. Then, once they are deployed, we will deploy Keepalived on top of ProxySQL. This ensures that a Virtual IP will be configured and pointing to one of the ProxySQL instances, as long as there is at least one healthy node.

Add ProxySQL ClusterControl

Here is the one potential problem if you go with cloud environments where routing works in a way that prevents you from easily bringing up a network interface. In such a case you will have to modify the configuration of Keepalived and introduce a ‘notify_master’ script which makes the necessary IP changes - in the case of EC2 it would have to detach the Elastic IP from one host and attach it to the other.
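A sketch of such a hook script for EC2, assuming the AWS CLI is configured and the instance profile is allowed to manage Elastic IPs (the allocation ID and script path are placeholders):

```shell
#!/bin/bash
# eip-takeover.sh - run by Keepalived when this node becomes MASTER.
ALLOCATION_ID="eipalloc-0123456789abcdef0"   # placeholder Elastic IP allocation ID
INSTANCE_ID=$(curl -s http://169.254.169.254/latest/meta-data/instance-id)

# Detach the Elastic IP from its current host and attach it to this one.
aws ec2 associate-address \
    --allocation-id "$ALLOCATION_ID" \
    --instance-id "$INSTANCE_ID" \
    --allow-reassociation
```

In keepalived.conf, the script would be wired in with a line like notify_master /usr/local/bin/eip-takeover.sh inside the vrrp_instance block.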

There are plenty of instructions on how to do that using widely-tested open source software in setups deployed by ClusterControl. You can easily find additional information, tips, and how-to’s which are relevant to your particular environment.

Database Cluster Topology with Load Balancer


We hope you found this blog post insightful. If you would like to test ClusterControl, it comes with a 30-day enterprise trial with all features available. You can download it for free and test whether it fits your environment.

by krzysztof at September 26, 2019 06:41 PM

September 25, 2019


Failover & Failback on Amazon RDS

Previously we posted a blog discussing Achieving MySQL Failover & Failback on Google Cloud Platform (GCP), and in this blog we'll look at how its rival, Amazon Relational Database Service (RDS), handles failover. We will also look at how you can perform a failback of your former master node, bringing it back to its original role as master.

When comparing the tech-giant public clouds that support managed relational database services, Amazon is the only one that offers an additional option (alongside MySQL/MariaDB, PostgreSQL, Oracle, and SQL Server): its own database engine called Amazon Aurora. For those not familiar with Aurora, it is a fully managed relational database engine that's compatible with MySQL and PostgreSQL. Aurora is part of the managed database service Amazon RDS, a web service that makes it easy to set up, operate, and scale a relational database in the cloud.

Why You Would Need to Failover or Failback?

Designing a large system that is fault-tolerant, highly-available, with no Single-Point-Of-Failure (SPOF) requires proper testing to determine how it would react when things go wrong. 

If you are concerned about how your system would perform when responding to your system's Fault Detection, Isolation, and Recovery (FDIR), then failover and failback should be of high importance. 

Database Failover in Amazon RDS

Failover occurs automatically (a manual failover is called a switchover). As discussed in a previous blog, the need to failover arises once your current database master experiences a network failure or an abnormal termination of the host system. Failover switches it to a stable state of redundancy or to a standby computer server, system, hardware component, or network.

In Amazon RDS you don't need to do this, nor are you required to monitor it yourself, as RDS is a managed database service (meaning Amazon handles the job for you). This service manages things such as hardware issues, backup and recovery, software updates, storage upgrades, and even software patching. We'll talk about that later in this blog.

Database Failback in Amazon RDS

In the previous blog we also covered why you would need to failback. In a typical replicated environment the master must be powerful enough to carry a huge load, especially when the workload requirement is high. Your master setup requires adequate hardware specs to ensure it can process writes, generate replication events, process critical reads, etc, in a stable way. When failover is required during disaster recovery (or for maintenance) it’s not uncommon that when promoting a new master you might use inferior hardware. This situation might be okay temporarily, but in the long run, the designated master must be brought back to lead the replication after it is deemed healthy (or maintenance is completed).

Contrary to failover, failback operations usually happen in a controlled environment by using switchover. It is rarely done when in panic-mode. This approach provides your engineers enough time to plan carefully and rehearse the exercise to ensure a smooth transition. Its main objective is to simply bring back the good, old master to the latest state and restore the replication setup to its original topology. Since we are dealing with Amazon RDS, there's really no need for you to be overly concerned about these type of issues since it's a managed service with most jobs being handled by Amazon.

How Does Amazon RDS Handle Database Failover?

When deploying your Amazon RDS nodes, you can set up your database cluster in a Multi-Availability Zone (Multi-AZ) or Single-Availability Zone (Single-AZ) configuration. Let's look at how failover is processed in each of them.

What is a Multi-AZ Setup?

When a catastrophe or disaster occurs, such as an unplanned outage or natural disaster affecting your database instances, Amazon RDS automatically switches to a standby replica in another Availability Zone. The standby AZ is a physically separate facility, often far from the AZ where your instances are currently located. These AZs are highly available, state-of-the-art facilities protecting your database instances. Failover times depend on the size and activity of the database, as well as other conditions present at the time the primary DB instance became unavailable.

Failover times are typically 60-120 seconds. They can be longer though, as large transactions or a lengthy recovery process can increase failover time. When the failover is complete, it can also take additional time for the RDS Console (UI) to reflect the new Availability Zone.

What is a Single-AZ Setup?

Single-AZ setups should only be used for your database instances if your RTO (Recovery Time Objective) and RPO (Recovery Point Objective) are high enough to allow for it. There are risks involved with using a Single-AZ, such as large downtimes which could disrupt business operations. 

Common RDS Failure Scenarios

The amount of downtime is dependent on the type of failure. Let's go over what these are and how recovery of the instance is handled.

Recoverable Instance Failure

An Amazon RDS instance failure occurs when the underlying EC2 instance suffers a failure. Upon occurrence, AWS will trigger an event notification and send out an alert to you using Amazon RDS Event Notifications. This system uses AWS Simple Notification Service (SNS) as the alert processor. 

RDS will automatically try to launch a new instance in the same Availability Zone, attach the EBS volume, and attempt recovery. In this scenario, RTO is typically under 30 minutes. RPO is zero because the EBS volume was able to be recovered. The EBS volume is in a single Availability Zone and this type of recovery occurs in the same Availability Zone as the original instance.  

Non-Recoverable Instance Failures or EBS Volume Failures

For failed RDS instance recovery (or if the underlying EBS volume suffers a data loss failure) point-in-time recovery (PITR) is required.  PITR is not automatically handled by Amazon, so you need to either create a script to automate it (using AWS Lambda) or do it manually.

The RTO timing requires starting up a new Amazon RDS instance, which will have a new DNS name once created, and then applying all changes since the last backup. 

The RPO is typically 5 minutes, but you can find it by calling RDS:describe-db-instances:LatestRestorableTime. The time can vary from 10 minutes to hours depending on the number of logs which need to be applied. It can only be determined by testing as it depends on the size of the database, the number of changes made since the last backup, and the workload levels on the database. Since the backups and transaction logs are stored in Amazon S3, this recovery can occur in any supported Availability Zone in the Region.
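As a sketch of how to check this, the AWS CLI call below is the documented way to read LatestRestorableTime; since it needs live credentials, the block parses a made-up sample response to demonstrate the extraction step (the instance identifier `mydb` and the timestamp are hypothetical):

```shell
# Against a real instance you would query RDS directly (identifier is hypothetical):
#   aws rds describe-db-instances --db-instance-identifier mydb \
#       --query 'DBInstances[0].LatestRestorableTime'
# Here we parse a trimmed, made-up response to show the extraction step:
sample='{"DBInstances":[{"DBInstanceIdentifier":"mydb","LatestRestorableTime":"2019-09-24T10:02:00Z"}]}'
latest=$(printf '%s' "$sample" | sed -n 's/.*"LatestRestorableTime":"\([^"]*\)".*/\1/p')
echo "latest restorable time: $latest"
```

Anything written after this timestamp and before the failure would be lost, which is exactly the data-loss window the RPO describes.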

Once the new instance is created, you will need to update your client's endpoint name. You also have the option of renaming it to the old DB instance's endpoint name (though that requires you to delete the old failed instance), but doing so makes determining the root cause of the issue impossible.

Availability Zone Disruptions

Availability Zone disruptions can be temporary and are rare. However, if the AZ failure is more permanent, the instance will be set to a failed state. Recovery would work as described previously: a new instance could be created in a different AZ using point-in-time recovery. This step has to be done manually or by scripting. The strategy for this type of recovery scenario should be part of your larger disaster recovery (DR) plans.

If the Availability Zone failure is temporary, the database will be down but remains in the available state. You are responsible for application-level monitoring (using either Amazon’s or third-party tools) to detect this type of scenario. If this occurs you could wait for the Availability Zone to recover, or you could choose to recover the instance to another Availability Zone with a point-in-time recovery.

The RTO would be the time it takes to start up a new RDS instance and then apply all the changes since the last backup. The RPO might be longer, up to the time the Availability Zone failure occurred.

Testing Failover and Failback on Amazon RDS

We created and set up an Amazon RDS Aurora cluster using db.r4.large with a Multi-AZ deployment (which creates an Aurora replica/reader in a different AZ), accessible only via EC2. You need to make sure you choose this option upon creation if you intend to have Amazon RDS as the failover mechanism.

Amazon RDS DB Instance Size

Provisioning our RDS instance took about 11 minutes before the instances became available and accessible. Below is a screenshot of the nodes available in RDS after the creation:

Amazon RDS Provisioned Databases

These two nodes have their own designated endpoint names, which we'll use to connect from the client's perspective. First, verify the underlying hostname for each of these nodes. To check, you can run the bash command below, replacing the hostnames/endpoint names accordingly:

root@ip-172-31-8-130:~# host=('' ''); for h in "${host[@]}"; do mysql -h $h -e "select @@hostname; select @@global.innodb_read_only"; done;


+---------------+
| @@hostname    |
+---------------+
| ip-10-20-0-94 |
+---------------+
+---------------------------+
| @@global.innodb_read_only |
+---------------------------+
|                         0 |
+---------------------------+
+----------------+
| @@hostname     |
+----------------+
| ip-10-20-1-139 |
+----------------+
+---------------------------+
| @@global.innodb_read_only |
+---------------------------+
|                         1 |
+---------------------------+


The result maps as follows:

s9s-db-aurora-instance-1 = = ip-10-20-0-94 (read-write)

s9s-db-aurora-instance-1-us-east-2b = = ip-10-20-1-139 (read-only)

Simulating Amazon RDS Failover

Now, let's simulate a crash to trigger a failover for the Amazon RDS Aurora writer instance, which is s9s-db-aurora-instance-1.

To do this, connect to your writer instance using the mysql client command prompt and then issue the syntax below:

ALTER SYSTEM SIMULATE percentage_of_failure PERCENT DISK FAILURE [ IN DISK index | NODE index ] FOR INTERVAL quantity { YEAR | QUARTER | MONTH | WEEK | DAY | HOUR | MINUTE | SECOND }


When you issue this command, Amazon RDS detects the failure and reacts pretty quickly. Although the query is for testing purposes, behavior may differ when this occurs in a real event. You might be interested to know more about testing an instance crash in the AWS documentation. See how we ended up below:


Query OK, 0 rows affected (0.01 sec)

Running the SQL command above simulates a disk failure for at least 3 minutes. From the point in time I began the simulation, it took about 18 seconds before the failover began.

See below how RDS handles the simulated failure and the failover:

Tue Sep 24 10:06:29 UTC 2019
+---------------+
| @@hostname    |
+---------------+
| ip-10-20-0-94 |
+---------------+
+---------------------------+
| @@global.innodb_read_only |
+---------------------------+
|                         0 |
+---------------------------+
+----------------+
| @@hostname     |
+----------------+
| ip-10-20-1-139 |
+----------------+
+---------------------------+
| @@global.innodb_read_only |
+---------------------------+
|                         1 |
+---------------------------+

Tue Sep 24 10:06:44 UTC 2019
+---------------+
| @@hostname    |
+---------------+
| ip-10-20-0-94 |
+---------------+
+---------------------------+
| @@global.innodb_read_only |
+---------------------------+
|                         0 |
+---------------------------+
ERROR 2003 (HY000): Can't connect to MySQL server on '' (111)

Tue Sep 24 10:06:51 UTC 2019
+---------------+
| @@hostname    |
+---------------+
| ip-10-20-0-94 |
+---------------+
+---------------------------+
| @@global.innodb_read_only |
+---------------------------+
|                         0 |
+---------------------------+
+----------------+
| @@hostname     |
+----------------+
| ip-10-20-1-139 |
+----------------+
+---------------------------+
| @@global.innodb_read_only |
+---------------------------+
|                         0 |
+---------------------------+

Tue Sep 24 10:07:13 UTC 2019
+----------------+
| @@hostname     |
+----------------+
| ip-10-20-1-139 |
+----------------+
+---------------------------+
| @@global.innodb_read_only |
+---------------------------+
|                         0 |
+---------------------------+
+---------------+
| @@hostname    |
+---------------+
| ip-10-20-0-94 |
+---------------+
+---------------------------+
| @@global.innodb_read_only |
+---------------------------+
|                         1 |
+---------------------------+


The results of this simulation are pretty interesting. Let's take it one step at a time.

  • At around 10:06:29, I ran the simulation query as stated above. 
  • At around 10:06:44, the endpoint with the assigned hostname ip-10-20-1-139, which is in fact the read-only instance, became inaccessible, even though the simulation command was run against the read-write instance.
  • At around 10:06:51, the endpoint with the assigned hostname ip-10-20-1-139 is back up, but is now marked as read-write. Note that for Aurora MySQL managed instances, the innodb_read_only variable is the identifier that determines whether a host is the read-write or read-only node (Aurora runs only on the InnoDB storage engine for its MySQL-compatible instances).
  • At around 10:07:13, the order changed. This means that the failover was done and the instances were assigned to their designated endpoints. 

Check out the result below, as shown in the RDS console:

Aurora Status Amazon RDS

If you compare this to the earlier state, s9s-db-aurora-instance-1 was a reader, but was promoted to writer after the failover. The whole process, including the test, took some 44 seconds to complete, while the failover itself shows as completed in almost 30 seconds. That's impressive and fast for a failover, especially considering this is a managed database service, meaning you don't need to worry about any hardware or maintenance issues.

Performing a Failback in Amazon RDS

Failback in Amazon RDS is pretty simple. Before going through it, let's add a new reader replica. This gives us a way to test and identify which node AWS RDS chooses when it tries to fail back to the desired master (the previous master), and to see whether it selects the right node based on priority. The current list of instances and their endpoints is shown below.

Amazon RDS DB Identifier
Amazon RDS Endpoint Name

The new replica is located in the us-east-2c AZ with the database hostname ip-10-20-2-239.

We'll attempt a failback using the instance s9s-db-aurora-instance-1 as the desired failback target. In this setup we have two reader instances. To ensure that the correct node is picked up during failover, you need to establish the priority of each instance (tier-0 has the highest priority, then tier-1, tier-2, and so on down to tier-15). This can be done by modifying the instance or during the creation of the replica. 

Amazon RDS Tiered Failover Setting

You can verify this in your RDS console.

Amazon RDS Console

In this setup, s9s-db-aurora-instance-1 has priority = 0 (and is a read replica), s9s-db-aurora-instance-1-us-east-2b has priority = 1 (and is the current writer), and s9s-db-aurora-instance-1-us-east-2c has priority = 2 (and is also a read replica). Let's see what happens when we try to fail back.
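A sketch of how those tiers could be managed with the AWS CLI (the modify-db-instance call is shown commented out because it needs live credentials), plus a tiny local illustration of the selection rule Aurora applies: the surviving replica in the lowest-numbered tier is promoted.

```shell
# Setting the promotion tier on a replica (0 = highest priority); needs live
# AWS credentials, so shown commented out (identifier matches this post):
#   aws rds modify-db-instance \
#       --db-instance-identifier s9s-db-aurora-instance-1 \
#       --promotion-tier 0 --apply-immediately

# Aurora promotes the surviving replica in the lowest-numbered tier. We can
# mimic that selection over a made-up "name tier" list of the two readers:
candidates='s9s-db-aurora-instance-1 0
s9s-db-aurora-instance-1-us-east-2c 2'
target=$(printf '%s\n' "$candidates" | sort -k2,2n | head -n1 | cut -d' ' -f1)
echo "failover target: $target"
```

With tier 0 on s9s-db-aurora-instance-1, that instance is always preferred over the tier-2 replica, which matches what we observe in the failback tests below.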

You can monitor the state by using this command.

$ host=('' '' ''); while true; do echo -e "\n==========================================="; date; echo -e "===========================================\n"; for h in "${host[@]}"; do mysql -h $h -e "select @@hostname; select @@global.innodb_read_only";  done; sleep 1; done;

After the failover has been triggered, it will fail back to our desired target, the node s9s-db-aurora-instance-1.


Tue Sep 24 13:30:59 UTC 2019
+----------------+
| @@hostname     |
+----------------+
| ip-10-20-1-139 |
+----------------+
+---------------------------+
| @@global.innodb_read_only |
+---------------------------+
|                         0 |
+---------------------------+
+---------------+
| @@hostname    |
+---------------+
| ip-10-20-0-94 |
+---------------+
+---------------------------+
| @@global.innodb_read_only |
+---------------------------+
|                         1 |
+---------------------------+
+----------------+
| @@hostname     |
+----------------+
| ip-10-20-2-239 |
+----------------+
+---------------------------+
| @@global.innodb_read_only |
+---------------------------+
|                         1 |
+---------------------------+

Tue Sep 24 13:31:35 UTC 2019
+----------------+
| @@hostname     |
+----------------+
| ip-10-20-1-139 |
+----------------+
+---------------------------+
| @@global.innodb_read_only |
+---------------------------+
|                         1 |
+---------------------------+
ERROR 2003 (HY000): Can't connect to MySQL server on '' (111)
ERROR 2003 (HY000): Can't connect to MySQL server on '' (111)

Tue Sep 24 13:31:38 UTC 2019
+---------------+
| @@hostname    |
+---------------+
| ip-10-20-0-94 |
+---------------+
+---------------------------+
| @@global.innodb_read_only |
+---------------------------+
|                         0 |
+---------------------------+
+----------------+
| @@hostname     |
+----------------+
| ip-10-20-2-239 |
+----------------+
+---------------------------+
| @@global.innodb_read_only |
+---------------------------+
|                         1 |
+---------------------------+
+----------------+
| @@hostname     |
+----------------+
| ip-10-20-2-239 |
+----------------+
+---------------------------+
| @@global.innodb_read_only |
+---------------------------+
|                         1 |
+---------------------------+


The failback attempt started at 13:30:59 and completed around 13:31:38. The transition itself took roughly 32 seconds on this test, which is still fast. 

I verified the failover/failback multiple times, and it consistently exchanged the read-write state between instances s9s-db-aurora-instance-1 and s9s-db-aurora-instance-1-us-east-2b. This leaves s9s-db-aurora-instance-1-us-east-2c unpicked unless both other nodes are experiencing issues (which is very rare, as they are all situated in different AZs). 

During the failover/failback attempts, RDS transitioned rapidly, at around 15-25 seconds (which is very fast). Keep in mind we don't have huge data files stored on this instance, but it's still quite impressive considering there is nothing further to manage. 


Running a Single-AZ setup introduces risk when performing a failover. Amazon RDS allows you to modify and convert your Single-AZ to a Multi-AZ capable setup, though this adds some cost. Single-AZ may be fine if you are OK with a higher RTO and RPO, but it is definitely not recommended for high-traffic, mission-critical business applications. 

With Multi-AZ, you can automate failover and failback on Amazon RDS, spending your time focusing on query tuning or optimization. This eases many problems faced by DevOps or DBAs. 

While Amazon RDS may cause a dilemma in some organizations (as it's not platform agnostic), it's still worthy of consideration, especially if your application requires a long-term DR plan and you do not want to spend time worrying about hardware and capacity planning.


by Paul Namuag at September 25, 2019 05:49 PM

MariaDB Foundation

Progress on Pull Request Processing

In his blog post “On Contributions, Pride and Cockiness” in May, MariaDB Foundation CEO Kaj Arnö spoke of a renewed focus on MariaDB Server pull requests. Processing community pull requests in good time is a key part of our mission, but we’d been falling behind, and receiving justifiable criticism. At the time of that […]

The post Progress on Pull Request Processing appeared first on

by Ian Gilfillan at September 25, 2019 03:17 PM

September 24, 2019


Using Backups to Fix Common Failure Scenarios for MongoDB

Production outages are almost guaranteed to occur at some point. Accepting this fact and analyzing the timeline and failure scenario of your database outage can help you better prepare for, diagnose, and recover from the next one. To mitigate the impact of downtime, organizations need an appropriate disaster recovery (DR) plan. DR planning is a critical task for many SysOps/DevOps teams, but even though the need is foreseeable, a plan often does not exist.

In this blog post, we will analyze different backup and failure scenarios in MongoDB database systems. We will also walk you through recovery and failover procedures for each respective scenario. These use cases range from restoring a single node to restoring a node in an existing replicaSet and seeding a new node in a replicaSet. Hopefully this will give you a good understanding of the risks you might face and what to consider when designing your infrastructure.

Before we start discussing possible failure scenarios, let’s take a look at how MongoDB stores data and what types of backup are available.

How MongoDB Stores Data

MongoDB is a document-oriented database. Instead of storing your data in tables made out of individual rows (as a relational database does), it stores data in collections made out of individual documents. In MongoDB, a document is a big JSON blob with no particular format or schema. Additionally, data can be spread across different cluster nodes with sharding, or replicated to secondary servers with a replicaSet.

MongoDB allows for very fast writes and updates by default. The tradeoff is that often you are not explicitly notified of failures. By default, most drivers do asynchronous, unsafe writes. This means that the driver does not return an error directly, similar to INSERT DELAYED with MySQL. If you want to know whether something succeeded, you have to manually check for errors using getLastError.

For optimal performance, it's preferable to use SSDs rather than HDDs for storage. You should consider whether your storage is local or remote and take measures accordingly. It's better to use RAID for protection against hardware defects and for recovery schemes, but don't rely on it completely, as it doesn't offer protection against every kind of failure. The right hardware is the building block for your application to optimize performance and avoid a major debacle.

Disk-level data corruption or missing data files can prevent mongod instances from starting, and journal files may be insufficient to recover automatically. 

If you are running with journaling enabled, there is almost never any need to run repair since the server can use the journal files to restore the data files to a clean state automatically. However, you may still need to run repair in cases where you need to recover from disk-level data corruption.

If journaling is not enabled, your only option may be to run the repair command. mongod --repair should be used only if you have no other options, as the operation removes (and does not save) any corrupt data during the repair process. This type of operation should always be preceded by a backup. 

MongoDB Disaster Recovery Scenario

In a failure recovery plan, your Recovery Point Objective (RPO) is a key recovery parameter that dictates how much data you can afford to lose. RPO is expressed in time, from milliseconds to days, and is directly dependent on your backup system. It considers the age of the backup data that you must recover in order to resume normal operations.

To estimate RPO you need to ask yourself a few questions. When is my data backed up? What is the SLA associated with the retrieval of the data? Is restoring a backup of the data acceptable, or does the data need to be online and ready to be queried at any given time? 

Answers to these questions will help drive what type of backup solution you need.

MongoDB Backup Solutions

Backup techniques have varying impacts on the performance of the running database. Some backup solutions degrade database performance enough that you may need to schedule backups to avoid peak usage or maintenance windows. You may decide to deploy new secondary servers just to support backups.

The three most common solutions to backup your MongoDB server/cluster are...

  • Mongodump/Mongorestore - logical backup.
  • Mongo Management System (Cloud) - Production databases can be backed up using MongoDB Ops Manager or if using the MongoDB Atlas service you can use a fully managed backup solution.
  • Database Snapshots (disk-level backup)


When performing a mongodump, all collections within the designated databases will be dumped as BSON output. If no database is specified, MongoDB will dump all databases except for the admin, test and local databases as they are reserved for internal use.

By default, mongodump will create a directory called dump, with a directory for each database containing a BSON file per collection in that database. Alternatively, you can tell mongodump to store the backup within one single archive file. The archive parameter will concatenate the output from all databases and collections into one single stream of binary data. Additionally, the gzip parameter can naturally compress this archive, using gzip. In ClusterControl we stream all our backups, so we enable both the archive and gzip parameters.
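As a rough sketch, the invocation with both parameters could look like this; the URI and archive name are placeholders, and since the command needs a running mongod we only assemble and print the command line here:

```shell
# Assemble the mongodump invocation with the archive and gzip parameters
# discussed above (URI and output file name are placeholders):
dump_cmd="mongodump --uri=mongodb://localhost:27017 --archive=cluster.20191017.gz --gzip"
echo "$dump_cmd"
# Against a live server you would simply execute the command; the matching
# restore would be:
#   mongorestore --archive=cluster.20191017.gz --gzip
```

Streaming the dump through a single gzipped archive is also convenient for piping the backup straight to remote or cloud storage.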

Similar to mysqldump with MySQL, if you create a backup in MongoDB it will freeze the collections while dumping the contents to the backup file. As MongoDB does not support transactions (changed in 4.2) you can’t make a 100% fully consistent backup unless you create the backup with the oplog parameter. Enabling this on the backup includes the transactions from the oplog that were executing while making the backup. 

You can run the backup from the command line or use external tools like ClusterControl for better automation. ClusterControl is a recommended option for backup management and backup automation, as it allows you to create advanced backup strategies for various open-source database systems.

ClusterControl also allows you to upload your backup to the cloud. It supports full backup and restore, as well as encryption for mongodump. If you want to see how it works, there is a demo on our website.

ClusterControl Demo

Restoring MongoDB From a Backup

There are basically two ways you can use a BSON format dump:

  1. Run mongod directly from the backup directory
  2. Run mongorestore and restore the backup

Run mongod Directly From a Backup

A prerequisite for running mongod directly from the backup is that the backup target is a standard dump, and is not gzipped.

The MongoDB daemon will then check the integrity of the data directory, add the admin database, journals, collection and index catalogs and some other files necessary to run MongoDB. If you ran WiredTiger as the storage engine before, it will now run the existing collections as MMAP. For simple data dumps or integrity checks, this works fine.

Running mongorestore

A better way to restore would obviously be to restore the node using mongorestore:

mongorestore dump/

This will restore the backup into the default server settings (localhost, port 27017) and overwrite any databases in the backup that reside on this server. Now there are tons of parameters to manipulate the restore process, and we will cover some of the important ones.

In ClusterControl this is done via the Restore Backup option. You choose the machine where the backup will be restored, and the process takes care of the rest. This includes encrypted backups, where normally you would also need to decrypt your backup first.

ClusterControl Restore Backup

Object Validation

As the backup contains BSON data, you would expect the contents of the backup to be correct. However, it could have been the case that the document that got dumped was malformed, to begin with. Mongodump does not check the integrity of the data it dumps. 

To address that, use the --objcheck option, which forces mongorestore to validate every object before inserting it, ensuring that no invalid documents enter the database. It can have a small impact on performance.

Oplog Replay

Including the oplog in your backup enables you to perform a consistent backup and do point-in-time recovery. Enable the oplogReplay parameter to apply the oplog during the restore process. To control how far to replay the oplog, you can define a timestamp in the oplogLimit parameter. Only transactions up until that timestamp will then be applied.
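A sketch of the dump/restore pair using these parameters; the paths and the oplogLimit timestamp are illustrative, and the commands are only assembled here since they need a live server:

```shell
# Take a consistent dump that also captures the oplog:
backup_cmd="mongodump --oplog --out dump/"
# Restore and replay the oplog only up to a given point in time; oplogLimit
# takes a Unix-seconds:increment timestamp (the value below is illustrative):
restore_cmd="mongorestore --oplogReplay --oplogLimit 1569319200:0 dump/"
printf '%s\n%s\n' "$backup_cmd" "$restore_cmd"
```

Omitting --oplogLimit replays the full captured oplog, giving you a restore consistent to the moment the dump finished.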

Restoring a Full ReplicaSet From a Backup

Restoring a replicaSet is not much different from restoring a single node. Either you set up the replicaSet first and restore directly into it, or you restore a single node first and then use the restored node to build a replicaSet.

Restore node first, then create replicaSet

Restore the backup into the first node, then add the second and third nodes, which will sync their data from the first node. After the sync has finished, our replicaSet has been restored.

Create a ReplicaSet first, then restore

In contrast to the previous process, you can create the replicaSet first. Configure all three hosts with the replicaSet enabled, start up all three daemons, and initiate the replicaSet on the first node:
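A minimal sketch of that initiation step, assuming three hypothetical hosts node1..node3 and a replicaSet named rs0 (all names are placeholders, not from the original post); the JS is assembled and printed here since it needs the daemons running:

```shell
# Hypothetical hosts and replicaSet name ("rs0"); once all three mongod
# daemons are up with --replSet rs0, this JS initiates the set on node1:
init_js='rs.initiate({
  _id: "rs0",
  members: [
    { _id: 0, host: "node1:27017" },
    { _id: 1, host: "node2:27017" },
    { _id: 2, host: "node3:27017" }
  ]
})'
echo "$init_js"
# Against a live first node: mongo --host node1:27017 --eval "$init_js"
```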

Now that we have created the replicaSet, we can directly restore our backup into it:
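The restore into the freshly initiated set could then look like this (host list and replicaSet name are placeholders; the command is assembled rather than executed since it needs a live primary):

```shell
# Restore the dump through the replicaSet connection string so the writes
# are routed to the primary (hosts and set name are hypothetical):
restore_cmd='mongorestore --host rs0/node1:27017,node2:27017,node3:27017 dump/'
echo "$restore_cmd"
```

The replicated writes then flow to the secondaries automatically, so no per-node restore is needed.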

In our opinion restoring a replicaSet this way is much more elegant. It is closer to the way you would normally set up a new replicaSet from scratch, and then fill it with (production) data.

Seeding a New Node in a ReplicaSet

When scaling out a cluster by adding a new node in MongoDB, the initial sync of the dataset must happen. With MySQL replication and Galera, we are so accustomed to using a backup to seed the initial sync. With MongoDB this is possible, but only by making a binary copy of the data directory. If you don’t have the means to make a file system snapshot, you will have to face downtime on one of the existing nodes. The process, with downtime, is described below.

Seeding With a Backup

So what would happen if you restore the new node from a mongodump backup instead, and then have it join a replicaSet? Restoring from a backup should, in theory, give the same dataset. As this new node has been restored from a backup, it will lack the replicaSetId and MongoDB will notice. As MongoDB doesn’t see this node as part of the replicaSet, the rs.add() command then will always trigger the MongoDB initial sync. The initial sync will always trigger deletion of any existing data on the MongoDB node.

The replicaSetId is generated when initiating a replicaSet, and unfortunately can’t be set manually. That’s a shame as recovering from a backup (including replaying the oplog) would theoretically give us a 100% identical data set. It would be nice if the initial sync was optional in MongoDB to satisfy this use case.


by Bart Oles at September 24, 2019 08:27 PM

September 23, 2019


What to Know When Start Working with MongoDB in Production - Ten Tips

Working with MongoDB requires a lot of precise thinking. Essential considerations are often overlooked, and these can jeopardise the performance of the database in production. 

MongoDB is a NoSQL DBMS which follows a different pattern from SQL databases, especially along the lines of security and structure. Although some of its integrated features promote its performance and make it one of the best DBMSs in recent times, some features consequently pose potential threats that can ruin its performance if not taken into account. 

In a recent “worst case” experience, I was trying to query a collection with documents that had large arrays, and it took ages for me to get the results back. I decided to write this blog knowing that if someone experiences the same problems, it will be of great help. 

Key Considerations for MongoDB in Production

  1. Security and authentication.
  2. Indexing your documents
  3. Using a schema in your collections
  4. Capped collection
  5. Document size
  6. Array size for embedded documents
  7. Aggregation pipeline stages
  8. Order of keys in hash object
  9. ‘undefined’ and ‘null’ in MongoDB
  10. Write operation

MongoDB Security and Authentication

Data varies in many ways, and you will obviously need to keep some information confidential. By default, MongoDB installations do not require authentication, but that doesn't give you a go-ahead to use it that way, especially when confidential data such as financial and medical records are involved. On a development workstation it is not a big deal, but because of multi-user involvement in production mode, it is good practice to set up authentication. The most common and easy-to-use method is the default MongoDB username and password credentials.

Data is written to files which can be accessed through third-party tools, all the more so if they are not encrypted. The data can be altered without your knowledge if an anonymous person gets access to the system files. Hosting the database on a dedicated server and assigning a single user who has full access to the data files will save you this trouble.

Protecting data from external injection attacks is also an essential undertaking. Some operators such as $group and $where, as well as mapReduce operations, are implemented in JavaScript (JS) and hence prone to JS manipulation. To avoid any resulting data-integrity issues, you can disable arbitrary JS execution by setting the parameter javascriptEnabled: false in the config file if you do not use any of the mentioned operators. Further, you can reduce the risk of data access through network breaches by employing some of the procedures highlighted in the MongoDB Security Checklist.
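As a sketch, the relevant mongod.conf fragment could look like the following; the scratch path is just for illustration, and authorization is included because it lives in the same security section of the config file:

```shell
# Write a hypothetical security fragment of mongod.conf to a scratch file;
# security.javascriptEnabled: false disables server-side JS ($where, mapReduce)
# and security.authorization: enabled enforces authentication:
cat > /tmp/mongod-security.conf <<'EOF'
security:
  authorization: enabled
  javascriptEnabled: false
EOF
cat /tmp/mongod-security.conf
```

In a real deployment these keys would go into your existing /etc/mongod.conf, followed by a restart of the mongod service.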

Indexing Your Documents

Indexing builds a data structure over the values of selected fields, which lets MongoDB locate documents in a collection without scanning every document. Used properly, it brings a performance upgrade to both read and write operations. By default, every collection has an index on the _id field, and you should always maintain that setting. Without indexing, the database has to check through multiple documents from start to end, and unfortunately the operation will be time-costly for documents towards the end, rendering poor latency for the query. At some point, on the application end, users may experience a lag and may think the application is actually not working.

Indexing is helpful in sort and lookup query operations, not leaving out the find operation itself. Sorting is a common operation for many returned documents. It is often carried out as the final stage, after documents have been filtered, so that only a small amount of data needs to be sorted. An index in this case lets the data be sorted in index order and keeps the operation within the 32MB limit on the combined size of documents sorted in memory. Without an index, chances are the 32MB memory limit will be exceeded, and whenever the database hits this limit, it will throw an error and return an empty record set.

The $lookup operation is likewise supported by indexing. An index on the field used as the foreign key is essential for efficient processing of the preceding stages.
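A sketch of creating such an index from the mongo shell; the collection (orders) and foreign-key field (customerId) are hypothetical names for illustration, and the JS is assembled here since it needs a live server:

```shell
# Hypothetical: orders.customerId is used as the foreign key in a $lookup,
# so we index it to keep the join stage fast:
index_js='db.orders.createIndex({ customerId: 1 })'
echo "$index_js"
# Against a live server: mongo mydb --eval "$index_js"
```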

Using a Schema in Your Collections

MongoDB does not require you to define fields (columns), as an SQL DBMS does. However, even though you do not need to define the fields, defining a schema is always a good practice, to avoid data inconsistency and other setbacks that may arise. Schema design allows you to determine which type of data goes in a certain field and which fields must be supplied with a value, and generally enhances data validation before entry or update, thereby promoting data integrity and consistency. A schema design will also direct you on whether to reference or embed data. As a beginner you may think the only model is “One-to-N”, which facilitates subdocument array entries, but that is not the case.

You need to understand the cardinality relationship between documents before making your model. Some of the rules that  will help you have an optimal schema are:

  1. To reduce the number of queries you need to execute before accessing some data, and if only a few fields or array elements are involved, you can embed subdocuments. Take the model below as an example:

     {
       name: 'John Doh',
       addresses: [
         {street: 'Moi Avenue', city: 'Nairobi', countryCode: 'KE'},
         {street: 'Kenyatta Avenue', city: 'Nairobi', countryCode: 'KE'}
       ]
     }

  2. Avoid denormalizing frequently updated fields. If a denormalized field is updated often, you face the task of finding all the instances that need to be updated. This results in slow query processing, overwhelming even the merits associated with denormalization.
  3. Complex queries such as aggregation pipelines take more time to execute when many subdocuments are involved and a document has to be fetched separately.
  4. Array elements with a large set of object data should not be embedded, since they may grow and consequently exceed the document size limit.

Modelling of a schema is often determined by the application access pattern. You can find more procedures that can help in the design of your model in the blog 6 Rules of Thumb for MongoDB Schema Design.

Use a Capped Collection for Recent Documents Priority

MongoDB provides a lot of resources, such as the capped collection, yet some end up underutilized. A capped collection has a fixed size and is known to support high-throughput operations that insert and retrieve documents based on insertion order. When its space fills up, the oldest documents are deleted to make room for new ones.

Example of capped collection  use case:

  • Caching frequently accessed data, since the collection itself is read-heavy rather than write-heavy and you want its reads to stay fast.
  • Logging information for high-volume systems. Capped collections often don't need an index, which is advantageous in that the speed of recording is quite fast, much like appending to a file.
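As a sketch, a capped collection for recent log entries could be created from the mongo shell like this (the collection name and sizing are examples, not prescriptions):

```javascript
// Create a capped collection of at most 100 MB / 100,000 documents.
// Once the limit is reached, the oldest documents are overwritten.
db.createCollection("recent_logs", {
  capped: true,
  size: 100 * 1024 * 1024,  // maximum size in bytes (required)
  max: 100000               // optional cap on the number of documents
})

// Documents come back in insertion order; $natural: -1 reads newest first.
db.recent_logs.find().sort({ $natural: -1 }).limit(10)
```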

Pay Attention to MongoDB Document Size

Every MongoDB document is limited to a size of 16 megabytes. However, it is not optimal for a document to reach or approach this limit, as that poses some atrocious performance problems. MongoDB itself works best when documents are a few kilobytes in size. If a document is large, a complex projection request will take a long time and the query may time out.

Pay Attention to the Array Size of Embedded Documents

One can push subdocuments to a field in a document, thereby creating an array value on that field. As mentioned before, you need to keep the size of the subdocuments low. It is equally important to keep the number of array elements below four figures. Otherwise, the document will grow beyond its size limit and will need to be relocated on disk. A further problem with such an operation is that every document will need to be re-indexed, and each subdocument will equally need to be re-indexed. This means a lot of index writes, which result in slow operations. For large subdocuments, it is better to keep the records in a new collection than to embed them.
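One hedged way to keep such an array from growing without bound (the collection and field names below are hypothetical) is to trim it on every push using $push with $slice:

```javascript
// Push a new comment while keeping only the most recent 1,000 entries;
// older array elements are dropped instead of growing the document.
db.users.updateOne(
  { _id: userId },
  { $push: { comments: { $each: [newComment], $slice: -1000 } } }
)
```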

Aggregation Pipeline Stages 

Besides the normal MongoDB query operations, there is an aggregation framework used to manipulate and return data according to some specification, such as ordering and grouping. Since the order of the stages determines how much data each subsequent stage has to process, you need to order the pipeline stages appropriately. Start by reducing the amount of data you are dealing with using the $match operator, and apply $sort late if you need to sort. You can use third-party tools such as Studio 3T to optimize your aggregation query before integrating it into your code. The tool enables you to see the data input and output at any of the stages, so you know what you are dealing with.

Combining $limit with $sort should give the same results every time the query is executed. If you use $limit without a deterministic $sort, the returned data will not be deterministic and may cause issues which are difficult to track down.
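As a sketch of this stage ordering (the collection and field names are made up for illustration), filter first, group the reduced set, and only then sort and limit:

```javascript
db.orders.aggregate([
  { $match: { status: "shipped" } },                              // shrink the working set first
  { $group: { _id: "$customerId", total: { $sum: "$amount" } } }, // aggregate the smaller set
  { $sort: { total: -1 } },                                       // sort the grouped output only
  { $limit: 10 }                                                  // deterministic thanks to the sort
])
```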

Check the Order of Keys in Hash Objects

Consider two documents that each store a name as an embedded subdocument:

   {name: {FirstName: 'John', LastName: 'Doh'}}

If you do a find operation with the query {name: {FirstName: 'John', LastName: 'Doh'}}, the operation will not match a document stored as {name: {LastName: 'Doh', FirstName: 'John'}}: exact matches on embedded documents are order-sensitive in BSON. You therefore need to maintain a consistent order of name and value pairs in your documents.
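This order-sensitive comparison can be illustrated outside MongoDB with Python's OrderedDict, which, unlike a plain dict, compares order-sensitively, much like BSON embedded documents do (this is an analogy, not MongoDB code):

```python
from collections import OrderedDict

# Same name/value pairs, different field order
a = OrderedDict([("FirstName", "John"), ("LastName", "Doh")])
b = OrderedDict([("LastName", "Doh"), ("FirstName", "John")])

print(a == b)              # False: OrderedDict comparison is order-sensitive
print(dict(a) == dict(b))  # True: plain dicts ignore order
```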

Avoid ‘undefined’ and ‘null’ in MongoDB

MongoDB uses the BSON format for its documents. The ‘undefined’ value is not supported by JSON validation and you should avoid using it. Storing null comes as a workaround, but you should avoid it too where possible and simply omit fields that have no value.

Consider Write Operations

You might tune MongoDB for high-speed writes, but this poses a setback: a response may be returned even before the data is durably written. Journaling should be enabled to avoid this scenario. In addition, in the case of a database breakdown, the data will still be available, and the journal creates checkpoints which can be used in the recovery process. The interval between journal writes can be set using the commitIntervalMs parameter.
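For reference, a journaling section in mongod.conf (MongoDB 4.0-era YAML configuration; the interval value is an example to tune against your durability needs) might look like:

```yaml
# mongod.conf (excerpt)
storage:
  journal:
    enabled: true
    # How often (in ms) journal data is flushed to disk; lower values
    # narrow the window of possible data loss at some write cost.
    commitIntervalMs: 100
```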


A database system should ensure data integrity and consistency, besides being resilient to failure and malice. However, to achieve these qualities, one needs to understand the database itself and the data it holds. MongoDB works well when the factors mentioned above are taken into account, the most important of them being the use of a schema. A schema enables you to validate your data before an insert or update and determines how you model that data. Data modelling is often driven by the application access pattern. All of this taken together will offer better database performance.

by Onyancha Brian Henry at September 23, 2019 09:45 AM

September 20, 2019


MongoDB Database Automation Basics Using Chef

Managed environments grow over time, often as a result of increased data involvement or the need to increase performance through shared workloads, and because of this there is a need to add members. For instance, with MongoDB one can decide to use replication and sharding, which consequently requires adding more members to the cluster. Configuring and deploying these environments by hand becomes hectic, time consuming, and prone to human error, and the associated setbacks ultimately translate into operational expenses. Take the example of a MongoDB replica set with 50 members where you want to shard a certain collection on each member: doing this manually for each one is time consuming. We thus need a centralized system from which we can easily configure all the members. With a centralized system you write some code which in turn configures the connected members. The code is therefore human readable, versionable, and testable, allowing possible errors to be removed before deployment.

What is Chef Software?

Chef is automation software written in Ruby that is used to streamline the task of configuring and maintaining cloud machines or on-premises servers. It does so by ensuring that all connected members get the required resources, that the resources are well configured, and by correcting any resources that are not in the desired state. So, Chef basically ensures the files and software resources expected on a certain machine are present, configured correctly, and working properly as intended.

The Components of Chef

Chef Server

This is the central controlling system that houses the configuration data. The data is written in a ‘recipe’, and if many of these recipes are involved, they form a cookbook. The central system also contains metadata describing each of the nodes as outlined in chef-client.

All changes made in recipes pass here for validation before deployment. The server also ensures that the workstation and the connected nodes are paired using authorization keys before allowing communication between them and applying the changes.

Chef Client Node

The Chef client node registers and validates nodes and builds the node objects. It essentially holds the current state of a given node and its metadata.


Nodes

This is the physical, virtual, or cloud machine to be configured, and each should have the chef-client installed.


Chef Workstation

The workstation provides an interface for communication between the server and the client nodes. It provides a platform for writing, testing, and deploying the cookbooks. This is also where the roles are defined.

Test Kitchen

This is where the code is validated.

Chef Knife

A command-line tool used to interact with and manage the nodes.


Chef Cookbook

Contains recipes written in the Ruby language that are used to define the tasks to be carried out. The recipes specify the resources and the order of implementation of the defined tasks. A cookbook also contains supporting components:

  • attributes are used to override the default settings.
  • files are used to transfer files from a specific path to the chef-client.
  • metadata defines the node information as described in the client node.

How Chef Works

Chef has two modes of operation: client/server, or a standalone mode known as ‘chef-solo’.

The Chef server receives various attributes regarding a certain node from the chef-client. These attributes are then indexed using Elasticsearch by the server, which then provides an Application Program Interface (API) from which the client nodes can query this data. The returned results are then used by the client nodes to configure the relevant machines and transform them to the desired state.

The server is the hub through which all configuration changes are stored and distributed.

MongoDB Chef

Chef managed servers are evaluated from time to time against some desired state ensuring that any changes in configurations are automatically corrected and applied universally. With this approach, there is a consistent configuration at scale.

Getting Started with Chef

You can download the chef-workstation from this site and install it. Make a folder named cookbooks and inside this folder run the command:

$ chef generate cookbook first_cookbook

This will generate a directory named first_cookbook with some sub-folders and files.

Navigate to cookbooks/first_cookbook/recipes/ and update the default.rb recipe with the contents:

file "test.txt" do

  content 'This is my first recipe file'

end

We then execute this file using the command

$ chef-client --local-mode --override-runlist first_cookbook

OR, inside the recipe folder, you can run the  file with the command

$ chef-apply default.rb

If you navigate to your recipe folder, you will see the test.txt file with the content 'This is my first recipe file'. It's that easy. In the next section we will create recipes to perform some specific tasks regarding MongoDB.

Installing and Configuring MongoDB with Chef

You can install MongoDB by creating an install recipe, MongoDBInstall.rb, and populating it with the contents:

package "mongodb" do

 action :install

 version '4.0.3'

end


In this case our package name is mongodb and we are going to install version 4.0.3.

What we have is a basic recipe, but in many cases we will need a more advanced cookbook to handle our MongoDB configuration. To ease the task, there are well-developed cookbooks such as SC-MongoDB that generally make the configuration precise.

SC-MongoDB Cookbook

The cookbook provides a mongodb_instance resource that enables one to configure MongoDB parameters, a replica set, and a sharded cluster.

To install the cookbook just run the command

$ knife supermarket download sc-mongodb

You can then use the attributes defined on this site to reconfigure some of the default MongoDB attributes.
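As a hedged sketch, a recipe for a replica set member might look like the following. The attribute names here are illustrative and vary between sc-mongodb cookbook versions, so check the cookbook's documentation before using them:

```ruby
# Hypothetical recipe using sc-mongodb's mongodb_instance resource
mongodb_instance 'mongodb' do
  mongodb_type 'mongod'      # a plain mongod process
  port         27017
  bind_ip      '0.0.0.0'
  replicaset   'rs0'         # name of the replica set to join
end
```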

by Onyancha Brian Henry at September 20, 2019 09:45 AM

September 19, 2019


PostgreSQL Top Learning & Training Resources

Oftentimes, people want to know about “That One Place” to get all their learning and training resources for PostgreSQL. When I get such a question from a colleague, my typical response is to tell them to look it up online. But I know as soon as they hit the “.com” highway, they will be confronted with a barrage of resources about PostgreSQL from blogs, articles, whitepapers, videos, webinars, cookbooks for dummies, cheat sheets, and more.

In this blog, I am going to take you on a journey through some of the important avenues to quickly obtain most of the knowledge you would need to know about PostgreSQL.

Here we go...

Read the PostgreSQL Manual

The first stop is the online manuals of PostgreSQL. The official documentation (or “docs”, as they are referred to in short) of any product is the best place to find the largest wealth of information. For most people nowadays, manuals are typically the last place to look for help. They should, however, always be the first stop on the list, for the various reasons listed below:

  • Official docs explain the internals of various components of a product and how they relate to each other
  • They link to various other sections of manuals discussing a concept when a new concept is introduced
  • There is sample code to be executed and its expected output with explanation
  • There is a logical flow from one idea to another
  • There is a “Tip” and “Quick Setup” section wherever required that gives bonus information for newbies
  • Most of the other online resources lead you to official documentation in one way or the other
  • The manuals are divided into appropriate sections as per the need such as developer oriented, administrator related, programming focused, utilities, command reference, internals and appendices etc.

One excellent feature of using manuals that I liked the most is the “Supported Versions” subtitle on top of the page which provides links to other versions of PostgreSQL where a concept is available. It makes it convenient to navigate between various versions of PostgreSQL for the same concept, especially when you want to compare default settings across versions, parameter names, and error conditions etc. 

I once wanted to play around with “Logical Replication” when it was first introduced in PostgreSQL 10. I found a dedicated chapter in the manuals on Logical Replication that explains the architecture, components involved, configuration settings, and a quick setup. All I did was follow the steps of “Quick Setup” and had a working Logical Replication setup on my test virtual machine in no time.

These docs are like the owner’s manual for a home appliance. Any error code from the appliance can only be understood by referring to the  owner’s manual to take necessary action to troubleshoot and remedy the issue. The notion sounds like a cliche but it holds true about manuals.

The other benefit of getting used to online manuals is by attaining first hand information about the added and/or enhanced features  in a newly released version of PostgreSQL (called Release Notes). Online manuals may give you a comprehensive account of enhancements, added features, and deprecated features, but Release Notes give you the “introductory gist” of what the new feature is, what enhancements have been made, and what features are no longer supported. A quick glance of Release Notes across major release versions also gives you an understanding of what developments have been made in a specific PostgreSQL version since the earlier release.

In addition to online manuals, there is a repository of all stuff PostgreSQL in the form of WIKI pages. This has supplementary information covering tutorials, guides, how-tos, and tips 'n' tricks related to PostgreSQL. It also serves as a collaboration area for PostgreSQL contributors. You can also get access to automation scripts developed by various users on installation, administration, and management of PostgreSQL, which could be utilized in your environment under GPL notice.

Using the PostgreSQL Distribution Lists

The next top learning and training resources are the community distribution lists. This is where you can interact with other PostgreSQL enthusiasts from across the globe. There are over 45 community distribution lists divided into 7 broad categories (listed below).

  • User lists
  • Developer lists
  • Regional lists
  • Associations
  • User groups
  • Project lists
  • Internal lists

There is a dedicated distribution list for every type of PostgreSQL professional, depending on regional language, experience level, and background of PostgreSQL interest. But as PostgreSQL gains more and more momentum, this may quickly build up to over 100 distribution lists across even more categories.

To stay up-to-speed on PostgreSQL you have to subscribe and follow some of the community distribution lists, because you will see a lot of action around PostgreSQL. There is an audience of various levels of expertise starting from newbies requesting a little hand-holding to industry and community heavy-weights offering suggestions to solve complex issues being faced in production environments.

The best way to participate in these community distribution lists is to start with a PostgreSQL database instance running in your own local virtual machine (VM). This will help you to know the terminologies and nuances of PostgreSQL.  You are also in a position to offer help to the community when someone confronts a PostgreSQL situation you may have already faced and successfully resolved. 

PostgreSQL Partners & Software Tools

There are many tools that can be configured to work with a PostgreSQL database. It is not possible for a new user to truly get a grasp of the whole market out there, but it does get easier if you narrow down to a specific concept and evaluate the most popular tools related to the concept of your choice. 

My personal interest around databases is Backup & Recovery, Replication, High Availability, and Monitoring. I have spent enough time learning and implementing some of the open source tools around these areas, and when a fellow community member gets into a bind, and I know what could be the cause, I offer to help with a quick explanation and plan of action by citing references from the respective documentation. 

Official PostgreSQL Webinars

There are also online webinars conducted by various registered organizations (note: you will need a PostgreSQL account to view these), with their members forming part of a core team of contributors or committers of PostgreSQL code. Some of the other core team members manage their own personal blogs publishing technical content from time to time such as know-hows, white papers, case studies, tutorials or simple tips and tricks of working with PostgreSQL internals. The other forms of engaging with the PostgreSQL community members online include IRC, Slack, GitHub and several other online networking portals.

A List of PostgreSQL Events

Now that you have started learning and exploring the possibilities of PostgreSQL, it’s time to meet some real people in person. One way of achieving that would be to attend events and technical symposia organized by various local PostgreSQL user groups within your region. These events run anywhere from a few hours a day to one full week of activities revolving around PostgreSQL development, PostgreSQL hacks, bootcamps, and workshops etc.

There are plenty of sponsored conferences held all year round across the globe. They take place at various geographical locations and are named after the region where they are conducted, such as PGDay UK, PGConf Asia, PGConf EU, and so on (note that some of them are only held in the region's local language).

If you can only attend one, the most important conference is PGCon. This is an annual conference for users and developers of PostgreSQL held during the last week of May every year at the University of Ottawa in Ottawa, Canada. This is where the top developers and committers of PostgreSQL meet each year to discuss enhancements, new features, and the development activities of PostgreSQL (in addition to presenting and conducting training bootcamps). It is during this event that the community recognizes developers and committers who have contributed immensely to PostgreSQL. Some are also formally inducted into the panel of contributors.

The bootcamps and trainings conducted during PGCon are handled by industry experts who have developed the core features of PostgreSQL, which means you get to know the internals of PostgreSQL from the people who designed it. While a good reason to attend the community events is so you can expand your technical  network, the other good reason is to collect the PostgreSQL shirts which can be worn to work with pride in order to get others interested in PostgreSQL. The events calendar can be accessed from here, and each of the events will point you to its unique website managed and maintained by the respective event organisers. 

PostgreSQL Local and Regional User Groups

User Groups such as PUG (PostgreSQL UG), SIG (Special Interest Group), and RUG (Regional UG) give you an opportunity to bump into the PostgreSQL enthusiast next door. These are casual meetups organized by their members, who meet on a regular basis. The frequency of these quick meets ranges from once in a fortnight (which means two weeks, for those who don't read English literature) to once every quarter.

The main purpose of these user groups is to keep their members informed of the latest news around PostgreSQL and on upcoming global events. Members can be seen presenting technical content to a smaller group of individuals to cut their teeth before presenting at the global events. The topics of these meetups can get intriguing, especially when you have a bunch of IT engineers from varied technological backgrounds all discussing issues, limitations, and advantages of various database products and the ways to reduce costs, etc. These events also give you an opportunity to present a topic of your choice, which further widens your horizons within PostgreSQL. Most of the local group events are managed via the popular meetup platform, as can be seen from the Local User Groups page.

In addition to all the above, there are the official international websites of PostgreSQL, hosted and maintained in the local language of the region. The international websites tend to add more content on training and learning; catering to the needs of local audiences in a regional language. An excellent benefit of having such local and regional language sites is, you get to meet like minded individuals that can collaborate together to build systems and solutions using PostgreSQL.

The PostgreSQL Planet

Did you know that PostgreSQL has its own planet, where everything that exists relates only to PostgreSQL? It is like a master portal consolidating all the information from community distribution lists, the PostgreSQL developers network, PostgreSQL bloggers, news, latest releases, and GitHub repositories. On it you can come across small projects of interest that give you quick hands-on experience with a specific feature of PostgreSQL. There are some basic projects on this site which can get you started developing your PostgreSQL skills.

My own personal favourite is the consolidated record of a real world computing issue within PostgreSQL applications, discussed within the distribution list with plenty of inputs and replies from various PostgreSQL enthusiasts. These real world issues gain traction by way of someone trying to create a use case out of it, in order to discuss the possible solutions and come up with a quick fix. The quick fixes are published on the GitHub repositories with further enhancements by other community members. What starts as a problem for a PostgreSQL user ends up being a minor feature enhancement.

The PostgreSQL Planet is also a one-stop-shop for various maintenance scripts that are developed and tested by notable community bigwigs. One can build a repository of tool-sets out of these code snippets to manage and monitor PostgreSQL implementations. Most of the code comes with a default disclaimer that the developer is not liable and/or responsible for any damage, service failure, or performance degradation caused to the systems (but most of the code snippet is safe to run on production workloads for monitoring and learning purpose).

PostgreSQL Extensions

As you start following all the resources around PostgreSQL, after getting a firm grasp of its internals, you might want to develop something on your own and share it with the rest of the community members. A step forward would be to put various similar enhancements and functionalities together in the form of a PostgreSQL extension. PostgreSQL extensions are an extended feature set that can be included in a PostgreSQL database system as a ‘plug and play’ option. PostgreSQL extensions undergo an exhaustive process of review before being published on the official PostgreSQL extensions website. More on various PostgreSQL extensions and their uses would be discussed in another post in great detail.


I hope this blog gave you an idea of where to seek more information about PostgreSQL and how to enhance your PostgreSQL skills on a self-taught, self-learned basis using the various types of resources. Make sure to reach out to our team of experts for your PostgreSQL management needs.

by Nawaz Ahmed at September 19, 2019 09:45 AM

September 18, 2019


Achieving MySQL Failover & Failback on Google Cloud Platform (GCP)

There are numerous cloud providers these days. They can be small or large, local or with data centers spread across the whole world. Many of these cloud providers offer some kind of a managed relational database solution. The databases supported tend to be MySQL or PostgreSQL or some other flavor of relational database. 

When designing any kind of database infrastructure it is important to understand your business needs and decide what kind of availability you would need to achieve. 

In this blog post, we will look into high availability options for MySQL-based solutions from one of the largest cloud providers - Google Cloud Platform.

Deploying a Highly Available Environment Using GCP SQL Instance

For this blog what we want is a very simple environment: one database, with maybe one or two replicas. We want to be able to failover easily and restore operations as soon as possible if the master fails. We will use MySQL 5.7 as the version of choice and start with the instance deployment wizard:

We then have to create the root password, set the instance name, and determine where it should be located:

Next, we will look into the configuration options:

We can make changes in terms of the instance size (we will go with db-n1-standard-4), storage,  and maintenance schedule. What is most important for us in this setup are the high availability options:

Here we can choose to have a failover replica created. This replica will be promoted to a master should the original master fail.

After we deploy the setup, let’s add a replication slave:

Once the process of adding the replica is done, we are ready for some tests. We are going to run a test workload using Sysbench against our master, failover replica, and read replica to see how this works out. We will run three instances of Sysbench, one against the endpoint of each node type.
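A hedged example of one such Sysbench invocation is shown below; the endpoint, credentials, table count, and sizing are placeholders to adapt to your own setup:

```shell
# Prepare test tables, then run an OLTP read/write workload for 10 minutes
sysbench oltp_read_write --mysql-host=MASTER_ENDPOINT --mysql-user=sbtest \
  --mysql-password=PASSWORD --tables=4 --table-size=100000 prepare
sysbench oltp_read_write --mysql-host=MASTER_ENDPOINT --mysql-user=sbtest \
  --mysql-password=PASSWORD --tables=4 --table-size=100000 --time=600 run
```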

Then we will trigger the manual failover via the UI:

Testing MySQL Failover on Google Cloud Platform

I have got to this point without any detailed knowledge of how the SQL nodes in GCP work. I did have some expectations, however, based on previous MySQL experience and what I’ve seen in the other cloud providers. For starters, the failover to the failover node should be very quick. What we would like is to keep the replication slaves available, without the need for a rebuild. We would also like to see how fast we can execute the failover a second time (as it is not uncommon that the issue propagates from one database to another).

What we determined during our tests...

  1. While failing over, the master became available again in 75 - 80 seconds.
  2. Failover replica was not available for 5-6 minutes.
  3. Read replica was available during the failover process, but it became unavailable for 55 - 60 seconds after the failover replica became available

What we’re not sure about...

What is happening when the failover replica is not available? Based on the time, it looks like the failover replica is being rebuilt. This makes sense, but then the recovery time would be strongly related to the size of the instance (especially I/O performance) and the size of the data file.

What is happening with the read replica after the failover replica has been rebuilt? Originally, the read replica was connected to the master. When the master failed, we would expect the read replica to provide an outdated view of the dataset. Once the new master shows up, the read replica should reconnect via replication to that instance (which used to be the failover replica and has been promoted to master). There should be no need for a minute of downtime while CHANGE MASTER is being executed.

More importantly, during the failover process there is no way to execute another failover (which sort of makes sense):

It is also not possible to promote a read replica during this time (which does not necessarily make sense - we would expect to be able to promote read replicas at any time).

It is important to note that relying on read replicas to provide high availability (without creating a failover replica) is not a viable solution. You can promote a read replica to become a master, but a new cluster would be created, detached from the rest of the nodes.

There is no way to slave your other replicas off the new cluster. The only way to do this would be to create new replicas, which is a time-consuming process. This makes it virtually unusable, leaving the failover replica as the only real option for high availability for SQL nodes on Google Cloud Platform.


While it is possible to create a highly available environment for SQL nodes in GCP, the master will not be available for roughly a minute and a half. The whole process (including rebuilding the failover replica and some actions on the read replicas) took several minutes. During that time we weren't able to trigger an additional failover, nor were we able to promote a read replica.

Do we have any GCP users out there? How are you achieving high availability?


by krzysztof at September 18, 2019 09:45 AM

September 17, 2019


The Most Common PostgreSQL Failure Scenarios

There is not a perfect system, hardware, or topology to avoid all the possible issues that could happen in a production environment. Overcoming these challenges requires an effective DRP (Disaster Recovery Plan), configured according to your application, infrastructure, and business requirements. The key to success in these types of situations is always how fast we can fix or recover from the issue.

In this blog we’ll take a look at the most common PostgreSQL failure scenarios and show you how you can solve or cope with them. We’ll also look at how ClusterControl can help you get back online.

The Common PostgreSQL Topology

To understand common failure scenarios, you must first start with a common PostgreSQL topology. This can be any application connected to a PostgreSQL Primary Node which has a replica connected to it.

The Common PostgreSQL Topology - Severalnines

You can always improve or expand this topology by adding more nodes or load balancers, but this is the basic topology we’ll start working with.

Primary PostgreSQL Node Failure

Primary PostgreSQL Node Failure - Severalnines

This is one of the most critical failures, as we must fix it ASAP if we want to keep our systems online. For this type of failure it’s important to have some kind of automatic failover mechanism in place; after the failover, you can look into the reason for the failure. After the failover process, we must ensure that the failed primary node does not still think it’s the primary node. This is to avoid data inconsistency from applications writing to it.

The most common causes of this kind of issue are an operating system failure, hardware failure, or a disk failure. In any case, we should check the database and the operating system logs to find the reason.

The fastest solution for this issue is to perform a failover task to reduce downtime. To promote a replica we can use the pg_ctl promote command on the slave database node, and then we must send the traffic from the application to the new primary node. For this last task, we can implement a load balancer between our application and the database nodes, to avoid any change on the application side in case of failure. We can also configure the load balancer to detect the node failure and, instead of sending traffic to the failed node, send the traffic to the new primary node.
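For reference, the promotion itself is a one-liner on the standby; the data directory path below is just an example (PostgreSQL 11 on a Debian/Ubuntu layout), so substitute your own:

```shell
# Run as the postgres user on the standby to promote it to primary
sudo -u postgres pg_ctl promote -D /var/lib/postgresql/11/main
```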

After the failover process, once we are sure the system is working again, we can look into the cause of the issue. We recommend always keeping at least one slave node working so that, in case of a new primary failure, we can perform the failover task again.

PostgreSQL Replica Node Failure

PostgreSQL Replica Node Failure - Severalnines

This is not normally a critical issue (as long as you have more than one replica and are not using it to serve production read traffic). However, if you are experiencing issues on the primary node and don’t have your replica up-to-date, you’ll have a real critical issue. And if you’re using the replica for reporting or big data purposes, you will probably want to fix it quickly anyway.

The most common causes of this kind of issue are the same as those we saw for the primary node: an operating system failure, hardware failure, or disk failure. You should check the database and the operating system logs to find the reason.

It’s not recommended to keep the system working without any replica because, in case of failure, you don’t have a fast way to get back online. If you have only one slave, you should solve the issue ASAP; the fastest way is to create a new replica from scratch. For this you’ll need to take a consistent backup and restore it to the slave node, then configure replication between this slave node and the primary node.
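After restoring the base backup on the new slave node, replication back to the primary is configured in recovery.conf (PostgreSQL 11 and earlier); a minimal fragment, where the hostname and replication user are assumptions, might look like:

```ini
standby_mode = 'on'
primary_conninfo = 'host=primary.example.com port=5432 user=replicator'
```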

If you wish to know the failure reason, you should use another server to create the new replica and then investigate the old one to find the cause. When you finish this task, you can also reconfigure the old replica and keep both working as future failover options.

If you’re using the replica for reporting or for big data purposes, you must change the IP address to connect to the new one. As in the previous case, one way to avoid this change is by using a load balancer that will know the status of each server, allowing you to add/remove replicas as you wish.

PostgreSQL Replication Failure


In general, this kind of issue is generated due to a network or configuration issue. It’s related to a WAL (Write-Ahead Logging) loss in the primary node and the way PostgreSQL manages the replication.

If you have heavy write traffic, run checkpoints too frequently, or store WALs for only a few minutes, then a network issue leaves you little time to solve it: the WAL files will be deleted before you can send and apply them to the replica.

If the WAL files that the replica needs to continue working have been deleted, you need to rebuild it. To avoid this task, we should check our database configuration and increase the wal_keep_segments (the number of WAL segments to keep in the pg_xlog directory) or max_wal_senders (the maximum number of simultaneously running WAL sender processes) parameters.
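For example, a postgresql.conf fragment raising both parameters (the values are illustrative and should be tuned to your WAL volume):

```ini
wal_keep_segments = 256   # keep 256 x 16 MB segments (~4 GB) for lagging replicas
max_wal_senders = 10      # up to 10 concurrent WAL sender processes
```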

Another recommended option is to turn archive_mode on and send the WAL files to another path with the archive_command parameter. This way, if PostgreSQL reaches the limit and deletes a WAL file, we’ll still have it in the archive path.
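A common way to set this up, borrowing the example from the PostgreSQL documentation (the archive directory is an assumption), is:

```ini
archive_mode = on
archive_command = 'test ! -f /mnt/server/archivedir/%f && cp %p /mnt/server/archivedir/%f'
```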

PostgreSQL Data Corruption / Data Inconsistency / Accidental Deletion


This is a nightmare for any DBA and probably the most complex issue to be fixed, depending on how widespread the issue is.

When your data is affected by one of these issues, the most common way to fix it (and probably the only one) is by restoring a backup. That is why backups are the basis of any disaster recovery plan, and it is recommended that you have at least three backups stored in different physical places. Best practice dictates keeping one copy locally on the database server (for faster recovery), another one on a centralized backup server, and the last one in the cloud.

We can also create a mix of full/incremental/differential PITR compatible backups to reduce our Recovery Point Objective.

Managing PostgreSQL Failure with ClusterControl

Now that we have looked at these common PostgreSQL failure scenarios, let’s look at what would happen if you were managing your PostgreSQL databases from a centralized database management system, one designed to give you a fast and easy way to fix issues, ASAP, in the case of failure.


ClusterControl provides automation for most of the PostgreSQL tasks described above; all in a centralized and user-friendly way. With this system you will be able to easily configure things that, manually, would take time and effort. We will now review some of its main features related to PostgreSQL failure scenarios.

Deploy / Import a PostgreSQL Cluster

Once we enter the ClusterControl interface, the first thing to do is to deploy a new cluster or import an existing one. To perform a deployment, simply select the option Deploy Database Cluster and follow the instructions that appear.

Scaling Your PostgreSQL Cluster

If you go to Cluster Actions and select Add Replication Slave, you can either create a new replica from scratch or add an existing PostgreSQL database as a replica. In this way, you can have your new replica running in a few minutes, and you can add as many replicas as you want, spreading read traffic between them using a load balancer (which you can also implement with ClusterControl).

PostgreSQL Automatic Failover

ClusterControl manages failover on your replication setup. It detects master failures and promotes the slave with the most current data as the new master. It also automatically fails over the rest of the slaves to replicate from the new master. As for client connections, it leverages two tools for the task: HAProxy and Keepalived.

HAProxy is a load balancer that distributes traffic from one origin to one or more destinations and can define specific rules and/or protocols for the task. If any of the destinations stops responding, it is marked as offline and the traffic is sent to one of the remaining available destinations. This prevents traffic from being sent to an inaccessible destination and losing requests, by directing them to a valid destination instead.

Keepalived allows you to configure a virtual IP within an active/passive group of servers. This virtual IP is assigned to an active “Main” server. If this server fails, the IP is automatically migrated to the “Secondary” server that was found to be passive, allowing it to continue working with the same IP in a transparent way for our systems.

Adding a PostgreSQL Load Balancer

If you go to Cluster Actions and select Add Load Balancer (or, from the cluster view, go to Manage -> Load Balancer) you can add load balancers to your database topology.

The configuration needed to create your new load balancer is quite simple. You only need to add the IP/hostname, port, policy, and the nodes you are going to use. You can add two load balancers with Keepalived between them, which allows automatic failover of the load balancer itself in case of failure. Keepalived uses a virtual IP address and migrates it from one load balancer to another in case of failure, so your setup can continue to function normally.

PostgreSQL Backups

We have already discussed the importance of having backups. ClusterControl provides the functionality either to generate an immediate backup or schedule one.

You can choose between three different backup methods: pg_dump, pg_basebackup, or pgBackRest. You can also specify where to store the backups (on the database server, on the ClusterControl server, or in the cloud), the compression level, encryption, and the retention period.

PostgreSQL Monitoring & Alerting

Before being able to take action you need to know what is happening, so you’ll need to monitor your database cluster. ClusterControl allows you to monitor your servers in real-time. There are graphs with basic data such as CPU, network, disk, RAM, and IOPS, as well as database-specific metrics collected from the PostgreSQL instances. Database queries can also be viewed from the Query Monitor.

In the same way that you enable monitoring from ClusterControl, you can also setup alerts which inform you of events in your cluster. These alerts are configurable, and can be personalized as needed.


Everyone will eventually need to cope with PostgreSQL issues and failures. And since you can’t avoid them entirely, you need to be able to fix them ASAP and keep the system running. We also saw how ClusterControl can help with these issues, all from a single and user-friendly platform.

These are what we think are some of the most common failure scenarios for PostgreSQL. We would love to hear about your own experiences and how you fixed them.


by Sebastian Insausti at September 17, 2019 09:45 AM

September 16, 2019


Database Switchover and Failover for Drupal Websites Using MySQL or PostgreSQL

Drupal is a Content Management System (CMS) designed to create everything from tiny to large corporate websites. Over 1,000,000 websites run on Drupal and it is used to make many of the websites and applications you use every day (including this one). Drupal has a great set of standard features such as easy content authoring, reliable performance, and excellent security. What sets Drupal apart is its flexibility as modularity is one of its core principles. 

Drupal is also a great choice for creating integrated digital frameworks. You can extend it with the thousands of add-ons available. Modules expand Drupal's functionality, themes let you customize your content's presentation, and distributions (Drupal bundles) can be used as starter kits. You can mix and match all of these to enhance Drupal's core abilities or to integrate Drupal with external services. It is content management software that is both powerful and scalable.

Drupal uses databases to store its web content. When your Drupal-based website or application is experiencing a large amount of traffic it can have an impact on your database server. When you are in this situation you'll require load balancing, high availability, and a redundant architecture to keep your database online. 

When I started researching this blog, I realized there are many answers to this issue online, but the recommended solutions were very dated. This could be a result of the increase in market share by WordPress and the resulting smaller open source community. What I did find were some examples of implementing high availability using Master/Master (high availability) or Master/Master/Slave (high availability/high performance) topologies.

Drupal offers support for a wide array of databases, but it was initially designed using MySQL variants. Though using MySQL is fully supported, there are better approaches you can implement. If not done properly, however, these approaches can cause your website to experience large amounts of downtime, cause your application to suffer performance issues, and may result in writes going to your slaves. Performing maintenance would also be difficult, as you need failover in order to apply server upgrades or patches (hardware or software) without downtime. This is especially true if you have a large amount of data, where any mistake can have a major impact on your business.

These are situations you don't want to happen which is why in this blog we’ll discuss how you can implement database failover for your MySQL or PostgreSQL databases.

Why Does Your Drupal Website Need Database Failover?

From Wikipedia “failover is switching to a redundant or standby computer server, system, hardware component or network upon the failure or abnormal termination of the previously active application, server, system, hardware component, or network. Failover and switchover are essentially the same operation, except that failover is automatic and usually operates without warning, while switchover requires human intervention.” 

In database operations, switchover is also a term used for manual failover, meaning that it requires a person to operate the failover. Failover comes in handy for any admin as it isolates unwanted problems such as accidental deletes/dropping of tables, long hours of downtime causing business impact, database corruption, or system-level corruption. 

Database failover involves more than a single database node, either physical or virtual. Since failover requires switching to a different node, ideally you switch to a different database server rather than just another database instance running on the same host. Either way, the goal is redundancy and high availability in case a catastrophe occurs on the current host.

MySQL Failover for Drupal

Performing a failover for your Drupal-based application requires that the data handled by the database does not diverge between nodes. There are several solutions available, and we have already discussed some of them in previous Severalnines blogs. You may want to read our Introduction to Failover for MySQL Replication - the 101 Blog.

The Master-Slave Switchover

The most common approach for MySQL failover is the master-slave switchover, i.e. manual failover. There are two ways to do this:

  • You can implement your database with a typical asynchronous master-slave replication.
  • You can implement asynchronous master-slave replication using GTID-based replication.
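For the GTID-based variant, the servers need GTID enabled in my.cnf; a minimal fragment (MySQL 5.6+) could be:

```ini
[mysqld]
gtid_mode = ON
enforce_gtid_consistency = ON
log_bin = mysql-bin
log_slave_updates = ON
```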

Switching to another master could be quicker and easier. This can be done with the following MySQL syntax:

mysql> SET GLOBAL read_only = 1; /* enable read-only */

mysql> CHANGE MASTER TO MASTER_HOST = '<hostname-or-ip>', MASTER_USER = '<user>', MASTER_PASSWORD = '<password>', MASTER_LOG_FILE = '<master-log-file>', MASTER_LOG_POS=<master_log_position>; /* master information to connect */

mysql> START SLAVE; /* start replication */

mysql> SHOW SLAVE STATUS\G /* check replication status */

or with GTID, you can simply do,


mysql> CHANGE MASTER TO MASTER_HOST = '<hostname-or-ip>', MASTER_USER = '<user>', MASTER_PASSWORD = '<password>', MASTER_AUTO_POSITION = 1; /* master information to connect */



Using the non-GTID approach requires you to first determine the master's binary log file and log position. You can determine these by checking the master's status on the master node before switching over.
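Reading those coordinates on the master looks like this:

```sql
-- On the current master, before the switchover:
SHOW MASTER STATUS;
-- The File and Position columns supply MASTER_LOG_FILE and MASTER_LOG_POS
-- for the CHANGE MASTER TO statement.
```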


You may also consider hardening your servers by adding sync_binlog = 1 and innodb_flush_log_at_trx_commit = 1 as, in the event your master crashes, you'll have a higher chance that the transactions from the master are in sync with your slave(s). In that case, the promoted master has a higher chance of being a consistent data source.
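In my.cnf this hardening amounts to:

```ini
[mysqld]
sync_binlog = 1                      # fsync the binary log on every commit
innodb_flush_log_at_trx_commit = 1   # flush the InnoDB redo log on every commit
```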

This, however, may not be the best approach for your Drupal database as it can impose long downtime if not performed correctly, for example if the master is taken down abruptly. If your master database node crashes due to a bug, you’ll need to point your application to another database waiting on standby as your new master, or have a slave promoted to master. You will need to specify exactly which node should take over and then determine the lag and consistency of that node. Achieving this is not as easy as just doing SET GLOBAL read_only=1; CHANGE MASTER TO… (etc). Certain situations require deeper analysis, looking at which transactions must be present on the standby server or promoted master, to get it done.

Drupal Failover Using MHA

One of the most common tools for automatic and manual failover is MHA. It has been around for a long while now and is still used by many organizations. You can checkout these previous blogs we have on the subject, Top Common Issues with MHA and How to Fix Them or MySQL High Availability Tools - Comparing MHA, MRM and ClusterControl.

Drupal Failover Using Orchestrator

Orchestrator has been widely adopted and is being used by large organizations such as GitHub. It not only allows you to manage a failover, but also topology management, host discovery, refactoring, and recovery. There's a nice external two-part blog series which I found very useful for learning about its failover mechanism: part one and part two.

Drupal Failover Using MaxScale

MaxScale is not just a load balancer designed for MariaDB server, it also extends high availability, scalability, and security for MariaDB while, at the same time, simplifying application development by decoupling it from underlying database infrastructure. If you are using MariaDB, then MaxScale could be a relevant technology for you. Check out our previous blogs on how you can use the MaxScale failover mechanism.

Drupal Failover Using ClusterControl

Severalnines' ClusterControl offers a wide array of database management and monitoring solutions. Part of what we offer is automatic failover, manual failover, and cluster/node recovery. This is very helpful as it acts as your virtual database administrator, notifying you in real-time if your cluster is in “panic mode,” all while the recovery is being managed by the system. You can check out the blog How to Automate Database Failover with ClusterControl to learn more.

Other MySQL Solutions

Some of the older approaches are still applicable when you want to failover. There's MMM or MRM, or you can check out Group Replication or Galera (note: Galera uses virtually synchronous, not asynchronous, replication). Failover in a Galera Cluster does not work the same way as it does with asynchronous replication. With Galera you can write to any node or, if you implement a master-slave approach, you can direct your application to another node that will be the active writer for the cluster.

Drupal PostgreSQL Failover

Since Drupal supports PostgreSQL, we will also check out the tools for implementing a failover or switchover process for PostgreSQL. PostgreSQL uses built-in streaming replication, but you can also set it up to use logical replication (added as a core element of PostgreSQL in version 10).

Drupal Failover Using pg_ctlcluster

If your environment is Ubuntu, using pg_ctlcluster is a simple and easy way to achieve failover. For example, you can just run the following command:

$ pg_ctlcluster 9.6 pg_7653 promote

or, with RHEL/CentOS, you can use the pg_ctl command like this:

$ sudo -iu postgres /usr/lib/postgresql/9.6/bin/pg_ctl promote -D  /data/pgsql/slave/data

server promoting

You can also trigger failover of a log-shipping standby server by creating a trigger file with the filename and path specified by the trigger_file parameter in recovery.conf.
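A minimal recovery.conf fragment for this (the trigger file path is an assumption):

```ini
trigger_file = '/tmp/postgresql.trigger.5432'
```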

You have to be careful with standby (slave) promotion here, as you have to ensure that only one master is accepting read-write requests. This means that, while doing the switchover, you have to ensure the previous master has been stopped or fenced off.

Taking care of switchover or manual failover from primary to standby server can be fast, but it requires some time to re-prepare the failover cluster. Regularly switching from primary to standby is a useful practice as it allows for regular downtime on each system for maintenance. This also serves as a test of the failover mechanism, to ensure that it will really work when you need it. Written administration procedures are always advised. 

Drupal PostgreSQL Automatic Failover

Instead of a manual approach, you might require automatic failover. This is especially needed when a server goes down due to hardware failure or virtual machine corruption. You may also require an application to automatically perform the failover to lessen the downtime of your Drupal application. We'll now go over some of these tools which can be utilized for automatic failover.

Drupal Failover Using Patroni

Patroni is a template for creating your own customized, high-availability solution using Python and, for maximum accessibility, a distributed configuration store like ZooKeeper, etcd, Consul, or Kubernetes. Database engineers, DBAs, DevOps engineers, and SREs who are looking to quickly deploy HA PostgreSQL in the datacenter (or anywhere else) will hopefully find it useful.

Drupal Failover Using Pgpool

Pgpool-II is proxy software that sits between the PostgreSQL servers and a PostgreSQL database client. Aside from automatic failover, its features include connection pooling, load balancing, replication, and limiting excess connections. You can read more about this tool in our three-part blog series: part one, part two, part three.

Drupal Failover Using pglookout

pglookout is a PostgreSQL replication monitoring and failover daemon. pglookout monitors the database nodes, their replication status, and acts according to that status. For example, calling a predefined failover command to promote a new master in the case the previous one goes missing.

pglookout supports two different node types: ones that are installed on the DB nodes themselves, and observer nodes that can be installed anywhere. The purpose of having pglookout on the PostgreSQL DB nodes is to monitor the replication status of the cluster and act accordingly; the observers have a more limited remit: they just observe the cluster status to give another viewpoint on the cluster state.

Drupal Failover Using repmgr

repmgr is an open-source tool suite for managing replication and failover in a cluster of PostgreSQL servers. It enhances PostgreSQL's built-in hot-standby capabilities with tools to set up standby servers, monitor replication, and perform administrative tasks such as failover or manual switchover operations.

repmgr has provided advanced support for PostgreSQL's built-in replication mechanisms since they were introduced in 9.0. The current repmgr series, repmgr 4, supports the latest developments in replication functionality introduced from PostgreSQL 9.3 such as cascading replication, timeline switching and base backups via the replication protocol.

Drupal Failover Using ClusterControl

ClusterControl supports automatic failover for PostgreSQL. If you have an incident, your slave can be promoted to master status automatically. With ClusterControl you can also deploy standalone, replicated, or clustered PostgreSQL database. You can also easily add or remove a node with a single action.

Other PostgreSQL Drupal Failover Solutions

There are certainly automatic failover solutions that I may have missed in this blog. If I did, please add your comments below so we can learn about your thoughts and experiences with your failover implementation and setup, especially for Drupal websites or applications.

Additional Solutions For Drupal Failover

While the tools mentioned earlier certainly handle failover, adding tools that make failover easier and safer, with total isolation from your database layer, can be worthwhile.

Drupal Failover Using ProxySQL

With ProxySQL, you can just point your Drupal websites or applications to the ProxySQL server host and it will designate which node receives the writes and which nodes receive the reads. The magic happens transparently within the TCP layer and no changes are needed in your application/website configuration. In addition, ProxySQL also acts as a load balancer for your write and read database traffic. This is only applicable if you are using MySQL database variants.

Drupal Failover Using HAProxy with Keepalived

Using HAProxy with Keepalived adds more high availability and redundancy to your Drupal database. If you need to failover, it can happen without your application knowing what is going on in the database layer. Just point your application to the VRRP IP that you set up in Keepalived, and everything is handled in total isolation from your application. Automatic failover is handled transparently, so no application changes are needed when, for example, a disaster occurs and a recovery or failover is applied. The good thing about this setup is that it is applicable for both MySQL and PostgreSQL databases. I suggest you check out our blog PostgreSQL Load Balancing Using HAProxy & Keepalived to learn more.
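As an illustration, a bare-bones Keepalived VRRP instance on the primary load balancer might look like this (the interface, router id, and virtual IP are assumptions; the backup balancer would use state BACKUP and a lower priority):

```conf
vrrp_instance VI_1 {
    state MASTER
    interface eth0
    virtual_router_id 51
    priority 101
    virtual_ipaddress {
        10.0.0.100
    }
}
```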

All of the options above are supported by ClusterControl. You can deploy or import the database and then deploy ProxySQL, MaxScale, or HAProxy & Keepalived. Everything will be managed, monitored, and set up automatically, without any further configuration needed on your end. It all happens in the background and automatically creates a production-ready setup.


Having an always-on Drupal website or application, especially if you are expecting a large amount of traffic, can be complicated to create. If you have the right tools, the right setup, and the right technology stack, however, it is possible to achieve high availability and redundancy.

And if you don’t? Well then ClusterControl will set it up and maintain it for you. Alternatively, you can create a setup using the technologies mentioned in this blog, most of which are open source, free tools that would cater to your needs.

by Paul Namuag at September 16, 2019 06:53 PM

September 13, 2019


Percona Toolkit 3.1.0 Is Now Available


Percona announces the release of Percona Toolkit 3.1.0 on September 13, 2019.

Percona Toolkit is a collection of advanced open-source command-line tools, developed and used by the Percona technical staff, that are engineered to perform a variety of MySQL®, MongoDB®, PostgreSQL® and system tasks that are too difficult or complex to perform manually. With over 1,000,000 downloads, Percona Toolkit supports Percona Server for MySQL, MySQL, MariaDB, PostgreSQL, Percona Server for MongoDB, and MongoDB.

Percona Toolkit, like all Percona software, is free and open source. You can download packages from the website or install from official repositories.

This release includes the following changes:

New features and improvements:

  • PT-1696: the new pt-pg-summary tool supports PostgreSQL data collection in a way similar to other PT summary tools. The following is a fragment of the report that the tool produces:
    • ##### --- Database Port and Data_Directory --- ####
      |         Name         |                      Setting                       |
      | data_directory       | /var/lib/postgresql/9.5/main                       |
      ##### --- List of Tablespaces ---- ######
      |         Name         |         Owner        |               Location                             |
      | pg_default           | postgres             |                                                    |
      | pg_global            | postgres             |                                                    |
      ##### --- Cluster Information --- ####
       Usename        : postgres                                                           
       Time           : 2019-09-13 08:30:42.272582 -0400 EDT                                     
       Client Address : ::1                                             
       Client Hostname:                         
       Version        : PostgreSQL 9.5.18 on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 5.4.0-6ubuntu1                                      
       Started        : 2019-09-13 08:29:43.175138 -0400 EDT                                  
       Is Slave       : false                                              
      ##### --- Databases --- ####
      |       Dat Name       |    Size    |
      | template1            |    6841 kB |
  • PT-1663: pt-stalk has two new options limiting the amount of disk space it can consume: the --retention-size option makes pt-stalk store less than the specified number of megabytes, while the --retention-count option limits the number of runs for which data are kept. The following simple example illustrates how these two parameters can be passed to the tool (here pt-stalk just collects the information and exits):
    pt-stalk --no-stalk --retention-count=3 --retention-size=100M -- --defaults-file=./my.default.cnf
  • PT-1741: Migration to a new MongoDB driver was done.
  • PT-1761: pt-online-schema-change will not run under MySQL 8.0.14 .. 8.0.17 if the table has foreign keys
    Important note: There is an error in MySQL from versions 8.0.14 up to the current 8.0.17 that makes MySQL die under certain conditions when trying to rename a table. Since the last step for pt-online-schema-change is to rename the tables to swap the old and new ones, we have added a check that prevents running pt-online-schema-change if the conditions for this error are met.

Bug fixes:

  • PT-1114: pt-table-checksum failed when the table was empty
  • PT-1344: pt-online-schema-change failed to detect hostnames with a specified port number
  • PT-1575: pt-mysql-summary did not print the PXC section for PXC 5.6 and 5.7
  • PT-1630: pt-table-checksum had a regression which prevented it from working with Galera cluster
  • PT-1633: pt-config-diff incorrectly parsed variables with numbers having K, M, G or T suffix (Thanks to Dieter Adriaenssens)
  • PT-1709: pt-upgrade generated “Use of uninitialized value in concatenation (.) or string” error in case of invalid MySQL packets
  • PT-1720: pt-pmp exited with an error in case of any unknown option in a common PT configuration file
  • PT-1728: pt-table-checksum failed to scan small tables that get wiped out often
  • PT-1734: pt-stalk did non-strict matching for ‘log_error’, resulting in wider filtering
  • PT-1746: pt-diskstats didn’t work for newer Linux kernels starting from 4.18

Help us improve our software quality by reporting any bugs you encounter using our bug tracking system.

by Dmitriy Kostiuk at September 13, 2019 08:39 PM


Automatic Scaling with Amazon Aurora Serverless

Amazon Aurora Serverless provides an on-demand, auto-scalable, highly-available, relational database which only charges you when it’s in use. It provides a relatively simple, cost-effective option for infrequent, intermittent, or unpredictable workloads. What makes this possible is that it automatically starts up, scales compute capacity to match your application's usage, and then shuts down when it's no longer needed.

The following diagram shows Aurora Serverless high-level architecture.

Aurora Serverless high-level architecture

With Aurora Serverless, you get one endpoint (as opposed to two endpoints for the standard Aurora provisioned DB). This is basically a DNS record backed by a fleet of proxies which sits on top of the database instance. From the MySQL server's point of view, this means connections always come from the proxy fleet.

Aurora Serverless Auto-Scaling

Aurora Serverless is currently only available for MySQL 5.6. You basically have to set the minimum and maximum capacity units for the DB cluster. Each capacity unit is equivalent to a specific compute and memory configuration. Aurora Serverless reduces the resources for the DB cluster when its workload is below these thresholds, scaling capacity down to the minimum or up to the maximum capacity unit.

The cluster will automatically scale up if either of the following conditions are met:

  • CPU utilization is above 70% OR
  • More than 90% of connections are being used

The cluster will automatically scale down if both of the following conditions are met:

  • CPU utilization drops below 30% AND
  • Less than 40% of connections are being used.

Some of the notable things to know about Aurora automatic scaling flow:

  • It only scales up when it detects performance issues that can be resolved by scaling up.
  • After scaling up, the cooldown period for scaling down is 15 minutes. 
  • After scaling down, the cooldown period for the next scaling down again is 310 seconds.
  • It scales to zero capacity when there are no connections for a 5-minute period.

By default, Aurora Serverless performs the automatic scaling seamlessly, without cutting off any active database connections to the server. It is capable of determining a scaling point (a point in time at which the database can safely initiate the scaling operation). Under the following conditions, however, Aurora Serverless might not be able to find a scaling point:

  • Long-running queries or transactions are in progress.
  • Temporary tables or table locks are in use.

If either of the above cases happens, Aurora Serverless continues to try to find a scaling point so that it can initiate the scaling operation (unless "Force Scaling" is enabled). It does this for as long as it determines that the DB cluster should be scaled.

Observing Aurora Auto Scaling Behaviour

Note that in Aurora Serverless, only a small number of parameters can be modified and max_connections is not one of them. For all other configuration parameters, Aurora MySQL Serverless clusters use the default values. For max_connections, it is dynamically controlled by Aurora Serverless using the following formula: 

max_connections = GREATEST({log2(DBInstanceClassMemory/805306368)*45},{log2(DBInstanceClassMemory/8187281408)*1000})

Where log2 is log base-2 and "DBInstanceClassMemory" is the number of bytes of memory allocated to the DB instance class associated with the current DB instance, less the memory used by the Amazon RDS processes that manage the instance. It's pretty hard to predetermine the value that Aurora will use, so it's good to run some tests to understand how this value is scaled accordingly.
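To get a feel for the formula, it can be evaluated in Python. Here we plug in the nominal memory of the largest instance in our test (256 capacity units, 488 GB); keep in mind the DBInstanceClassMemory value Aurora actually uses is lower than the nominal instance memory:

```python
from math import log2

def estimated_max_connections(db_instance_class_memory):
    """Evaluate the documented max_connections formula (log is log base-2)."""
    m = db_instance_class_memory
    return max(log2(m / 805306368) * 45,
               log2(m / 8187281408) * 1000)

# Plugging in the nominal 488 GB of the largest (256 ACU) instance:
print(estimated_max_connections(488 * 2**30))  # → 6000.0
```

The GREATEST/max of the two terms means the second (steeper) term dominates for large instances, while the first term matters at the small end.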

Here is our Aurora Serverless deployment summary for this test:

Aurora Serverless deployment summary

For this example I’ve selected a minimum of 1 Aurora capacity unit, which is equal to 2GB of RAM, up to the maximum of 256 capacity units with 488GB of RAM.

Tests were performed using sysbench, by simply sending out multiple threads until it reached the limit of MySQL database connections. Our first attempt to send out 128 simultaneous database connections at once failed outright:

$ sysbench \
    /usr/share/sysbench/oltp_read_write.lua \
    --report-interval=2 \
    --threads=128 \
    --delete_inserts=5 \
    --time=360 \
    --max-requests=0 \
    --db-driver=mysql \
    --db-ps-mode=disable \
    --mysql-host=${_HOST} \
    --mysql-user=sbtest \
    --mysql-db=sbtest \
    --mysql-password=password \
    --tables=20 \
    --table-size=100000 \
    run


The above command immediately returned the 'Too many connections' error:

FATAL: unable to connect to MySQL server on host '', port 3306, aborting...
FATAL: error 1040: Too many connections

When looking at the max_connection settings, we got the following:

mysql> select @@hostname, @@max_connections;
+----------------+-------------------+
| @@hostname     | @@max_connections |
+----------------+-------------------+
| ip-10-2-56-105 |                90 |
+----------------+-------------------+


It turns out, the starting value of max_connections for our Aurora instance with one DB capacity (2GB RAM) is 90. This is actually way lower than our anticipated value if calculated using the provided formula to estimate the max_connections value:

mysql> select GREATEST({log2(2147483648/805306368)*45},{log2(2147483648/8187281408)*1000});
+------------------------------------------------------------------------------+
| GREATEST({log2(2147483648/805306368)*45},{log2(2147483648/8187281408)*1000}) |
+------------------------------------------------------------------------------+
|                                                                     262.2951 |
+------------------------------------------------------------------------------+


This simply means the DBInstanceClassMemory is not equal to the actual memory for Aurora instance. It must be way lower. According to this discussion thread, the variable's value is adjusted to account for memory already in use for OS services and RDS management daemon.

Nevertheless, changing the default max_connections value to something higher won't help us either, since this value is dynamically controlled by the Aurora Serverless cluster. Thus, we had to reduce the sysbench starting threads value to 84, because Aurora internal threads already reserve around 4 to 5 connections via 'rdsadmin'@'localhost'. Plus, we also need an extra connection for our management and monitoring purposes.
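The arithmetic behind the 84-thread figure is simple; a quick sketch (the 'rdsadmin' reservation is an observed figure from our test and may vary slightly):

```python
# A 1-ACU instance starts with max_connections = 90. Aurora's internal
# 'rdsadmin' threads hold roughly 5 of those, and we keep 1 spare for
# our own management and monitoring connection.
max_connections = 90
rdsadmin_reserved = 5   # observed, may vary between 4 and 5
monitoring_spare = 1

usable_threads = max_connections - rdsadmin_reserved - monitoring_spare
print(usable_threads)  # → 84
```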

So we executed the following command instead (with --threads=84):

$ sysbench \
    /usr/share/sysbench/oltp_read_write.lua \
    --report-interval=2 \
    --threads=84 \
    --delete_inserts=5 \
    --time=600 \
    --max-requests=0 \
    --db-driver=mysql \
    --db-ps-mode=disable \
    --mysql-host=${_HOST} \
    --mysql-user=sbtest \
    --mysql-db=sbtest \
    --mysql-password=password \
    --tables=20 \
    --table-size=100000 \
    run


After the above test completed in 10 minutes (--time=600), we ran the same command again, and this time some of the notable variables and statuses had changed, as shown below:

mysql> select @@hostname as hostname, @@max_connections as max_connections, (SELECT VARIABLE_VALUE from global_status where VARIABLE_NAME = 'THREADS_CONNECTED') as threads_connected, (SELECT VARIABLE_VALUE from global_status where VARIABLE_NAME = 'UPTIME') as uptime;
+--------------+-----------------+-------------------+--------+
| hostname     | max_connections | threads_connected | uptime |
+--------------+-----------------+-------------------+--------+
| ip-10-2-34-7 |             180 |               179 |    157 |
+--------------+-----------------+-------------------+--------+


Notice that max_connections has now doubled to 180, with a different hostname and a small uptime, as if the server had just started. From the application point of view, it looks like another, bigger database instance has taken over the endpoint, configured with a different max_connections value. Looking at the Aurora events, the following happened:

Wed, 04 Sep 2019 08:50:56 GMT The DB cluster has scaled from 1 capacity unit to 2 capacity units.

Then, we fired up the same sysbench command, creating another 84 connections to the database endpoint. After the generated stress test completed, the server automatically scaled up to 4 DB capacity units, as shown below:

mysql> select @@hostname as hostname, @@max_connections as max_connections, (SELECT VARIABLE_VALUE from global_status where VARIABLE_NAME = 'THREADS_CONNECTED') as threads_connected, (SELECT VARIABLE_VALUE from global_status where VARIABLE_NAME = 'UPTIME') as uptime;
+---------------+-----------------+-------------------+--------+
| hostname      | max_connections | threads_connected | uptime |
+---------------+-----------------+-------------------+--------+
| ip-10-2-12-75 |             270 |                 6 |    300 |
+---------------+-----------------+-------------------+--------+


You can tell by looking at the different hostname, max_connections, and uptime values compared to the previous ones. Another, bigger instance has "taken over" the role from the previous instance, where DB capacity was equal to 2. The actual scaling point is when the server load is dropping and almost hitting the floor. In our test, if we kept the connections full and the database load consistently high, automatic scaling wouldn't take place.

By looking at both screenshots below, we can tell the scaling only happens when our Sysbench has completed its stress test for 600 seconds because that is the safest point to perform automatic scaling.

Serverless DB Capacity
CPU Utilization

When looking at Aurora events, the following events happened:

Wed, 04 Sep 2019 16:25:00 GMT Scaling DB cluster from 4 capacity units to 2 capacity units for this reason: Autoscaling.

Wed, 04 Sep 2019 16:25:05 GMT The DB cluster has scaled from 4 capacity units to 2 capacity units.

Finally, we generated many more connections, up to almost 270, and waited until the run finished to get to 8 DB capacity units:

mysql> select @@hostname as hostname, @@max_connections as max_connections, (SELECT VARIABLE_VALUE from global_status where VARIABLE_NAME = 'THREADS_CONNECTED') as threads_connected, (SELECT VARIABLE_VALUE from global_status where VARIABLE_NAME = 'UPTIME') as uptime;
+---------------+-----------------+-------------------+--------+
| hostname      | max_connections | threads_connected | uptime |
+---------------+-----------------+-------------------+--------+
| ip-10-2-72-12 |            1000 |               144 |    230 |
+---------------+-----------------+-------------------+--------+


In the 8 capacity unit instance, the MySQL max_connections value is now 1000. We repeated similar steps, maxing out the database connections, up to the limit of 256 capacity units. The following table summarizes the DB capacity unit versus the max_connections value in our testing, up to the maximum DB capacity:

Amazon Aurora DB Capacity
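As a rough reference, the max_connections values we directly observed per capacity unit can be captured in a small lookup (capacity units we did not land on during the test are omitted; capacity_for_connections is a hypothetical helper, not an AWS API):

```python
# max_connections values observed at each capacity unit during this test.
observed_max_connections = {
    1: 90,    # 2 GB RAM
    2: 180,
    4: 270,
    8: 1000,
}

def capacity_for_connections(needed):
    """Smallest observed capacity unit whose max_connections covers `needed`."""
    for acu in sorted(observed_max_connections):
        if observed_max_connections[acu] >= needed:
            return acu
    return None  # beyond what we measured

print(capacity_for_connections(200))  # → 4
```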

Forced Scaling

As mentioned above, Aurora Serverless will only perform automatic scaling when it's safe to do so. However, the user has the option to force the DB capacity scaling to happen immediately by ticking the Force scaling checkbox under the 'Additional scaling configuration' option:

Amazon Aurora Capacity Settings

When forced scaling is enabled, the scaling happens as soon as the timeout is reached, which is 300 seconds. This behaviour may cause database interruptions in your application, as active connections to the database may get dropped. We observed the following error when forced automatic scaling kicked in after the timeout was reached:

FATAL: mysql_drv_query() returned error 1105 (The last transaction was aborted due to an unknown error. Please retry.) for query 'SELECT c FROM sbtest19 WHERE id=52824'

FATAL: `thread_run' function failed: /usr/share/sysbench/oltp_common.lua:419: SQL error, errno = 1105, state = 'HY000': The last transaction was aborted due to an unknown error. Please retry.

The above simply means that, instead of finding the right time to scale up, Aurora Serverless forces the instance replacement to take place immediately after the timeout is reached, which causes transactions to be aborted and rolled back. Retrying the aborted query a second time will likely solve the problem. This configuration can be used if your application is resilient to connection drops.
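Since error 1105 here signals a scale-induced abort that usually succeeds on retry, a minimal retry wrapper can absorb it. This is only a sketch: TransientDbError and run_query stand in for your driver's exception type and query call.

```python
ABORTED_CODE = 1105  # "The last transaction was aborted ... Please retry."

class TransientDbError(Exception):
    """Stand-in for a driver error carrying a MySQL error code."""
    def __init__(self, code):
        super().__init__(f"error {code}")
        self.code = code

def with_retry(run_query, attempts=3):
    """Re-run a query when the scale-induced abort (error 1105) is seen."""
    for attempt in range(attempts):
        try:
            return run_query()
        except TransientDbError as e:
            if e.code != ABORTED_CODE or attempt == attempts - 1:
                raise  # not a scaling abort, or out of retries

# Simulate a query that fails once during a forced scale, then succeeds.
calls = {"n": 0}
def flaky_query():
    calls["n"] += 1
    if calls["n"] == 1:
        raise TransientDbError(ABORTED_CODE)
    return "row"

print(with_retry(flaky_query))  # → row
```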


Amazon Aurora Serverless auto scaling is a vertical scaling solution, where a more powerful instance takes over from an inferior instance, efficiently utilizing the underlying Aurora shared storage technology. By default, the auto scaling operation is performed seamlessly, whereby Aurora finds a safe scaling point to perform the instance switching. One has the option to force automatic scaling, with the risk that active database connections get dropped.

by ashraf at September 13, 2019 09:45 AM

September 12, 2019


Integrations & Services Available from MongoDB for the Cloud

MongoDB is a document data store that has been around for over a decade. In the last few years, MongoDB has evolved into a mature product featuring enterprise-grade options like scalability, security, and resilience. With the demanding cloud movement, however, that wasn't good enough.

Cloud resources, such as virtual machines, containers, serverless compute resources, and databases, are currently in high demand. These days many software solutions can be spun up in a fraction of the time it used to take to deploy onto one’s own hardware. It started a trend and changed the market's expectations at the same time.

But the quality of an online service is not limited to deployment alone. Often users need additional services, integrations, or extra features that help them to do their work. Cloud offerings can still be very limited and may cause more issues than what you can gain from the automation and remote infrastructure.

So what is MongoDB Inc.’s approach to this common problem?

The answer was MongoDB Atlas, which brings internal extensions as part of a larger cloud/automation platform. With the addition of third-party components, MongoDB has flourished. In today's blog, we are going to see what they have developed and how it can help you address your data processing needs.

The items we will explore today are...

  • MongoDB Charts
  • MongoDB Stitch
  • MongoDB Kubernetes Integrations with Ops Manager
  • MongoDB Cloud migration
  • Fulltext Search
  • MongoDB Data Lake (beta)

MongoDB Charts

MongoDB Charts is one of the services accessible through the MongoDB Atlas platform. It simply provides an easy way to visualize your data living inside MongoDB. You don’t need to move your data to a different repository or write your own code as MongoDB Charts was designed to work with data documents and make it easy to visualize your data.

MongoDB Charts

MongoDB Charts makes communicating your data a straightforward process by providing built-in tools to easily share and collaborate on visualizations. Data visualization is a key component to providing a clear understanding of your data, highlighting correlations between variables and making it easy to discern patterns and trends within your dataset. 

Here are some key features which you can use in the Charts.

Aggregation
Aggregation framework is an operational process that manipulates documents in different stages, processes them in accordance with the provided criteria, and then returns the computed results. Values from multiple documents are grouped together, on which more operations can be performed to return matching results.

MongoDB Charts Aggregation

MongoDB Charts provides built-in aggregation functionality. Aggregation allows you to process your collection data by a variety of metrics and perform calculations such as mean and standard deviation.
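As an illustration of what such an aggregation looks like at the database level, here is a MongoDB pipeline computing the mean and population standard deviation of a hypothetical price field (field and collection names are illustrative):

```python
# A MongoDB aggregation pipeline of the kind Charts builds for you:
# one $group stage over the whole collection, using the standard
# $avg and $stdDevPop accumulators.
pipeline = [
    {"$group": {
        "_id": None,                           # group the whole collection
        "mean_price": {"$avg": "$price"},
        "std_price": {"$stdDevPop": "$price"},
    }}
]

# With pymongo this would run as: db.products.aggregate(pipeline)
print(pipeline[0]["$group"]["mean_price"])  # → {'$avg': '$price'}
```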

Charts provide seamless integration with MongoDB Atlas. You can link MongoDB Charts to Atlas projects and quickly get started visualizing your Atlas cluster data.

Document Data Handling

MongoDB Charts natively understands the benefits of the Document Data Model. It manages document-based data, including fixed objects and arrays. Using a nested data structure provides the flexibility to structure your data as it fits for your application while still maintaining visualization capabilities.

MongoDB Charts provides built-in aggregation functionality which allows you to process your collection data using a variety of metrics. It’s intuitive enough for non-developers to use, allowing for self-service data analysis which makes it a great tool for data analytics teams.

MongoDB Stitch

Have you heard about serverless architecture? 

With Serverless, you compose your application into individual, autonomous functions. Each function is hosted by the serverless provider and can be scaled automatically as function call frequency increases or decreases. This turns out to be a very cost-effective way of paying for computing resources. You only pay for the times that your functions get called, rather than paying to have your application always on and waiting for requests on so many different instances.

MongoDB Stitch

MongoDB Stitch is a different kind of MongoDB service, taking only what’s most useful from cloud infrastructure environments. It is a serverless platform that enables developers to build applications without having to set up server infrastructure. Stitch is made on top of MongoDB Atlas, automatically integrating the connection to your database. You can connect to Stitch through the Stitch Client SDKs, which are available for many of the platforms that you develop on.

MongoDB Kubernetes Integrations with Ops Manager

Ops Manager is a management platform for MongoDB Clusters that you run on your own infrastructure. The capabilities of Ops Manager include monitoring, alerting, disaster recovery, scaling, deploying, and upgrading of Replica Sets and sharded clusters, and other MongoDB products. In 2018 MongoDB introduced beta integration with Kubernetes. 

The MongoDB Enterprise Operator is compatible with Kubernetes v1.11 and above. It has been tested against Openshift 3.11. This Operator requires Ops Manager or Cloud Manager. In this document, when we refer to "Ops Manager", you may substitute "Cloud Manager". The functionality is the same.

The installation is fairly simple and requires:

  • Installing the MongoDB Enterprise Operator, either via Helm or a YAML file
  • Gathering the Ops Manager properties
  • Creating and applying a Kubernetes ConfigMap file
  • Creating the Kubernetes secret object which will store the Ops Manager API Key

In this basic example we are going to use YAML file:

kubectl apply -f crds.yaml
kubectl apply -f

The next step is to obtain the following information, which we are going to use in the ConfigMap file. All of it can be found in Ops Manager.

  • Base URL. The URL of your Ops Manager or Cloud Manager.
  • Project ID. The ID of the Ops Manager project which the Kubernetes Operator will deploy into.
  • User. An existing Ops Manager username.
  • Public API Key. Used by the Kubernetes Operator to connect to the Ops Manager REST API endpoint.

Now that we have acquired the necessary Ops Manager configuration information, we need to create a Kubernetes ConfigMap file. For exercise purposes we can call this file my-project.yaml.

apiVersion: v1
kind: ConfigMap
metadata:
  name: <<ConfigMap name>>
  namespace: mongodb
data:
  projectId: <<Project ID>>
  baseUrl: <<OpsManager URL>>

The next step is to apply the ConfigMap to Kubernetes and create the secret:

kubectl apply -f my-project.yaml

kubectl -n mongodb create secret generic <<Name of credentials>> --from-literal="user=<<User>>" --from-literal="publicApiKey=<<public-api-key>>"

Once done, we can deploy our first replica set:


apiVersion: mongodb.com/v1
kind: MongoDbReplicaSet
metadata:
  name: <<Replica set name>>
  namespace: mongodb
spec:
  members: 3
  version: 4.2.0
  persistent: false
  project: <<Name value specified in ConfigMap file>>
  credentials: <<Name of credentials secret>>

For more detailed instructions please visit the MongoDB documentation. 

MongoDB Cloud migration

The Atlas Live Migration Service can migrate your data from your existing environment whether it's on AWS, Azure, GCP, or on-prem to MongoDB Atlas, the global cloud database for MongoDB.

The migration is done via a dedicated replication service. Atlas Live Migration process streams data through a MongoDB-controlled application server. 

Live migration works by keeping a cluster in MongoDB Atlas in sync with your source database. During this process, your application can continue to read and write from your source database. Since the process watches for incoming changes, all of them are replicated, and the migration can be done online. You decide when to change the application connection settings and do the cutover. To make the process less error-prone, Atlas provides a Validate option which checks IP whitelist access, SSL configuration, CA, etc.

Full-Text Search

Full-text search is another cloud service provided by MongoDB, and it is available only in MongoDB Atlas (non-Atlas MongoDB deployments can use text indexing). Atlas Full-Text Search is built on the open source Apache Lucene, a powerful text search library with a custom query syntax for querying its indexes. It is the foundation of popular systems such as Elasticsearch and Apache Solr. It allows you to create a full-text search index and to search, save, and read it. It’s fully integrated into MongoDB Atlas, so there are no additional systems or infrastructure to provision or manage.

MongoDB Data Lake (beta)

The last MongoDB cloud feature we would like to mention is MongoDB Data Lake. It’s a fairly new service addressing the popular concept of data lakes. A data lake is a vast pool of raw data, the purpose for which is not yet defined. Instead of placing data in a purpose-built data store, you move it into a data lake in its original format. This eliminates the upfront costs of data ingestion, like transformation. Once data is placed into the lake, it can be queried as needed.

Using Atlas Data Lake to ingest your S3 data into Atlas clusters allows you to query data stored in your AWS S3 buckets using the Mongo Shell, MongoDB Compass, and any MongoDB driver.

There are some limitations, though. Monitoring Data Lakes with Atlas monitoring tools does not work yet, only a single AWS S3 account is supported, IP whitelists and AWS security groups come with limitations, and there is no possibility to add indexes.


by Bart Oles at September 12, 2019 03:06 PM

September 11, 2019

MariaDB Foundation

MariaDB 10.4.8, 10.3.18 and 10.2.27 now available

The MariaDB Foundation is pleased to announce the availability of MariaDB 10.4.8, MariaDB 10.3.18 and MariaDB 10.2.27, the latest stable releases in their respective series. See the release notes and changelogs for details. Download MariaDB 10.4.8 Release Notes Changelog What is MariaDB 10.3? Download MariaDB 10.3.18 Release Notes Changelog What is MariaDB 10.3? Download MariaDB […]

The post MariaDB 10.4.8, 10.3.18 and 10.2.27 now available appeared first on MariaDB.org.

by Ian Gilfillan at September 11, 2019 08:07 PM


The Basics of Deploying a MongoDB Replica Set and Shards Using Puppet

Database systems perform best when they are integrated with some well-defined approaches that facilitate both read and write throughput operations. MongoDB went the extra mile by embracing replication and sharding, with the aim of enabling both horizontal and vertical scaling, as opposed to relational DBMSs, whose equivalent concepts mainly enhance vertical scaling.

Sharding ensures distribution of load among the members of the database cluster so that read operations are carried out with little latency. Without sharding, the capacity of a single database server with a large set of data and high-throughput operations can be technically challenged, and the server may fail if the necessary measures are not taken into account. For example, if the rate of queries is very high, the CPU capacity of the server will be overwhelmed.

Replication, on the other hand, is a concept whereby different database servers house the same data. It ensures high availability of data besides enhancing data integrity. Take the example of a high-performing social media application: if the main serving database system fails, as in the case of a power blackout, we should have another system serving the same data. A good replica set should have more than 3 members, an arbiter, and an optimal electionTimeoutMillis.

In replication, we have a master/primary node where all the write operations are made and then applied to an oplog. From the oplog, all the changes are then applied to the other members, which in this case are referred to as secondary nodes or slaves. If the primary node does not communicate within electionTimeoutMillis, the other nodes are signaled to hold an election. The electionTimeoutMillis should be set neither too high nor too low: too high and the system stays down for a long time after a failure, losing a lot of data; too low and temporary network latency triggers frequent elections, which may result in data inconsistency. An arbiter is used to add a vote to a winning member to become master in case there is a draw, but it does not carry any data like the other members.
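The election math described above can be sketched as follows (a simplified model: every voting member, including an arbiter, contributes one vote, and a candidate needs a strict majority to win):

```python
# Votes a member needs to win a replica set election: a strict majority
# of the voting members. An arbiter counts as a voter but holds no data.
def votes_needed(voting_members):
    return voting_members // 2 + 1

# With 3 voters a primary needs 2 votes; with 4 voters it needs 3, so a
# 2-2 split elects nobody -- which is why an arbiter (a data-less voter)
# is added to even-sized sets to break ties.
print(votes_needed(3))  # → 2
print(votes_needed(4))  # → 3
```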

Why Use Puppet to Deploy a MongoDB Replica Set

More often, sharding is used hand in hand with replication. The process of configuring and maintaining a replica set is not easy due to:

  1. High chances of human error
  2. Inability to carry out repetitive tasks automatically
  3. Time consumption, especially when a large number of members is involved
  4. Possibility of work dissatisfaction
  5. Overwhelming complexity that may emerge

In order to overcome the outlined setbacks, we settle on an automated system like Puppet, which has plenty of resources to help us work with ease.

In our previous blog, we learned the process of installing and configuring MongoDB with Puppet. However, it is important to understand the basic resources of Puppet, since we will be using them to configure our replica set and shards. In case you missed it, this is the manifest file for installing and running MongoDB on the machine you created:

package { 'mongodb':
  ensure => 'installed',
}

service { 'mongodb':
  ensure => 'running',
  enable => true,
}


So we can put the content above in a file called runMongoDB.pp and run it with the command 

$ sudo puppet apply runMongoDB.pp

Using the 'mongodb' module and its functions, we can set up our replica set with the corresponding parameters for each mongodb resource.

MongoDB Connection

We need to establish a mongodb connection between a node and the mongodb server. The main aim of this is to prevent configuration changes from being applied if the mongodb server cannot be reached, but it can potentially be used for other purposes like database monitoring. We use the mongodb_conn_validator resource:


mongodb_conn_validator { 'mongodb_validator':
  ensure  => present,
  server  => '',
  timeout => 40,
}



name: in this case the name mongodb_validator defines the identity of the resource. It could also be considered a connection string.

server: this could be a string or an array of strings containing DNS names/ IP addresses of the server where mongodb should be running.

timeout: this is the maximum number of seconds the validator should wait before deciding that mongodb is not running.

tcp_port: this is a provider for the resource that validates the mongodb connection by attempting an https connection to the mongodb server. The puppet SSL certificate setup from the local puppet environment is used in the authentication.

Creating the Database


mongodb_database { 'databaseName':
  ensure => present,
  tries  => 10,
}


This resource takes the following parameters:

name:  in this case the name databaseName defines the name of the database we are creating, which would have also been declared as name => ‘databaseName’.

tries: this defines the maximum number of two-second tries to wait for MongoDB startup.

Creating MongoDB User

The mongodb_user resource enables one to create and manage users for a given database.

mongodb_user { 'userprod':
  username      => 'prodUser',
  ensure        => present,
  password_hash => mongodb_password('prodUser', 'passProdser'),
  database      => 'prodUser',
  roles         => ['readWrite', 'dbAdmin'],
  tries         => 10,
}



username: defines the name of the user.

password_hash: this is the password hash of the user. The function mongodb_password() available on MongoDB 3.0 and later is used for creating the hash.

roles: this defines the roles that the user is allowed to execute on the target database.

password: this is the plain user password text.

database: defines the user’s target database.

Creating a Replica Set

We use the module mongodb_replset to create a replica set.


mongodb_replset { '<<Replica set name>>':
  arbiter         => 'host0:27017',
  ensure          => present,
  members         => ['host0:27017', 'host1:27017', 'host2:27017', 'host3:27017'],
  initialize_host => 'host1:27017',
}


name: defines the name of the replica set.

members: an array of members the replica set will  hold.

initialize_host: host to be used in initialization of the replica set

arbiter: defines the replica set member that will be used as an arbiter.

Creating a MongoDB Shard


mongodb_shard { '<<Shard name>>':
  ensure  => present,
  members => ['shard1/host1:27017', 'shard1/host2:27017', 'shard1/host3:27017'],
  keys    => 'price',
}


name: defines the name of the shard.

members: this is the array of members the shard will hold.

keys: defines the key to be used in the sharding, or an array of keys that can be used to create a compound shard key.

by Onyancha Brian Henry at September 11, 2019 02:50 PM

Serge Frezefond

Using Terraform and Kubernetes to provision MariaDB on Azure

In a previous post I used Terraform to provision a managed version of MariaDB (AWS RDS for MariaDB). There exist various managed versions of MariaDB on the major cloud providers: AWS, Azure, Alibaba Cloud. All of these versions offer a simplification to rapidly deploy and operate MariaDB. You benefit from easy setup including High availability ...continue reading "Using Terraform and Kubernetes to provision MariaDB on Azure"

by Serge at September 11, 2019 01:23 PM

September 10, 2019


Comparing Galera Cluster Cloud Offerings: Part Three Microsoft Azure

Microsoft Azure is known to many as an alternative public cloud platform to Amazon AWS. It's not easy to directly compare these two giant companies. Microsoft's cloud business -- dubbed commercial cloud -- includes everything from Azure to Office 365 enterprise subscriptions to Dynamics 365 to LinkedIn services. After LinkedIn was acquired by Microsoft it began moving its infrastructure to Azure. While moving LinkedIn to Azure could take some time, it demonstrates Microsoft Azure’s capabilities and ability to handle millions of transactions. Microsoft's strong enterprise heritage, software stack, and data center tools offer both familiarity and a hybrid approach to cloud deployments.

Microsoft Azure is built as an Infrastructure as a Service (IaaS) as well as a Platform as a Service (PaaS). The Azure Virtual Machine offers per-second billing and is currently multi-tenant compute. Azure has, however, recently previewed a new offering which allows virtual machines to run on single-tenant physical servers. The offering is called Azure Dedicated Hosts.

Azure also offers specialized large instances (such as for SAP HANA). There are multitenant blocks, file storage, and many other additional IaaS and PaaS capabilities. These include object storage (Azure Blob Storage), a CDN, a Docker-based container service (Azure Container Service), a batch computing service (Azure Batch), and event-driven “serverless computing” (Azure Functions). The Azure Marketplace offers third-party software and services. Colocation needs are met via partner exchanges (Azure ExpressRoute) offered from partners like Equinix and CoreSite.

With all of these offerings Microsoft Azure has stepped up its game to play a vital role in the public cloud market. The PaaS infrastructure offered to its consumers has garnered a lot of trust and many are moving their own infrastructure or private cloud to Microsoft Azure's public cloud infrastructure. This is especially advantageous for consumers who need integration with other Windows Services, such as Visual Studio.

So what’s different between Azure and the other clouds we have looked at in this series? Microsoft has focused heavily on AI, analytics, and the Internet of Things. AzureStack is another “cloud-meets-data center” effort that has been a real differentiator in the market.

Microsoft Azure Migration Pros & Cons

There are several things you should consider when moving your legacy applications or infrastructure to Microsoft Azure.

Pros
  • Enterprises that are strategically committed to Microsoft technology generally choose Azure as their primary IaaS+PaaS provider. The integrated end-to-end experience for enterprises building .NET applications using Visual Studio (and related services) is unsurpassed. Microsoft is also leveraging its tremendous sales reach and ability to co-sell Azure with other Microsoft products and services in order to drive adoption.
  • Azure provides a well-integrated approach to edge computing and Internet of Things (IoT), with offerings that reach from its hyperscale data center out through edge solutions such as AzureStack and Data Box Edge.
  • Microsoft Azure’s capabilities have become increasingly innovative and open. 50% of the workloads are Linux-based alongside numerous open-source application stacks. Microsoft has a unique vision for the future that involves bringing in technology partners through native, first-party offerings such as those from VMware, NetApp, Red Hat, Cray and Databricks.

Cons
  • Microsoft Azure’s reliability issues continue to be a challenge for customers, largely as a result of Azure’s growing pains. Since September 2018, Azure has had multiple service-impacting incidents, including significant outages involving Azure Active Directory. These outages leave customers with no ability to mitigate the downtime.
  • Gartner clients often experience challenges with executing on-time implementations within budget. This comes from Microsoft often providing unreasonably high expectations for customers. Much of this stems from Microsoft’s field sales teams being “encouraged” to appropriately position and sell Azure within its customer base.
  • Enterprises frequently lament the quality of Microsoft technical support (along with the increasing cost of support) and field solution architects. This negatively impacts customer satisfaction, and slows Azure adoption and therefore customer spending.

Microsoft may not be your first choice, as it has been seen as a “not-so-open-source-friendly” tech giant, but in fairness it has embraced a lot of activity and support within the open source world. Microsoft Azure offers fully-managed services for most of the top open source RDBMS databases, like PostgreSQL, MySQL, and MariaDB.

Galera Cluster (Percona, Codership, or MariaDB) variants, unfortunately, aren't supported by Azure. The only way you can deploy your Galera Cluster to Azure is by means of a Virtual Machine. You may also want to check their blog on using MariaDB Enterprise Cluster (which is based on Galera) on Azure.

Azure's Virtual Machine

Virtual Machine is Azure's equivalent of the compute instance offerings in GCP and AWS. An Azure Virtual Machine is an on-demand, high-performance computing server in the cloud and can be deployed in Azure using various methods. These include the user interface within the Azure portal, pre-configured images in the Azure Marketplace, scripting through Azure PowerShell, deploying from a template defined in a JSON file, or deploying directly through Visual Studio.

Azure uses a deployment model called the Azure Resource Manager (ARM), which defines all resources that form part of your overall application solution, allowing you to deploy, update, or delete your solution in a single operation.

Resources may include the storage account, network configurations, and IP addresses. You may have heard the term “ARM templates”, which essentially means the JSON template which defines the different aspects of your solution which you are trying to deploy.
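As a sketch, here is the skeleton of such a JSON template declaring a single virtual machine. The parameter name, VM size, and API version below are illustrative assumptions, and a deployable template would also need osProfile, storageProfile, and networkProfile sections:

```json
{
  "$schema": "https://schema.management.azure.com/schemas/2019-04-01/deploymentTemplate.json#",
  "contentVersion": "1.0.0.0",
  "parameters": {
    "vmName": { "type": "string", "defaultValue": "galera-node-1" }
  },
  "resources": [
    {
      "type": "Microsoft.Compute/virtualMachines",
      "apiVersion": "2019-07-01",
      "name": "[parameters('vmName')]",
      "location": "[resourceGroup().location]",
      "properties": {
        "hardwareProfile": { "vmSize": "Standard_D2s_v3" }
      }
    }
  ]
}
```

Everything declared in such a template is created, updated, or deleted together as one Resource Manager deployment.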

Azure Virtual Machines come in different types and sizes, with names beginning with A-series to N-series. Each VM type is built with specific workloads or performance needs in mind, including general purpose, compute optimized, storage optimized or memory optimized. You can also deploy less common types like GPU or high performance compute VMs.

Similar to other public cloud offerings, you can do the following in your virtual machine instances...

  • Encrypt your virtual machine's disks. This does not come as easily as in GCP and AWS; encrypting your virtual machine requires a more manual approach, and you must first complete the Azure Disk Encryption prerequisites. Since Galera does not support Windows, we're only talking here about Linux-based images. Basically, the prerequisites require you to have the dm-crypt and vfat modules present in the system. Once you get that piece right, you can encrypt the VM using the Azure CLI. You can check out how to Enable Azure Disk Encryption for Linux IaaS VMs to see how to do it. Encrypting your disk is very important, especially if your company or organization requires your Galera Cluster data to follow the standards mandated by laws and regulations such as PCI DSS or GDPR.
  • Creating a snapshot. You can create a snapshot either using the Azure CLI or through the portal. Check their manual on how to do it.
  • Use auto scaling or Virtual Machine Scale Sets if you require horizontal scaling. Check out the overview of autoscaling in Azure or the overview of virtual machine scale sets.
  • Multi Zone Deployment. Deploy your virtual machine instances into different availability zones to avoid a single point of failure.
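The disk encryption step described in the first bullet can be sketched with the Azure CLI; the resource group, VM, and Key Vault names below are placeholders:

```shell
# Enable Azure Disk Encryption on an existing Linux VM
# (galera-rg, galera-node-1 and galera-kv are placeholder names)
az vm encryption enable \
  --resource-group galera-rg \
  --name galera-node-1 \
  --disk-encryption-keyvault galera-kv

# Check the encryption status afterwards
az vm encryption show --resource-group galera-rg --name galera-node-1
```

Remember that this only succeeds once the dm-crypt/vfat prerequisites mentioned above are in place on the VM.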

You can also create (or get information from) your virtual machines in different ways. You can use the Azure portal, Azure PowerShell, REST APIs, Client SDKs, or with the Azure CLI. Virtual machines in the Azure virtual network can also easily be connected to your organization’s network and treated as an extended datacenter.

Microsoft Azure Pricing

Just like other public cloud providers, Microsoft Azure offers a free tier with some free services. It also offers pay-as-you-go options and reserved instances to choose from. Pay-as-you-go rates range from $0.008/hour to $0.126/hour.


For reserved instances, the longer you commit and contract with Azure, the more you save on the cost. Microsoft Azure claims to help subscribers save up to 72% of their billing costs compared to its pay-as-you-go model when subscribers sign up for a one- to three-year term for a Windows or Linux Virtual Machine. Microsoft also offers added flexibility: if your business needs change, you can cancel your Azure RI subscription at any time and return the remaining unused RI to Microsoft for an early termination fee.

Let's compare its pricing against GCP Compute Engine and AWS EC2. This is based on the us-east1 region, and we will compare the price ranges for the compute instances required to run your Galera Cluster.


Shared-Core / Burstable

  • GCP Compute Engine: prices start at $0.006 - $0.019 hourly
  • AWS EC2 (t2.nano – t3a.2xlarge): prices start at $0.0058 - $0.3328 hourly
  • Azure Virtual Machine: prices start at $0.0052 - $0.832 hourly

Standard / General Purpose

  • GCP Compute Engine (n1-standard-1 – n1-standard-96): prices start at $0.034 - $3.193 hourly
  • AWS EC2 (m4.large – m4.16xlarge, m5.large – m5d.metal): prices start at $0.1 - $5.424 hourly
  • Azure Virtual Machine (Av2 Standard, D2-64 v3 latest generation, D2s-64s v3 latest generation, D1-5 v2, DS1-S5 v2, DC-series): prices start at $0.043 - $3.072 hourly

High Memory / Memory Optimized

  • GCP Compute Engine (n1-highmem-2 – n1-highmem-96, n1-ultramem-40 – n1-ultramem-160): prices start at $0.083 - $17.651 hourly
  • AWS EC2 (r4.large – r4.16xlarge, x1.16xlarge – x1.32xlarge, x1e.xlarge – x1e.32xlarge): prices start at $0.133 - $26.688 hourly
  • Azure Virtual Machine (D2a – D64a v3, D2as – D64as v3, E2-64 v3 latest generation, E2a – E64a v3, E2as – E64as v3, E2s-64s v3 latest generation, D11-15 v2, DS11-S15 v2, M-series, Mv2-series instances, Extreme Memory Optimized): prices start at $0.043 - $44.62 hourly

High CPU / Storage Optimized

  • GCP Compute Engine (n1-highcpu-2 – n1-highcpu-32): prices start at $0.05 - $2.383 hourly
  • AWS EC2 (h1.2xlarge – h1.16xlarge, i3.large – i3.metal, i3en.large – i3en.metal, d2.xlarge – d2.8xlarge): prices start at $0.156 - $10.848 hourly
  • Azure Virtual Machine (Fsv2-series, F-series, Fs-series): prices start at $0.0497 - $3.045 hourly


Data Encryption on Microsoft Azure

Microsoft Azure does not offer encryption support specifically for Galera Cluster. There are, however, ways you can encrypt data either at-rest or in-transit.

Encryption in-transit is a mechanism for protecting data when it's transmitted across networks.

Microsoft uses encryption to protect customer data when it's in-transit between customers and Microsoft cloud services. More specifically, Transport Layer Security (TLS) is the protocol that Microsoft's data centers use to negotiate with client systems that are connected to Microsoft cloud services.

Perfect Forward Secrecy (PFS) is also employed so that each connection between customers’ client systems and Microsoft’s cloud services use unique keys. Connections to Microsoft cloud services also take advantage of RSA based 2,048-bit encryption key lengths.

Encryption At-Rest

For many organizations, data encryption at-rest is a mandatory step towards achieving data privacy, compliance, and data sovereignty. Three Azure features provide encryption of data at-rest:

  • Storage Service Encryption is always enabled and automatically encrypts storage service data when writing it to Azure Storage. If your application logic requires your MySQL Galera Cluster database to store valuable data, then storing to Azure Storage can be an option.
  • Client-side encryption also provides the feature of encryption at-rest.
  • Azure Disk Encryption enables you to encrypt the OS disks and data disks that an IaaS virtual machine uses. Azure Disk Encryption also supports enabling encryption on Linux VMs that are configured with disk striping (RAID) using mdadm, and on Linux VMs that use LVM for data disks.

Galera Cluster Multi-AZ/Multi-Region/Multi-Cloud Deployments with Azure

Similar to AWS and GCP, Microsoft Azure does not offer direct support for deploying a Galera Cluster onto a Multi-AZ/-Region/-Cloud. You can, however, deploy your nodes manually or create scripts using PowerShell or the Azure CLI to do this for you. Alternatively, when you provision your Virtual Machine instances you can place your nodes in different availability zones. Microsoft Azure also offers another type of redundancy, aside from its availability zones, called Virtual Machine Scale Sets. You can check the differences between virtual machines and scale sets.

Galera Cluster High Availability, Scalability, and Redundancy on Azure

One of the primary reasons for using a Galera node cluster is high availability, redundancy, and the ability to scale. If you are serving traffic globally, it's best to serve your traffic by region, and you should ensure your architectural design includes geo-distribution of your database nodes. In order to achieve this, multi-AZ, multi-region, or multi-cloud/multi-datacenter deployments are recommended. This helps prevent the cluster from going down, or from malfunctioning due to lack of quorum.

As mentioned earlier, Microsoft Azure has an auto scaling solution which can be leveraged using scale sets. This allows you to automatically add a node when a certain threshold has been met, based on the health metrics you are monitoring. You can check out their tutorial on this topic here.

For multi-region or multi-cloud deployments, Galera has its own parameter called gmcast.segment, which can be set at server start. This parameter is designed to optimize the communication between the Galera nodes and minimize the amount of traffic sent between network segments. This includes writeset relaying and IST and SST donor selection. This type of setup allows you to deploy multiple nodes in different regions. Aside from that, you can also deploy your Galera nodes across different cloud vendors, spanning GCP, AWS, Microsoft Azure, or an on-premise setup.
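As a sketch, the segment is assigned per node through the wsrep provider options; the IP addresses and segment numbers below are placeholder assumptions:

```ini
# my.cnf fragment on a node located in the second datacenter/region;
# nodes in the first datacenter would use gmcast.segment=0
[mysqld]
wsrep_cluster_address = "gcomm://10.0.1.10,10.0.1.11,10.1.1.10"
wsrep_provider_options = "gmcast.segment=1"
```

Nodes sharing a segment number exchange traffic freely, while cross-segment traffic is relayed through one node per segment, which is what keeps inter-region bandwidth usage down.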

We recommend you to check out our blog Multiple Data Center Setups Using Galera Cluster for MySQL or MariaDB and Zero Downtime Network Migration With MySQL Galera Cluster Using Relay Node to gather more information on how to implement these types of deployments.

Galera Cluster Database Performance on Microsoft Azure

The underlying host machines used by virtual machines in Azure are, in fact, very powerful. The newest VMs in Azure are already equipped with network optimization modules. You can verify this from the kernel version by running (e.g. on Ubuntu):

uname -r | grep azure

Note: Make certain that the output contains the "azure" string.

For CentOS/RHEL, any Linux Integration Services (LIS) release since version 4.2 includes the network optimization. To learn more about this, visit the page on optimizing network throughput.

If your application is very sensitive to network latency, you might be interested in the proximity placement group feature. It's currently in preview (and not yet recommended for production use), but it helps minimize the network latency between your instances.

The type of virtual machine you consume depends on your application's traffic and resource demands. For queries that are high on memory consumption, you can start with the Dv3 series; for memory-optimized workloads, start with the Ev3 series. For high CPU requirements, such as highly transactional databases or gaming applications, start with the Fsv2 series.

Choosing the right storage and required IOPS for your database volume is a must. Generally, an SSD-based persistent disk is your ideal choice. Begin with Standard SSD, which is cost-effective and offers consistent performance. This decision, however, might depend on whether you need more IOPS in the long run. If that is the case, then you should go for Premium SSD storage.

We also recommend reading our blog How to Improve Performance of Galera Cluster for MySQL or MariaDB to learn more about optimizing your Galera Cluster.

Database Backup for Galera Nodes on Azure

There's no native backup support for your MySQL Galera data in Azure, but you can take a snapshot. Microsoft Azure offers Azure VM Backup, which takes snapshots that can be scheduled and encrypted.

Alternatively, if you want to back up the data files from your Galera Cluster, you can also use external services like ClusterControl, use Percona XtraBackup for your binary backups, or use mysqldump or mydumper for your logical backups. These tools provide backup copies of your mission-critical data, and you can read this if you want to learn more.

Galera Cluster Monitoring on Azure

Microsoft Azure has its own monitoring service named Azure Monitor. Azure Monitor maximizes the availability and performance of your applications by delivering a comprehensive solution for collecting, analyzing, and acting on telemetry from your cloud and on-premise environments. It helps you understand how your applications are performing and proactively identifies issues affecting them (and the resources they depend on). You can set up or create health alerts and get notified of advisories and alerts detected in the services you deployed.

If you want monitoring specific to your database, you will need to utilize external monitoring tools which offer advanced, highly-granular database metrics. There are several options, such as PMM by Percona, DataDog, Idera, VividCortex, or our very own ClusterControl (monitoring is free with ClusterControl Community).

Galera Cluster Database Security on Azure

As discussed in our previous blogs for AWS and GCP, you can take the same approach to securing your database in the public cloud. When you create a virtual machine, you can specify which ports are allowed to be opened by creating and configuring a Network Security Group in Azure. Open only the ports Galera requires (particularly 3306, 4444, 4567, and 4568), or create a Virtual Network in Azure and place the nodes in private subnets if they are to remain private. In addition, if you set up your VMs in Azure without a public IP, they can still make outbound connections because Azure uses SNAT and PAT. If you're familiar with AWS and GCP, you might like this explanation to make it easier to comprehend.
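A Network Security Group rule covering the Galera ports can be sketched with the Azure CLI as follows; the resource group, NSG name, priority, and source address prefix are placeholders:

```shell
# Allow MySQL client traffic plus Galera replication (SST, group
# communication, IST) from the application subnet only
az network nsg rule create \
  --resource-group galera-rg \
  --nsg-name galera-nsg \
  --name allow-galera-ports \
  --priority 100 \
  --direction Inbound \
  --access Allow \
  --protocol Tcp \
  --source-address-prefixes 10.0.0.0/24 \
  --destination-port-ranges 3306 4444 4567 4568
```

Restricting the source prefix to the application or cluster subnet keeps the replication ports unreachable from the public internet.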

Another feature available is Role-Based Access Control in Microsoft Azure. This gives you control over which people have access to the specific resources they need.

In addition to this, you can secure your data in-transit by using a TLS/SSL connection or by encrypting your data at-rest. If you're using ClusterControl, deploying secure data in-transit is simple and easy. You can check out our blog SSL Key Management and Encryption of MySQL Data in Transit if you want to try it out. For data at-rest, you can follow the discussion in the Encryption section earlier in this blog.

Galera Cluster Troubleshooting 

Microsoft Azure offers a wide array of log types to aid troubleshooting and auditing. Activity logs, Azure diagnostics logs, Azure AD reporting, virtual machine and cloud service logs, Network Security Group (NSG) flow logs, and Application Insights are all very useful when troubleshooting. It might not always be necessary to go through all of these, but they add more insight and clues when checking the logs.

If you're using ClusterControl, go to Logs -> System Logs and you'll be able to browse the captured error logs taken from the MySQL Galera node itself. Apart from this, ClusterControl provides real-time monitoring that amplifies your alarm and notification system in case of an emergency or if your MySQL Galera node(s) go down.


As we finish this three-part blog series, we have shown you the offerings and advantages of each of the tech giants serving the public cloud industry. There are advantages and disadvantages when selecting one over the other, but what matters most is your reason for moving to a public cloud, its benefits for your organization, and how it serves the requirements of your application.

The choice of provider for your Galera Cluster may involve financial considerations, such as which is most cost-efficient and best suits your budget. It could also be driven by privacy laws and regulatory compliance, or even by the technology stack you want to use. What's important is how your application and database will perform once in the cloud handling large amounts of traffic: it has to be highly available, must be resilient, needs the right levels of scalability and redundancy, and must take backups to ensure data recovery.

by Paul Namuag at September 10, 2019 03:19 PM

MariaDB Foundation

2019 上海MariaDB开发者会议

(The original English version of this post is available here). The MariaDB Foundation is pleased to announce that the 2019 MariaDB Developers Unconference will take place in Shanghai. This is the second time the event is being held in China, after the first Unconference in Shenzhen in 2017. It will run from Tuesday 19 November 2019 to Thursday 21 November 2019, and is kindly hosted by Microsoft Shanghai. If you plan to attend, please register on the event page. We recommend choosing a hotel close to the venue. All sessions are free to attend. The format will be the same as before, with open discussion and collaboration around any topic. We welcome everyone interested in contributing to the MariaDB open source project, whether through programming or in any other way; participation is not limited to core developers. This is an open gathering, and we welcome new faces and old friends alike who enjoy the open source way of learning and working and want to contribute to MariaDB. The event will feature in-depth talks and exchanges around the MariaDB open source project, and many core MariaDB developers will be in attendance. The goal is to provide a face-to-face platform where new and experienced developers can solve problems and plan the future together. If you need an invitation letter for your visa application, see the visa applications page for China from the Shenzhen unconference. If you have any questions about the event or about contributing to MariaDB, feel free to start a discussion on the MariaDB developers mailing list or the MariaDB discuss mailing list. To register for the event, see the MariaDB Developers (Un)Conference page here.

The post 2019 上海MariaDB开发者会议 appeared first on

by Ian Gilfillan at September 10, 2019 09:56 AM

2019 Developers Unconference, Shanghai

(A Chinese version of this post is available here). The MariaDB Foundation is pleased to announce the 2019 MariaDB Developers Unconference in Shanghai. This will be our second Unconference in China, after the 2017 Developers Unconference in Shenzhen, and will take place from Tuesday 19 November to Thursday 21 November 2019. Microsoft Shanghai are kindly […]

The post 2019 Developers Unconference, Shanghai appeared first on

by Ian Gilfillan at September 10, 2019 09:55 AM

September 09, 2019


Cloud Vendor Deep-Dive: PostgreSQL on Microsoft Azure

If you have followed Microsoft lately, it will come as no surprise that the provider of a competing database product, namely SQL Server, has also jumped on the PostgreSQL bandwagon. From releasing 60,000 patents to OIN to being a Platinum sponsor at PGCon, Microsoft, as one of the corporate organizations backing PostgreSQL, took every opportunity to show that not only can you run PostgreSQL on Microsoft, but the reverse is also true: Microsoft, through its cloud offering, can run PostgreSQL for you. The statement became even clearer with the acquisition of Citus Data and the release of their flagship product in the Azure Cloud under the name of Hyperscale. It is safe to say that PostgreSQL adoption is growing, and there are now even more good reasons to choose it.

My journey through the Azure cloud started right at the landing page, where I met the contenders: Single Server and a preview (in other words, no SLA provided) release of Hyperscale (Citus). This blog will focus on the former. While on this journey, I had the opportunity to practice what open source is all about — giving back to the community — in this case by providing feedback to the documentation which, to Microsoft's credit, they make very easy by piping the feedback straight into GitHub:

Github: My Azure Documentation Feedback Issues

PostgreSQL Compatibility with Azure


According to the product documentation, Single Server targets PostgreSQL versions in the n-2 major range:

Azure Database for PostgreSQL: Single server PostgreSQL versions

As a solution built for performance, Single Server is recommended for data sets of 100 GB and larger. The servers provide predictable performance: the database instances come with a predefined number of vCores and IOPS (based on the size of provisioned storage).


There is a fair number of Supported Extensions with some of them being installed out of the box:

postgres@pg10:5432 postgres> select name, default_version, installed_version from pg_available_extensions where name !~ '^postgis' order by name;

             name             | default_version | installed_version
------------------------------+-----------------+-------------------
 address_standardizer         | 2.4.3           |
 address_standardizer_data_us | 2.4.3           |
 btree_gin                    | 1.2             |
 btree_gist                   | 1.5             |
 chkpass                      | 1.0             |
 citext                       | 1.4             |
 cube                         | 1.2             |
 dblink                       | 1.2             |
 dict_int                     | 1.0             |
 earthdistance                | 1.1             |
 fuzzystrmatch                | 1.1             |
 hstore                       | 1.4             |
 hypopg                       | 1.1.1           |
 intarray                     | 1.2             |
 isn                          | 1.1             |
 ltree                        | 1.1             |
 orafce                       | 3.7             |
 pg_buffercache               | 1.3             | 1.3
 pg_partman                   | 2.6.3           |
 pg_prewarm                   | 1.1             |
 pg_qs                        | 1.1             |
 pg_stat_statements           | 1.6             | 1.6
 pg_trgm                      | 1.3             |
 pg_wait_sampling             | 1.1             |
 pgcrypto                     | 1.3             |
 pgrouting                    | 2.5.2           |
 pgrowlocks                   | 1.2             |
 pgstattuple                  | 1.5             |
 plpgsql                      | 1.0             | 1.0
 plv8                         | 2.1.0           |
 postgres_fdw                 | 1.0             |
 tablefunc                    | 1.0             |
 timescaledb                  | 1.1.1           |
 unaccent                     | 1.1             |
 uuid-ossp                    | 1.1             |
(35 rows)

PostgreSQL Monitoring on Azure

Server monitoring relies on a set of metrics that can be neatly grouped to create a custom dashboard:

Azure Database for PostgreSQL: Single server --- Metrics

Those familiar with Graphviz or Blockdiag are likely to appreciate the option of exporting the entire dashboard to a JSON file:

Azure Database for PostgreSQL: Single server --- Metrics

Furthermore metrics can — and they should — be linked to alerts:

Azure Database for PostgreSQL: Single Server --- Available Alerts

Query statistics can be tracked by means of Query Store and visualized with Query Performance Insight. For that, a couple of Azure specific parameters will need to be enabled:

postgres@pg10:5432 postgres> select * from pg_settings where name ~ 'pgms_wait_sampling.query_capture_mode|pg_qs.query_capture_mode';

-[ RECORD 1 ]---+------------------------------------------------------------------------------------------------------------------

name            | pg_qs.query_capture_mode

setting         | top

unit            |

category        | Customized Options

short_desc      | Selects which statements are tracked by pg_qs. Need to reload the config to make change take effect.

extra_desc      |

context         | superuser

vartype         | enum

source          | configuration file

min_val         |

max_val         |

enumvals        | {none,top,all}

boot_val        | none

reset_val       | top

sourcefile      |

sourceline      |

pending_restart | f

-[ RECORD 2 ]---+------------------------------------------------------------------------------------------------------------------

name            | pgms_wait_sampling.query_capture_mode

setting         | all

unit            |

category        | Customized Options

short_desc      | Selects types of wait events are tracked by this extension. Need to reload the config to make change take effect.

extra_desc      |

context         | superuser

vartype         | enum

source          | configuration file

min_val         |

max_val         |

enumvals        | {none,all}

boot_val        | none

reset_val       | all

sourcefile      |

sourceline      |

pending_restart | f
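The same capture modes can also be set without a psql session, for example through the Azure CLI; the resource group and server names below are placeholders:

```shell
az postgres server configuration set \
  --resource-group pg-rg --server-name pg10 \
  --name pg_qs.query_capture_mode --value TOP

az postgres server configuration set \
  --resource-group pg-rg --server-name pg10 \
  --name pgms_wait_sampling.query_capture_mode --value ALL
```

As the short_desc fields above note, both parameters take effect after a configuration reload, with no server restart required.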

In order to visualize the slow queries and waits we proceed to the Query Performance widget:

Long Running Queries​​​

Azure Database for PostgreSQL: Single server --- Long running queries graph

Wait Statistics

Azure Database for PostgreSQL: Single server --- wait statistics

PostgreSQL Logging on Azure

The standard PostgreSQL logs can be downloaded, or exported to Log Analytics for more advanced parsing:

Azure Database for PostgreSQL: Single server --- Log Analytics

PostgreSQL Performance and Scaling with Azure

While the number of vCores can be easily increased or decreased, this action will trigger a server restart:

Azure Database for PostgreSQL: Single server PostgreSQL versions

In order to achieve zero downtime, applications must be able to gracefully handle transient errors.
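A minimal sketch of such handling is a retry wrapper with exponential backoff; the wrapper and the connection string in the comment are illustrative, not an Azure-provided tool:

```shell
# Sketch of client-side retry with exponential backoff, to ride out the
# brief connection drop during a scale operation.
retry() {
  local max=$1; shift
  local delay=1 i
  for ((i = 1; i <= max; i++)); do
    "$@" && return 0            # command succeeded
    sleep "$delay"
    delay=$((delay * 2))        # back off: 1s, 2s, 4s, ...
  done
  return 1                      # still failing after $max attempts
}

# In a real deployment you would wrap the client call, e.g.:
#   retry 5 psql "host=myserver.postgres.database.azure.com sslmode=require" -c 'SELECT 1'
retry 3 true && echo "connected"
```

The same pattern applies inside application code: catch the driver's connection error, wait, and retry with a capped backoff.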

For tuning queries, Azure provides the DBA with Performance Recommendations, in addition to the preloaded pg_stat_statements and pg_buffercache extensions:

Azure Database for PostgreSQL: Single server --- Performance Recommendations screen

High Availability and Replication on Azure

Database server high availability is achieved by means of a node based hardware replication. This ensures that in the case of hardware failure, a new node can be brought up within tens of seconds.

Azure provides a redundant gateway as a network connection endpoint for all database servers within a region.

PostgreSQL Security on Azure

By default, firewall rules deny access to the PostgreSQL instance. Since an Azure database server is the equivalent of a database cluster, the access rules apply to all databases hosted on the server.

In addition to IP addresses, firewall rules can reference a virtual network, a feature available only for the General Purpose and Memory Optimized tiers.

Azure Database for PostgreSQL: Single server --- Firewall --- Adding a VNet

One thing I found peculiar in the firewall web interface — I could not navigate away from the page while changes were being saved:

Azure Database for PostgreSQL: Single server --- change security rules in progress pop-up screen when attempting to navigate away

Data at rest is encrypted using a Server-Managed Key and cloud users cannot disable the encryption. Data in transit is also encrypted — SSL required can only be changed after the database server is created. Just as the data at rest, backups are encrypted and encryption cannot be disabled.

Advanced Threat Protection provides alerts and recommendations on a number of database access requests that are considered a security risk. The feature is currently in preview. To demonstrate, I simulated a password brute force attack:

~ $ while : ; do psql -U $(pwgen -s 20 1)@pg10 ; sleep 0.1 ; done

psql: FATAL:  password authentication failed for user "AApT6z4xUzpynJwiNAYf"

psql: FATAL:  password authentication failed for user "gaNeW8VSIflkdnNZSpNV"

psql: FATAL:  password authentication failed for user "SWZnY7wGTxdLTLcbqnUW"

psql: FATAL:  password authentication failed for user "BVH2SC12m9js9vZHcuBd"

psql: FATAL:  password authentication failed for user "um9kqUxPIxeQrzWQXr2v"

psql: FATAL:  password authentication failed for user "8BGXyg3KHF3Eq3yHpik1"

psql: FATAL:  password authentication failed for user "5LsVrtBjcewd77Q4kaj1"


Check the PostgreSQL logs:

2019-08-19 07:13:50 UTC-5d5a4c2e.138-FATAL:  password authentication failed

for user "AApT6z4xUzpynJwiNAYf"

2019-08-19 07:13:50 UTC-5d5a4c2e.138-DETAIL:  Role "AApT6z4xUzpynJwiNAYf" does not exist.

   Connection matched pg_hba.conf line 3: "host all all password"

2019-08-19 07:13:51 UTC-5d5a4c2f.13c-LOG:  connection received: host= port=27248 pid=316

2019-08-19 07:13:51 UTC-5d5a4c2f.13c-FATAL:  password authentication failed for user "gaNeW8VSIflkdnNZSpNV"

2019-08-19 07:13:51 UTC-5d5a4c2f.13c-DETAIL:  Role "gaNeW8VSIflkdnNZSpNV" does not exist.

   Connection matched pg_hba.conf line 3: "host all all password"

2019-08-19 07:13:52 UTC-5d5a4c30.140-LOG:  connection received: host= port=58256 pid=320

2019-08-19 07:13:52 UTC-5d5a4c30.140-FATAL:  password authentication failed for user "SWZnY7wGTxdLTLcbqnUW"

2019-08-19 07:13:52 UTC-5d5a4c30.140-DETAIL:  Role "SWZnY7wGTxdLTLcbqnUW" does not exist.

   Connection matched pg_hba.conf line 3: "host all all password"

2019-08-19 07:13:53 UTC-5d5a4c31.148-LOG:  connection received: host= port=32984 pid=328

2019-08-19 07:13:53 UTC-5d5a4c31.148-FATAL:  password authentication failed for user "BVH2SC12m9js9vZHcuBd"

2019-08-19 07:13:53 UTC-5d5a4c31.148-DETAIL:  Role "BVH2SC12m9js9vZHcuBd" does not exist.

   Connection matched pg_hba.conf line 3: "host all all password"

2019-08-19 07:13:53 UTC-5d5a4c31.14c-LOG:  connection received: host= port=43384 pid=332

2019-08-19 07:13:54 UTC-5d5a4c31.14c-FATAL:  password authentication failed for user "um9kqUxPIxeQrzWQXr2v"

2019-08-19 07:13:54 UTC-5d5a4c31.14c-DETAIL:  Role "um9kqUxPIxeQrzWQXr2v" does not exist.

   Connection matched pg_hba.conf line 3: "host all all password"

2019-08-19 07:13:54 UTC-5d5a4c32.150-LOG:  connection received: host= port=27672 pid=336

2019-08-19 07:13:54 UTC-5d5a4c32.150-FATAL:  password authentication failed for user "8BGXyg3KHF3Eq3yHpik1"

2019-08-19 07:13:54 UTC-5d5a4c32.150-DETAIL:  Role "8BGXyg3KHF3Eq3yHpik1" does not exist.

   Connection matched pg_hba.conf line 3: "host all all password"

2019-08-19 07:13:55 UTC-5d5a4c33.154-LOG:  connection received: host= port=12712 pid=340

2019-08-19 07:13:55 UTC-5d5a4c33.154-FATAL:  password authentication failed for user "5LsVrtBjcewd77Q4kaj1"

2019-08-19 07:13:55 UTC-5d5a4c33.154-DETAIL:  Role "5LsVrtBjcewd77Q4kaj1" does not exist.

The email alert arrived about 30 minutes later:

Azure Database for PostgreSQL: Single server --- Advanced Threat Protection email alert

In order to allow fine-grained access to the database server, Azure provides RBAC, a cloud-native access control feature and one more tool in the arsenal of the PostgreSQL Cloud DBA. This is as close as we can get to the ubiquitous pg_hba access rules.

PostgreSQL Backup and Recovery on Azure

Regardless of pricing tiers, backups are retained between 7 and 35 days. The pricing tier also influences the ability to restore data.

Point-in-time recovery is available via the Azure Portal or the CLI and according to documentation as granular as up to five minutes. The portal functionality is rather limited — the date picker widget blindly shows the last 7 days as possible dates to select, although I created the server today. Also, there is no verification performed on the recovery target time — I expected that entering a value outside the recovery interval would trigger an error preventing the wizard to continue:

Azure Database for PostgreSQL: Single server --- point-in-time restore screen

Once the restore process is started, an error, supposedly caused by the out-of-range value, will pop up about a minute later:

Azure Database for PostgreSQL: Single server --- Activity Log error message on restore failure

…but, unfortunately, the error message was not very helpful:

Azure Database for PostgreSQL: Single server --- Activity Log error details on restore failure

Lastly, backup storage is free for retention periods of up to 7 days. That could prove extremely handy for development environments.
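The CLI route for point-in-time recovery mentioned above can be sketched as follows; the resource group, server names, and timestamp are placeholders:

```shell
# Restore a new server from pg10's backups to a given point in time
az postgres server restore \
  --resource-group pg-rg \
  --name pg10-restored \
  --source-server pg10 \
  --restore-point-in-time "2019-08-19T07:00:00Z"
```

Note that the restore always creates a new server, so firewall rules and other server-level settings need to be reconfigured on it afterwards.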

Hints and Tips


Get accustomed with the Single Server Limits.


Always use the connection string in order for the connection to be routed to the correct database server.


For disaster recovery scenarios, locate read replicas in one of the paired regions.


Just as is the case with AWS and GCloud, there is no superuser access.


Parameters requiring a server restart or superuser access cannot be configured.


During auto-scaling, applications should retry until the new node is brought up.

Memory amount and IOPS cannot be specified directly — memory is allocated in units of GB per vCore, up to a maximum of 320 GB (32 vCores x 10 GB), and IOPS depend on the size of the provisioned storage, up to a maximum of 6000 IOPS. At this time Azure offers a large storage preview option with a maximum of 20,000 IOPS.

Servers created in the Basic tier cannot be upgraded to General Purpose or Memory Optimized.


Ensure that the auto-grow feature is enabled — if the amount of data exceeds the provisioned storage space, the database will enter read-only mode.

Storage can only be scaled up. Just as with all the other cloud providers, storage allocation cannot be decreased, and I couldn't come across any explanation why. Given the state-of-the-art equipment the big cloud players can afford, there should be no reason not to provide features similar to LVM online data relocation. Then again, storage is really cheap nowadays, so there is little reason to think about scaling down until the next major version upgrade.


In some cases, updates to firewall rules may take up to five minutes to propagate.

A server located in the same subnet as the application servers will not be reachable until the appropriate firewall rules are in place.

Virtual network rules do not allow cross-region access and as a result, dblink and postgres_fdw cannot be used to connect to databases outside the Azure cloud.

The VNet/Subnet approach cannot be applied to Web Apps as their connections originate from public IP addresses.

Large virtual networks will be unavailable while the service endpoints are enabled.


For applications that require server certificate validation, the CA certificate file is available for download from DigiCert. Microsoft made it easy, and you shouldn’t have to worry about renewal until 2025:

~ $ openssl x509 -in BaltimoreCyberTrustRoot.crt.pem -noout -dates

notBefore=May 12 18:46:00 2000 GMT

notAfter=May 12 23:59:00 2025 GMT

Intrusion Detection System

The preview release of Advanced Threat Protection is not available for the Basic tier instances.

Backup and Restore

For applications that cannot afford a region downtime, consider configuring the server with geo-redundant backup storage. This option can only be enabled at the time of creating the database server.

The requirement for reconfiguring the cloud firewall rules after a PITR operation is particularly important.

Deleting a database server removes all backups.

Following the restore, there are certain post-restore tasks that will have to be performed.

Unlogged tables are recommended for bulk inserts in order to boost performance, however, they are not replicated.


Metrics are recorded every minute and stored for 30 days.


Query Store is a global option, meaning that it applies to all databases. Read-only transactions and queries longer than 6,000 bytes are problematic. By default, the captured queries are retained for 7 days.


Query Performance Insight recommendations are currently limited to create and drop index.

Disable pg_stat_statements when not needed.

Replace uuid_generate_v4 with gen_random_uuid(). This is in line with the recommendation in the official PostgreSQL documentation; see Building uuid-ossp.

High Availability and Replication

There is a limit of five read replicas. Write-intensive applications should avoid using read replicas as the replication mechanism is asynchronous which introduces some delays that applications must be able to tolerate. Read replicas can be located in a different region.

REPLICA support can only be enabled after the server has been created. The feature requires a server restart:

Azure Database for PostgreSQL: Single server --- enabling replication

Read replicas do not inherit the firewall rules from the master node:

Azure Database for PostgreSQL: Single server --- read replica missing firewall rules after creation

Failover to read replica is not automatic. The failover mechanism is node based.

There is a long list of Considerations that need to be reviewed before configuring read replicas.

Creating replicas takes a long time, even when I tested with a relatively small data set:

Azure Database for PostgreSQL: Single server --- Replicas creation taking a long time


Review the key parameters, as Azure Database for PostgreSQL ships with upstream vacuum default values:

postgres@pg10:5432 postgres> select name,setting from pg_settings where name ~ '^autovacuum.*';
                name                 | setting
-------------------------------------+-----------
 autovacuum                          | on
 autovacuum_analyze_scale_factor     | 0.05
 autovacuum_analyze_threshold        | 50
 autovacuum_freeze_max_age           | 200000000
 autovacuum_max_workers              | 3
 autovacuum_multixact_freeze_max_age | 400000000
 autovacuum_naptime                  | 15
 autovacuum_vacuum_cost_delay        | 20
 autovacuum_vacuum_cost_limit        | -1
 autovacuum_vacuum_scale_factor      | 0.05
 autovacuum_vacuum_threshold         | 50
 autovacuum_work_mem                 | -1
(12 rows)


Automatic major upgrades are not supported. As mentioned earlier, a manual major version upgrade is also a cost-saving opportunity, since it is the point at which the auto-grown storage can be scaled down.

PostgreSQL Azure Enhancements


TimescaleDB is available as an extension (it is not part of the PostgreSQL modules), and it is just a few clicks away. The only drawback is the older version 1.1.1, while the upstream version is currently at 1.4.1 (2019-08-01).

postgres@pg10:5432 postgres> CREATE EXTENSION IF NOT EXISTS timescaledb CASCADE;



(TimescaleDB ASCII-art welcome banner)

               Running version 1.1.1

For more information on TimescaleDB, please visit the following links:

1. Getting started:

2. API reference documentation:

3. How TimescaleDB is designed:


postgres@pg10:5432 postgres> \dx timescaledb
                                 List of installed extensions
    Name     | Version | Schema |                            Description
-------------+---------+--------+-------------------------------------------------------------------
 timescaledb | 1.1.1   | public | Enables scalable inserts and complex queries for time-series data
(1 row)


In addition to PostgreSQL logging options, Azure Database for PostgreSQL can be configured to record additional diagnostics events.


Azure Portal includes a handy feature for allowing connections from the IP addresses logged in to the portal:

Azure Database for PostgreSQL: Single server --- Firewall --- Add Client IP Address

I noted the feature as it makes it easy for developers and system administrators to allow themselves in, and it stands out as a feature offered by neither AWS nor GCloud.


Azure Database for PostgreSQL Single Server offers enterprise level services, however, many of these services are still in preview mode: Query Store, Performance Insight, Performance Recommendation, Advanced Threat Protection, Large Storage, Cross-region Read Replicas.

While operating system knowledge is no longer required for administering PostgreSQL in the Azure cloud, the DBA is expected to acquire skills which are not limited to the database itself — Azure networking (VNet), connection security (firewall), log viewer and analytics along with KQL, Azure CLI for handy scripting, and the list goes on.

Lastly, for those planning to migrate their PostgreSQL workloads to Azure, there are a number of resources available along with a select list of Azure Partners including Credativ, one of the PostgreSQL major sponsors and contributors.


by Viorel Tabara at September 09, 2019 09:45 AM

September 06, 2019


An Overview of the JOIN Methods in PostgreSQL

In my previous blog, we discussed various ways to select, or scan, data from a single table. But in practice, fetching data from a single table is not enough. It is required to select data from multiple tables and then correlate them. Correlating this data among tables is called joining tables, and it can be done in various ways. As joining tables requires input data (e.g. from a table scan), a join can never be a leaf node in the generated plan.

For example, consider a simple query such as SELECT * FROM TBL1, TBL2 WHERE TBL1.ID > TBL2.ID; and suppose the plan generated is as below:

Nested Loop Join

So here first both tables are scanned, and then they are joined together as per the correlation condition TBL1.ID > TBL2.ID.

In addition to the join method, the join order is also very important. Consider the below example:


Consider that TBL1, TBL2 AND TBL3 have 10, 100 and 1000 records respectively. 

The condition TBL1.ID=TBL2.ID returns only 5 records, whereas TBL2.ID=TBL3.ID returns 100 records, so it is better to join TBL1 and TBL2 first so that a smaller number of records gets joined with TBL3. The plan will be as shown below:

Nested Loop Join with Table Order

PostgreSQL supports the below kinds of joins:

  • Nested Loop Join
  • Hash Join
  • Merge Join

Each of these join methods is equally useful, depending on the query and other parameters, e.g. the query itself, table data, join clause, selectivity, memory, etc. These join methods are implemented by most relational databases.

Let’s create some pre-setup tables and populate them with some data, which will be used frequently to better explain these join methods.

postgres=# create table blogtable1(id1 int, id2 int);


postgres=# create table blogtable2(id1 int, id2 int);


postgres=# insert into blogtable1 values(generate_series(1,10000),3);

INSERT 0 10000

postgres=# insert into blogtable2 values(generate_series(1,1000),3);

INSERT 0 1000

postgres=# analyze;


In all our subsequent examples, we assume the default configuration parameters unless otherwise specified.

Nested Loop Join

Nested Loop Join (NLJ) is the simplest join algorithm, wherein each record of the outer relation is matched with each record of the inner relation. The join between relations A and B with the condition A.ID < B.ID can be represented as below:

For each tuple r in A
       	For each tuple s in B
            	If (r.ID < s.ID)
                 	Emit output tuple (r,s)

Nested Loop Join (NLJ) is the most common joining method and can be used on almost any dataset with any type of join clause. Since this algorithm scans all tuples of the inner and outer relations, it is considered to be the most costly join operation.
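The pseudocode above translates directly into Python; the sketch below is purely an illustration operating on in-memory lists, not how PostgreSQL implements the executor node:

```python
def nested_loop_join(outer, inner, predicate):
    """Match every outer tuple against every inner tuple: O(len(outer) * len(inner))."""
    return [(r, s) for r in outer for s in inner if predicate(r, s)]

A = [(1,), (2,), (3,)]
B = [(2,), (3,)]
# Join condition A.ID < B.ID, mirroring the pseudocode above.
print(nested_loop_join(A, B, lambda r, s: r[0] < s[0]))
# -> [((1,), (2,)), ((1,), (3,)), ((2,), (3,))]
```

The quadratic cost is visible in the structure itself, which is why the planner prefers hash or merge joins whenever the join clause allows them.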

As per the above table and data, the following query will result in a Nested Loop Join as shown below:

postgres=# explain select * from blogtable1 bt1, blogtable2 bt2 where bt1.id1 < bt2.id1;

                               QUERY PLAN


 Nested Loop  (cost=0.00..150162.50 rows=3333333 width=16)

   Join Filter: (bt1.id1 < bt2.id1)

   ->  Seq Scan on blogtable1 bt1  (cost=0.00..145.00 rows=10000 width=8)

   ->  Materialize  (cost=0.00..20.00 rows=1000 width=8)

         ->  Seq Scan on blogtable2 bt2  (cost=0.00..15.00 rows=1000 width=8)

(5 rows)

Since the join clause is “<”, the only possible join method here is Nested Loop Join.

Notice here one new kind of node, Materialize; this node acts as an intermediate result cache, i.e. instead of fetching all tuples of a relation multiple times, the result fetched the first time is stored in memory, and subsequent requests for tuples are served from memory instead of fetching from the relation pages again. If all tuples cannot fit in memory, the spill-over tuples go to a temporary file. It is mostly useful for a Nested Loop Join, and to some extent for a Merge Join, as they rely on rescanning the inner relation. A Materialize node is not limited to caching the result of a relation; it can cache the results of any node below it in the plan tree.
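The caching idea behind the Materialize node can be sketched as a rescannable iterator in Python (in-memory only; the actual executor also spills to a temporary file when work_mem is exceeded):

```python
class Materialize:
    """Wrap a one-shot iterator so it can be rescanned: tuples fetched once
    are cached and served from memory on every subsequent scan."""
    def __init__(self, source_iter):
        self._source = source_iter
        self._cache = []
        self._exhausted = False

    def scan(self):
        yield from self._cache             # serve already-cached tuples
        if not self._exhausted:
            for row in self._source:       # first (and only) read of the source
                self._cache.append(row)
                yield row
            self._exhausted = True

fetches = {"count": 0}
def expensive_scan():
    for row in [(1,), (2,), (3,)]:
        fetches["count"] += 1
        yield row

m = Materialize(expensive_scan())
list(m.scan())           # first scan reads the underlying source
list(m.scan())           # rescan is served entirely from the cache
print(fetches["count"])  # -> 3 (the source was read only once)
```

This is exactly why a Materialize node pays off for a Nested Loop Join: the inner relation is rescanned once per outer tuple, but the underlying pages are read only once.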

TIP: In case the join clause is “=” and a Nested Loop Join is chosen between two relations, it is really important to investigate whether a more efficient join method, such as a hash or merge join, can be chosen by tuning the configuration (e.g. work_mem, but not limited to it) or by adding an index, etc.

Some queries may not have a join clause; in that case too, the only choice is a Nested Loop Join. E.g. consider the below query based on the pre-setup data:

postgres=# explain select * from blogtable1, blogtable2;

                             QUERY PLAN


 Nested Loop  (cost=0.00..125162.50 rows=10000000 width=16)

   ->  Seq Scan on blogtable1  (cost=0.00..145.00 rows=10000 width=8)

   ->  Materialize  (cost=0.00..20.00 rows=1000 width=8)

      ->  Seq Scan on blogtable2  (cost=0.00..15.00 rows=1000 width=8)

(4 rows)

The join in the above example is just a Cartesian product of both tables.

Hash Join

This algorithm works in two phases:

  • Build Phase: A Hash table is built using the inner relation records. The hash key is calculated based on the join clause key.
  • Probe Phase: An outer relation record is hashed based on the join clause key to find a matching entry in the hash table.

The join between relations A and B with the condition A.ID = B.ID can be represented as below:

  • Build Phase
    • For each tuple r in inner relation B
      • Insert r into hash table HashTab with key r.ID
  • Probe Phase
    • For each tuple s in outer relation A
      • For each tuple r in bucket HashTab[s.ID]
        • If (s.ID = r.ID)
          • Emit output tuple (r,s)
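The two phases can be sketched in Python (an in-memory illustration only; PostgreSQL's actual hash join additionally handles batching and spilling to disk):

```python
from collections import defaultdict

def hash_join(outer, inner, key_outer, key_inner):
    """Equi-join: build a hash table on the (smaller) inner relation,
    then probe it with each outer tuple."""
    # Build phase: one pass over the inner relation.
    table = defaultdict(list)
    for r in inner:
        table[key_inner(r)].append(r)
    # Probe phase: one pass over the outer relation.
    out = []
    for s in outer:
        for r in table.get(key_outer(s), []):
            out.append((r, s))
    return out

A = [(1, 'a'), (2, 'b'), (3, 'c')]   # outer relation
B = [(2, 'x'), (3, 'y'), (4, 'z')]   # inner relation
print(hash_join(A, B, key_outer=lambda s: s[0], key_inner=lambda r: r[0]))
# -> [((2, 'x'), (2, 'b')), ((3, 'y'), (3, 'c'))]
```

Unlike the nested loop, each relation is scanned only once, which is why this method wins for equi-joins when the hash table fits in memory.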

As per the above pre-setup tables and data, the following query will result in a Hash Join, as shown below:

postgres=# explain select * from blogtable1 bt1, blogtable2 bt2 where bt1.id1 = bt2.id1;

                               QUERY PLAN


 Hash Join  (cost=27.50..220.00 rows=1000 width=16)

   Hash Cond: (bt1.id1 = bt2.id1)

   ->  Seq Scan on blogtable1 bt1  (cost=0.00..145.00 rows=10000 width=8)

   ->  Hash  (cost=15.00..15.00 rows=1000 width=8)

         ->  Seq Scan on blogtable2 bt2  (cost=0.00..15.00 rows=1000 width=8)

(5 rows) 

Here the hash table is created on the table blogtable2 because it is the smaller table, so the memory required for the hash table is minimal and the whole hash table can fit in memory.

Merge Join

Merge Join is an algorithm wherein each record of the outer relation is matched with records of the inner relation only as long as a join clause match is still possible. This join algorithm is only used if both relations are sorted and the join clause operator is “=”. The join between relations A and B with the condition A.ID = B.ID can be represented as below:

    For each tuple r in A
        For each tuple s in B
             If (r.ID = s.ID)
                  Emit output tuple (r,s)
             Else If (r.ID > s.ID)
                  Continue with the next tuple of B
             Else
                  Break (no further tuple of B can match; continue with the next tuple of A)

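For relations already sorted on the join key, the merge algorithm can be sketched in Python (illustration only; the join key is assumed to be the first element of each tuple, and duplicate keys on the inner side are handled by rescanning the matching run):

```python
def merge_join(outer, inner, key=lambda t: t[0]):
    """Equi-join of two relations already sorted on the join key:
    advance whichever side has the smaller key, emit pairs on equality."""
    out, i, j = [], 0, 0
    while i < len(outer) and j < len(inner):
        ko, ki = key(outer[i]), key(inner[j])
        if ko == ki:
            # Emit all inner tuples sharing this key for the current outer tuple.
            k = j
            while k < len(inner) and key(inner[k]) == ko:
                out.append((outer[i], inner[k]))
                k += 1
            i += 1  # keep j in place so duplicate outer keys rescan the run
        elif ko < ki:
            i += 1  # outer key too small: advance outer
        else:
            j += 1  # inner key too small: advance inner
    return out

A = sorted([(1,), (2,), (4,)])
B = sorted([(2,), (2,), (3,), (4,)])
print(merge_join(A, B))  # -> [((2,), (2,)), ((2,), (2,)), ((4,), (4,))]
```

Because both inputs are consumed in order, each relation is scanned essentially once, which is what makes the sorted-input requirement worthwhile.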
The example query which resulted in a Hash Join, as shown above, can result in a Merge Join if indexes are created on both tables. This is because the table data can then be retrieved in sorted order via the indexes, which is one of the major criteria for the Merge Join method:

postgres=# create index idx1 on blogtable1(id1);


postgres=# create index idx2 on blogtable2(id1);


postgres=# explain select * from blogtable1 bt1, blogtable2 bt2 where bt1.id1 = bt2.id1;

                                   QUERY PLAN


 Merge Join  (cost=0.56..90.36 rows=1000 width=16)

   Merge Cond: (bt1.id1 = bt2.id1)

   ->  Index Scan using idx1 on blogtable1 bt1  (cost=0.29..318.29 rows=10000 width=8)

   ->  Index Scan using idx2 on blogtable2 bt2  (cost=0.28..43.27 rows=1000 width=8)

(4 rows)

So, as we see, both tables are using index scan instead of sequential scan because of which both tables will emit sorted records.


PostgreSQL supports various planner-related configuration parameters, which can be used to hint the query optimizer not to select a particular kind of join method. If the join method chosen by the optimizer is not optimal, the corresponding configuration parameter can be switched off to force the query optimizer to choose a different join method. All of these configuration parameters are “on” by default. Below are the planner configuration parameters specific to join methods.

  • enable_nestloop: It corresponds to Nested Loop Join.
  • enable_hashjoin: It corresponds to Hash Join.
  • enable_mergejoin: It corresponds to Merge Join.

There are many plan-related configuration parameters used for various purposes; in this blog, we keep the discussion restricted to the join methods only.

These parameters can be modified within a particular session. So, in case we want to experiment with the plan from a particular session, these configuration parameters can be manipulated there, while other sessions continue to work as before.

Now, consider the above examples of merge join and hash join. Without an index, the query optimizer selected a Hash Join for the below query, but after changing the configuration it switches to a Merge Join, even without an index:

postgres=# explain select * from blogtable1, blogtable2 where blogtable1.id1 = blogtable2.id1;

                             QUERY PLAN


 Hash Join  (cost=27.50..220.00 rows=1000 width=16)

   Hash Cond: (blogtable1.id1 = blogtable2.id1)

   ->  Seq Scan on blogtable1  (cost=0.00..145.00 rows=10000 width=8)

   ->  Hash  (cost=15.00..15.00 rows=1000 width=8)

      ->  Seq Scan on blogtable2  (cost=0.00..15.00 rows=1000 width=8)

(5 rows)

postgres=# set enable_hashjoin to off;


postgres=# explain select * from blogtable1, blogtable2 where blogtable1.id1 = blogtable2.id1;

                             QUERY PLAN


 Merge Join  (cost=874.21..894.21 rows=1000 width=16)

   Merge Cond: (blogtable1.id1 = blogtable2.id1)

   ->  Sort  (cost=809.39..834.39 rows=10000 width=8)

      Sort Key: blogtable1.id1

      ->  Seq Scan on blogtable1  (cost=0.00..145.00 rows=10000 width=8)

   ->  Sort  (cost=64.83..67.33 rows=1000 width=8)

      Sort Key: blogtable2.id1

      ->  Seq Scan on blogtable2  (cost=0.00..15.00 rows=1000 width=8)

(8 rows)

Initially a Hash Join is chosen because the data from the tables is not sorted. In order to choose the Merge Join plan, PostgreSQL would first need to sort all records retrieved from both tables and then apply the merge join. The cost of sorting is additional, and hence the overall cost increases. So, possibly, in this case the total cost (including sorting) is more than the total cost of the Hash Join, and the Hash Join is chosen.

Once the configuration parameter enable_hashjoin is changed to “off”, the query optimizer directly assigns the disable cost (=1.0e10, i.e. 10000000000.00) to the hash join. The cost of any other possible join will be lower than this. So, the same query results in a Merge Join after enable_hashjoin is changed to “off”, as even including the sorting cost, the total cost of the merge join is lower than the disable cost.

Now consider the below example:

postgres=# explain select * from blogtable1, blogtable2 where blogtable1.id1 < blogtable2.id1;

                             QUERY PLAN


 Nested Loop  (cost=0.00..150162.50 rows=3333333 width=16)

   Join Filter: (blogtable1.id1 < blogtable2.id1)

   ->  Seq Scan on blogtable1  (cost=0.00..145.00 rows=10000 width=8)

   ->  Materialize  (cost=0.00..20.00 rows=1000 width=8)

      ->  Seq Scan on blogtable2  (cost=0.00..15.00 rows=1000 width=8)

(5 rows)

postgres=# set enable_nestloop to off;


postgres=# explain select * from blogtable1, blogtable2 where blogtable1.id1 < blogtable2.id1;

                             QUERY PLAN


 Nested Loop  (cost=10000000000.00..10000150162.50 rows=3333333 width=16)

   Join Filter: (blogtable1.id1 < blogtable2.id1)

   ->  Seq Scan on blogtable1  (cost=0.00..145.00 rows=10000 width=8)

   ->  Materialize  (cost=0.00..20.00 rows=1000 width=8)

      ->  Seq Scan on blogtable2  (cost=0.00..15.00 rows=1000 width=8)

(5 rows)

As we can see above, even though the nested-loop-join-related configuration parameter was changed to “off”, the optimizer still chooses a Nested Loop Join, as there is no alternative join method that could be selected. In simpler terms, since the Nested Loop Join is the only possible join, whatever its cost, it will always be the winner (same as how I used to be the winner in the 100m race if I ran alone...:-)). Also, notice the difference in cost between the first and second plans. The first plan shows the actual cost of the Nested Loop Join, whereas the second one shows the disable cost of the same.


All of the PostgreSQL join methods are useful and get selected based on the nature of the query, data, join clauses, etc. In case a query is not performing as expected, i.e. the join methods are not selected as expected, the user can play around with the different plan configuration parameters available and see if something is missing.

by Kumar Rajeev Rastogi at September 06, 2019 09:45 AM

September 05, 2019


An Overview of MongoDB Atlas: Part Two

In the first part of the blog “An Overview of MongoDB Atlas,” we looked at getting started with MongoDB Atlas, the initial setup and migration of an existing MongoDB Cluster to MongoDB Atlas. In this part we are going to continue to explore several management elements required for every MongoDB production system, such as security and business continuity. 

Database Security in MongoDB Atlas

Security always comes first. While it is important for all databases, for MongoDB it has a special meaning. In mid 2017 the internet was full of news regarding ransomware attacks which specifically targeted vulnerabilities in MongoDB systems. Hackers were hijacking MongoDB instances and asking for a ransom in exchange for the return of the stored data. There were warnings. Prior to these ransomware attacks bloggers and experts wrote about how many production instances were found to be vulnerable. It stirred up vibrant discussion around MongoDB security for a long time after.

We are now in 2019 and MongoDB is getting even more popular. The new major version (4.0) was recently released, and we have seen increased stability in MongoDB Atlas. But what has been done to increase security for NoSQL databases in the cloud?

The ransomware and constant press must have had an impact on MongoDB, as we can clearly see that security is now at the center of the MongoDB ecosystem. MongoDB Atlas is no exception, as it now comes with built-in security controls for production data processing needs and many enterprise security features out of the box. The default approach (which caused the vulnerability) from the older versions is gone, and the database is now secured by default (network, CRUD authorizations, etc.). It also comes with features you would expect to have in a modern production environment (auditing, temporary user access, etc.).

But it doesn’t stop there. Since Atlas is an online solution you can now use integrations with third parties like LDAP authentication or modern MongoDB internet services like MongoDB charts. MongoDB Atlas is built atop of Amazon WebServices (AWS), Microsoft Azure, and Google Cloud Platform (GCP) which also offer high-security measures of their own. This great combination ensures MongoDB Atlas security standards are what we would expect. Let’s take a quick look at some of these key features.

MongoDB Atlas & Network Security

MongoDB Atlas builds clusters on top of your existing cloud infrastructure. When one chooses AWS, the customer data is stored in MongoDB Atlas systems. These systems are single-tenant, dedicated, AWS EC2 virtual servers which are created solely for an Atlas Customer. Amazon AWS data centers are compliant with several physical security and information security standards, but since we need an open network, it can raise concerns.

MongoDB Atlas dedicated clusters are deployed in a Virtual Private Cloud (VPC) with dedicated firewalls. Access must be granted by an IP whitelist or through VPC Peering. By default all access is disabled.

MongoDB requires the following network ports for Atlas...

  • 27016 for shards
  • 27015 for the BI connector
  • 27017 for server
  • If LDAP is enabled, MongoDB requires LDAP port 636 on the customer side to be open to traffic from the entire Internet.

The network ports cannot be changed and TLS cannot be disabled. Access can also be isolated by IP whitelist. 

MongoDB Atlas Add Whitelist Entry

Additionally you can choose to access MongoDB Atlas via Bastion hosts. Bastion hosts are configured to require SSH keys (not passwords). They also require multi-factor authentication, and users must additionally be approved by senior management for backend access. 

MongoDB Atlas Role-Based Access Management

You can configure advanced, role-based access rules to control which users (and teams) can access, manipulate, and/or delete data in your databases. By default there are no users created so you will be prompted to create one.

MongoDB Atlas allows administrators to define permissions for a user or application as well as what data can be accessed when querying MongoDB. MongoDB Atlas provides the ability to provision users with roles specific to a project or database, making it possible to realize a separation of duties between different entities accessing and managing the data. The process is simple and fully interactive.

To create a new user go to the Security tab on the left side and choose between MongoDB users and MongoDB roles. 

MongoDB Atlas Add a New User

MongoDB Roles

MongoDB Atlas Add Custom Role

End-to-End Database Encryption in MongoDB Atlas

All the MongoDB Atlas data in transit is encrypted using Transport Layer Security (TLS). You have the flexibility to configure the minimum TLS protocol version. Encryption for data-at-rest is automated using encrypted storage volumes.

You can also integrate your existing security practices and processes with MongoDB Atlas to provide additional control over how you secure your environment. 

For the MongoDB Atlas Cluster itself, authentication is automatically enabled by default via SCRAM to ensure a secure system out of the box.

With Encryption Key Management you can bring your own encryption keys to your dedicated clusters for an additional layer of encryption on the database files, including backup snapshots.

MongoDB Atlas Encryption Key

Auditing in MongoDB Atlas

Granular database auditing answers detailed questions about system activity for deployments with multiple users by tracking all the commands against the database. Auditing in MongoDB is only available in MongoDB Enterprise. You can write audit events to the console, to the syslog, to a JSON file, or to a BSON file. You configure the audit option using the --auditDestination qualifier. For example, to send audit events as JSON events to syslog use...

mongod --dbpath data/db --auditDestination syslog

MongoDB maintains a centralized log management system for collection, storage, and analysis of log data for production environments. This information can be used for health monitoring, troubleshooting, and for security purposes. Alerts are configured in the system in order to notify SREs of any operational concerns.

MongoDB Atlas Activity Feed

MongoDB Atlas LDAP Integration

User authentication and authorization against MongoDB Atlas clusters can be managed via a customer’s Lightweight Directory Access Protocol (LDAP) server over TLS. A single LDAP configuration applies to all database clusters within an Atlas project. LDAP servers are used to simplify access control and make permissions management more granular. 

For customers running their LDAP server in an AWS Virtual Private Cloud (VPC), a peering connection is recommended between that environment and the VPC containing their Atlas databases.

MongoDB Atlas LDAP Integration

MongoDB Business Continuity and Disaster Recovery

MongoDB Atlas creates and configures dedicated clusters on infrastructure provided by AWS, Azure and/or Google GCP. Data availability is subject to the infrastructure provider service Business Continuity Plans (BCP) and Disaster Recovery (DR) processes. MongoDB Atlas infrastructure service providers hold a number of certifications and audit reports for these controls. 

Database Backups in MongoDB Atlas

MongoDB Atlas backs up data, typically only seconds behind an operational system. MongoDB Atlas ensures continuous backup of replica sets, consistent, cluster-wide snapshots of sharded clusters, and point-in-time recovery. This fully-managed backup service uses Amazon S3 in the region nearest to the customer's database deployment.

Backup data is protected using server-side encryption. Amazon S3 encrypts backed up data at the object level as it writes it to disks in its data centers and decrypts it for you when you restore it. All keys are fully managed by AWS.

Atlas clusters deployed in Amazon Web Services and Microsoft Azure can take advantage of cloud provider snapshots which use the native snapshot capabilities of the underlying cloud provider. Backups are stored in the same cloud region as the corresponding cluster. For multi-region clusters, snapshots are stored in the cluster’s preferred region. 

Atlas offers the following methods to back up your data...

Continuous Database Backups

Continuous backups are available on M10+ clusters running versions lower than server version 4.2. This is the older method of performing MongoDB backups. Atlas uses incremental snapshots to continuously back up your data. Continuous backup snapshots are typically just a few seconds behind the operational system. Atlas ensures point-in-time backup of replica sets and consistent, cluster-wide snapshots of sharded clusters on its own, using S3 for storage.

Full-Copy Snapshots

Atlas uses the native snapshot capabilities of your cloud provider to support full-copy snapshots and localized snapshot storage.

MongoDB Atlas Data Lake

Using Atlas Data Lake to ingest your S3 data into Atlas clusters allows you to quickly query data stored in your AWS S3 buckets using the Mongo Shell, MongoDB Compass, and any MongoDB driver.

When you create a Data Lake, you will grant Atlas read-only access to S3 buckets in your AWS account and create a data configuration file that maps data from your S3 buckets to your MongoDB databases and collections. Atlas supports using any M10+ cluster, including Global Clusters, to connect to Data Lakes in the same project.

MongoDB Atlas Data Lake

At the time of writing this blog, the following formats are supported:

  • Avro
  • Parquet
  • JSON
  • JSON/Gzipped
  • BSON
  • CSV (requires header row)
  • TSV (requires header row)


That’s all for now, I hope you enjoyed my two part overview of MongoDB Atlas. Remember that ClusterControl also provides end-to-end management of MongoDB Clusters as well and is a great, lower-cost alternative to MongoDB Atlas which can also be deployed in the cloud.

by Bart Oles at September 05, 2019 09:45 AM

September 04, 2019


Database Load Balancing Using HAProxy on Amazon AWS

When traffic to your database increases day after day, it can become hard to manage. When this situation happens, it’s useful to distribute the traffic across multiple servers, thus improving performance. Depending on the application, however, this may not be possible (e.g. if you have a single configurable endpoint). To achieve a split, you will need to use a load balancer to perform the task.

A load balancer can redirect applications to available/healthy database nodes and then failover when required. To deploy it, you don’t need a physical server as you can deploy it in the cloud; making it easier and faster. In this blog, we’ll take a look at the popular database load balancer HAProxy and how to deploy it to Amazon AWS both manually and with ClusterControl’s help.

What is HAProxy?

HAProxy is an open source proxy that can be used to implement high availability, load balancing, and proxying for TCP and HTTP based applications.

As a load balancer, HAProxy distributes traffic from one origin to one or more destinations and can define specific rules and/or protocols for this task. If any of the destinations stops responding, it is marked as offline, and the traffic is sent to the rest of the available destinations.
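The balancing idea can be sketched in a few lines of Python (an illustration of round-robin selection with health checks only, not of HAProxy's actual implementation; the generator assumes at least one destination stays healthy):

```python
import itertools

def round_robin(destinations, healthy):
    """Cycle through destinations forever, skipping any marked offline."""
    pool = itertools.cycle(destinations)
    while True:
        d = next(pool)
        if healthy(d):
            yield d

# node2 has failed its health check, so traffic goes to the remaining nodes.
status = {"node1": True, "node2": False, "node3": True}
rr = round_robin(["node1", "node2", "node3"], healthy=lambda d: status[d])
print([next(rr) for _ in range(4)])  # -> ['node1', 'node3', 'node1', 'node3']
```

HAProxy performs the health checks itself (the `check` keyword on each `server` line) and supports other policies besides round-robin, such as leastconn.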

An Overview of Amazon EC2

Amazon Elastic Compute Cloud (or EC2) is a web service that provides resizable compute capacity in the cloud. It gives you complete control of your computing resources and allows you to set up and configure everything within your instances from the operating system up to your applications. It also allows you to quickly scale capacity, both up and down, as your computing requirements change.

Amazon EC2 supports different operating systems like Amazon Linux, Ubuntu, Windows Server, Red Hat Enterprise Linux, SUSE Linux Enterprise Server, Fedora, Debian, CentOS, Gentoo Linux, Oracle Linux, and FreeBSD.

Now, let’s see how to create an EC2 instance to deploy HAProxy there.

Creating an Amazon EC2 Instance

For this example, we’ll assume that you have an Amazon AWS account.

Go to the Amazon EC2 section, and press on Launch Instance. In the first step, you must choose the EC2 instance operating system.

Create Amazon EC2 Instance

In the next step, you must choose the resources for the new instance.

Choose an Amazon EC2 Instance Type

Then, you can specify a more detailed configuration like network, subnet, and more.

Configure Amazon EC2 Instance

We could now add more storage capacity to this new instance, but as this will only be a load balancer, it is probably not necessary.

Amazon EC2 Add Storage

When we finish the creation task, we can go to the Instances section to see our new EC2 instance.

Launch Amazon EC2 Instance

Now that our EC2 instance is ready (Instance State running), we can deploy our load balancer on it. For this task, we’ll look at two different ways: manually and using ClusterControl.

How to Manually Install and Configure HAProxy

To install HAProxy on Linux, you can use the following commands on the EC2 instance:

On Ubuntu/Debian OS:

$ apt-get install haproxy -y

On CentOS/RedHat OS:

$ yum install haproxy -y

Then we need to edit the following file to manage the HAProxy configuration:

$ /etc/haproxy/haproxy.cfg

Configuring our HAProxy is not complicated, but we need to know what we are doing. We have several parameters to configure, depending on how we want HAProxy to work. For more information, we can follow the documentation about the HAProxy configuration.

Let's look at a basic configuration example. Suppose that you have the following database topology:

Basic Load Balancer Configuration

We want to create an HAProxy listener to balance the read traffic between the three nodes.

listen haproxy_read
   bind *:5434
   balance     roundrobin
   server  node1 check
   server  node2 check
   server  node3 check

As we mentioned before, there are several parameters to configure here, and this configuration depends on what we want to do. For example:

listen  haproxy_read
       bind *:5434
       mode tcp
       timeout client  10800s
       timeout server  10800s
       tcp-check expect string is\ running
       balance leastconn
       option tcp-check
       default-server port 9201 inter 2s downinter 5s rise 3 fall 2 slowstart 60s maxconn 64 maxqueue 128 weight 100
       server  node1 check
       server  node2 check
       server  node3 check
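Note that the server lines above are missing their backend addresses, which HAProxy requires. A minimal complete sketch, using hypothetical IP addresses and the default MySQL port (adjust the addresses to your own topology):

```haproxy
listen haproxy_read
   bind *:5434
   mode tcp
   balance leastconn
   option tcp-check
   # The 10.0.0.x addresses below are placeholders for your database nodes
   server node1 10.0.0.11:3306 check
   server node2 10.0.0.12:3306 check
   server node3 10.0.0.13:3306 check
```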

Now, let’s see how ClusterControl can simplify this task.

How to Install and Configure HAProxy with ClusterControl

For this task, we’ll assume that you have ClusterControl installed (on-prem or in the cloud) and it’s currently managing your databases.

Go to ClusterControl -> Select Cluster -> Cluster Actions -> Add Load Balancer.

ClusterControl Cluster List

Here we must add the information that ClusterControl will use to install and configure our HAProxy load balancer.

Configure HAProxy in ClusterControl

The information that we need to introduce is:

Action: Deploy or Import.

Server Address: IP Address for our HAProxy server.

Listen Port (Read/Write): Port for read/write mode.

Listen Port (Read Only): Port for read only mode.

Policy: It can be:

  • leastconn: The server with the lowest number of connections receives the connection.
  • roundrobin: Each server is used in turns, according to their weights.
  • source: The source IP address is hashed and divided by the total weight of the running servers to designate which server will receive the request.

Install for read/write splitting: For master-slave replication.

Build from Source: We can choose Install from a package manager or build from source.

Then we need to select which servers to add to the HAProxy configuration, along with some additional information:

Role: It can be Active or Backup.

Include: Yes or No.

Connection address information.

Also, we can configure Advanced Settings like Admin User, Backend Name, Timeouts, and more.

When you finish the configuration and confirm the deployment, you can follow the progress in the Activity section of the ClusterControl UI.

Setup HAProxy Server ClusterControl

And when this finishes, we can go to ClusterControl -> Nodes -> HAProxy node, and check the current status.

HAProxy Node in ClusterControl

We can also monitor our HAProxy servers from ClusterControl checking the Dashboard section.

HAProxy Monitoring with ClusterControl

We can improve our HA design by adding a new HAProxy node and configuring the Keepalived service between them. All of this can be performed by ClusterControl.

What is Amazon Elastic Load Balancing?

HAProxy is not the only option for deploying a load balancer on AWS, as Amazon has its own product for this task. Amazon Elastic Load Balancing (or ELB) distributes incoming application or network traffic across multiple targets, such as Amazon EC2 instances, containers, and IP addresses, in multiple Availability Zones.

You can add and remove compute resources from your load balancer as your needs change, without disrupting the overall flow of requests to your applications.

You can configure health checks, which are used to monitor the health of the compute resources so that the load balancer can send requests only to the healthy ones. You can also offload the work of encryption and decryption to your load balancer so that your compute resources can focus on their main work.

To configure it, go to the Amazon EC2 section, and click on the Load Balancers option in the left menu. There, we’ll see three different options.

Amazon EC2 Elastic Load Balancing ELB
  • Application Load Balancer: If you need a flexible feature set for your web applications with HTTP and HTTPS traffic. Operating at the request level, Application Load Balancers provide advanced routing and visibility features targeted at application architectures, including microservices and containers.
  • Network Load Balancer: If you need ultra-high performance, TLS offloading at scale, centralized certificate deployment, support for UDP, and static IP addresses for your application. Operating at the connection level, Network Load Balancers are capable of handling millions of requests per second securely while maintaining ultra-low latencies.
  • Classic Load Balancer: If you have an existing application running in the EC2-Classic network.


As we have seen, a load balancer can help us manage our database traffic by balancing it between multiple servers. It’s also useful for improving our high availability environment by performing failover tasks. We can deploy it manually on AWS or, in a fast and easy way, by using ClusterControl. With ClusterControl (download for FREE!) we can also take advantage of different features like monitoring, management, and scaling for different database technologies, and we can deploy this system on-prem or in the cloud.

by Sebastian Insausti at September 04, 2019 09:45 AM

September 03, 2019


Database Failover for WordPress Websites

Every profitable enterprise requires high availability. Websites and blogs are no different, as even smaller companies and individuals need their sites to stay online to protect their reputations.

WordPress is, by far, the most popular CMS in the world, powering millions of websites from small to large. But how can you ensure that your website stays live? More specifically, how can you ensure that the unavailability of your database will not impact your website?

In this blog post we will show how to achieve failover for your WordPress website using ClusterControl.

The setup for this blog uses Percona Server 5.7. We will have another host which contains the Apache and WordPress application. We will not touch the application high availability portion, but this is also something you want to make sure to have. We will use ClusterControl to manage the databases and ensure their availability, and we will use a third host to install and set up ClusterControl itself.

Assuming that ClusterControl is up and running, we will need to import our existing database into it.

Importing a Database Cluster with ClusterControl

ClusterControl Import Cluster

Go to the Import Existing Server/Database option in the deployment wizard.

Importing an Existing Cluster with ClusterControl

We have to configure the SSH connectivity as this is a requirement for ClusterControl to be able to manage the nodes.

Configuring an Imported Cluster with ClusterControl

We now have to define some details about the vendor, version, root user access, the node itself, and whether we want ClusterControl to manage autorecovery for us. That’s all; once the job succeeds, you will be presented with the cluster on the list.

Database Cluster List

To set up the highly available environment, we need to execute a couple of actions. Our environment will consist of:

  • Master - Slave pair
  • Two ProxySQL instances for read/write split and topology detection
  • Two Keepalived instances for Virtual IP management

The idea is simple: we will deploy a slave of our master so that we have a second instance to fail over to should the master fail. ClusterControl will be responsible for failure detection and will promote the slave should the master become unavailable. ProxySQL will keep track of the replication topology and redirect traffic to the correct node: writes will be sent to the master, no matter which node that is, while reads can either be sent to the master only or distributed across master and slaves. Finally, Keepalived will be collocated with ProxySQL and will provide a VIP for the application to connect to. That VIP will always be assigned to one of the ProxySQL instances, and Keepalived will move it to the second one should the “main” ProxySQL node fail.

Having said all of that, let’s configure this using ClusterControl. All of it can be done in just a couple of clicks. We’ll start with adding the slave.

Adding a Database Slave with ClusterControl

Adding a Database Slave with ClusterControl

We start by picking the “Add Replication Slave” job. Then we are asked to fill in a form:

Adding a Replication Slave

We have to pick the master (in our case we don’t really have many options) and pass the IP or hostname of the new slave. If we had previously created backups, we could use one of them to provision the slave. In our case this is not available, so ClusterControl will provision the slave directly from the master. That’s all; the job starts and ClusterControl performs the required actions. You can monitor the progress in the Activity tab.

ClusterControl Activity Tab

Finally, once the job completes successfully, the slave should be visible on the cluster list.

Cluster List

Now we will proceed with configuring the ProxySQL instances. In our case the environment is minimal so, to keep things simpler, we will locate ProxySQL on one of the database nodes. This is not, however, the best option in a real production environment. Ideally, ProxySQL would either be located on a separate node or collocated with the other application hosts.

Configure ProxySQL ClusterControl

The place to start the job is Manage -> Loadbalancers.

ProxySQL Load Balancer Configuration ClusterControl

Here you have to pick where ProxySQL should be installed, pass administrative credentials, and add a database user. In our case, we will use our existing user, as our WordPress application already uses it for connecting to the database. We then have to pick which nodes to use in ProxySQL (we want both master and slave here) and let ClusterControl know whether we use explicit transactions or not. This is not really relevant in our case, as we will reconfigure ProxySQL once it is deployed. When that option is enabled, read/write split will not be enabled; otherwise ClusterControl will configure ProxySQL for read/write split. In our minimal setup we should seriously consider whether we want the read/write split to happen. Let’s analyze that.

The Advantages & Disadvantages of Read/Write Split in ProxySQL

The main advantage of using the read/write split is that all the SELECT traffic will be distributed between the master and the slave. This means that the load on the nodes will be lower and response time should also be lower. This sounds good but keep in mind that should one node fail, the other node will have to be able to accommodate all of the traffic. There is little point in having automated failover in place if the loss of one node means that the second node will be overloaded and, de facto, unavailable too. 

It might make sense to distribute the load if you have multiple slaves - losing one node out of five is less impactful than losing one out of two. No matter what you decide on, you can easily change the behavior by going to ProxySQL node and clicking on the Rules tab.

ProxySQL Rules - ClusterControl

Make sure to look at rule 200 (the one which catches all SELECT statements). On the screenshot below you can see that the destination hostgroup is 20, which means all nodes in the cluster: read/write split and scale-out are enabled. We can easily disable this by editing the rule and changing the Destination Hostgroup to 10 (the one which contains the master).

ProxySQL Configuration - ClusterControl

If you would like to enable the read/write split, you can easily do so by editing this query rule again and setting the destination hostgroup back to 20.
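The same change can also be made from the ProxySQL admin interface with standard ProxySQL admin commands, assuming the rule_id 200 and hostgroups 10/20 described above:

```sql
-- Route all SELECTs matched by rule 200 to hostgroup 10 (master only);
-- set destination_hostgroup back to 20 to re-enable read/write split.
UPDATE mysql_query_rules SET destination_hostgroup = 10 WHERE rule_id = 200;
LOAD MYSQL QUERY RULES TO RUNTIME;
SAVE MYSQL QUERY RULES TO DISK;
```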

Now, let’s deploy the second ProxySQL instance.

Deploy ProxySQL ClusterControl

To avoid passing all the configuration options again we can use the “Import Configuration” option and pick our existing ProxySQL as the source.

When this job completes, we still have to perform the last step in setting up our environment. We have to deploy Keepalived on top of the ProxySQL instances.

Deploying Keepalived on Top of ProxySQL Instances

Deploy Keepalived with ProxySQL - ClusterControl

Here we picked ProxySQL as the load balancer type, passed both ProxySQL instances for Keepalived to be installed on, and typed in our VIP and network interface.
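Under the hood, the resulting Keepalived setup resembles the following sketch; the VIP, interface, and priority values here are placeholders, not the exact configuration ClusterControl generates:

```conf
# /etc/keepalived/keepalived.conf (sketch; hypothetical VIP and interface)
vrrp_script chk_proxysql {
    script "killall -0 proxysql"   # succeeds while the proxysql process is alive
    interval 2
    weight 2
}

vrrp_instance VI_PROXYSQL {
    interface eth0                 # the network interface chosen in the wizard
    state BACKUP
    virtual_router_id 51
    priority 100                   # use a lower priority on the second node
    virtual_ipaddress {
        10.0.0.100                 # the Virtual IP the application connects to
    }
    track_script {
        chk_proxysql
    }
}
```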

Topology View - ClusterControl

As you can see, we now have the whole setup up and ready. We have a VIP which is assigned to one of the ProxySQL instances. The ProxySQL instances will redirect our traffic to the correct backend MySQL nodes, and ClusterControl will keep an eye on the environment, performing failover if needed. The last action we have to take is to reconfigure WordPress to use the Virtual IP to connect to the database.

To do that, we have to edit wp-config.php and change the DB_HOST variable to our Virtual IP:

/** MySQL hostname */

define( 'DB_HOST', '' );


From now on, WordPress will connect to the database using the VIP and ProxySQL. If the master node fails, ClusterControl will perform the failover.

ClusterControl Failover with ProxySQL

As you can see, a new master has been elected and ProxySQL now points to the new master in hostgroup 10.

We hope this blog post gives you some idea of how to design a highly available database environment for a WordPress website and how ClusterControl can be used to deploy all of its elements.

by krzysztof at September 03, 2019 09:45 AM

September 02, 2019


Comparing Galera Cluster Cloud Offerings: Part Two Google Cloud Platform (GCP)

In our last blog we discussed the offerings available within Amazon Web Services (AWS) when running a MySQL Galera Cluster. In this blog, we'll continue the discussion by looking further at what the offerings are for running the same clustering technology, but this time on the Google Cloud Platform (GCP).

GCP, as an alternative to AWS, has been continuously attracting applications suited for DevOps by offering support for a wide array of full-stack technologies, containerized applications, and large production database systems. Google Cloud is a full-blown, battle-tested environment which powers its own hardware infrastructure at Google for products like YouTube and Gmail.

GCP has gained traction largely because of its ever-growing list of capabilities. It offers support for platforms like Visual Studio, Android Studio, Eclipse, Powershell and many others. GCP has one of the largest and most advanced computer networks and it provides access to numerous tools that help you focus on building your application. 

Another thing that attracts customers to migrate, import, or use Google Cloud is their strong support and solutions for containerization. Kubernetes (GKE: Google Kubernetes Engine) is built on their platform. 

GCP has also recently launched a new solution called Anthos. This product is designed to let organizations manage workloads using the same interface on the Google Cloud Platform (GCP) or on-premises using GKE On-Prem, and even on rival clouds such as Amazon Web Services (AWS) or Azure. 

In addition to these technologies, GCP offers sophisticated and powerful, compute-optimized machine types like the C2 family in GCE which is built on the latest generation Intel Scalable Processors (Cascade Lake).

GCP is continuing to support open source as well, which benefits users by providing well-supported and a straightforward framework that makes it easy to deliver a final product in a timely manner. Despite this support of open source technology, GCP does not provide native support for the deployment or configuration of a MySQL Galera Cluster. In this blog we will show you the only option available to you if you wish to use this technology, deployment via a compute instance which you have to manage yourself.

The Google Compute Engine (GCE)

GCE has a sophisticated and powerful set of compute nodes available for your consumption. Unlike AWS, GCE offers the most powerful compute node on the market (n1-ultramem-160, with 160 vCPUs and 3.75 TB of memory). GCE also recently introduced a new compute instance family, the C2 machine type. Built on the latest generation of Intel Scalable Processors (Cascade Lake), C2 machine types offer up to 3.8 GHz sustained all-core turbo and provide full transparency into the architecture of the underlying server platforms, letting you fine-tune performance. C2 machine types offer much more computing power, run on a newer platform, and are generally more robust for compute-intensive workloads than the N1 high-CPU machine types. C2 offerings are limited (as of the time of writing) and not available in all regions and zones. C2 also does not support regional persistent disks, though these would be a great add-on for stateful database services that require redundancy and high availability. The resources of a C2 instance are more than a Galera node needs, so we'll focus on the N1 compute nodes instead, which are a better fit.

GCE also uses KVM as its virtualization technology, whereas Amazon uses Xen. Let's take a look at the compute nodes available in GCE which are suitable for running Galera, alongside their equivalents in AWS EC2. Prices differ by region; for this chart we use the us-east region with on-demand pricing for AWS.


Machine/Instance Type (Google Compute Engine vs. AWS EC2):

  • Shared-core/Burstable: GCE prices start at $0.006 – $0.019 hourly; EC2 t2.nano – t3.2xlarge, prices start at $0.0058 – $0.3328 hourly
  • Standard: GCE n1-standard-1 – n1-standard-96, prices start at $0.034 – $3.193 hourly; EC2 m4.large – m4.16xlarge and m5.large – m5d.metal, prices start at $0.1 – $5.424 hourly
  • High Memory/Memory Optimized: GCE n1-highmem-2 – n1-highmem-96 and n1-ultramem-40 – n1-ultramem-160, prices start at $0.083 – $17.651 hourly; EC2 r4.large – r4.16xlarge, x1.16xlarge – x1.32xlarge, and x1e.xlarge – x1e.32xlarge, prices start at $0.133 – $26.688 hourly
  • High CPU/Storage Optimized: GCE n1-highcpu-2 – n1-highcpu-32, prices start at $0.05 – $2.383 hourly; EC2 h1.2xlarge – h1.16xlarge, i3.large – i3.metal, i3en.large – i3en.metal, and d2.xlarge – d2.8xlarge, prices start at $0.156 – $10.848 hourly

GCE has fewer predefined compute node types to choose from than AWS. When it comes to configuring a node, however, it offers more granularity, which makes it easier to set up and choose the kind of instance you want. For example, you can add a disk and set its physical block size (4 KB is the default, 16 KB is also available), or set its mode to either read/write or read-only. This allows you to prepare the right type of machine or compute instance to run your Galera node. You may also instantiate your compute nodes using the Cloud SDK, or by using Cloud APIs, to automate them or integrate them into your Continuous Integration, Delivery, or Deployment (CI/CD) pipeline.

Pricing (Compute Instance, Disk, vCPU, Memory, and Network)

The price also depends on the region where the instance is located, the type of OS or licensing (RHEL vs. SUSE Linux Enterprise), and the type of disk storage you're using.

GCP also offers discounts which allow you to economize your resource consumption. For Compute Engine, several types of discounts are available.

Sustained use discounts apply automatically to eligible Compute Engine resources (such as vCPU and memory usage) that run for a significant portion of the billing month.

Take note that sustained use discounts do not apply to VMs created using App Engine Flexible Environment and Cloud Dataflow.

You can also use committed use discounts when you purchase VMs bound to a contract. This option is ideal for predictable workloads and resource needs. When you purchase a committed use contract, you purchase a certain amount of vCPUs, memory, GPUs, and local SSDs at a discounted price in return for committing to pay for those resources for 1 or 3 years. The discount is up to 57% for most resources like machine types or GPUs, and up to 70% for memory-optimized machine types. Once purchased, you are billed monthly for the resources you purchased for the duration of the term you selected (whether you use them or not).

A preemptible VM is an instance that you can create and run at a much lower price than normal instances. Compute Engine may, however, terminate (preempt) these instances if it requires access to those resources for other tasks. Preemptible instances use excess Compute Engine capacity, so their availability varies with usage.

If your applications are fault-tolerant and can withstand possible instance preemptions, then preemptible instances can reduce your Compute Engine costs significantly. For example, batch processing jobs can run on preemptible instances. If some of those instances terminate during processing, the job slows but does not completely stop. Preemptible instances complete your batch processing tasks without placing additional workload on your existing instances, and without requiring you to pay full price for additional normal instances.

For Compute Engine, disk size, machine type memory, and network usage are calculated in gigabytes (GB), where 1 GB is 2^30 bytes. This unit of measurement is also known as a gibibyte (GiB). This means that GCP bills you based only on the resources you have allocated.
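As a quick sketch of that arithmetic (the $0.17 per GB-month SSD persistent disk rate below is an assumed illustrative figure, not a quoted GCP price):

```python
# GCP bills disk size and machine-type memory in GB, where 1 GB = 2**30 bytes (a gibibyte).
BYTES_PER_GB = 2 ** 30

def monthly_disk_cost(size_bytes: int, price_per_gb_month: float = 0.17) -> float:
    """Estimate the monthly cost of a persistent disk of the given size."""
    return (size_bytes / BYTES_PER_GB) * price_per_gb_month

print(BYTES_PER_GB)                                      # 1073741824
print(round(monthly_disk_cost(100 * BYTES_PER_GB), 2))   # 17.0
```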

Now, if you have a high-grade, production database application, it's recommended (and ideal) to attach a separate persistent disk. You would then use that disk as your database volume, as it offers reliable and consistent disk performance in GCE. The larger the disk, the higher the IOPS it offers. Check their list of persistent disk pricing to determine the price you would get. In addition, GCE has regional persistent disks, which are suitable if you require more solid and sustainable high availability within your database cluster. A regional persistent disk adds more redundancy in case your instance terminates, crashes, or becomes corrupted. It provides synchronous replication of data between two zones in one region, which happens transparently to the VM instance. In the unlikely event of a zone failure, your workload can fail over to another VM instance in the same, or a secondary, zone. You can then force-attach your regional persistent disk to that instance. Force-attach typically takes less than one minute.

If you store backups as part of your disaster recovery solution and require a cluster-wide volume, GCP offers Cloud Filestore, NetApp Cloud Volumes, and some other file-sharing alternatives. These are fully managed services that offer standard and premium tiers. You can check out NetApp's pricing page here and Filestore pricing here.

Galera Encryption on GCP

GCP does not include specific support for the type of encryption available for Galera. GCP, however, encrypts customer data stored at rest by default, with no additional action required from you. GCP also offers another option to encrypt your data using Customer-managed encryption keys (CMEK) with Cloud KMS as well as with Customer-supplied encryption keys (CSEK). GCP also uses SSL/TLS encryption for all communications intercepted as data moves between your site and the cloud provider or between two services. This protection is achieved by encrypting the data before transmission; authenticating the endpoints; and decrypting and verifying the data on arrival.

Because Galera uses MySQL under the hood (Percona, MariaDB, or Codership build), you can take advantage of the File Key Management Encryption Plugin by MariaDB or by using the MySQL Keyring plugins. Here's an external blog by Percona which is a good resource on how you can implement this.

Galera Cluster Multi-AZ/Multi-Region/Multi-Cloud Deployments with GCP

Similarly to AWS, GCP does not offer direct support to deploy a Galera cluster on a Multi-AZ/-Region/-Cloud.

Galera Cluster High Availability, Scalability, and Redundancy on GCP

One of the primary reasons to use a Galera cluster is its high availability, redundancy, and ability to scale. If you are serving traffic globally, it's best to serve traffic by region, with an architectural design that includes a geo-distribution of your database nodes. To achieve this, a multi-AZ, multi-region, or multi-cloud/multi-datacenter deployment is recommended and achievable. This prevents the cluster from going down, or malfunctioning, due to lack of quorum.

To help you with your scalability design, GCP also has an autoscaler you can set up with an autoscaling group. This works as long as you created your cluster as managed instance groups. For example, you can monitor CPU utilization or rely on the metrics from Stackdriver defined in your autoscaling policy. This allows you to provision instances automatically when a certain threshold is reached, or terminate them when the load returns to its normal state.

For multi-region or multi-cloud deployments, Galera has its own parameter called gmcast.segment, which you can set at server start. This parameter is designed to optimize the communication between the Galera nodes and minimize the amount of traffic sent between network segments, which affects writeset relaying and IST and SST donor selection. This type of setup allows you to deploy multiple nodes in different regions. Aside from that, you can also deploy Galera nodes on a different cloud vendor, routing between GCP, AWS, Microsoft Azure, or on-premise infrastructure.
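For illustration, a sketch of how this might look in a node's configuration file (the segment number here is an example; all nodes within one datacenter share the same segment value):

```ini
# my.cnf fragment on a node in the second datacenter (example segment value)
[mysqld]
wsrep_provider_options="gmcast.segment=1"
```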

We recommend you to check out our blog Multiple Data Center Setups Using Galera Cluster for MySQL or MariaDB and Zero Downtime Network Migration With MySQL Galera Cluster Using Relay Node to gather more information on how to implement these types of deployments.

Galera Cluster Database Performance on GCP

Since there's no Galera-specific support in GCP, your choices depend on the requirements and design of your application's traffic and resource demands. For queries that are heavy on memory consumption, you can start with an n1-highmem-2 instance. High-CPU instances (the n1-highcpu-* family) can be a good fit for a highly transactional database or for gaming applications.

Choosing the right storage and required IOPS for your database volume is a must. Generally, an SSD-based persistent disk is your choice here. Depending on the volume of traffic required, you may have to check out the GCP storage options to determine the right size for your application.

We also recommend you to check and read our blog How to Improve Performance of Galera Cluster for MySQL or MariaDB to learn more about optimizing your Galera Cluster.

Galera Data Backups on GCP

Not only does your MySQL Galera data have to be backed up; you should also back up the entire tier which comprises your database application. This includes log files (logical or binary), external files, temporary files, dump files, etc. Google recommends that you always create snapshots of the persistent disk volumes used by your GCE instances. You can easily create and schedule snapshots. GCP snapshots are stored in Cloud Storage, and you can select the location or region where the backup will be kept. You can also set up a snapshot schedule as well as a snapshot retention policy.

You can also use external services like ClusterControl, which provides both monitoring and backup solutions. Check this out if you want to know more.

Galera Cluster Database Monitoring on GCP

GCP does not offer database monitoring for GCE. Monitoring of your instance health can be done through Stackdriver. For the database, though, you will need an external monitoring tool with advanced, highly granular database metrics. There are a lot of choices, such as PMM by Percona, Datadog, Idera, VividCortex, or our very own ClusterControl (monitoring is FREE with ClusterControl Community).

Galera Cluster Database Security on GCP

As discussed in our previous blog, you can take the same approach to securing your database in the public cloud. In GCP you can set up a private subnet and firewall rules that allow only the ports required for running Galera (particularly ports 3306, 4444, 4567, and 4568). You can use a NAT Gateway or set up a bastion host to access your private database nodes. When these nodes are encapsulated this way, they cannot be accessed from outside the GCP premises. You can read our previous blog Deploying Secure Multicloud MySQL Replication on AWS and GCP with VPN on how we set this up.
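As a hedged illustration, such a firewall rule might be created like this; the network name and source range are placeholders for your own VPC and private subnet:

```shell
# Allow Galera replication/SST/IST ports only from the private subnet (placeholder values)
gcloud compute firewall-rules create galera-internal \
  --network=my-vpc \
  --source-ranges=10.0.0.0/24 \
  --allow=tcp:3306,tcp:4444,tcp:4567,tcp:4568,udp:4567
```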

In addition to this, you can secure your data in transit by using a TLS/SSL connection, and encrypt your data at rest. If you're using ClusterControl, deploying secure data in transit is simple and easy. You can check out our blog SSL Key Management and Encryption of MySQL Data in Transit if you want to try it out. For data at rest, you can follow the discussion earlier in the encryption section of this blog.

Galera Cluster Troubleshooting 

GCP offers Stackdriver Logging which you can leverage to help you with observability, monitoring, and notification requirements. The great thing about Stackdriver Logging is that it offers integration with AWS. With it you can catch the events selectively and then raise an alert based on that event. This can keep you in the loop on certain issues which may arise and help you during troubleshooting. GCP also has Cloud Audit Logs which provide you more traceable information from inside the GCP environment, from admin activity, data access, and system events. 

If you're using ClusterControl, go to Logs -> System Logs and you'll be able to browse the captured error logs taken from the MySQL Galera node itself. Apart from this, ClusterControl provides real-time monitoring that amplifies your alarm and notification system in case of an emergency or if your MySQL Galera node(s) go down.


The Google Cloud Platform offers a wide-variety of efficient and powerful services that you can leverage. There are indeed pros and cons for each of public cloud platforms, but GCP proves that AWS doesn’t have a lock on the cloud. 

It's interesting that big companies such as Vimeo are moving to GCP coming from on-premise and they experienced some interesting results in their technology stack. Bloomberg as well is happy with GCP and is using Percona XtraDB Cluster (a Galera variant). Let us know what you think about using GCP for MySQL Galera setups in the comments below.

by Paul Namuag at September 02, 2019 09:45 AM

August 31, 2019

Valeriy Kravchuk

Fun with Bugs #89 - On MySQL Bug Reports I am Subscribed to, Part XXIII

I have to celebrate the anniversary of my last day at Oracle (that was 7 years ago!) somehow, and I think writing yet another blog post about Oracle MySQL bugs is a good way to do this. I am actually surprised (and happy) that the public bugs database is still alive, maintained, and considered important in Oracle, and I know who in Oracle was working hard all these years to make this happen!

In my previous post in this series I stopped at Bug #95954 and had not completed the review of interesting MySQL bug reports that I subscribed to in June 2019. So, below I start with the next bug on my list, complete the review for June, and cover some bugs reported in July. There were many.
  • Bug #95957 - "IN operator issue when comparing signed column and the column cast to unsigned". This bug was reported by Manuel Rigger. As far as I can see, MariaDB 10.3 is not affected:
    MariaDB [test]> CREATE TABLE t0(c0 INT);
    Query OK, 0 rows affected (0.518 sec)

    MariaDB [test]> INSERT INTO t0(c0) VALUES(-1);
    Query OK, 1 row affected (0.196 sec)

    MariaDB [test]> SELECT t0.c0 IN (1, CAST(t0.c0 AS UNSIGNED)) from t0;
    +---------------------------------------+
    | t0.c0 IN (1, CAST(t0.c0 AS UNSIGNED)) |
    +---------------------------------------+
    |                                     0 |
    +---------------------------------------+
    1 row in set, 1 warning (0.207 sec)

    MariaDB [test]> show warnings\G
    *************************** 1. row ***************************
      Level: Note
       Code: 1105
    Message: Cast to unsigned converted negative integer to it's positive complement

    1 row in set (0.013 sec)
  • Bug #96001 - "No warning when creating foreign key in MyISAM tables". I am really surprised that this "documented feature" was accepted as an S3 bug when reported by Przemyslaw Malkowski from Percona. But it happened. I also prefer NOT to have unsupported syntax silently ignored, and would like to see a warning (or an error in strict mode).
  • Bug #96002 - "'variable log_bin_trust_function_creators' -variable is "hidden"." Consistency is important, as is correct documentation. So I was happy to see this bug report from Peter Laursen. 75(!) of his bug reports are still active, by the way; some were reported more than 10 years ago. I wonder if anyone is going to check (if not fix) them any time soon.
  • Bug #96079 - "large_tests.innodb_innochecksum_3gb test failing with debug build." Yet another MTR test failure reported by Lalit Choudhary from Percona.
  • Bug #96100 - "Generated column cause a heap-use-after-free error". Probably ASan builds are not tested as carefully in Oracle as they are by some community members, like Zkong Kong, who reported this bug. Otherwise they would have marked this bug report as a duplicate of some known internal bug.
  • Bug #96108 - "To run mtr "innodb.log_flush_order" MySql Server would be always core down". I've listed this bug as an example that even though code modification is needed to reproduce the crash (reported by Juncai Meng) literally, it was accepted and "Verified". In other reports this sometimes does not happen, and the point is that there is surely no rule carved in stone in Oracle to NOT accept bugs whose test case involves code modification. Remember that and fight for your reports if needed.
  • Bug #96128 - "Doc: documentation is inaccurate when InnoDB starts with innodb_read_only". Correct manual matters a lot, so nice to have it corrected in this case by Calvin Sun.
  • Bug #96134 - "Please provide control functions for the IO Thread." I'd be also happy to see a way to control replication I/O thread progress and read logs only up to some position or GTID, per channel. Thanks Jean-François Gagné for this nice feature request.
  • Bug #96142 - "Inconsistent error on slave for Update event on table with non-exists partition". Yet another bug report from Lalit Choudhary. Good to see checks against multiple versions.
  • Bug #96148 - "using Invisible Index when slave apply EVENT". This bug was reported by Songlei Wang. Consistency matters, so if the index is invisible it should remain invisible for the replication SQL thread as well. See also his other report, Bug #96150 - "'show slave status' show the Inaccurate Last_IO_Error message".
  • Bug #96167 - "Many header files now missing from devel package". As noted by Manuel Ung, it is now impossible to build plugins and UDFs unless users download the source tree and then copy the headers to the appropriate places. Packaging is hard.
  • Bug #96178 - "mysqldump leaks memory when selected tables are dumped with --order-by-primary". Abhinav Sharma proposed a simple MTR test case to run on ASan build, and suggested a fix. Very nice bug report. Unfortunately I do not see any statements about the results of checking MySQL 8.0.x.
  • Bug #96192 - "Possible race condition with binlog-transaction-dependency-tracking". The bug reporter, Herman Lee, complained about one place in the code where a race condition may happen even after the fix for one MySQL bug. Does it really matter for bug verification if he found more places? I fail to see a reason to keep the bug in "Need Feedback" status when code review is enough to confirm there is a problem in that one part of the code clearly identified.
  • Bug #96196 - "performance_schema_accounts_size and p_s_hosts_size limited by 16384". A nice bug that can be confirmed by code review or just by opening many connections, reported by Nikolai Ikhalainen from Percona. Autoscaling is broken/limited in this case.
  • Bug #96340 - "Slow startup for mysql 8.0 with many tables due to the tablespace files scan". This is actually a regression compared with 5.7, but it's visible on slow disks. It's nice to see a useful discussion, explanations, and a patch suggested (by Sunny Bains) in this bug report created by Lalit Choudhary.
I started this summer in a beautiful Barcelona. This blog posts ends it for me. One of the good changes leaving Oracle seven years ago introduced in my life is a real freedom to work from anywhere and travel as often as I really want, both for work and for fun.
To summarize:
  1. I am happy to see the MySQL public bugs database still up and widely used, even though I have not worked on it directly for 7 years. It's a key service and medium of cooperation for the entire MySQL Community! Just check how it happens in Bug #96340.
  2. Consistency matters.
  3. I still see cases when the time is wasted at bugs verification stage.
  4. The MySQL fine manual still has some details explained incorrectly, and this is unfortunate.
  5. Sometimes I wonder why Percona engineers and other MySQL Community users manage to find even MTR test failures faster than anyone in Oracle cares to report and fix them. They all know magic (like actually running all tests on debug builds and checking the results, maybe). Or, maybe, they care?
* * *
Shameless self-promotion at the end. For the first time since 2015 I am going to attend the Percona Live conference and speak there. Ticket prices increase on September 1, so use code CMESPEAK-VALERII to get the best deal right now.

by Valerii Kravchuk at August 31, 2019 04:22 PM

August 30, 2019


Comparing Failover Times for Amazon Aurora, Amazon RDS, and ClusterControl

If your IT infrastructure is running on AWS, you have probably heard about Amazon Relational Database Service (RDS), an easy way to set up, operate, and scale a relational database in the cloud. It provides cost-effective and resizable capacity while automating time-consuming administration tasks such as hardware provisioning, database setup, patching, and backups. There are a number of database engine offerings for RDS like MySQL, MariaDB, PostgreSQL, Microsoft SQL Server and Oracle Server.

ClusterControl 1.7.3 acts similarly to RDS as it supports database cluster deployment, management, monitoring, and scaling on the AWS platform. It also supports a number of other cloud platforms like Google Cloud Platform and Microsoft Azure. ClusterControl understands the database topology and is capable of performing automatic recovery, topology management, and many more advanced features to take control of your database.

In this blog post, we are going to compare automatic failover times for Amazon Aurora, Amazon RDS for MySQL, and a MySQL Replication setup deployed and managed by ClusterControl. The type of failover we are going to perform is slave promotion in case the master goes down. This is where the most up-to-date slave takes over the master role in the cluster to resume the database service.

Our Failover Test

To measure the failover time, we are going to run a simple MySQL connect-update test in a loop, counting the status of the SQL statements sent to a single database endpoint. The script looks like this:







#!/bin/bash
# _user, _pass, _host and _port must be set beforehand

j=1

while true
do
        echo -n "count $j : "
        num=$(od -A n -t d -N 1 /dev/urandom | tr -d ' ')
        timeout 1 bash -c "mysql -u${_user} -p${_pass} -h${_host} -P${_port} --connect-timeout=1 --disable-reconnect -A -Bse \
        \"UPDATE sbtest.sbtest1 SET k = $num WHERE id = 1\" > /dev/null 2> /dev/null"
        if [ $? -eq 0 ]; then
                echo "OK $(date)"
        else
                echo "Fail ---- $(date)"
        fi
        j=$(( $j + 1 ))
        sleep 1
done

The above Bash script simply connects to a MySQL host and performs an update on a single row with a timeout of 1 second on both the Bash and mysql client commands. The timeout-related parameters are required so we can measure the downtime in seconds correctly, since the mysql client defaults to reconnecting until it reaches the MySQL wait_timeout. We populated a test dataset with the following command beforehand:

$ sysbench \
/usr/share/sysbench/oltp_common.lua \
--db-driver=mysql \
--mysql-host={MYSQL HOST} \
--mysql-user=sbtest \
--mysql-db=sbtest \
--mysql-password=password \
--tables=50 \
--table-size=100000 \
prepare

The script reports whether the above query succeeded (OK) or failed (Fail). Sample outputs are shown further down.
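The double timeout guard described above (the `timeout 1` wrapper plus `--connect-timeout=1`) is what bounds each probe to roughly one second. A minimal sketch of just that guard, assuming GNU coreutils, where `sleep 5` stands in for a hung mysql connection:

```shell
# GNU timeout kills the command after 1 second and exits with status 124,
# which the probe script then counts as a Fail.
rc=0
timeout 1 bash -c "sleep 5" || rc=$?   # stand-in for a hung mysql connection
if [ "$rc" -eq 0 ]; then
        echo "OK"
else
        echo "Fail ---- (exit $rc)"    # prints: Fail ---- (exit 124)
fi
```

This is why a dead endpoint shows up as one Fail line roughly every two seconds (one second timeout plus one second sleep) in the sample outputs below.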

Failover with Amazon RDS for MySQL

In our test, we use the lowest RDS offering with the following specs:

  • MySQL version: 5.7.22
  • vCPU: 4
  • RAM: 16 GB
  • Storage type: Provisioned IOPS (SSD)
  • IOPS: 1000
  • Storage: 100 GiB
  • Multi-AZ Replication: Yes

After Amazon RDS provisions your DB instance, you can use any standard MySQL client application or utility to connect to the instance. In the connection string, you specify the DNS address from the DB instance endpoint as the host parameter, and specify the port number from the DB instance endpoint as the port parameter.

According to Amazon RDS documentation page, in the event of a planned or unplanned outage of your DB instance, Amazon RDS automatically switches to a standby replica in another Availability Zone if you have enabled Multi-AZ. The time it takes for the failover to complete depends on the database activity and other conditions at the time the primary DB instance became unavailable. Failover times are typically 60-120 seconds.

To initiate a multi-AZ failover in RDS, we performed a reboot operation with "Reboot with Failover" checked, as shown in the following screenshot:

Reboot AWS DB Instance

The following is what was observed by our application:


count 30 : OK Wed Aug 28 03:41:06 UTC 2019

count 31 : OK Wed Aug 28 03:41:07 UTC 2019

count 32 : Fail ---- Wed Aug 28 03:41:09 UTC 2019

count 33 : Fail ---- Wed Aug 28 03:41:11 UTC 2019

count 34 : Fail ---- Wed Aug 28 03:41:13 UTC 2019

count 35 : Fail ---- Wed Aug 28 03:41:15 UTC 2019

count 36 : Fail ---- Wed Aug 28 03:41:17 UTC 2019

count 37 : Fail ---- Wed Aug 28 03:41:19 UTC 2019

count 38 : Fail ---- Wed Aug 28 03:41:21 UTC 2019

count 39 : Fail ---- Wed Aug 28 03:41:23 UTC 2019

count 40 : Fail ---- Wed Aug 28 03:41:25 UTC 2019

count 41 : Fail ---- Wed Aug 28 03:41:27 UTC 2019

count 42 : Fail ---- Wed Aug 28 03:41:29 UTC 2019

count 43 : Fail ---- Wed Aug 28 03:41:31 UTC 2019

count 44 : Fail ---- Wed Aug 28 03:41:33 UTC 2019

count 45 : Fail ---- Wed Aug 28 03:41:35 UTC 2019

count 46 : OK Wed Aug 28 03:41:36 UTC 2019

count 47 : OK Wed Aug 28 03:41:37 UTC 2019


The MySQL downtime as seen from the application side lasted from 03:41:09 until 03:41:36, which is around 27 seconds in total. From the RDS events, we can see the multi-AZ failover only started 15 seconds after the actual downtime began:

Wed, 28 Aug 2019 03:41:24 GMT Multi-AZ instance failover started.

Wed, 28 Aug 2019 03:41:33 GMT DB instance restarted

Wed, 28 Aug 2019 03:41:59 GMT Multi-AZ instance failover completed.

Once the new database instance restarted around 03:41:33, the MySQL service became accessible around 3 seconds later.
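The 27-second figure can also be derived mechanically from the probe log instead of eyeballing it: take the first Fail timestamp and the first OK that follows it. A sketch assuming GNU date (the log lines are abridged from the output above):

```shell
# Derive the downtime from the probe log: first Fail timestamp to the
# first OK that follows it.
log='count 31 : OK Wed Aug 28 03:41:07 UTC 2019
count 32 : Fail ---- Wed Aug 28 03:41:09 UTC 2019
count 45 : Fail ---- Wed Aug 28 03:41:35 UTC 2019
count 46 : OK Wed Aug 28 03:41:36 UTC 2019'

# timestamp of the first Fail line
first_fail=$(printf '%s\n' "$log" | sed -n 's/.*Fail ---- //p' | head -n 1)
# timestamp of the first OK line after a Fail has been seen
first_ok=$(printf '%s\n' "$log" | awk '/Fail/ {f=1} f && /: OK / {sub(/.*: OK /, ""); print; exit}')
downtime=$(( $(date -d "$first_ok" +%s) - $(date -d "$first_fail" +%s) ))
echo "downtime: ${downtime}s"   # prints: downtime: 27s
```

The same one-liner works unchanged for the Aurora and ClusterControl logs further down.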

Failover with Amazon Aurora for MySQL

Amazon Aurora can be considered a superior version of RDS, with a number of notable features like faster replication with shared storage, no data loss during failover, and a storage limit of up to 64 TB. Amazon Aurora for MySQL is based on open source MySQL, but is not open source itself; it is a proprietary, closed-source database. It works similarly to MySQL replication (one and only one master, with multiple slaves) and failover is automatically handled by Amazon Aurora.

According to the Amazon Aurora FAQs, if you have an Amazon Aurora Replica, in the same or a different Availability Zone, then when failing over, Aurora flips the canonical name record (CNAME) for your DB Instance to point at the healthy replica, which in turn is promoted to become the new primary. Start-to-finish, failover typically completes within 30 seconds.

If you do not have an Amazon Aurora Replica (i.e. single instance), Aurora will first attempt to create a new DB Instance in the same Availability Zone as the original instance. If unable to do so, Aurora will attempt to create a new DB Instance in a different Availability Zone. From start to finish, failover typically completes in under 15 minutes.

Your application should retry database connections in the event of connection loss.

After Amazon Aurora provisions your DB instance, you will get two endpoints, one for the writer and one for the reader. The reader endpoint provides load-balancing support for read-only connections to the DB cluster. The following endpoints are taken from our test setup:

  • writer -
  • reader -

In our test, we used the following Aurora specs:

  • Instance type: db.r5.large
  • MySQL version: 5.7.12
  • vCPU: 2
  • RAM: 16 GB
  • Multi-AZ Replication: Yes

To trigger a failover, simply pick the writer instance -> Actions -> Failover, as shown in the following screenshot:

Amazon Aurora Failover with SysBench

The following output is reported by our application while connecting to the Aurora writer endpoint:


count 37 : OK Wed Aug 28 12:35:47 UTC 2019

count 38 : OK Wed Aug 28 12:35:48 UTC 2019

count 39 : Fail ---- Wed Aug 28 12:35:49 UTC 2019

count 40 : Fail ---- Wed Aug 28 12:35:50 UTC 2019

count 41 : Fail ---- Wed Aug 28 12:35:51 UTC 2019

count 42 : Fail ---- Wed Aug 28 12:35:52 UTC 2019

count 43 : Fail ---- Wed Aug 28 12:35:53 UTC 2019

count 44 : Fail ---- Wed Aug 28 12:35:54 UTC 2019

count 45 : Fail ---- Wed Aug 28 12:35:55 UTC 2019

count 46 : OK Wed Aug 28 12:35:56 UTC 2019

count 47 : OK Wed Aug 28 12:35:57 UTC 2019


The database downtime started at 12:35:49 and ended at 12:35:56, a total of 7 seconds. That's pretty impressive. 

Looking at the database event from Aurora management console, only these two events happened:

Wed, 28 Aug 2019 12:35:50 GMT A new writer was promoted. Restarting database as a reader.

Wed, 28 Aug 2019 12:35:55 GMT DB instance restarted

It doesn't take much time for Aurora to promote a slave to become a master, and to demote the master to become a slave. Note that all Aurora replicas share the same underlying volume with the primary instance, which means that replication can be performed in milliseconds as updates made by the primary instance are instantly available to all Aurora replicas. Therefore, it has minimal replication lag (Amazon claims 100 milliseconds or less). This greatly reduces the health check time and improves the recovery time significantly.

Failover with ClusterControl

In this example, we imitate a setup similar to Amazon RDS using m5.xlarge instances, with ProxySQL in between to automate the failover from the application's point of view using a single endpoint, just like RDS. The following diagram illustrates our architecture:

ClusterControl with ProxySQL

Since we have direct access to the database instances, we can trigger an automatic failover by simply killing the MySQL process on the active master:

$ kill -9 $(pidof mysqld)

The above command triggered an automatic recovery inside ClusterControl:

[11:08:49]: Job Completed.

[11:08:44]: Flushing logs to update 'SHOW SLAVE HOSTS'

[11:08:39]: Flushing logs to update 'SHOW SLAVE HOSTS'

[11:08:39]: Failover Complete. New master is

[11:08:39]: Attaching slaves to new master.

[11:08:39]: Command 'RESET SLAVE /*!50500 ALL */' succeeded.

[11:08:39]: Executing 'RESET SLAVE /*!50500 ALL */'.

[11:08:39]: Successfully stopped slave.

[11:08:39]: Stopping slave.

[11:08:39]: Successfully stopped slave.

[11:08:39]: Stopping slave.

[11:08:38]: Setting read_only=OFF and super_read_only=OFF.

[11:08:38]: Successfully stopped slave.

[11:08:38]: Stopping slave.

[11:08:38]: Stopping slaves.

[11:08:38]: Completed preparations of candidate.

[11:08:38]: Applied 0 transactions. Remaining: .

[11:08:38]: waiting up to 4294967295 seconds before timing out.

[11:08:38]: Checking if the candidate has relay log to apply.

[11:08:38]: preparing candidate.

[11:08:38]: No errant transactions found.

[11:08:38]: Skipping, same as slave

[11:08:38]: Checking for errant transactions.

[11:08:37]: Setting read_only=ON and super_read_only=ON.

[11:08:37]: Can't connect to MySQL server on '' (115)

[11:08:37]: Setting read_only=ON and super_read_only=ON.

[11:08:37]: Failed to CREATE USER rpl_user. Error: Query  failed: Can't connect to MySQL server on '' (115).

[11:08:36]: Creating user 'rpl_user'@'

[11:08:36]: Executing GRANT REPLICATION SLAVE 'rpl_user'@''.

[11:08:36]: Creating user 'rpl_user'@'

[11:08:36]: Elected as the new Master.

[11:08:36]: Slave lag is 0 seconds.

[11:08:36]: to slave list

[11:08:36]: Checking if slave can be used as a candidate.

[11:08:33]: Trying to shutdown the failed master if it is up.

[11:08:32]: Setting read_only=ON and super_read_only=ON.

[11:08:31]: Setting read_only=ON and super_read_only=ON.

[11:08:30]: Setting read_only=ON and super_read_only=ON.

[11:08:30]: ioerrno=2003 io running 0

[11:08:30]: Checking

[11:08:30]: REPL_UNDEFINED


[11:08:30]: Failover to a new Master.

Job spec: Failover to a new Master.

While from our test application point-of-view, the downtime happened at the following time while connecting to ProxySQL host port 6033:


count 1 : OK Wed Aug 28 11:08:24 UTC 2019

count 2 : OK Wed Aug 28 11:08:25 UTC 2019

count 3 : OK Wed Aug 28 11:08:26 UTC 2019

count 4 : Fail ---- Wed Aug 28 11:08:28 UTC 2019

count 5 : Fail ---- Wed Aug 28 11:08:30 UTC 2019

count 6 : Fail ---- Wed Aug 28 11:08:32 UTC 2019

count 7 : Fail ---- Wed Aug 28 11:08:34 UTC 2019

count 8 : Fail ---- Wed Aug 28 11:08:36 UTC 2019

count 9 : Fail ---- Wed Aug 28 11:08:38 UTC 2019

count 10 : OK Wed Aug 28 11:08:39 UTC 2019

count 11 : OK Wed Aug 28 11:08:40 UTC 2019


Looking at both the recovery job events and the output from our application, the MySQL database node was down for 4 seconds before the cluster recovery job started, from 11:08:28 until 11:08:39, for a total MySQL downtime of 11 seconds. One of the most impressive things about ClusterControl is that you can track the recovery progress and see which actions are taken and performed by ClusterControl during the failover. It provides a level of transparency that you won't get with the database offerings of cloud providers.

For MySQL/MariaDB/PostgreSQL replication, ClusterControl gives you more fine-grained control over your databases with support for the following advanced configurations and features:

  • Master-master replication topology management
  • Chain replication topology management
  • Topology viewer
  • Whitelist/Blacklist slaves to be promoted as master
  • Errant transaction checker
  • Pre/post, success/fail failover/switchover events hook with external script
  • Automatic rebuild slave on error
  • Scale out slave from existing backup

Failover Time Summary

In terms of failover time, Amazon RDS Aurora for MySQL is the clear winner with 7 seconds, followed by ClusterControl with 11 seconds, and Amazon RDS for MySQL with 27 seconds.

Note that this is just a simple test, with one client and one transaction per second, to measure the fastest recovery time. Large transactions or a lengthy recovery process can increase failover time; e.g., long-running transactions may take a long time to roll back when shutting down MySQL.


by ashraf at August 30, 2019 09:45 AM

August 29, 2019


Cloud Vendor Deep-Dive: PostgreSQL on Google Cloud Platform (GCP)

Where to Start?

The best place I could find to start was none other than the official documentation. There is also a GCP YouTube channel for those who prefer multimedia. Once I found myself in Cloud SQL documentation land, I turned to Concepts, where we are promised to “develop a deep understanding” of the product.

So let’s get started!

PostgreSQL Google Cloud Features

Google Cloud SQL for PostgreSQL offers all the standard features we’d expect from a managed solution: high availability with automatic failover, automatic backups, encryption at rest and in transit, advanced logging and monitoring, and of course a rich API to interact with all services.

And for a bit of history, PostgreSQL support started in March 2017; until then the only supported database engine was MySQL.

Cloud SQL runs PostgreSQL on Google’s Second Generation computing platform. The full list of features is available here and also here. Reviewing the former, it is apparent that there was never a First Generation platform for PostgreSQL.

Databases running on the Second Generation platform are expected to run at speeds 7x faster and benefit from 20x more storage capacity. The blog post announcing the Second Generation platform goes into the details of running the sysbench test to compare Google Cloud SQL with its then main competitor, AWS, in both incarnations, RDS and Aurora. The results did surprise me, as they show Cloud SQL performing better, whereas the recent tests performed using the AWS Benchmark, released about a year later, concluded the opposite. That is around the same time PostgreSQL support became available. While I’m itching at the idea of running the benchmark myself, I’m guessing that there are two potential factors that could have influenced the results: Google’s sysbench benchmark used different parameters, and AWS may have improved their products during that time.

GCP PostgreSQL Compatibility

As expected, Google Cloud SQL for PostgreSQL is almost a drop-in replacement for the community version and supports the PL/pgSQL SQL procedural language.

Some features are not available due to security reasons, for example SUPERUSER access. Other features were removed due to potential risks posed to product stability and performance. Lastly, some options and parameters cannot be changed, although requests to change that behavior can be made via the Cloud SQL Discussion Group.

Cloud SQL is also wire compatible with the PostgreSQL protocol.

When it comes to transaction isolation Cloud SQL follows the PostgreSQL default behavior, defaulting to Read Committed isolation level.

For some of the server configuration parameters, Cloud SQL implements different ranges for reasons unexplained in the documentation, still an important thing to remember.


There are multiple ways of connecting to the database, depending on whether the instance is on a private network or a public network (applications connecting from outside GCP). Common to both cases is the predefined VPC managed by Google where all Cloud SQL database instances reside.

Private IP

Clients connecting to a private IP address are routed via a peering connection between the VPCs hosting the client and, respectively, the database instance. Although not specific to PostgreSQL, it is important to review the network requirements in order to avoid connection issues. One gotcha: once enabled, the private IP capability cannot be removed.

Connecting from External Applications

Connections from applications hosted outside GCP can, and should, be encrypted. Additionally, in order to avoid various attacks, client applications must install the provided client certificate. The procedure for generating and configuring the certificates is somewhat complicated, requiring custom tools to ensure that certificates are renewed periodically. That may be one of the reasons why Google offers the option of using the Cloud SQL Proxy.

Connecting Using Cloud SQL Proxy

The setup is fairly straightforward, which, in fact, I’ve found to be the case for all the instructions in the Google Cloud SQL documentation. On a related note, submitting documentation feedback is dead simple, and the screenshot feature was a first for me.

There are multiple ways to authorize proxy connections and I chose to configure a service account, just as outlined in the Cloud SQL Proxy documentation.

Once everything is in place it’s time to start the proxy:

~/usr/local/google $ ./cloud_sql_proxy -instances=omiday:us-west1:s9s201907141919=tcp:5432 -credential_file=omiday-427c34fce588.json

2019/07/14 21:22:43 failed to setup file descriptor limits: failed to set rlimit {&{8500 4096}} for max file descriptors: invalid argument

2019/07/14 21:22:43 using credential file for authentication;

2019/07/14 21:22:43 Listening on for omiday:us-west1:s9s201907141919

2019/07/14 21:22:43 Ready for new connections

To connect to the remote instance we are now using the proxy by specifying localhost instead of the instance public IP address:

~ $ psql "user=postgres dbname=postgres password=postgres hostaddr="

Pager usage is off.

psql (11.4, server 9.6.11)

Type "help" for help.

Note that there is no encryption since we are connecting locally and the proxy takes care of encrypting the traffic flowing into the cloud.

A common DBA task is viewing the connections to the database by querying pg_stat_activity. The documentation states that proxy connections will be displayed as cloudsqlproxy~ so I wanted to verify that claim. I’ve opened two sessions as postgres, one via the proxy and the other from my home address, so the following query will do:

postgres@127:5432 postgres> select * from pg_stat_activity where usename = 'postgres';

-[ RECORD 1 ]----+-----------------------------------------------------------

datid            | 12996

datname          | postgres

pid              | 924

usesysid         | 16389

usename          | postgres

application_name | psql

client_addr      |

client_hostname  |

client_port      | -1

backend_start    | 2019-07-15 04:25:37.614205+00

xact_start       | 2019-07-15 04:28:43.477681+00

query_start      | 2019-07-15 04:28:43.477681+00

state_change     | 2019-07-15 04:28:43.477684+00

wait_event_type  |

wait_event       |

state            | active

backend_xid      |

backend_xmin     | 8229

query            | select * from pg_stat_activity where usename = 'postgres';

-[ RECORD 2 ]----+-----------------------------------------------------------

datid            | 12996

datname          | postgres

pid              | 946

usesysid         | 16389

usename          | postgres

application_name | psql

client_addr      | <MY_HOME_IP_ADDRESS>

client_hostname  |

client_port      | 60796

backend_start    | 2019-07-15 04:27:50.378282+00

xact_start       |

query_start      |

state_change     | 2019-07-15 04:27:50.45613+00

wait_event_type  |

wait_event       |

state            | idle

backend_xid      |

backend_xmin     |

query            |

It appears that proxy connections are instead identified by client_port == -1 and an empty client_addr. This can be additionally confirmed by comparing the timestamps for backend_start and the proxy log below:

2019/07/14 21:25:37 New connection for "omiday:us-west1:s9s201907141919"
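Given that observation, proxy-originated backends can be filtered from psql's unaligned output. A sketch; the pipe-separated rows (including the 203.0.113.7 address) are made-up sample data, and the psql invocation that would produce real rows is shown only as a comment:

```shell
# Filter pg_stat_activity rows that arrived via the Cloud SQL proxy,
# i.e. rows with an empty client_addr and client_port = -1.
# Real data would come from:
#   psql -At -F'|' -c "SELECT pid, client_addr, client_port FROM pg_stat_activity"
rows='924||-1
946|203.0.113.7|60796'
proxy_pids=$(printf '%s\n' "$rows" | awk -F'|' '$2 == "" && $3 == -1 {print $1}')
echo "$proxy_pids"   # prints: 924
```

The same filter expressed in SQL would simply be `WHERE client_addr IS NULL AND client_port = -1`.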

PostgreSQL High Availability on Google Cloud

Google Cloud SQL for PostgreSQL ensures high availability using low level storage data synchronization by means of regional persistent disks. Failover is automatic, with a heartbeat check interval of one second, and a failover triggered after about 60 seconds.

Performance and Monitoring

The Performance section of the documentation points out general cloud rules of thumb: keep the database (both writer and read replicas) close to the application, and vertically scale the instance. What stands out is the recommendation to provision an instance with at least 60 GB of RAM when performance is important.

Stackdriver provides monitoring and logging, as well as access to the PostgreSQL logs:

Stackdriver PostgreSQL Logs

Access Control

This is implemented at project, instance and database level.

Project Access Control

Project access control is the cloud-specific access control: it uses the concept of IAM roles to allow project members (users, groups, or service accounts) access to various Cloud SQL resources. The list of roles is somewhat self-explanatory; for a detailed description of each role and its associated permissions refer to the APIs Explorer, or to the Cloud SQL Admin API for one of the supported programming languages.

To demonstrate how IAM roles work let’s create a read-only (viewer) service account:

IAM Service Account setup

Start a new proxy instance on port 5433 using the service account associated with the viewer role:

~/usr/local/google $ ./cloud_sql_proxy -instances=omiday:us-west1:s9s201907141919=tcp:5433 -credential_file=omiday-4508243deca9.json

2019/07/14 21:49:56 failed to setup file descriptor limits: failed to set rlimit {&{8500 4096}} for max file descriptors: invalid argument

2019/07/14 21:49:56 using credential file for authentication;

2019/07/14 21:49:56 Listening on for omiday:us-west1:s9s201907141919

2019/07/14 21:49:56 Ready for new connections

Open a psql connection to

~ $ psql "user=postgres dbname=postgres password=postgres hostaddr= port=5433"

The command exits with:

psql: server closed the connection unexpectedly

      This probably means the server terminated abnormally

      before or while processing the request.

Oops! Let’s check the proxy logs:

2019/07/14 21:50:33 New connection for "omiday:us-west1:s9s201907141919"

2019/07/14 21:50:33 couldn't connect to "omiday:us-west1:s9s201907141919": ensure that the account has access to "omiday:us-west1:s9s201907141919" (and make sure there's no typo in that name). Error during createEphemeral for omiday:us-west1:s9s201907141919: googleapi: Error 403: The client is not authorized to make this request., notAuthorized

Instance Access Control

Instance-level access is dependent on the connection source:

Access based on the connection source

The combination of authorization methods replaces the ubiquitous pg_hba.conf.

Backup and Recovery

By default automated backups are enabled:

Automated Backups

While backups do not affect database read and write operations, they do impact performance; therefore, it is recommended that backups be scheduled during periods of lower activity.

For redundancy, backups can be stored in two regions (additional charges apply) with the option of selecting custom locations.

In order to save on storage space, use compression. .gz compressed files are transparently restored.
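The compressed dump/restore round trip can be sketched as follows; the pg_dump/psql halves are shown only as comments since they need a live instance (and `$DB` is a placeholder connection string), while the gzip round trip itself is demonstrated on a stand-in file:

```shell
# Compressed logical backup and transparent restore, e.g.:
#   pg_dump "$DB" | gzip > dump.sql.gz
#   gunzip -c dump.sql.gz | psql "$DB"
# The gzip round trip is lossless, shown here on a stand-in dump file:
tmp=$(mktemp -d)
printf 'SELECT 1;\n' > "$tmp/dump.sql"
gzip -c "$tmp/dump.sql" > "$tmp/dump.sql.gz"       # compress
gunzip -c "$tmp/dump.sql.gz" > "$tmp/roundtrip.sql" # restore
cmp -s "$tmp/dump.sql" "$tmp/roundtrip.sql" && result=lossless
echo "$result"   # prints: lossless
rm -r "$tmp"
```

gzip typically shrinks plain-text SQL dumps considerably, which is what makes this worthwhile against per-GB storage pricing.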

Cloud SQL also supports instance cloning. For the smallest dataset the operation took about 3 minutes:

Cloning start time 10:07:10:

PostgreSQL logs for cloned instance

The PostgreSQL logs show that PostgreSQL became available on the cloned instance at 10:10:47:

PostgreSQL logs for cloned instance

That is still an easier way than backup and restore for creating a copy of an instance for testing, development, or troubleshooting purposes.

Google Cloud Best Practices for PostgreSQL

  • Configure an activation policy for instances that are not required to be running 24/7.
  • Place the database instance in the same zone, or region, with the compute engine instances and App Engine applications in order to avoid network latency.
  • Create the database instance in the same zone as the Compute Engine instance; if using any other connection type, accept the default zone.
  • Users created using Cloud SQL are by default cloud superusers. Use PostgreSQL ALTER ROLE to modify their permissions.
  • Use the latest Cloud SQL Proxy version.
  • Instance names should include a timestamp in order to be able to reuse the name when deleting and recreating instances.
  • pg_dump defaults to including large objects. If the database contains BLOBs, perform the dump during periods of low activity to prevent the instance from becoming unresponsive.
  • Use gcloud sql connect to quickly connect from an external client without the need to whitelist the client IP address.
  • Subscribe to the announce group in order to receive notifications on product updates and alerts, such as issues when creating instances:
Google Cloud SQL announce group
Maintenance timing options

Launch Checklist for Cloud SQL

The checklist section in the documentation provides an overview of recommended activities when setting up a production-ready Cloud SQL for PostgreSQL instance. In particular, applications must be designed to handle Cloud SQL restarts. Also, while there are no queries-per-second limits, there are connection limits.

PostgreSQL GCP Extensions Support

Cloud SQL supports most of the PostgreSQL extensions. As of this writing, 22 out of the 52 community extensions are unsupported, along with 2 PostGIS extensions.



For PostgreSQL extensions we can either review the PostgreSQL contrib repository, or better, diff the output of pg_available_extensions:


~ $ psql -U postgres -p 54396

Pager usage is off.

psql (11.4, server 9.6.14)

Type "help" for help.

postgres@[local]:54396 postgres# select * from pg_available_extensions order by name;

      name        | default_version | installed_version |                               comment


adminpack          | 1.1 |                   | administrative functions for PostgreSQL

autoinc            | 1.0 |                   | functions for autoincrementing fields

bloom              | 1.0 |                   | bloom access method - signature file based index

btree_gin          | 1.0 |                   | support for indexing common datatypes in GIN

btree_gist         | 1.2 |                   | support for indexing common datatypes in GiST

chkpass            | 1.0 |                   | data type for auto-encrypted passwords

citext             | 1.3 |                   | data type for case-insensitive character strings

cube               | 1.2 |                   | data type for multidimensional cubes

dblink             | 1.2 |                   | connect to other PostgreSQL databases from within a database

dict_int           | 1.0 |                   | text search dictionary template for integers

dict_xsyn          | 1.0 |                   | text search dictionary template for extended synonym processing

earthdistance      | 1.1 |                   | calculate great-circle distances on the surface of the Earth

file_fdw           | 1.0 |                   | foreign-data wrapper for flat file access

fuzzystrmatch      | 1.1 |                   | determine similarities and distance between strings

hstore             | 1.4 |                   | data type for storing sets of (key, value) pairs

hstore_plperl      | 1.0 |                   | transform between hstore and plperl

hstore_plperlu     | 1.0 |                   | transform between hstore and plperlu

hstore_plpython2u  | 1.0 |                   | transform between hstore and plpython2u

hstore_plpythonu   | 1.0 |                   | transform between hstore and plpythonu

insert_username    | 1.0 |                   | functions for tracking who changed a table

intagg             | 1.1 |                   | integer aggregator and enumerator (obsolete)

intarray           | 1.2 |                   | functions, operators, and index support for 1-D arrays of integers

isn                | 1.1 |                   | data types for international product numbering standards

lo                 | 1.1 |                   | Large Object maintenance

ltree              | 1.1 |                   | data type for hierarchical tree-like structures

ltree_plpython2u   | 1.0 |                   | transform between ltree and plpython2u

ltree_plpythonu    | 1.0 |                   | transform between ltree and plpythonu

moddatetime        | 1.0 |                   | functions for tracking last modification time

pageinspect        | 1.5 |                   | inspect the contents of database pages at a low level

pg_buffercache     | 1.2 |                   | examine the shared buffer cache

pg_freespacemap    | 1.1 |                   | examine the free space map (FSM)

pg_prewarm         | 1.1 |                   | prewarm relation data

pg_stat_statements | 1.4 |                   | track execution statistics of all SQL statements executed

pg_trgm            | 1.3 |                   | text similarity measurement and index searching based on trigrams

pg_visibility      | 1.1 |                   | examine the visibility map (VM) and page-level visibility info

pgcrypto           | 1.3 |                   | cryptographic functions

pgrowlocks         | 1.2 |                   | show row-level locking information

pgstattuple        | 1.4 |                   | show tuple-level statistics

plpgsql            | 1.0 | 1.0               | PL/pgSQL procedural language

postgres_fdw       | 1.0 |                   | foreign-data wrapper for remote PostgreSQL servers

refint             | 1.0 |                   | functions for implementing referential integrity (obsolete)

seg                | 1.1 |                   | data type for representing line segments or floating-point intervals

sslinfo            | 1.2 |                   | information about SSL certificates

tablefunc          | 1.0 |                   | functions that manipulate whole tables, including crosstab

tcn                | 1.0 |                   | Triggered change notifications

timetravel         | 1.0 |                   | functions for implementing time travel

tsearch2           | 1.0 |                   | compatibility package for pre-8.3 text search functions

tsm_system_rows    | 1.0 |                   | TABLESAMPLE method which accepts number of rows as a limit

tsm_system_time    | 1.0 |                   | TABLESAMPLE method which accepts time in milliseconds as a limit

unaccent           | 1.1 |                   | text search dictionary that removes accents

uuid-ossp          | 1.1 |                   | generate universally unique identifiers (UUIDs)

xml2               | 1.1 |                   | XPath querying and XSLT

Cloud SQL:

postgres@127:5432 postgres> select * from pg_available_extensions where name !~ '^postgis' order by name;

      name        | default_version | installed_version |                              comment


bloom              | 1.0 |                   | bloom access method - signature file based index

btree_gin          | 1.0 |                   | support for indexing common datatypes in GIN

btree_gist         | 1.2 |                   | support for indexing common datatypes in GiST

chkpass            | 1.0 |                   | data type for auto-encrypted passwords

citext             | 1.3 |                   | data type for case-insensitive character strings

cube               | 1.2 |                   | data type for multidimensional cubes

dict_int           | 1.0 |                   | text search dictionary template for integers

dict_xsyn          | 1.0 |                   | text search dictionary template for extended synonym processing

earthdistance      | 1.1 |                   | calculate great-circle distances on the surface of the Earth

fuzzystrmatch      | 1.1 |                   | determine similarities and distance between strings

hstore             | 1.4 |                   | data type for storing sets of (key, value) pairs

intagg             | 1.1 |                   | integer aggregator and enumerator (obsolete)

intarray           | 1.2 |                   | functions, operators, and index support for 1-D arrays of integers

isn                | 1.1 |                   | data types for international product numbering standards

lo                 | 1.1 |                   | Large Object maintenance

ltree              | 1.1 |                   | data type for hierarchical tree-like structures

pg_buffercache     | 1.2 |                   | examine the shared buffer cache

pg_prewarm         | 1.1 |                   | prewarm relation data

pg_stat_statements | 1.4 |                   | track execution statistics of all SQL statements executed

pg_trgm            | 1.3 |                   | text similarity measurement and index searching based on trigrams

pgcrypto           | 1.3 |                   | cryptographic functions

pgrowlocks         | 1.2 |                   | show row-level locking information

pgstattuple        | 1.4 |                   | show tuple-level statistics

plpgsql            | 1.0 | 1.0               | PL/pgSQL procedural language

sslinfo            | 1.2 |                   | information about SSL certificates

tablefunc          | 1.0 |                   | functions that manipulate whole tables, including crosstab

tsm_system_rows    | 1.0 |                   | TABLESAMPLE method which accepts number of rows as a limit

tsm_system_time    | 1.0 |                   | TABLESAMPLE method which accepts time in milliseconds as a limit

unaccent           | 1.1 |                   | text search dictionary that removes accents

uuid-ossp          | 1.1 |                   | generate universally unique identifiers (UUIDs)

Unsupported extensions in Cloud SQL:

adminpack          1.1 administrative functions for PostgreSQL

autoinc            1.0 functions for autoincrementing fields

dblink             1.2 connect to other PostgreSQL databases from within a database

file_fdw           1.0 foreign-data wrapper for flat file access

hstore_plperl      1.0 transform between hstore and plperl

hstore_plperlu     1.0 transform between hstore and plperlu

hstore_plpython2u  1.0 transform between hstore and plpython2u

hstore_plpythonu   1.0 transform between hstore and plpythonu

insert_username    1.0 functions for tracking who changed a table

ltree_plpython2u   1.0 transform between ltree and plpython2u

ltree_plpythonu    1.0 transform between ltree and plpythonu

moddatetime        1.0 functions for tracking last modification time

pageinspect        1.5 inspect the contents of database pages at a low level

pg_freespacemap    1.1 examine the free space map (FSM)

pg_visibility      1.1 examine the visibility map (VM) and page-level visibility info

postgres_fdw       1.0 foreign-data wrapper for remote PostgreSQL servers

refint             1.0 functions for implementing referential integrity (obsolete)

seg                1.1 data type for representing line segments or floating-point intervals

tcn                1.0 Triggered change notifications

timetravel         1.0 functions for implementing time travel

tsearch2           1.0 compatibility package for pre-8.3 text search functions

xml2               1.1 XPath querying and XSLT
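The unsupported list above can be reproduced mechanically by capturing the name column of pg_available_extensions on both servers and taking the set difference. A minimal sketch, using abbreviated sample name sets rather than the full catalogs:

```python
# Hypothetical sketch: the two sets below are abbreviated samples of the
# "name" column from pg_available_extensions, not the full outputs.
community = {"adminpack", "bloom", "btree_gin", "dblink", "file_fdw",
             "pg_trgm", "postgres_fdw", "tsearch2", "xml2"}
cloud_sql = {"bloom", "btree_gin", "pg_trgm"}

# Extensions present upstream but missing from Cloud SQL
unsupported = sorted(community - cloud_sql)
print(unsupported)
```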


Operations performed within Cloud SQL are logged under the Activity tab along with all the details. Example from creating an instance, showing all instance details:

Activity log for creating an instance

PostgreSQL Migration to GCP

In order to provide migration of on-premises PostgreSQL installations, Google takes advantage of pgBouncer.

Cloud SQL Console: Migration Wizard - start migration
Cloud SQL Console: Migration Wizard - not available for PostgreSQL

Note that there is no GCP Console wizard for PostgreSQL migrations.

DBA Beware!

High Availability and Replication

A master node cannot fail over to a read replica. The same section outlines other important aspects of read replicas:

  • can be taken offline at any time for patching
  • do not follow the master node to another zone after a failover; since the replication is asynchronous, this can increase the replication lag
  • there is no load balancing between replicas; in other words, there is no single endpoint for applications to point to
  • replica instance size must be at least the size of the master node
  • no cross-region replication
  • replicas cannot be backed up
  • all replicas must be deleted before a master instance can be restored from backup or deleted
  • cascading replication is not available


By default, the “cloud superuser” is postgres which is a member of the cloudsqlsuperuser role. In turn, cloudsqlsuperuser inherits the default PostgreSQL roles:

postgres@35:5432 postgres> \du+ postgres

                           List of roles

Role name  | Attributes       | Member of | Description


postgres   | Create role, Create DB | {cloudsqlsuperuser} |

postgres@35:5432 postgres> \du+ cloudsqlsuperuser

                              List of roles

   Role name       | Attributes       | Member of | Description


cloudsqlsuperuser  | Create role, Create DB | {pg_monitor} |

Note that the SUPERUSER and REPLICATION role attributes are not available.

Backup and Recovery

Backups cannot be exported.

Backups cannot be used for upgrading an instance, i.e. restoring into a different PostgreSQL engine version.

Features such as PITR, Logical Replication, and JIT Compilation are not available. Feature requests can be filed in Google's Issue Tracker.

Google Issue Tracker - PostgreSQL feature request


At instance creation SSL/TLS is enabled but not enforced:

Creating an instance: encryption is enabled but not enforced

In this mode encryption can be requested; however, certificate validation is not available.

~ $ psql "sslmode=verify-ca user=postgres dbname=postgres password=postgres hostaddr="

psql: root certificate file "/home/lelu/.postgresql/root.crt" does not exist

Either provide the file or change sslmode to disable server certificate verification.

~ $ psql "sslmode=require user=postgres dbname=postgres password=postgres hostaddr="

Pager usage is off.

psql (11.4, server 9.6.11)

SSL connection (protocol: TLSv1.2, cipher: ECDHE-RSA-AES128-GCM-SHA256, bits: 128, compression: off)

Type "help" for help.

Attempting to connect with psql to an SSL-enforced instance returns a self-explanatory error:

~ $ psql "sslmode=require user=postgres dbname=postgres password=postgres hostaddr="

psql: FATAL:  connection requires a valid client certificate


  • Storage can be increased after instance creation but never decreased, so watch out for the costs associated with growing storage space, or configure an increase limit.
  • Storage is limited to 30 TB.


Instances can be created with less than one core; however, the option isn't available in the Cloud SQL Console, as the instance must be created by specifying one of the sample machine types, in this case with --tier:

Cloud SQL Console: shared-core (less than one CPU) instance setting is not available

Example of creating a shared-core instance using gcloud inside Cloud Shell:

Cloud Shell: creating a shared-core instance

The number of CPUs is limited to 64, a relatively low ceiling for large installations, considering that back when PostgreSQL 9.2 was benchmarked, high-end servers already started at 32 cores.

Instance Locations

Multi-regional location is only available for backups.

Access via Public IP

By default, the GCP Console Wizard enables only public IP address access, however, access is denied until the client’s network is configured:

Creating an instance: connectivity options


Updates may exceed the maintenance window, and read replicas can be updated at any time.

The documentation doesn't specify the duration of the maintenance window; the information is provided when creating the instance:

Maintenance window: one-hour duration

Changes to the CPU count, memory size, or the zone where the instance is located require the database to be offline for several minutes.


Cloud SQL uses the terms “role” and “user” interchangeably.

High Availability

In a highly available configuration the cost is double that of a standalone instance, and that includes storage.

Automatic failover is initiated about 60 seconds after the primary node becomes unavailable. According to the Oracle MAA report, this translates into a $5,800-per-minute loss. Considering that it takes 2 to 3 minutes for applications to reconnect, the effective outage doubles or triples. Additionally, the 60-second heartbeat interval doesn't appear to be configurable.
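As a back-of-the-envelope check of the figures above (the per-minute loss is the cited figure; the reconnect window is an assumption based on the observed behavior):

```python
# Rough outage-cost estimate; illustrative arithmetic only.
loss_per_minute = 5800        # USD per minute, cited from the Oracle MAA report
detection_minutes = 1         # ~60 s before failover is initiated
reconnect_minutes = (2, 3)    # assumed application reconnect window

for reconnect in reconnect_minutes:
    total = detection_minutes + reconnect
    print(total, "min ->", total * loss_per_minute, "USD")
```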


Read replicas cannot be accessed through a single endpoint; each receives its own IP address:

Read replicas: each instance receives an IP address

Regional persistent disks provide data redundancy at the cost of write performance.

Cloud SQL will not fail over to read replicas, hence read replicas cannot be considered a high availability solution.

External replicas and external masters are currently not supported.

Connecting to Instance

Google does not automatically renew the instance SSL certificates, however, both the initiation and rotation procedures can be automated.

If the application is built on the App Engine platform, additional limits apply, such as 60 seconds for a database request to complete and 60 concurrent connections for PHP applications. The “App Engine Limits” section in Quotas and limits provides more details:

AppEngine documentation: connectivity limits

IP addresses in the range are reserved.


Once started, operations cannot be canceled. Runaway queries can still be stopped using the built-in PostgreSQL functions pg_terminate_backend and pg_cancel_backend.

A short demonstration using two psql sessions, starting a long-running query in the second session:

postgres@35:5432 postgres> select now(); select pg_sleep(3600); select now();



2019-07-16 02:08:18.739177+00

(1 row)

In the first session, cancel the long running query:

postgres@35:5432 postgres> select pid, client_addr, client_port, query, backend_start from pg_stat_activity where usename = 'postgres';

-[ RECORD 1 ]-+-------------------------------------------------------------------------------------------------------------

pid           | 2182

client_addr   |

client_port   | 56208

query         | select pid, client_addr, client_port, query, backend_start from pg_stat_activity where usename = 'postgres';

backend_start | 2019-07-16 01:57:34.99011+00

-[ RECORD 2 ]-+-------------------------------------------------------------------------------------------------------------

pid           | 2263

client_addr   |

client_port   | 56276

query         | select pg_sleep(3600);

backend_start | 2019-07-16 02:07:43.860829+00

postgres@35:5432 postgres> select pg_cancel_backend(2263); select now();

-[ RECORD 1 ]-----+--

pg_cancel_backend | t

-[ RECORD 1 ]----------------------

now | 2019-07-16 02:09:09.600399+00

Comparing the timestamps between the two sessions:

ERROR:  canceling statement due to user request



2019-07-16 02:09:09.602573+00

(1 row)

It’s a match!

While restarting an instance is a recommended method for resolving database instance issues, avoid restarting again before the first restart has completed.

Data Import and Export

CSV import/export is limited to one database.

Exporting data as an SQL dump that can be imported later requires a custom pg_dump command.

To quote from the documentation:

pg_dump -U [USERNAME] --format=plain --no-owner --no-acl [DATABASE_NAME] \



Charges per charge type with the instance ON versus OFF

All actions are recorded and can be viewed under the Activity tab.


Review the Diagnosing Issues with Cloud SQL instances and Known issues sections in the documentation.


Although missing some important features the PostgreSQL DBA is used to, namely PITR and Logical Replication, Google Cloud SQL provides out-of-the-box high availability, replication, encryption, and automatic storage increase, just to name a few, making managed PostgreSQL an appealing solution for organizations looking to quickly deploy their PostgreSQL workloads or even migrate from Oracle.

Developers can take advantage of cheap instances such as shared-core machines (less than one CPU).

Google approaches PostgreSQL engine adoption in a conservative manner, with the stable offering lagging three versions behind current upstream.

Just as with any solution provider, consider getting support, which can come in handy in edge scenarios such as instances becoming suspended.

For professional support, Google maintains a list of partners, which currently includes one PostgreSQL professional services provider, namely EDB.

by Viorel Tabara at August 29, 2019 09:45 AM

August 28, 2019


An Overview of the Various Scan Methods in PostgreSQL

In any relational database engine, the goal is to generate the best possible plan, corresponding to an execution of the query that takes the least time and resources. Generally, all databases generate plans in a tree structure, where each leaf node of the plan tree is called a table scan node. This particular node of the plan corresponds to the algorithm used to fetch data from the base table.

For example, consider a simple query such as SELECT * FROM TBL1, TBL2 WHERE TBL2.ID > 1000; and suppose the generated plan is as below:

PostgreSQL Sample Plan Tree

So in the above plan tree, “Sequential Scan on TBL1” and “Index Scan on TBL2” correspond to the table scan methods on tables TBL1 and TBL2 respectively. As per this plan, TBL1 will be fetched sequentially from the corresponding pages and TBL2 will be accessed using an Index Scan.

Choosing the right scan method as part of the plan is very important in terms of overall query performance.

Before getting into all types of scan methods supported by PostgreSQL, let's review some key concepts that will come up frequently throughout the blog.

PostgreSQL Data Layout
  • HEAP: Storage area for the whole rows of the table. It is divided into multiple pages (as shown in the picture above), each 8KB in size by default. Within each page, each item pointer (e.g. 1, 2, ….) points to data within the page.
  • Index Storage: This storage holds only key values, i.e. the column values contained in the index. It is also divided into multiple pages, each 8KB in size by default.
  • Tuple Identifier (TID): A TID is a 6-byte number consisting of two parts: a 4-byte page number and a 2-byte tuple index inside the page. The combination of these two numbers uniquely identifies the storage location of a particular tuple.
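The two-part layout of a TID can be sketched as follows (the byte order and packing here are chosen for illustration; PostgreSQL's on-disk representation is an internal detail):

```python
import struct

def pack_tid(page: int, offset: int) -> bytes:
    """Pack a 4-byte page number and a 2-byte tuple index into 6 bytes."""
    return struct.pack(">IH", page, offset)

def unpack_tid(tid: bytes) -> tuple:
    """Recover (page, offset) from the 6-byte representation."""
    return struct.unpack(">IH", tid)

tid = pack_tid(115, 42)            # the ctid notation for this is (115,42)
print(len(tid), unpack_tid(tid))   # 6 (115, 42)
```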

Currently, PostgreSQL supports the following scan methods for reading data from a table:

  • Sequential Scan
  • Index Scan
  • Index Only Scan
  • Bitmap Scan
  • TID Scan

Each of these scan methods is useful depending on the query and other parameters, e.g. table cardinality, table selectivity, disk I/O cost, random I/O cost, sequential I/O cost, etc. Let's create a table and populate it with some data, which will be used frequently to better explain these scan methods.

postgres=# CREATE TABLE demotable (num numeric, id int);


postgres=# CREATE INDEX demoidx ON demotable(num);


postgres=# INSERT INTO demotable SELECT random() * 1000,  generate_series(1, 1000000);

INSERT 0 1000000

postgres=# analyze;


So in this example, one million records are inserted and then the table is analyzed so that all statistics are up to date.

Sequential Scan

As the name suggests, a Sequential Scan of a table is done by sequentially scanning all item pointers of all pages of the corresponding table. So if a table has 100 pages with 1000 records per page, a sequential scan fetches 100 * 1000 records and checks each against the isolation level and the predicate clause. So even if only 1 record matches the condition, the scan still has to read all 100K records to find it.

As per the above table and data, the following query will result in a sequential scan, as the majority of the data is being selected.

postgres=# explain SELECT * FROM demotable WHERE num < 21000;

                             QUERY PLAN


 Seq Scan on demotable  (cost=0.00..17989.00 rows=1000000 width=15)

   Filter: (num < '21000'::numeric)

(2 rows)
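The 17989.00 total cost above can be reconstructed from the planner's default cost parameters. A sketch, assuming the sample table occupies 5489 heap pages (the exact page count will vary between systems):

```python
# Planner defaults: seq_page_cost, cpu_tuple_cost, cpu_operator_cost;
# one comparison operator appears in the filter clause.
seq_page_cost = 1.0
cpu_tuple_cost = 0.01
cpu_operator_cost = 0.0025

pages, rows = 5489, 1_000_000
total = pages * seq_page_cost + rows * (cpu_tuple_cost + cpu_operator_cost)
print(total)   # 17989.0
```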


Without calculating and comparing plan costs it is almost impossible to tell which kind of scan will be used, but for a sequential scan to be chosen, at least the below criteria should match:

  1. No index is available on any key that is part of the predicate.
  2. The majority of rows is being fetched as part of the SQL query.


If only a small percentage of the rows is being fetched and the predicate is on one (or more) columns, then evaluate performance with and without an index.

Index Scan

Unlike a Sequential Scan, an Index Scan does not fetch all records sequentially. Rather, it uses a data structure (depending on the type of index) corresponding to the index involved in the query to locate the required data (as per the predicate) with minimal scans. The entry found using the index then points directly to the data in the heap area (as shown in the figure above), which is fetched to check visibility as per the isolation level. So an index scan has two steps:

  • Fetch the data from the index-related data structure. It returns the TID of the corresponding data in the heap.
  • Then the corresponding heap page is directly accessed to get the whole tuple. This additional step is required for the below reasons:
    • The query might request more columns than are available in the corresponding index.
    • Visibility information is not maintained along with the index data, so in order to check the visibility of data as per the isolation level, it needs to access the heap data.

Now we may wonder why we don't always use an Index Scan if it is so efficient. As we know, everything comes with a cost. Here the cost involved relates to the type of I/O being done. In the case of an Index Scan, random I/O is involved, as for each record found in index storage it has to fetch the corresponding data from heap storage, whereas in the case of a Sequential Scan, sequential I/O is involved, which takes roughly just 25% of the random I/O timing.

So an Index Scan should be chosen only if the overall gain outweighs the overhead incurred by the random I/O cost.
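The 25% figure corresponds to the planner's default page costs (seq_page_cost = 1, random_page_cost = 4), which imply a simple break-even point; a sketch:

```python
# With the default page costs, one random page fetch is weighted like
# four sequential ones, so touching ~1/4 of the table's pages at random
# already costs as much I/O as reading the whole table sequentially.
seq_page_cost, random_page_cost = 1.0, 4.0

total_pages = 1000
seq_cost = total_pages * seq_page_cost    # full sequential scan
random_cost = 250 * random_page_cost      # 1/4 of the pages, at random
print(seq_cost, random_cost)              # 1000.0 1000.0
```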

As per the above table and data, the following query will result in an index scan, as only one record is being selected. So the random I/O is low, and finding the corresponding record is quick.

postgres=# explain SELECT * FROM demotable WHERE num = 21000;

                                QUERY PLAN


 Index Scan using demoidx on demotable  (cost=0.42..8.44 rows=1 width=15)

   Index Cond: (num = '21000'::numeric)

(2 rows)

Index Only Scan

An Index Only Scan is similar to an Index Scan except for the second step, i.e. as the name implies, it only scans the index data structure. There are two additional preconditions for choosing an Index Only Scan over an Index Scan:

  • The query should fetch only key columns which are part of the index.
  • All tuples (records) on the selected heap page should be visible. As discussed in the previous section, the index data structure does not maintain visibility information, so in order to select data only from the index we must avoid the visibility check, which is possible only if all data on that page is known to be visible.
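The second precondition can be modeled with a small sketch (an illustrative model of the visibility map, not PostgreSQL internals): every matching tuple on a page that is not marked all-visible still forces a heap visit, which is why EXPLAIN ANALYZE reports “Heap Fetches” for index-only scans.

```python
# page -> all-visible flag (what the visibility map tracks per heap page)
all_visible = {0: True, 1: False, 2: True}
# TIDs of the tuples matched in the index: (page, offset)
matching_tids = [(0, 1), (0, 7), (1, 3), (2, 5)]

# Only tuples on not-all-visible pages require a heap visit
heap_fetches = sum(1 for page, _ in matching_tids if not all_visible[page])
print(heap_fetches)   # 1
```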

The following query will result in an index only scan. Even though it selects a similar number of records, only the key field (i.e. “num”) is selected, so it will choose an Index Only Scan.

postgres=# explain SELECT num FROM demotable WHERE num = 21000;

                                  QUERY PLAN


Index Only Scan using demoidx on demotable  (cost=0.42..8.44 rows=1 Width=11)

   Index Cond: (num = '21000'::numeric)

(2 rows)

Bitmap Scan

A Bitmap Scan is a mix of Index Scan and Sequential Scan. It tries to address the disadvantage of the Index Scan while keeping its full advantage. As discussed above, for each entry found in the index data structure, an index scan needs to find the corresponding data in a heap page; it alternates between fetching an index page and a heap page, which causes a lot of random I/O. The bitmap scan method leverages the benefit of the index scan without the random I/O. This works in two levels, as below:

  • Bitmap Index Scan: First it fetches all index data from the index data structure and creates a bitmap of all TIDs. For a simple understanding, you can consider this bitmap to contain a hash of all pages (hashed by page number), where each page entry contains an array of all offsets within that page.
  • Bitmap Heap Scan: As the name implies, it reads through the bitmap of pages and then scans the data from the heap corresponding to each stored page and offset. At the end, it checks visibility and the predicate, and returns the tuple based on the outcome of all these checks.
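The two phases can be illustrated with a small model (names and data are illustrative, not PostgreSQL's actual bitmap implementation): TIDs collected from the index are grouped by page, and the heap is then visited in page order, so each page is read once rather than once per matching tuple.

```python
from collections import defaultdict

# "Bitmap Index Scan" phase: group the matching TIDs (page, offset) by page.
index_tids = [(8, 3), (2, 7), (8, 1), (2, 4), (5, 9)]
bitmap = defaultdict(set)
for page, offset in index_tids:
    bitmap[page].add(offset)

# "Bitmap Heap Scan" phase: visit the heap pages in page order, once each.
for page in sorted(bitmap):
    print(page, sorted(bitmap[page]))
```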

The query below will result in a Bitmap Scan, as it selects too many records for an index scan, but at the same time too few for a sequential scan.

postgres=# explain SELECT * FROM demotable WHERE num < 210;

                                  QUERY PLAN


 Bitmap Heap Scan on demotable  (cost=5883.50..14035.53 rows=213042 width=15)

   Recheck Cond: (num < '210'::numeric)

   ->  Bitmap Index Scan on demoidx  (cost=0.00..5830.24 rows=213042 width=0)

      Index Cond: (num < '210'::numeric)

(4 rows)

Now consider the query below, which selects the same number of records but only key fields (i.e. only index columns). Since it selects only keys, it does not need to refer to heap pages for other parts of the data, and hence there is no random I/O involved. So this query will choose an Index Only Scan instead of a Bitmap Scan.

postgres=# explain SELECT num FROM demotable WHERE num < 210;

                                   QUERY PLAN


 Index Only Scan using demoidx on demotable  (cost=0.42..7784.87 rows=208254 width=11)

   Index Cond: (num < '210'::numeric)

(2 rows)

TID Scan

A TID, as mentioned above, is a 6-byte number consisting of a 4-byte page number and a 2-byte tuple index inside the page. The TID Scan is a very specific kind of scan in PostgreSQL and gets selected only if there is a TID in the query predicate. Consider the query below demonstrating a TID Scan:

postgres=# select ctid from demotable where id=21000;




(1 row) 

postgres=# explain select * from demotable where ctid='(115,42)';

                        QUERY PLAN


 Tid Scan on demotable  (cost=0.00..4.01 rows=1 width=15)

   TID Cond: (ctid = '(115,42)'::tid)

(2 rows)

So here in the predicate, instead of giving an exact column value as the condition, the TID is provided. This is similar to ROWID-based searches in Oracle.


All of the scan methods above are widely used and well known, and are available in almost all relational databases. But there is another scan method recently under discussion in the PostgreSQL community, and recently added to other relational databases. It is called “Loose IndexScan” in MySQL, “Index Skip Scan” in Oracle and “Jump Scan” in DB2.

This scan method is used for a specific scenario where distinct values of the leading key column of a B-Tree index are selected. As part of this scan, it avoids traversing all equal key column values; rather, it traverses just the first occurrence of each unique value and then jumps to the next bigger one.

This work is still in progress in PostgreSQL under the tentative name “Index Skip Scan”, and we may expect to see it in a future release.
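The jump idea can be sketched over a plain sorted list standing in for the leading index column (an illustrative model, not the proposed PostgreSQL implementation):

```python
from bisect import bisect_right

keys = [1, 1, 1, 2, 2, 3, 3, 3, 3, 7]   # sorted leading key column

distinct, i = [], 0
while i < len(keys):
    distinct.append(keys[i])              # first occurrence of a new value
    i = bisect_right(keys, keys[i], i)    # jump past all equal entries
print(distinct)   # [1, 2, 3, 7]
```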

by Kumar Rajeev Rastogi at August 28, 2019 09:45 AM

August 27, 2019

Federico Razzoli

READ ONLY transactions in MySQL

MySQL transactions can be read only. Here's how to use them, and why they are useful.

by Federico Razzoli at August 27, 2019 11:24 AM


Tips for Storing PostgreSQL Backups on Amazon AWS

Data is probably one of the most valuable assets in a company. Because of this we should always have a Disaster Recovery Plan (DRP) to prevent data loss in the event of an accident or hardware failure. 

A backup is the simplest form of DR, however it might not always be enough to guarantee an acceptable Recovery Point Objective (RPO). It is recommended that you have at least three backups stored in different physical places. 

Best practice dictates that of these backup files, one should be stored locally on the database server (for a faster recovery), another one on a centralized backup server, and the last one in the cloud.

For this blog, we’ll take a look at which options Amazon AWS provides for the storage of PostgreSQL backups in the cloud and we’ll show some examples on how to do it.

About Amazon AWS

Amazon AWS is one of the world’s most advanced cloud providers in terms of features and services, with millions of customers. If we want to run our PostgreSQL databases on Amazon AWS we have some options...

  • Amazon RDS: It allows us to create, manage and scale a PostgreSQL database (or different database technologies) in the cloud in an easy and fast way.

  • Amazon Aurora: It’s a PostgreSQL compatible database built for the cloud. According to the AWS web site, it’s three times faster than standard PostgreSQL databases.

  • Amazon EC2: It’s a web service that provides resizable compute capacity in the cloud. It provides you with complete control of your computing resources and allows you to set up and configure everything about your instances from your operating system up to your applications.

But, in fact, we don’t need to have our databases running on Amazon to store our backups there.

Storing Backups on Amazon AWS

There are different options to store our PostgreSQL backup on AWS. If we’re running our PostgreSQL database on AWS we have more options and (as we’re in the same network) it could also be faster. Let’s see how AWS can help us store our backups.


First, let’s prepare our environment to test the different AWS options. For our examples, we’ll use an On-prem PostgreSQL 11 server, running on CentOS 7. Here, we need to install the AWS CLI following the instructions from this site.
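The install instructions offer several paths; a minimal sketch for CentOS 7 (assuming the pip-installed v1 client, which matches the version string shown in the next step — adjust to your environment):

```bash
# Install pip from EPEL, then the v1 AWS CLI
sudo yum install -y epel-release
sudo yum install -y python-pip
sudo pip install awscli
aws --version
```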

When we have our AWS CLI installed, we can test it from the command line:

[root@PG1bkp ~]# aws --version

aws-cli/1.16.225 Python/2.7.5 Linux/4.15.18-14-pve botocore/1.12.215

Now, the next step is to configure our new client running the aws command with the configure option.

[root@PG1bkp ~]# aws configure

AWS Access Key ID [None]: AKIA7TMEO21BEBR1A7HR

AWS Secret Access Key [None]: SxrCECrW/RGaKh2FTYTyca7SsQGNUW4uQ1JB8hRp

Default region name [None]: us-east-1

Default output format [None]:

To get this information, you can go to the IAM AWS Section and check the current user, or if you prefer, you can create a new one for this task.

After this, we’re ready to use the AWS CLI to access our Amazon AWS services.

Amazon S3

This is probably the most commonly used option to store backups in the cloud. Amazon S3 can store and retrieve any amount of data from anywhere on the Internet. It’s a simple storage service that offers an extremely durable, highly available, and infinitely scalable data storage infrastructure at low costs.

Amazon S3 provides a simple web service interface which you can use to store and retrieve any amount of data, at any time, from anywhere on the web, and (with the AWS CLI or AWS SDK) you can integrate it with different systems and programming languages.

How to use it

Amazon S3 uses Buckets. They are unique containers for everything that you store in Amazon S3. So, the first step is to access the Amazon S3 Management Console and create a new Bucket.

Create Bucket Amazon AWS

In the first step, we just need to add the Bucket name and the AWS Region.

Create Bucket Amazon AWS

Now, we can configure some details about our new Bucket, like versioning and logging.

Block Public Access Bucket Amazon AWS

And then, we can specify the permissions for this new Bucket.

S3 Buckets Amazon AWS

Now that we have our Bucket created, let’s see how we can use it to store our PostgreSQL backups.

First, let’s test our client connecting it to S3.

[root@PG1bkp ~]# aws s3 ls

2019-08-23 19:29:02 s9stesting1

It works! With the previous command, we list the current Buckets created.

So, now, we can just upload the backup to the S3 service. For this, we can use the aws s3 sync or aws s3 cp commands.

[root@PG1bkp ~]# aws s3 sync /root/backups/BACKUP-5/ s3://s9stesting1/backups/

upload: backups/BACKUP-5/cmon_backup.metadata to s3://s9stesting1/backups/cmon_backup.metadata

upload: backups/BACKUP-5/cmon_backup.log to s3://s9stesting1/backups/cmon_backup.log

upload: backups/BACKUP-5/base.tar.gz to s3://s9stesting1/backups/base.tar.gz

[root@PG1bkp ~]# 

[root@PG1bkp ~]# aws s3 cp /root/backups/BACKUP-6/pg_dump_2019-08-23_205919.sql.gz s3://s9stesting1/backups/

upload: backups/BACKUP-6/pg_dump_2019-08-23_205919.sql.gz to s3://s9stesting1/backups/pg_dump_2019-08-23_205919.sql.gz

[root@PG1bkp ~]# 

We can check the Bucket content from the AWS web site.

S3 Overview

Or even by using the AWS CLI.

[root@PG1bkp ~]# aws s3 ls s3://s9stesting1/backups/

2019-08-23 19:29:31          0

2019-08-23 20:58:36    2974633 base.tar.gz

2019-08-23 20:58:36       1742 cmon_backup.log

2019-08-23 20:58:35       2419 cmon_backup.metadata

2019-08-23 20:59:52       1028 pg_dump_2019-08-23_205919.sql.gz

For more information about AWS S3 CLI, you can check the official AWS documentation.
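To keep the Bucket current without manual uploads, the sync command above can be scheduled; a hypothetical /etc/crontab entry (the schedule and log path are our assumptions):

```bash
# Sync the local backup directory to S3 every night at 01:00
0 1 * * * root /usr/bin/aws s3 sync /root/backups/ s3://s9stesting1/backups/ >> /var/log/s3-backup.log 2>&1
```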

Amazon S3 Glacier

This is the lower-cost version of Amazon S3. The main difference between them is velocity and accessibility. You can use Amazon S3 Glacier if the cost of storage needs to stay low and you don’t require millisecond access to your data. Usage is another important difference between them.

How to use it

Instead of Buckets, Amazon S3 Glacier uses Vaults. A Vault is a container for storing any object. So, the first step is to access the Amazon S3 Glacier Management Console and create a new Vault.

Create Vault S3 Glacier

Here, we need to add the Vault Name and the Region and, in the next step, we can enable event notifications, which use the Amazon Simple Notification Service (Amazon SNS).

Now that we have our Vault created, we can access it from the AWS CLI.

[root@PG1bkp ~]# aws glacier describe-vault --account-id - --vault-name s9stesting2

{
    "SizeInBytes": 0,
    "VaultARN": "arn:aws:glacier:us-east-1:984227183428:vaults/s9stesting2",
    "NumberOfArchives": 0,
    "CreationDate": "2019-08-23T21:08:07.943Z",
    "VaultName": "s9stesting2"
}

It’s working. So now, we can upload our backup here.

[root@PG1bkp ~]# aws glacier upload-archive --body /root/backups/BACKUP-6/pg_dump_2019-08-23_205919.sql.gz --account-id - --archive-description "Backup upload test" --vault-name s9stesting2

{
    "archiveId": "ddgCJi_qCJaIVinEW-xRl4I_0u2a8Ge5d2LHfoFBlO6SLMzG_0Cw6fm-OLJy4ZH_vkSh4NzFG1hRRZYDA-QBCEU4d8UleZNqsspF6MI1XtZFOo_bVcvIorLrXHgd3pQQmPbxI8okyg",
    "checksum": "258faaa90b5139cfdd2fb06cb904fe8b0c0f0f80cba9bb6f39f0d7dd2566a9aa",
    "location": "/984227183428/vaults/s9stesting2/archives/ddgCJi_qCJaIVinEW-xRl4I_0u2a8Ge5d2LHfoFBlO6SLMzG_0Cw6fm-OLJy4ZH_vkSh4NzFG1hRRZYDA-QBCEU4d8UleZNqsspF6MI1XtZFOo_bVcvIorLrXHgd3pQQmPbxI8okyg"
}
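The checksum in the response is Glacier's SHA-256 tree hash of the archive: the payload is split into 1 MiB chunks, each chunk is hashed, and the digests are combined pairwise into a single root hash. If you want to verify an upload locally, a sketch (the helper name is ours, not part of any AWS SDK):

```python
import hashlib

MiB = 1024 * 1024

def glacier_tree_hash(data: bytes) -> str:
    """SHA-256 tree hash as Glacier computes it: hash 1 MiB chunks,
    then combine digests pairwise until one root digest remains."""
    chunks = [data[i:i + MiB] for i in range(0, len(data), MiB)] or [b""]
    level = [hashlib.sha256(c).digest() for c in chunks]
    while len(level) > 1:
        nxt = []
        for i in range(0, len(level), 2):
            if i + 1 < len(level):
                nxt.append(hashlib.sha256(level[i] + level[i + 1]).digest())
            else:
                nxt.append(level[i])  # an odd trailing digest is promoted as-is
        level = nxt
    return level[0].hex()
```

For an archive smaller than 1 MiB, the tree hash equals the plain SHA-256 of the file, which makes for an easy sanity check against `sha256sum`.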
One important thing to note is that the Vault status is updated about once per day, so we may have to wait before the uploaded file shows up.

[root@PG1bkp ~]# aws glacier describe-vault --account-id - --vault-name s9stesting2

{
    "SizeInBytes": 33796,
    "VaultARN": "arn:aws:glacier:us-east-1:984227183428:vaults/s9stesting2",
    "LastInventoryDate": "2019-08-24T06:37:02.598Z",
    "NumberOfArchives": 1,
    "CreationDate": "2019-08-23T21:08:07.943Z",
    "VaultName": "s9stesting2"
}

Here we have our file uploaded on our S3 Glacier Vault.

For more information about AWS Glacier CLI, you can check the official AWS documentation.


Amazon EC2

This backup storage option is the most expensive and time-consuming one, but it’s useful if you want full control over the backup storage environment and wish to perform custom tasks on the backups (e.g. Backup Verification.)

Amazon EC2 (Elastic Compute Cloud) is a web service that provides resizable compute capacity in the cloud. It provides you with complete control of your computing resources and allows you to set up and configure everything about your instances from your operating system up to your applications. It also allows you to quickly scale capacity, both up and down, as your computing requirements change.

Amazon EC2 supports different operating systems like Amazon Linux, Ubuntu, Windows Server, Red Hat Enterprise Linux, SUSE Linux Enterprise Server, Fedora, Debian, CentOS, Gentoo Linux, Oracle Linux, and FreeBSD.

How to use it

Go to the Amazon EC2 section, and press on Launch Instance. In the first step, you must choose the EC2 instance operating system.

EC2 Choose an Amazon Machine Image (AMI)

In the next step, you must choose the resources for the new instance.

Choose an Instance Type AWS

Then, you can specify more detailed configuration like network, subnet, and more.

Configure Instance Details - AWS

Now, we can add more storage capacity on this new instance, and as a backup server, we should do it.

Add Storage AWS

When we finish the creation task, we can go to the Instances section to see our new EC2 instance.

Launch AWS EC2 Instance

When the instance is ready (Instance State: running), you can store the backups there, for example by sending them via SSH or FTP using the Public DNS created by AWS. Let’s see an example with Rsync and another one with the SCP Linux command.

[root@PostgreSQL1 ~]# rsync -avzP -e "ssh -i /home/user/key1.pem" /root/backups/BACKUP-11/base.tar.gz <user>@<EC2-Public-DNS>:/backups/

sending incremental file list

base.tar.gz
      4,091,563 100%    2.18MB/s    0:00:01 (xfr#1, to-chk=0/1)

sent 3,735,675 bytes  received 35 bytes  574,724.62 bytes/sec

total size is 4,091,563  speedup is 1.10

[root@PostgreSQL1 ~]# 

[root@PostgreSQL1 ~]# scp -i /tmp/key1.pem /root/backups/BACKUP-12/pg_dump_2019-08-25_211903.sql.gz <user>@<EC2-Public-DNS>:/backups/

pg_dump_2019-08-25_211903.sql.gz                                              100%   24KB  76.4KB/s   00:00

AWS Backup

AWS Backup is a centralized backup service that provides you with backup management capabilities, such as backup scheduling, retention management, and backup monitoring, as well as additional features, such as lifecycling backups to a low-cost storage tier, backup storage, and encryption that is independent of its source data, and backup access policies.

You can use AWS Backup to manage backups of EBS volumes, RDS databases, DynamoDB tables, EFS file systems, and Storage Gateway volumes.

How to use it

Go to the AWS Backup section on the AWS Management Console.

AWS Backup

Here you have different options, such as Schedule, Create or Restore a backup. Let’s see how to create a new backup.

Create On Demand Backup AWS Backup

In this step, we must choose the Resource Type, which can be DynamoDB, RDS, EBS, EFS or Storage Gateway, and specify details like the expiration date, backup vault, and the IAM Role.

AWS Backup Jobs

Then, we can see the new job created in the AWS Backup Jobs section.


Amazon Snapshots

Now, we can mention this well-known option available in all virtualization environments. A snapshot is a backup taken at a specific point in time, and AWS allows us to use it for its products. Let’s see an example of an RDS snapshot.

AWS DB Snapshot

We only need to choose the instance and add the snapshot name, and that’s it. We can see this and the previous snapshot in the RDS Snapshot section.

Amazon RDS Snapshots

Managing Your Backups with ClusterControl

ClusterControl is a comprehensive management system for open source databases that automates deployment and management functions, as well as health and performance monitoring. ClusterControl supports deployment, management, monitoring and scaling for different database technologies and environments, EC2 included. So, we can, for example, create our EC2 instance on AWS, and deploy/import our database service with ClusterControl.

ClusterControl Database Clusters

Creating a Backup

For this task, go to ClusterControl -> Select Cluster -> Backup -> Create Backup.

ClusterControl Create Backup

We can create a new backup or configure a scheduled one. For our example, we’ll create a single backup instantly.

ClusterControl Create Backup Details

We must choose one method, the server from which the backup will be taken, and where we want to store the backup. We can also upload our backup to the cloud (AWS, Google or Azure) by enabling the corresponding button.

ClusterControl Create Backup Settings

Then we specify the use of compression, the compression level, encryption and retention period for our backup.

ClusterControl Create Backup Cloud Settings

If we enabled the upload backup to the cloud option, we’ll see a section to specify the cloud provider (in this case AWS) and the credentials (ClusterControl -> Integrations -> Cloud Providers). For AWS, it uses the S3 service, so we must select a Bucket or even create a new one to store our backups.

ClusterControl Backup Overview

In the backup section, we can see the progress of the backup, and information like method, size, location, and more.


Conclusion

Amazon AWS allows us to store our PostgreSQL backups, whether we’re using it as a database cloud provider or not. To have an effective backup plan you should consider storing at least one database backup copy in the cloud to avoid data loss in the event of hardware failure in another backup store. The cloud lets you store as many backups as you want to store or pay for.

by Sebastian Insausti at August 27, 2019 09:45 AM

August 23, 2019


Comparing Galera Cluster Cloud Offerings: Part One Amazon AWS

A MySQL Galera Cluster (whether the Percona, MariaDB, or Codership build) is, unfortunately, not among the databases supported by Amazon RDS. Most of the databases supported by RDS use asynchronous replication, while Galera Cluster is a synchronous multi-master replication solution. Galera also requires InnoDB as its storage engine to function properly; while you can use other storage engines such as MyISAM, it is not advised because of their lack of transaction handling.

Because of the lack of support natively in RDS, this blog will focus on the offerings available when choosing and hosting your Galera-based cluster using an AWS environment.

There are certainly many reasons why you would choose or not choose the AWS cloud platform, but for this particular topic we’re going to go over the advantages and benefits of what you can leverage rather than why you would choose the AWS Platform.

The Virtual Servers (Elastic Compute Instances)

As mentioned earlier, MySQL Galera is not part of RDS, and InnoDB is a transactional storage engine for which you need the right resources for your application's requirements. It must have the capacity to serve the demand of your client request traffic. At the time of this article, your sole choice for running Galera Cluster is EC2, Amazon's compute instance cloud offering.

Because you have the advantage of running your system on a number of EC2 instances, running a Galera Cluster on EC2 versus on-prem doesn’t differ much. You can access the server remotely via SSH, install your desired software packages, and choose the kind of Galera Cluster build you’d like to utilize.

Moreover, with EC2 this offering is more elastic and flexible, allowing you to deliver a simpler, more granular setup. You can take advantage of the web services to automate the building of a number of nodes if you need to scale out your environment, or, for example, to automate the building of your staging or development environment. It also gives you an edge to quickly build your desired environment, choose and set up your desired OS, and pick the right computing resources that fit your requirements (such as CPU, memory, and disk storage). EC2 eliminates the time spent waiting for hardware, since you can do this on the fly. You can also leverage the AWS CLI tool to automate your Galera Cluster setup.
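As a hypothetical example of that kind of automation (the AMI, key pair, security group, and subnet IDs below are placeholders you would replace with your own):

```bash
# Launch three identical instances to serve as Galera nodes
aws ec2 run-instances \
    --image-id ami-xxxxxxxx \
    --count 3 \
    --instance-type r5.large \
    --key-name my-keypair \
    --security-group-ids sg-xxxxxxxx \
    --subnet-id subnet-xxxxxxxx \
    --tag-specifications 'ResourceType=instance,Tags=[{Key=Name,Value=galera-node}]'
```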

Pricing for Amazon EC2 Instances

EC2 offers a number of selections which are very flexible for consumers who would like to host their Galera Cluster environment on AWS compute nodes. The AWS Free Tier includes 750 hours of Linux and Windows t2.micro instances, each month, for one year. You can stay within the Free Tier by using only EC2 Micro instances, but this might not be the best thing for production use. 

There are multiple types of EC2 instances you can deploy when provisioning your Galera nodes. The r4/r5/x1 families (memory optimized) and the c4/c5 families (compute optimized) are an ideal choice, and their prices differ depending on how large your server resource needs are and on the type of OS.

These are the types of paid instances you can choose...

On Demand 

Pay for compute capacity (per-hour or per-second) depending on the type of instances you run. For example, prices may differ when provisioning an Ubuntu instance versus an RHEL instance, aside from the instance type. There are no long-term commitments or upfront payments needed, and you have the flexibility to increase or decrease your compute capacity. These instances are recommended for low-cost and flexible environment needs, like applications with short-term, spiky, or unpredictable workloads that cannot be interrupted, or applications being developed or tested on Amazon EC2 for the first time. Check it out here for more info.

Dedicated Hosts

If you are looking for compliance and regulatory requirements such as the need to acquire a dedicated server that runs on a dedicated hardware for use, this type of offer suits your needs. Dedicated Hosts can help you address compliance requirements and reduce costs by allowing you to use your existing server-bound software license, including Windows Server, SQL Server, SUSE Linux Enterprise Server, Red Hat Enterprise Linux, or other software licenses that are bound to VMs, sockets, or physical cores, subject to your license terms. It can be purchased On-Demand (hourly) or as a Reservation for up to 70% off the On-Demand price. Check it out here for more info.

Spot Instances

These instances allow you to request spare Amazon EC2 computing capacity for up to 90% off the On-Demand price. This is recommended for applications that have flexible start and end times, applications that are only feasible at very low compute prices, or users with urgent computing needs for large amounts of additional capacity. Check it out here for more info.

Reserved Instances

This type of payment offer provides you the option to grab up to a 75% discount and, depending on which instance you would like to reserve, you can acquire a capacity reservation giving you additional confidence in your ability to launch instances when you need them. This is recommended if your applications have steady state or predictable usage, applications that may require reserved capacity, or customers that can commit to using EC2 over a 1 or 3 year term to reduce their total computing costs. Check it out here for more info.

Pricing Note

One last thing about EC2: it also offers per-second billing, which takes the cost of unused minutes and seconds in an hour off of the bill. This is advantageous if you are scaling out for a minimal amount of time, just to handle request traffic to a Galera node, or in case you want to test a specific node for a limited time.
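As a back-of-the-envelope illustration of why that matters for short bursts (the hourly rate below is an assumption; real prices vary by instance type, OS, and region):

```python
HOURLY_RATE = 0.096  # hypothetical USD/hour for one instance

def cost_hourly_billing(seconds: int, rate: float = HOURLY_RATE) -> float:
    """Hourly billing: usage is rounded up to whole hours."""
    hours = -(-seconds // 3600)  # ceiling division
    return hours * rate

def cost_per_second_billing(seconds: int, rate: float = HOURLY_RATE) -> float:
    """Per-second billing: pay only for the seconds used
    (AWS applies a one-minute minimum per instance run)."""
    return max(seconds, 60) / 3600 * rate
```

Running a burst node for 10 minutes costs a full hour under hourly billing, but only a sixth of that under per-second billing.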

Database Encryption on AWS

If you're concerned about the confidentiality of your data, or about abiding by the laws required for your security compliance and regulations, AWS offers data-at-rest encryption. If you're using MariaDB Cluster version 10.2+, there is built-in plugin support to interface with the Amazon Web Services (AWS) Key Management Service (KMS) API. This allows you to take advantage of the AWS KMS key management service to facilitate separation of responsibilities and remote logging & auditing of key access requests. Rather than storing the encryption key in a local file, this plugin keeps the master key in AWS KMS.

When you first start MariaDB, the AWS KMS plugin will connect to the AWS Key Management Service and ask it to generate a new key. MariaDB will store that key on-disk in an encrypted form. The key stored on-disk cannot be used to decrypt the data; rather, on each startup, MariaDB connects to AWS KMS and has the service decrypt the locally-stored key(s). The decrypted key is stored in-memory as long as the MariaDB server process is running, and that in-memory decrypted key is used to encrypt the local data.

Alternatively, when deploying your EC2 instances, you can encrypt your data storage volume with EBS (Elastic Block Storage) or encrypt the instance itself. Encryption is supported for all EBS volume types; it might have a performance impact, but the added latency is minimal or even invisible to end users. For EC2 instance-type encryption, most of the large instances are supported, so if you're using compute or memory optimized nodes, you can leverage encryption.

Below is the list of supported instance types...

  • General purpose: A1, M3, M4, M5, M5a, M5ad, M5d, T2, T3, and T3a
  • Compute optimized: C3, C4, C5, C5d, and C5n
  • Memory optimized: cr1.8xlarge, R3, R4, R5, R5a, R5ad, R5d, u-6tb1.metal, u-9tb1.metal, u-12tb1.metal, X1, X1e, and z1d
  • Storage optimized: D2, h1.2xlarge, h1.4xlarge, I2, and I3
  • Accelerated computing: F1, G2, G3, P2, and P3

You can set up your AWS account to always enable encryption upon deployment of your EC2-type instances. This means that AWS will encrypt new EBS volumes on launch and encrypt new copies of unencrypted snapshots.
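Assuming you already have the AWS CLI configured, that account-level default can be enabled per region like so (the region is an example):

```bash
# Make all newly created EBS volumes in this region encrypted by default
aws ec2 enable-ebs-encryption-by-default --region us-east-1
# Confirm the setting
aws ec2 get-ebs-encryption-by-default --region us-east-1
```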

Multi-AZ/Multi-Region/Multi-Cloud Deployments

Unfortunately, as of this writing, there's no direct support in the AWS Console (nor in any of the AWS APIs) for Multi-AZ/-Region/-Cloud deployments of Galera node clusters.

High Availability, Scalability, and Redundancy

To achieve a multi-AZ deployment, it's recommended that you provision your Galera nodes in different availability zones. This prevents the cluster from going down or malfunctioning due to lack of quorum.

You can also set up AWS Auto Scaling and create an auto scaling group to monitor and perform status checks so your cluster always has redundancy, scalability, and high availability. Auto Scaling should solve your problem in the case that your node goes down for some unknown reason.

For multi-region or multi-cloud deployment, Galera has its own parameter called gmcast.segment, which you can set upon server start. This parameter is designed to optimize the communication between the Galera nodes and minimize the amount of traffic sent between network segments, including writeset relaying and IST and SST donor selection.

This type of setup allows you to deploy multiple nodes in different regions for your Galera Cluster. Aside from that, you can also deploy your Galera nodes on a different vendor, for example, if it's hosted in Google Cloud and you want redundancy on Microsoft Azure. 
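A minimal my.cnf sketch of this setting (segment numbers are arbitrary labels you assign; nodes sharing a location should share a segment):

```ini
[mysqld]
# e.g. segment 1 = AWS us-east-1 nodes, segment 2 = nodes in another region/cloud
wsrep_provider_options="gmcast.segment=1"
```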

I would recommend you to check out our blog Multiple Data Center Setups Using Galera Cluster for MySQL or MariaDB and Zero Downtime Network Migration With MySQL Galera Cluster Using Relay Node to gather more information on how to implement these types of deployments.

Database Performance on AWS

It depends on your application's demand: if your queries are memory-consuming, the memory optimized instances are your ideal choice. If your application has higher transaction rates that require high performance for web servers or batch processing, then choose the compute optimized instances. If you want to learn more about optimizing your Galera Cluster, you can check out this blog How to Improve Performance of Galera Cluster for MySQL or MariaDB.

Database Backups on AWS

Creating backups can be difficult since there's no direct support within AWS that is specific to MySQL Galera technology. However, AWS provides a disaster recovery solution using EBS Snapshots. You can take snapshots of the EBS volumes attached to your instance, then either schedule backups using CloudWatch or use Amazon Data Lifecycle Manager (Amazon DLM) to automate the snapshots.
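A one-off snapshot of a node's data volume can also be taken from the CLI (the volume ID is a placeholder); for a consistent backup you would typically quiesce writes on that node first, or snapshot a desynced node:

```bash
# Snapshot the EBS volume that holds the MySQL datadir
aws ec2 create-snapshot \
    --volume-id vol-xxxxxxxx \
    --description "galera-node1 nightly snapshot"
```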

Take note that the snapshots taken are incremental backups, which means that only the blocks on the device that have changed after your most recent snapshot are saved. You can store these snapshots in AWS S3 to save storage costs. Alternatively, you can use external tools like Percona XtraBackup or Mydumper (for logical backups) and store these to AWS EFS -> AWS S3 -> AWS Glacier.

You can also setup Lifecycle Management in AWS if you need your backup data to be stored in a more cost efficient manner. If you have large files and are going to utilize the AWS EFS, you can leverage their AWS Backup solution as this is also a simple yet cost-effective solution.

On the other hand, you can also use external services (such as ClusterControl) which provide both monitoring and backup solutions. Check this out if you want to know more.

Database Monitoring on AWS

AWS offers health checks and some status checks to provide you visibility into your Galera nodes. This is done through CloudWatch and CloudTrail.

CloudTrail lets you enable and inspect logs and perform audits based on what actions have been taken.

CloudWatch lets you collect and track metrics, collect and monitor log files, and set custom alarms. You can set it up according to your custom needs and gain system-wide visibility into resource utilization, application performance, and operational health. CloudWatch comes with a free tier as long as you still fall within its limits (See the screenshot below.)

CloudWatch also comes with a price depending on the volume of metrics being distributed. Check out its current pricing here.

Take note: there's a downside to using CloudWatch. It is not designed to monitor database health, especially MySQL Galera cluster nodes. Alternatively, you can use external tools that offer high-resolution graphs or charts that are useful in reporting and are easier to analyze when diagnosing a problematic node.

For this you can use PMM by Percona, DataDog, Idera, VividCortex, or our very own ClusterControl (as monitoring is FREE with ClusterControl Community.) I would recommend that you use a monitoring tool that suits your needs based on your individual application requirements. It's very important that your monitoring tool be able to notify you aggressively or provide you integration for instant messaging systems such as Slack, PagerDuty or even send you SMS when escalating severe health status.

Database Security on AWS

Securing your EC2 instances is one of the most vital parts of deploying your database into the public cloud. You can set up a private subnet and configure the required security groups to allow only the required ports or source IPs, depending on your setup. You can set your database nodes without remote access and just set up a jump host, or an Internet Gateway if the nodes require internet access to update software packages. You can read our previous blog Deploying Secure Multicloud MySQL Replication on AWS and GCP with VPN on how we set this up.

In addition to this, you can secure your data in transit by using a TLS/SSL connection, and encrypt your data at rest. If you're using ClusterControl, securing data in transit is simple and easy. You can check out our blog SSL Key Management and Encryption of MySQL Data in Transit if you want to try it out. For data at rest, data stored via S3 can be encrypted using AWS Server-Side Encryption, or you can use AWS KMS, which I discussed earlier. Check this external blog on how to set up and leverage a MariaDB Cluster using AWS KMS so you can store your data securely at rest.

Galera Cluster Troubleshooting on AWS

AWS CloudWatch can help, especially when investigating and checking system metrics. You can check the network, CPU, memory, and disk, and your instance or compute usage and balance. This might not, however, meet your requirements when digging into a specific case.

CloudTrail can produce solid traces of the actions that have been taken within your AWS account. This will help you determine whether the occurrences aren't coming from MySQL Galera but are instead a bug or issue within the AWS environment (such as Hyper-V having issues on the host machine where your instance, as the guest, is hosted.)

If you're using ClusterControl, going to Logs -> System Logs lets you browse the captured error logs taken from the MySQL Galera node itself. Apart from this, ClusterControl provides real-time monitoring that amplifies your alarm and notification system in case of an emergency or if your MySQL Galera node(s) are kaput.


Conclusion

AWS does not natively support a MySQL Galera Cluster setup, unlike AWS RDS which has MySQL compatibility. Because of this, most of the recommendations or opinions on running a Galera Cluster for production use within the AWS environment are based on experienced and well-tested environments that have been running for a very long time.

MariaDB Cluster is a productive choice here, as MariaDB constantly provides solid support for the AWS technology stack. The upcoming MariaDB 10.5 release will offer support for an S3 Storage Engine, which may be worth the wait.

External tools can help you manage and control your MySQL Galera Cluster running on the AWS Cloud, so it's not a huge concern if you have some dilemmas or FUD about why you should run on or shift to the AWS Cloud Platform.

AWS might not be the one-size-fits-all solution in some cases, but it provides a wide array of solutions that you can customize and tailor to fit your needs.

In the next part of our blog, we'll look at another public cloud platform, Google Cloud, and see what we can leverage if we choose to run our Galera Cluster on their platform.

by Paul Namuag at August 23, 2019 08:56 PM

Building a MySQL or MariaDB Database Cold Standby on Amazon AWS

High Availability is a must these days, as most organizations can’t allow themselves to lose their data. High Availability, however, always comes with a price tag (which can vary a lot.) Any setup which requires nearly-immediate action would typically require an expensive environment which mirrors precisely the production setup. But there are other options that can be less expensive. These may not allow for an immediate switch to a disaster recovery cluster, but they will still allow for business continuity (and won’t drain the budget.)

An example of this type of setup is a “cold-standby” DR environment. It allows you to reduce your expenses while still being able to spin up a new environment in an external location should the disaster strikes. In this blog post we will demonstrate how to create such a setup.

The Initial Setup

Let’s assume we have a fairly standard Master / Slave MySQL Replication setup in our own datacenter. It is a highly available setup with ProxySQL and Keepalived for Virtual IP handling. The main risk is that the datacenter will become unavailable. It is a small DC; maybe it has only one ISP with no BGP in place. In this situation, we will assume that even if it takes hours to bring back the database, that's OK, as long as it’s possible to bring it back.

ClusterControl Cluster Topology

To deploy this cluster we used ClusterControl, which you can download for free. For our DR environment we will use EC2 (but it could also be any other cloud provider.)

The Challenge

The main issue we have to deal with: how do we ensure we have fresh data to restore our database in the disaster recovery environment? Of course, ideally we would have a replication slave up and running in EC2... but then we have to pay for it. If we are tight on budget, we could try to get around that with backups. This is not a perfect solution as, in the worst case scenario, we will never be able to recover all the data.

By “the worst case scenario” we mean a situation in which we won’t have access to the original database servers. If we can reach them, no data will have been lost.

The Solution

We are going to use ClusterControl to set up a backup schedule that reduces the chance of data loss. We will also use the ClusterControl feature to upload backups to the cloud. If the datacenter becomes unavailable, we can hope that the cloud provider we have chosen will still be reachable.

Setting up the Backup Schedule in ClusterControl

First, we will have to configure ClusterControl with our cloud credentials.

ClusterControl Cloud Credentials

We can do this by using “Integrations” from the left side menu.

ClusterControl Add Cloud Credentials

You can pick Amazon Web Services, Google Cloud or Microsoft Azure as the cloud you want ClusterControl to upload backups to. We will go ahead with AWS where ClusterControl will use S3 to store backups.

Add Cloud Credentials in ClusterControl

We then need to pass the key ID and key secret, pick the default region, and pick a name for this set of credentials.

AWS Cloud Integration Successful - ClusterControl

Once this is done, we can see the credentials we just added listed in ClusterControl.

Now, we shall proceed with setting up the backup schedule.

Backup Scheduling ClusterControl

ClusterControl allows you to either create a backup immediately or schedule it. We’ll go with the second option. We want to create the following schedule:

  1. Full backup created once per day
  2. Incremental backups created every 10 minutes.

The idea here is as follows. In the worst case scenario we will lose only 10 minutes of traffic. If the datacenter becomes unavailable from outside but still works internally, we could try to avoid any data loss by waiting 10 minutes, copying the latest incremental backup onto a laptop, and then manually sending it to our DR database, using even phone tethering and a cellular connection to work around the ISP failure. If we cannot get the data out of the old datacenter for some time, this schedule is intended to minimize the number of transactions we will have to manually merge into the DR database.
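For reference, the schedule above maps to something you could also express in cron if you were running the backups by hand rather than through ClusterControl. A hedged sketch (the xtrabackup flags are real, but the paths and the way the base backup is tracked are illustrative assumptions; in practice each run also needs a fresh, empty target directory):

```shell
# /etc/cron.d/db-backups -- hypothetical sketch of the same schedule.
# Full backup daily at 2:00 am:
0 2 * * *    root xtrabackup --backup --target-dir=/root/backups/full
# Incremental backup every 10 minutes, taken on top of the most recent
# backup (assumes /root/backups/latest is kept pointing at it):
*/10 * * * * root xtrabackup --backup --target-dir=/root/backups/inc --incremental-basedir=/root/backups/latest
```

ClusterControl takes care of this bookkeeping (plus encryption and the S3 upload) for you, which is the point of using it here.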

Create Backup Schedule in ClusterControl

We start with the full backup, which will happen daily at 2:00 am. We will take the backup from the master and store it on the controller under the /root/backups/ directory. We will also enable the “Upload Backup to the cloud” option.

Backup Settings in ClusterControl

Next, we want to make some changes to the default configuration. We decided to go with an automatically selected failover host (in case our master is unavailable, ClusterControl will use any other node which is available). We also wanted to enable encryption, as we will be sending our backups over the network.

Cloud Settings for Backup Scheduling in ClusterControl

Then we have to pick the credentials and select an existing S3 bucket, or create a new one if needed.

Create Backup in ClusterControl

We basically repeat the process for the incremental backup, this time using the “Advanced” dialog to run the backups every 10 minutes.

The rest of the settings are similar; we can also reuse the S3 bucket.

ClusterControl Cluster Details

The backup schedule looks as above. We don’t have to start the full backup manually; ClusterControl will run the incremental backups as scheduled, and if it detects there is no full backup available, it will run a full backup instead of the incremental.

With such a setup, we can safely say that we can recover the data on any external system with 10-minute granularity.

Manual Backup Restore

If you ever need to restore the backup on the disaster recovery instance, there are a couple of steps you have to take. We strongly recommend testing this process from time to time, to ensure it works correctly and that you are proficient in executing it.

First, we have to install AWS command line tool on our target server:

root@vagrant:~# apt install python3-pip

root@vagrant:~# pip3 install awscli --upgrade --user

Then we have to configure it with proper credentials:

root@vagrant:~# ~/.local/bin/aws configure

AWS Access Key ID [None]: yourkeyID

AWS Secret Access Key [None]: yourkeySecret

Default region name [None]: us-west-1

Default output format [None]: json

We can now test if we have the access to the data in our S3 bucket:

root@vagrant:~# ~/.local/bin/aws s3 ls s3://drbackup/

                           PRE BACKUP-1/

                           PRE BACKUP-2/

                           PRE BACKUP-3/

                           PRE BACKUP-4/

                           PRE BACKUP-5/

                           PRE BACKUP-6/

                           PRE BACKUP-7/

Now, we have to download the data. We will create a directory for the backups - remember, we have to download the whole backup set - starting from the full backup up to the last incremental we want to apply.

root@vagrant:~# mkdir backups

root@vagrant:~# cd backups/

Now there are two options. We can either download backups one by one:

root@vagrant:~# ~/.local/bin/aws s3 cp s3://drbackup/BACKUP-1/ BACKUP-1 --recursive

download: s3://drbackup/BACKUP-1/cmon_backup.metadata to BACKUP-1/cmon_backup.metadata

Completed 30.4 MiB/36.2 MiB (4.9 MiB/s) with 1 file(s) remaining

download: s3://drbackup/BACKUP-1/backup-full-2019-08-20_113009.xbstream.gz.aes256 to BACKUP-1/backup-full-2019-08-20_113009.xbstream.gz.aes256

root@vagrant:~# ~/.local/bin/aws s3 cp s3://drbackup/BACKUP-2/ BACKUP-2 --recursive

download: s3://drbackup/BACKUP-2/cmon_backup.metadata to BACKUP-2/cmon_backup.metadata

download: s3://drbackup/BACKUP-2/backup-incr-2019-08-20_114009.xbstream.gz.aes256 to BACKUP-2/backup-incr-2019-08-20_114009.xbstream.gz.aes256

Alternatively, especially if you have a tight rotation schedule, we can sync the entire contents of the bucket with what we have locally on the server:

root@vagrant:~/backups# ~/.local/bin/aws s3 sync s3://drbackup/ .

download: s3://drbackup/BACKUP-2/cmon_backup.metadata to BACKUP-2/cmon_backup.metadata

download: s3://drbackup/BACKUP-4/cmon_backup.metadata to BACKUP-4/cmon_backup.metadata

download: s3://drbackup/BACKUP-3/cmon_backup.metadata to BACKUP-3/cmon_backup.metadata

download: s3://drbackup/BACKUP-6/cmon_backup.metadata to BACKUP-6/cmon_backup.metadata

download: s3://drbackup/BACKUP-5/cmon_backup.metadata to BACKUP-5/cmon_backup.metadata

download: s3://drbackup/BACKUP-7/cmon_backup.metadata to BACKUP-7/cmon_backup.metadata

download: s3://drbackup/BACKUP-3/backup-incr-2019-08-20_115005.xbstream.gz.aes256 to BACKUP-3/backup-incr-2019-08-20_115005.xbstream.gz.aes256

download: s3://drbackup/BACKUP-1/cmon_backup.metadata to BACKUP-1/cmon_backup.metadata

download: s3://drbackup/BACKUP-2/backup-incr-2019-08-20_114009.xbstream.gz.aes256 to BACKUP-2/backup-incr-2019-08-20_114009.xbstream.gz.aes256

download: s3://drbackup/BACKUP-7/backup-incr-2019-08-20_123008.xbstream.gz.aes256 to BACKUP-7/backup-incr-2019-08-20_123008.xbstream.gz.aes256

download: s3://drbackup/BACKUP-6/backup-incr-2019-08-20_122008.xbstream.gz.aes256 to BACKUP-6/backup-incr-2019-08-20_122008.xbstream.gz.aes256

download: s3://drbackup/BACKUP-5/backup-incr-2019-08-20_121007.xbstream.gz.aes256 to BACKUP-5/backup-incr-2019-08-20_121007.xbstream.gz.aes256

download: s3://drbackup/BACKUP-4/backup-incr-2019-08-20_120007.xbstream.gz.aes256 to BACKUP-4/backup-incr-2019-08-20_120007.xbstream.gz.aes256

download: s3://drbackup/BACKUP-1/backup-full-2019-08-20_113009.xbstream.gz.aes256 to BACKUP-1/backup-full-2019-08-20_113009.xbstream.gz.aes256

As you remember, the backups are encrypted. We need the encryption key, which is stored in ClusterControl. Make sure you have a copy of it stored somewhere safe, outside of the main datacenter. If you cannot reach it, you won’t be able to decrypt the backups. The key can be found in the ClusterControl configuration:

root@vagrant:~# grep backup_encryption_key /etc/cmon.d/cmon_1.cnf


It is encoded using base64, so we have to decode it first and store it in a file before we can start decrypting the backup:

echo "aoxhIelVZr1dKv5zMbVPLxlLucuYpcVmSynaeIEeBnM=" | openssl enc -base64 -d > pass

Now we can reuse this file to decrypt the backups. For now, let’s say we will restore one full and two incremental backups.

mkdir 1

mkdir 2

mkdir 3

cat BACKUP-1/backup-full-2019-08-20_113009.xbstream.gz.aes256 | openssl enc -d -aes-256-cbc -pass file:/root/backups/pass | zcat | xbstream -x -C /root/backups/1/

cat BACKUP-2/backup-incr-2019-08-20_114009.xbstream.gz.aes256 | openssl enc -d -aes-256-cbc -pass file:/root/backups/pass | zcat | xbstream -x -C /root/backups/2/

cat BACKUP-3/backup-incr-2019-08-20_115005.xbstream.gz.aes256 | openssl enc -d -aes-256-cbc -pass file:/root/backups/pass | zcat | xbstream -x -C /root/backups/3/
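The three decrypt pipelines above generalize to any number of backups. A hedged sketch of a helper that decrypts and decompresses every backup in the current directory (it assumes the layout from the S3 listing above: ./pass holds the decoded key and each BACKUP-N directory contains one *.xbstream.gz.aes256 file; it stops at the .xbstream stage, so each result still needs to be extracted with xbstream -x -C <dir> exactly as in the commands above):

```shell
#!/bin/bash
# decrypt_backups: decrypt and decompress every backup in the set, in
# order. Run from the directory holding the BACKUP-N subdirectories and
# the "pass" key file.
decrypt_backups() {
    local f out
    for f in BACKUP-*/*.xbstream.gz.aes256; do
        [ -e "$f" ] || continue            # no backups found, nothing to do
        out="${f%.gz.aes256}"              # e.g. BACKUP-1/backup-full-....xbstream
        openssl enc -d -aes-256-cbc -pass file:./pass < "$f" | gunzip > "$out"
        echo "decrypted: $out"
    done
}
# usage, from /root/backups:
# decrypt_backups
```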

We have the data decrypted, now we have to proceed with setting up our MySQL server. Ideally, this should be exactly the same version as on the production systems. We will use Percona Server for MySQL:

cd ~

sudo dpkg -i percona-release_latest.generic_all.deb

apt-get update

apt-get install percona-server-5.7

Nothing complex, just a regular installation. Once it’s up and ready, we have to stop it and remove the contents of its data directory.

service mysql stop

rm -rf /var/lib/mysql/*

To restore the backup we will need Xtrabackup - the tool ClusterControl uses to create it (at least for Percona and Oracle MySQL; MariaDB uses MariaBackup). It is important that this tool is installed in the same version as on the production servers:

apt install percona-xtrabackup-24

That’s all the preparation we need. Now we can start restoring the backup. With incremental backups it is important to keep in mind that you have to prepare them and apply them on top of the base backup. The base backup also has to be prepared. It is crucial to run the prepare with the ‘--apply-log-only’ option to prevent xtrabackup from running the rollback phase. Otherwise you won’t be able to apply the next incremental backup.

xtrabackup --prepare --apply-log-only --target-dir=/root/backups/1/

xtrabackup --prepare --apply-log-only --target-dir=/root/backups/1/ --incremental-dir=/root/backups/2/

xtrabackup --prepare --target-dir=/root/backups/1/ --incremental-dir=/root/backups/3/

In the last command we allowed xtrabackup to run the rollback of uncompleted transactions - we won’t be applying any more incremental backups afterwards. Now it is time to populate the data directory with the backup, start MySQL and see if everything works as expected:

root@vagrant:~/backups# mv /root/backups/1/* /var/lib/mysql/

root@vagrant:~/backups# chown -R mysql.mysql /var/lib/mysql

root@vagrant:~/backups# service mysql start

root@vagrant:~/backups# mysql -ppass

mysql: [Warning] Using a password on the command line interface can be insecure.

Welcome to the MySQL monitor.  Commands end with ; or \g.

Your MySQL connection id is 6

Server version: 5.7.26-29 Percona Server (GPL), Release '29', Revision '11ad961'

Copyright (c) 2009-2019 Percona LLC and/or its affiliates

Copyright (c) 2000, 2019, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its

affiliates. Other names may be trademarks of their respective owners.


Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> show schemas;


+--------------------+
| Database           |
+--------------------+
| information_schema |
| mysql              |
| performance_schema |
| proxydemo          |
| sbtest             |
| sys                |
+--------------------+
6 rows in set (0.00 sec)

mysql> select count(*) from sbtest.sbtest1;


+----------+
| count(*) |
+----------+
|    10506 |
+----------+
1 row in set (0.01 sec)

As you can see, all is good. MySQL started correctly and we were able to access it (and the data is there!) We successfully managed to bring our database back up and running in a separate location. The total time required depends strictly on the size of the data - we had to download the data from S3, decrypt and decompress it, and finally prepare the backup. Still, this is a very cheap option (you pay only for the S3 storage) which gives you an option for business continuity should disaster strike.


by krzysztof at August 23, 2019 09:45 AM

August 22, 2019


The Easy Way to Deploy a MySQL Galera Cluster on AWS

ClusterControl 1.7.3 comes with a notable improvement in cloud integration. It is possible to deploy a MySQL or PostgreSQL replication cluster to the cloud, as well as automatically launch a cloud instance and scale out your database cluster by adding a new database node.

This blog post showcases how to easily deploy a Galera Cluster using ClusterControl on AWS. This new feature is part of the ClusterControl Community Edition, which comes with free deployment and monitoring features. This means that you can take advantage of this feature for no cost!

ClusterControl Database Cluster Architecture

The following diagram summarizes our overall database clusters architecture.

ClusterControl Database Cluster Architecture

The ClusterControl server is located outside of the AWS infrastructure, which gives it good visibility of our database cluster (located in Frankfurt: eu-central-1). The ClusterControl server MUST have a dedicated public IP address, because this IP address will be granted access by ClusterControl on the database servers and in the AWS security group. The Galera database version that we are going to deploy is MariaDB Cluster 10.3, using ClusterControl 1.7.3.

Preparing the AWS Environment

ClusterControl is able to deploy a database cluster on supported cloud platforms, namely AWS, Google Cloud Platform (GCP), and Microsoft Azure. The first thing we have to do is obtain AWS access keys to allow ClusterControl to perform programmatic requests to AWS services. You could use the root account access key, but this is not the recommended way. It's better to create a dedicated Identity and Access Management (IAM) user solely for this purpose.

Log in to your AWS Console -> My Security Credentials -> Users -> Add User. Specify the user and pick "Programmatic Access" as the Access Type:

Adding a User in AWS Console

On the next page, create a new user group by clicking the "Create group" button, give it the name "DatabaseAutomation", and assign the following policies:

  • AmazonEC2FullAccess
  • AmazonVPCFullAccess
  • AmazonS3FullAccess (only if you plan to store the database backup on AWS S3)

Tick the DatabaseAutomation checkbox and click "Add user to group":

Add User Permissions Amazon AWS

Optionally, you can assign tags on the next page. Otherwise, just proceed to create the user. You will then get the two most important things: the Access key ID and Secret access key.

Add User Confirmation AWS

Download the CSV file and store it somewhere safe. We are now good to automate the deployment on cloud.

Install ClusterControl on the respective server:

$ whoami


$ wget

$ chmod 755 install-cc

$ ./install-cc

Follow the installation instructions, then open the ClusterControl UI to create the super admin user and password.

To allow ClusterControl to perform automatic deployment in the cloud, we have to create cloud credentials for the selected region with a valid AWS key ID and secret. Go to Sidebar -> Integrations -> Cloud Providers -> Add your first Cloud Credential -> Amazon Web Services, enter the required details and choose Frankfurt as the default region:

Add Cloud Credentials ClusterControl

This credential will be used by ClusterControl to automate the cluster deployment and management. At this point, we are ready to deploy our first cluster.

Database Cluster Deployment

Go to Deploy -> Deploy in the Cloud -> MySQL Galera -> MariaDB 10.3 -> Configure Cluster to proceed to the next page. 

Under Configure Cluster section, ensure the number of nodes is 3 and give a cluster name and MySQL root password:

Configure MySQL Galera Cluster in ClusterControl

Under Select Credential, choose the credential called "AWS Frankfurt" and proceed to the next page by clicking "Select Virtual Machine". Choose the preferred operating system and instance size. It's recommended to run our infrastructure inside a private cloud so that we get dedicated internal IP addresses for our cloud instances and the hosts are not directly exposed to the public network. Click the "Add New" button next to the Virtual Private Cloud (VPC) field and give a subnet to this network:


The VPC that we have created is a private cloud and does not have internet connectivity. In order for ClusterControl to be able to deploy and manage the hosts from outside AWS network, we have to allow internet connectivity to this VPC. To do this, we have to do the following:

  1. Create an internet gateway
  2. Add external routing to the route table
  3. Associate the subnet to the route table

To create an internet gateway, log in to AWS Management Console -> VPC -> Internet Gateways -> Create internet gateway -> assign a name to this gateway. Then select the created gateway from the list and go to Actions -> Attach to VPC -> select the VPC from the dropdown list -> Attach. We have now attached an internet gateway to the private cloud. However, we still need to configure the network to forward all external requests via this internet gateway, so we have to add a default route to the route table. Go to VPC -> Route Tables -> select the route table -> Edit Routes and specify the destination network and target (the created internet gateway ID) as below:

Edit Route Tables AWS Console

Then, we have to associate the DB subnet with this route table so that all instances created inside this network use the default route we created earlier. Select the route table -> Edit Subnet Association -> assign the DB subnet, as shown below:

Route Table Subnet AWS Console

The VPC is now ready to be used by ClusterControl for the deployment.
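The same three console steps can also be scripted with the AWS CLI. A hedged sketch with placeholder IDs (the vpc-xxxx, rtb-xxxx and subnet-xxxx values are assumptions you would substitute with your own):

```shell
# Sketch: give a private VPC internet connectivity via the AWS CLI.

# 1. Create an internet gateway and attach it to the VPC:
IGW_ID=$(aws ec2 create-internet-gateway \
    --query 'InternetGateway.InternetGatewayId' --output text)
aws ec2 attach-internet-gateway --internet-gateway-id "$IGW_ID" --vpc-id vpc-xxxx

# 2. Add a default route ( pointing at the gateway:
aws ec2 create-route --route-table-id rtb-xxxx \
    --destination-cidr-block --gateway-id "$IGW_ID"

# 3. Associate the DB subnet with the route table:
aws ec2 associate-route-table --route-table-id rtb-xxxx --subnet-id subnet-xxxx
```

This is handy if you want to keep the VPC preparation repeatable alongside the ClusterControl-driven deployment.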

Once created, select the created VPC from the dropdown. For SSH Key, we will ask ClusterControl to auto generate it:

ClusterControl SSH Key Credentials

The generated SSH key will be located on the ClusterControl server under the /var/lib/cmon/autogenerated_ssh_keys/s9s/ directory.

Click on "Deployment Summary". On this page, we have to assign a subnet from the VPC to the database cluster. Since this is a new VPC, it has no subnet and we have to create a new one. Click on the "Add New Subnet" button and assign a network for our database cluster:

Add Subnet ClusterControl

Finally, select the created subnet in the textbox and click on "Deploy Cluster":

Select Virtual Machine ClusterControl

You can monitor the job progress under Activity -> Jobs -> Create Cluster. ClusterControl will perform the necessary pre-installation steps like creating the cloud instances, the security group, and the SSH key before the actual installation steps begin.

Once the cluster is ready, you should see it in the ClusterControl dashboard:

ClusterControl Dashboard AWS Deployment

Our cluster deployment is now complete. 

Post AWS Database Deployment

We can start loading data into the cluster or create a new database for our application to use. To connect, simply instruct your applications or clients to connect to the private or public IP address of one of the database servers. You can get this information from the Nodes page, as shown in the following screenshot:

Node Data ClusterControl AWS Deployment

If you would like to access the database nodes directly, you can use the ClusterControl web-SSH module at Node Actions -> SSH Console, which gives you an experience similar to connecting via an SSH client.

To scale the cluster out by adding a database node, just go to Cluster Actions (server stack icon) -> Add Node -> Add a DB node on a new cloud instance and you will be presented with the following dialog:

Adding a Node ClusterControl AWS Deployment

Simply follow the deployment wizard and configure your new instance accordingly. Once the instance is created, ClusterControl will install, configure, and join the node into the cluster automatically.

That's it for now, folks. Happy clustering in the cloud!

by ashraf at August 22, 2019 09:45 AM

August 21, 2019


Validating Your PostgreSQL Backups on Docker

Backups are a vital part of any disaster recovery plan, and taking backups of the production database is a basic and important part of PostgreSQL administration. However, DBAs don’t often validate that those backups are reliable.

Every organization takes PostgreSQL database backups in a different form: some take filesystem (physical) backups of the PostgreSQL data directories (using tools like Barman or pgBackRest), some take only logical backups (using pg_dump), and others take block-level snapshots using tools like EBS or VMware snapshots.

In this blog, we will show you how to validate your PostgreSQL backup by restoring it onto a Docker container, using pgBackRest to take and restore the backup. We assume that you already know how to use PostgreSQL, Docker and pgBackRest.

Why Should You Use Docker?

Docker makes automation simpler. It also eases the job of integrating our PostgreSQL backup validation task into CI/CD tools like CircleCI, Travis, GitLab or Jenkins. Using Docker avoids the time and resources we would otherwise have to spend on bringing up a new environment for testing the backup.

Demo Setup








Node-1 (Database Host)

  • PostgreSQL 11 primary instance.
  • Created user and database “pgbench“ and initialized with pgbench tables.
  • postgresql-11, pgbackrest-2.15
  • Running pgbench every 5 mins to simulate the workload.

Node-2 (Test Machine)

  • We will run our Docker validation on this host.
  • docker-ce-18.06, pgbackrest-2.15

Node-3 (pgBackRest Repository Host)

  • Running pgbackrest to take an incremental backup every 4 hours
  • Differential backup every day
  • Full backup weekly


For pgbackrest to work, I have set up passwordless SSH access between these nodes.

The user “postgres” on node-1 and node-2 can log in without a password as the user “pgbackrest” on node-3.
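If you need to recreate this trust, the usual sketch is ssh-keygen plus ssh-copy-id (user and host names as in the table above; repeat in the opposite direction for pgbackrest@node-3 towards postgres@node-1 and postgres@node-2):

```shell
# Sketch: create a key for the postgres user and install its public
# half on the repository host (run once on node-1 and once on node-2).
sudo -u postgres ssh-keygen -t rsa -b 2048 -N '' -f /var/lib/pgsql/.ssh/id_rsa
sudo -u postgres ssh-copy-id pgbackrest@node-3
```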

[vagrant@node-1 ~]$ sudo -u postgres ssh pgbackrest@node-3 uptime

 13:31:51 up  7:00, 1 user,  load average: 0.00, 0.01, 0.05

[vagrant@node-2 ~]$ sudo -u postgres ssh pgbackrest@node-3 uptime

 13:31:27 up  7:00, 1 user,  load average: 0.00, 0.01, 0.05

The user “pgbackrest” on node-3 can log in without a password as the user “postgres” on node-1 and node-2.

[vagrant@node-3 ~]$ sudo -u pgbackrest ssh postgres@node-1 uptime 

 13:32:29 up  7:02, 1 user,  load average: 1.18, 0.83, 0.58

[vagrant@node-3 ~]$ sudo -u pgbackrest ssh postgres@node-2 uptime 

 13:32:33 up  7:01, 1 user,  load average: 0.00, 0.01, 0.05

Overview of Backup Validation

Below is a brief overview of the steps we will follow for our PostgreSQL backup validation.

  1. Using the pgbackrest restore command, we will fetch the latest backup from the pgBackRest repository host (node-3) into the test machine (node-2) directory /var/lib/pgsql/11/data
  2. During docker run, we mount the host machine (node-2) directory /var/lib/pgsql in the docker container and start the postgres/postmaster daemon from the mounted directory. We also expose port 5432 from the container as port 15432 on the host machine.
  3. Once the docker container is running, we will connect to the PostgreSQL database via node-2:15432 and verify that all tables and rows are restored. We will also check the PostgreSQL logs to make sure there are no ERROR messages during the recovery and that the instance has reached a consistent state.

Most of the backup validation steps will be performed on host node-2.

Building the Docker Image

On node-2, create a Dockerfile and build the docker image “postgresql:11”. In the Dockerfile below, we apply the following changes on top of the centos:7 base image.

  1. Installing postgresql-11, pgbackrest and openssh-clients. openssh-clients is needed by pgbackrest.
  2. Configuring pgbackrest - We need the pgbackrest configuration in the image to test PITR; without it, restore_command would fail. As part of the pgbackrest configuration:
    1. We add the pgbackrest repository host IP to the config file /etc/pgbackrest.conf
    2. We also need passwordless SSH access between the docker container and the pgbackrest repository host. For this, I am copying SSH_PRIVATE_KEY, which I have already generated, and whose public key I have already added to the pgbackrest repository host (pgbackrest@node-3).
  3. VOLUME ["${PGHOME_DIR}"] - Defines the container directory /var/lib/pgsql as a mount point. When running the docker run command, we will map a node-2 host directory onto this mount point.
  4. USER postgres - Any command run in the container will be executed as the postgres user.
$ cat Dockerfile
FROM  centos:7

ARG PGBACKREST_REPO_HOST
ARG PGHOME_DIR=/var/lib/pgsql

## Adding Postgresql Repo for CentOS7
RUN yum -y install

## Installing PostgreSQL
RUN yum -y install postgresql11 postgresql11-server postgresql11-devel postgresql11-contrib postgresql11-libs pgbackrest openssh-clients

## Adding configuration for pgbackrest, needed for WAL recovery and replication.
RUN echo -ne "[global]\nrepo1-host=${PGBACKREST_REPO_HOST}\n\n[pgbench]\npg1-path=/var/lib/pgsql/11/data\n" > /etc/pgbackrest.conf

## Adding Private Key to the Docker. Docker container would use this private key for pgbackrest wal recovery.
RUN mkdir -p ${PGHOME_DIR}/.ssh &&  chmod 0750 ${PGHOME_DIR}/.ssh
COPY --chown=postgres:postgres ./SSH_PRIVATE_KEY  ${PGHOME_DIR}/.ssh/id_rsa
RUN chmod 0600 ${PGHOME_DIR}/.ssh/id_rsa
RUN echo -ne "Host ${PGBACKREST_REPO_HOST}\n\tStrictHostKeyChecking no\n" >> ${PGHOME_DIR}/.ssh/config

## Making "/var/lib/pgsql" a mountable directory in the container
VOLUME ["${PGHOME_DIR}"]

## Setting postgres as the default user for any remaining commands
USER postgres

We now have two files: the Dockerfile used by docker build, and SSH_PRIVATE_KEY, which will be copied into the docker image.

$ ls


Run the below command on node-2 to build our docker image. The pgbackrest repository host IP is passed in the command and will be used for the pgbackrest parameter “repo1-host”.

$ docker build --no-cache -t postgresql:11 --build-arg PGBACKREST_REPO_HOST= .

Sending build context to Docker daemon  230.4kB

Step 1/12 : FROM  centos:7

 ---> 9f38484d220f


 ---> Running in 8b7b36c6f151

Removing intermediate container 8b7b36c6f151

 ---> 31510e46e286

Step 3/12 : ARG PGHOME_DIR=/var/lib/pgsql


Step 4/12 : RUN yum -y install



Step 12/12 : USER postgres

 ---> Running in c91abcf46440

Removing intermediate container c91abcf46440

 ---> bebce78df5ae

Successfully built bebce78df5ae

Successfully tagged postgresql:11

Make sure the image is built successfully, and check that the “postgresql:11” image was created recently, as shown below.

$ docker image ls postgresql:11


postgresql          11 2e03ed2a5946        3 minutes ago 482MB

Restoring the PostgreSQL Backup

We will now restore our PostgreSQL backup maintained on the pgbackrest backup repository host, node-3.

Below is the pgbackrest configuration file present on host node-2, where node-3 is set as the pgbackrest repository host. The directory mentioned in the pg1-path parameter is where the PostgreSQL data directory will be restored.

[vagrant@node-2 ~]$ cat /etc/pgbackrest.conf 






Using the below pgbackrest restore command, the PostgreSQL data directory will be restored to node-2:/var/lib/pgsql/11/data

To validate PITR with the pgbackrest backup, I have set --type=time --target='2019-07-30 06:24:50.241352+00', so that the WAL recovery stops before the mentioned time.

[vagrant@node-2 ~]$ sudo -u postgres bash -c "/usr/bin/pgbackrest --type=time --target='2019-07-30 06:24:50.241352+00' --target-action=promote --recovery-option='standby_mode=on' --stanza=pgbench restore"

The above command may take time, depending on the backup size and network bandwidth. Once restored, verify the size of the data directory and also check recovery.conf.

[vagrant@node-2 ~]$ sudo -u postgres du -sh /var/lib/pgsql/11/data 

2.1G    /var/lib/pgsql/11/data

[vagrant@node-2 ~]$ sudo -u postgres cat /var/lib/pgsql/11/data/recovery.conf

standby_mode = 'on'

restore_command = '/usr/bin/pgbackrest --stanza=pgbench archive-get %f "%p"'

recovery_target_time = '2019-07-30 06:24:50.241352+00'

Disable archive mode for PostgreSQL docker container.

[vagrant@node-2 ~]$ sudo -u postgres bash -c "echo 'archive_mode = off' >> /var/lib/pgsql/11/data/"

Start the docker container with the image “postgresql:11”. In the command we are:

  1. Setting container name as “pgbench”

  2. Mounting the docker host (node-2) directory /var/lib/pgsql to the docker container directory /var/lib/pgsql

  3. Exposing container port 5432 to port 15432 on node-2.

  4. Starting the postgres daemon using the command /usr/pgsql-11/bin/postmaster -D /var/lib/pgsql/11/data

[vagrant@node-2 ~]$ docker run --rm --name "pgbench" -v /var/lib/pgsql:/var/lib/pgsql -p 15432:5432 -d postgresql:11  /usr/pgsql-11/bin/postmaster -D /var/lib/pgsql/11/data


Verify “pgbench” container is created and running.

[vagrant@node-2 ~]$ docker ps -f name=pgbench

CONTAINER ID        IMAGE COMMAND                  CREATED STATUS PORTS                     NAMES

e54f2f65afa1        postgresql:11 "/usr/pgsql-11/bin/p…"   34 seconds ago Up 33 seconds>5432/tcp   pgbench

Validating PostgreSQL 

Since the host directory /var/lib/pgsql is shared with the docker container, the logs generated by the PostgreSQL service are also visible from node-2. Verify today’s log to make sure PostgreSQL started fine, without any ERROR, and make sure the below log lines are present.

[vagrant@node-2 ~]$ sudo -u postgres tailf /var/lib/pgsql/11/data/log/postgresql-Tue.csv


2019-07-30 06:38:34.633 UTC,,,7,,5d3fe5e9.7,5,,2019-07-30 06:38:33 UTC,1/0,0,LOG,00000,"consistent recovery state reached at E/CE000210",,,,,,,,,""

2019-07-30 06:38:34.633 UTC,,,1,,5d3fe5e9.1,2,,2019-07-30 06:38:33 UTC,,0,LOG,00000,"database system is ready to accept read only connections",,,,,,,,,""

2019-07-30 06:38:35.236 UTC,,,7,,5d3fe5e9.7,6,,2019-07-30 06:38:33 UTC,1/0,0,LOG,00000,"restored log file ""000000010000000E000000CF"" from archive",,,,,,,,,""

2019-07-30 06:38:36.210 UTC,,,7,,5d3fe5e9.7,7,,2019-07-30 06:38:33 UTC,1/0,0,LOG,00000,"restored log file ""000000010000000E000000D0"" from archive",,,,,,,,,""


2019-07-30 06:39:57.221 UTC,,,7,,5d3fe5e9.7,37,,2019-07-30 06:38:33 UTC,1/0,0,LOG,00000,"recovery stopping before commit of transaction 52181192, time 2019-07-30 06:25:01.576689+00",,,,,,,,,""


2019-07-30 06:40:00.682 UTC,,,7,,5d3fe5e9.7,47,,2019-07-30 06:38:33 UTC,1/0,0,LOG,00000,"archive recovery complete",,,,,,,,,""

The message "consistent recovery state reached at E/CE000210" indicates that, with the pgbackrest backup data directory, we were able to reach a consistent state.

The message "archive recovery complete" indicates that we were able to replay the WAL files backed up by pgbackrest and recover without any issue.
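These checks are easy to script. A small hedged helper (the CSV log format and path are as in the example above; the function only reports success when both recovery markers are present and no ERROR entries appear):

```shell
#!/bin/bash
# check_recovery_log <csv-logfile>: fail if the log contains ERROR
# entries; succeed only if both recovery markers are present.
check_recovery_log() {
    local log="$1"
    if grep -q ',ERROR,' "$log"; then
        echo "errors found in $log" >&2
        return 1
    fi
    if grep -q 'consistent recovery state reached' "$log" &&
       grep -q 'archive recovery complete' "$log"; then
        echo "recovery OK"
    else
        echo "recovery markers missing in $log" >&2
        return 1
    fi
}
# usage:
# check_recovery_log /var/lib/pgsql/11/data/log/postgresql-Tue.csv
```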

Connect to the PostgreSQL instance via local port 15432 and verify the tables and row counts.

[vagrant@node-2 ~]$ sudo -iu postgres /usr/pgsql-11/bin/psql  -p 15432 -h localhost -U pgbench 

Password for user pgbench: 

psql (11.4)

Type "help" for help.

pgbench=> \dt

              List of relations

 Schema |       Name       | Type  |  Owner
--------+------------------+-------+---------
 public | pgbench_accounts | table | pgbench

 public | pgbench_branches | table | pgbench

 public | pgbench_history  | table | pgbench

 public | pgbench_tellers  | table | pgbench

(4 rows)

pgbench=> select * from pgbench_history limit 1;

 tid | bid |   aid   | delta |           mtime            | filler
-----+-----+---------+-------+----------------------------+--------
  98 |   3 | 2584617 |   507 | 2019-07-30 06:20:01.412226 |
(1 row)

pgbench=> select max(mtime) from pgbench_history ;

            max
----------------------------
 2019-07-30 06:22:01.402245
(1 row)

pgbench=> select count(1) from pgbench_history ;




(1 row)

pgbench=> select count(1) from pgbench_accounts ;




(1 row)

We have now restored our PostgreSQL backup in a docker container and also verified PITR. Once the backup is validated, we can stop the container and remove the data directory.

[vagrant@node-2 ~]$ docker stop pgbench


[vagrant@node-2 ~]$ sudo -u postgres bash -c "rm -rf /var/lib/pgsql/11/data && mkdir -p /var/lib/pgsql/11/data && chmod 0700 /var/lib/pgsql/11/data"


In this blog, I demonstrated backup validation using a small database on a small VirtualBox VM, so the validation completed in just a few minutes. It’s important to note that in production you will need to choose a VM with enough memory, CPU, and disk to allow the backup validation to complete successfully. You can also automate the whole validation process in a bash script, or even integrate it with a CI/CD pipeline, so that you regularly validate your PostgreSQL backups.
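A minimal sketch of such an automated check, keying on the two success messages discussed above (the demo file names are hypothetical; in production you would point it at the real CSV log used in this walkthrough):

```shell
#!/bin/sh
# Scan a PostgreSQL CSV log for the two messages that indicate a
# successful restore and recovery; non-zero exit means validation failed.
check_recovery_log() {
  log="$1"
  grep -q 'consistent recovery state reached' "$log" || return 1
  grep -q 'archive recovery complete' "$log" || return 1
  echo "backup validation OK"
}

# Demo against a sample log fragment; in production, point this at
# /var/lib/pgsql/11/data/log/postgresql-$(date +%a).csv instead.
cat > /tmp/pg_demo.csv <<'EOF'
...,LOG,00000,"consistent recovery state reached at E/CE000210",...
...,LOG,00000,"archive recovery complete",...
EOF
check_recovery_log /tmp/pg_demo.csv
```

The same check can serve as the exit gate of a CI/CD job: a non-zero exit code marks the backup validation as failed.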

by Ashokraj M at August 21, 2019 09:45 AM

August 20, 2019

Oli Sennhauser

FromDual Recovery Manager (rman) with progress indicator

Since version 2.1.0, the FromDual Recovery Manager (rman) for MariaDB and MySQL also has a progress indicator for the restore of logical backups made with mysqldump. This feature was implemented in response to numerous requests from FromDual rman users who were not happy with the silent default behaviour of the mysql client.

You can check your current rman version as follows:

# ./bin/rman --version

As with all FromDual tools you get a command overview with examples with the --help option:

# ./bin/rman --help | less
  progress      Print progress information to STDOUT.

A backup, for example, is done as follows with the corresponding Backup Manager (bman):

# ./bin/bman --target=brman:secret@ --type=full --mode=logical --policy=daily --instance-name=qamariadb102

The Recovery Manager progress indicator logs to STDOUT:

# ./bin/rman --target=brman:secret@ --type=full --mode=logical --policy=daily --instance-name=qamariadb102 --progress --backup-name=bck_qamariadb102_full_2019-08-20_21:15:23.sql

Reading configuration from /etc/mysql/my.cnf
No rman configuration file.

Command line: /home/mysql/product/brman-2.2.1/bin/rman.php --target=brman:******@ --type=full --mode=logical --policy=daily --instance-name=qamariadb102 --progress --backup-name=bck_qamariadb102_full_2019-08-20_21:15:23.sql

Options from command line
  target                = brman:******@
  type                  = full
  mode                  = logical
  progress              = 
  backup-name           = bck_qamariadb102_full_2019-08-20_21:15:23.sql
  policy                = daily
  instance-name         = qamariadb102

Resulting options
  target                = brman:******@
  type                  = full
  mode                  = logical
  progress              = 
  backup-name           = bck_qamariadb102_full_2019-08-20_21:15:23.sql
  policy                = daily
  instance-name         = qamariadb102
  log                   = ./rman.log
  datadir               = /var/lib/mysql
  owner                 = mysql
  backupdir             = /home/mysql/bck
  binlog-policy         = binlog

Logging to   ./rman.log
Backupdir is /home/mysql/bck
Version is   2.2.1

Start restore at 2019-08-20 21:18:46
  mysql --user=brman --password=****** --host= --port=3308
  From backup file: /home/mysql/bck/daily/bck_qamariadb102_full_2019-08-20_21:15:23.sql.gz

  Restore progress:
. schema brman_catalog
. . table                         backup_details         0 statements,            0 rows,                  0 bytes
. . table                                backups         0 statements,            0 rows,                  0 bytes
. . table                            binary_logs         0 statements,            0 rows,                  0 bytes
. . table                                  files         0 statements,            0 rows,                  0 bytes
. . table                               metadata         1 statements,            2 rows,                 78 bytes
. schema foodmart
. schema fromdual_a
. . table                                  audit         1 statements,            3 rows,                171 bytes
. . table                                     c1         1 statements,            3 rows,                 42 bytes
. . table                                     c2         1 statements,            3 rows,                 42 bytes
. . table                                  child         1 statements,            3 rows,                177 bytes
. . table                                 parent         1 statements,            3 rows,                175 bytes
. schema fromdual_b
. . table                                  audit         1 statements,            3 rows,                171 bytes
. . table                                     c1         1 statements,            3 rows,                 42 bytes
. . table                                     c2         1 statements,            3 rows,                 42 bytes
. . table                                  child         1 statements,            3 rows,                177 bytes
. . table                              employees         0 statements,            0 rows,                  0 bytes
. . table                                 parent         1 statements,            3 rows,                175 bytes
. schema fromdual_c
. . table                                  audit         1 statements,            3 rows,                171 bytes
. . table                                     c1         1 statements,            3 rows,                 42 bytes
. . table                                     c2         1 statements,            3 rows,                 42 bytes
. . table                                  child         1 statements,            3 rows,                177 bytes
. . table                                 parent         1 statements,            3 rows,                175 bytes
. schema mysql
. . table                           column_stats         0 statements,            0 rows,                  0 bytes
. . table                           columns_priv         0 statements,            0 rows,                  0 bytes
. . table                                     db         1 statements,            2 rows,                267 bytes
. . table                                  event         0 statements,            0 rows,                  0 bytes
. . table                                   func         0 statements,            0 rows,                  0 bytes
. . table                         gtid_slave_pos         0 statements,            0 rows,                  0 bytes
. . table                          help_category         1 statements,           39 rows,               1202 bytes
. . table                           help_keyword         1 statements,          464 rows,               7649 bytes
. . table                          help_relation         1 statements,         1028 rows,               9861 bytes
. . table                             help_topic         1 statements,          527 rows,             419915 bytes
. . table                                   host         0 statements,            0 rows,                  0 bytes
. . table                            index_stats         0 statements,            0 rows,                  0 bytes
. . table                     innodb_index_stats         1 statements,          207 rows,              20611 bytes
. . table                     innodb_table_stats         1 statements,           29 rows,               1622 bytes
. . table                                 plugin         0 statements,            0 rows,                  0 bytes
. . table                                   proc         1 statements,            2 rows,               2220 bytes
. . table                             procs_priv         0 statements,            0 rows,                  0 bytes
. . table                           proxies_priv         1 statements,            2 rows,                140 bytes
. . table                          roles_mapping         0 statements,            0 rows,                  0 bytes
. . table                                servers         0 statements,            0 rows,                  0 bytes
. . table                            table_stats         0 statements,            0 rows,                  0 bytes
. . table                            tables_priv         0 statements,            0 rows,                  0 bytes
. . table                              time_zone         0 statements,            0 rows,                  0 bytes
. . table                  time_zone_leap_second         0 statements,            0 rows,                  0 bytes
. . table                         time_zone_name         0 statements,            0 rows,                  0 bytes
. . table                   time_zone_transition         0 statements,            0 rows,                  0 bytes
. . table              time_zone_transition_type         0 statements,            0 rows,                  0 bytes
. . table                                   user         1 statements,            5 rows,               1042 bytes
. . table                            general_log         0 statements,            0 rows,                  0 bytes
. . table                               slow_log         0 statements,            0 rows,                  0 bytes
. schema test
. . table                                   test       347 statements,         4621 rows,             286528 bytes
. schema test_catalog
. schema world
. . table                                   City         1 statements,         4079 rows,             177139 bytes
. . table                                Country         1 statements,          239 rows,              36481 bytes
. . table                        CountryLanguage         1 statements,          984 rows,              26160 bytes
. schema brman_catalog
. schema foodmart
. schema fromdual_a
. schema fromdual_b
. schema fromdual_c
. schema mysql
. schema test
. schema test_catalog
. schema world
  Schemas: 9, Tables: 55, Statements: 376, Rows: 12275, Bytes: 992736
  WARNING: Progress numbers for Total Byte Counter may be different of dump file size.
  Restore time was: 0d 0h 1' 28"
End restore at 2019-08-20 21:20:14 (rc=0)

The overhead of the FromDual Recovery Manager progress indicator for MariaDB and MySQL is not significant: we measured less than 1% longer restore times with the progress indicator compared to a pure mysql restoration.
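The WARNING about the Total Byte Counter in the output above is easy to understand: the backup file is stored gzip-compressed (.sql.gz), so the bytes read from the backup file differ from the bytes of SQL actually restored. A tiny illustration (file name hypothetical):

```shell
#!/bin/sh
# The dump on disk is compressed, so a byte counter on the file does not
# match the size of the restored SQL stream.
printf 'INSERT INTO t VALUES (1),(2),(3);\n' > /tmp/dump.sql
gzip -c /tmp/dump.sql > /tmp/dump.sql.gz
echo "uncompressed: $(wc -c < /tmp/dump.sql) bytes"
echo "compressed:   $(wc -c < /tmp/dump.sql.gz) bytes"
```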

by Shinguz at August 20, 2019 07:44 PM


Running PostgreSQL Using Amazon RDS

Cloud computing is now commonplace in most companies. It allows for on demand availability of compute power, database, storage, applications, and other resources via the internet. 

The main advantage of the cloud is that you don’t need to spend a lot of money buying powerful servers or building your own data centers. But this is not the only one: when you need to scale, you don’t have to buy a new server, you can just add resources with a few clicks. In the same way, you can decrease resources when they aren’t needed, to reduce costs.

A cloud database is a database running on a cloud provider. It allows us to store, manage, retrieve, and manipulate our data via a cloud platform; accessible over the internet. 

In this blog, we’ll look at the different types of cloud offerings and then focus on running a PostgreSQL database using Amazon RDS.

Cloud Service Offerings & Options


As we can see in the image above, there are several different kinds of cloud services depending on the level of access needed.

  • On-prem: It’s installed and runs on computers on the premises of the person or organization using the system. In fact, this is not a cloud service, but it’s useful to see the difference.
  • IaaS: It’s an online service that provides high-level APIs used to access various low-level details of underlying network infrastructure like physical computing resources, location, data partitioning, scaling, security, backup, etc.
  • PaaS: It provides a platform allowing customers to develop, run, and manage applications without the complexity of building and maintaining the infrastructure associated with developing and launching an app.
  • SaaS: It’s accessed by users over the Internet using a client (browser). It doesn’t require any installation on the client side.

If we’re talking about PostgreSQL, there are cloud providers that offer PostgreSQL in the cloud; in different flavors and using different methods. As we mentioned above, we’re going to focus on Amazon RDS.

What is Amazon RDS (Relational Database Service)?

According to the Amazon website, they offer over 165 fully featured services, including over 40 services that aren’t available anywhere else. So AWS is probably the world’s most advanced cloud provider in terms of features and services, with millions of customers.

Amazon RDS allows us to create, manage and scale a relational database in the cloud in an easy and fast way, and it’s available on different database types like Amazon Aurora, PostgreSQL, MySQL and more. AWS provides a tool called AWS Database Migration Service to migrate an existing database to Amazon RDS.

Benefits of Amazon RDS

  • Easy to use: We can use the Amazon RDS Management Console, the AWS RDS Command-Line Interface, or API calls to access the relational database. We don’t need infrastructure provisioning or installing and maintaining database software.
  • Scalable: We can scale our database's compute and storage resources with only a few clicks. Many Amazon RDS engine types allow us to launch one or more Read Replicas to offload read traffic from our primary database instance.
  • Availability: When we provision a Multi-AZ DB Instance, Amazon RDS synchronously replicates the data to a standby instance in a different Availability Zone (AZ). Amazon RDS has many other features that enhance reliability for critical production databases, including automated backups, database snapshots, and automatic host replacement.
  • Performance: We can choose between two SSD-backed storage options: one optimized for high-performance OLTP applications, and the other for cost-effective general-purpose use. 
  • Secure: Amazon RDS lets us run the database instances in Amazon VPC (Virtual Private Cloud), which allows us to isolate our database instances and to connect to our existing IT infrastructure through a VPN. Also, many Amazon RDS engine types offer encryption at rest and encryption in transit.

While this is not officially mentioned on the AWS website, if we consider DBaaS (Database as a Service) to be a database service managed and deployed on an outside provider’s infrastructure (according to our list in the section above), we can say that Amazon RDS is a “kind of” DBaaS, somewhere between a PaaS and a SaaS service.

A Guide to PostgreSQL on Amazon RDS

First, we need to log in to the AWS console. (If you don’t have an AWS account, you can create a free one here.)

Then, go to Services -> Database -> RDS and Create database section.

Create Database on Amazon RDS

Now, we must choose whether we want to follow the normal or the easy creation process, as well as the engine and version that we’ll deploy.

Choose a Database to Deploy on Amazon RDS

If we select the easy creation, we only need to add the database instance name, user and password.

Database Configuration Amazon RDS

In this case, we’ll choose PostgreSQL 10 and the normal creation to be able to see the creation details, so this will require a bit more work than the easy one.

In the normal creation flow we’ll first choose a template: the Production, Dev/Test, or Free tier option.

Database Tiers Amazon RDS

In the next step, we’ll add the database instance name, user, and password.

Database Config Details Amazon RDS

The next step is the database instance size where we have several options in three different categories: Standard classes, Memory Optimized classes, and Burstable classes.

Database Instance Size Amazon RDS

In the storage section, we can select the disk type, size, and storage behavior.

Database Storage Options Amazon RDS

One of the most important AWS features is the Multi-AZ deployment, where we can create a standby instance in a different availability zone to provide redundancy.

Availability & Durability Options Amazon RDS

Regarding connectivity, we can choose a Virtual Private Cloud (VPC) for the new database. Here, we can also select additional options like public access, availability zone, and database port.

Connectivity Options Amazon RDS

Then, we have the additional configuration section, where we can specify the database name, database authentication, backup details, encryption, monitoring, logging, and the maintenance service (auto minor upgrades).

Finally, we’ll have the option to check the Estimated Monthly Costs.


Estimated Costs Screen Amazon RDS

We can see more details about the costs here, or even use the AWS Monthly Calculator.

After adding all this information, we must wait until the creation process finishes.

Amazon RDS Creation Process

When the Status changes to “Available”, our database instance is ready to use.
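For repeatable setups, the same instance could also be provisioned from the AWS CLI instead of the console. A hedged sketch follows (the identifier pg1 matches this example, while the instance class, storage size, and password are placeholder assumptions to be replaced); the command is only printed here so it can be reviewed before running:

```shell
#!/bin/sh
# Print a sketch of the equivalent "aws rds create-db-instance" call.
# Printed, not executed: review and fill in real values first.
print_rds_create() {
  cat <<'EOF'
aws rds create-db-instance \
    --db-instance-identifier pg1 \
    --engine postgres \
    --engine-version 10.6 \
    --db-instance-class db.t3.micro \
    --master-username postgres \
    --master-user-password '<password>' \
    --allocated-storage 20 \
    --publicly-accessible
aws rds wait db-instance-available --db-instance-identifier pg1
EOF
}
print_rds_create
```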

If we press on the DB identifier (“pg1” in our example), we’ll access our database section, where we can see a summary with information like CPU usage, connections, status, and type. Here, we can also modify our instance configuration or perform different actions like reboot, delete, create read replica, take snapshots, and more.

Database Identifier Amazon RDS

In the same place, we can also see more detailed information in different sections. 

Connectivity and Security

We can configure the security rules and check the network information.

Connectivity & Security Amazon RDS


Monitoring

We have some metrics to check our database status.

Database Monitoring CloudWatch Amazon RDS

Logs and Events 

We have alarms, events, and logs from our database.

Amazon RDS CloudWatch Alarms


Configuration

We can see our instance configuration, as well as a list of recommendations to improve it, like enabling enhanced monitoring.

Instance Details Amazon RDS

Maintenance and Backups 

We can see information about the maintenance tasks, backups, and snapshot process.

Maintenance and Backups Amazon RDS

Now, we should be able to access our database using the endpoint name assigned by AWS (“” in our example). For this, make sure you allowed access in the security group section and enabled public access in the instance configuration (Public accessibility: Yes). In our example, we’re allowing all traffic from all sources, but for security reasons you’ll probably want to limit access to one or a few sources.

Edit Inbound Rules Amazon RDS

Now, let’s try to connect to our Amazon RDS instance from the command line:

[root@local ~]# psql -U postgres -h

Password for user postgres:

psql (11.5, server 10.6)

SSL connection (protocol: TLSv1.2, cipher: ECDHE-RSA-AES256-GCM-SHA384, bits: 256, compression: off)

Type "help" for help.

postgres=> \l

                                  List of databases
   Name    |  Owner   | Encoding |   Collate   |    Ctype    |   Access privileges
-----------+----------+----------+-------------+-------------+-----------------------
 postgres  | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 |
 rdsadmin  | rdsadmin | UTF8     | en_US.UTF-8 | en_US.UTF-8 | rdsadmin=CTc/rdsadmin
 template0 | rdsadmin | UTF8     | en_US.UTF-8 | en_US.UTF-8 | =c/rdsadmin          +
           |          |          |             |             | rdsadmin=CTc/rdsadmin
 template1 | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 | =c/postgres          +
           |          |          |             |             | postgres=CTc/postgres
(4 rows)

postgres=> select version();

                                                version
---------------------------------------------------------------------------------------------------------
 PostgreSQL 10.6 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.3 20140911 (Red Hat 4.8.3-9), 64-bit
(1 row)

In the same way, we can connect to it from our preferred GUI (if we have one).

pgAdmin Amazon RDS

A Note on Amazon Aurora

Amazon Aurora is a MySQL and PostgreSQL compatible relational database built for the cloud. According to the AWS web site, Amazon Aurora is three times faster than standard PostgreSQL databases and provides the security, availability, and reliability of commercial databases at 1/10th the cost. Regardless of the claim, this is not a true PostgreSQL instance, just a compatible engine. But, if you are considering running PostgreSQL on Amazon, you should definitely consider this as a possible alternative. You can learn more about Aurora and how it relates to PostgreSQL here.


The cloud is everywhere, and we can use it for both small and huge projects alike. In this blog, we looked at the different kinds of cloud offerings and showed how to run PostgreSQL on Amazon RDS. Let us know your thoughts in the comments below.


by Sebastian Insausti at August 20, 2019 09:45 AM

August 19, 2019


A Guide to MySQL Galera Cluster Restoration Using mysqldump

Using logical backup programs like mysqldump is common practice among MySQL admins, both for backup and restore (the process of moving a database from one server to another) and for performing a database mass modification using a single text file.

When doing this for MySQL Galera Cluster, however, the same rules apply except for the fact that it takes a lot of time to restore a dump file into a running Galera Cluster. In this blog, we will look at the best way to restore a Galera Cluster using mysqldump.

Galera Cluster Restoration Performance

One of the most common misconceptions about Galera Cluster is that restoring a database into a three-node cluster is faster than doing it on a standalone node. This is definitely incorrect when talking about a stateful service like a datastore or filesystem. To keep in sync, every member has to keep up with whatever changes happen on the other members. This is where locking, certifying, applying, rolling back, and committing come into the picture to ensure no data loss along the way, because for a database service, data loss is a big no-no.

Let's make some comparisons to see and understand the impact. Suppose we have a 2 GB dump file for the database 'sbtest'. We would usually load the data into the cluster via two endpoints:

  • load balancer host 
  • one of the database hosts

As a control measurement, we are also going to restore on a standalone node. The variable pxc_strict_mode is set to PERMISSIVE on all Galera nodes.

The backup was created on one of the Galera nodes with the following command:

$ mysqldump --single-transaction sbtest > sbtest.sql

We are going to use 'pv' to observe the progress and measure the restoration performance. Thus, the restore command is:

$ pv sbtest.sql | mysql -uroot -p sbtest

The restorations were repeated 3 times for each host type as shown in the following table:

Endpoint Type                                       | Database Server | Restoration Times (3 runs)
----------------------------------------------------+-----------------+---------------------------
Standalone host (control)                           | MySQL 5.7.25    | 3m 29s, 3m 36s, 3m 31s
HAProxy (multiple DB hosts - all active, leastconn) | PXC 5.7.25      | 5m 45s, 6m 03s, 5m 43s
ProxySQL (single DB host - single writer hostgroup) | PXC 5.7.25      | 6m 07s, 7m 00s, 6m 54s
Direct to Galera node (single DB host)              | PXC 5.7.25      | 5m 22s, 6m 00s, 5m 28s
Note that the way pv measures the restoration speed is based on the mysqldump text file being passed through the pipe. It's not highly accurate, but good enough to give us some measurements to compare. All hosts have the same specs and run as virtual machines on the same underlying physical hardware.

The following column chart summarizes the average time it takes to restore the mysqldump:

The standalone host is the clear winner with 212 seconds, while ProxySQL is the worst for this workload: almost two times slower compared to the standalone host.

The following column chart summarizes the average speed pv measures when restoring the mysqldump:

As expected, restoration on the standalone node is way faster, at 8.6 MiB/s on average, 1.5x better than restoration directly on the Galera node.

To summarize our observation, restoring directly on a Galera Cluster node is way slower than a standalone host. Restoring through a load balancer is even worse.
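The averages behind the two charts can be reproduced from the trial times in the table above; a quick shell sketch (the 1.78 GiB, roughly 1823 MiB, dump size is the one reported by pv later in this post):

```shell
#!/bin/sh
# Average of the three standalone-host trials (3m 29s, 3m 36s, 3m 31s).
t1=$((3*60+29)); t2=$((3*60+36)); t3=$((3*60+31))
avg=$(( (t1 + t2 + t3) / 3 ))
echo "average restore time: ${avg}s"     # 212s, matching the first chart
# Approximate throughput for the ~1823 MiB dump:
awk -v mib=1823 -v s="$avg" 'BEGIN { printf "average speed: %.1f MiB/s\n", mib / s }'
```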

Turning Off Galera Replication

Restoring a mysqldump into a Galera Cluster causes every single DML statement (INSERTs in this case) to be broadcast, certified, and applied by the Galera nodes through its group communication and replication library. Thus, the fastest way to restore a mysqldump is to perform the restoration on a single node with Galera replication turned off, so that it effectively runs in standalone mode. The steps are:

  1. Pick one Galera node as the restore node. Stop the rest of the nodes.
  2. Turn off Galera Replication on the restore node.
  3. Perform the restoration.
  4. Stop and bootstrap the restore node.
  5. Force the remaining nodes to re-join and re-sync via SST.

For example, let's say we choose db1 as the restore node. Stop the other nodes (db2 and db3) one at a time so they leave the cluster gracefully:

$ systemctl stop mysql #db2

$ systemctl stop mysql #db3

Note: For ClusterControl users, simply go to Nodes -> pick the DB node -> Node Actions -> Stop Node. Do not forget to turn off ClusterControl automatic recovery for cluster and nodes before performing this exercise.

Now, login to db1 and turn the Galera node into a standalone node by setting wsrep_provider variable to 'none':

$ mysql -uroot -p

mysql> SET GLOBAL wsrep_provider = 'none';

mysql> SHOW STATUS LIKE 'wsrep_connected';


| Variable_name   | Value |


| wsrep_connected | OFF   |


Then perform the restoration on db1:

$ pv sbtest.sql | mysql -uroot -p sbtest

1.78GiB 0:02:46 [  11MiB/s] [==========================================>] 100%

The restoration time improved 2x to 166 seconds (down from ~337 seconds), at 11 MiB/s (up from ~5.43 MiB/s). Since this node now has the most up-to-date data, we have to bootstrap the cluster based on this node and let the other nodes rejoin the cluster, forcing them to re-sync everything.

On db1, stop the MySQL service and start it again in bootstrap mode:

$ systemctl status mysql #check whether mysql or mysql@bootstrap is running

$ systemctl status mysql@bootstrap #check whether mysql or mysql@bootstrap is running

$ systemctl stop mysql # if mysql was running

$ systemctl stop mysql@bootstrap # if mysql@bootstrap was running

$ systemctl start mysql@bootstrap

On every remaining node, wipe out the datadir (or simply delete the grastate.dat file) and start the MySQL service:

$ rm /var/lib/mysql/grastate.dat  # remove this file to force SST

$ systemctl start mysql

Perform the startup process one node at a time. Once the joining node is synced, proceed with the next node, and so on.
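The "wait until synced" step can be scripted; a hypothetical helper sketch (the host argument and client options are assumptions, with credentials assumed to come from the usual option files):

```shell
#!/bin/sh
# Block until the given node reports wsrep_local_state_comment = Synced;
# only then is it safe to start the next joiner.
wait_for_synced() {
  host="$1"
  while :; do
    state=$(mysql -h "$host" -N -e \
      "SHOW STATUS LIKE 'wsrep_local_state_comment'" 2>/dev/null | awk '{print $2}')
    [ "$state" = "Synced" ] && break
    sleep 5
  done
  echo "$host is Synced"
}
```

Called as `wait_for_synced db2 && systemctl start mysql` (on the next node), it serializes the SST joins.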

Note: For ClusterControl users, you could skip the above step because ClusterControl can be configured to force SST during the bootstrap process. Just click on the Cluster Actions -> Bootstrap Cluster and pick the db1 as the bootstrap node and toggle on the option for "Clear MySQL Datadir on Joining nodes", as shown below:

We can also speed up the restoration process by allowing a bigger packet size for the mysql client:

$ pv sbtest.sql | mysql -uroot -p --max_allowed_packet=2G sbtest

At this point, our cluster should be running with the restored data. Take note that in this test case, the total restoration time for the cluster is actually longer than if we had performed the restoration directly on a Galera node, thanks to our small dataset. If you have a huge mysqldump file to restore, believe us, this is one of the best ways to do it.

That's it for now. Happy restoring!


by ashraf at August 19, 2019 09:45 AM

August 18, 2019

Oli Sennhauser

Galera Cluster Training for MariaDB and MySQL in September 2019 in Berlin

The summer holidays are over. Start the autumn with fresh energy! Time for some further training?

On September 19 and 20, FromDual will once again run the Galera Cluster training "Galera Cluster für MySQL und MariaDB" in Berlin. See also our other training dates.

There are still seats available! You can register directly with our training partner, the Heinlein Akademie.

This MariaDB/MySQL training is aimed at all DBAs, DevOps engineers, and system administrators who have to manage MariaDB and MySQL databases running as a Galera Cluster and who want to better understand how to operate a Galera Cluster safely and reliably.

In this training we cover how to properly design and set up a Galera Cluster, and how to install, configure, and operate it. We also look at possible load balancing mechanisms and discuss Galera performance questions.

All of this comes with numerous exercises, so you can immediately put what you have learned into practice!

The training is held in German.

You can find the detailed contents of this two-day Galera Cluster training here.

If you have any further questions, please get in touch with us.


by Shinguz at August 18, 2019 07:38 PM

August 17, 2019

Oli Sennhauser

MariaDB and MySQL Character Set Conversion



Recently we had a consulting engagement where we had to help the customer migrate from the latin1 character set to the utf8mb4 character set. In the same MySQL consulting engagement we considered upgrading from MySQL 5.6 to MySQL 5.7 as well [ Lit. ]. We decided to split the change into 2 parts: upgrading to 5.7 in the first step and converting to utf8mb4 in the second step. There were various reasons for this decision:

  • 2 smaller changes are easier to control than one big shot.
  • We assumed that in 5.7 we would experience fewer problems with utf8mb4, because the trend at MySQL was more towards utf8mb4 in 5.7 than in MySQL 5.6. So we hoped to hit fewer problems and bugs.

For Upgrading see also MariaDB and MySQL Upgrade Problems

Remark: It possibly also makes sense to think about collations before starting with the conversion!

Character Sets

Historically, MariaDB and MySQL had the default character set latin1 (Latin-1 or ISO-8859-1), which was sufficient for most of the western hemisphere.

But as technology spread and demands increased, other cultures wanted their characters represented properly as well. So the Unicode standard was invented, and MariaDB and MySQL adopted this standard too.

The original MariaDB/MySQL utf8(mb3) implementation was not perfect or complete, so utf8mb4 was implemented as a superset of utf8(mb3). So at least since MariaDB/MySQL version 5.5, latin1, utf8, and utf8mb4 are all available. The current MySQL 5.7 utf8mb4 implementation should cover Unicode 9.0.0:

SQL> SELECT * FROM information_schema.character_sets
WHERE character_set_name LIKE 'utf8%' OR character_set_name = 'latin1';
+--------------------+----------------------+----------------------+--------+
| CHARACTER_SET_NAME | DEFAULT_COLLATE_NAME | DESCRIPTION          | MAXLEN |
+--------------------+----------------------+----------------------+--------+
| latin1             | latin1_swedish_ci    | cp1252 West European |      1 |
| utf8               | utf8_general_ci      | UTF-8 Unicode        |      3 |
| utf8mb4            | utf8mb4_general_ci   | UTF-8 Unicode        |      4 |
+--------------------+----------------------+----------------------+--------+
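The MAXLEN column tells the whole story: 'utf8' (utf8mb3) stores at most 3 bytes per character, while full UTF-8 needs up to 4. The byte lengths are easy to verify on the shell:

```shell
#!/bin/sh
# UTF-8 byte lengths per character (octal escapes for portability):
printf '\101' | wc -c             # 'A' (U+0041): 1 byte
printf '\303\251' | wc -c         # 'e acute' (U+00E9): 2 bytes
printf '\342\202\254' | wc -c     # euro sign (U+20AC): 3 bytes - the utf8mb3 limit
printf '\360\237\230\200' | wc -c # emoji (U+1F600): 4 bytes - needs utf8mb4
```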

The default Character Set up to MariaDB 10.4 and MySQL 5.7 was latin1. In MySQL 8.0 the default Character Set has changed to utf8mb4. There are no signs so far that MariaDB will take the same step:

SQL> status
mysql  Ver 8.0.16 for linux-glibc2.12 on x86_64 (MySQL Community Server - GPL)

Connection id:          84
Current database:
Current user:           root@localhost
SSL:                    Not in use
Current pager:          stdout
Using outfile:          ''
Using delimiter:        ;
Server version:         8.0.16 MySQL Community Server - GPL
Protocol version:       10
Connection:             Localhost via UNIX socket
Server characterset:    utf8mb4
Db     characterset:    utf8mb4
Client characterset:    utf8mb4
Conn.  characterset:    utf8mb4
UNIX socket:            /var/run/mysqld/mysql-3332.sock
Uptime:                 3 days 47 min 13 sec

So we see a general trend from latin1 over utf8(mb3) to utf8mb4, both technically and business-wise (aka globalization).

For the DBA this means sooner or later we have to think about a conversion of all tables of the whole database instance (all tables of all schemata) to utf8mb4!

Steps to convert Character Set to utf8mb4

Analyzing the Server

First of all one should analyze the system (O/S, database instance and client/application). On the server we can run the following command to verify the currently used and supported Character Set:

# locale

On the MariaDB/MySQL database instance we check the current server configuration and the session configuration with the following commands:

SQL> SHOW GLOBAL VARIABLES
WHERE Variable_name LIKE 'character_set\_%' OR Variable_name LIKE 'collation%';
| Variable_name            | Value             |
| character_set_client     | utf8              |
| character_set_connection | utf8              |
| character_set_database   | latin1            |
| character_set_filesystem | binary            |
| character_set_results    | utf8              |
| character_set_server     | latin1            |
| character_set_system     | utf8              |
| collation_connection     | utf8_general_ci   |
| collation_database       | latin1_swedish_ci |
| collation_server         | latin1_swedish_ci |

SQL> SHOW SESSION VARIABLES
WHERE Variable_name LIKE 'character_set\_%' OR Variable_name LIKE 'collation%';
| Variable_name            | Value             |
| character_set_client     | latin1            |
| character_set_connection | latin1            |
| character_set_database   | latin1            |
| character_set_filesystem | binary            |
| character_set_results    | latin1            |
| character_set_server     | latin1            |
| character_set_system     | utf8              |
| collation_connection     | latin1_swedish_ci |
| collation_database       | latin1_swedish_ci |
| collation_server         | latin1_swedish_ci |

These configuration variables are for the Client/Server communication: character_set_client, character_set_connection and character_set_results. These are for the Server configuration: character_set_server and character_set_database (deprecated in MySQL 5.7). And these are for System internals and File System access: character_set_system and character_set_filesystem.

Sometimes we see customers using the Logon Trigger init_connect to force clients to a specific Character Set:

SQL> SHOW GLOBAL VARIABLES LIKE 'init_connect';
| Variable_name | Value            |
| init_connect  | SET NAMES latin1 |

The SET NAMES command sets the character_set_client, character_set_connection and character_set_results session variables. [ Lit. ]
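For example, switching a session to utf8mb4 and verifying the effect (a sketch):

```sql
SQL> SET NAMES utf8mb4;
SQL> SHOW SESSION VARIABLES WHERE Variable_name IN
     ('character_set_client', 'character_set_connection', 'character_set_results');
```

All three variables should now show utf8mb4.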

Analyzing the Application and the Clients

Similar steps to analyze the Application and Clients should be taken. We want to answer the following questions:

  • Does the Application/Client O/S (Windows) support utf8?
  • Does the Web Server support utf8 (Apache (AddDefaultCharset utf-8), Nginx, IIS, ...)?
  • Version of the programming language (Java, PHP (5.4 and newer?), ...)
  • Version of the MariaDB and MySQL Connectors (JDBC (5.1.47 and newer?), ODBC (5.3.11 and newer?), mysqli/mysqlnd (≥ 7.0.19?, ≥ 7.1.5?), ...)
  • Application code (header('Content-Type: text/html; charset=utf-8');, <?xml version="1.0" encoding="UTF-8"?>, <meta http-equiv="Content-Type" content="text/html; charset=utf-8">, <form accept-charset="utf-8">, htmlspecialchars($str, ENT_NOQUOTES, "UTF-8"), $mysqli->set_charset('utf8mb4');, mbstring [ Lit. ], etc.)

See also: Configuring Application Character Set and Collation

If you do not have your Application under control, your DBA can help you find out what is going on with the General Query Log (SET GLOBAL general_log = 1;):

190815 19:03:00    12 Connect   root@localhost on 
                   12 Query     select @@version_comment limit 1
                   12 Query     SET NAMES latin1
                   12 Query     SET NAMES utf8
190815 19:05:24    12 Quit

or with some queries on the PERFORMANCE_SCHEMA:

-- Works since MySQL 5.6/MariaDB 10.0
SQL> SELECT t.thread_id, t.processlist_id, t.processlist_user, t.processlist_host, t.processlist_db
     , sca.attr_name, sca.attr_value
  FROM performance_schema.threads AS t
  JOIN performance_schema.session_connect_attrs AS sca ON sca.processlist_id = t.processlist_id
 WHERE t.processlist_user IS NOT NULL
   AND t.thread_id = 103;
| thread_id | processlist_id | processlist_user | processlist_host | processlist_db | attr_name                        | attr_value          |
|       103 |             78 | replication      | localhost        | NULL           | _os                              | linux-glibc2.12     |
|       103 |             78 | replication      | localhost        | NULL           | _client_name                     | libmysql            |
|       103 |             78 | replication      | localhost        | NULL           | _pid                             | 29269               |
|       103 |             78 | replication      | localhost        | NULL           | program_name                     | mysqld              |
|       103 |             78 | replication      | localhost        | NULL           | _platform                        | x86_64              |
|       103 |             78 | replication      | localhost        | NULL           | _client_replication_channel_name | NULL                |
|       103 |             78 | replication      | localhost        | NULL           | _client_role                     | binary_log_listener |
|       103 |             78 | replication      | localhost        | NULL           | _client_version                  | 5.7.26              |

-- Works since MySQL 5.7 only
SQL> SELECT t.thread_id, t.processlist_id, t.processlist_user, t.processlist_host, t.processlist_db
     , vbt.variable_name, vbt.variable_value
  FROM performance_schema.threads AS t
  JOIN performance_schema.variables_by_thread AS vbt ON vbt.thread_id = t.thread_id
 WHERE t.processlist_user IS NOT NULL
   AND (vbt.variable_name like 'charac%' OR vbt.variable_name LIKE 'coll%')
   AND t.thread_id = 103;
| thread_id | processlist_id | processlist_user | processlist_host | processlist_db | variable_name            | variable_value    |
|       103 |             78 | replication      | localhost        | NULL           | character_set_client     | latin1            |
|       103 |             78 | replication      | localhost        | NULL           | character_set_connection | latin1            |
|       103 |             78 | replication      | localhost        | NULL           | character_set_database   | latin1            |
|       103 |             78 | replication      | localhost        | NULL           | character_set_filesystem | binary            |
|       103 |             78 | replication      | localhost        | NULL           | character_set_results    | latin1            |
|       103 |             78 | replication      | localhost        | NULL           | character_set_server     | latin1            |
|       103 |             78 | replication      | localhost        | NULL           | collation_connection     | latin1_swedish_ci |
|       103 |             78 | replication      | localhost        | NULL           | collation_database       | latin1_swedish_ci |
|       103 |             78 | replication      | localhost        | NULL           | collation_server         | latin1_swedish_ci |

Preparation of the Server Settings and the Application

To have better control over the impact of some changes, we decided to make some changes to the Application first:

  • The Application sets the Character Set properly itself ($mysqli->set_charset('utf8mb4') [ Lit. ]). In the same step the sql_mode can also be set by the application, so we can use the defaults on the server side in the future.
  • Apache and PHP are configured to support UTF-8.
  • After this step init_connect, character_set_server and character_set_database can be changed to utf8mb4 on the Server, and --skip-character-set-client-handshake can be removed at the same time [ Lit. ].
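The server side of the last step could be sketched as follows (make the same changes persistent in your my.cnf as well; the values are examples):

```sql
-- Clear the Logon Trigger and switch the server defaults to utf8mb4.
SQL> SET GLOBAL init_connect = '';
SQL> SET GLOBAL character_set_server = 'utf8mb4';
SQL> SET GLOBAL collation_server = 'utf8mb4_general_ci';
```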

Convert Tables to utf8mb4

First we checked and converted the default Character Set of the Schemata/Databases:

SQL> SELECT schema_name, default_character_set_name, default_collation_name
  FROM information_schema.schemata
 WHERE schema_name NOT IN ('mysql', 'information_schema', 'performance_schema', 'sys');
| schema_name | default_character_set_name | default_collation_name |
| focmm       | latin1                     | latin1_swedish_ci      |
| test        | latin1                     | latin1_swedish_ci      |
| foodmart    | latin1                     | latin1_swedish_ci      |
| erp         | latin1                     | latin1_swedish_ci      |
| world       | latin1                     | latin1_swedish_ci      |
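On table level a similar check can be done, for example via information_schema (a sketch):

```sql
SQL> SELECT table_schema, table_collation, COUNT(*)
  FROM information_schema.tables
 WHERE table_schema NOT IN ('mysql', 'information_schema', 'performance_schema', 'sys')
 GROUP BY table_schema, table_collation;
```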

Converting the Schemata is done as follows:

SQL> ALTER SCHEMA world DEFAULT CHARACTER SET utf8mb4;

which is a fast operation because only the Schema default for new tables is changed, not the tables themselves.

To convert the tables there are many different possibilities we considered:

  • The easy one: ALTER TABLE ... CONVERT TO CHARACTER SET utf8mb4;
  • The possibly faster one: dump/restore with sed 's/DEFAULT CHARSET=latin1/DEFAULT CHARSET=utf8mb4/'
  • The possibly fastest one: drop Secondary Indexes, dump/restore with sed, create Secondary Indexes again (mysqlpump).
  • The automated one: With Percona's pt-online-schema-change [ Lit. ] or Facebook's OnlineSchemaChange OSC [ Lit. ]
  • The most elegant but not supported one: Master/Slave Replication.
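The easy variant could be sketched as follows (the table name is a made-up example; the statement rewrites the whole table, so plan for the copy time and locking):

```sql
-- Converts the table default Character Set and all textual columns to utf8mb4.
SQL> ALTER TABLE test.customer CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci;
```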

Already when converting the first table we ran into a problem:

ERROR 1709 (HY000): Index column size too large. The maximum column size is 767 bytes.

This table had a Primary Key with a length of more than 191 characters. With utf8mb4 each character can occupy up to 4 bytes, so such a key exceeds the 767 byte index limit of the old InnoDB Antelope File Format:

SQL> SHOW GLOBAL VARIABLES
 WHERE Variable_name LIKE 'innodb_file_format%' OR Variable_name LIKE 'innodb_large%';
| Variable_name            | Value    |
| innodb_file_format       | Antelope |
| innodb_file_format_check | ON       |
| innodb_file_format_max   | Antelope |
| innodb_large_prefix      | OFF      |

So we have to find out first which tables are still in the old Antelope File Format:

SQL> SELECT table_schema
     , CASE WHEN row_format = 'Dynamic' THEN 'Barracuda'
            WHEN row_format = 'Compressed' THEN 'Barracuda'
            WHEN row_format = 'Compact' THEN 'Antelope'
            WHEN row_format = 'Redundant' THEN 'Antelope' END AS 'file_format'
     , COUNT(*)
  FROM information_schema.tables
 WHERE engine = 'InnoDB'
   AND table_schema NOT IN ('information_schema', 'sys', 'mysql')
 GROUP BY table_schema, file_format;
| table_schema | file_format | count(*) |
| foodmart     | Barracuda   |       23 |
| test         | Barracuda   |        1 |
| world        | Antelope    |        2 |
| world        | Barracuda   |        1 |

Then we could convert the table correctly, forcing the Barracuda File Format (table name as an example):

SQL> SET GLOBAL innodb_file_format = 'Barracuda';
SQL> SET GLOBAL innodb_large_prefix = ON;
SQL> ALTER TABLE world.city CONVERT TO CHARACTER SET utf8mb4, ROW_FORMAT = DYNAMIC;
Testing of new Character Set

The last but most important step is to test the changes. Here we recommend doing as many different tests as possible:

  • MySQL CLI: mysql
  • phpMyAdmin
  • MySQL Workbench
  • Other GUIs
  • Your Application

Especially test data with umlauts (öäüÄÖÜß), foreign characters (Turkish, Cyrillic, CJK characters) and Emojis carefully. Good candidates are: last name, city or free-text fields like comments.
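A minimal round-trip test from the CLI could look like this (table and data are made-up examples):

```sql
-- Store an umlaut and a 4-byte Emoji and read them back unchanged.
SQL> CREATE TABLE test.charset_check (comment VARCHAR(64)) DEFAULT CHARACTER SET utf8mb4;
SQL> INSERT INTO test.charset_check VALUES ('Zürich 🐬');
SQL> SELECT comment, CHAR_LENGTH(comment), LENGTH(comment) FROM test.charset_check;
SQL> DROP TABLE test.charset_check;
```

If the Emoji comes back mangled (e.g. as ?), usually the client connection and not the table is the culprit.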

Some aid you can also get from here:

MySQL Pump

mysqlpump — A Database Backup Program

This utility is currently not available for MariaDB. It works for MySQL 5.7 and newer and does NOT support MySQL 5.6, so we cannot use it for the Upgrade Process from MySQL 5.6 to 5.7. The newest MySQL releases contain bug fixes and even new features in mysqlpump, so we can assume it is still supported and actively maintained. On the other hand, recent releases contain fixes for rather trivial bugs, so we can also assume mysqlpump is not widely used yet and not as mature as mysqldump. An alternative product would be MyDumper from Domas@Facebook (Launchpad, GitHub).

Interesting features are:

  • Parallel dumping of databases.
  • Secondary Index restore separated from Table Restore.