Planet MariaDB

April 02, 2020

MariaDB Foundation

Pick your feature!


With many of us spending lots of time at home in front of our webcams, MariaDB Foundation’s YouTube channel is getting some love. […]

The post Pick your feature! appeared first on MariaDB.org.

by Ian Gilfillan at April 02, 2020 06:05 AM

April 01, 2020

SeveralNines

How Performant is Your ProxySQL Node?

ProxySQL has gained a lot of interest in the MySQL and MariaDB database world, not to mention its ClickHouse support, which further helps make the case for ProxySQL.

It’s safe to say that ProxySQL has become the default database proxy for the MySQL family of databases (such as Percona Server, Oracle MySQL, Galera Cluster, and MariaDB). 

ProxySQL is, in fact, an efficient problem solver with extremely rich functionality for managing database client-server communication, acting as middleware in a very advanced and performant way. 

It makes it possible to shape database traffic by delaying, caching, or rewriting queries on the fly. It can also be used to create an environment in which failovers will not affect applications and will be transparent to them. The ProxySQL community is very responsive and constantly delivers fixes, patches, and version releases in a timely manner. 

But how performant is your ProxySQL setup, and how can you determine that your setup has been tuned correctly? This blog focuses on determining how performant your ProxySQL nodes are and how to monitor it efficiently.

Common Problems You Can Encounter With ProxySQL

ProxySQL’s default installation comes with lightweight, simple default tuning that is able to handle average to heavy load. Depending on the type of queries sent to the middleware, however, it can be impacted and start to experience bottlenecks and latency.

Latency Issues

For example, what leads to latency issues can be hard to determine if you lack a monitoring system. Without one, you can manually check the stats schema, as shown below:

mysql> select * from stats_mysql_connection_pool\G
*************************** 1. row ***************************
        hostgroup: 20
         srv_host: 192.168.10.225
         srv_port: 3306
           status: ONLINE
         ConnUsed: 0
         ConnFree: 0
           ConnOK: 0
          ConnERR: 0
      MaxConnUsed: 0
          Queries: 0
Queries_GTID_sync: 0
  Bytes_data_sent: 0
  Bytes_data_recv: 0
       Latency_us: 1151
*************************** 2. row ***************************
        hostgroup: 20
         srv_host: 192.168.10.226
         srv_port: 3306
           status: ONLINE
         ConnUsed: 0
         ConnFree: 0
           ConnOK: 0
          ConnERR: 0
      MaxConnUsed: 0
          Queries: 0
Queries_GTID_sync: 0
  Bytes_data_sent: 0
  Bytes_data_recv: 0
       Latency_us: 470
*************************** 3. row ***************************
        hostgroup: 10
         srv_host: 192.168.10.227
         srv_port: 3306
           status: ONLINE
         ConnUsed: 0
         ConnFree: 0
           ConnOK: 0
          ConnERR: 0
      MaxConnUsed: 0
          Queries: 0
Queries_GTID_sync: 0
  Bytes_data_sent: 0
  Bytes_data_recv: 0
       Latency_us: 10855
*************************** 4. row ***************************
        hostgroup: 40
         srv_host: 192.168.10.225
         srv_port: 3306
           status: ONLINE
         ConnUsed: 0
         ConnFree: 0
           ConnOK: 0
          ConnERR: 0
      MaxConnUsed: 0
          Queries: 0
Queries_GTID_sync: 0
  Bytes_data_sent: 0
  Bytes_data_recv: 0
       Latency_us: 1151
*************************** 5. row ***************************
        hostgroup: 40
         srv_host: 192.168.10.226
         srv_port: 3306
           status: ONLINE
         ConnUsed: 0
         ConnFree: 0
           ConnOK: 0
          ConnERR: 0
      MaxConnUsed: 0
          Queries: 0
Queries_GTID_sync: 0
  Bytes_data_sent: 0
  Bytes_data_recv: 0
       Latency_us: 470
5 rows in set (0.01 sec)

This allows you to monitor latency per hostgroup, but it adds hassle unless you develop a script (or scripts) that checks these values and notifies you.
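If you want to automate that check, a small cron-friendly script polling the same table can do the job. The following is only a rough sketch: it assumes the ProxySQL admin interface listens on 127.0.0.1:6032 with the default admin/admin credentials, and the 5000 µs threshold and the recipient address are placeholders to adapt to your environment:

$ vim ~/check_proxysql_latency.sh
#!/bin/bash
# Rough sketch: alert when any ONLINE backend exceeds a latency threshold.
# Adjust the admin credentials, threshold and recipient to your setup.
THRESHOLD_US=5000
mysql -uadmin -padmin -h127.0.0.1 -P6032 -Bse \
  "SELECT hostgroup, srv_host, srv_port, Latency_us FROM stats_mysql_connection_pool WHERE status='ONLINE'" | \
while read hg host port lat; do
  if [ "$lat" -gt "$THRESHOLD_US" ]; then
    echo "hostgroup ${hg} ${host}:${port} latency ${lat}us > ${THRESHOLD_US}us" | \
      mail -s "ProxySQL latency alert on $(hostname)" dba@example.com
  fi
done

Run it from cron (for example every minute) so you get notified instead of polling the stats schema by hand.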

Client Connection Errors

Connection timeouts caused by hitting the maximum number of connections on the backend (the database node itself) can be perplexing if you cannot determine the main source of the problem. You can, however, check the stats database for aborted connections on the client or server side, as well as denied connections, as follows:

mysql> select * from stats.stats_mysql_global where variable_name like '%connect%';
+-------------------------------------+----------------+
| Variable_Name                       | Variable_Value |
+-------------------------------------+----------------+
| Client_Connections_aborted          | 0              |
| Client_Connections_connected        | 205            |
| Client_Connections_created          | 10067          |
| Server_Connections_aborted          | 44             |
| Server_Connections_connected        | 30             |
| Server_Connections_created          | 14892          |
| Server_Connections_delayed          | 0              |
| Client_Connections_non_idle         | 205            |
| Access_Denied_Max_Connections       | 0              |
| Access_Denied_Max_User_Connections  | 0              |
| MySQL_Monitor_connect_check_OK      | 41350          |
| MySQL_Monitor_connect_check_ERR     | 92             |
| max_connect_timeouts                | 0              |
| Client_Connections_hostgroup_locked | 0              |
| mysql_killed_backend_connections    | 0              |
+-------------------------------------+----------------+
15 rows in set (0.01 sec)

It is also a good idea to verify the backend user's maximum number of connections, to see how many connections it is allowed to open or use. For example, I have the following in my test:

mysql> select username, active, transaction_persistent, max_connections from mysql_users;
+---------------+--------+------------------------+-----------------+
| username      | active | transaction_persistent | max_connections |
+---------------+--------+------------------------+-----------------+
| proxydemo     | 1      | 1                      | 10000           |
| proxysql-paul | 1      | 1                      | 10000           |
+---------------+--------+------------------------+-----------------+
2 rows in set (0.00 sec)
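If a user keeps hitting its limit, the value can be raised from the ProxySQL admin interface. These are standard ProxySQL admin statements; the new limit of 2000 below is just an example value:

mysql> UPDATE mysql_users SET max_connections=2000 WHERE username='proxydemo';
mysql> LOAD MYSQL USERS TO RUNTIME;
mysql> SAVE MYSQL USERS TO DISK;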

Slow Queries

Identifying slow queries in ProxySQL is not that difficult, but it can be inefficient if done manually. When checking manually, you can use the following variable:

mysql> select * from stats_mysql_global where variable_name like '%slow%';
+---------------+----------------+
| Variable_Name | Variable_Value |
+---------------+----------------+
| Slow_queries  | 2              |
+---------------+----------------+
1 row in set (0.00 sec)

While that can provide you with some numbers, you can check the table stats_mysql_query_digest under the stats schema if you want to dig deeper. For example:

mysql> select count_star,sum_time,(sum_time/count_star)/1000 as average_time_ms,digest_text

    -> from stats_mysql_query_digest

    -> where count_star > 100 order by average_time_ms desc limit 10;

+------------+----------+-----------------+--------------------------------------+

| count_star | sum_time | average_time_ms | digest_text                          |

+------------+----------+-----------------+--------------------------------------+

| 884        | 15083961 | 17              | UPDATE sbtest1 SET k=k+? WHERE id=?  |

| 930        | 16000111 | 17              | UPDATE sbtest9 SET k=k+? WHERE id=?  |

| 914        | 15695810 | 17              | UPDATE sbtest4 SET k=k+? WHERE id=?  |

| 874        | 14467420 | 16              | UPDATE sbtest8 SET k=k+? WHERE id=?  |

| 904        | 15294520 | 16              | UPDATE sbtest3 SET k=k+? WHERE id=?  |

| 917        | 15228077 | 16              | UPDATE sbtest6 SET k=k+? WHERE id=?  |

| 907        | 14613238 | 16              | UPDATE sbtest2 SET k=k+? WHERE id=?  |

| 900        | 15113004 | 16              | UPDATE sbtest5 SET k=k+? WHERE id=?  |

| 917        | 15299381 | 16              | UPDATE sbtest7 SET k=k+? WHERE id=?  |

| 883        | 15010119 | 16              | UPDATE sbtest10 SET k=k+? WHERE id=? |

+------------+----------+-----------------+--------------------------------------+

10 rows in set (0.01 sec)

which lists the top 10 slowest queries by average time among those executed more than 100 times. 

Memory Utilization

Hardware resources such as CPU, disk, and memory have to be monitored to ensure that your ProxySQL is performant. The most crucial one, however, is memory, as ProxySQL uses memory heavily due to its query cache mechanism. By default, the query cache size, controlled by the variable mysql-query_cache_size_MB, defaults to 256 MiB. It can therefore happen that memory usage grows, and you need to diagnose whether the issue lies within your ProxySQL node itself or is only being noticed at the application layer.

When investigating this, you might end up checking the tables in the stats_history and stats schemas. Below is the list of tables which can help you during diagnosis:

mysql> show tables from stats;

| stats_memory_metrics                 |

19 rows in set (0.00 sec)

or,

mysql> show tables from stats_history;

+------------------------+

| tables                 |

+------------------------+

| mysql_connections      |

| mysql_connections_day  |

| mysql_connections_hour |

| mysql_query_cache      |

| mysql_query_cache_day  |

| mysql_query_cache_hour |

| system_cpu             |

| system_cpu_day         |

| system_cpu_hour        |

| system_memory          |

| system_memory_day      |

| system_memory_hour     |

+------------------------+

15 rows in set (0.00 sec)
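As a quick manual check of where that memory goes, you can look at the query cache counters and memory metrics and, if needed, resize the cache from the admin interface. These are standard ProxySQL admin statements; the 512 MB value below is only an example:

mysql> SELECT * FROM stats_mysql_global WHERE variable_name LIKE 'Query_Cache%';
mysql> SELECT * FROM stats_memory_metrics;
mysql> UPDATE global_variables SET variable_value='512' WHERE variable_name='mysql-query_cache_size_MB';
mysql> LOAD MYSQL VARIABLES TO RUNTIME;
mysql> SAVE MYSQL VARIABLES TO DISK;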

Efficiently Determining The Performance of your ProxySQL

There are multiple ways to determine the performance of your ProxySQL node. ClusterControl offers the ability to determine this with simple yet straightforward graphs. For example, when ProxySQL is integrated into your cluster, you'll be able to set your query rules, change a user's max_connections, determine the top queries, change a user's host group, and see the performance of your ProxySQL node. See the screenshots below...

This shows how much insight ClusterControl can give you into the performance of your ProxySQL node. But it does not stop there: ClusterControl also has rich and powerful dashboards we call SCUMM, which include a ProxySQL Overview dashboard. 

If you want to identify slow queries, you can simply take a glance at the dashboard. Checking your latency distribution over the different hostgroups where your backend nodes are assigned gives you quick insight into performance based on distribution. You can monitor the client and server connections and get query cache insights. Last but not least, it shows the memory utilization of your ProxySQL node. See the graphs below...

These graphs are part of the dashboard which simply helps you to easily determine the performance of your ProxySQL node.

ClusterControl doesn't limit you when dealing with ProxySQL. There is also a rich set of features where you can take a backup of, or import, the configuration, which is very important when you are dealing with high availability for your ProxySQL nodes.

Conclusion

It has never been easier to monitor and determine whether you have any issues with ProxySQL. As this blog shows, ClusterControl is a tool that can give you efficiency and the insights needed to determine the issues behind your performance-related problems.

by Paul Namuag at April 01, 2020 06:19 PM

Oli Sennhauser

Shutdown with MySQL 8

On StackExchange for Database Administrators I recently saw a question which attracted my interest.

The question puzzled me a bit because the answer seems too easy. Further, the question was not so clear. And all these factors smell dangerous...

About time - was, is and will be

How can I find out if the database "was" shut down slowly? This is quite easy: look into your MySQL Error Log and there you will find a log sequence similar to the following:

2020-03-30T08:03:36.928017Z 0 [System] [MY-010910] [Server] /home/mysql/product/mysql-8.0.19-linux-glibc2.12-x86_64/bin/mysqld: Shutdown complete (mysqld 8.0.19)  MySQL Community Server - GPL.

Oops! There are no more "shutting down ..." messages like in MySQL 5.7:

2020-03-30T08:04:49.898254Z 0 [Note] Giving 1 client threads a chance to die gracefully
2020-03-30T08:04:49.898266Z 0 [Note] Shutting down slave threads
2020-03-30T08:04:51.898389Z 0 [Note] Forcefully disconnecting 1 remaining clients
2020-03-30T08:04:51.898433Z 0 [Warning] bin/mysqld: Forcing close of thread 115  user: 'enswitch'
2020-03-30T08:04:51.898512Z 0 [Note] Event Scheduler: Purging the queue. 0 events
2020-03-30T08:04:51.924644Z 0 [Note] Binlog end
2020-03-30T08:04:51.938518Z 0 [Note] Shutting down plugin 'ngram'
...
2020-03-30T08:04:53.296239Z 0 [Note] Shutting down plugin 'binlog'
2020-03-30T08:04:53.296805Z 0 [Note] bin/mysqld: Shutdown complete

So you cannot find out any more when the shutdown started, and thus you cannot say how long it took to shut down MySQL. So MySQL messed this up somehow in 8.0. Too much clean-up!

If you want to get the old behaviour back you can stop MySQL 8 as follows:

SQL> SET GLOBAL log_error_verbosity = 3;
SQL> SHUTDOWN;

or just add the variable to your MySQL configuration file (my.cnf).
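The corresponding entry in my.cnf would look something like this:

[mysqld]
log_error_verbosity = 3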

Then you will find the old shutdown sequence in your error log as before:

2020-03-30T08:13:55.071627Z 9 [System] [MY-013172] [Server] Received SHUTDOWN from user root. Shutting down mysqld (Version: 8.0.19).
2020-03-30T08:13:55.178119Z 0 [Note] [MY-010067] [Server] Giving 1 client threads a chance to die gracefully
2020-03-30T08:13:55.178210Z 0 [Note] [MY-010117] [Server] Shutting down slave threads
...
2020-03-30T08:13:56.588574Z 0 [System] [MY-010910] [Server] /home/mysql/product/mysql-8.0.19-linux-glibc2.12-x86_64/bin/mysqld: Shutdown complete (mysqld 8.0.19)  MySQL Community Server - GPL.

If you want to know where your MySQL Error Log File is located you can find it like this:

SQL> SHOW GLOBAL VARIABLES LIKE 'log_error';
+---------------+---------------------------------------------+
| Variable_name | Value                                       |
+---------------+---------------------------------------------+
| log_error     | /home/mysql/database/mysql-80/log/error.log |
+---------------+---------------------------------------------+

Typical locations are: /var/lib/mysql/<hostname>.err or /var/log/mysqld.log or /var/log/mysql/mysqld.log or /var/log/messages or similar.

Now about the "is"

When you are currently shutting down MySQL it is already too late to find out when it started, because you cannot connect to the database any more to change the settings, and you do not see anything in the MySQL Error Log. Possibly you can look at the error log with stat and see when the last message was written to it, to find the start of the shutdown.

 shell> stat error.log 
  File: error.log
  Size: 29929           Blocks: 64         IO Block: 4096   regular file
Device: 801h/2049d      Inode: 5373953     Links: 1
Access: (0640/-rw-r-----)  Uid: ( 1001/   mysql)   Gid: ( 1001/   mysql)
Access: 2020-03-30 10:13:59.491446560 +0200
Modify: 2020-03-30 10:13:56.587467485 +0200
Change: 2020-03-30 10:13:56.587467485 +0200
 Birth: -

Symptoms of a shutdown in progress are either heavy writing to disk (iostat -xk 1) or heavy swapping in (vmstat). You have to wait until it is finished. Some brave people use a kill -9 in such a case, if they have InnoDB only and if they know exactly what they are doing and how much time the following crash recovery will take.

And finally about the question "how long will the shutdown take" (will be)

This is not so easy to predict. It depends on several things:

  • How many memory blocks of your database are swapped out.
  • How many pages are dirty and must be written to disk.
  • How fast is your I/O system (IOPS).

You can gather this information as follows (a rough estimate sketch follows the list):

  • How much swap must be swapped back in, you can find here.
  • The number of dirty pages (pages modified but not written to disk yet) you can find with:
    SQL> SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_pages_dirty';
  • And about the IOPS you have to ask your hardware spec sheet (150 to 100'000 IOPS).
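As a very rough back-of-the-envelope estimate (ignoring swap-in time and redo log work), you could divide the number of dirty pages by your sustained write IOPS. The snippet below is only a sketch, assuming your ~/.my.cnf provides the credentials; the 5000 IOPS figure is a placeholder for your own storage numbers:

shell> dirty=$(mysql -NBe "SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_pages_dirty'" | awk '{print $2}')
shell> page_size=$(mysql -NBe "SHOW GLOBAL VARIABLES LIKE 'innodb_page_size'" | awk '{print $2}')
shell> iops=5000
shell> echo "~$((dirty / iops)) seconds to flush ${dirty} dirty pages ($((dirty * page_size / 1024 / 1024)) MiB)"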

I hope that with this answer I have covered all the possible questions about shutting down MySQL 8.0.


by Shinguz at April 01, 2020 02:52 PM

MariaDB Foundation

Congratulations on SkySQL, the DBaaS offering!

We congratulate MariaDB Corporation on their release of SkySQL, a DBaaS offering for MariaDB Server. A big step!

The post Congratulations on SkySQL, the DBaaS offering! appeared first on MariaDB.org.

by Kaj Arnö at April 01, 2020 09:09 AM

March 31, 2020

SeveralNines

Implementing a Multi-Datacenter Setup for PostgreSQL - Part One

Having a multi-datacenter setup is a common topology for a Disaster Recovery Plan(DRP), but there are some limitations around implementing this kind of environment. 

You should first solve the communication between the data centers by using SSH access or configuring a VPN. Then, you have the latency that (depending on the configuration) could affect your database cluster. Finally, you should think about how to perform the failover. Can the application access the remote node in case of master failure?

In this blog, we will show how to implement a multi-datacenter setup for PostgreSQL covering all these points mentioned earlier, some of them using ClusterControl. To not make it too boring, we will split it into two parts. In the first part, we will cover the connectivity between the data centers. The second one will be about the deployment and configuration itself, so let’s start!

Objective

Let’s say you want to have the following topology:

Here you have your application connected to a load balancer, a primary database node and one standby node in one datacenter, and another standby node in a secondary datacenter for DR purposes. This could be a minimal setup for a multi-datacenter environment. You can avoid using the load balancer, but in case of failover you would have to reconfigure your application to connect to the new master, so to avoid that we recommend using it, or even using two of them (one in each DC) to avoid a single point of failure. 

To make it more clear, let’s assign some public IP addresses to both datacenter 1 and 2 as an example.

In datacenter 1, the public network is 35.166.37.0/24, so let’s assign the following IP addresses in this way:

APP: 35.166.37.10

Load Balancer + ClusterControl: 35.166.37.11

Primary Node: 35.166.37.12

Standby 1 Node: 35.166.37.13

In datacenter 2, the public network is 18.197.23.0/24, so:

Standby 2 Node: 18.197.23.14

Data Center Connectivity

The first problem could be this one. You can configure a VPN between them, which would be the most secure way, but as we covered VPN configuration in a previous blog, and to keep this as short as possible, we will connect the data centers via SSH access using private/public keys.

Let’s create a user called ‘remote’ in all the nodes (to avoid using root):

$ useradd remote

$ passwd remote

Changing password for user remote.

New password:

Retype new password:

passwd: all authentication tokens updated successfully.

And you can add it to the sudoers file to assign privileges:

$ visudo

remote    ALL=(ALL)       ALL

Now, in the load balancer server (which will be also the ClusterControl server), generate the key pair for the new user:

$ su remote

$ ssh-keygen

Generating public/private rsa key pair.

Enter file in which to save the key (/home/remote/.ssh/id_rsa):

Created directory '/home/remote/.ssh'.

Enter passphrase (empty for no passphrase):

Enter same passphrase again:

Your identification has been saved in /home/remote/.ssh/id_rsa.

Your public key has been saved in /home/remote/.ssh/id_rsa.pub.

The key fingerprint is:

SHA256:hgVe/unld9+r/Ynk1HM+t089A41bwxFlPYt5/q+ZyL8 remote@lb1

The key's randomart image is:

+---[RSA 3072]----+

|      . .   .=|

|     . +     oo|

|      . o o.o|

|       o . . o+o.|

|      . S o .oo= |

|       . . o =.o|

|          . .+.=*|

|           .+ooB@|

|            o=EB/|

+----[SHA256]-----+

Now you will have a new .ssh directory in the user's home.

Copy the public key to each node using the remote public IP Address:

$ ssh-copy-id 35.166.37.12

/bin/ssh-copy-id: INFO: Source of key(s) to be installed: "/home/remote/.ssh/id_rsa.pub"

/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed

/bin/ssh-copy-id: INFO: 1 key(s) remain to be installed -- if you are prompted now it is to install the new keys

remote@35.166.37.12's password:

Number of key(s) added: 1

Now try logging into the machine, with:   "ssh '35.166.37.12'"

and check to make sure that only the key(s) you wanted were added.

This command will copy your public key to the remote node in the authorized_keys file, so you will access it using the private one.

Then, try to access them:

$ ssh 35.166.37.12

Make sure you have the SSH traffic allowed in your firewall, and to make it more secure, you should allow it only from a known source (e.g. from 35.166.37.0/24).

For example, if you’re using AWS, you should allow the traffic from 35.166.37.0/24 to the SSH port in this way:

Or if you’re using IPTABLES, you should run something like this:

$ iptables -A INPUT -p tcp -s 35.166.37.0/24 --destination-port 22 -j ACCEPT

Or a similar command if you’re using a different firewall solution.

To make it a bit more secure, we recommend using a different SSH port than the default one, and it could also be useful to use a tool that bans multiple failed access attempts, such as fail2ban.
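As a hedged example of both suggestions (the port number 2222 is arbitrary; adjust it and the jail options to your own policy):

$ vi /etc/ssh/sshd_config      # change the Port directive, e.g. Port 2222
$ systemctl restart sshd

$ yum install -y epel-release fail2ban     # apt-get install fail2ban on Debian/Ubuntu
$ vi /etc/fail2ban/jail.local
[sshd]
enabled  = true
port     = 2222
maxretry = 5
$ systemctl enable --now fail2ban

Remember to update your firewall rules and the SSH commands above to use the new port (ssh -p 2222 ...).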

Conclusion

At this point, if everything went fine, you will have SSH communication between your data centers, so the next step is to deploy your PostgreSQL cluster and manage the failover in case of failure, as we will see in the second part of this blog.

by Sebastian Insausti at March 31, 2020 07:23 PM

March 30, 2020

MariaDB Foundation

Enable post-compromise data protection with MariaDB and Virgil Security’s PureKit

MariaDB deployments hold vast amounts of sensitive data such as intellectual property, state secrets, healthcare and financial records. HIPAA, GDPR and other government regulations require even more stringent protections and disclosures. […]

The post Enable post-compromise data protection with MariaDB and Virgil Security’s PureKit appeared first on MariaDB.org.

by Dmytro Matviiv at March 30, 2020 09:04 PM

SeveralNines

How to Restore a Single MySQL Table Using mysqldump?

Mysqldump is the most popular logical backup tool for MySQL. It is included in the MySQL distribution, so it’s ready for use on all of the MySQL instances. 

Logical backups are not, however, the fastest nor the most space-efficient way of backing up MySQL databases, but they have a huge advantage over physical backups. 

Physical backups are usually an all-or-nothing type of backup. While it might be possible to create a partial backup with Xtrabackup (we described this in one of our previous blog posts), restoring such a backup is tricky and time-consuming. 

Basically, if we want to restore a single table, we have to stop the whole replication chain and perform the recovery on all of the nodes at once. This is a major issue - these days you rarely can afford to stop all of the databases. 

Another problem is that the table level is the lowest granularity you can achieve with Xtrabackup: you can restore a single table, but you cannot restore part of it. A logical backup, however, can be restored by running SQL statements, so it can easily be applied on a running cluster, and you can (we wouldn't call it easy, but still) pick which SQL statements to run, allowing a partial restore of a table. 

Let’s take a look at how this can be done in the real world.

Restoring a Single MySQL Table Using mysqldump

At the beginning, please keep in mind that partial backups do not provide a consistent view of the data. When you take backups of separate tables, you cannot restore such a backup to a known point in time (for example, to provision a replication slave) even if you restore all of the data from the backup. With that behind us, let's proceed.

We have a master and a slave:

The dataset consists of one schema and several tables:

mysql> SHOW SCHEMAS;

+--------------------+

| Database           |

+--------------------+

| information_schema |

| mysql              |

| performance_schema |

| sbtest             |

| sys                |

+--------------------+

5 rows in set (0.01 sec)



mysql> SHOW TABLES FROM sbtest;

+------------------+

| Tables_in_sbtest |

+------------------+

| sbtest1          |

| sbtest10         |

| sbtest11         |

| sbtest12         |

| sbtest13         |

| sbtest14         |

| sbtest15         |

| sbtest16         |

| sbtest17         |

| sbtest18         |

| sbtest19         |

| sbtest2          |

| sbtest20         |

| sbtest21         |

| sbtest22         |

| sbtest23         |

| sbtest24         |

| sbtest25         |

| sbtest26         |

| sbtest27         |

| sbtest28         |

| sbtest29         |

| sbtest3          |

| sbtest30         |

| sbtest31         |

| sbtest32         |

| sbtest4          |

| sbtest5          |

| sbtest6          |

| sbtest7          |

| sbtest8          |

| sbtest9          |

+------------------+

32 rows in set (0.00 sec)

Now, we have to take a backup. There are several ways in which we can approach this issue. We can just take a consistent backup of the whole dataset, but this will generate a single, large file with all the data. To restore a single table we would then have to extract the data for that table from the file. It is of course possible, but it is quite time-consuming and pretty much a manual operation. It can be scripted, but if you do not have proper scripts in place, writing ad hoc code when your database is down and you are under heavy pressure is not necessarily the safest idea.
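For the record, extracting one table from a full dump can be scripted; a commonly used trick is to cut the dump between the table's header comment and the next table's header. This is only a sketch, assuming the dump is called full_backup.sql and that sbtest12 happens to be the next table in the file (the last table in a dump needs a different end marker):

root@vagrant:~/backup# sed -n '/^-- Table structure for table `sbtest11`/,/^-- Table structure for table `sbtest12`/p' full_backup.sql > sbtest11_only.sql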

Instead, we can prepare the backup in such a way that every table is stored in a separate file:

root@vagrant:~/backup# d=$(date +%Y%m%d) ; db='sbtest'; for tab in $(mysql -uroot -ppass -h127.0.0.1 -e "SHOW TABLES FROM ${db}" | grep -v Tables_in_${db}) ; do mysqldump --set-gtid-purged=OFF --routines --events --triggers ${db} ${tab} > ${d}_${db}.${tab}.sql ; done

Please note that we set --set-gtid-purged=OFF. We need it if we are going to load this data into the database later. Otherwise MySQL will attempt to set @@GLOBAL.GTID_PURGED, which will most likely fail. MySQL would also add SET @@SESSION.SQL_LOG_BIN= 0;, which is definitely not what we want. Those settings are required if we were making a consistent backup of the whole dataset to provision a new node. In our case we know it is not a consistent backup and there is no way we can rebuild anything from it. All we want is to generate a dump that we can load on the master and let it replicate to the slaves.

That command generated a nice list of sql files that can be uploaded to the production cluster:

root@vagrant:~/backup# ls -alh

total 605M

drwxr-xr-x 2 root root 4.0K Mar 18 14:10 .

drwx------ 9 root root 4.0K Mar 18 14:08 ..

-rw-r--r-- 1 root root  19M Mar 18 14:10 20200318_sbtest.sbtest10.sql

-rw-r--r-- 1 root root  19M Mar 18 14:10 20200318_sbtest.sbtest11.sql

-rw-r--r-- 1 root root  19M Mar 18 14:10 20200318_sbtest.sbtest12.sql

-rw-r--r-- 1 root root  19M Mar 18 14:10 20200318_sbtest.sbtest13.sql

-rw-r--r-- 1 root root  19M Mar 18 14:10 20200318_sbtest.sbtest14.sql

-rw-r--r-- 1 root root  19M Mar 18 14:10 20200318_sbtest.sbtest15.sql

-rw-r--r-- 1 root root  19M Mar 18 14:10 20200318_sbtest.sbtest16.sql

-rw-r--r-- 1 root root  19M Mar 18 14:10 20200318_sbtest.sbtest17.sql

-rw-r--r-- 1 root root  19M Mar 18 14:10 20200318_sbtest.sbtest18.sql

-rw-r--r-- 1 root root  19M Mar 18 14:10 20200318_sbtest.sbtest19.sql

-rw-r--r-- 1 root root  19M Mar 18 14:10 20200318_sbtest.sbtest1.sql

-rw-r--r-- 1 root root  19M Mar 18 14:10 20200318_sbtest.sbtest20.sql

-rw-r--r-- 1 root root  19M Mar 18 14:10 20200318_sbtest.sbtest21.sql

-rw-r--r-- 1 root root  19M Mar 18 14:10 20200318_sbtest.sbtest22.sql

-rw-r--r-- 1 root root  19M Mar 18 14:10 20200318_sbtest.sbtest23.sql

-rw-r--r-- 1 root root  19M Mar 18 14:10 20200318_sbtest.sbtest24.sql

-rw-r--r-- 1 root root  19M Mar 18 14:10 20200318_sbtest.sbtest25.sql

-rw-r--r-- 1 root root  19M Mar 18 14:10 20200318_sbtest.sbtest26.sql

-rw-r--r-- 1 root root  19M Mar 18 14:10 20200318_sbtest.sbtest27.sql

-rw-r--r-- 1 root root  19M Mar 18 14:10 20200318_sbtest.sbtest28.sql

-rw-r--r-- 1 root root  19M Mar 18 14:10 20200318_sbtest.sbtest29.sql

-rw-r--r-- 1 root root  19M Mar 18 14:10 20200318_sbtest.sbtest2.sql

-rw-r--r-- 1 root root  19M Mar 18 14:10 20200318_sbtest.sbtest30.sql

-rw-r--r-- 1 root root  19M Mar 18 14:10 20200318_sbtest.sbtest31.sql

-rw-r--r-- 1 root root  19M Mar 18 14:10 20200318_sbtest.sbtest32.sql

-rw-r--r-- 1 root root  19M Mar 18 14:10 20200318_sbtest.sbtest3.sql

-rw-r--r-- 1 root root  19M Mar 18 14:10 20200318_sbtest.sbtest4.sql

-rw-r--r-- 1 root root  19M Mar 18 14:10 20200318_sbtest.sbtest5.sql

-rw-r--r-- 1 root root  19M Mar 18 14:10 20200318_sbtest.sbtest6.sql

-rw-r--r-- 1 root root  19M Mar 18 14:10 20200318_sbtest.sbtest7.sql

-rw-r--r-- 1 root root  19M Mar 18 14:10 20200318_sbtest.sbtest8.sql

-rw-r--r-- 1 root root  19M Mar 18 14:10 20200318_sbtest.sbtest9.sql

When you would like to restore the data, all you need to do is to load the SQL file into the master node:

root@vagrant:~/backup# mysql -uroot -ppass sbtest < 20200318_sbtest.sbtest11.sql

Data will be loaded into the database and replicated to all of the slaves.

How to Restore a Single MySQL Table Using ClusterControl?

Currently ClusterControl does not provide an easy way of restoring just a single table, but it is still possible with just a few manual actions. There are two options you can use. The first one, suitable for a small number of tables, is to create a schedule where you perform partial backups of separate tables one by one:

Here, we are taking a backup of sbtest.sbtest1 table. We can easily schedule another backup for sbtest2 table:

Alternatively we can perform a backup and put data from a single schema into a separate file:

Now you can either find the missing data by hand in the file, restore this backup to a separate server or let ClusterControl do it:

You keep the server up and running and you can extract the data that you wanted to restore using either mysqldump or SELECT … INTO OUTFILE. Such extracted data will be ready to be applied on the production cluster.
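For example, a hedged sketch of both approaches, assuming we only need rows with id between 1000 and 1999 from sbtest11 (the range, file names and secure_file_priv location are placeholders):

root@vagrant:~/backup# mysqldump --set-gtid-purged=OFF --where="id BETWEEN 1000 AND 1999" sbtest sbtest11 > sbtest11_rows.sql

or, on the restored server:

mysql> SELECT * FROM sbtest11 WHERE id BETWEEN 1000 AND 1999
    -> INTO OUTFILE '/var/lib/mysql-files/sbtest11_rows.csv'
    -> FIELDS TERMINATED BY ',' ENCLOSED BY '"' LINES TERMINATED BY '\n';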

 

by krzysztof at March 30, 2020 07:22 PM

March 27, 2020

SeveralNines

High Availability Configuration for ClusterControl Nodes Using CMON HA

In a previous two-part series, Krzysztof Ksiazek discussed ClusterControl CMON HA for Distributed Database High Availability. In this blog, we'll cover distributing the nodes between on-prem and a public cloud (using Google Cloud Platform (GCP)).

The reason we wrote this blog is that we have received questions about how to implement a highly available instance of ClusterControl with CMON node(s) running on-prem and other CMON node(s) running in a different data center (such as a public cloud). In our previous blog ClusterControl CMON HA for Distributed Database High Availability, we used Galera Cluster nodes, but this time we'll use MySQL Replication with Percona Server 5.7. The ideal setup is to always encapsulate the communication between your on-prem nodes and your nodes residing in a public cloud via a VPN or a secure channel. 

ClusterControl CMON HA is still at an early stage, and we believe it is not yet fully mature. Still, our CMON HA is able to provide the functionality needed to deploy ClusterControl in a highly available fashion. Let's proceed with how you can deploy and set it up, distributing the nodes between on-prem and the public cloud.

What is a CMON?

Before getting to the main topic, let us introduce CMON. CMON stands for ClusterControl Controller, which is the "primary brain" of ClusterControl. It is a backend service performing automation, management, monitoring and scheduling tasks, as well as high availability. The data collected is stored in the CMON database, for which we use a MySQL-compatible database as the datastore.

The Architectural Setup

Some of you might not know that ClusterControl can be set up for high availability. If you have multiple ClusterControl (or CMON) nodes running, that is possible at no cost. You can run as many ClusterControl nodes as you need. 

For this setup, we'll have ClusterControl nodes on top of another ClusterControl in order to create or deploy the database nodes and manage automatic failover whenever a failure occurs. Although you could use MHA, Orchestrator, or MaxScale to manage the auto-failover, for efficiency and speed I'll use ClusterControl to do things that the other tools I have mentioned do not offer.

So let's have a look at the diagram for this setup:

The diagram shows that on top of the three-node CMON setup, a running CMON (ClusterControl) sits above them to monitor the automatic failover. HAProxy then load balances between the three monitored CMON nodes, with one node located in a separate region hosted on GCP for this blog. You might notice that we didn't include Keepalived; that's because we cannot place a VIP under GCP since it's on a different network.

As you might have noticed, we deploy a total of three nodes. CMON HA requires at least three nodes in order to proceed with the voting process, or so-called quorum. So for this setup, we require that you have at least three nodes for higher availability.

Deploying an On-Prem ClusterControl Nodes

In this section, we do expect that you have already setup or installed your ClusterControl UI which we will use to deploy a three node MySQL Replication cluster using Percona Server.

Let's first create the cluster by deploying a new MySQL Replication as shown below.

Take note that I am using Percona Server 5.7 here, for which the default setup by ClusterControl works efficiently.

Then define the hostname or IP of your nodes,

At this point, we expect that you have already set up a two node Master/Slave replication which is hosted or running on-prem. The screenshot below should show how your nodes will look like:

Setup & Install ClusterControl and Enable CMON HA On The First Node

In the previous blog, ClusterControl CMON HA for Distributed Database High Availability, we briefly provided the steps on how to do this. Let's go through them again, this time for this particular Master/Slave replication setup.

The first thing to do is pick the node on which you want ClusterControl to be installed first (in this setup, I ended up installing on the 192.168.70.80 node first) and do the steps below.

Step One

Install ClusterControl

$ wget http://www.severalnines.com/downloads/cmon/install-cc

$ chmod +x install-cc

$ sudo ./install-cc   # omit sudo if you run as root

Take note that once you are prompted that a running mysql instance has been detected, you need to let ClusterControl use the existing mysqld, since that is one of our goals here for CMON HA and for this setup: to use the already set up MySQL.

Step Two

Bind CMON not only to localhost, but also to the node's specific IP address (since we'll be enabling HA)

## edit /etc/default/cmon and modify the line just like below, or add the line if it doesn't exist

RPC_BIND_ADDRESSES="127.0.0.1,192.168.70.80"

Step Three

Then restart CMON,

service cmon restart

Step Four

Install s9s CLI tools

$ wget http://repo.severalnines.com/s9s-tools/install-s9s-tools.sh

$ chmod 755 install-s9s-tools.sh

$ ./install-s9s-tools.sh

During this installation, the s9s tool will set up an admin user which you can use when working with the s9s command, for example to enable CMON HA.

Step Five

Enable the CMON HA

$ s9s controller --enable-cmon-ha

Step Six

Lastly, modify the /etc/my.cnf and add,

slave-skip-errors = 1062

under the [mysqld] section. Once added, do not forget to restart mysql as,

service mysql restart

or

systemctl restart mysql

Currently, this is a limitation we're facing with CMON HA, since it tries to insert log entries on the slave, but this workaround is fine for now.

Setup, Install ClusterControl and Enable CMON HA On The Second Node

As simple as that for the first node. Now, on the 2nd node (192.168.70.70), we need to do the same steps, but with some adjustments to make this HA possible.

Step One

Copy the configuration to the 2nd node (192.168.70.70) from first node (192.168.70.80)

$ scp -r /etc/cmon* 192.168.70.70:/etc/

Step Two

On the 2nd node, edit /etc/cmon.cnf and ensure that the host is correctly configured, e.g.

vi /etc/cmon.cnf

Then assign hostname param as,

hostname=192.168.70.70

Step Three

Install ClusterControl,

$ wget http://www.severalnines.com/downloads/cmon/install-cc

$ chmod +x install-cc

$ sudo ./install-cc   # omit sudo if you run as root

However, skip the installation of CMON (or ClusterControl Controller) once you encounter this line,

=> An existing Controller installation detected!

=> A re-installation of the Controller will overwrite the /etc/cmon.cnf file

=> Install the Controller? (y/N):

For the rest, just do what you did on the first node, such as setting up the hostname, using the existing running mysqld instance, and providing the MySQL password and the password for your CMON, both of which must be the same as on the first node.

Step Four

Install s9s CLI tools

$ wget http://repo.severalnines.com/s9s-tools/install-s9s-tools.sh

$ chmod 755 install-s9s-tools.sh

$ ./install-s9s-tools.sh

Step Five

Copy the remaining configuration from 1st node to the 2nd node.

$ scp -r ~/.s9s/ 192.168.70.70:/root/

$ scp /etc/s9s.conf 192.168.70.70:/etc/

$ scp /var/www/html/clustercontrol/bootstrap.php 192.168.70.70:/var/www/html/clustercontrol/

Step Six

Install clustercontrol-controller package,

For Ubuntu/Debian,

$ apt install -y clustercontrol-controller

For RHEL/CentOS/Fedora,

$ yum install -y clustercontrol-controller

Step Seven

Copy the /etc/default/cmon file and modify the IP address for the RPC bind address,

scp /etc/default/cmon 192.168.70.70:/etc/default

RPC_BIND_ADDRESSES="127.0.0.1,10.0.0.103"

Then restart CMON as follows,

service cmon restart

Step Eight

Modify the /etc/my.cnf and add,

slave-skip-errors = 1062

under the [mysqld] section. Once added, do not forget to restart mysql as,

service mysql restart

or

systemctl restart mysql

Currently, this is a limitation we're facing with CMON HA, since it tries to insert log entries on the slave, but this workaround is fine for now.

Step Nine

Finally, check what the CMON HA nodes look like:

[root@node7 ~]# s9s controller --list --long
S VERSION    OWNER  GROUP  NAME           IP            PORT COMMENT
l 1.7.5.3735 system admins 192.168.70.80  192.168.70.80 9501 Acting as leader.
f 1.7.5.3735 system admins 192.168.70.70  192.168.70.70 9501 Accepting heartbeats.

Total: 2 controller(s)

Deploying Your ClusterControl Node In the Cloud

As we mentioned earlier, the ideal setup for communication is to encapsulate the packets over a VPN or another kind of secure channel. If you have concerns about how to do this, check our previous blog Multi-DC PostgreSQL: Setting Up a Standby Node at a Different Geo-Location Over a VPN, in which we tackled how to create a simple VPN setup using OpenVPN. 

So in this section, we expect that you have already set up the VPN connection. Now, what we're going to do is add a slave in Google Cloud Platform, over which we will distribute the availability of CMON. To do this, just go to Add Replication Slave, which can be found by clicking the cluster icon near the right corner. See what it looks like below:

Now, this is what we'll end up with:

Now, since we have a new slave added which is hosted under GCP, you need to follow again what we did earlier on the 2nd node. Go through those steps and follow the same instructions as we did for the 2nd node.

Once you have it correctly, you'll end up with the following result:

[root@gnode1 ~]# s9s controller --list --long
S VERSION    OWNER  GROUP  NAME           IP            PORT COMMENT
l 1.7.5.3735 system admins 192.168.70.80  192.168.70.80 9501 Acting as leader.
f 1.7.5.3735 system admins 192.168.70.70  192.168.70.70 9501 Accepting heartbeats.
f 1.7.5.3735 system admins 10.142.0.39    10.142.0.39   9501 Accepting heartbeats.

where the nodes are:

  • 192.168.70.80 - (node8), residing on-prem
  • 192.168.70.70 - (node7), residing on-prem
  • 10.142.0.39 - (gnode1), hosted in GCP in a different region

CMON HA In Action

My colleague Krzysztof Ksiazek already provided the setup for HA using HAProxy here on this blog ClusterControl CMON HA for Distributed Database High Availability - Part Two (GUI Access Setup)

To follow the procedure stated in the blog, ensure you have xinetd and pathlib packages. You can install xinetd and pathlib as follows,

$ sudo yum install -y xinetd python-pathlib.noarch

Ensure also that you have cmonhachk defined in /etc/services, just as below:

[root@node7 ~]# grep 'cmonhachk' /etc/services
cmonhachk       9201/tcp
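The /etc/services entry maps to an xinetd service definition that runs the health check script. A typical definition would look something like the one below; the script path /usr/local/bin/cmonhachk.py and the exact options are assumptions following the pattern from the referenced blog, so adjust them to wherever you placed your check script:

$ cat /etc/xinetd.d/cmonhachk
service cmonhachk
{
        disable         = no
        flags           = REUSE
        socket_type     = stream
        port            = 9201
        wait            = no
        user            = root
        server          = /usr/local/bin/cmonhachk.py
        log_on_failure  += USERID
        only_from       = 0.0.0.0/0
        per_source      = UNLIMITED
}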

and apply the changes by restarting xinetd,

service xinetd restart

I'll skip the Keepalived and HAProxy procedure and expect that you have set them up accordingly. One takeaway you have to consider in this setup is that Keepalived is not applicable if you are spanning the VIP from on-prem to the public cloud network, because they are totally different networks.

Now, let's see how CMON HA reacts if nodes go down. As shown earlier, node 192.168.70.80 (node8) was acting as the leader, as shown below:

The master node of the database also shows node8 as the master in the ClusterControl topology view. Let's try to kill node8 and see how CMON HA proceeds.

As you can see, gnode1 (the GCP node) takes over as leader when node8 goes down. Checking the HAProxy results shows the following,

and our ClusterControl nodes show that node8 is down, while the GCP node is taking over as the master,

Lastly, accessing my HAProxy node which is running on host 192.168.10.100 at port 81 shows the following UI,

Conclusion

ClusterControl CMON HA has been around since version 1.7.2, but it has also been a challenge for us, given the various questions about and preferences for how to deploy it, such as using MySQL Replication versus Galera Cluster.

Our CMON HA is not mature yet, but it is now ready to cater to your high availability needs. Different approaches can be applicable, as long as your checks determine the right node that is up and running.

We encourage you to set up and deploy CMON HA and let us know how well it suits your needs. If problems persist, please let us know how we can help you cater to your high availability necessities.

 

by Paul Namuag at March 27, 2020 07:13 PM

MariaDB Foundation

Heads up: Renewing MariaDB Downloads

Time to renew downloads.mariadb.org! We are embarking on a long project. A large part of our user base mainly interacts with us through downloading new versions. […]

The post Heads up: Renewing MariaDB Downloads appeared first on MariaDB.org.

by Kaj Arnö at March 27, 2020 11:43 AM

March 26, 2020

MariaDB Foundation

MariaDB 10.5.2 now available

The MariaDB Foundation is pleased to announce the availability of MariaDB 10.5.2, the second beta release in the MariaDB 10.5 development series.
See the release notes and changelogs for details. […]

The post MariaDB 10.5.2 now available appeared first on MariaDB.org.

by Ian Gilfillan at March 26, 2020 09:55 PM

SeveralNines

Preparing a MySQL or MariaDB Server for Production - Part One

It is extremely important to install and configure a production MySQL server with the necessary packages and tools to smooth out operations in the long run. We have seen many cases where troubleshooting or tuning a production server (especially one without public internet access) is difficult because the necessary tools to help identify and solve the problem are not installed on the server. 

In this two-part blog series, we are going to show you 9 tips and tricks on how to prepare a MySQL server for production usage from a system administrator perspective. All examples in this blog post are based on our two-node, master-slave MySQL Replication setup running on CentOS 7.

Install Essential Packages

After the installation of MySQL or MariaDB client and server packages, we need to prepare the MySQL/MariaDB server with all necessary tools to cope with all the administration, management and monitoring operations that are going to happen on the server. If you are planning to lock down the MySQL server in production, it will be a bit harder to install them all manually without the Internet connection. 

Some of the important packages that should be installed on the MySQL/MariaDB server for Linux:

  • Percona Xtrabackup/MariaDB Backup - Non-blocking physical backup of the database server.
  • ntp/ntpdate - Sync server's time.
  • pv - Monitor data through a pipeline, can also be used for throttling.
  • socat or netcat- Data streaming tool, good for streaming backup.
  • net-tools - A collection of network debugging tools for Linux.
  • bind-utils - A collection of DNS debugging tools for Linux.
  • sysstat - A collection of performance monitoring tools for Linux.
  • telnet - Telnet client to check service reachability.
  • mailx/mailutils - Mail client.
  • openssl - Toolkit for the Transport Layer Security (TLS) and Secure Sockets Layer (SSL) protocols.
  • unzip - Uncompress tool.
  • htop - Host monitoring tool.
  • innotop - MySQL monitoring tool.
  • vim - Text editor with syntax highlighting (or any preferred text editor).
  • python-setuptools - Python package manager.
  • lm_sensors/ipmitool - To check server component's temperature. Bare-metal server only.

Note that some of the suggested packages are only available in non-default package repositories like EPEL for CentOS. Therefore, for YUM-based installation:

$ yum install epel-release
$ yum install -y wget ntp pv socat htop innotop vim mailx bind-utils net-tools telnet sysstat openssl python-setuptools lm_sensors ipmitool

While for APT-based installation:

$ apt-get install -y ntp pv socat htop innotop vim python-setuptools mailutils dnsutils net-tools sysstat telnet openssl lm-sensors ipmitool

For MySQL command line interface, we can use another tool other than the standard "mysql" command line client like mycli, with auto-completion and syntax highlighting. To install the package, we can use pip (Python package manager):

$ pip install mycli
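A typical invocation looks like this (the user, host and schema below are placeholders; adjust them to your environment):

$ mycli -u root -h 127.0.0.1 mydb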

With mycli, one can reduce the human-error vector thanks to better visualization when dealing with a production server, as shown in the following screenshot:

Meaningful Shell Prompt

This part looks unnecessary at first, but it is probably going to save you from making silly mistakes in production. As humans, we are prone to making errors, especially when running destructive commands during an intense moment, for example when the production server is down.

Take a look at the following screenshot. By default, the bash PS1 prompt (primary prompt) looks pretty dull:

A good PS1 prompt should provide distinct information to make SysAdmins more aware of the environment, server and current path they are dealing with. As a result, one will be more careful and always know whether they are on the right path/server/user before executing a command.

To achieve this, find the line describing the PS1 (primary prompt) configuration, commonly in /etc/bashrc at line 41:

  [ "$PS1" = "\\s-\\v\\\$ " ] && PS1="[\u@\h \W]\\$ "

And replace it with this line:

  [ "$PS1" = "\\s-\\v\\\$ " ] && PS1="[\[\e[36m\]\u\[\e[m\]@\[\e[32m\]\h\[\e[m\]\[\e[31;47m\]Production\[\e[m\]: \[\e[33m\]\w\[\e[m\]]$ "

Log out from the terminal and re-login again. You should see something like this in the terminal now:

As shown in the screenshot above, the current user (blue), the server's hostname (green), the Production tier (bold in red with a white background), together with the full path of the current directory (yellow), provide a better summary of the current session, with the important information easily distinguishable thanks to the different colours.

You can use this free online tool to customize your bash prompt, to suit your taste.

MOTD

If you are managing a database cluster with multiple roles like MySQL or MariaDB replication, it's common to always have this anxious feeling when directly administering one of the hosts because we need to perform extra checks to verify that the node that we are in is the one that we really want to administer. Replication topology tends to become more complex as your database cluster scales out and there could be many roles in a cluster like intermediate master, binlog server, backup master with semi-sync replication, read-only slaves and also backup verification server.

It will be way better if we can get a summary of the database state whenever we are in that particular server, just to give us a heads up on what we are going to deal with. We can utilize Linux's Message of the Day (MOTD) to automate this behaviour whenever we log into the server. Using the default /etc/motd is only good for static content, which is not what we really want if we want to report the current state of a MySQL server.

To achieve similar result, we can use a simple Bash script to produce a meaningful MOTD output to summarize our MySQL/MariaDB server, for example:

$ vim ~/.motd.sh
#!/bin/bash
# Auto-generate MOTD for MySQL/MariaDB Replication
# .motd.sh, to be executed under ~/.bash_profile

#####
# Preferred role of the node, pick one
#PREFER_ROLE='Slave'
PREFER_ROLE='Master'
#####

HOSTNAME=$(hostname)
UPTIME=$(uptime -p)
MYSQL_COMMAND='mysql --connect-timeout=2 -A -Bse'
MYSQL_READONLY=$(${MYSQL_COMMAND} 'SHOW GLOBAL VARIABLES LIKE "read_only"' | awk {'print $2'})
TIER='Production'
MAIN_IP=$(hostname -I | awk {'print $1'})
CHECK_MYSQL_REPLICATION=$(${MYSQL_COMMAND} 'SHOW SLAVE STATUS\G' | egrep 'Slave_.*_Running: Yes$')
MYSQL_MASTER=$(${MYSQL_COMMAND} 'SHOW SLAVE STATUS\G' | grep Master_Host | awk {'print $2'})
# The following requires show_compatibility_56=1 for MySQL 5.7 and later
MYSQL_UPTIME=$(${MYSQL_COMMAND} 'SELECT TIME_FORMAT(SEC_TO_TIME(VARIABLE_VALUE ),"%Hh %im")  AS Uptime FROM information_schema.GLOBAL_STATUS WHERE VARIABLE_NAME="Uptime"')

# coloring
bold=$(tput bold)
red=$(tput setaf 1)
green=$(tput setaf 2)
normal=$(tput sgr0)

MYSQL_SHOW=1
if [ $MYSQL_READONLY == 'ON' ]; then
        CURRENT_MYSQL_ROLE='Slave'
        if ${MYSQL_COMMAND} 'SHOW SLAVE STATUS\G' | egrep 'Slave_.*_Running: Yes$' &>/dev/null ; then
                lag=$(${MYSQL_COMMAND} 'SHOW SLAVE STATUS\G' | egrep 'Seconds_Behind_Master:' | awk {'print $2'})
                if [ $lag -eq 0 ]; then
                        REPLICATION_STATUS="${green}Healthy  "
                else
                        if [ $lag == 'NULL' ]; then
                                REPLICATION_STATUS=${red}Unhealthy
                        else
                                REPLICATION_STATUS="${red}Lagging ${lag}s"
                        fi
                fi
        else
                REPLICATION_STATUS=${red}Unhealthy
        fi

elif [ $MYSQL_READONLY == 'OFF' ]; then
        CURRENT_MYSQL_ROLE='Master'
        SLAVE_HOSTS=$(${MYSQL_COMMAND} 'SHOW SLAVE HOSTS' | awk {'print $1'})
else
        MYSQL_SHOW=0
fi

if [ $TIER == 'Production' ]; then
        TIER=${green}Production
fi

if [ $PREFER_ROLE == $CURRENT_MYSQL_ROLE ]; then
        MYSQL_ROLE=${green}$CURRENT_MYSQL_ROLE
else
        MYSQL_ROLE=${red}$CURRENT_MYSQL_ROLE
fi

echo
echo "HOST INFO"
echo "========="
echo -e "  Hostname       : ${bold}$HOSTNAME${normal} \t Server Uptime  : ${bold}$UPTIME${normal}"
echo -e "  IP Address       : ${bold}$MAIN_IP${normal} \t Tier           : ${bold}$TIER${normal}"
echo
if [ $MYSQL_SHOW -eq 1 ]; then
        echo "MYSQL STATE"
        echo "==========="
        echo -e "  Current role      : ${bold}$MYSQL_ROLE${normal} \t\t Read-only      : ${bold}$MYSQL_READONLY${normal}"
        echo -e "  Preferred role    : ${bold}$PREFER_ROLE${normal} \t\t DB Uptime      : ${bold}$MYSQL_UPTIME${normal}"
        if [ $CURRENT_MYSQL_ROLE == 'Slave' ]; then
                echo -e "  Replication state : ${bold}$REPLICATION_STATUS${normal} \t Current Master : ${bold}$MYSQL_MASTER${normal}"
        else
                echo -e "  Slave Hosts(s) ID : "
                for i in $SLAVE_HOSTS; do
                        echo -e "      - ${bold}$i${normal} \t"; done
        fi
        echo
fi

Choose one of the MySQL roles, either a master or a slave on line 8 or 9 and save the script. This script requires MySQL option file to store the database user credentials, so we have to create it first:

$ vim ~/.my.cnf

And add the following lines:

[client]
user=root
password='YourRootP4ssw0rd'

Replace the password part with the actual MySQL root password. Then, apply executable permission to the script:

$ chmod 755 ~/.motd.sh

Test the executable script whether it produces the correct output or not:

$ ~/.motd.sh

If the output looks good (no errors or warnings), add the script into ~/.bash_profile so it will be automatically loaded when a user logs in:

$ whoami
root
$ echo '~/.motd.sh' >> ~/.bash_profile

Re-login the terminal and you should see something like this on the master:

While on the slave, you should see something like this:

Note that this script is specifically written for a simple MySQL/MariaDB one-tier master-slave replication. You probably have to modify the script if you have a more complex setup, or you want to use other MySQL clustering technology like Galera Cluster, Group Replication or NDB Cluster. The idea is to retrieve the database node status and information right when we logged in so we are aware of the current state of the database server that we are working on.
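As a starting point for a Galera Cluster variant, one could key the health section on the wsrep status counters instead of read_only. The following fragment is only a hypothetical sketch (it reuses the MYSQL_COMMAND and colour variables defined earlier in the script above, and the same ~/.my.cnf credentials); it is not part of the original script:

GALERA_SIZE=$(${MYSQL_COMMAND} 'SHOW GLOBAL STATUS LIKE "wsrep_cluster_size"' | awk {'print $2'})
GALERA_STATE=$(${MYSQL_COMMAND} 'SHOW GLOBAL STATUS LIKE "wsrep_local_state_comment"' | awk {'print $2'})
if [ "$GALERA_STATE" == "Synced" ]; then
        GALERA_STATUS="${green}Healthy${normal} (${GALERA_SIZE} nodes)"
else
        GALERA_STATUS="${red}${GALERA_STATE}${normal} (${GALERA_SIZE} nodes)"
fi
echo -e "  Galera state      : ${bold}$GALERA_STATUS${normal}"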

Sensors and Temperature

This part is commonly being ignored by many SysAdmins. Monitoring the temperatures is crucial as we do not want to get a big surprise if the server behaves unexpectedly when overheating. A physical server commonly consists of hundreds of electronic parts glued together in a box and are sensitive to temperature changes. One failed cooling fan could spike a CPU temperature to hit its hard limit, which eventually causes the CPU clock to be throttled down and affects the data processing performance as a whole.

We can use the lm-sensors package for this purpose. To install it, simply do:

$ yum install lm-sensors # apt-get install lm-sensors for APT

Then run the sensors-detect program to automatically determine which kernel modules you need to load to use lm_sensors most effectively:

$ sensors-detect

Answer all questions (commonly, just accept all the suggested answers). Some hosts, such as virtual machines or containers, do not support this module; sensors really need to be at the host (bare-metal) level. Check out this list for more information.

Then, run the sensors command:

$ sensors
i350bb-pci-0203
Adapter: PCI adapter
loc1:         +53.0°C (high = +120.0°C, crit = +110.0°C)

power_meter-acpi-0
Adapter: ACPI interface
power1:        4.29 MW (interval =   1.00 s)

coretemp-isa-0000
Adapter: ISA adapter
Package id 0:  +55.0°C (high = +85.0°C, crit = +95.0°C)
Core 0:        +45.0°C (high = +85.0°C, crit = +95.0°C)
Core 1:        +51.0°C (high = +85.0°C, crit = +95.0°C)
Core 2:        +47.0°C (high = +85.0°C, crit = +95.0°C)
Core 3:        +51.0°C (high = +85.0°C, crit = +95.0°C)
Core 4:        +49.0°C (high = +85.0°C, crit = +95.0°C)
Core 5:        +48.0°C (high = +85.0°C, crit = +95.0°C)
Core 8:        +47.0°C (high = +85.0°C, crit = +95.0°C)
Core 9:        +49.0°C (high = +85.0°C, crit = +95.0°C)
Core 10:       +48.0°C (high = +85.0°C, crit = +95.0°C)
Core 11:       +48.0°C (high = +85.0°C, crit = +95.0°C)
Core 12:       +46.0°C (high = +85.0°C, crit = +95.0°C)
Core 13:       +49.0°C (high = +85.0°C, crit = +95.0°C)

coretemp-isa-0001
Adapter: ISA adapter
Package id 1:  +53.0°C (high = +85.0°C, crit = +95.0°C)
Core 0:        +46.0°C (high = +85.0°C, crit = +95.0°C)
Core 1:        +48.0°C (high = +85.0°C, crit = +95.0°C)
Core 2:        +47.0°C (high = +85.0°C, crit = +95.0°C)
Core 3:        +45.0°C (high = +85.0°C, crit = +95.0°C)
Core 4:        +46.0°C (high = +85.0°C, crit = +95.0°C)
Core 5:        +47.0°C (high = +85.0°C, crit = +95.0°C)
Core 8:        +47.0°C (high = +85.0°C, crit = +95.0°C)
Core 9:        +45.0°C (high = +85.0°C, crit = +95.0°C)
Core 10:       +45.0°C (high = +85.0°C, crit = +95.0°C)
Core 11:       +46.0°C (high = +85.0°C, crit = +95.0°C)
Core 12:       +46.0°C (high = +85.0°C, crit = +95.0°C)
Core 13:       +46.0°C (high = +85.0°C, crit = +95.0°C)

The above result shows the overall CPU temperature, together with the temperature of every CPU core. Another tool that we can use to see the overall state of the server components is ipmitool. To install it, simply do:

$ yum -y install ipmitool

By running the following command, we can tell the overall state of the physical components in the server:

$ ipmitool sdr list full
Inlet_Temp       | 20 degrees C   | ok
PCIe_Inlet_Temp  | 37 degrees C   | ok
Outlet_Temp      | 20 degrees C   | ok
CPU0_VR_Temp     | 39 degrees C   | ok
CPU1_VR_Temp     | 41 degrees C   | ok
CPU0_Temp        | 55 degrees C   | ok
CPU1_Temp        | 52 degrees C   | ok
PCH_Temp         | 58 degrees C   | ok
DIMMG0_Temp      | 35 degrees C   | ok
DIMMG1_Temp      | 32 degrees C   | ok
PSU0_Temp        | 0 degrees C    | ok
PSU1_Temp        | 0 degrees C    | ok
SYS_3.3V         | 3.30 Volts     | ok
SYS_5V           | 5 Volts        | ok
SYS_12V          | 12.10 Volts    | ok
CPU0_VCORE       | 1.79 Volts     | ok
CPU1_VCORE       | 1.79 Volts     | ok
CPU0_DDR_VDD     | 1.23 Volts     | ok
CPU1_DDR_VDD     | 1.23 Volts     | ok
SYS_FAN1_Speed   | 4018 RPM   | ok
SYS_FAN2_Speed   | 4116 RPM   | ok
SYS_FAN3_Speed   | 4116 RPM   | ok
SYS_FAN4_Speed   | 4116 RPM   | ok
SYS_FAN5_Speed   | 4018 RPM   | ok
SYS_FAN6_Speed   | 4116 RPM   | ok
SYS_FAN7_Speed   | 4018 RPM   | ok
SYS_FAN8_Speed   | 4116 RPM   | ok
SYS_FAN9_Speed   | 4018 RPM   | ok
SYS_FAN10_Speed  | 4116 RPM   | ok
SYS_FAN11_Speed  | 4116 RPM   | ok
SYS_FAN12_Speed  | 4116 RPM   | ok
SYS_FAN13_Speed  | 4116 RPM   | ok
SYS_FAN14_Speed  | 4214 RPM   | ok
Airflow_rate     | 16 CFM     | ok
PSU1_PIN         | 0 Watts    | ok
PSU2_PIN         | 0 Watts    | ok
PSU1_POUT        | 0 Watts    | ok
PSU2_POUT        | 0 Watts    | ok
PSU1_IIN         | 0 Amps     | ok
PSU2_IIN         | 0 Amps     | ok
PSU1_VIN         | 0 Volts    | ok
PSU2_VIN         | 0 Volts    | ok
CPU_Power        | 63 Watts   | ok
MEM_Power        | 8 Watts    | ok
Total_Power      | 0 Watts    | ok
BP_Power         | 8 Watts    | ok
FAN_Power        | 6 Watts    | ok
MB_Power         | 0 Watts    | ok

The list is long but self-explanatory, and you should be able to oversee the overall state of the server components. There could be cases where some of the fans are not running at full speed, which then increases the CPU temperature. Hardware replacement might be required to fix the problem.
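To act on these readings automatically, a small script run from cron can parse the output (from sensors or ipmitool) and alert when a threshold is crossed. Below is a minimal sketch based on the sensors output shown earlier; the 80 degree threshold and the admin@example.com recipient are placeholders, and it assumes a working mail command (e.g. from mailx):

#!/bin/bash
# Sketch: alert if any CPU core temperature exceeds a chosen threshold
THRESHOLD=80
HOT=$(sensors | awk -v t="$THRESHOLD" '/^Core/ { temp=$3; gsub(/[^0-9.]/, "", temp); if (temp+0 > t) print $1, $2, temp"C" }')
if [ -n "$HOT" ]; then
    echo -e "High CPU core temperature detected on $(hostname):\n$HOT" | mail -s "Temperature alert: $(hostname)" admin@example.com
fi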

Note that the Intelligent Platform Management Interface (IPMI) kernel module requires the Baseboard Management Controller (BMC) to be enabled on the motherboard. Use dmesg to verify if it is available:

$ dmesg | grep -i bmc
[    8.063470] ipmi_si IPI0001:00: Found new BMC (man_id: 0x000000, prod_id: 0x02f3, dev_id: 0x20)

Otherwise, check the server's BIOS setting if this controller is disabled.

That's it for now. Part two of this blog series will cover the remaining 5 topics, such as backup tool configuration, stress tests, and server lockdown.

 

by ashraf at March 26, 2020 05:43 PM

March 25, 2020

SeveralNines

Database Load Balancing on Google Cloud Platform (GCP) Using HAProxy

Using a Load Balancer is a good idea for any database technology, as you can redirect applications to the available or healthy database nodes and even distribute the traffic across multiple servers to improve performance. This is not only useful on-prem but also in a cloud environment. In this blog, we’ll see how to deploy and configure a new database cluster with HAProxy on the Google Cloud Platform from scratch.

Creating the VM on Google Cloud

For this example, we’ll assume that you have a Google Cloud account created.

You can deploy your virtual machines directly from ClusterControl. Go to the deploy section and select “Deploy in the Cloud”.

Specify vendor and version for your new cluster.

Add the number of nodes, cluster name, and database information.

Choose the cloud credentials, in this case, your Google Cloud account. If you don’t have your account added in ClusterControl, you can follow our documentation for this task.

Now you can specify the virtual machine configuration, like operating system, size, and region.

ClusterControl will create the virtual machines, install the software, and configure it, all in the same job and in an unattended way.

You can monitor the creation process in the ClusterControl activity section. When it finishes, you will see your new cluster in the ClusterControl main screen.

Deploying HAProxy in Google Cloud

Note: To deploy it, you first need to create the VM in the Google Cloud Platform, as virtual machine creation is not yet implemented for the ClusterControl load balancer deployment (it will be available soon).

Now you have your new cluster up and running, go to ClusterControl -> Select Cluster -> Cluster Actions -> Add Load Balancer.

Here you must add the information that ClusterControl will use to install and configure your HAProxy load balancer.

The information that you need to introduce is:

Action: Deploy or Import.

Server Address: IP Address for your HAProxy server.

Listen Port (Read/Write): Port for read/write mode.

Listen Port (Read-Only): Port for read-only mode.

Policy: It can be:

  • leastconn: The server with the lowest number of connections receives the connection.
  • roundrobin: Each server is used in turns, according to their weights.
  • source: The source IP address is hashed and divided by the total weight of the running servers to designate which server will receive the request.

Install for read/write splitting: For master-slave replication.

Build from Source: You can choose to install from a package manager or build from source.

And you need to select which servers you want to add to the HAProxy configuration and some additional information like:

Role: It can be Active or Backup.

Include: Yes or No.

Connection address information.

Also, you can configure Advanced Settings like Admin User, Backend Name, Timeouts, and more.
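For reference, the configuration that ends up on the HAProxy node typically contains one listen section per port. The sketch below only illustrates the shape of such a section; the section name, IP addresses, ports and server names are placeholders, not the exact file ClusterControl generates:

listen haproxy_3307_rw
        bind *:3307
        mode tcp
        balance leastconn
        option tcpka
        timeout client 10800s
        timeout server 10800s
        server db1 10.0.0.11:3306 check
        server db2 10.0.0.12:3306 check backup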

When you finish the configuration and confirm the deployment, you can follow the progress in the Activity section on the ClusterControl UI.

And when this finishes, you can go to ClusterControl -> Nodes -> HAProxy node, and check the current status.

You can also monitor your HAProxy servers from ClusterControl checking the Dashboard section.

Conclusion

A Load Balancer can help you to handle your database traffic by balancing it between multiple servers. It is also useful to improve your high availability environment by performing failover tasks. ClusterControl can help you too with different features like auto-recovery, monitoring, deployment, and even more, and it can manage on-prem, cloud or mixed environments with different database technologies at the same time.

by Sebastian Insausti at March 25, 2020 05:43 PM

Henrik Ingo

Paper review: Strong and Efficient Consistency with Consistency-Aware Durability

Mark Callaghan pointed me to a paper for my comments: Strong and Efficient Consistency with Consistency-Aware Durability by Ganesan, Alagappan and Arpaci-Dusseau ^2. It won Best Paper award at the Usenix Fast '20 conference. The paper presents a new consistency level for distributed databases where reads are causally consistent with other reads but not (necessarily) with writes.

My comments are mostly on section 2 of the paper, which describes current state of the art and a motivation for their work.

read more

by hingo at March 25, 2020 03:23 PM

March 24, 2020

SeveralNines

How to Restore a Single Table Using Percona Xtrabackup?

Backups are the means of protecting from data loss - should something happen, you can easily restore it from the backup. You cannot predict what part of the data will have to be restored - it can be everything or just a subset. Typically you want to have a full backup to ensure you can handle a total data loss scenario but what would happen if only a single table had been dropped? Can we do a partial restore if we used Xtrabackup to create our safety data copy? Let’s explore this scenario in a short blog post.

Partial Restore Using Xtrabackup

The main thing you have to keep in mind before you perform a partial restore with Xtrabackup is that this will break the consistency of the node where you would restore the backup. This is extremely important in replication or Galera setups where the consistency of the cluster is paramount as otherwise replication (standard or Galera) may break. 

How to approach this problem? It all depends on your environment. One of the solutions could be to use a separate host to restore missing data and then proceed with regular logical backup, something that you can restore on the live cluster without introducing data inconsistency. 

Alternatively, if you can afford to stop the whole cluster, you can perform the restore on all of the nodes in the cluster - this as well will result in a consistent state of the data across the whole environment. We won’t go into details how to proceed because, as we stated, this may depend on your business requirements, ability to schedule a downtime and so on. 

For now, let's take a look at how to restore a single table, without focusing on where you would do that.

We are assuming that a full backup created by Xtrabackup is ready. We have a simple environment of asynchronous replication with one master and one slave. We use Percona Server 8.0 therefore we ensured we have percona-xtrabackup-80 installed.

As can be seen, the backup has been created:

root@vagrant:~# ls -alh /backup/
total 149M
drwxr-xr-x  6 root root 4.0K Mar 13 12:24 .
drwxr-xr-x 25 root root 4.0K Mar 13 12:23 ..
-rw-r-----  1 root root 479 Mar 13 12:24 backup-my.cnf
-rw-r-----  1 root root 195 Mar 13 12:24 binlog.000005
-rw-r-----  1 root root   16 Mar 13 12:24 binlog.index
-rw-r-----  1 root root 5.8K Mar 13 12:24 ib_buffer_pool
-rw-r-----  1 root root 100M Mar 13 12:24 ibdata1
drwxr-x---  2 root root 4.0K Mar 13 12:24 mysql
-rw-r-----  1 root root 24M Mar 13 12:24 mysql.ibd
drwxr-x---  2 root root 4.0K Mar 13 12:24 performance_schema
drwxr-x---  2 root root 4.0K Mar 13 12:24 sbtest
drwxr-x---  2 root root 4.0K Mar 13 12:24 sys
-rw-r-----  1 root root 12M Mar 13 12:24 undo_001
-rw-r-----  1 root root 12M Mar 13 12:24 undo_002
-rw-r-----  1 root root   63 Mar 13 12:24 xtrabackup_binlog_info
-rw-r-----  1 root root   99 Mar 13 12:24 xtrabackup_checkpoints
-rw-r-----  1 root root 540 Mar 13 12:24 xtrabackup_info
-rw-r-----  1 root root 8.5K Mar 13 12:24 xtrabackup_logfile
-rw-r-----  1 root root 248 Mar 13 12:24 xtrabackup_tablespaces

Now, if we want to restore it, we have to prepare the backup - it's a standard process for Xtrabackup. There is one major difference, though, in the way we will prepare it: we will use the --export flag:

root@vagrant:~# xtrabackup --prepare --export --target-dir=/backup/

Now we can restore a particular table following this process:

  1. We have to create the table using exactly the same schema as it used to have when the backup has been taken.
  2. We have to discard its tablespace
  3. We will copy the tablespace from the backup along with its *.cfg file
  4. We will import new tablespace

Let’s assume one of the tables has been accidentally truncated:

mysql> SELECT COUNT(*) FROM sbtest.sbtest11\G
*************************** 1. row ***************************
COUNT(*): 0
1 row in set (0.00 sec)

In this case we already have the table with a proper schema in place and we can proceed to step 2):

mysql> ALTER TABLE sbtest.sbtest11 DISCARD TABLESPACE;
Query OK, 0 rows affected (0.02 sec)

Now we have to copy the data from the backup:

root@vagrant:~# cp /backup/sbtest/sbtest11.* /var/lib/mysql/sbtest/
root@vagrant:~# chown mysql.mysql /var/lib/mysql/sbtest/sbtest11.*

Finally, we can import the restored tablespace:

mysql> ALTER TABLE sbtest.sbtest11 IMPORT TABLESPACE;
Query OK, 0 rows affected (0.48 sec)

mysql> SELECT COUNT(*) FROM sbtest.sbtest11\G
*************************** 1. row ***************************
COUNT(*): 100000
1 row in set (0.05 sec)

As you can see, the contents of the table have been restored. Now, based on how we approached the whole problem, we can either repeat this process on all of the nodes in the cluster or we can use mysqldump or SELECT … INTO OUTFILE to extract this data and then load it on the live cluster.

Please keep in mind that Xtrabackup also allows you to take a backup of a single database or a single table. This is another feature, loosely tied to what we have just discussed - it is not required to create a backup of a single table to be able to restore it. What is required, though, is the schema - you may want to schedule backups of the schema (no data required) using mysqldump to go along with your xtrabackup backups. You may find them very handy if your schema changes often.
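For example, extracting the recovered table from the helper host and keeping schema-only dumps next to the Xtrabackup backups could look roughly like this (the database and table names follow the sbtest example above; the output paths are placeholders):

# Dump only the restored table, to be loaded on the live cluster afterwards
mysqldump --single-transaction sbtest sbtest11 > /tmp/sbtest11_data.sql
# Schema-only dump of all databases, to keep alongside the xtrabackup backups
mysqldump --no-data --all-databases > /tmp/schema_only.sql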

How to Restore a Single Table Using ClusterControl?

ClusterControl, as of now, does not come with an ability to restore a single table out of full backup. You can schedule partial backups with ClusterControl though. Then you can use those backups and restore them on a separate host and then extract the data and apply it on the live cluster.

As you can see on the screenshot, you can decide which database you want to backup and then list the tables (or decide that you want to include all of them) you would like to backup. You can setup a backup schedule where you would backup individual tables, one at a time. You could as well design the schedule on a schema-per-schema basis. Once you have a backup ready, you can restore it on a standalone host:

Then, we will have to decide what host it is. You also have to make sure this host can be reached from ClusterControl node using SSH.

We want ClusterControl to setup software, provision it with data and then keep the server running after the backup has been restored.

We should review the options we took and then confirm that the backup should be restored.

Job has been started and all we need to do is to wait for it to complete.

Once the job has completed, you can access the backup verification server, dump the missing data and restore it on the live cluster.

 

by krzysztof at March 24, 2020 06:59 PM

March 23, 2020

SeveralNines

The Battle of the NoSQL Databases - Comparing MongoDB and Oracle NoSQL

Modern IT needs a non-relational, dynamic schema (meaning no requirement for join queries) to provide support for Big Data/real-time applications. NoSQL databases were created with the notion of improving data processing performance and the ability to scale out across multiple hosts to overcome distributed database load, and they have won over the new generation's demand for data processing.

Besides providing the essential support for various data models and scripting languages, MongoDB also allows easy to start with the process for the developers. 

NoSQL databases open the doors to...

  • Text-based protocols using a scripting language (REST, JSON, BSON)
  • Indeed minimal cost to generate, store and transport data
  • Support huge amounts of data processing. 
  • Increased write performance
  • Not required to perform object-relational mapping and normalization process
  • No rigid controls with referential integrity rules
  • Reducing maintenance cost with database administrators 
  • Lowering expansion cost
  • Fast key-value access
  • Advancing the support for machine learning and intelligence 

MongoDB Market Acceptance 

The modern needs for Big Data Analytics and modern applications play a crucial role in the need to improve the lifecycle of data processing, with no expectations for hardware expansion and cost increase. 

If you are planning for a new application, and you want to choose a database, arriving at the right decision with many database options in the market can be a complicated process. 

The DB-engine popularity ranking shows that MongoDB stands at no.1 compared to Oracle NoSQL (which placed at No. 74). The trend, however, is indicating that something is changing. The need for many cost-effective expansions goes hand in hand with much simpler data modelling, and administration is transforming how developers would want to consider the best for their systems. 

According to Datanyze market share information to date, there are about 289 websites running on Oracle NoSQL with a market share of 11%, whereas MongoDB has 12,185 websites with a market share of 4.66%. These impressive numbers indicate that there is a bright future for MongoDB.

NoSQL Data Modeling 

Data modelling requires understanding of...

  • The types of your current data. 
  • What are the types of data that you are expecting in the future?
  • How is your application gaining access to required data from the system?
  • How is your application going to fetch required data for processing?

The exciting thing for those who have always followed the Oracle way of creating schemas first and then storing the data is that MongoDB allows creating the collection along with the document. This means that a collection does not have to exist before document creation takes place, making MongoDB much appreciated for its flexibility.

In Oracle NoSQL, however,  the table definition has to be created first, after which you can continue to create the rows.  

The next cool thing is that MongoDB does not impose strict rules on schema and relation implementation, which gives you the freedom to continuously improve the system without worrying much about maintaining a tight schematic design.

Let’s look at some of the comparisons between MongoDB and Oracle NoSQL.

Comparing NoSQL Concepts in MongoDB and Oracle

NoSQL Terminologies

  • Collection (MongoDB) / Table or View (Oracle NoSQL): the collection / table acts as the storage container; they are similar but not identical.
  • Document (MongoDB) / Row (Oracle NoSQL): for MongoDB, data is stored in a collection, in the form of documents and fields. For Oracle NoSQL, a table is a collection of rows, where each row holds a data record; each table row consists of key and data fields, which are defined when a table is created.
  • Field (MongoDB) / Column (Oracle NoSQL).
  • Index (MongoDB) / Index (Oracle NoSQL): both databases use an index to improve the speed of searches carried out in the database.

Document Store and Key-Value Store 

Oracle NoSQL provides a storage system that stores values indexed by a key; this concept is viewed as the least complex model, as the datasets consist of indexed key-value pairs. The records are organised using major keys and minor keys.

The major key can be viewed as the object pointer and the minor key as the fields in the record.  Efficient Search for the data is made possible with the use of the key as the mechanism to access the data just like a Primary key. 

MongoDB extends key-value pairs. Each document has a unique key, which serves to retrieve the document. Documents are known for their dynamic schema, as the documents in a collection do not need to have the same set of fields. A collection can have a common field with different types of data. These attributes allow the document data model to map directly to modern object-oriented languages.

MongoDB is a document store, while Oracle NoSQL is a key-value store (see the example figures "MongoDB - Document Store format" and "Oracle NoSQL Key-value Store").

BSON and JSON

Oracle NoSQL uses JSON as a standard data format to transmit (data + attribute-value pairs). On the other hand MongoDB uses BSON. 

MongoDB uses BSON (Binary JSON), a binary data format that induces faster processing. Its characteristics:

  • BSON is not a human-readable format
  • Lightweight
  • Traversable
  • Efficient
  • Additional datatypes: BinData and Date

Oracle NoSQL uses JSON (Javascript Object Notation), the standard format, with much slower processing compared to BSON. Its characteristics:

  • Human-readable and writable format
  • Lightweight
  • Easy to skim through all the content
  • Text-based data interchange format
  • Language independent

BSON is not in a human-readable text, unlike JSON. BSON stands for binary-encoded serialization of JSON like data, mainly used for data storage and a transfer format with MongoDB. BSON data format consists of a list of ordered elements containing a field name (string), type, and value. As for the data types BSON supports, all the datatypes commonly found in JSON and includes two additional datatypes (Binary Data and Date). Binary data or known as BinData that is less than 16MB can be stored directly into MongoDB documents. BSON is said to be consuming more space than JSON data documents. 

There are two reasons why MongoDB consumes more space as compared to Oracle NoSQL: 

  • MongoDB achieved the objective of being able to traverse fast; enabling fast traversal requires the BSON document to carry additional metadata (the length of strings and subobjects).
  • BSON design can encode and decode fast. For example, integers are stored as 32 (or 64) bit integers, to eliminate parsing to and from the text. This process uses more space than JSON for small integers but is much faster to parse.

Data Model Definition

MongoDB Collection Statement

Create a collection

db.createCollection("users")

Creating a collection with an automatic _id

db.users.insert(
{
    User_id: "U1",
    First_name: "Mary",
    Last_name: "Winslet",
    Age: 15,
    Contact: {
        Phone: "123-456-789",
        Email: "mary@example.com"
    },
    access: {
        Level: 5,
        Group: "dev"
    }
})

MongoDB allows the related pieces of information to be embedded in the same database record (see Data Model Design).

Oracle NoSQL Table Statement

Using SQL CLI to setup namespace: 

Create namespace newns1; 

Using namespace to associate tables and child table

newns1:users

newns1:users.access

Create Table with an IDENTITY using:

Create table newns1.user (
idValue INTEGER GENERATED ALWAYS AS IDENTITY (START WITH 1 INCREMENT BY 1 MAXVALUE 10000),
User_id String,
First_name String,
Last_name String,
Contact Record (Phone string,
                Email string),
Primary key (idValue));

Create Table using SQL JSON: 

Create table newns1.user (
idValue INTEGER GENERATED ALWAYS AS IDENTITY (START WITH 1 INCREMENT BY 1 MAXVALUE 10000),
User_profile JSON,
Primary Key (shard(idValue),User_id));

Rows for User table: type JSON

{
  "id": "U1",
  "User_profile": {
     "First_name": "Mary",
     "Lastname": "Winslet",
     "Age": 15,
     "Contact": {
        "Phone": "123-456-789",
        "Email": "mary@example.com"
     }
  }
}

Based on the Data Definitions above, MongoDB allows different methods for schema creation. Collection can be defined explicitly or during the first insert of the data into the document. When creating a collection, you can define an objectid. Objectid is the primary key for MongoDB documents. Objectid is a 12-byte binary BSON type that contains 12 bytes generated by MongoDB drivers and the server using a default algorithm. MongoDB objectid is useful and serves the purpose to sort the document created in a specific collection.

Oracle NoSQL has several ways to start defining tables. If you are using the Oracle SQL CLI, by default new tables will be placed in sysdefault until you decide to create a new namespace to associate a set of new tables with. The above example demonstrates the new namespace "newns1" being created, and the user table being associated with that new namespace.

Besides identifying the primary key, Oracle NoSQL also uses the IDENTITY column to auto-increment a value each time you add a row. The IDENTITY value is auto-generated and must be an Integer, Long or Number data type. In Oracle NoSQL, IDENTITY is associated with the Sequence Generator, similar to the concept of objectid in MongoDB. Oracle NoSQL allows the IDENTITY column to be used as the primary key, but if you are considering that, careful consideration is required, as it may have an impact on how data insertion and updates take place.

MongoDB and Oracle NoSQL table/collection level definition show how the ‘contact’ information is embedded into the same single structure without requiring additional schema definition. The benefit of embedding a dataset is that no further queries would be necessary to retrieve the embedded dataset.  

If you are looking to maintain your system in a simple form, MongoDB provides the best option to retain the data documents with less complication. At the same time, MongoDB provides the capabilities to deliver the existing complex data model from relational schema using schema validation tool.

Oracle NoSQL provides the capability to use an SQL-like query language with DDL and DML, which requires much less effort for users who have some experience with relational database systems.

The MongoDB shell uses Javascript, and if you are not comfortable with the language or with the use of the mongo shell, then the best fit for the process is to opt for an IDE tool. The top 5 MongoDB IDE tools in 2020, like Studio 3T, Robo 3T, NoSQLBooster, MongoDB Compass and Nucleon Database Master, will be helpful in creating and managing complex queries with the use of aggregation features.

Performance and Availability

As the MongoDB data structure model uses documents and collections, using the BSON data format for processing a huge amount of data becomes much faster compared to Oracle NoSQL. While some consider querying data with SQL a more comfortable pathway for many users, capacity becomes an issue. When we have a huge amount of data to support, a need for increased throughput, and complex queries designed in SQL, these processes force us to relook at server capacity and the cost increase over time.

Both MongoDB and Oracle NoSQL provide sharding and replication features. Sharding is a process that allows the dataset and the overall processing load to be distributed across multiple physical partitions to increase processing (read/write) speed. The implementation of sharding with Oracle requires you to have prior knowledge of how sharding keys work. The reason behind the pre-planning is the need to define the shard key at the schema initiation level.

The implementation of sharding with MongoDB gives you room to work on your dataset first, to identify the right shard key based on query patterns, before implementation. As the sharding process includes data replication, MongoDB has a reputation for fast data replication as well. Replication takes care of fault tolerance, removing the risk of having all data on a single server.
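As an illustration only (the database, collection and key names below are made up), enabling sharding in MongoDB typically boils down to a few mongo shell commands once a suitable shard key has been identified:

// Sketch: shard the appdb.orders collection on a hashed customer_id key (run via mongos)
sh.enableSharding("appdb")
db.orders.createIndex({ customer_id: "hashed" })
sh.shardCollection("appdb.orders", { customer_id: "hashed" })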

Conclusion 

What makes MongoDB preferred over Oracle NoSQL is its binary data format and its inborn characteristics of being lightweight, traversable, and efficient. This allows it to support advancing modern applications in areas like machine learning and artificial intelligence.

MongoDB's characteristics enable developers to work much more confidently and to build modern applications faster. The MongoDB data model allows the processing of huge amounts of unstructured data at an improved speed compared to Oracle NoSQL. Oracle NoSQL wins when it comes to the tools it has to offer and the possible options to create data models. However, it is essential to make sure developers and designers can learn and adapt to the technology fast, which is not the case with Oracle NoSQL.

by GayathriMageswaran at March 23, 2020 07:27 PM

Oli Sennhauser

innodb_deadlock_detect - Rather Hands off!

Recently we had a new customer who has had from time to time massive database problems which he did not understand. When we reviewed the MySQL configuration file (my.cnf) we found, that this customer had disabled the InnoDB Deadlock detection (innodb_deadlock_detect).

Because we have advised against doing this so far, but I never stumbled upon this problem in practice, I have investigated a bit more about the MySQL variable innodb_deadlock_detect.

The MySQL documentation tells us the following [1]:

Disabling Deadlock Detection
On high concurrency systems, deadlock detection can cause a slowdown when numerous threads wait for the same lock. At times, it may be more efficient to disable deadlock detection and rely on the innodb_lock_wait_timeout setting for transaction rollback when a deadlock occurs. Deadlock detection can be disabled using the innodb_deadlock_detect configuration option.

And about the parameter innodb_deadlock_detect itself [2] itself:

This option is used to disable deadlock detection. On high concurrency systems, deadlock detection can cause a slowdown when numerous threads wait for the same lock. At times, it may be more efficient to disable deadlock detection and rely on the innodb_lock_wait_timeout setting for transaction rollback when a deadlock occurs.

The problem is that every time MySQL takes a (row) lock or a table lock, it checks whether the lock causes a deadlock. This check is quite expensive. By the way: the feature of disabling InnoDB Deadlock detection was developed by Facebook for WebScaleSQL [3].

The relevant functions can be found in [4]:

class DeadlockChecker, method check_and_resolve (DeadlockChecker::check_and_resolve)

Every InnoDB (row) Lock (for mode LOCK_S or LOCK_X) and type ORed with LOCK_GAP or
LOCK_REC_NOT_GAP, ORed with LOCK_INSERT_INTENTION

Enqueue a waiting request for a lock which cannot be granted immediately.

lock_rec_enqueue_waiting()

and

Every (InnoDB) Table Lock

Enqueues a waiting request for a table lock which cannot be granted immediately. Checks for deadlocks.

lock_table_enqueue_waiting()

This means that if the variable innodb_deadlock_detect is enabled (= default), for every lock (row or table) it is checked whether it causes a deadlock. If the variable is disabled, the check is NOT done (which is faster) and the transaction hangs in the (dead)lock until the lock is freed or the time innodb_lock_wait_timeout (default 50 seconds) is exceeded. Then the InnoDB lock wait timeout (detector?) strikes and kills the transaction.

SQL> SHOW GLOBAL VARIABLES LIKE 'innodb_lock_wait%';
+--------------------------+-------+
| Variable_name            | Value |
+--------------------------+-------+
| innodb_lock_wait_timeout | 50    |
+--------------------------+-------+

This means deactivating InnoDB deadlock detection is interesting if you have many (like Facebook!?!) short and small transactions where you expect little to no conflicts.
Further, it is recommended to set the MySQL variable innodb_lock_wait_timeout to a very small value (a few seconds).
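If, after thorough testing, you do decide to go down that road, the corresponding settings would look something like this (the 2-second timeout is just an example; both variables can also be set in my.cnf):

SQL> SET GLOBAL innodb_deadlock_detect = OFF;
SQL> SET GLOBAL innodb_lock_wait_timeout = 2;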

Because most of our customers do not have the size of Facebook, and because they typically have not so many concurrent short and small transactions but rather a few long transactions (with probably many locks and therefore a high deadlock probability), I can imagine that disabling this parameter was responsible for the hiccup (locks piling up) of the customer's system, which leads to exceeding max_connections and finally the whole system getting stuck.

Therefore I strongly recommend leaving InnoDB deadlock detection enabled, unless you know exactly what you are doing (after about 2 weeks of extensive testing and measuring).

Literature

by Shinguz at March 23, 2020 10:24 AM

March 22, 2020

Valeriy Kravchuk

Post #200 in the Blog of Former MySQL Entomologist!

I noted that since August 2012 when I started this blog (while planning to leave my job as a MySQL Entomologist in Oracle to see some real MySQL life besides bugs) I've already published 199 posts. In this post #200 I'd like to look back and pick up one post per each year of blogging so far that I consider really good, important or useful.

I try my best to pick up posts that are mostly not about specific MySQL bugs. These posts either caused some interesting discussions or defined my interests for a long time, or just opened new directions in my public blogging efforts. I think many of them also got less public attention than they deserve (so most popular posts are not in this list). I also try to summarize my work-related memories and sentiments of the year.

Here is the list:
  • 2012. "New Year Wishes for Customers of Oracle's MySQL Technical Support Services".

    That year I considered my job roles in Oracle too boring and decided to quit and join Percona for real fun, practical experience and really good money! I also created few social media accounts and this blog, as I decided to finally share the details of my past and current work in public. I just started and published only 4 posts during that year and the one I picked up was probably created during my shift on December 31 (I still like to work on weekends and public holidays, but do this less often). Basically I wanted to encourage Oracle customers (and employees) to report MySQL bugs they found in public MySQL bugs database. Expected for the MySQL Entomologist to wish this, isn't it? I still wish the same today, even more! Unfortunately I see public MySQL bugs database is used less these days and this makes me sad.

  • 2013. "How to Drop the Trigger".

    I had a lot of work, bugs and problems to deal with in 2013, including hard to forget snow storm in March. This year I first visited UK to speak about Performance Schema at Percona Live (it was probably my worst accepted talk ever) and enjoy London and Brighton. Who could imagine the outcomes of that very first visit... I had written a lot in public that year, 42 blog posts and so many rumblings about bugs at Facebook that I annoyed many of my former colleagues. The blog post I picked up was probably my first ever "HowTo" style post in this blog and that's why it is important. It described the real life case when due to missing .TRG file DROP TRIGGER statement just failed. One had to create the file manually to proceed, and it's easy as it's a plain text file. Surely I listed some bugs I checked in the process, including those that could explain why the file was missing.

  • 2014. "On responsible bugs reporting".

    I was less active blogger in 2014, as I was kindly asked by both Oracle and Percona officials and colleagues to reduce my extraordinary bugs-related social media activity. This made me sad, but allowed to concentrate better on support and other job duties in Percona. I kept myself mostly silent after the blog post mentioned above, where I discussed what a responsible engineer should do when she finds some simple sequence of SQL statements that, when executed by authenticated user explicitly having all the privileges needed to execute these statements, crashes some version of her favorite MySQL fork. I was (and still is) against hiding this kind of crashes by Oracle or any other open source software vendor, and suggested to report them in public as I always did. Many MySQL Community members disagree with this approach even today, probably. In 2014 I've got my (first and only) talk (about MySQL bugs reporting) accepted to Oracle Open World 2014 somehow. I keep submitting multiple talks for this event every year, to no result since that time...

  • 2015. "Using gdb to understand what locks (and when) are really set by InnoDB. Part I.".

    In 2015 I started to study MySQL internals with gdb on a regular basis. The blog post I picked up for this year was the first in a series of such studies that continue until today. Initially it was as simple as setting few breakpoints and checking several lines of source code here and there, but I had not stopped at that stage. As a result of that studies, in 2015 I've presented interesting and really well accepted talks at FOSDEM and another one about InnoDB locks and deadlocks at Percona Live. Who could imagine that next time I speak at any Percona conference it will be 2019! The rest of the year was hard and sad at work, where I spent most of my time fighting for my Support team values and colleagues against the upcoming changes, new leaders and new company approaches to services. Not so many blog posts as a result, just 12.

  • 2016. "I'm Winston Wolf, I solve problems.".

    In January, 2016, my internal fights in Percona influenced content and highlights of my blog posts. One of them, my all times favorite, was aimed at explaining how Support should work and why, based on real life story with one of customer issues I resolved instantly working the way I prefer! I have to note that my feature request (Bug #76030 - "Add a way to disable rowid-ordered retrieval to optimizer_switch") that led me to the idea about the root cause is still "Open" today. Even more interesting that the issue was about MariaDB. Later I lost in all my fights for Support in Percona (fights, but not the battle, as Percona today mostly operates the way I defended back then, not the way they planned or tried to enforce) and had to quit, to end up working in MariaDB Corporation where I stay till today, and happy! In general, 2016 was my most successful year as a blogger, with up to 1000 page views per day during some months. Many of 29 blog posts published that year are worth re-reading and quoting, and became really popular, but I'd like to remind mostly about this series about MySQL Support people... Since 2016 I am less active at conferences due to my (now resolved) conflict with Percona on this topic as well, so FOSDEM was my only MySQL-related public appearance in 2016.

  • 2017. "perf Basics for MySQL Profiling".

    By 2017 I found yet another type of tools of enormous value for my daily work in Support - profilers (not only pt-pmp, but also real ones), specifically perf in case of any modern Linux. I had found out (hard way) that proper instrumentation of source code, while useful as MySQL Performance Schema (and Oracle's wait events long before it) shown, is never 100% complete and ready to use in practice, unlike proper OS level profiling tools that Linux got since kernel 2.6.x. Profilers helped me to solve real life performance problems and my findings were later summarized in this FOSDEM 2017 talk, the best one I ever made IMHO. It was a fruitful year for blogging and it was hard to pick up one post out of 32. I continued my gdb studies of various MySQL features and ended up with a long workshop on this topic presented in Sofia to few attendees.

  • 2018. "On Some Problematic Oracle MySQL Server Features".

    In 2018 I had crazy plans to go to Oracle Open World and speak there about problems with MySQL and Oracle's way of developing it. So, many of my numerous (37 in total) blog posts that year were explaining my views on these in details. I picked up the one above as it summarized the most problematic features of MySQL server itself (namely implementations of InnoDB data compression, FULLTEXT indexes, so called "online" DDL, automatic recalculation of InnoDB "persistent" statistics, and partitioning, as well as some details on how all these (does not) work well together). This was surely NOT a topic to be accepted for any MySQL conference! But posts from this series are still useful to review even today, as some of these features got zero improvements even in MySQL 8.0 (and just few are planned or done by other vendors, with MyRocks providing better data compression and with some ongoing changes to InnoDB that are implemented by MariaDB Corporation). I had to cancel my FOSDEM talk that year due to family issues, and had not made any public presentations.

  • 2019. "Dynamic Tracing of MySQL Server With perf probe - Basic Example".


    The year of 2019 was really successful for me. In May I've got "MySQL Community Contributor of The Year" award "for bug identification and submission", that is, for activity that is the main topic of this blog. Obviously, many (of 26 in total) blog posts were about bugs, bugs reporting and bugs processing, same as my FOSDEM talk in 2019. But later I stopped exploiting "MySQL bugs" topic and switched to my current interests (dynamic tracing and profiling with perf, BPF tools and bpftrace). So, I picked up one of the key blog posts that provided additional details for my Percona Live Europe 2019 talk on the topic.

    I had a lot of plans for related studies and talks for 2020, but with COVID-19 breaking everything and conferences cancelled it may happen so that this blog would be the only public platform for all my MySQL-related activities this year.
I also checked many old photos made with a Nokia dumb phone while working on this post. Who knows when (and if) I will ever see the West Pier in Brighton again and swim nearby...
I do not plan to stop blogging any time soon. So, stay tuned for the next 200 posts :)

by Valerii Kravchuk (noreply@blogger.com) at March 22, 2020 09:53 AM

March 20, 2020

SeveralNines

Multi-DC PostgreSQL: Setting Up a Standby Node at a Different Geo-Location Over a VPN

Previously, we wrote about Setting Up a Geo-Distributed Database Cluster Using MySQL Replication. This time, it's about PostgreSQL. Setting up a geo-distributed cluster for PostgreSQL is not a new concept and the topology is quite common. 

To achieve high availability, organizations and companies are dispersing their database nodes so that when a catastrophic event happens in a specific region (which affects your data center) you have your standby nodes available for failover.

This is a very common practice (using this type of topology) as part of your organization's Business Continuity and Disaster Recovery plans. This type of topology removes having a single point of failure (SPOF), a common requirement especially if you have a low RPO and high uptime requirements (if possible at 99.999999999%).

In this blog, I'll take a simple implementation on how to do this using ClusterControl. ClusterControl is an agentless management and automation software for database clusters. It helps deploy, monitor, manage, and scale your database server/cluster directly from ClusterControl user interface.

The Desired Architectural Setup

The target outcome here is to deploy efficiently in a secure environment. To do this, it's important to place the connection between the two sites inside a VPN, and it would be even more secure if you also set up your database nodes over a TLS/SSL connection. For this setup in our blog, we simply deploy a node over a VPN and showcase how easily you can take this approach. See below for the diagram of the target setup:

To elaborate on the setup, the on-premise network shall communicate with the public cloud over a VPN tunnel, and both of these networks shall have a VPN gateway so that they can communicate or establish a connection. ClusterControl requires you to oversee all the nodes that have to be registered, as it will collect information about your nodes for data metrics. Aside from that, it requires that your on-prem active-writer node can also reach the standby node in the other domain, which for this blog is hosted in Google Cloud Platform (GCP).

Setting Up Your OpenVPN

The OpenVPN setup is quite tricky for both network domains. The gist of it is that it must take the following into consideration:

  • Nodes from your on-prem network shall be able to establish a connection to the target public cloud domain nodes
  • Nodes from your on-prem network shall have internet access to download the packages required for setup, unless you have all the required repositories stored locally
  • Nodes from your public cloud domain shall be able to establish a connection to the on-premise nodes
  • Nodes from your public cloud domain shall have internet access to download the packages required for setup, unless you have all the required repositories stored locally

OpenVPN Installation and Configuration

Step One

Install the openvpn package (and easy-rsa packages for Ubuntu/Debian distros)

$ sudo apt-get install openvpn easy-rsa

For CentOS/RHEL based OS, 

$ sudo yum install openvpn wget
$ wget -O /tmp/easyrsa https://github.com/OpenVPN/easy-rsa-old/archive/2.3.3.tar.gz

Step Two

Generate your certificates: the certificate authority (CA), server, and client certificates.

For Ubuntu/Debian, you can do the following actions:

$ /usr/bin/make-cadir CA

Change to CA directory

$ cd CA

At this point, you will likely want to edit the vars file in accordance with your needs, e.g.

export KEY_COUNTRY="SE"
export KEY_PROVINCE="SMD"
export KEY_CITY="Kalmar"
export KEY_ORG="Severalnines"
export KEY_EMAIL="paul@s9s.io"
export KEY_CN="S9s"
export KEY_NAME="server"
export KEY_OU="Support Unit"

Then execute the vars script to define the required env variables

[ ~/CA ]$ source ./vars

NOTE: If you run ./clean-all, I will be doing a rm -rf on /CA/keys

Run a clean-up

[ ~/CA ]$ ./clean-all

Then build the certificates for your CA, server, and client.

[ ~/CA ]$ ./build-ca
[ ~/CA ]$ ./build-key-server server
[ ~/CA ]$ ./build-dh 2048
[ ~/CA ]$ ./build-key client

Lastly, generate a Perfect Forward Secrecy key.

$ openvpn --genkey --secret pfs.key

If you're using CentOS/RHEL type distros, you can do the following:

$ tar xfz /tmp/easyrsa
$ sudo mkdir /etc/openvpn/easy-rsa
$ sudo cp -rf easy-rsa-old-2.3.3/easy-rsa/2.0/* /etc/openvpn/easy-rsa
# Ensure your RSA keys are on the right permission for security purposes
$ sudo chown vagrant /etc/openvpn/easy-rsa/
$ sudo cp /usr/share/doc/openvpn-2.4.4/sample/sample-config-files/server.conf /etc/openvpn
$ sudo mkdir /etc/openvpn/easy-rsa/keys
$ sudo nano /etc/openvpn/easy-rsa/vars
$ cd /etc/openvpn/easy-rsa

At this point, you will likely want to edit the vars file in accordance with your needs, e.g.

export KEY_COUNTRY="SE"
export KEY_PROVINCE="SMD"
export KEY_CITY="Kalmar"
export KEY_ORG="Severalnines"
export KEY_EMAIL="paul@s9s.io"
export KEY_CN="S9s"
export KEY_NAME="server"
export KEY_OU="Support Unit"

Then execute the vars script to define the required env variables

$ source ./vars

NOTE: If you run ./clean-all, I will be doing a rm -rf on /CA/keys

Run a clean-up

$ ./clean-all

Then build the certificates for your CA, server, and client.

$ ./build-ca
$ ./build-key-server server
$ ./build-dh 2048
$ cd /etc/openvpn/easy-rsa
$ ./build-key client
$ cp /etc/openvpn/easy-rsa/openssl-1.0.0.cnf /etc/openvpn/easy-rsa/openssl.cnf

Once you have everything set up, you must take into account where your keys and certificates are placed. If you're using systemd or service in Linux to run this, then you might place your certificates and keys in /etc/openvpn. You will likely have to run the following command:

sudo cp dh2048.pem ca.crt server.crt server.key /etc/openvpn

Step Three

At this point, I end up with the following server and client configuration. See my configuration files accordingly,

OpenVPN Server Config

$ cat /etc/openvpn/server-ovpn.conf 
port 1194
proto udp
dev tun
ca /etc/openvpn/keys/ca.crt
cert /etc/openvpn/keys/server.crt
key /etc/openvpn/keys/server.key # This file should be kept secret
dh /etc/openvpn/keys/dh2048.pem
cipher AES-256-CBC
auth SHA512
server 10.8.0.0 255.255.255.0
client-to-client
topology subnet
push "route 192.168.30.0 255.255.255.0"
#push "redirect-gateway def1 bypass-dhcp"
#push "redirect-gateway"
push "dhcp-option DNS 8.8.8.8"
push "dhcp-option DNS 8.8.4.4"
ifconfig-pool-persist ipp.txt
keepalive 10 120
comp-lzo
persist-key
persist-tun
#status openvpn-status.log
#log-append  openvpn.log
verb 3
tls-server
tls-auth /etc/openvpn/keys/pfs.key

The most important options to take into account are the following.

client-to-client - Very important so nodes in the VPN can ping the other nodes in different network domain. Say, ClusterControl is located in on-prem, it can ping the nodes in GCP.

push "route 192.168.30.0 255.255.255.0" - I push the routing tables so that GCP node/s connected to VPN can ping my nodes in the on-premise domain. In my GCP VPN gateway, I have the following routing tables as push "route 10.142.0.0 255.255.255.0"

#push "redirect-gateway def1 bypass-dhcp" ,

#push "redirect-gateway" - Both these sections are not required since I need internet connection for both to setup my repo and dependent packages upon installation. 

push "dhcp-option DNS 8.8.8.8", 

push "dhcp-option DNS 8.8.4.4" -  Both these sections can be changed to your desired DNS if needed. This is for your desired DNS especially when you need internet connection.

OpenVPN Client Config

$ cat openvpn/client-vpn.ovpn 
client
dev tun
proto udp
remote 34.73.238.239  1194  
ca ca.crt
cert client.crt
key client.key
tls-version-min 1.2
tls-cipher TLS-ECDHE-RSA-WITH-AES-128-GCM-SHA256:TLS-ECDHE-ECDSA-WITH-AES-128-GCM-SHA256:TLS-ECDHE-RSA-WITH-AES-256-GCM-SHA384:TLS-DHE-RSA-WITH-AES-256-CBC-SHA256
cipher AES-256-CBC
auth SHA512
resolv-retry infinite
auth-retry none
nobind
persist-key
persist-tun
ns-cert-type server
comp-lzo
verb 3
tls-client
tls-auth pfs.key

The most important thing here is that you need to be sure of your key paths and also replace the parameters in this section,

remote 34.73.238.239  1194  

which can be the hostname/IP address of your VPN server gateway to connect to.
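With the configuration in place, the tunnel can be brought up and verified roughly as follows. This is only a sketch; the tun0 interface name and the 192.168.30.10 on-prem address are just examples based on the routes pushed above:

# Start the client tunnel in the background (or run it as a service)
$ sudo openvpn --config client-vpn.ovpn --daemon
# Verify that the tun interface got an address in the 10.8.0.0/24 VPN subnet
$ ip addr show tun0
# Ping a node on the other side of the tunnel, e.g. in the on-prem network
$ ping -c 3 192.168.30.10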

Step Four

Lastly, set up the VPN server to act as a proxy, so that network packets are routed through the network interface on the server, and allow the kernel to forward IPv4 traffic:

sudo iptables -t nat -A POSTROUTING -s 10.8.0.0/24 -o eth0 -j MASQUERADE
echo 1 | sudo tee /proc/sys/net/ipv4/ip_forward
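Note that the ip_forward setting above does not survive a reboot. A hedged sketch of making it (and the NAT rule) persistent could look like this; the package names below apply to Debian/Ubuntu and may differ per distribution:

# Persist IPv4 forwarding across reboots
echo "net.ipv4.ip_forward = 1" | sudo tee -a /etc/sysctl.conf
sudo sysctl -p
# Persist the iptables NAT rule, e.g. with the iptables-persistent package
sudo apt-get install iptables-persistent
sudo netfilter-persistent save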

For more in-depth installation, I suggest you look at these posts for CentOS and on Ubuntu.

Extending Over the Cloud Efficiently

Suppose you have the following topology in your on-prem domain,

and you now want to extend your availability over another datacenter, which is GCP for this blog. Deploying using ClusterControl is very straightforward. You can do the following procedure stated below,

Step One

Create A Slave Cluster

Step Two

Choose your master for replication,

Step Three

Setup the access to your public cloud environment

Step Four

Specify the hostname/IP of your node to be extended into your PG replication cluster,

Step Five

Lastly, monitor the job activity to see how ClusterControl reacts to this type of action.

The outcome will show you the linkage between your on-prem setup and your extended datacenter, which in this blog is our GCP PostgreSQL standby node. See below for the result:

Conclusion

Setting up a geo-location standby node is not difficult, but the main issue is how secure this will be in your architectural design. Using a VPN can alleviate the main concern of the problem. Using OpenVPN is just a simple way to implement this, but for heavy transactional applications, organizations are likely to invest in upscale services or hardware to deal with this setup. Also, adding TLS/SSL is easier said than done. We'll discuss how you can use TLS/SSL with PostgreSQL in our next blogs.

by Paul Namuag at March 20, 2020 05:48 PM

March 19, 2020

SeveralNines

How to Restore a Specific Collection in MongoDB Using Logical Backup

Keeping backups of your database is one of the most important tasks in any production environment. It is the process of copying your data to some other place to keep it safe. This can be useful in recovery from emergency situations like database corruption or a database crashing beyond repair.

Apart from recovery, a backup can also be used to mimic a production database for testing an application in a different environment, or even to debug something that can not be done on the production database.

There are various methods of database backups that you can implement, from logical backup using tools that are embedded in the database (eg. mysqldump, mongodump, pg_dump) to physical backup using third party tools (eg. xtrabackup, barman, pgbackrest, mongodb consistent backup). 

Which method to use is often determined by how you would like to restore. For instance, assume you dropped a table or a collection by mistake. Unlikely as it might seem, it does happen. So the fastest way to recover would be to restore just that table or collection, instead of having to restore an entire database.

Backup and Restore in MongoDB

Mongodump and mongorestore are the tools for logical backup used in MongoDB; they are the equivalents of mysqldump in MySQL or pg_dump in PostgreSQL. The mongodump and mongorestore utilities are included when you install MongoDB, and the dump is stored in BSON format. Mongodump is used to back up the database logically into dump files, while mongorestore is used for the restore operation.

mongodump and mongorestore commands are easy to use, although there are a lot of options.  
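As a quick, hedged example (the database name, collection name and connection details below are placeholders), backing up and later restoring a single collection could look like this:

# Back up only one collection into /backup/
$ mongodump --host localhost --port 27017 -d mydb -c mycollection -o /backup/
# Restore that single collection from the resulting BSON file
$ mongorestore --host localhost --port 27017 -d mydb -c mycollection /backup/mydb/mycollection.bson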

As we can see below, you can backup specific databases or collections. You can even take a point in time snapshot by including the oplog.

root@n2:~# mongodump --help

Usage:

  mongodump <options>



Export the content of a running server into .bson files.



Specify a database with -d and a collection with -c to only dump that database or collection.



See http://docs.mongodb.org/manual/reference/program/mongodump/ for more information.



general options:

      --help                                                print usage

      --version                                             print the tool version and exit



verbosity options:

  -v, --verbose=<level>                                     more detailed log output (include multiple times for more verbosity, e.g. -vvvvv, or specify a numeric value, e.g. --verbose=N)

      --quiet                                               hide all log output



connection options:

  -h, --host=<hostname>                                     mongodb host to connect to (setname/host1,host2 for replica sets)

      --port=<port>                                         server port (can also use --host hostname:port)



kerberos options:

      --gssapiServiceName=<service-name>                    service name to use when authenticating using GSSAPI/Kerberos ('mongodb' by default)

      --gssapiHostName=<host-name>                          hostname to use when authenticating using GSSAPI/Kerberos (remote server's address by default)



ssl options:

      --ssl                                                 connect to a mongod or mongos that has ssl enabled

      --sslCAFile=<filename>                                the .pem file containing the root certificate chain from the certificate authority

      --sslPEMKeyFile=<filename>                            the .pem file containing the certificate and key

      --sslPEMKeyPassword=<password>                        the password to decrypt the sslPEMKeyFile, if necessary

      --sslCRLFile=<filename>                               the .pem file containing the certificate revocation list

      --sslAllowInvalidCertificates                         bypass the validation for server certificates

      --sslAllowInvalidHostnames                            bypass the validation for server name

      --sslFIPSMode                                         use FIPS mode of the installed openssl library



authentication options:

  -u, --username=<username>                                 username for authentication

  -p, --password=<password>                                 password for authentication

      --authenticationDatabase=<database-name>              database that holds the user's credentials

      --authenticationMechanism=<mechanism>                 authentication mechanism to use



namespace options:

  -d, --db=<database-name>                                  database to use

  -c, --collection=<collection-name>                        collection to use



uri options:

      --uri=mongodb-uri                                     mongodb uri connection string



query options:

  -q, --query=                                              query filter, as a JSON string, e.g., '{x:{$gt:1}}'

      --queryFile=                                          path to a file containing a query filter (JSON)

      --readPreference=<string>|<json>                      specify either a preference name or a preference json object

      --forceTableScan                                      force a table scan



output options:

  -o, --out=<directory-path>                                output directory, or '-' for stdout (defaults to 'dump')

      --gzip                                                compress archive or collection output with Gzip

      --repair                                              try to recover documents from damaged data files (not supported by all storage engines)

      --oplog                                               use oplog for taking a point-in-time snapshot

      --archive=<file-path>                                 dump as an archive to the specified path. If flag is specified without a value, archive is written to stdout

      --dumpDbUsersAndRoles                                 dump user and role definitions for the specified database

      --excludeCollection=<collection-name>                 collection to exclude from the dump (may be specified multiple times to exclude additional collections)

      --excludeCollectionsWithPrefix=<collection-prefix>    exclude all collections from the dump that have the given prefix (may be specified multiple times to exclude additional prefixes)

  -j, --numParallelCollections=                             number of collections to dump in parallel (4 by default) (default: 4)

      --viewsAsCollections                                  dump views as normal collections with their produced data, omitting standard collections
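
For example, here are two hedged invocations (assuming the mongoadmin user and /root/dump directory used later in this post): the first dumps only the orders database, and the second dumps the whole instance together with the oplog for a point-in-time snapshot (note that --oplog cannot be combined with --db or --collection):

root@n2:~# mongodump -umongoadmin --authenticationDatabase admin --db orders --out /root/dump

root@n2:~# mongodump -umongoadmin --authenticationDatabase admin --oplog --gzip --out /root/dump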

The mongorestore command also has many options; the mandatory ones relate to the connection, such as host, port, and authentication. There are other parameters, like -j to restore collections in parallel, -c or --collection for a specific collection, and -d or --db to define a specific database. The full list of mongorestore options can be shown using --help:

root@n2:~# mongorestore --help

Usage:

  mongorestore <options> <directory or file to restore>



Restore backups generated with mongodump to a running server.



Specify a database with -d to restore a single database from the target directory,

or use -d and -c to restore a single collection from a single .bson file.



See http://docs.mongodb.org/manual/reference/program/mongorestore/ for more information.



general options:

      --help                                                print usage

      --version                                             print the tool version and exit



verbosity options:

  -v, --verbose=<level>                                     more detailed log output (include multiple times for more verbosity, e.g. -vvvvv, or specify a numeric value, e.g. --verbose=N)

      --quiet                                               hide all log output



connection options:

  -h, --host=<hostname>                                     mongodb host to connect to (setname/host1,host2 for replica sets)

      --port=<port>                                         server port (can also use --host hostname:port)



kerberos options:

      --gssapiServiceName=<service-name>                    service name to use when authenticating using GSSAPI/Kerberos ('mongodb' by default)

      --gssapiHostName=<host-name>                          hostname to use when authenticating using GSSAPI/Kerberos (remote server's address by default)



ssl options:

      --ssl                                                 connect to a mongod or mongos that has ssl enabled

      --sslCAFile=<filename>                                the .pem file containing the root certificate chain from the certificate authority

      --sslPEMKeyFile=<filename>                            the .pem file containing the certificate and key

      --sslPEMKeyPassword=<password>                        the password to decrypt the sslPEMKeyFile, if necessary

      --sslCRLFile=<filename>                               the .pem file containing the certificate revocation list

      --sslAllowInvalidCertificates                         bypass the validation for server certificates

      --sslAllowInvalidHostnames                            bypass the validation for server name

      --sslFIPSMode                                         use FIPS mode of the installed openssl library



authentication options:

  -u, --username=<username>                                 username for authentication

  -p, --password=<password>                                 password for authentication

      --authenticationDatabase=<database-name>              database that holds the user's credentials

      --authenticationMechanism=<mechanism>                 authentication mechanism to use



uri options:

      --uri=mongodb-uri                                     mongodb uri connection string



namespace options:

  -d, --db=<database-name>                                  database to use when restoring from a BSON file

  -c, --collection=<collection-name>                        collection to use when restoring from a BSON file

      --excludeCollection=<collection-name>                 DEPRECATED; collection to skip over during restore (may be specified multiple times to exclude additional collections)

      --excludeCollectionsWithPrefix=<collection-prefix>    DEPRECATED; collections to skip over during restore that have the given prefix (may be specified multiple times to exclude additional prefixes)

      --nsExclude=<namespace-pattern>                       exclude matching namespaces

      --nsInclude=<namespace-pattern>                       include matching namespaces

      --nsFrom=<namespace-pattern>                          rename matching namespaces, must have matching nsTo

      --nsTo=<namespace-pattern>                            rename matched namespaces, must have matching nsFrom



input options:

      --objcheck                                            validate all objects before inserting

      --oplogReplay                                         replay oplog for point-in-time restore

      --oplogLimit=<seconds>[:ordinal]                      only include oplog entries before the provided Timestamp

      --oplogFile=<filename>                                oplog file to use for replay of oplog

      --archive=<filename>                                  restore dump from the specified archive file. If flag is specified without a value, archive is read from stdin

      --restoreDbUsersAndRoles                              restore user and role definitions for the given database

      --dir=<directory-name>                                input directory, use '-' for stdin

      --gzip                                                decompress gzipped input



restore options:

      --drop                                                drop each collection before import

      --dryRun                                              view summary without importing anything. recommended with verbosity

      --writeConcern=<write-concern>                        write concern options e.g. --writeConcern majority, --writeConcern '{w: 3, wtimeout: 500, fsync: true, j: true}'

      --noIndexRestore                                      don't restore indexes

      --noOptionsRestore                                    don't restore collection options

      --keepIndexVersion                                    don't update index version

      --maintainInsertionOrder                              preserve order of documents during restoration

  -j, --numParallelCollections=                             number of collections to restore in parallel (4 by default) (default: 4)

      --numInsertionWorkersPerCollection=                   number of insert operations to run concurrently per collection (1 by default) (default: 1)

      --stopOnError                                         stop restoring if an error is encountered on insert (off by default)

      --bypassDocumentValidation                            bypass document validation

      --preserveUUID                                        preserve original collection UUIDs (off by default, requires drop)

Restoring specific collections in MongoDB can be done using the --collection parameter of mongorestore. Assume you have an orders database, and inside it there are some collections as shown below:

my_mongodb_0:PRIMARY> show dbs;

admin   0.000GB

config  0.000GB

local   0.000GB

orders  0.000GB

my_mongodb_0:PRIMARY> use orders;

my_mongodb_0:PRIMARY> show collections;

order_details

orders

stock

We have already scheduled a backup for the orders database, and we want to restore the stock collection into a new database, order_new, on the same server. If you want to use the --collection option, you need to pass the collection name as a parameter of mongorestore; alternatively, you can use the --nsInclude={db}.{collection} option if you don't want to specify the path to the collection file (see the sketch after the example below).

root@n2:~/dump/orders# mongorestore -umongoadmin --authenticationDatabase admin --db order_new --collection stock /root/dump/orders/stock.bson

Enter password:

​2020-03-09T04:06:29.100+0000 checking for collection data in /root/dump/orders/stock.bson

2020-03-09T04:06:29.110+0000 reading metadata for order_new.stock from /root/dump/orders/stock.metadata.json

2020-03-09T04:06:29.134+0000 restoring order_new.stock from /root/dump/orders/stock.bson

2020-03-09T04:06:29.202+0000 no indexes to restore

2020-03-09T04:06:29.203+0000 finished restoring order_new.stock (1 document)

2020-03-09T04:06:29.203+0000 done

You can check the collection in the order_new database as shown below:

​my_mongodb_0:PRIMARY> use order_new;

switched to db order_new

my_mongodb_0:PRIMARY> show collections;

stock
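
Since mongorestore also supports namespace filtering and renaming, a hedged sketch of the --nsInclude approach mentioned earlier (assuming the same dump under /root/dump) that restores the collection into order_new without pointing at the BSON file directly would be:

root@n2:~# mongorestore -umongoadmin --authenticationDatabase admin --nsInclude="orders.stock" --nsFrom="orders.stock" --nsTo="order_new.stock" /root/dump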

How We Can Restore Using mongodump in ClusterControl

Restoring a backup dump through ClusterControl is easy and only takes a few steps. If you have enabled a backup schedule, there will be lots of backup files in the list, along with some very useful information about each backup; for example, the status (indicating whether the backup completed or failed), the backup method used, the list of databases, and the size of the dump. The steps to restore MongoDB data via ClusterControl are as follows:

Step One

Follow the prompts to restore backup to a node as shown below...

Step Two

You need to choose which backup needs to be restored.

Step Three

Review the summary...

 

 

by agus at March 19, 2020 06:39 PM

March 18, 2020

SeveralNines

How to Replace an Intermediate MySQL or MariaDB Master with a Binlog Server using MaxScale

Binary logs (binlogs) contain records of all changes to the databases. They are necessary for replication and can also be used to restore data after a backup. A binlog server is basically a binary log repository. You can think of it like a server with a dedicated purpose to retrieve binary logs from a master, while slave servers can connect to it like they would connect to a master server.

Some advantages of having a binlog server over an intermediate master to distribute the replication workload are:

  • You can switch to a new master server without the slaves noticing that the actual master server has changed. This allows for a more highly available replication setup where replication is high-priority.
  • Reduce the load on the master by having it serve only MaxScale's binlog server instead of all the slaves.
  • The data in the binary log of the intermediate master is not a direct copy of the data that was received from the binary log of the real master. As such, if group commit is used, this can cause a reduction in the parallelism of the commits and a subsequent reduction in the performance of the slave servers; a binlog server avoids this.
  • An intermediate master has to re-execute every SQL statement, which potentially adds latency and lag to the replication chain.

In this blog post, we are going to look into how to replace an intermediate master (a slave host that relays to other slaves in a replication chain) with a binlog server running on MaxScale for better scalability and performance.

Architecture

We basically have a 4-node MariaDB v10.4 replication setup with one MaxScale v2.3 sitting on top of the replication to distribute incoming queries. Only one slave is connected to a master (intermediate master) and the other slaves replicate from the intermediate master to serve read workloads, as illustrated in the following diagram.

We are going to turn the above topology into this:

Basically, we are going to remove the intermediate master role and replace it with a binlog server running on MaxScale. The intermediate master will be converted to a standard slave, just like other slave hosts. The binlog service will be listening on port 5306 on the MaxScale host. This is the port that all slaves will be connecting to for replication later on.

Configuring MaxScale as a Binlog Server

In this example, we already have a MaxScale sitting on top of our replication cluster acting as a load balancer for our applications. If you don't have MaxScale, you can use ClusterControl to deploy it: simply go to Cluster Actions -> Add Load Balancer -> MaxScale and fill in the necessary information as follows:

Before we get started, let's export the current MaxScale configuration into a text file for backup. MaxScale has a flag called --export-config for this purpose but it must be executed as maxscale user. Thus, the command to export is:

$ su -s /bin/bash -c '/bin/maxscale --export-config=/tmp/maxscale.cnf' maxscale

On the MariaDB master, create a replication slave user called 'maxscale_slave' to be used by the MaxScale and assign it with the following privileges:

$ mysql -uroot -p -h192.168.0.91 -P3306
MariaDB> CREATE USER 'maxscale_slave'@'%' IDENTIFIED BY 'BtF2d2Kc8H';
MariaDB> GRANT SELECT ON mysql.user TO 'maxscale_slave'@'%';
MariaDB> GRANT SELECT ON mysql.db TO 'maxscale_slave'@'%';
MariaDB> GRANT SELECT ON mysql.tables_priv TO 'maxscale_slave'@'%';
MariaDB> GRANT SELECT ON mysql.roles_mapping TO 'maxscale_slave'@'%';
MariaDB> GRANT SHOW DATABASES ON *.* TO 'maxscale_slave'@'%';
MariaDB> GRANT REPLICATION SLAVE ON *.* TO 'maxscale_slave'@'%';

For ClusterControl users, go to Manage -> Schemas and Users to create the necessary privileges.

Before we move further with the configuration, it's important to review the current state and topology of our backend servers:

$ maxctrl list servers
┌────────┬──────────────┬──────┬─────────────┬──────────────────────────────┬───────────┐
│ Server │ Address      │ Port │ Connections │ State                        │ GTID      │
├────────┼──────────────┼──────┼─────────────┼──────────────────────────────┼───────────┤
│ DB_757 │ 192.168.0.90 │ 3306 │ 0           │ Master, Running              │ 0-38001-8 │
├────────┼──────────────┼──────┼─────────────┼──────────────────────────────┼───────────┤
│ DB_758 │ 192.168.0.91 │ 3306 │ 0           │ Relay Master, Slave, Running │ 0-38001-8 │
├────────┼──────────────┼──────┼─────────────┼──────────────────────────────┼───────────┤
│ DB_759 │ 192.168.0.92 │ 3306 │ 0           │ Slave, Running               │ 0-38001-8 │
├────────┼──────────────┼──────┼─────────────┼──────────────────────────────┼───────────┤
│ DB_760 │ 192.168.0.93 │ 3306 │ 0           │ Slave, Running               │ 0-38001-8 │
└────────┴──────────────┴──────┴─────────────┴──────────────────────────────┴───────────┘

As we can see, the current master is DB_757 (192.168.0.90). Take note of this information as we are going to set up the binlog server to replicate from this master.

Open the MaxScale configuration file at /etc/maxscale.cnf and add the following lines:

[replication-service]
type=service
router=binlogrouter
user=maxscale_slave
password=BtF2d2Kc8H
version_string=10.4.12-MariaDB-log
server_id=9999
master_id=9999
mariadb10_master_gtid=true
filestem=binlog
binlogdir=/var/lib/maxscale/binlogs
semisync=true # if semisync is enabled on the master

[binlog-server-listener]
type=listener
service=replication-service
protocol=MariaDBClient
port=5306
address=0.0.0.0

A bit of explanation: we are creating two components, a service and a listener. The service is where we define the binlog server characteristics and how it should run. Details on every option can be found here. In this example, our replication servers are running with semi-sync replication, thus we have to use semisync=true so the binlog server will connect to the master via the semi-sync replication method. The listener is where we map the listening port to the binlogrouter service inside MaxScale.

Restart MaxScale to load the changes:

$ systemctl restart maxscale

Verify the binlog service is started via maxctrl (look at the State column):

$ maxctrl show service replication-service

Verify that MaxScale is now listening to a new port for the binlog service:

$ netstat -tulpn | grep maxscale
tcp        0 0 0.0.0.0:3306            0.0.0.0:* LISTEN   4850/maxscale
tcp        0 0 0.0.0.0:3307            0.0.0.0:* LISTEN   4850/maxscale
tcp        0 0 0.0.0.0:5306            0.0.0.0:* LISTEN   4850/maxscale
tcp        0 0 127.0.0.1:8989          0.0.0.0:* LISTEN   4850/maxscale

We are now ready to establish a replication link between MaxScale and the master.

Activating the Binlog Server

Log into the MariaDB master server and retrieve the current binlog file and position:

MariaDB> SHOW MASTER STATUS;
+---------------+----------+--------------+------------------+
| File          | Position | Binlog_Do_DB | Binlog_Ignore_DB |
+---------------+----------+--------------+------------------+
| binlog.000005 |     4204 |              |                  |
+---------------+----------+--------------+------------------+

Use BINLOG_GTID_POS function to get the GTID value:

MariaDB> SELECT BINLOG_GTID_POS("binlog.000005", 4204);
+----------------------------------------+
| BINLOG_GTID_POS("binlog.000005", 4204) |
+----------------------------------------+
| 0-38001-31                             |
+----------------------------------------+

Back to the MaxScale server, install MariaDB client package:

$ yum install -y mysql-client

Connect to the binlog server listener on port 5306 as maxscale_slave user and establish a replication link to the designated master. Use the GTID value retrieved from the master:

(maxscale)$ mysql -u maxscale_slave -p'BtF2d2Kc8H' -h127.0.0.1 -P5306
MariaDB> SET @@global.gtid_slave_pos = '0-38001-31';
MariaDB> CHANGE MASTER TO MASTER_HOST = '192.168.0.90', MASTER_USER = 'maxscale_slave', MASTER_PASSWORD = 'BtF2d2Kc8H', MASTER_PORT=3306, MASTER_USE_GTID = slave_pos;
MariaDB> START SLAVE;
MariaDB [(none)]> SHOW SLAVE STATUS\G
*************************** 1. row ***************************
                 Slave_IO_State: Binlog Dump
                  Master_Host: 192.168.0.90
                  Master_User: maxscale_slave
                  Master_Port: 3306
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
             Master_Server_Id: 38001
             Master_Info_File: /var/lib/maxscale/binlogs/master.ini
      Slave_SQL_Running_State: Slave running
                  Gtid_IO_Pos: 0-38001-31

Note: The above output has been truncated to show only important lines.

Pointing Slaves to the Binlog Server

Now on mariadb2 and mariadb3 (the end slaves), change the master, pointing them to the MaxScale binlog server. Since we are running with semi-sync replication enabled, we have to turn it off first:

(mariadb2 & mariadb3)$ mysql -uroot -p
MariaDB> STOP SLAVE;
MariaDB> SET global rpl_semi_sync_master_enabled = 0; -- if semisync is enabled
MariaDB> SET global rpl_semi_sync_slave_enabled = 0; -- if semisync is enabled
MariaDB> CHANGE MASTER TO MASTER_HOST = '192.168.0.95', MASTER_USER = 'maxscale_slave', MASTER_PASSWORD = 'BtF2d2Kc8H', MASTER_PORT=5306, MASTER_USE_GTID = slave_pos;
MariaDB> START SLAVE;
MariaDB> SHOW SLAVE STATUS\G
*************************** 1. row ***************************
                Slave_IO_State: Waiting for master to send event
                   Master_Host: 192.168.0.95
                   Master_User: maxscale_slave
                   Master_Port: 5306
              Slave_IO_Running: Yes
             Slave_SQL_Running: Yes
              Master_Server_Id: 9999
                    Using_Gtid: Slave_Pos
                   Gtid_IO_Pos: 0-38001-32
       Slave_SQL_Running_State: Slave has read all relay log; waiting for the slave I/O thread to update it

Note: The above output has been truncated to show only important lines.

Inside my.cnf, we have to comment out the following lines so that semi-sync stays disabled in the future:

#loose_rpl_semi_sync_slave_enabled=ON
#loose_rpl_semi_sync_master_enabled=ON

At this point, the intermediate master (mariadb1) is still replicating from the master (mariadb0) while other slaves have been replicating from the binlog server. Our current topology can be illustrated like the diagram below:

The final part is to change the master of the intermediate master (mariadb1) once all the slaves that used to attach to it are no longer there. The steps are basically the same as for the other slaves:

(mariadb1)$ mysql -uroot -p
MariaDB> STOP SLAVE;
MariaDB> SET global rpl_semi_sync_master_enabled = 0; -- if semisync is enabled
MariaDB> SET global rpl_semi_sync_slave_enabled = 0; -- if semisync is enabled
MariaDB> CHANGE MASTER TO MASTER_HOST = '192.168.0.95', MASTER_USER = 'maxscale_slave', MASTER_PASSWORD = 'BtF2d2Kc8H', MASTER_PORT=5306, MASTER_USE_GTID = slave_pos;
MariaDB> START SLAVE;
MariaDB> SHOW SLAVE STATUS\G
*************************** 1. row ***************************
                Slave_IO_State: Waiting for master to send event
                   Master_Host: 192.168.0.95
                   Master_User: maxscale_slave
                   Master_Port: 5306
              Slave_IO_Running: Yes
             Slave_SQL_Running: Yes
              Master_Server_Id: 9999
                    Using_Gtid: Slave_Pos
                   Gtid_IO_Pos: 0-38001-32

Note: The above output has been truncated to show only important lines.

Don't forget to disable semi-sync replication in my.cnf as well:

#loose_rpl_semi_sync_slave_enabled=ON
#loose_rpl_semi_sync_master_enabled=ON

We can then verify that the binlog router service has more connections now via the maxctrl CLI:

$ maxctrl list services
┌─────────────────────┬────────────────┬─────────────┬───────────────────┬───────────────────────────────────┐
│ Service             │ Router         │ Connections │ Total Connections │ Servers                           │
├─────────────────────┼────────────────┼─────────────┼───────────────────┼───────────────────────────────────┤
│ rw-service          │ readwritesplit │ 1           │ 1                 │ DB_757, DB_758, DB_759, DB_760    │
├─────────────────────┼────────────────┼─────────────┼───────────────────┼───────────────────────────────────┤
│ rr-service          │ readconnroute  │ 1           │ 1                 │ DB_757, DB_758, DB_759, DB_760    │
├─────────────────────┼────────────────┼─────────────┼───────────────────┼───────────────────────────────────┤
│ replication-service │ binlogrouter   │ 4           │ 51                │ binlog_router_master_host, DB_757 │
└─────────────────────┴────────────────┴─────────────┴───────────────────┴───────────────────────────────────┘

Also, common replication administration commands can be used inside the MaxScale binlog server, for example, we can verify the connected slave hosts by using this command:

(maxscale)$ mysql -u maxscale_slave -p'BtF2d2Kc8H' -h127.0.0.1 -P5306
MariaDB> SHOW SLAVE HOSTS;
+-----------+--------------+------+-----------+------------+
| Server_id | Host         | Port | Master_id | Slave_UUID |
+-----------+--------------+------+-----------+------------+
| 38003     | 192.168.0.92 | 3306 | 9999      |            |
| 38002     | 192.168.0.91 | 3306 | 9999      |            |
| 38004     | 192.168.0.93 | 3306 | 9999      |            |
+-----------+--------------+------+-----------+------------+

At this point, our topology is looking as what we anticipated:

Our migration from intermediate master setup to binlog server setup is now complete.

 

by ashraf at March 18, 2020 06:26 PM

MariaDB Foundation

Life after this

On Monday, I sent out an email to the staff of MariaDB Foundation. In the hope that my thinking is also applicable for someone else, here’s a slightly edited excerpt:
My email to our staff
For an unknown amount of time, we will live under exceptional circumstances. […]

The post Life after this appeared first on MariaDB.org.

by Kaj Arnö at March 18, 2020 11:07 AM

March 17, 2020

SeveralNines

A Message from Our CEO on the COVID-19 Pandemic

We wanted to take a moment to address how Severalnines is handling the Covid-19 pandemic. We have employees scattered across 16 countries over 5 continents, which means that some of our employees are at higher risk than others. 

The good news is that Severalnines is a fully remote organization, so for the most part it’s business as usual for us here. Our sales and support teams are ready to provide support to our prospects and customers as needed and we expect to maintain our average response times and service level agreements. Our product team is actively working on the next release of ClusterControl which comes with new features and several bug fixes.

We have encouraged our employees to follow the recommendations made by their local governments and the World Health Organization (WHO). Several of our employees have been impacted by the closure of schools around the world. To Severalnines, families come first, so we encourage our teams to work when their schedules allow and to focus on the health and security of their family as the priority. We’re postponing our in-person events as safety comes top of mind.

To our customers & potential customers, if your database operations have been impacted by illness or freezes in hiring, please contact your account manager to see how we can help. The database automation features of ClusterControl can reduce maintenance time, improve availability, and keep you up-and-running. 

Severalnines is full of amazing people and we are here to help you in any way we can.

Stay safe, follow your region's recommendations for staying healthy, and we'll see you on the other side.

Sincerely,

Vinay Joosery, CEO Severalnines

by vinay at March 17, 2020 04:37 PM

March 16, 2020

SeveralNines

How to Install and Configure MaxScale for MariaDB

There are different reasons for adding a load balancer between your application and your database. Perhaps you have high traffic and want to balance it between different database nodes, or you want to use the load balancer as a single endpoint so that, in case of failover, it handles the issue by sending the traffic to the available/healthy node. It could also be that you want to use different ports to write and read data from your database. 

In all these cases, a load balancer will be useful for you, and if you have a MariaDB cluster, one option is MaxScale, a database proxy for MariaDB databases.

In this blog, we will show you how to install and configure it manually, and how ClusterControl can help you in this task. For this example, we will use a MariaDB replication cluster with 1 master and 1 slave node, and CentOS8 as the operating system.

How to Install MaxScale

We will assume you have your MariaDB database up and running, and also a machine (virtual or physical) on which to install MaxScale. We recommend using a separate host so that, in case of master failure, MaxScale can fail over to the slave node; otherwise, MaxScale can't take any action if the server where it is running goes down.

There are different ways to install MaxScale; in this case, we will use the MariaDB repositories. To add the repository on the MaxScale server, you have to run:

$ curl -sS https://downloads.mariadb.com/MariaDB/mariadb_repo_setup | sudo bash

[info] Repository file successfully written to /etc/yum.repos.d/mariadb.repo

[info] Adding trusted package signing keys...

[info] Successfully added trusted package signing keys

Now, install the MaxScale package:

$ yum install maxscale

Now you have your MaxScale node installed; before starting it, you need to configure it.

How to Configure MaxScale

As MaxScale performs tasks like authentication, monitoring, and more, you need to create a database user with some specific privileges:

MariaDB [(none)]> CREATE USER 'maxscaleuser'@'%' IDENTIFIED BY 'maxscalepassword';

MariaDB [(none)]> GRANT SELECT ON mysql.user TO 'maxscaleuser'@'%';

MariaDB [(none)]> GRANT SELECT ON mysql.db TO 'maxscaleuser'@'%';

MariaDB [(none)]> GRANT SELECT ON mysql.tables_priv TO 'maxscaleuser'@'%';

MariaDB [(none)]> GRANT SELECT ON mysql.roles_mapping TO 'maxscaleuser'@'%';

MariaDB [(none)]> GRANT SHOW DATABASES ON *.* TO 'maxscaleuser'@'%';

MariaDB [(none)]> GRANT REPLICATION CLIENT on *.* to 'maxscaleuser'@'%';

Keep in mind that MariaDB versions 10.2.2 to 10.2.10 also require:

MariaDB [(none)]> GRANT SELECT ON mysql.* TO 'maxscaleuser'@'%';

Now that you have the database user ready, let's look at the configuration file. When you install MaxScale, the file maxscale.cnf is created under /etc/. There are several variables and different ways to configure it, so let's see an example:

$ cat  /etc/maxscale.cnf 

# Global parameters

[maxscale]

threads = auto

log_augmentation = 1

ms_timestamp = 1

syslog = 1



# Server definitions

[server1]

type=server

address=192.168.100.126

port=3306

protocol=MariaDBBackend

[server2]

type=server

address=192.168.100.127

port=3306

protocol=MariaDBBackend



# Monitor for the servers

[MariaDB-Monitor]

type=monitor

module=mariadbmon

servers=server1,server2

user=maxscaleuser

password=maxscalepassword

monitor_interval=2000



# Service definitions

[Read-Only-Service]

type=service

router=readconnroute

servers=server2

user=maxscaleuser

password=maxscalepassword

router_options=slave

[Read-Write-Service]

type=service

router=readwritesplit

servers=server1

user=maxscaleuser

password=maxscalepassword



# Listener definitions for the services

[Read-Only-Listener]

type=listener

service=Read-Only-Service

protocol=MariaDBClient

port=4008

[Read-Write-Listener]

type=listener

service=Read-Write-Service

protocol=MariaDBClient

port=4006

In this configuration, we have 2 database nodes, 192.168.100.126 (master) and 192.168.100.127 (slave), as you can see in the server definitions section.

We also have 2 different services: one for read-only traffic, pointing to the slave node, and another for read-write traffic, pointing to the master node.

Finally, we have 2 listeners, one for each service: the read-only listener on port 4008, and the read-write one on port 4006.

This is a basic configuration file. If you need something more specific you can follow the official MariaDB documentation.

Now you are ready to start it, so just run:

$ systemctl start maxscale.service

And check it:

$ maxctrl list services
$ maxctrl list servers

You can find a maxctrl commands list here, or you can even use maxadmin to manage it.

Now let's test the connection. For this, you can try to access your database using the MaxScale IP address and the port you want to test. In our case, traffic on port 4006 should be sent to server1, and traffic on port 4008 to server2.

$ mysql -h 192.168.100.128 -umaxscaleuser -pmaxscalepassword -P4006 -e 'SELECT @@hostname;'

+------------+

| @@hostname |

+------------+

| server1   |

+------------+

$ mysql -h 192.168.100.128 -umaxscaleuser -pmaxscalepassword -P4008 -e 'SELECT @@hostname;'

+------------+

| @@hostname |

+------------+

| server2   |

+------------+

It works!

How to Deploy MaxScale with ClusterControl

Let’s see now, how you can use ClusterControl to simplify this task. For this, we will assume you have your MariaDB cluster added to ClusterControl.

Go to ClusterControl -> Select the MariaDB cluster -> Cluster Actions -> Add Load Balancer -> MaxScale.

Here you can deploy a new MaxScale node or import an existing one. If you are deploying it, you need to add the IP Address or Hostname, the MaxScale admin and user credentials, the number of threads, and the ports (write and read-only). You can also specify which database node you want to add to the MaxScale configuration.

You can monitor the task in the ClusterControl Activity section. When it finishes, you will have a new MaxScale node in your MariaDB cluster.

You can also run MaxScale commands from the ClusterControl UI without needing to access the server via SSH.

It looks easier than deploying it manually, right?

Conclusion

Having a load balancer is a good solution if you want to balance or split your traffic, or even for failover actions, and MaxScale, as a MariaDB product, is a good option for MariaDB databases.

The installation is easy, but the configuration and usage could be difficult if it is something new for you. In that case, you can use ClusterControl to deploy, configure, and manage it in an easier way.

by Sebastian Insausti at March 16, 2020 10:45 AM

Henrik Ingo

Writing a data loader for database benchmarks

A task that I've done many times in my career in databases is to load data into a database as a first step in some benchmark. To do it efficiently you want to use multiple threads. Dividing the work onto many threads requires good comprehension of third grade math, yet can be surprisingly hard to get right.

The typical setup is often like this:

  1. The benchmark framework launches N independent threads. For example in Sysbench these are completely isolated Lua environments with no shared data structures or communication possible between the threads.
  2. Each thread gets as input its thread id i and the total number of threads launched N.

read more

by hingo at March 16, 2020 09:04 AM

March 15, 2020

Valeriy Kravchuk

Fun with Bugs #95 - On MySQL Bug Reports I am Subscribed to, Part XXIX

With conferences cancelled or postponed and people forced to stay at home due to COVID-19 wide spreading, what can be better than to give my readers a new list of MySQL bugs to check? Useful reading should help! So today I continue my previous post with a quick review of bugs I've subscribed to in February, 2020, while things were still going as usual for most of us...

Here is the list of InnoDB, replication, optimizer and some other bugs to note while working with recent MySQL 5.7.x and 8.0.19 releases:
  • Bug #98473 - "group replication will be block after lock table". This problem report by phoenix Zhang was declared not a bug recently. Looks like for group_replication_consistency= BEFORE_AND_AFTER it is expected to get nodes blocked if one of them executed LOCK TABLE ... WRITE and another tried to insert some rows into that table. Check last comment by Nuno Carvalho for more details. Having multiple nodes that change or block data in clusters is always fun. See also Bug #98643 - "group replication will be block primary node shutdown" from the same bug reporter. Analysis is still in progress for it
  • Bug #98498 - "P_S view status_by_thread does not reflect correct results". This bug was reported by Oli Sennhauser (my former colleague in MySQL, founder of FromDual). Really weird results in the output.
  • Bug #98501 - "Exchanging partition with tables does not update SDI correctly". Having a data dictionary is cool and useful, but the information there should be consistent in all places/forms where it is stored. Fungo Wang found and reported cases when it is wrong in the individual .ibd files for partitions or imported tablespaces. After some arguing about backups tools supported, the bug was verified.
  • Bug #98511 - "OPTIMIZE TABLE on myisam can increase table size (~2x) and reduce performance".  This funny bug looks like a regression in MySQL 8.0.x comparing to 5.7. I doubt Oracle is going to fix anything for MyISAM, but maybe the regression still matters for them. As it often happens, this bug reported by Pete Dishman was verified without adding a regression tag.
  • Bug #98520 - "Suboptimal implementations for some load/store functions for little-endian arch". Alexey Kopytov identified some remaining performance problems in low level functions like int6store() and uint6korr() in MySQL 8.0 for platforms like x86_64 or ARM64. He request to optimize those functions for little-endian architectures by providing specialized implementations, as it was done in MySQL 8.0.x for many other similar and more widely used functions.
  • Bug #98530 - "crash when inplace encryption resumes makes tablespace unusable". This bug (with nice MTR test case) was reported by Satya Bodapati from Percona. See also his another bug report, Bug #98537 - "inplace encryption resume thread doesn't update DD properly".
  • Bug #98546 - "Transient indexes statistics are updated in foreground causing performance issue". As noted by Charly Batista, if persistent statistics is not enabled InnoDB checks if 1/16 rows of the table have been changed and if this was the case, it calls the dict_stats_update function in the foreground. Moreover, it does not only degrade performance while recalculating stats in the foreground thread, but also sets RW_X_LATCH for the table in the process to serialize all access. So, use persistent statistics, but do not let it be automatically recalculated (and here I explained why).
  • Bug #98616 - "XA PREPARE/XA COMMIT/XA ROLLBACK lost if mysql crash just after binlog flush". Yet another problem with XA transactions support in MySQL. Dennis Gao provided a patch and kindly explained that currently:
    "When mysql start-up recover, the innodb engine will only recover the transaction in prepared state, which means the undo->state of the recovered transaction must be TRX_UNDO_PREPARED (check trx_rollback_resurrected in trx0roll.cc). So if a "xa prepare" transaction only flush binlog, it will just be rollback during start-up recover and lost."
  • Bug #98624 - "Can't connect to MySQL after establishing around 65536 connections". Yet another great finding by Fungo Wang. Basically, this is a bug in the MDL subsystem, the scalability of MDL is limited to 65536 users, due to the pins number limitation (LF_PINBOX_MAX_PINS = 65536) of the LF_HASH it employs.
  • Bug #98639 - "Redundant row-level locking for secondary index". I think this simple case pointed out by Sergei Muraviev is a yet another case of excessive locking where improvement is possible. I do not agree with simple "Not a bug" status.
  • Bug #98642 - "CONSISTENT SNAPSHOT CAN BE CORRUPTED BY OTHER TRANSACTIONS". This is a really serious bug found by Erwan MAS. It took me some efforts to make sure this bug report was properly processed and verified (my comment in the bug report is hidden, but that does not matter as long as it was really treated seriously). Looks like MySQL 8.0.x may be not affected, but recent 5.7.x and 5.6.x versions are affected for sure. Take care!
  • Bug #98651 - "Inserting into temporary table from function breaks replication in MIXED mode". This regression bug in MySQL 8 (vs 5.7.29) was reported by Alexey Gavrilov, who had created a separate GitHub repository for the test case.
  • Bug #98665 - "replication broken on blackhole node if binlog_rows_query_log_events on master". This bug was reported by Zhenghu Wen. Take care if you use BLACKHOLE tables on slaves. Both MySQL 8.0.19 and 5.7.29 are affected.
  • Bug #98673 - "Allow hints to reference query block by system name". I like the optimizer hints implementation in MySQL. This feature request is about making them even more useful and less confusing in the case of complex queries. As Kaiwang CHen put it:
    "Note that the query blocks are internally identified with a number
    (SELECT_LEX::select_number), with which system names are defined. That
    system name could be explored to refer to any query block in the query."
    I do not see a patch in the bug report, but there was a plan to provide it.
I have a few more bugs in my list for February, but let's continue with them next time. We have many boring weeks ahead, it seems...

Frida is always searching for something... I do the same with MySQL bugs.
To summarize:
  1. Group replication still has a lot of problems to resolve before becoming a really mature solution. Ask Galera developers :) 
  2. There are regressions in MySQL 8.0.19.
  3. Percona and Alibaba engineers still help to make MySQL better.
  4. I still do not see a consistent use of "regression" tag for verified regression bugs. This is unfortunate.
  5. InnoDB locking still needs more attention.
  6. There is too much Docker usage in the industry for my liking...

by Valerii Kravchuk (noreply@blogger.com) at March 15, 2020 06:20 PM

March 13, 2020

SeveralNines

An Overview of Client-Side Field Level Encryption in MongoDB

Data often requires high-end security at nearly every level of the data transaction in order to meet security policies, compliance requirements, and government regulations. An organization's reputation may be wrecked if there is unauthorized access to sensitive data and a consequent failure to comply with the outlined mandates. 

In this blog we will be discussing some of the security measures you can employ in regards to MongoDB,  especially focusing on the client side of things.

Scenarios Where Data May Be Accessed

There are several ways someone can access your MongoDB data, here are some of them...

  1. Capture of data over an insecure network. Someone can access your data through an API while hiding behind a VPN, and it will be difficult to track them down. Data in transit is often the culprit in this case.
  2. A super user such as an administrator having direct access. This happens when you fail to define user roles and restrictions.
  3. Having access to on-disk data while reading databases of backup files.
  4. Reading the server memory and logged data.
  5. Accidental disclosure of data by staff member.

MongoDB Data Categories and How They are Secured

In general, any database system involves two type of data: 

  1. Data-at-rest : One that is stored in the database files
  2. Data-in-transit: One that is transacted between a client, server and the database.

MongoDB has an Encryption at Rest feature that encrypts database files on disk hence preventing access to on-disk database files.

Data-in-transit over a network can be secured in MongoDB through Transport Encryption using TLS/SSL by encrypting the data.
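
As a minimal sketch (the certificate path is hypothetical), requiring transport encryption in the mongod configuration file could look like this:

net:
  tls:
    mode: requireTLS
    certificateKeyFile: /etc/ssl/mongodb.pem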

In the case of data being accidentally disclosed by a staff member, for instance by a receptionist on a desktop screen, MongoDB integrates Role-Based Access Control, which allows administrators to grant and restrict collection-level permissions for users.
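
For example, a hedged sketch in the mongo shell (the role, user, and namespace names are hypothetical) that grants a user read-only access to a single collection could look like this:

use admin
db.createRole({
  role: "ordersStockReadOnly",
  privileges: [ { resource: { db: "orders", collection: "stock" }, actions: [ "find" ] } ],
  roles: []
})
db.createUser({
  user: "receptionist",
  pwd: "aStrongPassword",
  roles: [ { role: "ordersStockReadOnly", db: "admin" } ]
})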

Data transacted over the server may remain in memory and these approaches do not at any point address the security concern against data access in server memory. MongoDB therefore introduced Client-Side Field Level Encryption for encrypting specific fields of a document that involve confidential data.

Field Level Encryption

MongoDB works with documents that have defined fields. Some fields may be required to hold confidential information such as credit card numbers, social security numbers, patient diagnosis data, and so much more.

Field Level Encryption enables us to secure these fields so that they can only be accessed by authorized personnel with the decryption keys.

Encryption can be done in two ways:

  1. Using a secret key. A single key is used for both encrypting and decrypting, hence it has to be present at both the source and the destination of the transmission, but kept secret by all parties.
  2. Using a public key. This uses a pair of keys whereby one is used to encrypt and the other to decrypt.

When applying Field Level Encryption, consider using a new database setup rather than an existing one.

Client-Side Field Level Encryption (CSFLE)

Introduced in MongoDB version 4.2 Enterprise, CSFLE offers database administrators a way to encrypt fields whose values need to be secured. This is to say, the sensitive data is encrypted or decrypted by the client and only communicated to and from the server in encrypted form. Moreover, even super users who don't have the encryption keys will not have control over these encrypted data fields.

How to Implement CSFLE

In order for you to implement the Client-Side Field Level Encryption, you require the following:

  1. MongoDB Server 4.2 Enterprise
  2. A MongoDB driver compatible with CSFLE
  3. File System Permissions
  4. Specific language drivers. (In our blog we are going to use Node.js)

The implementation procedure involves:

  • A local development environment with software for running the client and server
  • Generating and validating the encryption keys.
  • Configuring the client for automatic field-level encryption
  • Performing throughput operations (queries) on the encrypted fields.

CSFLE Implementation

CSFLE uses the envelope encryption strategy, whereby data encryption keys are encrypted with another key known as the master key. The client application creates a master key that is stored in the Local Key Provider, essentially the local file system. However, this storage approach is insecure, hence in production one is advised to configure the key in a Key Management System (KMS) that stores and decrypts data encryption keys remotely.

After the data encryption keys are generated, they are stored in the vault collection in the same MongoDB replica set as the encrypted data.

Create Master Key

In Node.js, we need to generate a 96-byte, locally managed master key and write it to a file in the directory from which the main script is executed. The fs and crypto modules used here are built into Node.js, so no npm installation is required.

Then in the script:

const crypto = require("crypto")

const fs = require("fs")



try{

fs.writeFileSync('masterKey.txt', crypto.randomBytes(96))

}catch(err){

throw err;

}

Create Data Encryption Key

This key is stored in a key vault collection where CSFLE enabled clients can access the key for encryption/decryption. To generate one, you need the following:

  • Locally-managed master key
  • Connection to your database that is, the MongoDB connection string
  • Key vault namespace (database and collection)

Steps to Generate the Data Encryption Key

  1. Read the local master key generated before:

const localMasterKey = fs.readFileSync('./masterKey.txt');
  2. Specify the KMS provider settings that will be used by the client to discover the master key.

const kmsProvider = {

local: {

key: localMasterKey

}

}
  3. Create the Data Encryption Key. We need to create a client with the MongoDB connection string and key vault namespace configuration. Let's say we have a database called users and inside it a keyVault collection. You need to install uuid-base64 first by running the command:

$ npm install uuid-base64

Then in your script

const { MongoClient } = require('mongodb');

// ClientEncryption is typically provided by the mongodb-client-encryption package (exact import may vary by driver version)
const { ClientEncryption } = require('mongodb-client-encryption');

const base64 = require('uuid-base64');

const keyVaultNamespace = 'users.keyVault';

const client = new MongoClient('mongodb://localhost:27017', {

  useNewUrlParser: true,

  useUnifiedTopology: true,

});

async function createKey() {

  try {

    await client.connect();

    const encryption = new ClientEncryption(client, {

      keyVaultNamespace,

      kmsProvider,

    });

    const key = await encryption.createDataKey('local');

    const base64DataKeyId = key.toString('base64');

    const uuidDataKeyId = base64.decode(base64DataKeyId);

    console.log('DataKeyId [UUID]: ', uuidDataKeyId);

    console.log('DataKeyId [base64]: ', base64DataKeyId);

  } finally {

    await client.close();

  }

}

createKey();

You will then be presented with a result that resembles:

DataKeyId [UUID]: ad4d735a-44789-48bc-bb93-3c81c3c90824

DataKeyId [base64]: 4K13FkSZSLy7kwABP4HQyD==

The client must have ReadWrite permissions on the specified key vault namespace

 

  4. To verify that the Data Encryption Key was created:

const client = new MongoClient('mongodb://localhost:27017', {

  useNewUrlParser: true,

  useUnifiedTopology: true,

});



async function checkClient() {

  try {

    await client.connect();

    const keyDB = client.db('users');

    const keyColl = keyDB.collection('keyVault');

    const query = {

      _id: '4K13FkSZSLy7kwABP4HQyD==',

    };

    const dataKey = await keyColl.findOne(query);

    console.log(dataKey);

  } finally {

    await client.close();

  }

}

checkClient();

You should receive a result of this sort:

{

  _id: Binary {

    _bsontype: 'Binary',

    sub_type: 4,

    position: 2,

    buffer: <Buffer 68 ca d2 10 16 5d 45 bf 9d 1d 44 d4 91 a6 92 44>

  },

  keyMaterial: Binary {

    _bsontype: 'Binary',

    sub_type: 0,

    position: 20,

    buffer: <Buffer f1 4a 9f bd aa ac c9 89 e9 b3 da 48 72 8e a8 62 97 2a 4a a0 d2 d4 2d a8 f0 74 9c 16 4d 2c 95 34 19 22 05 05 84 0e 41 42 12 1e e3 b5 f0 b1 c5 a8 37 b8 ... 110 more bytes>

  },

  creationDate: 2020-02-08T11:10:20.021Z,

  updateDate: 2020-02-08T11:10:25.021Z,

  status: 0,

  masterKey: { provider: 'local' }

}

The returned document incorporates: the data encryption key id (UUID), the data encryption key in encrypted form, the KMS provider information for the master key, and metadata such as the creation date.

Specifying Fields to be Encrypted Using the JSON Schema

A JSON Schema extension is used by the MongoDB drivers to configure automatic client-side encryption and decryption of the specified fields of documents in a collection. The CSFLE configuration for this schema requires: the encryption algorithm to use when encrypting each field, one or all of the data encryption keys encrypted with the CSFLE master key, and the BSON type of each field. 

However, this CSFLE JSON Schema does not support document validation; any validation instances will cause the client to throw an error. 

Clients who are not configured with the appropriate client-side JSON Schema can be restricted from writing unencrypted data to a field by using the server-side JSON Schema. 

There are mainly two encryption algorithms: Random and deterministic.

We will define an encryptMetadata key at the root level of the JSON Schema and configure the fields to be encrypted by defining them in the properties field of the schema, so that they inherit this encryption key.

{

    "bsonType" : "object",

    "encryptMetadata" : {

        "keyId" : // keyId generated here

    },

    "properties": {

        // field schemas here

    }

}

Let’s say you want to encrypt a bank account number field, you would do something like:

"bankAccountNumber": {

    "encrypt": {

        "bsonType": "int",

        "algorithm": "AEAD_AES_256_CBC_HMAC_SHA_512-Deterministic"

    }

}

Because of its high cardinality and the fact that the field needs to be queryable, we use the deterministic approach. Sensitive fields such as blood type, which have low cardinality and are rarely queried, may be encrypted using the random approach.

Array fields should use random encryption with CSFLE to enhance auto-encryption for all the elements. 
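
As a sketch, a randomly encrypted field (here a hypothetical bloodType field) would be declared in the same properties section of the schema, using the -Random variant of the algorithm:

"bloodType": {
    "encrypt": {
        "bsonType": "string",
        "algorithm": "AEAD_AES_256_CBC_HMAC_SHA_512-Random"
    }
}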

Mongocryptd Application

Installed with MongoDB Enterprise Server 4.2 and later, mongocryptd is a separate encryption application that automates Client-Side Field Level Encryption. Whenever a CSFLE-enabled client is created, this service is automatically started by default to:

  • Validate encryption instructions outlined in the JSON Schema, detect which fields are to be encrypted in the throughput operations.
  • Prevent unsupported operations from being executed on the encrypted fields.
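
A minimal sketch of such a CSFLE-enabled client in Node.js, assuming the kmsProvider object and key vault namespace created earlier (the users.customers namespace and the dataKeyId variable holding the Binary key id returned by createDataKey() are hypothetical), could look like this:

const { MongoClient } = require('mongodb');

const schemaMap = {
  'users.customers': {
    bsonType: 'object',
    encryptMetadata: {
      keyId: [dataKeyId], // the Binary data encryption key id returned by createDataKey()
    },
    properties: {
      bankAccountNumber: {
        encrypt: {
          bsonType: 'int',
          algorithm: 'AEAD_AES_256_CBC_HMAC_SHA_512-Deterministic',
        },
      },
    },
  },
};

const secureClient = new MongoClient('mongodb://localhost:27017', {
  useNewUrlParser: true,
  useUnifiedTopology: true,
  autoEncryption: {
    keyVaultNamespace: 'users.keyVault',
    kmsProviders: kmsProvider,
    schemaMap,
  },
});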

To insert the data, we run a normal insert query, and the resulting document will contain sample data like the following for the bank account field.

{

…

"bankAccountNumber":"Ac+ZbPM+sk7gl7CJCcIzlRAQUJ+uo/0WhqX+KbTNdhqCszHucqXNiwqEUjkGlh7gK8pm2JhIs/P3//nkVP0dWu8pSs6TJnpfUwRjPfnI0TURzQ==",

…

}

When authorized personnel perform a query, the driver will decrypt this data and return it in a readable format, i.e.:

{

…

"bankAccountNumber":43265436456456456756,

…

}

Note:  It is not possible to query for documents on a randomly encrypted field unless you use another field to find the document that contains an approximation of the randomly encrypted field data.

Conclusion

Data security should be considered at all levels, both at rest and in transit. MongoDB Enterprise Server 4.2 gives developers a way to encrypt data from the client side using Client-Side Field Level Encryption, hence securing the data from the database host providers and from insecure network access. CSFLE uses envelope encryption, where a master key is used to encrypt the data encryption keys. The master key should therefore be kept safe using key management tools such as a Key Management System.

 

by Onyancha Brian Henry at March 13, 2020 10:45 AM

March 12, 2020

SeveralNines

An Overview of Generated Columns for PostgreSQL

PostgreSQL 12 comes with a great new feature, Generated Columns. The functionality isn't exactly anything new, but the standardization, ease of use, accessibility, and performance have been improved in this new version.

A Generated Column is a special column in a table that contains data automatically generated from other data within the row. The content of the generated column is automatically populated and updated whenever the source data, such as any other columns in the row, are changed themselves.

Generated Columns in PostgreSQL 12+

In recent versions of PostgreSQL, generated columns are a built-in feature allowing the CREATE TABLE or ALTER TABLE statements to add a column in which the content is automatically 'generated' as a result of an expression. These expressions could be simple mathematical operations from other columns, or a more advanced immutable function. Some benefits of implementing a generated column in a database design include:

  • The ability to add a column to a table containing computed data, without needing to update application code to generate the data and then include it within INSERT and UPDATE operations. 
  • Reducing processing time on extremely frequent SELECT statements that would process the data on the fly. Since the processing of the data is done at the time of INSERT or UPDATE, the data is generated once and the SELECT statements only need to retrieve the data. In heavy read environments, this may be preferable, as long as the extra data storage used is acceptable. 
  • Since generated columns are updated automatically when the source data itself is updated, adding a generated column will add an assumed guarantee that the data in the generated column is always correct. 

In PostgreSQL 12, only the 'STORED' type of generated column is available. In other database systems, a generated column with type 'VIRTUAL' is available, which acts more like a view where the result is calculated on the fly when the data is retrieved. Since that functionality is so similar to views, or to simply writing the operation into a SELECT statement, it isn't as beneficial as the 'STORED' functionality discussed here, but there's a chance future versions will include the feature. 

Creating a Table With a Generated Column is done when defining the column itself. In this example, the generated column is 'profit', and it is automatically generated by subtracting the purchase_price column from the sale_price column, then multiplying by the quantity_sold column. 

CREATE TABLE public.transactions (

    transactions_sid serial primary key,

    transaction_date timestamp with time zone DEFAULT now() NOT NULL,

    product_name character varying NOT NULL,

    purchase_price double precision NOT NULL,

    sale_price double precision NOT NULL,

    quantity_sold integer NOT NULL,

    profit double precision NOT NULL GENERATED ALWAYS AS  ((sale_price - purchase_price) * quantity_sold) STORED

);

In this example, a ‘transactions’ table is created to track some basic transactions and profits of an imaginary coffee shop. Inserting data into this table will show some immediate results.

​severalnines=# INSERT INTO public.transactions (product_name, purchase_price, sale_price, quantity_sold) VALUES ('House Blend Coffee', 5, 11.99, 1);

severalnines=# INSERT INTO public.transactions (product_name, purchase_price, sale_price, quantity_sold) VALUES ('French Roast Coffee', 6, 12.99, 4);

severalnines=# INSERT INTO public.transactions (product_name, purchase_price, sale_price, quantity_sold) VALUES ('BULK: House Blend Coffee, 10LB', 40, 100, 6);



severalnines=# SELECT * FROM public.transactions;

 transactions_sid |       transaction_date        |          product_name          | purchase_price | sale_price | quantity_sold | profit
------------------+-------------------------------+--------------------------------+----------------+------------+---------------+--------
                1 | 2020-02-28 04:50:06.626371+00 | House Blend Coffee             |              5 |      11.99 |             1 |   6.99
                2 | 2020-02-28 04:50:53.313572+00 | French Roast Coffee            |              6 |      12.99 |             4 |  27.96
                3 | 2020-02-28 04:51:08.531875+00 | BULK: House Blend Coffee, 10LB |             40 |        100 |             6 |    360

When updating the row, the generated column will automatically update:

​severalnines=# UPDATE public.transactions SET sale_price = 95 WHERE transactions_sid = 3;

UPDATE 1



severalnines=# SELECT * FROM public.transactions WHERE transactions_sid = 3;

 transactions_sid |       transaction_date        |          product_name          | purchase_price | sale_price | quantity_sold | profit
------------------+-------------------------------+--------------------------------+----------------+------------+---------------+--------
                3 | 2020-02-28 05:55:11.233077+00 | BULK: House Blend Coffee, 10LB |             40 |         95 |             6 |    330

This will ensure that the generated column is always correct, with no additional logic needed on the application side. 

NOTE: Generated columns cannot be INSERTED into or UPDATED directly, and any attempt to do so will result in an ERROR:

severalnines=# INSERT INTO public.transactions (product_name, purchase_price, sale_price, quantity_sold, profit) VALUES ('BULK: House Blend Coffee, 10LB', 40, 95, 1, 95);

ERROR:  cannot insert into column "profit"

DETAIL:  Column "profit" is a generated column.



severalnines=# UPDATE public.transactions SET profit = 330 WHERE transactions_sid = 3;

ERROR:  column "profit" can only be updated to DEFAULT

DETAIL:  Column "profit" is a generated column.

Generated Columns on PostgreSQL 11 and Before

Even though built-in generated columns are new to version 12 of PostgreSQL, the functionality can still be achieved in earlier versions; it just needs a bit more setup with stored procedures and triggers. However, even though implementing it on older versions adds functionality that can be beneficial, strict data input compliance is harder to achieve, and depends on PL/pgSQL features and programming ingenuity.

BONUS: The below example will also work on PostgreSQL 12+, so if the added functionality with a function / trigger combo is needed or desired in newer versions, this option is a valid fallback and not restricted to just versions older than 12. 

While this is a way to do it on previous versions of PostgreSQL, there are a couple of additional benefits of this method: 

  • Since mimicking the generated column uses a function, more complex calculations are able to be used. Generated Columns in version 12 require IMMUTABLE operations, but a trigger / function option could use a STABLE or VOLATILE type of function with greater possibilities and likely lesser performance accordingly. 
  • Using a function that has the option to be STABLE or VOLATILE also opens up the possibility to UPDATE additional columns, UPDATE other tables, or even create new data via INSERTS into other tables. (However, while these trigger / function options are much more flexible, that’s not to say an actual “Generated Column” is lacking, as it does what’s advertised with greater performance and efficiency.)

In this example, a trigger / function is set up to mimic the functionality of a PostgreSQL 12+ generated column, along with two pieces that raise an exception if an INSERT or UPDATE attempts to change the generated column. These can be omitted, but if they are, exceptions will not be raised and any data supplied for the column in an INSERT or UPDATE will be quietly discarded, which generally wouldn’t be recommended.

The trigger itself is set to run BEFORE, which means the processing happens before the actual insert happens, and requires the RETURN of NEW, which is the RECORD that is modified to contain the new generated column value. This specific example was written to run on PostgreSQL version 11.

CREATE TABLE public.transactions (

    transactions_sid serial primary key,

    transaction_date timestamp with time zone DEFAULT now() NOT NULL,

    product_name character varying NOT NULL,

    purchase_price double precision NOT NULL,

    sale_price double precision NOT NULL,

    quantity_sold integer NOT NULL,

    profit double precision NOT NULL

);



CREATE OR REPLACE FUNCTION public.generated_column_function()

 RETURNS trigger

 LANGUAGE plpgsql

 IMMUTABLE

AS $function$

BEGIN



    -- This statement mimics the ERROR on built in generated columns to refuse INSERTS on the column and return an ERROR.

    IF (TG_OP = 'INSERT') THEN

        IF (NEW.profit IS NOT NULL) THEN

            RAISE EXCEPTION 'ERROR:  cannot insert into column "profit"' USING DETAIL = 'Column "profit" is a generated column.';

        END IF;

    END IF;



    -- This statement mimics the ERROR on built in generated columns to refuse UPDATES on the column and return an ERROR.

    IF (TG_OP = 'UPDATE') THEN

        -- Below, IS DISTINCT FROM is used because it treats nulls like an ordinary value. 

        IF (NEW.profit::VARCHAR IS DISTINCT FROM OLD.profit::VARCHAR) THEN

            RAISE EXCEPTION 'ERROR:  cannot update column "profit"' USING DETAIL = 'Column "profit" is a generated column.';

        END IF;

    END IF;



    NEW.profit := ((NEW.sale_price - NEW.purchase_price) * NEW.quantity_sold);

    RETURN NEW;



END;

$function$;




CREATE TRIGGER generated_column_trigger BEFORE INSERT OR UPDATE ON public.transactions FOR EACH ROW EXECUTE PROCEDURE public.generated_column_function();

NOTE: Make sure the function has the correct permissions / ownership to be executed by the desired application user(s).
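For example, something along these lines is usually enough; the role names app_owner and app_user are placeholders for illustration only, not roles used elsewhere in this post:

ALTER FUNCTION public.generated_column_function() OWNER TO app_owner;
GRANT EXECUTE ON FUNCTION public.generated_column_function() TO app_user;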

As seen in the previous example, the results are the same in previous versions with a function / trigger solution:

​severalnines=# INSERT INTO public.transactions (product_name, purchase_price, sale_price, quantity_sold) VALUES ('House Blend Coffee', 5, 11.99, 1);

severalnines=# INSERT INTO public.transactions (product_name, purchase_price, sale_price, quantity_sold) VALUES ('French Roast Coffee', 6, 12.99, 4);

severalnines=# INSERT INTO public.transactions (product_name, purchase_price, sale_price, quantity_sold) VALUES ('BULK: House Blend Coffee, 10LB', 40, 100, 6);



severalnines=# SELECT * FROM public.transactions;

 transactions_sid |       transaction_date        |          product_name          | purchase_price | sale_price | quantity_sold | profit
------------------+-------------------------------+--------------------------------+----------------+------------+---------------+--------
                1 | 2020-02-28 00:35:14.855511-07 | House Blend Coffee             |              5 |      11.99 |             1 |   6.99
                2 | 2020-02-28 00:35:21.764449-07 | French Roast Coffee            |              6 |      12.99 |             4 |  27.96
                3 | 2020-02-28 00:35:27.708761-07 | BULK: House Blend Coffee, 10LB |             40 |        100 |             6 |    360

Updating the data will be similar. 

​severalnines=# UPDATE public.transactions SET sale_price = 95 WHERE transactions_sid = 3;

UPDATE 1



severalnines=# SELECT * FROM public.transactions WHERE transactions_sid = 3;

 transactions_sid |       transaction_date        |          product_name          | purchase_price | sale_price | quantity_sold | profit
------------------+-------------------------------+--------------------------------+----------------+------------+---------------+--------
                3 | 2020-02-28 00:48:52.464344-07 | BULK: House Blend Coffee, 10LB |             40 |         95 |             6 |    330

Lastly, attempting to INSERT into, or UPDATE the special column itself will result in an ERROR:

​severalnines=# INSERT INTO public.transactions (product_name, purchase_price, sale_price, quantity_sold, profit) VALUES ('BULK: House Blend Coffee, 10LB', 40, 95, 1, 95);

ERROR:  ERROR: cannot insert into column "profit"

DETAIL:  Column "profit" is a generated column.

CONTEXT:  PL/pgSQL function generated_column_function() line 7 at RAISE



severalnines=# UPDATE public.transactions SET profit = 3030 WHERE transactions_sid = 3;

ERROR:  ERROR: cannot update column "profit"

DETAIL:  Column "profit" is a generated column.

CONTEXT:  PL/pgSQL function generated_column_function() line 15 at RAISE

In this example, it does act differently than the first generated column setup in a couple of ways that should be noted:

  • If an UPDATE targets the ‘generated column’ but no row is found to update, it will return success with an “UPDATE 0” result, while an actual Generated Column in version 12 will still return an ERROR, even if no row is found to UPDATE.
  • When attempting to update the profit column, which ‘should’ always return an ERROR, the statement will succeed if the specified value happens to equal the correctly ‘generated’ value. The data is ultimately still correct, but this differs from a real generated column, which raises an ERROR any time the column is specified explicitly.

Documentation and PostgreSQL Community

The official documentation for the PostgreSQL Generated Columns is located at the official PostgreSQL Website. Check back when new major versions of PostgreSQL are released to discover new features when they appear. 

While generated columns in PostgreSQL 12 are fairly straightforward, implementing similar functionality in previous versions has the potential to get much more complicated. The PostgreSQL community is a very active, massive, worldwide, and multilingual community dedicated to helping people of any level of PostgreSQL experience solve problems and create new solutions such as this.

  • IRC: Freenode has a very active channel called #postgres, where users help each other understand concepts, fix errors, or find other resources. A full list of available freenode channels for all things PostgreSQL can be found on the PostgreSQL.org website.
  • Mailing Lists: PostgreSQL has a handful of mailing lists that can be joined. Longer form questions / issues can be sent here, and can reach many more people than IRC at any given time. The lists can be found on the PostgreSQL Website, and the lists pgsql-general or pgsql-admin are good resources. 
  • Slack: The PostgreSQL community has also been thriving on Slack, and can be joined at postgresteam.slack.com. Much like IRC, an active community is available to answer questions and engage in all things PostgreSQL.

by Brian Fehrle at March 12, 2020 10:45 AM

March 11, 2020

SeveralNines

How to Fix a Lock Wait Timeout Exceeded Error in MySQL

One of the most common InnoDB errors is the InnoDB lock wait timeout exceeded, for example:

SQLSTATE[HY000]: General error: 1205 Lock wait timeout exceeded; try restarting transaction

The above simply means the transaction has reached the innodb_lock_wait_timeout threshold (50 seconds by default) while waiting to obtain an exclusive lock. The common causes are:

  1. The offending transaction is not fast enough to commit or roll back within the innodb_lock_wait_timeout duration.
  2. The offending transaction is waiting for a row lock to be released by another transaction.
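The timeout threshold itself can be inspected and, if the workload genuinely needs longer waits, adjusted at runtime. A minimal sketch (the 120-second value is just an arbitrary example):

mysql> SHOW GLOBAL VARIABLES LIKE 'innodb_lock_wait_timeout';
mysql> SET GLOBAL innodb_lock_wait_timeout = 120; -- affects new sessions only
mysql> SET SESSION innodb_lock_wait_timeout = 120; -- affects the current session only

Raising the timeout only hides slow lock waits, so it is usually better to find and fix the blocking transaction, as described later in this post.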

The Effects of an InnoDB Lock Wait Timeout

An InnoDB lock wait timeout has two major implications:

  • The failed statement is not being rolled back by default.
  • Even if innodb_rollback_on_timeout is enabled, when a statement fails in a transaction, ROLLBACK is still a more expensive operation than COMMIT.

Let's play around with a simple example to better understand the effect. Consider the following two tables in database mydb:

mysql> CREATE SCHEMA mydb;
mysql> USE mydb;

The first table (table1):

mysql> CREATE TABLE table1 ( id INT PRIMARY KEY AUTO_INCREMENT, data VARCHAR(50));
mysql> INSERT INTO table1 SET data = 'data #1';

The second table (table2):

mysql> CREATE TABLE table2 LIKE table1;
mysql> INSERT INTO table2 SET data = 'data #2';

We executed our transactions in two different sessions in the following order:

Ordering

Transaction #1 (T1)

Transaction #2 (T2)

1

SELECT * FROM table1;

(OK)

SELECT * FROM table1;

(OK)

2

UPDATE table1 SET data = 'T1 is updating the row' WHERE id = 1;  

(OK)

 

3

 

UPDATE table2 SET data = 'T2 is updating the row' WHERE id = 1; 

(OK)

4

 

UPDATE table1 SET data = 'T2 is updating the row' WHERE id = 1; 

(Hangs for a while and eventually returns an error "Lock wait timeout exceeded; try restarting transaction")

5

COMMIT;

(OK)

 

6

 

COMMIT;

(OK)

However, the end result after step #6 might be surprising if we did not retry the timed out statement at step #4:
mysql> SELECT * FROM table1 WHERE id = 1;
+----+-----------------------------------+
| id | data                              |
+----+-----------------------------------+
| 1  | T1 is updating the row            |
+----+-----------------------------------+



mysql> SELECT * FROM table2 WHERE id = 1;
+----+-----------------------------------+
| id | data                              |
+----+-----------------------------------+
| 1  | T2 is updating the row            |
+----+-----------------------------------+

After T2 was successfully committed, one would expect to get the same output "T2 is updating the row" for both table1 and table2, but the results show that only table2 was updated. One might think that if any error is encountered within a transaction, all statements in the transaction would automatically get rolled back, or that if a transaction is successfully committed, all of its statements were executed atomically. This is true for a deadlock, but not for an InnoDB lock wait timeout.

Unless you set innodb_rollback_on_timeout=1 (the default is 0 - disabled), automatic rollback is not going to happen for an InnoDB lock wait timeout error. This means that, with the default setting, MySQL is not going to fail and roll back the whole transaction, nor retry the timed-out statement; it simply processes the next statements until it reaches COMMIT or ROLLBACK. This explains why transaction T2 was partially committed!

The InnoDB documentation clearly says "InnoDB rolls back only the last statement on a transaction timeout by default". In this case, we do not get the transaction atomicity offered by InnoDB. Atomicity in the ACID sense means we get either all of the transaction or nothing, so a partially applied transaction is simply unacceptable.

Dealing With an InnoDB Lock Wait Timeout

So, if you expect a transaction to auto-rollback when it encounters an InnoDB lock wait error, similarly to what happens in a deadlock, set the following option in the MySQL configuration file:

innodb_rollback_on_timeout=1

A MySQL restart is required. When deploying a MySQL-based cluster, ClusterControl will always set innodb_rollback_on_timeout=1 on every node. Without this option, your application has to retry the failed statement, or perform ROLLBACK explicitly to maintain the transaction atomicity.

To verify if the configuration is loaded correctly:

mysql> SHOW GLOBAL VARIABLES LIKE 'innodb_rollback_on_timeout';
+----------------------------+-------+
| Variable_name              | Value |
+----------------------------+-------+
| innodb_rollback_on_timeout | ON    |
+----------------------------+-------+

To check whether the new configuration works, we can track the com_rollback counter when this error happens:

mysql> SHOW GLOBAL STATUS LIKE 'com_rollback';
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| Com_rollback  | 1     |
+---------------+-------+

Tracking the Blocking Transaction

There are several places that we can look to track the blocking transaction or statements. Let's start by looking into InnoDB engine status under TRANSACTIONS section:

mysql> SHOW ENGINE INNODB STATUS\G
------------
TRANSACTIONS
------------

...

---TRANSACTION 3100, ACTIVE 2 sec starting index read
mysql tables in use 1, locked 1
LOCK WAIT 2 lock struct(s), heap size 1136, 1 row lock(s)
MySQL thread id 50, OS thread handle 139887555282688, query id 360 localhost ::1 root updating
update table1 set data = 'T2 is updating the row' where id = 1

------- TRX HAS BEEN WAITING 2 SEC FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 6 page no 4 n bits 72 index PRIMARY of table `mydb`.`table1` trx id 3100 lock_mode X locks rec but not gap waiting
Record lock, heap no 2 PHYSICAL RECORD: n_fields 4; compact format; info bits 0
 0: len 4; hex 80000001; asc     ;;
 1: len 6; hex 000000000c19; asc       ;;
 2: len 7; hex 020000011b0151; asc       Q;;
 3: len 22; hex 5431206973207570646174696e672074686520726f77; asc T1 is updating the row;;
------------------

---TRANSACTION 3097, ACTIVE 46 sec
2 lock struct(s), heap size 1136, 1 row lock(s), undo log entries 1
MySQL thread id 48, OS thread handle 139887556167424, query id 358 localhost ::1 root
Trx read view will not see trx with id >= 3097, sees < 3097

From the above information, we can get an overview of the transactions that are currently active in the server. Transaction 3097 is currently locking a row that needs to be accessed by transaction 3100. However, the above output does not show the query text of the blocker, which would help us figure out which part of the query/statement/transaction we need to investigate further. Using the blocking MySQL thread ID 48, let's see what we can gather from the MySQL processlist:

mysql> SHOW FULL PROCESSLIST;
+----+-----------------+-----------------+--------------------+---------+------+------------------------+-----------------------+
| Id | User            | Host            | db                 | Command | Time | State                  | Info                  |
+----+-----------------+-----------------+--------------------+---------+------+------------------------+-----------------------+
| 4  | event_scheduler | localhost       | <null>             | Daemon  | 5146 | Waiting on empty queue | <null>                |
| 10 | root            | localhost:56042 | performance_schema | Query   | 0    | starting               | show full processlist |
| 48 | root            | localhost:56118 | mydb               | Sleep   | 145  |                        | <null>                |
| 50 | root            | localhost:56122 | mydb               | Sleep   | 113  |                        | <null>                |
+----+-----------------+-----------------+--------------------+---------+------+------------------------+-----------------------+

Thread ID 48 shows the command as 'Sleep'. Still, this does not help us much in knowing which statements block the other transaction. This is because the statements in this transaction have already been executed and the open transaction is basically doing nothing at the moment. We need to dive further down to see what is going on with this thread.

For MySQL 8.0, the InnoDB lock wait instrumentation is available under data_lock_waits table inside performance_schema database (or innodb_lock_waits table inside sys database). If a lock wait event is happening, we should see something like this:

mysql> SELECT * FROM performance_schema.data_lock_waits\G
***************************[ 1. row ]***************************
ENGINE                           | INNODB
REQUESTING_ENGINE_LOCK_ID        | 139887595270456:6:4:2:139887487554680
REQUESTING_ENGINE_TRANSACTION_ID | 3100
REQUESTING_THREAD_ID             | 89
REQUESTING_EVENT_ID              | 8
REQUESTING_OBJECT_INSTANCE_BEGIN | 139887487554680
BLOCKING_ENGINE_LOCK_ID          | 139887595269584:6:4:2:139887487548648
BLOCKING_ENGINE_TRANSACTION_ID   | 3097
BLOCKING_THREAD_ID               | 87
BLOCKING_EVENT_ID                | 9
BLOCKING_OBJECT_INSTANCE_BEGIN   | 139887487548648
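
If the sys schema is installed, a quicker, more human-readable summary of the same lock wait can usually be obtained from sys.innodb_lock_waits. Treat this as a convenience sketch; the column names below are from the MySQL 8.0 sys schema and may vary slightly between versions:

mysql> SELECT waiting_pid, waiting_query, blocking_pid, blocking_query, wait_age FROM sys.innodb_lock_waits\G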

Note that in MySQL 5.6 and 5.7, similar information is stored in the innodb_lock_waits table under the information_schema database. Pay attention to the BLOCKING_THREAD_ID value. We can use this information to look for all statements executed by this thread in the events_statements_history table:

mysql> SELECT * FROM performance_schema.events_statements_history WHERE `THREAD_ID` = 87;
0 rows in set

It looks like the thread information is no longer there. We can verify by checking the minimum and maximum value of the thread_id column in events_statements_history table with the following query:

mysql> SELECT min(`THREAD_ID`), max(`THREAD_ID`) FROM performance_schema.events_statements_history;
+------------------+------------------+
| min(`THREAD_ID`) | max(`THREAD_ID`) |
+------------------+------------------+
| 98               | 129              |
+------------------+------------------+

The thread that we were looking for (87) has been truncated from the table. We can confirm this by looking at the size of the events_statements_history table:

mysql> SELECT @@performance_schema_events_statements_history_size;
+-----------------------------------------------------+
| @@performance_schema_events_statements_history_size |
+-----------------------------------------------------+
| 10                                                  |
+-----------------------------------------------------+

The above means the events_statements_history table only keeps the most recent 10 statements per thread, and rows are removed when a thread ends. Fortunately, performance_schema has another table called events_statements_history_long, which stores similar information for all threads and can contain far more rows:

mysql> SELECT @@performance_schema_events_statements_history_long_size;
+----------------------------------------------------------+
| @@performance_schema_events_statements_history_long_size |
+----------------------------------------------------------+
| 10000                                                    |
+----------------------------------------------------------+

However, you will get an empty result if you try to query the events_statements_history_long table for the first time. This is expected because by default, this instrumentation is disabled in MySQL as we can see in the following setup_consumers table:

mysql> SELECT * FROM performance_schema.setup_consumers;
+----------------------------------+---------+
| NAME                             | ENABLED |
+----------------------------------+---------+
| events_stages_current            | NO      |
| events_stages_history            | NO      |
| events_stages_history_long       | NO      |
| events_statements_current        | YES     |
| events_statements_history        | YES     |
| events_statements_history_long   | NO      |
| events_transactions_current      | YES     |
| events_transactions_history      | YES     |
| events_transactions_history_long | NO      |
| events_waits_current             | NO      |
| events_waits_history             | NO      |
| events_waits_history_long        | NO      |
| global_instrumentation           | YES     |
| thread_instrumentation           | YES     |
| statements_digest                | YES     |
+----------------------------------+---------+

To activate table events_statements_history_long, we need to update the setup_consumers table as below:

mysql> UPDATE performance_schema.setup_consumers SET enabled = 'YES' WHERE name = 'events_statements_history_long';
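
Keep in mind that changes to setup_consumers do not persist across a MySQL restart. A minimal sketch of making this permanent (assuming you manage my.cnf directly) is to add the corresponding startup option:

[mysqld]
performance-schema-consumer-events-statements-history-long = ON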

Verify if there are rows in the events_statements_history_long table now:

mysql> SELECT count(`THREAD_ID`) FROM performance_schema.events_statements_history_long;
+--------------------+
| count(`THREAD_ID`) |
+--------------------+
| 4                  |
+--------------------+

Cool. Now we can wait until the InnoDB lock wait event occurs again, and when it does, you should see a row like the following in the data_lock_waits table:

mysql> SELECT * FROM performance_schema.data_lock_waits\G
***************************[ 1. row ]***************************
ENGINE                           | INNODB
REQUESTING_ENGINE_LOCK_ID        | 139887595270456:6:4:2:139887487555024
REQUESTING_ENGINE_TRANSACTION_ID | 3083
REQUESTING_THREAD_ID             | 60
REQUESTING_EVENT_ID              | 9
REQUESTING_OBJECT_INSTANCE_BEGIN | 139887487555024
BLOCKING_ENGINE_LOCK_ID          | 139887595269584:6:4:2:139887487548648
BLOCKING_ENGINE_TRANSACTION_ID   | 3081
BLOCKING_THREAD_ID               | 57
BLOCKING_EVENT_ID                | 8
BLOCKING_OBJECT_INSTANCE_BEGIN   | 139887487548648

Again, we use the BLOCKING_THREAD_ID value to filter all statements that have been executed by this thread against events_statements_history_long table: 

mysql> SELECT `THREAD_ID`,`EVENT_ID`,`EVENT_NAME`, `CURRENT_SCHEMA`,`SQL_TEXT` FROM events_statements_history_long 
WHERE `THREAD_ID` = 57
ORDER BY `EVENT_ID`;
+-----------+----------+-----------------------+----------------+----------------------------------------------------------------+
| THREAD_ID | EVENT_ID | EVENT_NAME            | CURRENT_SCHEMA | SQL_TEXT                                                       |
+-----------+----------+-----------------------+----------------+----------------------------------------------------------------+
| 57        | 1        | statement/sql/select  | <null>         | select connection_id()                                         |
| 57        | 2        | statement/sql/select  | <null>         | SELECT @@VERSION                                               |
| 57        | 3        | statement/sql/select  | <null>         | SELECT @@VERSION_COMMENT                                       |
| 57        | 4        | statement/com/Init DB | <null>         | <null>                                                         |
| 57        | 5        | statement/sql/begin   | mydb           | begin                                                          |
| 57        | 7        | statement/sql/select  | mydb           | select 'T1 is in the house'                                    |
| 57        | 8        | statement/sql/select  | mydb           | select * from table1                                           |
| 57        | 9        | statement/sql/select  | mydb           | select 'some more select'                                      |
| 57        | 10       | statement/sql/update  | mydb           | update table1 set data = 'T1 is updating the row' where id = 1 |
+-----------+----------+-----------------------+----------------+----------------------------------------------------------------+

Finally, we found the culprit. We can tell by looking at the sequence of events for thread 57 that the above transaction (T1) has still not finished (no COMMIT or ROLLBACK), and that the very last statement has obtained an exclusive lock on the row for its update operation, which is needed by the other transaction (T2), and it is just hanging there. That explains why we see 'Sleep' in the MySQL processlist output.

As we can see, the above SELECT statement requires you to obtain the thread_id value beforehand. To simplify this, we can use an IN clause with a subquery to join both tables. The following query produces an identical result to the one above:

mysql> SELECT `THREAD_ID`,`EVENT_ID`,`EVENT_NAME`, `CURRENT_SCHEMA`,`SQL_TEXT` from events_statements_history_long WHERE `THREAD_ID` IN (SELECT `BLOCKING_THREAD_ID` FROM data_lock_waits) ORDER BY `EVENT_ID`;
+-----------+----------+-----------------------+----------------+----------------------------------------------------------------+
| THREAD_ID | EVENT_ID | EVENT_NAME            | CURRENT_SCHEMA | SQL_TEXT                                                       |
+-----------+----------+-----------------------+----------------+----------------------------------------------------------------+
| 57        | 1        | statement/sql/select  | <null>         | select connection_id()                                         |
| 57        | 2        | statement/sql/select  | <null>         | SELECT @@VERSION                                               |
| 57        | 3        | statement/sql/select  | <null>         | SELECT @@VERSION_COMMENT                                       |
| 57        | 4        | statement/com/Init DB | <null>         | <null>                                                         |
| 57        | 5        | statement/sql/begin   | mydb           | begin                                                          |
| 57        | 7        | statement/sql/select  | mydb           | select 'T1 is in the house'                                    |
| 57        | 8        | statement/sql/select  | mydb           | select * from table1                                           |
| 57        | 9        | statement/sql/select  | mydb           | select 'some more select'                                      |
| 57        | 10       | statement/sql/update  | mydb           | update table1 set data = 'T1 is updating the row' where id = 1 |
+-----------+----------+-----------------------+----------------+----------------------------------------------------------------+

However, it is not practical for us to execute the above query manually whenever an InnoDB lock wait event occurs. Apart from the error reported by the application, how would you know that a lock wait event is happening? We can automate this query execution with the following simple Bash script, called track_lockwait.sh:

$ cat track_lockwait.sh
#!/bin/bash
## track_lockwait.sh
## Print out the blocking statements that cause InnoDB lock waits

INTERVAL=5
DIR=/root/lockwait/

# Create the report directory if it does not exist yet
[ -d $DIR ] || mkdir -p $DIR

while true; do
  # Look up the statements executed by the blocking thread in performance_schema
  check_query=$(mysql -A -Bse 'SELECT THREAD_ID,EVENT_ID,EVENT_NAME,CURRENT_SCHEMA,SQL_TEXT FROM performance_schema.events_statements_history_long WHERE THREAD_ID IN (SELECT BLOCKING_THREAD_ID FROM performance_schema.data_lock_waits) ORDER BY EVENT_ID')

  # if $check_query is not empty, write a timestamped report file
  if [[ ! -z "$check_query" ]]; then
    timestamp=$(date +%s)
    echo "$check_query" > $DIR/innodb_lockwait_report_${timestamp}
  fi

  sleep $INTERVAL
done

Apply executable permission and daemonize the script in the background:

$ chmod 755 track_lockwait.sh
$ nohup ./track_lockwait.sh &

Now, we just need to wait for the reports to be generated under the /root/lockwait directory. Depending on the database workload and row access patterns, you might see a lot of files there, so monitor the directory closely or it will be flooded with report files.

If you are using ClusterControl, you can enable the Transaction Log feature under Performance -> Transaction Log, where ClusterControl provides a report on deadlocks and long-running transactions, which makes it much easier to find the culprit.

Conclusion

It is really important to enable innodb_rollback_on_timeout if your application does not handle the InnoDB lock wait timeout error properly. Otherwise, you might lose the transaction atomicity, and tracking down the culprit is not a straightforward task.

by ashraf at March 11, 2020 05:46 PM

MariaDB Foundation

Business As Unusual

MariaDB Foundation faces an unusual world, just like anyone else in these Corona times. Or perhaps, not quite. Here are some ideas for how to cope with a world inhibiting travel and social contact as we know it, from someone who has worked from home for 20 years, with colleagues also working from home.

The post Business As Unusual appeared first on MariaDB.org.

by Kaj Arnö at March 11, 2020 11:14 AM

March 10, 2020

SeveralNines

How High CPU Utilization Affects Database Performance

One of the indicators that our database is experiencing performance issues is the CPU utilization. High CPU usage often goes hand in hand with heavy disk I/O (where data is either read from or written to the disk). In this blog we will take a close look at some of the database parameters and how they relate to the CPU cores of the server.

How to Confirm if Your CPU Utilization is High

If you think you are experiencing high CPU utilization, you should check the processes running in the operating system to determine what is causing the issue. To do this, you can use the htop or top command.

[root@n1]# top

top - 14:10:35 up 39 days, 20:20,  2 users, load average: 0.30, 0.68, 1.13

Tasks: 461 total,   2 running, 459 sleeping,   0 stopped, 0 zombie

%Cpu(s):  0.7 us, 0.6 sy,  0.0 ni, 98.5 id, 0.1 wa,  0.0 hi, 0.0 si, 0.0 st

KiB Mem : 13145656+total,   398248 free, 12585008+used, 5208240 buff/cache

KiB Swap: 13421772+total, 11141702+free, 22800704 used.  4959276 avail Mem



  PID USER      PR NI VIRT    RES SHR S %CPU %MEM     TIME+ COMMAND

15799 mysql     20 0 145.6g 118.2g   6212 S 87.4 94.3 7184:35 mysqld

21362 root      20 0 36688 15788   2924 S 15.2 0.0 5805:21 node_exporter

From the top command output above, the highest CPU utilization comes from the mysqld daemon. You can then check inside the database itself what is currently running:

​mysql> show processlist;

+-----+----------+-------------------+--------------------+------------------+------+---------------------------------------------------------------+------------------------------------------------------------------------+-----------+---------------+

| Id  | User     | Host         | db | Command          | Time | State                               | Info   | Rows_sent | Rows_examined |

+-----+----------+-------------------+--------------------+------------------+------+---------------------------------------------------------------+------------------------------------------------------------------------+-----------+---------------+

|  32 | rpl_user | 10.10.10.18:45338 | NULL               | Binlog Dump GTID | 4134 | Master has sent all binlog to slave; waiting for more updates | NULL                                                                   | 0 | 0 |

|  47 | cmon     | 10.10.10.10:37214 | information_schema | Sleep            | 1 |           | NULL   | 492 | 984 |

|  50 | cmon     | 10.10.10.10:37390 | information_schema | Sleep            | 0 |           | NULL   | 0 | 0 |

|  52 | cmon     | 10.10.10.10:37502 | information_schema | Sleep            | 2 |           | NULL   | 1 | 599 |

| 429 | root     | localhost   | item | Query            | 4 | Creating sort index                               | select * from item where i_data like '%wjf%' order by i_data   | 0 | 0 |

| 443 | root     | localhost   | item | Query            | 2 | Creating sort index                               | select * from item where i_name like 'item-168%' order by i_price desc |         0 | 0 |

| 471 | root     | localhost   | NULL | Query            | 0 | starting                               | show processlist   | 0 | 0 |

+-----+----------+-------------------+--------------------+------------------+------+---------------------------------------------------------------+------------------------------------------------------------------------+-----------+---------------+

7 rows in set (0.00 sec)

There are some running queries (as shown above); the interesting ones are the SELECT queries on the item table with the state Creating sort index. The Creating sort index state means the database is working out the order of the returned rows based on the ORDER BY clause, an operation constrained by the available CPU. So if you have limited CPU cores, this will impact the query.

Enabling the slow query log is beneficial to check if there is an issue from the application side. You can enable it by setting two parameters in the database: slow_query_log and long_query_time. The slow_query_log parameter must be set to ON to enable the slow query log, and long_query_time defines the threshold, in seconds, above which a query is considered long running. There is also one parameter related to the location of the slow query log file: slow_query_log_file can be set to any path where the log file should be stored.
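For example, the settings can be changed at runtime; the log file path below is only an illustration and should be adjusted to your environment, and remember to also put these values in the MySQL configuration file so they survive a restart. Once the offending queries are captured, EXPLAIN helps show why they are expensive:

mysql> SET GLOBAL slow_query_log = 'ON';
mysql> SET GLOBAL long_query_time = 1;
mysql> SET GLOBAL slow_query_log_file = '/var/log/mysql/mysql-slow.log';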

mysql> explain select * from item where i_data like '%wjf%' order by i_data;

+----+-------------+-------+------------+------+---------------+------+---------+------+---------+----------+-----------------------------+

| id | select_type | table | partitions | type | possible_keys | key  | key_len | ref | rows | filtered | Extra |

+----+-------------+-------+------------+------+---------------+------+---------+------+---------+----------+-----------------------------+

|  1 | SIMPLE      | item | NULL     | ALL | NULL   | NULL | NULL | NULL | 9758658 |    11.11 | Using where; Using filesort |

+----+-------------+-------+------------+------+---------------+------+---------+------+---------+----------+-----------------------------+

1 row in set, 1 warning (0.00 sec)

The long running query (as shown above) uses a full table scan as the access method, which you can see by checking the value of the type column; in this case it is ALL. That means the query scans all the rows in the table to produce the result. The number of rows examined by the query is also quite large, around 9758658 rows, so naturally more CPU time is spent on the query.
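Note that this query filters with a leading wildcard (LIKE '%wjf%'), which cannot use an index at all. The second slow query from the processlist filters on a prefix (LIKE 'item-168%'), so an index on that column could cut down the rows examined. The statement below is only a sketch; the index name is hypothetical, and whether it is worthwhile depends on the actual schema and write workload:

mysql> ALTER TABLE item ADD INDEX idx_i_name (i_name);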

How ClusterControl Dashboards Help Identify High CPU Utilization

ClusterControl helps with your daily routine and tasks, and lets you know if there is something happening in your system, such as high CPU utilization. You can check the dashboard directly for CPU utilization.

As you can see in the screenshot above, CPU utilization is quite high and there are significant spikes, even up to 87%. The graph does not tell us more, but you can always check the ‘top’ command in the shell, or dig into the currently running processes on the server from the Top dashboard, as shown below:

As you can see from the Top dashboard, most of the CPU resources are taken by the mysqld daemon process, so it is the prime suspect for consuming the CPU. You can dig further inside the database to check what is running; you can also check the running queries and top queries in the database through the dashboards shown below.

Running Queries Dashboard

From the Running Queries dashboard, you can see the executed query, the time spent on the query, and the state of the query itself. The User and Db columns show which user executed the query and in which database.

Top Queries Dashboard

From the Top Queries dashboard, we can see the query that is causing the problem. The query scans 10614051 rows in the item table. The average execution time of the query is around 12.4 seconds, which is quite long for the query. 

Conclusion

Troubleshooting high CPU utilization is not that difficult; you just need to know what is currently running in the database server, and with the help of the right tool you can fix the problem quickly.

 

by agus at March 10, 2020 07:00 PM

March 09, 2020

SeveralNines

How to Rebuild an Inconsistent MySQL Slave?

MySQL slaves may become inconsistent. You can try to avoid it, but it’s really hard. Setting super_read_only and using row-based replication can help a lot, but no matter what you do, it is still possible that your slave will become inconsistent. 

What can be done to rebuild an inconsistent MySQL slave? In this blog post we’ll take a look at this problem.

First off, let’s discuss what has to happen in order to rebuild a slave. To bring a node into MySQL Replication, it has to be provisioned with data from one of the nodes in the replication topology. This data has to be consistent at the point in time when it was collected. You cannot take it on a table by table or schema by schema basis, because this would make the provisioned node inconsistent internally, meaning some data would be older than other parts of the dataset.

In addition to data consistency, it should also be possible to collect information about the relationship between the data and the state of replication. You want to have either binary log position at which the collected data is consistent or Global Transaction ID of the transaction which was the last one executed on the node that is the source of the data.

This leads us to the following considerations. You can rebuild a slave using any backup tool as long as this tool can generate consistent backup and it includes replication coordinates for the point-in-time in which the backup is consistent. This allows us to pick from a couple of options.

Using Mysqldump to Rebuild an Inconsistent MySQL Slave

Mysqldump is the most basic tool we have to achieve this. It allows us to create a logical backup, among others, in the form of SQL statements. What is important is that, while basic, it still allows us to take a consistent backup: it can use a transaction to ensure that the data is consistent as of the beginning of that transaction. It can also write down the replication coordinates for that point, even a whole CHANGE MASTER statement, making it easy to start the replication using the backup.
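A minimal sketch of such a dump, assuming a classic binary-log-position setup (swap --master-data=2 for --set-gtid-purged=ON if the topology relies on GTID), could look like this:

$ mysqldump --single-transaction --master-data=2 --routines --triggers --all-databases > full_backup.sql

Here --single-transaction provides the consistent snapshot for InnoDB tables, while --master-data records the binary log coordinates as a CHANGE MASTER statement inside the dump.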

Using Mydumper to Rebuild an Inconsistent MySQL Slave

Another option is to use mydumper - this tool, just like mysqldump, generates a logical backup and, just like mysqldump, can be used to create a consistent backup of the database. The main difference between mydumper and mysqldump is that mydumper, when paired with myloader, can dump and restore data in parallel, improving the dump and, especially, restore time.
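A rough sketch of a parallel dump and restore; option names are from recent mydumper releases and may differ in older versions, and the host, user, password and paths are placeholders:

$ mydumper --host=127.0.0.1 --user=backup_user --password=SECRET --threads=4 --outputdir=/backups/mydumper
$ myloader --host=127.0.0.1 --user=backup_user --password=SECRET --threads=4 --directory=/backups/mydumper

The metadata file written by mydumper records the replication coordinates at which the dump is consistent.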

Using a Snapshot to Rebuild an Inconsistent MySQL Slave

For those who use cloud providers, a possibility is to take a snapshot of the underlying block storage. Snapshots generate a point-in-time view of the data. This process is quite tricky though, as the consistency of the data and the ability to restore it depends mostly on the MySQL configuration. 

You should ensure that the database works in a durable mode (configured in a way that a crash of MySQL will not result in any data loss). This is because (from a MySQL standpoint) taking a volume snapshot and then starting another MySQL instance off the data stored in it is basically the same as if you killed -9 the mysqld process and then started it again. InnoDB crash recovery has to happen: committed changes are replayed from the redo log, transactions that had not completed before the crash are rolled back, and so on.
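In MySQL terms, a commonly used fully durable configuration looks like the snippet below; this is a sketch to verify against your own durability requirements:

[mysqld]
innodb_flush_log_at_trx_commit = 1
sync_binlog = 1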

The downside of snapshot-based rebuild process is that it is strongly tied to the current vendor. You cannot easily copy the snapshot data from one cloud provider to another one. You may be able to move it between different regions but it will still be the same provider.

Using a Xtrabackup or Mariabackup to Rebuild an Inconsistent MySQL Slave

Finally, there is xtrabackup/mariabackup - a tool written by Percona and forked by MariaDB that allows you to generate a physical backup. It is way faster than logical backups - it is limited mostly by hardware performance, with disk or network being the most probable bottlenecks. Most of the workload is related to copying files from the MySQL data directory to another location (on the same host or over the network).

While not nearly as fast as block storage snapshots, xtrabackup is way more flexible and can be used in any environment. The backup it produces consists of files, therefore it is perfectly possible to copy the backup to any location you like. Another cloud provider, your local datacenter, it doesn’t matter as long as you can transfer files from your current location. 

It doesn’t even have to have network connectivity - you can as well just copy the backup to some “transferable” device like USB SSD or even USB stick, as long as it can contain all of the data and store it in your pocket while you relocate from one datacenter to another.

How to Rebuild a MySQL Slave Using Xtrabackup?

We decided to focus on xtrabackup, given its flexibility and ability to work in most of the environments where MySQL can exist. How do you rebuild your slave using xtrabackup? Let’s take a look.

Initially, we have a master and a slave, which suffered from some replication issues:

mysql> SHOW SLAVE STATUS\G

*************************** 1. row ***************************

               Slave_IO_State: Waiting for master to send event

                  Master_Host: 10.0.0.141

                  Master_User: rpl_user

                  Master_Port: 3306

                Connect_Retry: 10

              Master_Log_File: binlog.000004

          Read_Master_Log_Pos: 386

               Relay_Log_File: relay-bin.000008

                Relay_Log_Pos: 363

        Relay_Master_Log_File: binlog.000004

             Slave_IO_Running: Yes

            Slave_SQL_Running: No

              Replicate_Do_DB:

          Replicate_Ignore_DB:

           Replicate_Do_Table:

       Replicate_Ignore_Table:

      Replicate_Wild_Do_Table:

  Replicate_Wild_Ignore_Table:

                   Last_Errno: 1007

                   Last_Error: Error 'Can't create database 'mytest'; database exists' on query. Default database: 'mytest'. Query: 'create database mytest'

                 Skip_Counter: 0

          Exec_Master_Log_Pos: 195

              Relay_Log_Space: 756

              Until_Condition: None

               Until_Log_File:

                Until_Log_Pos: 0

           Master_SSL_Allowed: No

           Master_SSL_CA_File:

           Master_SSL_CA_Path:

              Master_SSL_Cert:

            Master_SSL_Cipher:

               Master_SSL_Key:

        Seconds_Behind_Master: NULL

Master_SSL_Verify_Server_Cert: No

                Last_IO_Errno: 0

                Last_IO_Error:

               Last_SQL_Errno: 1007

               Last_SQL_Error: Error 'Can't create database 'mytest'; database exists' on query. Default database: 'mytest'. Query: 'create database mytest'

  Replicate_Ignore_Server_Ids:

             Master_Server_Id: 1001

                  Master_UUID: 53d96192-53f7-11ea-9c3c-080027c5bc64

             Master_Info_File: mysql.slave_master_info

                    SQL_Delay: 0

          SQL_Remaining_Delay: NULL

      Slave_SQL_Running_State:

           Master_Retry_Count: 86400

                  Master_Bind:

      Last_IO_Error_Timestamp:

     Last_SQL_Error_Timestamp: 200306 11:47:42

               Master_SSL_Crl:

           Master_SSL_Crlpath:

           Retrieved_Gtid_Set: 53d96192-53f7-11ea-9c3c-080027c5bc64:9

            Executed_Gtid_Set: 53d96192-53f7-11ea-9c3c-080027c5bc64:1-8,

ce7d0c38-53f7-11ea-9f16-080027c5bc64:1-3

                Auto_Position: 1

         Replicate_Rewrite_DB:

                 Channel_Name:

           Master_TLS_Version:

       Master_public_key_path:

        Get_master_public_key: 0

            Network_Namespace:

1 row in set (0.00 sec)

As you can see, there is a problem with one of the schemas. Let’s assume we have to rebuild this node to bring it back into the replication. Here are the steps we have to perform.

First, we have to make sure xtrabackup is installed. In our case we use MySQL 8.0 therefore we have to use xtrabackup in version 8 to ensure compatibility:

root@master:~# apt install percona-xtrabackup-80

Reading package lists... Done

Building dependency tree

Reading state information... Done

percona-xtrabackup-80 is already the newest version (8.0.9-1.bionic).

0 upgraded, 0 newly installed, 0 to remove and 143 not upgraded.

Xtrabackup is provided by Percona repository and the guide to installing it can be found here:

https://www.percona.com/doc/percona-xtrabackup/8.0/installation/apt_repo.html

The tool has to be installed on both master and the slave that we want to rebuild.

As a next step we will remove all the data from the “broken” slave:

root@slave:~# service mysql stop

root@slave:~# rm -rf /var/lib/mysql/*

Next, we will take the backup on the master and stream it to the slave. Please keep in mind that this particular one-liner requires passwordless SSH root connectivity from the master to the slave; a quick way to set that up is sketched first, followed by the streaming command itself:
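If that SSH trust is not yet in place, something along these lines is usually enough (a sketch only; adjust the key type and paths to your security policy):

root@master:~# ssh-keygen -t rsa
root@master:~# ssh-copy-id root@10.0.0.142

With that done, the backup can be streamed: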

root@master:~# xtrabackup --backup --compress --stream=xbstream --target-dir=./ | ssh root@10.0.0.142 "xbstream -x --decompress -C /var/lib/mysql/"

At the end you should see an important line:

200306 12:10:40 completed OK!

This is an indicator that the backup completed OK. A couple of things may still go wrong, but at least we got the data. Next, on the slave, we have to prepare the backup.

root@slave:~# xtrabackup --prepare --target-dir=/var/lib/mysql/

.

.

.

200306 12:16:07 completed OK!

You should see, again, that the process completed OK. Normally you would now copy the prepared data back to the MySQL data directory, but we don’t have to do that, as we streamed the backup directly into /var/lib/mysql. What we do want to do, though, is ensure correct ownership of the files:

root@slave:~# chown -R mysql.mysql /var/lib/mysql

Now, let’s check the GTID coordinates of the backup. We will use them later when setting up the replication.

root@slave:~# cat /var/lib/mysql/xtrabackup_binlog_info

binlog.000007 195 53d96192-53f7-11ea-9c3c-080027c5bc64:1-9

Ok, all seems to be good, let’s start MySQL and proceed with configuring the replication:

root@slave:~# service mysql start

root@slave:~# mysql -ppass

mysql: [Warning] Using a password on the command line interface can be insecure.

Welcome to the MySQL monitor.  Commands end with ; or \g.

Your MySQL connection id is 8

Server version: 8.0.18-9 Percona Server (GPL), Release '9', Revision '53e606f'



Copyright (c) 2009-2019 Percona LLC and/or its affiliates

Copyright (c) 2000, 2019, Oracle and/or its affiliates. All rights reserved.



Oracle is a registered trademark of Oracle Corporation and/or its

affiliates. Other names may be trademarks of their respective

owners.



Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.



mysql>

Now we have to set gtid_purged to the GTID set that we found in the backup. Those are the GTIDs that have been “covered” by our backup; only newer GTIDs should be replicated from the master.

mysql> SET GLOBAL gtid_purged='53d96192-53f7-11ea-9c3c-080027c5bc64:1-9';

Query OK, 0 rows affected (0.00 sec)

Now we can start the replication:

mysql> CHANGE MASTER TO MASTER_HOST='10.0.0.141', MASTER_USER='rpl_user', MASTER_PASSWORD='yIPpgNE4KE', MASTER_AUTO_POSITION=1;

Query OK, 0 rows affected, 2 warnings (0.02 sec)



mysql> START SLAVE;

Query OK, 0 rows affected (0.00 sec)

mysql> SHOW SLAVE STATUS\G

*************************** 1. row ***************************

               Slave_IO_State: Waiting for master to send event

                  Master_Host: 10.0.0.141

                  Master_User: rpl_user

                  Master_Port: 3306

                Connect_Retry: 60

              Master_Log_File: binlog.000007

          Read_Master_Log_Pos: 380

               Relay_Log_File: relay-bin.000002

                Relay_Log_Pos: 548

        Relay_Master_Log_File: binlog.000007

             Slave_IO_Running: Yes

            Slave_SQL_Running: Yes

              Replicate_Do_DB:

          Replicate_Ignore_DB:

           Replicate_Do_Table:

       Replicate_Ignore_Table:

      Replicate_Wild_Do_Table:

  Replicate_Wild_Ignore_Table:

                   Last_Errno: 0

                   Last_Error:

                 Skip_Counter: 0

          Exec_Master_Log_Pos: 380

              Relay_Log_Space: 750

              Until_Condition: None

               Until_Log_File:

                Until_Log_Pos: 0

           Master_SSL_Allowed: No

           Master_SSL_CA_File:

           Master_SSL_CA_Path:

              Master_SSL_Cert:

            Master_SSL_Cipher:

               Master_SSL_Key:

        Seconds_Behind_Master: 0

Master_SSL_Verify_Server_Cert: No

                Last_IO_Errno: 0

                Last_IO_Error:

               Last_SQL_Errno: 0

               Last_SQL_Error:

  Replicate_Ignore_Server_Ids:

             Master_Server_Id: 1001

                  Master_UUID: 53d96192-53f7-11ea-9c3c-080027c5bc64

             Master_Info_File: mysql.slave_master_info

                    SQL_Delay: 0

          SQL_Remaining_Delay: NULL

      Slave_SQL_Running_State: Slave has read all relay log; waiting for more updates

           Master_Retry_Count: 86400

                  Master_Bind:

      Last_IO_Error_Timestamp:

     Last_SQL_Error_Timestamp:

               Master_SSL_Crl:

           Master_SSL_Crlpath:

           Retrieved_Gtid_Set: 53d96192-53f7-11ea-9c3c-080027c5bc64:10

            Executed_Gtid_Set: 53d96192-53f7-11ea-9c3c-080027c5bc64:1-10

                Auto_Position: 1

         Replicate_Rewrite_DB:

                 Channel_Name:

           Master_TLS_Version:

       Master_public_key_path:

        Get_master_public_key: 0

            Network_Namespace:

1 row in set (0.00 sec)

As you can see, our slave is replicating from its master.

How to Rebuild a MySQL Slave Using ClusterControl?

If you are a ClusterControl user, instead of going through this process you can rebuild the slave in just a couple of clicks. Initially we have a clear issue with the replication:

Our slave is not replicating properly due to an error.

All we have to do is to run the “Rebuild Replication Slave” job.

You will be presented with a dialog where you should pick a master node for the slave that you want to rebuild. Then, click on Proceed and you are all set. ClusterControl will rebuild the slave and set up the replication for you.

Shortly after, depending on the data set size, you should see a working slave:

As you can see, with just a couple of clicks ClusterControl accomplished the task of rebuilding the inconsistent replication slave.

by krzysztof at March 09, 2020 07:08 PM

March 06, 2020

SeveralNines

How to Easily Manage Database Updates and Security Patches

Database security requires careful planning, but it is important to remember that security is not a state, it is a process. Once the database is in place, monitoring, alerting and reporting on changes are an integral part of the ongoing management. Also, security efforts need to be aligned with business needs.

Database vendors regularly issue critical patch updates to address software bugs or known vulnerabilities, but for a variety of reasons, organizations are often unable to install them in a timely manner, if at all. Evidence suggests that companies are actually getting worse at patching databases, with an increased number violating compliance standards and governance policies. Patching that requires database downtime would be of extreme concern in a 24/7 environment, however, most cluster upgrades can be performed online. 

ClusterControl is able to perform a rolling upgrade of a distributed environment, upgrading and restarting one node at a time. The logical upgrade steps might slightly differ between the different cluster types. Load balancers would automatically blacklist unavailable nodes that are currently being upgraded, so that applications are not affected. 

Operational Reporting on Version Upgrades and Patches is an area that requires constant attention, especially with the proliferation of open source databases in many organizations and more database environments being distributed for high availability.

ClusterControl provides a solid operational reporting framework and can help answer simple questions like 

  • What versions of the software are running across the environment?
  • Which servers should be upgraded?
  • Which servers are missing critical updates?

Automatic Database Patching

ClusterControl provides the ability for automatic rolling upgrades for MySQL & MariaDB to ensure that your databases always use the latest patches and fixes. 

Upgrades are online and are performed on one node at a time. The node will be stopped, then software will be updated, and then the node will be started again. If a node fails to upgrade, the upgrade process is aborted. 

Rolling MySQL Database Upgrades

ClusterControl provides the ability for automatic rolling upgrades for MySQL-based database clusters by automatically applying the upgrade one node at a time which results in zero downtime.

After successfully installing the selected version you must perform a rolling restart - the nodes restart one by one.

ClusterControl supports you in that step making sure nodes are responding properly during the node restart.

Database Upgrade Assistance

ClusterControl makes it easy to upgrade your MongoDB and PostgreSQL databases by letting you, with a simple click, promote a slave or replica so that you can upgrade the master, and vice versa.

Database Package Summary Operational Report

ClusterControl provides the Package Summary Operational Report that shows you how many technology and security patches are available to upgrade.

You can generate it ad-hoc and view it in the UI, send it via email, or schedule such a report to be delivered to you, for example once per week.

As you can see, the Upgrade Report contains information about the different hosts in the cluster, which database has been installed on them, and in which version. It also contains information about how many other installed packages are not up to date. You can see the total number, how many are related to database services, how many provide security updates, and how many fall into the remaining categories.

The Upgrade Report lists all of the not-up-to-date packages on a per-host basis. In the screenshot above you can see that the node 10.0.3.10 has two MongoDB util packages not up to date (those are the 2 DB packages mentioned in the summary). Then there is a list of security packages and all other packages which are not up to date.

Conclusion

ClusterControl goes the extra mile to make sure you are covered regarding security (and other) updates. As you have seen, it is very easy to know if your systems are up to date. ClusterControl can also assist in performing the upgrade of the database nodes.

by Bart Oles at March 06, 2020 04:33 PM

March 05, 2020

SeveralNines

Setting Up a Geo-Distributed Database Cluster Using MySQL Replication

A single point of failure (SPOF)  is a common reason why organizations are working towards distributing the presence of their database environments to another location geographically. It's part of the Disaster Recovery and Business Continuity strategic plans. 

Disaster Recovery (DR) planning embodies technical procedures which cover the preparation for unanticipated issues such as natural disasters, accidents (such as human error), or incidents (such as criminal acts). 

For the past decade, distributing your database environment across multiple geographical locations has been a pretty common setup, as public clouds offer a lot of ways to deal with this. The challenge comes in setting up and operating those database environments: managing the database(s), moving your data to another geo-location, or applying security with a high level of observability.

In this blog, we'll showcase how you can do this using MySQL Replication. We'll cover how you can copy your data to another database node located in a different country, far from the current geography of the MySQL cluster. For this example, our target region is us-east, while the on-prem environment is in Asia, located in the Philippines.

Why Do I Need A Geo-Location Database Cluster?

Even Amazon AWS, the top public cloud provider, has suffered from downtime and unintended outages (like the one that happened in 2017). Let's say you are using AWS as your secondary datacenter aside from your on-prem. You cannot have any internal access to its underlying hardware or to the internal networks that manage your compute nodes. These are fully managed services which you pay for, but you cannot avoid the fact that they can suffer from an outage at any time. If such a geographic location suffers an outage, then you can face a long downtime. 

This type of problem must be foreseen during your business continuity planning, analyzed, and addressed based on what has been defined. Business continuity for your MySQL databases should include high uptime. Some environments run benchmarks and set a high bar of rigorous tests, including probing the weak spots, in order to expose any vulnerability and determine how resilient and how scalable your technology architecture is, including your database infrastructure. For businesses, especially those handling a high volume of transactions, it is imperative to ensure that production databases are available to the applications all the time, even when catastrophe occurs. Otherwise, downtime can be experienced and it might cost you a large amount of money.

With these scenarios identified, organizations start extending their infrastructure to different cloud providers and putting nodes in different geo-locations to achieve higher uptime (if possible at 99.99999999999%), a lower RPO, and no SPOF.

To ensure production databases survive a disaster, a Disaster Recovery (DR) site must be configured. Production and DR sites must be part of two geographically distant datacenters. This means a standby database must be configured at the DR site for every production database, so that the data changes occurring on the production database are immediately synced to the standby database via transaction logs. Some setups also use their DR nodes to handle reads so as to provide load balancing between the application and the data layer.

The Desired Architectural Setup

In this blog, the desired setup is simple, yet a very common implementation nowadays. See below the desired architectural setup for this blog:

For this blog, I chose Google Cloud Platform (GCP) as the public cloud provider and used my local network as the on-prem database environment.

When using this type of design, you always need both environments or platforms to communicate in a very secure manner, using a VPN or an alternative such as AWS Direct Connect. Although public clouds nowadays offer managed VPN services you can use, for this setup we'll be using OpenVPN, since I don't need sophisticated hardware or services for this blog.
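As a rough illustration only (the device name, tunnel addresses, and key path below are placeholders, not the actual setup used in this blog), a minimal OpenVPN point-to-point tunnel with a static key can be brought up like this:

## Generate a shared static key and copy it securely to both ends
$ openvpn --genkey --secret /etc/openvpn/static.key

## On the on-prem side (acting as the listening end)
$ openvpn --dev tun0 --ifconfig 10.8.0.1 10.8.0.2 --secret /etc/openvpn/static.key

## On the GCP side, pointing to the on-prem public IP
$ openvpn --remote <on-prem-public-ip> --dev tun0 --ifconfig 10.8.0.2 10.8.0.1 --secret /etc/openvpn/static.key

A production setup would normally use certificates and a managed VPN or dedicated interconnect instead, but the sketch shows the idea of encapsulating the replication traffic.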

Best and Most Efficient Way

For MySQL/Percona/MariaDB database environments, the best and most efficient way is to take a backup copy of your database and send it to the target node to be deployed or instantiated. There are different ways to use this approach: you can use mysqldump, mydumper, rsync, or Percona XtraBackup/Mariabackup and stream the data to your target node.

Using mysqldump

mysqldump creates a logical backup of your whole database, or you can selectively choose a list of databases, tables, or even specific records that you want to dump. 

A simple command that you can use to take a full backup can be,

$ mysqldump --single-transaction --all-databases --triggers --routines --events --master-data | mysql -h <target-host-db-node> -u<user> -p<password> -vvv --show-warnings

This simple command directly runs the MySQL statements on the target database node, for example your target database node on a Google Compute Engine instance. This can be efficient when the data is small or you have fast bandwidth. Otherwise, packing your database into a file and then sending it to the target node can be your option.

$ mysqldump --single-transaction --all-databases --triggers --routines --events --master-data | gzip > mydata.db

$ scp mydata.db <target-host>:/some/path

Then load the dump into the target database node as such,

zcat mydata.db | mysql

The downside of a logical backup using mysqldump is that it's slower and consumes disk space. It also uses a single thread, so you cannot run it in parallel. Optionally, you can use mydumper, especially when your data is huge. mydumper can run in parallel, but it's not as flexible as mysqldump.
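As a hedged illustration (host names, credentials, thread counts, and the dump directory below are placeholders), a parallel dump and restore with mydumper/myloader could look like this:

## Dump with 4 threads, compressed, into a directory on the source
$ mydumper --host=<source-host> --user=<user> --password=<password> --threads=4 --compress --outputdir=/backup/dump

## Copy the directory to the target node, then load it in parallel there
$ myloader --host=<target-host> --user=<user> --password=<password> --threads=4 --overwrite-tables --directory=/backup/dump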

Using xtrabackup

xtrabackup is a physical backup tool whose stream or binary output you can send to the target node. This is very efficient and is mostly used when streaming a backup over the network, especially when the target node is in a different geography or region. ClusterControl uses xtrabackup when provisioning or instantiating a new slave regardless of where it is located, as long as access and permissions have been set up prior to the action.

If you are running xtrabackup manually, you can use the commands below,

## Target node

$  socat -u tcp-listen:9999,reuseaddr stdout 2>/tmp/netcat.log | xbstream -x -C /var/lib/mysql

## Source node

$ innobackupex --defaults-file=/etc/my.cnf --stream=xbstream --socket=/var/lib/mysql/mysql.sock  --host=localhost --tmpdir=/tmp /tmp | socat -u stdio TCP:192.168.10.70:9999

To elaborate on those two commands: the first command has to be executed first, on the target node. It listens on port 9999 and writes any stream received on that port into /var/lib/mysql on the target node. It depends on the socat and xbstream commands, which means you must ensure these packages are installed.

On the source node, the second command executes the innobackupex perl script, which invokes xtrabackup in the background and uses xbstream to stream the data that will be sent over the network. The socat command opens port 9999 and sends its data to the desired host, which is 192.168.10.70 in this example. Again, ensure that you have socat and xbstream installed when using this command. An alternative to socat is nc, but socat offers more advanced features, such as allowing multiple clients to listen on a port. 

ClusterControl uses this command when rebuilding a slave or building a new slave. It is fast and guarantees that an exact copy of your source data will be copied to your target node. When provisioning a new database in a separate geo-location, this approach offers more efficiency and speed to finish the job. There can be pros and cons to using a logical or binary backup streamed over the wire, but this method is a very common approach when setting up a new geo-location database cluster in a different region and creating an exact copy of your database environment.

Efficiency, Observability, and Speed

Questions left by most people who are not familiar with this approach usually cover the "HOW, WHAT, WHERE" problems. In this section, we'll cover how you can efficiently set up your geo-location database with less work, and with enough observability to understand why it fails when it does. Using ClusterControl is very efficient. In my current setup, the following environment was initially implemented:

Extending Node to GCP

To start setting up your geo-location database cluster, extend your cluster and create a snapshot copy of it by adding a new slave. As mentioned earlier, ClusterControl will use xtrabackup (mariabackup for MariaDB 10.2 onwards) and deploy a new node within your cluster. Before you can register your GCP compute nodes as your target nodes, you first need to set up the appropriate system user, the same as the system user you registered in ClusterControl. You can verify this in your /etc/cmon.d/cmon_X.cnf, where X is the cluster_id. For example, see below:

# grep 'ssh_user' /etc/cmon.d/cmon_27.cnf 

ssh_user=maximus

maximus (in this example) must be present on your GCP compute nodes. The user on your GCP nodes must have sudo or super admin privileges. It must also be set up with password-less SSH access. Please read our documentation for more about the system user and its required privileges.
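If you still need to prepare that access, a hedged sketch would be the following (the user name maximus and the node IP are taken from this example; adjust to your own environment):

## On the ClusterControl host, generate a key if none exists and copy it to the GCP node
$ ssh-keygen -t rsa
$ ssh-copy-id maximus@10.142.0.12

## On the GCP compute node, grant maximus password-less sudo
$ echo "maximus ALL=(ALL) NOPASSWD: ALL" | sudo tee /etc/sudoers.d/maximus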

Let's have an example list of servers below (from GCP console: Compute Engine dashboard):

In the screenshot above, our target region is the us-east region. As noted earlier, my local network is set up over a secure layer going through GCP (and vice versa) using OpenVPN, so communication from GCP to my local network is also encapsulated over the VPN tunnel.

Add a Slave Node To GCP

The screenshots below reveal how you can do this:

As seen in the second screenshot, we're targeting node 10.142.0.12 and its source master is 192.168.70.10. ClusterControl is smart enough to determine the firewalls, security modules, packages, configuration, and setup that need to be done. See below an example of the job activity log:

Quite a simple task, isn't it?

Complete The GCP MySQL Cluster

We need to add two more nodes to the GCP cluster to have a balanced topology, as we had in the local network. For the second and third nodes, ensure that the master points to your GCP node. In this example, the master is 10.142.0.12. See below how to do this,

As seen in the screenshot above, I selected 10.142.0.12 (slave), which is the first node we added into the cluster. The complete result is shown as follows,

Your Final Setup of Geo-Location Database Cluster

From the last screenshot, this kind of topology might not be your ideal setup. Mostly, it has to be a multi-master setup, where your DR cluster serves as the standby cluster, whereas your on-prem serves as the primary, active cluster. To do this, it's quite simple in ClusterControl. See the following screenshots to achieve this goal.

You can just drag your current master onto the target master that has to be set up as a primary-standby writer, just in case your on-prem site comes to harm. In this example, we drag onto target host 10.142.0.12 (GCP compute node). The end result is shown below:

This achieves the desired result. It is easy and very quick to spawn your geo-location database cluster using MySQL Replication.

Conclusion

Having a geo-location database cluster is not new. It has been a desired setup for companies and organizations avoiding SPOF that want resilience and a lower RPO. 

The main takeaways of this setup are security, redundancy, and resilience. It also shows how feasible and efficient it is to deploy your new cluster in a different geographic region. While ClusterControl can offer this, expect more improvements soon, where you will be able to efficiently create a new cluster from a backup and spawn it in ClusterControl, so stay tuned.

by Paul Namuag at March 05, 2020 09:02 PM

March 04, 2020

SeveralNines

Using MariaDB Flashback on a MySQL Server

MariaDB has introduced a very cool feature called Flashback. Flashback is a feature that will allow instances, databases or tables to be rolled back to an old snapshot. Traditionally, to perform a point-in-time recovery (PITR), one would restore a database from a backup, and replay the binary logs to roll forward the database state at a certain time or position. 

With Flashback, the database can be rolled back to a point of time in the past, which is way faster if we just want to see a recent past state. Using flashback can be inefficient, however, if you want to see a very old snapshot of your data relative to the current date and time; restoring from a delayed slave, or from a backup plus replaying the binary log, might be the better options. 

This feature is only available in the MariaDB client package, but that doesn't mean we can not use it with our MySQL servers. This blog post showcases how we can use this amazing feature on a MySQL server.

MariaDB Flashback Requirements

For those who want to use the MariaDB flashback feature on top of MySQL, we basically need to do the following:

  1. Enable the binary log with the following settings (a minimal my.cnf sketch is shown after this list):
    1. binlog_format = ROW (default since MySQL 5.7.7).
    2. binlog_row_image = FULL (default since MySQL 5.6).
  2. Use the mysqlbinlog utility from any MariaDB 10.2.4 or later installation.
  3. Flashback is currently supported only over DML statements (INSERT, DELETE, UPDATE). An upcoming version of MariaDB will add support for flashback over DDL statements (DROP, TRUNCATE, ALTER, etc.) by copying or moving the current table to a reserved and hidden database, and then copying or moving back when using flashback.
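For reference, a minimal my.cnf sketch covering the binary log requirements above might look like this (the server_id and log file base name are just examples):

[mysqld]
server_id        = 1
log_bin          = binlog
binlog_format    = ROW
binlog_row_image = FULL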

The flashback is achieved by taking advantage of the existing support for full-image-format binary logs, thus it supports all storage engines. Note that the flashback events will be stored in memory, therefore you should make sure your server has enough memory for this feature.

How Does MariaDB Flashback Work?

MariaDB's mysqlbinlog utility comes with two extra options for this purpose:

  • -B, --flashback - Flashback feature can rollback your committed data to a special time point.
  • -T, --table=[name] - List entries for just this table (local log only).

By comparing the mysqlbinlog output with and without the --flashback flag, we can easily understand how it works. Consider the following statement executed on a MariaDB server:

MariaDB> DELETE FROM sbtest.sbtest1 WHERE id = 1;

Without flashback flag, we will see the actual DELETE binlog event:

$ mysqlbinlog -vv \
--start-datetime="$(date '+%F %T' -d 'now - 10 minutes')" \
--database=sbtest \
--table=sbtest1 \
/var/lib/mysql/binlog.000003

...
# at 453196541
#200227 12:58:18 server id 37001  end_log_pos 453196766 CRC32 0xdaa248ed Delete_rows: table id 238 flags: STMT_END_F

BINLOG '
6rxXXhOJkAAAQwAAAP06AxsAAO4AAAAAAAEABnNidGVzdAAHc2J0ZXN0MQAEAwP+/gTu4P7wAAEB
AAID/P8AFuAQfA==
6rxXXiCJkAAA4QAAAN47AxsAAO4AAAAAAAEAAgAE/wABAAAAVJ4HAHcAODM4Njg2NDE5MTItMjg3
NzM5NzI4MzctNjA3MzYxMjA0ODYtNzUxNjI2NTk5MDYtMjc1NjM1MjY0OTQtMjAzODE4ODc0MDQt
NDE1NzY0MjIyNDEtOTM0MjY3OTM5NjQtNTY0MDUwNjUxMDItMzM1MTg0MzIzMzA7Njc4NDc5Njcz
NzctNDgwMDA5NjMzMjItNjI2MDQ3ODUzMDEtOTE0MTU0OTE4OTgtOTY5MjY1MjAyOTHtSKLa
'/*!*/;

### DELETE FROM `sbtest`.`sbtest1`
### WHERE
###   @1=1 /* INT meta=0 nullable=0 is_null=0 */
###   @2=499284 /* INT meta=0 nullable=0 is_null=0 */
###   @3='83868641912-28773972837-60736120486-75162659906-27563526494-20381887404-41576422241-93426793964-56405065102-33518432330' /* STRING(480) meta=61152 nullable=0 is_null=0 */
###   @4='67847967377-48000963322-62604785301-91415491898-96926520291' /* STRING(240) meta=65264 nullable=0 is_null=0 */
...

By extending the above mysqlbinlog command with --flashback, we can see that the DELETE event is converted to an INSERT event, with the respective WHERE and SET clauses swapped accordingly:

$ mysqlbinlog -vv \
--start-datetime="$(date '+%F %T' -d 'now - 10 minutes')" \
--database=sbtest \
--table=sbtest1 \
/var/lib/mysql/binlog.000003 \
--flashback

...
BINLOG '
6rxXXhOJkAAAQwAAAP06AxsAAO4AAAAAAAEABnNidGVzdAAHc2J0ZXN0MQAEAwP+/gTu4P7wAAEB
AAID/P8AFuAQfA==
6rxXXh6JkAAA4QAAAN47AxsAAO4AAAAAAAEAAgAE/wABAAAAVJ4HAHcAODM4Njg2NDE5MTItMjg3
NzM5NzI4MzctNjA3MzYxMjA0ODYtNzUxNjI2NTk5MDYtMjc1NjM1MjY0OTQtMjAzODE4ODc0MDQt
NDE1NzY0MjIyNDEtOTM0MjY3OTM5NjQtNTY0MDUwNjUxMDItMzM1MTg0MzIzMzA7Njc4NDc5Njcz
NzctNDgwMDA5NjMzMjItNjI2MDQ3ODUzMDEtOTE0MTU0OTE4OTgtOTY5MjY1MjAyOTHtSKLa
'/*!*/;

### INSERT INTO `sbtest`.`sbtest1`
### SET
###   @1=1 /* INT meta=0 nullable=0 is_null=0 */
###   @2=499284 /* INT meta=0 nullable=0 is_null=0 */
###   @3='83868641912-28773972837-60736120486-75162659906-27563526494-20381887404-41576422241-93426793964-56405065102-33518432330' /* STRING(480) meta=61152 nullable=0 is_null=0 */
###   @4='67847967377-48000963322-62604785301-91415491898-96926520291' /* STRING(240) meta=65264 nullable=0 is_null=0 */
...

In row-based replication (binlog_format=ROW), each row change event contains two images, a “before” image (except INSERT) whose columns are matched against when searching for the row to be updated, and an “after” image (except DELETE) containing the changes. With binlog_row_image=FULL, MariaDB logs full rows (that is, all columns) for both the before and after images.

The following example shows the binary log events for an UPDATE. Consider the following statement executed on a MariaDB server:

MariaDB> UPDATE sbtest.sbtest1 SET k = 0 WHERE id = 5;

When looking at the binlog event for the above statement, we will see something like this:

$ mysqlbinlog -vv \
--start-datetime="$(date '+%F %T' -d 'now - 5 minutes')" \
--database=sbtest \
--table=sbtest1 \
/var/lib/mysql/binlog.000001 

...
### UPDATE `sbtest`.`sbtest1`
### WHERE
###   @1=5 /* INT meta=0 nullable=0 is_null=0 */
###   @2=499813 /* INT meta=0 nullable=0 is_null=0 */
###   @3='44257470806-17967007152-32809666989-26174672567-29883439075-95767161284-94957565003-35708767253-53935174705-16168070783' /* STRING(480) meta=61152 nullable=0 is_null=0 */
###   @4='34551750492-67990399350-81179284955-79299808058-21257255869' /* STRING(240) meta=65264 nullable=0 is_null=0 */
### SET
###   @1=5 /* INT meta=0 nullable=0 is_null=0 */
###   @2=0 /* INT meta=0 nullable=0 is_null=0 */
###   @3='44257470806-17967007152-32809666989-26174672567-29883439075-95767161284-94957565003-35708767253-53935174705-16168070783' /* STRING(480) meta=61152 nullable=0 is_null=0 */
###   @4='34551750492-67990399350-81179284955-79299808058-21257255869' /* STRING(240) meta=65264 nullable=0 is_null=0 */
# Number of rows: 1
...

With the --flashback flag, the "before" image is swapped with the "after" image of the existing row:

$ mysqlbinlog -vv \
--start-datetime="$(date '+%F %T' -d 'now - 5 minutes')" \
--database=sbtest \
--table=sbtest1 \
/var/lib/mysql/binlog.000001 \
 --flashback

...
### UPDATE `sbtest`.`sbtest1`
### WHERE
###   @1=5 /* INT meta=0 nullable=0 is_null=0 */
###   @2=0 /* INT meta=0 nullable=0 is_null=0 */
###   @3='44257470806-17967007152-32809666989-26174672567-29883439075-95767161284-94957565003-35708767253-53935174705-16168070783' /* STRING(480) meta=61152 nullable=0 is_null=0 */
###   @4='34551750492-67990399350-81179284955-79299808058-21257255869' /* STRING(240) meta=65264 nullable=0 is_null=0 */
### SET
###   @1=5 /* INT meta=0 nullable=0 is_null=0 */
###   @2=499813 /* INT meta=0 nullable=0 is_null=0 */
###   @3='44257470806-17967007152-32809666989-26174672567-29883439075-95767161284-94957565003-35708767253-53935174705-16168070783' /* STRING(480) meta=61152 nullable=0 is_null=0 */
###   @4='34551750492-67990399350-81179284955-79299808058-21257255869' /* STRING(240) meta=65264 nullable=0 is_null=0 */
...

We can then redirect the flashback output to the MySQL client, thus rolling back the database or table to the point of time that we want. More examples are shown in the next sections.

MariaDB has a dedicated knowledge base page for this feature; check out the MariaDB Flashback knowledge base page.

MariaDB Flashback With MySQL

To have the flashback ability for MySQL, one has to do the following:

  • Copy the mysqlbinlog utility from any MariaDB server (10.2.4 or later).
  • Disable MySQL GTID before applying the flashback SQL file. The global variables gtid_mode and enforce_gtid_consistency can be set at runtime since MySQL 5.7.5.

Suppose we are having the following simple MySQL 8.0 replication topology:

In this example, we copied the mysqlbinlog utility from the latest MariaDB 10.4 to one of our MySQL 8.0 slaves (slave2):

(mariadb-server)$ scp /bin/mysqlbinlog root@slave2-mysql:/root/
(slave2-mysql8)$ ls -l /root/mysqlbinlog
-rwxr-xr-x. 1 root root 4259504 Feb 27 13:44 /root/mysqlbinlog

Our MariaDB's mysqlbinlog utility is now located at /root/mysqlbinlog on slave2. On the MySQL master, we executed the following disastrous statement:

mysql> DELETE FROM sbtest1 WHERE id BETWEEN 5 AND 100;
Query OK, 96 rows affected (0.01 sec)

96 rows were deleted by the above statement. Wait a couple of seconds to let the events replicate from the master to all slaves before we try to find the binlog position of the disastrous event on the slave server. The first step is to retrieve all the binary logs on that server:

mysql> SHOW BINARY LOGS;
+---------------+-----------+-----------+
| Log_name      | File_size | Encrypted |
+---------------+-----------+-----------+
| binlog.000001 |       850 |        No |
| binlog.000002 |     18796 |        No |
+---------------+-----------+-----------+

Our disastrous event should exist inside binlog.000002, the latest binary log on this server. We can then use MariaDB's mysqlbinlog utility to retrieve all binlog events for table sbtest1 since 10 minutes ago:

(slave2-mysql8)$ /root/mysqlbinlog -vv \
--start-datetime="$(date '+%F %T' -d 'now - 10 minutes')" \
--database=sbtest \
--table=sbtest1 \
/var/lib/mysql/binlog.000002

...
# at 195
#200228 15:09:45 server id 37001  end_log_pos 281 CRC32 0x99547474 Ignorable
# Ignorable event type 33 (MySQL Gtid)
# at 281
#200228 15:09:45 server id 37001  end_log_pos 353 CRC32 0x8b12bd3c Query thread_id=19 exec_time=0 error_code=0
SET TIMESTAMP=1582902585/*!*/;
SET @@session.pseudo_thread_id=19/*!*/;
SET @@session.foreign_key_checks=1, @@session.sql_auto_is_null=0, @@session.unique_checks=1, @@session.autocommit=1, @@session.check_constraint_checks=1/*!*/;
SET @@session.sql_mode=524288/*!*/;
SET @@session.auto_increment_increment=1, @@session.auto_increment_offset=1/*!*/;
SET @@session.character_set_client=255,@@session.collation_connection=255,@@session.collation_server=255/*!*/;
SET @@session.lc_time_names=0/*!*/;
SET @@session.collation_database=DEFAULT/*!*/;

BEGIN
/*!*/;
# at 353
#200228 15:09:45 server id 37001  end_log_pos 420 CRC32 0xe0e44a1b Table_map: `sbtest`.`sbtest1` mapped to number 92

# at 420
# at 8625
# at 16830
#200228 15:09:45 server id 37001  end_log_pos 8625 CRC32 0x99b1a8fc Delete_rows: table id 92
#200228 15:09:45 server id 37001  end_log_pos 16830 CRC32 0x89496a07 Delete_rows: table id 92
#200228 15:09:45 server id 37001  end_log_pos 18765 CRC32 0x302413b2 Delete_rows: table id 92 flags: STMT_END_F

To easily look up the binlog position number, pay attention to the lines that start with "# at ". From the above lines, we can see the DELETE event happened at position 281 inside binlog.000002 (it starts at "# at 281"). We can also retrieve the binlog events directly inside a MySQL server:

mysql> SHOW BINLOG EVENTS IN 'binlog.000002';
+---------------+-------+----------------+-----------+-------------+-------------------------------------------------------------------+
| Log_name      | Pos   | Event_type     | Server_id | End_log_pos | Info                                                              |
+---------------+-------+----------------+-----------+-------------+-------------------------------------------------------------------+
| binlog.000002 |     4 | Format_desc    |     37003 | 124         | Server ver: 8.0.19, Binlog ver: 4                                 |
| binlog.000002 |   124 | Previous_gtids |     37003 | 195         | 0d98d975-59f8-11ea-bd30-525400261060:1                            |
| binlog.000002 |   195 | Gtid           |     37001 | 281         | SET @@SESSION.GTID_NEXT= '0d98d975-59f8-11ea-bd30-525400261060:2' |
| binlog.000002 |   281 | Query          |     37001 | 353         | BEGIN                                                             |
| binlog.000002 |   353 | Table_map      |     37001 | 420         | table_id: 92 (sbtest.sbtest1)                                     |
| binlog.000002 |   420 | Delete_rows    |     37001 | 8625        | table_id: 92                                                      |
| binlog.000002 |  8625 | Delete_rows    |     37001 | 16830       | table_id: 92                                                      |
| binlog.000002 | 16830 | Delete_rows    |     37001 | 18765       | table_id: 92 flags: STMT_END_F                                    |
| binlog.000002 | 18765 | Xid            |     37001 | 18796       | COMMIT /* xid=171006 */                                           |
+---------------+-------+----------------+-----------+-------------+-------------------------------------------------------------------+

9 rows in set (0.00 sec)

We can now confirm that position 281 is where we want our data to revert to. We can then use the --start-position flag to generate accurate flashback events. Notice that we omit the "-vv" flag and add the --flashback flag:

(slave2-mysql8)$ /root/mysqlbinlog \
--start-position=281 \
--database=sbtest \
--table=sbtest1 \
/var/lib/mysql/binlog.000002 \
--flashback > /root/flashback.binlog

The flashback.binlog file contains all the required events to undo all changes that happened on table sbtest1 on this MySQL server. Since this is a slave node of a replication cluster, we have to break the replication on the chosen slave (slave2) in order to use it for flashback purposes. To do this, we have to stop the replication on the chosen slave, set MySQL GTID to ON_PERMISSIVE and make the slave writable:

mysql> STOP SLAVE; 
SET GLOBAL gtid_mode = ON_PERMISSIVE; 
SET GLOBAL enforce_gtid_consistency = OFF; 
SET GLOBAL read_only = OFF;

At this point, slave2 is not part of the replication and our topology is looking like this:

Import the flashback via the mysql client; we do not want this change to be recorded in the MySQL binary log:

(slave2-mysql8)$ mysql -uroot -p --init-command='SET sql_log_bin=0' sbtest < /root/flashback.binlog

We can then see that all the deleted rows are back, as proven by the following statement:

mysql> SELECT COUNT(id) FROM sbtest1 WHERE id BETWEEN 5 and 100;
+-----------+
| COUNT(id) |
+-----------+
|        96 |
+-----------+
1 row in set (0.00 sec)

We can then create an SQL dump file for table sbtest1 for our reference:

(slave2-mysql8)$ mysqldump -uroot -p --single-transaction sbtest sbtest1 > sbtest1_flashbacked.sql

Once the flashback operation completes, we can rejoin the slave node to the replication chain. But first, we have to bring the database back into a consistent state by replaying all events starting from the position we flashbacked. Don't forget to skip binary logging, as we do not want to "write" onto the slave and risk errant transactions:

(slave2-mysql8)$ /root/mysqlbinlog \
--start-position=281 \
--database=sbtest \
--table=sbtest1 \
/var/lib/mysql/binlog.000002 | mysql -uroot -p --init-command='SET sql_log_bin=0' sbtest

Finally, prepare the node back to its role as MySQL slave and start the replication:

mysql> SET GLOBAL read_only = ON;
SET GLOBAL enforce_gtid_consistency = ON; 
SET GLOBAL gtid_mode = ON; 
START SLAVE; 

Verify that the slave node is replicating correctly:

mysql> SHOW SLAVE STATUS\G
...
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
...

At this point, we have re-joined the slave back into the replication chain and our topology is now back to its original state:

Shout out to the MariaDB team for introducing this astounding feature!

by ashraf at March 04, 2020 07:23 PM

March 03, 2020

SeveralNines

How to Enable TimescaleDB on an Existing PostgreSQL Database

If you have a PostgreSQL cluster up-and-running, and you need to handle data that changes with time (like metrics collected from a system) you should consider using a time-series database that is designed to store this kind of data.

TimescaleDB is an open-source time-series database optimized for fast ingest and complex queries that supports full SQL. It is based on PostgreSQL and it offers the best of NoSQL and Relational worlds for Time-series data. 

In this blog, we will see how to manually enable TimescaleDB in an existing PostgreSQL database and how to do the same task using ClusterControl.

Enabling TimescaleDB Manually

For this blog, we will use CentOS 7 as the operating system and PostgreSQL 11 as the database server.

By default, you don’t have TimescaleDB enabled for PostgreSQL:

world=# \dx

                 List of installed extensions

  Name   | Version |   Schema |     Description

---------+---------+------------+------------------------------

 plpgsql | 1.0     | pg_catalog | PL/pgSQL procedural language

(1 row)

So first, you need to add the corresponding repository to install the software:

$ cat /etc/yum.repos.d/timescale_timescaledb.repo

[timescale_timescaledb]

name=timescale_timescaledb

baseurl=https://packagecloud.io/timescale/timescaledb/el/7/\$basearch

repo_gpgcheck=1

gpgcheck=0

enabled=1

gpgkey=https://packagecloud.io/timescale/timescaledb/gpgkey

sslverify=1

sslcacert=/etc/pki/tls/certs/ca-bundle.crt

metadata_expire=300

We will assume you have the PostgreSQL repository in place as this TimescaleDB installation will require dependencies from there.

Next step is to install the package:

$ yum install timescaledb-postgresql-11

And configure it in your current PostgreSQL database. For this, edit your postgresql.conf file and add 'timescaledb' in the shared_preload_libraries parameter:

shared_preload_libraries = 'timescaledb'

Or if you already have something added there:

shared_preload_libraries = 'pg_stat_statements,timescaledb'

You can also set timescaledb.max_background_workers to specify the maximum number of background workers TimescaleDB may use.

timescaledb.max_background_workers=4

Keep in mind that this change requires a database service restart:

$ service postgresql-11 restart

And then, you will have your TimescaleDB installed:

postgres=# SELECT * FROM pg_available_extensions WHERE name='timescaledb';

    name     | default_version | installed_version |                              comment
-------------+-----------------+-------------------+--------------------------------------------------------------------
 timescaledb | 1.6.0           |                   | Enables scalable inserts and complex queries for time-series data
(1 row)

So now, you need to enable it:

$ psql world

world=# CREATE EXTENSION IF NOT EXISTS timescaledb CASCADE;

WARNING:

WELCOME TO

 _____ _                               _ ____________

|_   _(_)                             | | | _ \ ___ \

  | |  _ _ __ ___   ___ ___ ___ __ _| | ___| | | | |_/ /

  | | | |  _ ` _ \ / _ \/ __|/ __/ _` | |/ _ \ | | | ___ \

  | | | | | | | | |  __/\__ \ (_| (_| | |  __/ |/ /| |_/ /

  |_| |_|_| |_| |_|\___||___/\___\__,_|_|\___|___/ \____/

               Running version 1.6.0

For more information on TimescaleDB, please visit the following links:



 1. Getting started: https://docs.timescale.com/getting-started

 2. API reference documentation: https://docs.timescale.com/api

 3. How TimescaleDB is designed: https://docs.timescale.com/introduction/architecture



Note: TimescaleDB collects anonymous reports to better understand and assist our users.

For more information and how to disable, please see our docs https://docs.timescaledb.com/using-timescaledb/telemetry.



CREATE EXTENSION

Done.

world=# \dx

                                      List of installed extensions
    Name     | Version |   Schema   |                            Description
-------------+---------+------------+--------------------------------------------------------------------
 plpgsql     | 1.0     | pg_catalog | PL/pgSQL procedural language
 timescaledb | 1.6.0   | public     | Enables scalable inserts and complex queries for time-series data
(2 rows)

Now, let’s see how to enable it using ClusterControl.

Using ClusterControl to Enable TimescaleDB

We will assume you have your PostgreSQL cluster imported into ClusterControl, or even deployed using it.

To enable TimescaleDB using ClusterControl, you just need to go to your PostgreSQL Cluster Actions and press on the “Enable TimescaleDB” option.

You will receive a warning about the database restart. Confirm it.

You can monitor the task in the ClusterControl Activity section.

Then you will have your TimescaleDB ready to use.

Conclusion

Now that you have TimescaleDB up and running, you can handle your time-series data in a more performant way. For this, you can create new tables or even migrate your current data, and of course, you should know how to use it to take advantage of this new concept.
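For example, turning a regular table into a TimescaleDB hypertable takes only one extra call. In this hedged sketch, the conditions table and its columns are made up for illustration:

world=# CREATE TABLE conditions (
  time        TIMESTAMPTZ       NOT NULL,
  location    TEXT              NOT NULL,
  temperature DOUBLE PRECISION
);
world=# SELECT create_hypertable('conditions', 'time');

After that, inserts and queries against the table work as usual, with TimescaleDB partitioning the data into chunks by time behind the scenes.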

by Sebastian Insausti at March 03, 2020 07:37 PM

March 02, 2020

SeveralNines

How to Rebuild an Inconsistent PostgreSQL Slave

PostgreSQL Streaming Replication is a great way of scaling PostgreSQL clusters, and doing so adds high availability to them. As with every replication, the idea is that the slave is a copy of the master, and that the slave is constantly updated with the changes that happened on the master using some sort of replication mechanism. 

It may happen that the slave, for some reason, gets out of sync with the master. How can I bring it back to the replication chain? How can I ensure that the slave is again in-sync with the master? Let’s take a look in this short blog post.

What is very helpful is that there is no way to write to a slave while it is in recovery mode. You can test it like this:

postgres=# SELECT pg_is_in_recovery();

 pg_is_in_recovery

-------------------

 t

(1 row)

postgres=# CREATE DATABASE mydb;

ERROR:  cannot execute CREATE DATABASE in a read-only transaction

It still may happen that the slave goes out of sync with the master. Data corruption, for example - neither hardware nor software is without bugs and issues. Some problems with the disk drive may trigger data corruption on the slave, or some problems with the "vacuum" process may result in data being altered. How do we recover from that state?

Rebuilding the Slave Using pg_basebackup

The main step is to provision the slave using the data from the master. Given that we’ll be using streaming replication, we cannot use logical backup. Luckily there’s a ready tool that can be used to set things up: pg_basebackup. Let’s see what would be the steps we need to take to provision a slave server. To make it clear, we are using PostgreSQL 12 for the purpose of this blog post.

The initial state is simple. Our slave is not replicating from its master. The data it contains is corrupted and can't be used or trusted. Therefore, the first step we'll take is to stop PostgreSQL on our slave and remove the data it contains:

root@vagrant:~# systemctl stop postgresql

Or even:

root@vagrant:~# killall -9 postgres

Now, let's check the contents of the postgresql.auto.conf file; we can use the replication credentials stored in that file later for pg_basebackup:

root@vagrant:~# cat /var/lib/postgresql/12/main/postgresql.auto.conf

# Do not edit this file manually!

# It will be overwritten by the ALTER SYSTEM command.

promote_trigger_file='/tmp/failover_5432.trigger'

recovery_target_timeline=latest

primary_conninfo='application_name=pgsql_0_node_1 host=10.0.0.126 port=5432 user=cmon_replication password=qZnVoV7LV97CFX9F'

We are interested in the user and password used for setting up the replication.

Finally we are ok to remove the data:

root@vagrant:~# rm -rf /var/lib/postgresql/12/main/*

Once the data is removed, we need to use pg_basebackup to get the data from the master:

root@vagrant:~# pg_basebackup -h 10.0.0.126 -U cmon_replication -Xs -P -R -D /var/lib/postgresql/12/main/

Password:

waiting for checkpoint

The flags that we used have following meaning:

  • -Xs: we would like to stream WAL while the backup is created. This helps avoid problems with removing WAL files when you have a large dataset.
  • -P: we would like to see progress of the backup.
  • -R: we want pg_basebackup to create standby.signal file and prepare postgresql.auto.conf file with connection settings.

pg_basebackup will wait for the checkpoint before starting the backup. If it takes too long, you can use two options. First, it is possible to set checkpoint mode to fast in pg_basebackup using ‘-c fast’ option. Alternatively, you can force checkpointing by executing:

postgres=# CHECKPOINT;

CHECKPOINT

One way or the other, pg_basebackup will start. With the -P flag we can track the progress:

 416906/1588478 kB (26%), 0/1 tablespace

Once the backup is ready, all we have to do is make sure the data directory content has the correct user and group assigned - we executed pg_basebackup as 'root', therefore we want to change ownership to 'postgres':

root@vagrant:~# chown -R postgres.postgres /var/lib/postgresql/12/main/

That’s all, we can start the slave and it should start to replicate from the master.

root@vagrant:~# systemctl start postgresql

You can double-check the replication progress by executing following query on the master:

postgres=# SELECT * FROM pg_stat_replication;

  pid  | usesysid |     usename | application_name | client_addr | client_hostname | client_port |         backend_start | backend_xmin | state | sent_lsn | write_lsn | flush_lsn | replay_lsn | write_lag | flush_lag | replay_lag | sync_priority | sync_state |          reply_time

-------+----------+------------------+------------------+-------------+-----------------+-------------+-------------------------------+--------------+-----------+------------+------------+------------+------------+-----------+-----------+------------+---------------+------------+-------------------------------

 23565 |    16385 | cmon_replication | pgsql_0_node_1   | 10.0.0.128 | | 51554 | 2020-02-27 15:25:00.002734+00 |              | streaming | 2/AA5EF370 | 2/AA5EF2B0 | 2/AA5EF2B0 | 2/AA5EF2B0 | | |         | 0 | async | 2020-02-28 13:45:32.594213+00

 11914 |    16385 | cmon_replication | 12/main          | 10.0.0.127 | | 25058 | 2020-02-28 13:42:09.160576+00 |              | streaming | 2/AA5EF370 | 2/AA5EF2B0 | 2/AA5EF2B0 | 2/AA5EF2B0 | | |         | 0 | async | 2020-02-28 13:45:42.41722+00

(2 rows)

As you can see, both slaves are replicating correctly.

Rebuilding the Slave Using ClusterControl

If you are a ClusterControl user you can easily achieve exactly the same just by picking an option from the UI.

The initial situation is that one of the slaves (10.0.0.127) is not working and it is not replicating. We deemed that the rebuild is the best option for us. 

As ClusterControl users all we have to do is to go to the “Nodes” tab and run “Rebuild Replication Slave” job.

Next, we have to pick the node to rebuild the slave from, and that is all. ClusterControl will use pg_basebackup to set up the replication slave and configure the replication as soon as the data is transferred.

After some time the job completes and the slave is back in the replication chain:

As you can see, with just a couple of clicks, thanks to ClusterControl, we managed to rebuild our failed slave and bring it back to the cluster.

by krzysztof at March 02, 2020 08:33 PM

MariaDB Foundation

Looking for a C/C++ developer

The MariaDB Foundation is looking for a mid to senior-level C/C++ developer to work on MariaDB Server.

Our small team works remotely to build and support MariaDB, and requires strong skills in written English for daily communication. […]

The post Looking for a C/C++ developer appeared first on MariaDB.org.

by Ian Gilfillan at March 02, 2020 01:34 PM

Oli Sennhauser

FromDual is 10 years old

On 1 March 2020 FromDual became 10 years old! Sincere thanks are given to all our customers, partners and interested person for their support and good cooperation in the last 10 years. And we would be pleased to advise and support you again competently in the coming 10 years.

Your FromDual Team

Picture by kalhh on Pixabay

by Shinguz at March 02, 2020 09:03 AM

February 28, 2020

SeveralNines

An Introduction to Percona Server for MongoDB 4.2

When choosing a NoSQL database technology important considerations should be taken into account, such as performance, resilience, reliability, and security. These key factors should also be aligned with achieving business goals, at least as far as the database is concerned. 

Many technologies have come into play to improve these aspects, and it is advisable for an organisation to evaluate the salient options and try integrating them into its database systems. 

New technologies should ensure maximum performance to enhance the achievement of business goals at an affordable operating cost, along with better manageability features such as error detection and alerting systems.

In this blog we will discuss the Percona version of MongoDB and how it expands the power of MongoDB in a variety of ways.

What is Percona Server for MongoDB?

For a database to perform well, there must be an optimally established underlying server for enhancing read and write transactions. Percona Server for MongoDB is a free open-source drop-in replacement for the MongoDB Community Edition, but with additional enterprise-grade functionality. It is designed with some major improvements on the default MongoDB server setup. 

It delivers high performance, improved security, and reliability for optimum performance with reduced expenditure on proprietary software vendor relationships. 

Percona Server for MongoDB Salient Features

MongoDB Community Edition is core to the Percona server, considering that it already provides important features such as the flexible schema, distributed transactions, the familiarity of JSON documents, and native high availability. On top of this, Percona Server for MongoDB integrates the following salient features that enable it to satisfy the aspects we have mentioned above:

  • Hot Backups
  • Data at rest encryption
  • Audit Logging
  • Percona Memory Engine
  • External LDAP Authentication with SASL
  • HashiCorp Vault Integration
  • Enhanced query profiling

Hot Backups 

Percona server for MongoDB creates a physical data backup on a running server in the background without any noticeable operation degradation. This is achievable by running the createBackup command as an administrator on the admin database and specifying the backup directory. 

> use admin

switched to db admin

> db.runCommand({createBackup: 1, backupDir: "/my/backup/data/path"})

{ "ok" : 1 }

When you receive { "ok" : 1 }, the backup was successful. Otherwise, if for example you specify an empty backup directory path, you may receive an error response, i.e.:

{ "ok" : 0, "errmsg" : "Destination path must be absolute" }

Restoring the backup requires one to first stop the mongod instance, clean the data directory, copy the files from the directory and then restart the mongod service. This can be done by running the command below

$ service mongod stop && rm -rf /var/lib/mongodb/* && cp --recursive /my/backup/data/path /var/lib/mongodb/ && service mongod start

You can also store the backup in archive format if using Percona server for MongoDB 4.2.1-1 

> use admin

> db.runCommand({createBackup: 1, archive: "path/to/archive.tar" })

You can also backup directly to AWS S3 using the default settings or with more configurations. For a default S3 bucket backup:

> db.runCommand({createBackup: 1,  s3: {bucket: "backup", path: "newBackup"}})

Data-at-Rest Encryption

MongoDB version 3.2 introduced data-at-rest encryption for the WiredTiger storage engine to ensure that data files can be decrypted and read only by parties holding the decryption key. Data-at-rest encryption in Percona Server for MongoDB was introduced in version 3.6 to go hand in hand with the data-at-rest encryption interface in MongoDB. However, the latest version does not include support for Amazon AWS and KMIP key management services.

The encryption can also be applied to rollback files when data-at-rest encryption is enabled. Percona Server for MongoDB uses the encryptionCipherMode option with 2 selective cipher modes:

  1. AES256-CBC (default cipher mode)
  2. AES256-GCM

You can select the cipher mode with one of the commands below:

$ mongod ... --encryptionCipherMode AES256-CBC

$ mongod ... --encryptionCipherMode AES256-GCM

We use the --encryptionKeyFile option to specify the path to a file that contains the encryption key.

$ mongod ... --enableEncryption --encryptionKeyFile <fileName>

Audit Logging

For every database system, administrators have a mandate to keep track of the activities taking place. In Percona Server for MongoDB, when auditing is enabled, the server generates an audit log file that contains information about different user events such as authorization and authentication. However, when starting the server with auditing enabled, the logs won't be displayed dynamically during runtime. 

Audit logging in MongoDB Community Edition can take two data formats, that is, JSON and BSON. However, for Percona Server for MongoDB, audit logging is limited to JSON files only. The server also logs only important commands, contrary to MongoDB which logs everything. Since the filtering syntax in Percona is somewhat unclear, enabling the audit log without filtering offers more entries from which one can narrow down to one's own specifications.
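As a hedged illustration (the log path and filter below are placeholders, not a recommended configuration), auditing is enabled through mongod startup options along these lines:

$ mongod ... --auditDestination file --auditFormat JSON --auditPath /var/log/mongodb/auditLog.json --auditFilter '{"atype": "authenticate"}'

Dropping the --auditFilter option captures all auditable events, which you can then narrow down once you know which event types matter to you.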

Percona Memory Engine

This is a special configuration of the WiredTiger storage engine that does not store user data on disk. The data fully resides and is readily available in main memory, except for diagnostic data that is written to disk. This makes data processing much faster, but with the consideration that you must ensure there is enough memory to hold the data set, and the server should not shut down. One can select a storage engine with the --storageEngine option. Data created for one storage engine is not compatible with other storage engines, because each storage engine has its own data model. For instance, to select the in-memory storage engine, you first stop any running mongod instance and then issue the commands:

$ service mongod stop

$ mongod --storageEngine inMemory --dbpath <newDataDir>

If you already have some data with your default MongoDB Community Edition and you would like to migrate to the Percona Memory Engine, just use the mongodump and mongorestore utilities by issuing the commands:

$ mongodump --out <dumpDir>

$ service mongod stop

$ rm -rf /var/lib/mongodb/*

$ sed -i '/engine: .*inMemory/s/#//g' /etc/mongod.conf

$ service mongod start

$ mongorestore <dumpDir>

External LDAP Authentication With SASL

Whenever clients make either a read or write request to a MongoDB mongod instance, they need to authenticate against the MongoDB server user database first. External authentication allows the MongoDB server to verify the client credentials (username and password) against a separate service. The external authentication architecture involves:

  1. LDAP Server which remotely stores all user credentials
  2. SASL Daemon that is used as a MongoDB server-local proxy for the remote LDAP service.
  3. SASL Library: creates necessary authentication data for MongoDB client and server.

Authentication session sequence

  • The Client gets connected to a running mongod instance and creates a PLAIN authentication request using the SASL library.
  • The auth request is then sent to the server as a special Mongo command which is then received by the mongod server with its request payload.
  • The server creates some SASL sessions derived with client credentials using its own reference to the SASL library.
  • The mongod server passes the auth payload to the SASL library which hands it over to the saslauthd daemon. The daemon passes it to the LDAP and awaits a YES or NO response upon the authentication request by checking if the user exists and the submitted password is correct.
  • The saslauthd passes this response to the mongod server through the SASL library which then authenticates or rejects the request accordingly.

 Here is an illustration for this process:

To add an external user to a mongod server:

> db.getSiblingDB("$external").createUser( {user : username, roles: [ {role: "read", db: "test"} ]} );

External users, however, cannot have roles assigned in the admin database.
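Once such a user exists in LDAP, the client authenticates against the $external database using the PLAIN mechanism. A hedged example, where the username and password are placeholders:

> db.getSiblingDB("$external").auth({mechanism: "PLAIN", user: "alice", pwd: "secret", digestPassword: false})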

HashiCorp Vault Integration

HashiCorp Vault is a product designed to manage secrets and protect sensitive data by securely storing and tightly controlling access to confidential information. With the previous Percona version, the data-at-rest encryption key was stored locally on the server inside the key file. The integration with HashiCorp Vault secures the encryption key much better.

Enhanced Query Profiling

Profiling has a degrading impact on database performance, especially when many queries are issued. Percona Server for MongoDB comes in handy by limiting the number of queries collected by the database profiler, hence decreasing its impact on performance.

Conclusion

Percona Server for MongoDB is an enhanced, open source, and highly scalable database that may act as a compatible drop-in replacement for MongoDB Community Edition, with the same syntax and configuration. It enhances data security, especially data at rest, and improves database performance through the Percona Memory Engine, limits on the profiling rate, and other features.

Percona Server for MongoDB is fully supported by ClusterControl as an option for deployment.

by Onyancha Brian Henry at February 28, 2020 06:02 PM

February 27, 2020

SeveralNines

What to Look for if Your PostgreSQL Replication is Lagging

Replication lag is not a widespread issue for most PostgreSQL setups. It can occur, though, and when it does it can impact your production setup. PostgreSQL is designed to handle multiple threads, such as query parallelism or deploying worker threads to handle specific tasks based on the assigned values in the configuration. PostgreSQL is designed to handle heavy and stressful loads, but sometimes (due to a bad configuration) your server might still go south.

Identifying replication lag in PostgreSQL is not a complicated task, but there are a few different approaches to look into the problem. In this blog, we'll take a look at what to check when your PostgreSQL replication is lagging.

Types of Replication in PostgreSQL

Before diving into the topic, let's first see how replication in PostgreSQL has evolved, as there is a diverse set of approaches and solutions when dealing with replication.

Warm standby for PostgreSQL was implemented in version 8.2 (back in 2006) and was based on the log shipping method. This means that the WAL records are directly moved from one database server to another to be applied - an approach analogous to PITR, or very much like what you do with rsync.

This approach, although old, is still used today, and some institutions actually prefer it. It implements file-based log shipping by transferring WAL records one file (WAL segment) at a time. It has a downside, though: on a major failure of the primary server, transactions not yet shipped will be lost. There is a window for data loss (you can tune this by using the archive_timeout parameter, which can be set as low as a few seconds, but such a low setting will substantially increase the bandwidth required for file shipping).
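For reference, file-based log shipping is typically driven by a few parameters on the primary. A minimal, hedged postgresql.conf sketch (the archive path and timeout value are placeholders) might look like this:

# postgresql.conf on the primary - illustrative values only
wal_level = replica
archive_mode = on
archive_command = 'cp %p /mnt/wal_archive/%f'   # copy each completed WAL segment to the archive
archive_timeout = 60                            # force a segment switch at least every 60 seconds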

In PostgreSQL version 9.0, Streaming Replication was introduced. This feature allows us to stay more up-to-date compared to file-based log shipping. Its approach is to transfer WAL records (a WAL file is composed of WAL records) on the fly (merely record-based log shipping) between a master server and one or several standby servers. This protocol does not need to wait for the WAL file to be filled, unlike file-based log shipping. In practice, a process called the WAL receiver, running on the standby server, connects to the primary server using a TCP/IP connection. On the primary server, another process exists, named the WAL sender. Its role is to send the WAL registries to the standby server(s) as they happen.

Asynchronous replication setups in streaming replication can incur problems such as data loss or slave lag, so version 9.1 introduced synchronous replication. In synchronous replication, each commit of a write transaction will wait until confirmation is received that the commit has been written to the write-ahead log on disk of both the primary and standby server. This method minimizes the possibility of data loss, as for that to happen we would need both the master and the standby to fail at the same time. 

The obvious downside of this configuration is that the response time for each write transaction increases, as we need to wait until all parties have responded. Unlike MySQL's semi-synchronous replication, which falls back to asynchronous if a timeout occurs, PostgreSQL has no such fallback. With PostgreSQL, the time for a commit is (at minimum) the round trip between the primary and the standby. Read-only transactions will not be affected by that.

As it evolves, PostgreSQL is continuously improving, and its replication options are diverse. For example, you can use physical streaming asynchronous replication or logical streaming replication. Both are monitored differently, though they use the same approach when sending data over replication, which is still streaming replication. For more details, check the manual for the different types of replication solutions in PostgreSQL.

Causes of PostgreSQL Replication Lag

As defined in our previous blog, replication lag is the cost of delay for a transaction or operation, calculated as the time difference of execution between the primary/master and the standby/slave node.  

Since PostgreSQL uses streaming replication, it's designed to be fast: changes are recorded as a sequence of log records (byte-by-byte), intercepted by the WAL receiver and written to the WAL file. The startup process in PostgreSQL then replays the data from that WAL segment and streaming replication begins. In PostgreSQL, replication lag can be caused by these factors:

  • Network issues
  • Not being able to find the WAL segment on the primary. Usually, this is due to the checkpointing behavior, where WAL segments are rotated or recycled
  • Busy nodes (primary and standby(s)). This can be caused by external processes or by bad queries that are resource intensive
  • Bad hardware or hardware issues causing lag
  • Poor configuration in PostgreSQL, such as a small max_wal_senders value being set while processing tons of transaction requests (or a large volume of changes). A minimal configuration sketch follows this list.
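As a hedged illustration of that last point (the values below are illustrative, not recommendations; wal_keep_segments applies to PostgreSQL 12 and earlier), the relevant postgresql.conf parameters might look like this:

# postgresql.conf on the primary - illustrative values only
max_wal_senders = 10         # enough WAL sender processes for all standbys
wal_keep_segments = 512      # retain WAL segments so lagging standbys can still catch up
wal_sender_timeout = 60s     # detect broken standby connections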

What To Look for With PostgreSQL Replication Lag

PostgreSQL replication is quite diverse, but monitoring replication health is subtle yet not complicated. The approach we'll showcase is based on a primary-standby setup with asynchronous streaming replication. Logical replication does not benefit from most of the cases we're discussing here, but the view pg_stat_subscription can help you collect information. However, we'll not focus on that in this blog.

Using pg_stat_replication View

The most common approach is to run a query referencing this view on the primary node. Remember, you can only harvest information from the primary node using this view. This view contains the following table definition, based on PostgreSQL 11, as shown below:

postgres=# \d pg_stat_replication

                    View "pg_catalog.pg_stat_replication"
      Column      |           Type           | Collation | Nullable | Default
------------------+--------------------------+-----------+----------+---------
 pid              | integer                  |           |          |
 usesysid         | oid                      |           |          |
 usename          | name                     |           |          |
 application_name | text                     |           |          |
 client_addr      | inet                     |           |          |
 client_hostname  | text                     |           |          |
 client_port      | integer                  |           |          |
 backend_start    | timestamp with time zone |           |          |
 backend_xmin     | xid                      |           |          |
 state            | text                     |           |          |
 sent_lsn         | pg_lsn                   |           |          |
 write_lsn        | pg_lsn                   |           |          |
 flush_lsn        | pg_lsn                   |           |          |
 replay_lsn       | pg_lsn                   |           |          |
 write_lag        | interval                 |           |          |
 flush_lag        | interval                 |           |          |
 replay_lag       | interval                 |           |          |
 sync_priority    | integer                  |           |          |
 sync_state       | text                     |           |          |

Where the fields are defined as (includes PG < 10 version),

  • pid: Process id of the walsender process
  • usesysid: OID of the user used for streaming replication
  • usename: Name of the user used for streaming replication
  • application_name: Application name connected to the master
  • client_addr: Address of the standby/streaming replication client
  • client_hostname: Hostname of the standby
  • client_port: TCP port number on which the standby is communicating with the WAL sender
  • backend_start: Start time when the standby connected to the master
  • backend_xmin: The standby's xmin horizon reported by hot_standby_feedback
  • state: Current WAL sender state, i.e. streaming
  • sent_lsn/sent_location: Last write-ahead log location sent to the standby
  • write_lsn/write_location: Last write-ahead log location written to disk at the standby
  • flush_lsn/flush_location: Last write-ahead log location flushed to disk at the standby
  • replay_lsn/replay_location: Last write-ahead log location replayed (applied) at the standby
  • write_lag: Elapsed time between committing a WAL record on the primary and receiving confirmation that the standby has written it (but not yet flushed or applied it)
  • flush_lag: Elapsed time between committing a WAL record on the primary and receiving confirmation that the standby has written and flushed it (but not yet applied it)
  • replay_lag: Elapsed time between committing a WAL record on the primary and receiving confirmation that the standby has written, flushed and applied it
  • sync_priority: Priority of the standby server for being chosen as the synchronous standby
  • sync_state: Sync state of the standby (is it asynchronous or synchronous)

A sample query would look as follows in PostgreSQL 9.6,

paultest=# select * from pg_stat_replication;

-[ RECORD 1 ]----+------------------------------

pid              | 7174

usesysid         | 16385

usename          | cmon_replication

application_name | pgsql_1_node_1

client_addr      | 192.168.30.30

client_hostname  | 

client_port      | 10580

backend_start    | 2020-02-20 18:45:52.892062+00

backend_xmin     | 

state            | streaming

sent_location    | 1/9FD5D78

write_location   | 1/9FD5D78

flush_location   | 1/9FD5D78

replay_location  | 1/9FD5D78

sync_priority    | 0

sync_state       | async

-[ RECORD 2 ]----+------------------------------

pid              | 7175

usesysid         | 16385

usename          | cmon_replication

application_name | pgsql_80_node_2

client_addr      | 192.168.30.20

client_hostname  | 

client_port      | 60686

backend_start    | 2020-02-20 18:45:52.899446+00

backend_xmin     | 

state            | streaming

sent_location    | 1/9FD5D78

write_location   | 1/9FD5D78

flush_location   | 1/9FD5D78

replay_location  | 1/9FD5D78

sync_priority    | 0

sync_state       | async

This basically tells you which WAL locations have been sent, written, flushed, or applied on each standby. It provides you a granular overview of the replication status.
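If you prefer the lag expressed in bytes per standby, you can combine this view with pg_current_wal_lsn() on the primary. A minimal sketch for PostgreSQL 10 and later (for older versions, substitute pg_current_xlog_location() and pg_xlog_location_diff()):

SELECT application_name,
       pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn) AS replay_lag_bytes,
       pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn)) AS replay_lag_pretty
FROM pg_stat_replication;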

Queries to Use In the Standby Node

In the standby node, there are supported functions which you can combine into a query to get an overview of your standby replication's health. To do this, you can run the following query (based on PG version > 10),

postgres=#  select pg_is_in_recovery(),pg_is_wal_replay_paused(), pg_last_wal_receive_lsn(), pg_last_wal_replay_lsn(), pg_last_xact_replay_timestamp();

-[ RECORD 1 ]-----------------+------------------------------

pg_is_in_recovery             | t

pg_is_wal_replay_paused       | f

pg_last_wal_receive_lsn       | 0/2705BDA0

pg_last_wal_replay_lsn        | 0/2705BDA0

pg_last_xact_replay_timestamp | 2020-02-21 02:18:54.603677+00

In older versions, you can use the following query:

postgres=# select pg_is_in_recovery(),pg_last_xlog_receive_location(), pg_last_xlog_replay_location(), pg_last_xact_replay_timestamp();

-[ RECORD 1 ]-----------------+------------------------------

pg_is_in_recovery             | t

pg_last_xlog_receive_location | 1/9FD6490

pg_last_xlog_replay_location  | 1/9FD6490

pg_last_xact_replay_timestamp | 2020-02-21 08:32:40.485958-06

What does the query tell us? The functions are defined as follows,

  • pg_is_in_recovery(): (boolean) True if recovery is still in progress.
  • pg_last_wal_receive_lsn()/pg_last_xlog_receive_location():  (pg_lsn) The write-ahead log location received and synced to disk by streaming replication. 
  • pg_last_wal_replay_lsn()/pg_last_xlog_replay_location():  (pg_lsn) The last write-ahead log location replayed during recovery. If recovery is still in progress this will increase monotonically.
  • pg_last_xact_replay_timestamp():  (timestamp with time zone) Get timestamp of last transaction replayed during recovery. 

Using some basic math, you can combine these functions. The most commonly used query among DBAs is,

SELECT CASE WHEN pg_last_wal_receive_lsn() = pg_last_wal_replay_lsn()

THEN 0

ELSE EXTRACT (EPOCH FROM now() - pg_last_xact_replay_timestamp())

END AS log_delay;

or in versions PG < 10,

SELECT CASE WHEN pg_last_xlog_receive_location() = pg_last_xlog_replay_location()

THEN 0

ELSE EXTRACT (EPOCH FROM now() - pg_last_xact_replay_timestamp())

END AS log_delay;

Although this query is common practice among DBAs, it still doesn't provide you an accurate view of the lag. Why? Let's discuss this in the next section.

Identifying Lag Caused by WAL Segment's Absence

PostgreSQL standby nodes, which are in recovery mode, do not report to you the exact state of what's happening to your replication. Unless you view the PG log, you cannot gather information on what's going on; there's no query you can run to determine this. In most cases, organizations and even small institutions rely on third-party software to alert them when an alarm is raised. 

One of these is ClusterControl, which offers you observability, sends alerts when alarms are raised, and recovers your node in case disaster or catastrophe strikes. Let's take this scenario: my primary-standby async streaming replication cluster has failed. How would you know if something's wrong? Let's combine the following:

Step 1: Determine if There's a Lag

postgres=# SELECT CASE WHEN pg_last_wal_receive_lsn() = pg_last_wal_replay_lsn()

postgres-# THEN 0

postgres-# ELSE EXTRACT (EPOCH FROM now() - pg_last_xact_replay_timestamp())

postgres-# END AS log_delay;

-[ RECORD 1 ]

log_delay | 0

Step 2: Determine the WAL Segments Received From the Primary and Compare with Standby Node

## Get the master's current LSN. Run the query below in the master

postgres=# SELECT pg_current_wal_lsn();

-[ RECORD 1 ]------+-----------

pg_current_wal_lsn | 0/925D7E70

For older versions of PG < 10, use pg_current_xlog_location.

## Get the current WAL segments received (flushed or applied/replayed)

postgres=# select pg_is_in_recovery(),pg_is_wal_replay_paused(), pg_last_wal_receive_lsn(), pg_last_wal_replay_lsn(), pg_last_xact_replay_timestamp();

-[ RECORD 1 ]-----------------+------------------------------

pg_is_in_recovery             | t

pg_is_wal_replay_paused       | f

pg_last_wal_receive_lsn       | 0/2705BDA0

pg_last_wal_replay_lsn        | 0/2705BDA0

pg_last_xact_replay_timestamp | 2020-02-21 02:18:54.603677+00

That does not look good. 

Step 3: Determine How Bad it Could Be

Now, let's combine the results from step #1 and step #2 and get the diff. To do this, PostgreSQL has a function called pg_wal_lsn_diff, which is defined as,

pg_wal_lsn_diff(lsn pg_lsn, lsn pg_lsn) / pg_xlog_location_diff (location pg_lsn, location pg_lsn):  (numeric) Calculate the difference between two write-ahead log locations

Now, let's use it to determine the lag. You can run it on any PG node, since we'll just provide the static values:

postgres=# select pg_wal_lsn_diff('0/925D7E70','0/2705BDA0');

-[ RECORD 1 ]---+-----------

pg_wal_lsn_diff | 1800913104

Let's estimate how large 1800913104 is; that seems to be about 1.7 GiB that may be absent from the standby node,

postgres=# select round(1800913104/pow(1024,3.0),2) missing_lsn_GiB;

-[ RECORD 1 ]---+-----

missing_lsn_gib | 1.68
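The same arithmetic can be done in one step by wrapping the difference in pg_size_pretty(), which accepts the numeric result of pg_wal_lsn_diff on recent PostgreSQL versions:

SELECT pg_size_pretty(pg_wal_lsn_diff('0/925D7E70','0/2705BDA0'));
-- returns roughly 1.7 GB for the values above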

Lastly, you can look at the logs (even before running the queries above), for example using tail -5f to follow them and check what's going on. Do this for both the primary and standby nodes. In this example, we'll see that there is a problem,

## Primary

root@debnode4:/var/lib/postgresql/11/main# tail -5f log/postgresql-2020-02-21_033512.log

2020-02-21 16:44:33.574 UTC [25023] ERROR:  requested WAL segment 000000030000000000000027 has already been removed

...



## Standby

root@debnode5:/var/lib/postgresql/11/main# tail -5f log/postgresql-2020-02-21_014137.log 

2020-02-21 16:45:23.599 UTC [26976] LOG:  started streaming WAL from primary at 0/27000000 on timeline 3

2020-02-21 16:45:23.599 UTC [26976] FATAL:  could not receive data from WAL stream: ERROR:  requested WAL segment 000000030000000000000027 has already been removed

...

When encountering this issue, it's better to rebuild your standby nodes. In ClusterControl, it's as easy as one click. Just go to the Nodes/Topology section, and rebuild the node just like below:

Other Things to Check

You can use the same approach as in our previous blog (for MySQL), using system tools such as ps, top, iostat, and netstat in combination. For example, you can also get the currently recovered WAL segment from the standby node,

root@debnode5:/var/lib/postgresql/11/main# ps axufwww|egrep "postgre[s].*startup"

postgres  8065 0.0 8.3 715820 170872 ?       Ss 01:41 0:03 \_ postgres: 11/main: startup   recovering 000000030000000000000027

How Can ClusterControl Help?

ClusterControl offers an efficient way of monitoring your database nodes, from the primary to the slave nodes. When going to the Overview tab, you already have a view of your replication health:

Basically, the two screenshots above display the replication health and the current WAL segments. That's not all; ClusterControl also shows the current activity going on in your cluster.

Conclusion

Monitoring replication health in PostgreSQL can be approached in different ways, as long as you are able to meet your needs. Using third-party tools with observability that can notify you in case of catastrophe is a perfect route, whether open source or enterprise. The most important thing is to have your disaster recovery plan and business continuity planned ahead of such trouble.

by Paul Namuag at February 27, 2020 07:22 PM

February 26, 2020

SeveralNines

How to Protect Your MySQL & MariaDB Database Against Cyberattacks When on a Public Network

It is sometimes inevitable to run MySQL database servers on a public or exposed network. This is a common setup in a shared hosting environment, where a server is configured with multiple services, often running alongside the database server. For those who have this kind of setup, you should always have some kind of protection against cyberattacks like denial-of-service, hacking, cracking and data breaches, all of which can result in data loss. These are things that we always want to avoid for our database server. 

Here are some of the tips that we can do to improve our MySQL or MariaDB security.

Scan Your Database Servers Regularly

Protection against any malicious files in the server is very critical. Scan the server regularly to look for viruses, spyware, malware or rootkits, especially if the database server is co-located with other services like a mail server, HTTP, FTP, DNS, WebDAV, telnet and so on. Commonly, most database hacking issues originate from the application tier facing the public network. Thus, it's important to scan all files, especially web/application files, since they are one of the entry points into the server. If those are compromised, the hacker can get into the application directory and have the ability to read the application files. These might contain sensitive information, for instance, the database login credentials. 

ClamAV is one of the most widely known and widely trusted antivirus solutions for a variety of operating systems, including Linux. It's free and very easy to install and comes with a fairly good detection mechanism to look for unwanted things in your server. Schedule periodic scans in the cron job, for example:

0 3 * * * /bin/freshclam ; /bin/clamscan / --recursive=yes -i > /tmp/clamav.log ; mail -s clamav_log_`hostname` monitor@mydomain.local < /tmp/clamav.log

The above will update the ClamAV virus database, scan all directories and files and send you an email on the status of the execution and report every day at 3 AM.

Use Stricter User Roles and Privileges

When creating a MySQL user, do not allow all hosts to access the MySQL server with wildcard host (%). You should scan your MySQL host and look for any wildcard host value, as shown in the following statement:

mysql> SELECT user,host FROM mysql.user WHERE host = '%';
+---------+------+
| user    | host |
+---------+------+
| myadmin | %    |
| sbtest  | %    |
| user1   | %    |
+---------+------+

From the above output, restrict or remove all users that have the wildcard '%' value under the Host column. Users that need to access the MySQL server remotely can be forced to use the SSH tunnelling method, which does not require remote host configuration for MySQL users. Most of the MySQL administration clients such as MySQL Workbench and HeidiSQL can be configured to connect to a MySQL server via SSH tunnelling, therefore it's possible to completely eliminate remote connections for MySQL users.
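Once identified, such accounts can be tightened without recreating them, for example by narrowing the host part. The subnet below is just an illustration:

RENAME USER 'sbtest'@'%' TO 'sbtest'@'192.168.10.%';
-- or drop the account entirely if it is no longer needed
-- DROP USER 'user1'@'%';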

Also, limit the SUPER privilege to only users from localhost, or connecting via UNIX socket file. Be more cautious when assigning FILE privilege to non-root users since it permits read and write files on the server using the LOAD DATA INFILE and SELECT ... INTO OUTFILE statements. Any user to whom this privilege is granted can also read or write any file that the MySQL server can read or write.
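To review who currently holds these powerful privileges, a couple of simple checks against the grant tables can help (a sketch; myadmin is one of the accounts from the earlier output):

SELECT user, host FROM mysql.user WHERE Super_priv = 'Y' OR File_priv = 'Y';
SHOW GRANTS FOR 'myadmin'@'%';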

Change the Database Default Settings

By moving away from the default setup, naming and configurations, we can reduce the attack vector to a number of folds. The following actions are some examples on default configurations that DBAs could easily change but commonly overlooked related to MySQL:

  • Change default MySQL port to other than 3306.
  • Rename the MySQL root username to other than "root".
  • Enforce password expiration and reduce the password lifetime for all users (see the sketch after this list).
  • If MySQL is co-located with the application servers, enforce connection through UNIX socket file only, and stop listening on port 3306 for all IP addresses.
  • Enforce client-server encryption and server-server replication encryption.
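As a small illustration of the password-related item above, both the global lifetime and a per-account expiration can be enforced from SQL on MySQL 5.7 and later; the 90-day value and the appuser account are just examples:

SET GLOBAL default_password_lifetime = 90;
ALTER USER 'appuser'@'localhost' PASSWORD EXPIRE INTERVAL 90 DAY;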

We actually have covered this in detail in this blog post, How to Secure MySQL/MariaDB Servers.

Setup a Delayed Slave

A delayed slave is just a typical slave, however the slave server intentionally executes transactions later than the master by at least a specified amount of time, available from MySQL 5.6. Basically, an event received from the master is not executed until at least N seconds later than its execution on the master. The result is that the slave will reflect the state of the master some time back in the past.
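The delay itself is configured on the slave with CHANGE MASTER TO; a sketch for a 6-hour delay (MySQL 5.6+ or MariaDB 10.2.3+) looks like this:

mysql> STOP SLAVE;
mysql> CHANGE MASTER TO MASTER_DELAY = 21600; -- 21600 seconds = 6 hours
mysql> START SLAVE;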

A delayed slave can be used to recover data, which would be helpful when the problem is found immediately, within the period of delay. Suppose we configured a slave with a 6-hour delay from the master. If our database were modified or deleted (accidentally by a developer or deliberately by a hacker) within this time range, there is a possibility for us to revert to the moment right before it happened by stopping the current master, then bringing the slave server up until certain point with the following command:

# on delayed slave
mysql> STOP SLAVE;
mysql> START SLAVE UNTIL MASTER_LOG_FILE='xxxxx', MASTER_LOG_POS=yyyyyy;

Where 'xxxxx' is the binary log file and 'yyyyy' is the position right before the disaster happened (use the mysqlbinlog tool to examine those events). Finally, promote the slave to become the new master and your MySQL service is back to operating as usual. This method is probably the fastest way to recover your MySQL database in a production environment without having to reload a backup. You can also keep a number of delayed slaves with different delay durations, as shown in the blog Multiple Delayed Replication Slaves for Disaster Recovery with Low RTO, which describes how to set up cost-effective delayed replication servers on top of Docker containers.

Enable Binary Logging

Binary logging is generally recommended to be enabled even though you are running on a standalone MySQL/MariaDB server. The binary log contains information about SQL statements that modify database contents. The information is stored in the form of "events" that describe the modifications. Despite performance impact, having binary log allows you to have the possibility to replay your database server to the exact point where you want it to be restored, also known as point-in-time recovery (PITR). Binary logging is also mandatory for replication. 

With binary logging enabled, one has to include the binary log file and position information when taking a full backup. For mysqldump, using the --master-data flag with value 1 or 2 will print out the necessary information that we can use as a starting point to roll the database forward when replaying the binary logs later on. 

With binary logging enabled, you can use another cool recovery feature called flashback, which is described in the next section.

Enable Flashback

The flashback feature is available in MariaDB, where you can restore data back to a previous point in time in a MySQL database or in a table. Flashback uses mysqlbinlog to create the rollback statements and it needs a FULL binary log row image for that. Thus, to use this feature, the MySQL/MariaDB server must be configured with the following:

[mysqld]
...
binlog_format = ROW
binlog_row_image = FULL

The following architecture diagram illustrates how flashback is configured on one of the slaves:

To perform the flashback operation, firstly you have to determine the date and time when you want to "see" the data, or binary log file and position. Then, use the --flashback flag with mysqlbinlog utility to generate SQL statements to rollback the data to that point. In the generated SQL file, you will notice that the DELETE events are converted to INSERTs and vice versa, and also it swaps WHERE and SET parts of the UPDATE events. 

The following command line should be executed on the slave2 (configured with binlog_row_image=FULL):

$ mysqlbinlog --flashback --start-datetime="2020-02-17 01:30:00"  /var/lib/mysql/mysql-bin.000028 -v --database=shop --table=products > flashback_to_2020-02-17_013000.sql

Then, detach slave2 from the replication chain because we are going to break it and use the server to rollback our data:

mysql> STOP SLAVE;
mysql> RESET MASTER;
mysql> RESET SLAVE ALL;

Finally, import the generated SQL file into the MariaDB server for database shop on slave2:

$ mysql -u root -p shop < flashback_to_2020-02-17_013000.sql

When the above is applied, the table "products" will be at the state of 2020-02-17 01:30:00. Technically, the generated SQL file can be applied to both MariaDB and MySQL servers. You could also transfer the mysqlbinlog binary from the MariaDB server so you can use the flashback feature on a MySQL server. However, the MySQL GTID implementation is different from MariaDB's, thus restoring the SQL file requires you to disable MySQL GTID.

A couple of advantages of using flashback are that you do not need to stop the MySQL/MariaDB server to carry out this operation and, when the amount of data to revert is small, the flashback process is much faster than recovering the data from a full backup. 

Log All Database Queries

General log basically captures every SQL statement being executed by the client in the MySQL server. However, this might not be a popular decision on a busy production server due to the performance impact and space consumption. If performance matters, binary log has the higher priority to be enabled. General log can be enabled during runtime by running the following commands:

mysql> SET global general_log_file='/tmp/mysql.log'; 
mysql> SET global log_output = 'file';
mysql> SET global general_log = ON;

You can also set the general log output to a table:

mysql> SET global log_output = 'table';

You can then use the standard SELECT statement against the mysql.general_log table to retrieve queries. Do expect a bit more performance impact when running with this configuration as shown in this blog post.
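With log_output set to 'table', auditing becomes a matter of plain SQL; for example, a sketch that looks at the most recent DELETE statements:

mysql> SELECT event_time, user_host, argument
       FROM mysql.general_log
       WHERE command_type = 'Query' AND argument LIKE 'DELETE%'
       ORDER BY event_time DESC LIMIT 10;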

Otherwise, you can use external monitoring tools that can perform query sampling and monitoring, so you can filter and audit the queries that come into the server. ClusterControl can be used to collect and summarize all your queries, as shown in the following screenshots where we filter all queries that contain the DELETE string:

Similar information is also available under ProxySQL's top queries page (if your application is connecting via ProxySQL):

This can be used to track recent changes that have happened to the database server and can also be used for auditing purposes. 

Conclusion

Your MySQL and MariaDB servers must be well-protected at all times since they usually contain sensitive data that attackers are after. You may also use ClusterControl to manage the security aspects of your database servers, as showcased in this blog post, How to Secure Your Open Source Databases with ClusterControl.

by ashraf at February 26, 2020 07:43 PM

February 25, 2020

SeveralNines

How to Identify PostgreSQL Performance Issues with Slow Queries

When working with OLTP (OnLine Transaction Processing) databases, query performance is paramount as it directly impacts the user experience. Slow queries mean that the application feels unresponsive and slow and this results in bad conversion rates, unhappy users, and all sets of problems. 

OLTP is one of the common use cases for PostgreSQL, therefore you want your queries to run as smoothly as possible. In this blog we’d like to talk about how you can identify problems with slow queries in PostgreSQL.

Understanding the Slow Log

Generally speaking, the most typical way of identifying performance problems with PostgreSQL is to collect slow queries. There are a couple of ways you can do it. First, you can enable it on a single database:

pgbench=# ALTER DATABASE pgbench SET log_min_duration_statement=0;

ALTER DATABASE

After this, all queries executed in new connections to the ‘pgbench’ database will be logged to the PostgreSQL log.

It is also possible to enable this globally by adding:

log_min_duration_statement = 0

to PostgreSQL configuration and then reload config:

pgbench=# SELECT pg_reload_conf();

 pg_reload_conf

----------------

 t

(1 row)

This enables logging of all queries across all of the databases in your PostgreSQL instance. If you do not see any logs, you may want to enable logging_collector = on as well. The logs will then also include all of the traffic coming to the PostgreSQL system tables, making them noisier. For our purposes let’s stick to database-level logging.

What you’ll see in the log are entries as below:

2020-02-21 09:45:39.022 UTC [13542] LOG:  duration: 0.145 ms statement: SELECT abalance FROM pgbench_accounts WHERE aid = 29817899;

2020-02-21 09:45:39.022 UTC [13544] LOG:  duration: 0.107 ms statement: SELECT abalance FROM pgbench_accounts WHERE aid = 11782597;

2020-02-21 09:45:39.022 UTC [13529] LOG:  duration: 0.065 ms statement: SELECT abalance FROM pgbench_accounts WHERE aid = 16318529;

2020-02-21 09:45:39.022 UTC [13529] LOG:  duration: 0.082 ms statement: UPDATE pgbench_tellers SET tbalance = tbalance + 3063 WHERE tid = 3244;

2020-02-21 09:45:39.022 UTC [13526] LOG:  duration: 16.450 ms statement: UPDATE pgbench_branches SET bbalance = bbalance + 1359 WHERE bid = 195;

2020-02-21 09:45:39.023 UTC [13523] LOG:  duration: 15.824 ms statement: UPDATE pgbench_accounts SET abalance = abalance + -3726 WHERE aid = 5290358;

2020-02-21 09:45:39.023 UTC [13542] LOG:  duration: 0.107 ms statement: UPDATE pgbench_tellers SET tbalance = tbalance + -2716 WHERE tid = 1794;

2020-02-21 09:45:39.024 UTC [13544] LOG:  duration: 0.112 ms statement: UPDATE pgbench_tellers SET tbalance = tbalance + -3814 WHERE tid = 278;

2020-02-21 09:45:39.024 UTC [13526] LOG:  duration: 0.060 ms statement: INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) VALUES (4876, 195, 39955137, 1359, CURRENT_TIMESTAMP);

2020-02-21 09:45:39.024 UTC [13529] LOG:  duration: 0.081 ms statement: UPDATE pgbench_branches SET bbalance = bbalance + 3063 WHERE bid = 369;

2020-02-21 09:45:39.024 UTC [13523] LOG:  duration: 0.063 ms statement: SELECT abalance FROM pgbench_accounts WHERE aid = 5290358;

2020-02-21 09:45:39.024 UTC [13542] LOG:  duration: 0.100 ms statement: UPDATE pgbench_branches SET bbalance = bbalance + -2716 WHERE bid = 210;

2020-02-21 09:45:39.026 UTC [13523] LOG:  duration: 0.092 ms statement: UPDATE pgbench_tellers SET tbalance = tbalance + -3726 WHERE tid = 67;

2020-02-21 09:45:39.026 UTC [13529] LOG:  duration: 0.090 ms statement: INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) VALUES (3244, 369, 16318529, 3063, CURRENT_TIMESTAMP);

You can see information about the query and its duration. Not much else, but it’s definitely a good place to start. The main thing to keep in mind is that not every slow query is a problem. Sometimes queries have to access a significant amount of data, and it is expected for them to take longer to access and analyze all of the information the user asked for. Another question is what “slow” means. This mostly depends on the application. If we are talking about interactive applications, most likely anything slower than a second is noticeable. Ideally everything is executed within a 100 - 200 millisecond limit.
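Given that, instead of logging absolutely everything with log_min_duration_statement = 0 (which can be expensive on a busy server), you may prefer to log only statements slower than a threshold. As a sketch, this can also be changed globally without editing the configuration file; the 500ms value is just an example:

ALTER SYSTEM SET log_min_duration_statement = '500ms';

SELECT pg_reload_conf();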

Developing a Query Execution Plan

Once we determine that a given query is indeed something we want to improve, we should take a look at the query execution plan. First of all, it may happen that there’s nothing we can do about it and we’ll have to accept that the query is just slow. Second, query execution plans may change. Optimizers always try to pick the most optimal execution plan, but they make their decisions based on just a sample of data, therefore it may happen that the query execution plan changes over time. In PostgreSQL you can check the execution plan in two ways. First, the estimated execution plan, using EXPLAIN:

pgbench=# EXPLAIN SELECT abalance FROM pgbench_accounts WHERE aid = 5290358;

                                          QUERY PLAN

----------------------------------------------------------------------------------------------

 Index Scan using pgbench_accounts_pkey on pgbench_accounts  (cost=0.56..8.58 rows=1 width=4)

   Index Cond: (aid = 5290358)

As you can see, we are expected to access data using primary key lookup. If we want to double-check how exactly the query will be executed, we can use EXPLAIN ANALYZE:

pgbench=# EXPLAIN ANALYZE SELECT abalance FROM pgbench_accounts WHERE aid = 5290358;

                                                               QUERY PLAN

----------------------------------------------------------------------------------------------------------------------------------------

 Index Scan using pgbench_accounts_pkey on pgbench_accounts  (cost=0.56..8.58 rows=1 width=4) (actual time=0.046..0.065 rows=1 loops=1)

   Index Cond: (aid = 5290358)

 Planning time: 0.053 ms

 Execution time: 0.084 ms

(4 rows)

Now, PostgreSQL has executed this query and it can tell us not just the estimates but exact numbers when it comes to the execution plan, number of rows accessed and so on. Please keep in mind that logging all of the queries may become a serious overhead on your system. You should also keep an eye on the logs and ensure they are properly rotated.

Pg_stat_statements

Pg_stat_statements is the extension that collects execution statistics for different query types.

pgbench=# select query, calls, total_time, min_time, max_time, mean_time, stddev_time, rows from public.pg_stat_statements order by calls desc LIMIT 10;

                                                query                                                 | calls | total_time | min_time | max_time |     mean_time | stddev_time | rows

------------------------------------------------------------------------------------------------------+-------+------------------+----------+------------+---------------------+---------------------+-------

 UPDATE pgbench_branches SET bbalance = bbalance + $1 WHERE bid = $2                                  | 30437 | 6636.83641200002 | 0.006533 | 83.832148 | 0.218051595492329 | 1.84977058799388 | 30437

 BEGIN                                                                                                | 30437 | 231.095600000001 | 0.000205 | 20.260355 | 0.00759258796859083 | 0.26671126085716 | 0

 END                                                                                                  | 30437 | 229.483213999999 | 0.000211 | 16.980678 | 0.0075396134310215 | 0.223837608828596 | 0

 UPDATE pgbench_accounts SET abalance = abalance + $1 WHERE aid = $2                                  | 30437 | 290021.784321001 | 0.019568 | 805.171845 | 9.52859297305914 | 13.6632712046825 | 30437

 UPDATE pgbench_tellers SET tbalance = tbalance + $1 WHERE tid = $2                                   | 30437 | 6667.27243200002 | 0.00732 | 212.479269 | 0.219051563294674 | 2.13585110968012 | 30437

 SELECT abalance FROM pgbench_accounts WHERE aid = $1                                                 | 30437 | 3702.19730600006 | 0.00627 | 38.860846 | 0.121634763807208 | 1.07735927551245 | 30437

 INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) VALUES ($1, $2, $3, $4, CURRENT_TIMESTAMP) | 30437 | 2349.22475800002 | 0.003218 |  61.372127 | 0.0771831901304325 | 0.971590327400244 | 30437

 SELECT $1                                                                                            | 6847 | 60.785467 | 0.002321 | 7.882384 | 0.00887767883744706 | 0.105198744982906 | 6847

 insert into pgbench_tellers(tid,bid,tbalance) values ($1,$2,$3)                                      | 5000 | 18.592042 | 0.001572 | 0.741427 | 0.0037184084 | 0.0137660355678027 | 5000

 insert into pgbench_tellers(tid,bid,tbalance) values ($1,$2,$3)                                      | 3000 | 7.323788 | 0.001598 | 0.40152 | 0.00244126266666667 | 0.00834442591085048 | 3000

(10 rows)

As you can see in the data above, we have a list of different queries and information about their execution times - this is just a part of the data you can see in pg_stat_statements, but it is enough for us to understand that our primary key lookup sometimes takes almost 39 milliseconds to complete - this does not look good for an index lookup and it is definitely something we want to investigate.

If you do not have pg_stat_statements enabled, you can do it in the standard way: add it to shared_preload_libraries in the configuration file (this requires a restart),

shared_preload_libraries = 'pg_stat_statements'

and then create the extension in the database from the PostgreSQL command line:

pgbench=# CREATE EXTENSION pg_stat_statements;

CREATE EXTENSION

Using ClusterControl to Eliminate Slow Queries

If you happen to use ClusterControl to manage your PostgreSQL database, you can use it to collect data about slow queries.

As you can see, it collects data about query execution - rows sent and examined, execution time statistics and so on. With it you can easily pinpoint the most expensive queries and see what the average and maximum execution times look like. By default ClusterControl collects queries that took longer than 0.5 seconds to complete; you can change this in the settings:

Conclusion

This short blog by no means covers all of the aspects and tools helpful in identifying and solving query performance problems in PostgreSQL. We hope it is a good start and that it will help you to understand what you can do to pinpoint the root cause of the slow queries.

by krzysztof at February 25, 2020 07:17 PM

February 24, 2020

SeveralNines

My PostgreSQL Database is Out of Disk Space

Disk space is a resource in high demand nowadays. You usually want to store data for as long as possible, but this could be a problem if you don’t take the necessary actions to prevent a potential “out of disk space” issue. 

In this blog, we will see how we can detect this issue for PostgreSQL, prevent it, and, if it is too late, review some options that will probably help you fix it.

How to Identify PostgreSQL Disk Space Issues

If you are unfortunately in this out-of-disk-space situation, you will be able to see some errors in the PostgreSQL database logs:

2020-02-20 19:18:18.131 UTC [4400] LOG:  could not close temporary statistics file "pg_stat_tmp/global.tmp": No space left on device

or even in your system log:

Feb 20 19:29:26 blog-pg1 rsyslogd: imjournal: fclose() failed for path: '/var/lib/rsyslog/imjournal.state.tmp': No space left on device [v8.24.0-41.el7_7.2 try http://www.rsyslog.com/e/2027 ]

PostgreSQL can continue working for a while running read-only queries, but eventually it will fail trying to write to disk, and then you will see something like this in your client session:

WARNING:  terminating connection because of crash of another server process

DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.

HINT:  In a moment you should be able to reconnect to the database and repeat your command.

server closed the connection unexpectedly

This probably means the server terminated abnormally

before or while processing the request.

The connection to the server was lost. Attempting reset: Failed.

Then, if you take a look at the disk space, you will have this unwanted output…

$ df -h

Filesystem                        Size Used Avail Use% Mounted on

/dev/mapper/pve-vm--125--disk--0   30G 30G 0 100% /

How to Prevent PostgreSQL Disk Space Issues

The main way to prevent this kind of issue is by monitoring the disk space usage, and database or disk usage growth. For this, a graph is a friendly way to monitor disk space growth:

PostgreSQL Disk Space - ClusterControl

And the same for the database growth:

PostgreSQL Database Growth - ClusterControl

Another important thing to monitor is the replication status. If you have a replica and, for some reason, it stops working, depending on the configuration PostgreSQL may keep all the WAL files needed to restore the replica when it comes back.

PostgreSQL Topology

All this monitoring system doesn’t make sense without an alerting system to know when you need to take actions:

How to Fix PostgreSQL Disk Space Issues

Well, if you are facing this out-of-disk-space issue even with a monitoring and alerting system implemented (or not), there are many options to try to fix this issue without data loss (or with as little as possible).

What is Consuming Your Disk Space?

The first step should be determining where your disk space is going. A best practice is having separate partitions, at least one separate partition for your database storage, so you can easily confirm whether your database or your system is using excessive disk space. Another advantage of this is to minimize the damage: if your root partition is full, your database can still write to its own partition without issues.

Database Space Usage

Let’s see now some useful commands to check your database disk space usage.

A basic way to check the database space usage is checking the data directory in the filesystem:

$ du -sh /var/lib/pgsql/11/data/

819M /var/lib/pgsql/11/data/

Or if you have a separate partition for your data directory, you can use df -h directly.

The PostgreSQL command “\l+” lists the databases, adding size information:

$ postgres=# \l+

                                                               List of databases

   Name    | Owner   | Encoding | Collate | Ctype |   Access privileges | Size | Tablespace

|                Description

-----------+----------+-----------+---------+-------+-----------------------+---------+------------

+--------------------------------------------

 postgres  | postgres | SQL_ASCII | C       | C | | 7965 kB | pg_default

| default administrative connection database

 template0 | postgres | SQL_ASCII | C       | C | =c/postgres +| 7817 kB | pg_default

| unmodifiable empty database

           |          | |         | | postgres=CTc/postgres |         |

|

 template1 | postgres | SQL_ASCII | C       | C | =c/postgres +| 7817 kB | pg_default

| default template for new databases

           |          | |         | | postgres=CTc/postgres |         |

|

 world     | postgres | SQL_ASCII | C       | C | | 8629 kB | pg_default

|

(4 rows)

Using pg_database_size and the database name you can see the database size:

postgres=# SELECT pg_database_size('world');

 pg_database_size

------------------

          8835743

(1 row)

And using pg_size_pretty to see this value in a human-readable way is even better:

postgres=# SELECT pg_size_pretty(pg_database_size('world'));

 pg_size_pretty

----------------

 8629 kB

(1 row)
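To drill down further inside a database, you can list the largest relations. A common sketch uses pg_total_relation_size, which includes indexes and TOAST data:

SELECT relname,
       pg_size_pretty(pg_total_relation_size(relid)) AS total_size
FROM pg_catalog.pg_statio_user_tables
ORDER BY pg_total_relation_size(relid) DESC
LIMIT 10;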

Once you know where the space is being used, you can take the corresponding action to fix it. Keep in mind that just deleting rows is not enough to recover the disk space; you will need to run a VACUUM or VACUUM FULL to finish the task. 

Log Files

The easiest way to recover disk space is by deleting log files. You can check the PostgreSQL log directory or even the system logs to verify if you can gain some space from there. If you have something like this:

$ du -sh /var/lib/pgsql/11/data/log/

18G /var/lib/pgsql/11/data/log/

You should check the directory content to see if there is a log rotation/retention problem, or if something is happening in your database that is writing heavily to the logs.

$ ls -lah /var/lib/pgsql/11/data/log/

total 18G

drwx------  2 postgres postgres 4.0K Feb 21 00:00 .

drwx------ 21 postgres postgres 4.0K Feb 21 00:00 ..

-rw-------  1 postgres postgres  18G Feb 21 14:46 postgresql-Fri.log

-rw-------  1 postgres postgres 9.3K Feb 20 22:52 postgresql-Thu.log

-rw-------  1 postgres postgres 3.3K Feb 19 22:36 postgresql-Wed.log

Before deleting the logs, if you have a huge one, a good practice is to keep the last 100 lines or so and then delete it. That way, you can still check what was happening after freeing up the space.

$ tail -100 postgresql-Fri.log > /tmp/log_temp.log

And then:

$ cat /dev/null > /var/lib/pgsql/11/data/log/postgresql-Fri.log

If you just delete it with “rm” while the log file is being used by the PostgreSQL server (or another service), the space won’t be released, so you should truncate the file using the cat /dev/null command instead.

This action is only for PostgreSQL and system log files. Don’t delete the pg_wal content or another PostgreSQL file as it could generate critical damage to your database.

Bloat

In a normal PostgreSQL operation, tuples that are deleted or obsoleted by an update are not physically removed from the table; they are present until a VACUUM is performed. So, it is necessary to do the VACUUM periodically (AUTOVACUUM), especially in frequently-updated tables.

The problem here is that space is not returned to the operating system when using just VACUUM; it is only made available for reuse within the same table.

VACUUM FULL rewrites the table into a new disk file, returning the unused space to the operating system. Unfortunately, it requires an exclusive lock on each table while it is running.

You should check the tables to see if a VACUUM (FULL) process is required.
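To see which tables are the likeliest candidates, pg_stat_user_tables exposes dead tuple counters and the last (auto)vacuum times; a quick sketch:

SELECT relname, n_live_tup, n_dead_tup, last_vacuum, last_autovacuum
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC
LIMIT 10;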

Replication Slots

If you are using replication slots and one of them is not active for some reason:

postgres=# SELECT slot_name, slot_type, active FROM pg_replication_slots;

 slot_name | slot_type | active

-----------+-----------+--------

 slot1     | physical  | f

(1 row)

This could be a problem for your disk space because the primary will keep the WAL files until they have been received by all the standby nodes.

The way to fix it is to recover the replica (if possible), or to delete the slot:

postgres=# SELECT pg_drop_replication_slot('slot1');

 pg_drop_replication_slot

--------------------------

(1 row)

So, the space used by the WAL files will be released.

Conclusion

As we mentioned, monitoring and alerting systems are key to avoiding these kinds of issues. ClusterControl can help you keep your systems up and running, sending you alarms when needed or even taking recovery action to keep your database cluster working. You can also deploy/import different database technologies and scale them out if needed.

by Sebastian Insausti at February 24, 2020 09:47 PM

February 22, 2020

Valeriy Kravchuk

Fun with Bugs #94 - On MySQL Bug Reports I am Subscribed to, Part XXVIII

I may get a chance to speak about proper bugs processing for open source projects later this year, so I have to keep reviewing recent MySQL bugs to be ready for that. In my previous post in this series I listed some interesting MySQL bug reports created in December, 2019. Time to move on to January, 2020! Belated Happy New Year of cool MySQL Bugs!

As usual I mostly care about InnoDB, replication and optimizer bugs and explicitly mention the bug reporter by name and give a link to their other active reports (if any). I also pick up examples of proper (or improper) reporter and Oracle engineer attitudes. Here is the list:
  • Bug #98103 - "unexpected behavior while logging an aborted query in the slow query log".  A query that was killed while waiting for the table metadata lock not only gets logged, but the lock wait time is also saved as the query execution time. I'd like to highlight how the bug reporter, Pranay Motupalli, used gdb to study what really happens in the code in this case. Perfect bug report!
  • Bug #98113 - "Crash possible when load & unload a connection handler". The (quite obvious) bug was verified based on code review, but only after an Oracle engineer spent some effort denying the problem and its importance. This bug was reported by Fangxin Flou.
  • Bug #98132 - "Analyze table leads to empty statistics during online rebuild DDL ". Nice addition to my collections! This bug with a nice and clear test case was reported by Albert Hu, who also suggested a fix.
  • Bug #98139 - "Committing a XA transaction causes a wrong sequence of events in binlog". This bug reported by Dehao Wang was verified as a "documentation" one, but I doubt documenting current behavior properly is an acceptable fix. Bug reporter suggested to commit in the binary log first, for example. Current implementation that allows users to commit/rollback a XA transaction by using another connection if the former connection is closed or killed, is risky. A lot of arguing happened in comments in the process, and my comment asking for a clear quote from the manual:
    Would you be so kind to share some text from this page you mentioned:

    https://dev.mysql.com/doc/refman/8.0/en/xa.html

    or any other fine MySQL 8 manual page stating that XA COMMIT is NOT supported when executed from session/connection/thread other than those prepared the XA transaction? I am doing something wrong probably, but I can not find such text anywhere.
    was hidden. Let's see what happens to this bug report next.
  • Bug #98211 - "Auto increment value didn't reset correctly.". Not sure what this bug reported by Zhao Jianwei has to do with "Data Types", IMHO it's more about DDL or data dictionary. Again, some sarcastic comments from Community users were needed to put work on this bug back on track...
  • Bug #98220 - "with log_slow_extra=on Errno: info not getting updated correctly for error". This bug was reported by lalit Choudhary from Percona.
  • Bug #98227 - "innodb_stats_method='nulls_ignored' and persistent stats get wrong cardinalities". I think the category is wrong for this bug. It's a bug in InnoDB's persistent statistics implementation, one of many. The bug was reported by Agustín G from Percona.
  • Bug #98231 - "show index from a partition table gets a wrong cardinality value". Yet another report by Albert Hu that ended up as a "documentation" bug for now, even though older MySQL versions provided better cardinality estimations than MySQL 8.0 in this case (so this is a regression of a kind). I hope the bug will be re-classified and properly processed later.
  • Bug #98238 - "I_S.KEY_COLUMN_USAGE is very slow". I am surprised to see such a bug in MySQL 8. According to the bug reporter, Manuel Mausz, this is also a kind of regression comparing to older MySQL version, where these queries used to run faster. Surely, no "regression" tag in this case was added.
  • Bug #98284 - "Low sysbench score in the case of a large number of connections". This notable performance regression of MySQL 8 vs 5.7 was reported by zanye zjy. perf profiling pointed out towards ppoll() where a lot of time is spent. There is a fix suggested by Fangxin Flou (to use poll() instead), but the bug is still "Open".
  • Bug #98287 - "Explanation of hash joins is inconsistent across EXPLAIN formats". This bug was reported by Saverio M and ended up marked as a duplicate of Bug #97299 fixed in upcoming 8.0.20. Use EXPLAIN FORMAT=TREE in the meantime to see proper information about hash joins usage in the plan.
  • Bug #98288 - "xa commit crash lead mysql replication error". This bug report from Phoenix Zhang (who also suggested a patch) was declared a duplicate of Bug #76233 - "XA prepare is logged ahead of engine prepare" (that I've already discussed among other XA transactions bugs here).
  • Bug #98324 - "Deadlocks more frequent since version 5.7.26". Nice regression bug report by Przemyslaw Malkowski from Percona, with additional test provided later by Stephen Wei . Interestingly enough, test results shared by Umesh Shastry show that MySQL 8.0.19 is affected in the same way as 5.7.26+, but 8.0.19 is NOT listed as one of versions affected. This is a mistake to fix, along with missing regression tag.
  • Bug #98427 - "InnoDB FullText AUX Tables are broken in 8.0". Yet another regression in MySQL 8 was found by Satya Bodapati. A change in the default collation for the utf8mb4 character set caused this, it seems. InnoDB FULLTEXT search was far from perfect anyway...
There are clouds in the sky of MySQL bugs processing.
To summarize:
  1.  Still too much time and effort is sometimes spent on arguing with bug reporters instead of accepting and processing bugs properly. This is unfortunate.
  2. Sometimes bugs are wrongly classified when verified (documentation vs code bug, wrong category, wrong severity, not all affected versions are listed, ignoring regression etc). This is also unfortunate.
  3. Percona engineers still help to make MySQL better.
  4. There are some fixes in upcoming MySQL 8.0.20 that I am waiting for :)
  5. XA transactions in MySQL are badly broken (they are not atomic in storage engine + binary log) and hardly safe to use in reality.

by Valerii Kravchuk (noreply@blogger.com) at February 22, 2020 08:21 PM

February 21, 2020

SeveralNines

What to Check if MySQL Memory Utilisation is High

One of the key factors of a performant MySQL database server is having good memory allocation and utilization, especially when running it in a production environment. But how can you determine if the MySQL utilization is optimized? Is it reasonable to have high memory utilization or does it require fine tuning? What if I come up against a memory leak?

Let's cover these topics and show the things you can check in MySQL to determine traces of high memory utilization.

Memory Allocation in MySQL

Before we delve into the specific subject, I'll just give some short information about how MySQL uses memory. Memory is a significant resource for speed and efficiency when handling concurrent transactions and running big queries. Each thread in MySQL demands memory which is used to manage client connections, and these threads share the same base memory. Variables like thread_stack (stack for threads), net_buffer_length (for the connection buffer and result buffer), or max_allowed_packet (up to which the connection and result buffers will dynamically enlarge when needed) are variables that do affect memory utilization. When a thread is no longer needed, the memory allocated to it is released and returned to the system unless the thread goes back into the thread cache. In that case, the memory remains allocated. Query joins, query caches, sorting, the table cache, and table definitions all require memory in MySQL, but these are governed by system variables that you can configure and set.

In most cases, the memory-specific variables set for a configuration are targeted at a storage-engine-specific configuration such as MyISAM or InnoDB. When a mysqld instance spawns within the host system, MySQL allocates buffers and caches to improve performance of database operations based on the values set in a specific configuration. For example, the most common variables every DBA will set in InnoDB are innodb_buffer_pool_size and innodb_buffer_pool_instances, which are both related to buffer pool memory allocation that holds cached data for InnoDB tables. If you have a large amount of memory and are expecting to handle big transactions, setting innodb_buffer_pool_instances improves concurrency by dividing the buffer pool into multiple buffer pool instances. 

For MyISAM, you have to deal with key_buffer_size, which controls the amount of memory the key buffer can use. MyISAM also allocates a buffer for every concurrent thread, which contains a table structure, column structures for each column, and a buffer of size 3 * N (where N is the maximum row length, not counting BLOB columns). MyISAM also maintains one extra row buffer for internal use.

MySQL also allocates memory for temporary tables unless it becomes too large (determined by tmp_table_size and max_heap_table_size). If you are using MEMORY tables and variable max_heap_table_size is set very high, this can also take a large memory since max_heap_table_size system variable determines how large a table can grow, and there is no conversion to on-disk format.

MySQL also has a Performance Schema which is a feature for monitoring MySQL activities at a low level. Once this is enabled, it dynamically allocates memory incrementally, scaling its memory use to actual server load, instead of allocating required memory during server startup. Once memory is allocated, it is not freed until the server is restarted. 

MySQL can also be configured to allocate large areas of memory for its buffer pool if using Linux and if kernel is enabled for large page support, i.e. using HugePages

What To Check Once MySQL Memory is High

Check Running Queries

It's very common for MySQL DBAs to first check what's going on with the running MySQL server. The most basic procedures are to check the processlist, check the server status, and check the storage engine status. To do these things, you basically just have to run a series of queries after logging in to MySQL. See below:

To view the running queries,

mysql> SHOW [FULL] PROCESSLIST;

Viewing the current processlist reveals queries that are running actively, as well as idle or sleeping processes. It is very important, and a significant routine, to have a record of queries that are running. As noted in how MySQL allocates memory, running queries utilize memory and can cause drastic performance issues if not monitored.

View the MySQL server status variables,

mysql> SHOW GLOBAL STATUS\G

or filter specific variables like

mysql> SHOW GLOBAL STATUS WHERE Variable_name IN ('<var1>', '<var2>', ...);

MySQL's status variables serve as statistical information you can use to grab metric data and determine how your MySQL server performs by observing the counters given by the status values. There are certain values here which give you a glance at what impacts memory utilization. For example, checking the number of threads, the number of table caches, or the buffer pool usage,

...

| Created_tmp_disk_tables                 | 24240 |

| Created_tmp_tables                      | 334999 |

…

| Innodb_buffer_pool_pages_data           | 754         |

| Innodb_buffer_pool_bytes_data           | 12353536         |

...

| Innodb_buffer_pool_pages_dirty          | 6         |

| Innodb_buffer_pool_bytes_dirty          | 98304         |

| Innodb_buffer_pool_pages_flushed        | 30383         |

| Innodb_buffer_pool_pages_free           | 130289         |

…

| Open_table_definitions                  | 540 |

| Open_tables                             | 1024 |

| Opened_table_definitions                | 540 |

| Opened_tables                           | 700887 |

...

| Threads_connected                             | 5 |

...

| Threads_cached    | 2 |

| Threads_connected | 5     |

| Threads_created   | 7 |

| Threads_running   | 1 |

View the engine's monitor status, for example, InnoDB status

mysql> SHOW ENGINE INNODB STATUS\G

The InnoDB status also reveals the current status of transactions that the storage engine is processing. It gives you the heap size of a transaction, adaptive hash indexes revealing its buffer usage, or shows you the innodb buffer pool information just like the example below:

---TRANSACTION 10798819, ACTIVE 0 sec inserting, thread declared inside InnoDB 1201

mysql tables in use 1, locked 1

1 lock struct(s), heap size 1136, 0 row lock(s), undo log entries 8801

MySQL thread id 68481, OS thread handle 139953970235136, query id 681821 localhost root copy to tmp table

ALTER TABLE NewAddressCode2_2 ENGINE=INNODB



…

-------------------------------------

INSERT BUFFER AND ADAPTIVE HASH INDEX

-------------------------------------

Ibuf: size 528, free list len 43894, seg size 44423, 1773 merges

merged operations:

 insert 63140, delete mark 0, delete 0

discarded operations:

 insert 0, delete mark 0, delete 0

Hash table size 553193, node heap has 1 buffer(s)

Hash table size 553193, node heap has 637 buffer(s)

Hash table size 553193, node heap has 772 buffer(s)

Hash table size 553193, node heap has 1239 buffer(s)

Hash table size 553193, node heap has 2 buffer(s)

Hash table size 553193, node heap has 0 buffer(s)

Hash table size 553193, node heap has 1 buffer(s)

Hash table size 553193, node heap has 1 buffer(s)

115320.41 hash searches/s, 10292.51 non-hash searches/s

...

----------------------

BUFFER POOL AND MEMORY

----------------------

Total large memory allocated 2235564032

Dictionary memory allocated 3227698

Internal hash tables (constant factor + variable factor)

    Adaptive hash index 78904768        (35404352 + 43500416)

    Page hash           277384 (buffer pool 0 only)

    Dictionary cache    12078786 (8851088 + 3227698)

    File system         1091824 (812272 + 279552)

    Lock system         5322504 (5313416 + 9088)

    Recovery system     0 (0 + 0)

Buffer pool size   131056

Buffer pool size, bytes 2147221504

Free buffers       8303

Database pages     120100

Old database pages 44172

Modified db pages  108784

Pending reads      0

Pending writes: LRU 2, flush list 342, single page 0

Pages made young 533709, not young 181962

3823.06 youngs/s, 1706.01 non-youngs/s

Pages read 4104, created 236572, written 441223

38.09 reads/s, 339.46 creates/s, 1805.87 writes/s

Buffer pool hit rate 1000 / 1000, young-making rate 12 / 1000 not 5 / 1000

Pages read ahead 0.00/s, evicted without access 0.00/s, Random read ahead 0.00/s

LRU len: 120100, unzip_LRU len: 0

I/O sum[754560]:cur[8096], unzip sum[0]:cur[0]

…

One more thing to add: you can also use the Performance Schema and sys schema for monitoring memory consumption and utilization by your MySQL server. By default, most memory instrumentation is disabled, so there are a few manual steps to take before you can use this. 
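A minimal sketch of what this looks like on MySQL 5.7 and later (enabling memory instruments at runtime only accounts for allocations made after they are switched on):

mysql> UPDATE performance_schema.setup_instruments
       SET ENABLED = 'YES' WHERE NAME LIKE 'memory/%';
mysql> SELECT * FROM sys.memory_global_by_current_bytes LIMIT 10;
mysql> SELECT * FROM sys.memory_global_total;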

Check for Swappiness 

Either way, it's possible that MySQL is swapping out its memory to disk. This is a common situation, especially when the MySQL server and the underlying hardware are not sized in line with the expected requirements. There are cases where the demand of traffic has not been anticipated, or bad queries consume a lot of memory, and performance degrades as data is read from disk instead of from the buffer. To check for swapping, just run the free -m or vmstat commands as shown below:

[root@node1 ~]# free -m

              total        used free      shared buff/cache available

Mem:           3790 2754         121 202 915         584

Swap:          1535 39        1496

[root@node1 ~]# vmstat 5 5

procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----

 r  b swpd   free buff  cache si so    bi bo in cs us sy id wa st

 2  0 40232 124100      0 937072 2 3 194  1029 477 313 7 2 91 1  0

 0  0 40232 123912      0 937228 0 0   0 49 1247 704 13 3 84  0 0

 1  0 40232 124184      0 937212 0 0   0 35 751 478 6 1 93  0 0

 0  0 40232 123688      0 937228 0 0   0 15 736 487 5 1 94  0 0

 0  0 40232 123912      0 937220 0 0   3 74 1065 729 8 2 89  0 0

You may also check using procfs and gather information such as going to /proc/vmstat or /proc/meminfo.

Using Perf, gdb, and Valgrind with Massif

Using tools like perf, gdb, and valgrind lets you dig into MySQL memory utilization with more advanced methods. Sometimes memory consumption remains a mystery, and these tools help you investigate how MySQL handles memory, from allocating it to using it for processing transactions or other work. This is useful, for example, if you observe MySQL behaving abnormally because of a bad configuration, or if the investigation leads you to findings of memory leaks.

For example, using perf in MySQL reveals more information in a system level report:

[root@testnode5 ~]# perf report --input perf.data --stdio

# To display the perf.data header info, please use --header/--header-only options.

#

#

# Total Lost Samples: 0

#

# Samples: 54K of event 'cpu-clock'

# Event count (approx.): 13702000000

#

# Overhead  Command Shared Object        Symbol                                                                                                                                                                                             

# ........  ....... ...................  ...................................................................................................................................................................................................

#

    60.66%  mysqld [kernel.kallsyms]    [k] _raw_spin_unlock_irqrestore

     2.79%  mysqld   libc-2.17.so         [.] __memcpy_ssse3

     2.54%  mysqld   mysqld             [.] ha_key_cmp

     1.89%  mysqld   [vdso]             [.] __vdso_clock_gettime

     1.05%  mysqld   mysqld             [.] rec_get_offsets_func

     1.03%  mysqld   mysqld             [.] row_sel_field_store_in_mysql_format_func

     0.92%  mysqld   mysqld             [.] _mi_rec_pack

     0.91%  mysqld   [kernel.kallsyms]    [k] finish_task_switch

     0.90%  mysqld   mysqld             [.] row_search_mvcc

     0.86%  mysqld   mysqld             [.] decimal2bin

     0.83%  mysqld   mysqld             [.] _mi_rec_check

….

Since this can be a special topic to dig in, we suggest you look into these really good external blogs as your references, perf Basics for MySQL Profiling, Finding MySQL Scaling Problems Using perf, or learn how to debug using valgrind with massif.

Efficient Way To Check MySQL Memory Utilization

Using ClusterControl relieves you of hassle routines like going through your runbooks, or even creating your own playbooks that would deliver reports for you. In ClusterControl, you have Dashboards (using SCUMM) where you can get a quick overview of your MySQL node(s). For example, viewing the MySQL General dashboard, you can determine how the MySQL node performs.

The dashboards reveal the variables that impact MySQL memory utilization. You can check the metrics for sort caches, temporary tables, threads connected, the query cache, and the storage engines' InnoDB buffer pool or MyISAM key buffer.

ClusterControl also offers you a one-stop utility where you can check the running queries to determine those processes (queries) that can cause high memory utilization. See below for an example.

Viewing the status variables of MySQL is quite easy:

You can even go to Performance -> InnoDB Status to reveal the current InnoDB status of your database nodes. Also, when ClusterControl detects an incident, it collects data about it and keeps a history as a report that provides you with the InnoDB status, as shown in our previous blog about MySQL Freeze Frame.

Summary

Troubleshooting and diagnosing your MySQL database when you suspect high memory utilization isn't that difficult as long as you know the procedures and tools to use. Using the right tool offers you more flexibility and better productivity to deliver fixes or solutions, with a greater chance of a good result.

by Paul Namuag at February 21, 2020 10:45 AM

February 20, 2020

Percona

A Hidden Gem in MySQL: MyRocks

In this blog post, we will share some experiences with the hidden gem in MySQL called MyRocks, a storage engine for MySQL's famous pluggable storage engine system. MyRocks is based on RocksDB, which is a fork of LevelDB. In short, it's another key-value store based on an LSM-tree, thus granting it some distinctive features compared with other MySQL engines. It was introduced in 2016 by Facebook and later included, respectively, in Percona Server for MySQL and MariaDB.

Background and History

The original paper on LSM was published in 1996, and if you need a single takeaway, the following quote is the one: “The LSM-tree uses an algorithm that defers and batches index changes, cascading the changes from a memory-based component through one or more disk components in an efficient manner reminiscent of merge sort.”  At the time, disks were slow and IOPS expensive, and the idea was to minimize the write costs by essentially turning random write load into a sequential one. The technology is quite popular, being a foundation or inspiration in a multitude of databases and storage engines: HBase, LevelDB, RocksDB, Tarantool, WiredTiger, and more. Even in 2020, when storage is faster and cheaper, LSM-tree can still provide substantial benefits for some workloads.

The development of MyRocks was started around 2015 by Facebook. Yoshinori Matsunobu gave multiple presentations, detailing the reasoning behind using RocksDB inside MySQL. They were underutilizing the servers because they were constrained in disk space and MyRocks allowed for better space efficiency. This better space efficiency is inherent for LSM tree storage engines.

So far, MyRocks continues to be a somewhat niche solution, and, frankly, not a lot of people know about it and consider its use. Without further ado, let’s see how it works and why would you want to use it.

Working Dynamics of MyRocks

MyRocks engine is based on LSM-tree structure, which we have mentioned above. That makes it a very different beast than InnoDB. So let’s take a high-level overview of MyRocks internals. First, how does row-based data fit into key-value store? You can think of a regular clustered index as a key-value structure on its own: there’s a key, which value is a whole row. Secondary indexes can have primary indexes’ key as value, and additionally a column data value.

Writes

All writes in MyRocks are done sequentially to a special structure called memtable, one of the few mutable structures in the engine. Since we need the writes to actually be durable, all writes are also written to WAL (a concept similar to InnoDB redo log), which is flushed to disk. Once the memtable becomes full, it’s copied in memory and made immutable. In the background, the immutable memtables will be flushed to disk in the form of sorted string tables (SSTs), forming the L0 of the multi-leveled compaction scheme. During this initial flush, changes in the memtable are deduplicated (a thousand updates for one key become a single update). Resulting SSTs are immutable, and, on L0, have overlapping data.

As more SSTs are created on L0, they will start to pour over to L1…L6. On each level after L0, data within SSTs is not overlapping, thus compaction can proceed in parallel. Compaction takes an SST from the higher level, and merges it with one (or more) SSTs on the lower level, deleting the originals and creating new ones on the lower level. Eventually, data reaches the lowest level. As you can see below, each level has more and more data, so most data is actually stored at the lower levels. The merge mentioned happens for Key Value pairs, and during the merge KV on the lower level will always be older than KV on the higher one, and thus can be discarded.

LSM Leveled Compaction

 

Having immutable SSTs allows them to be filled to 100% all the time, improving space utilization. In fact, that’s one of the selling points of MyRocks, as it allows for greater space efficiency. In addition to the inherent compactness of the SSTs, data there is also compressed, which further minimizes the footprint. An interesting feature here is that you can specify different compression algorithms for the bottommost (where, by nature, most of the data is) and other levels.

Another important component for the MyRocks engine is Column Family (CF). Each key-value pair (or, in familiar terms, each index) is associated with a CF. Quoting the Percona Server for MySQL docs: “Each column family has distinct attributes, such as block size, compression, sort order, and MemTable.” In addition to controlling physical storage characteristics, this provides atomicity for queries across different key spaces.
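A hedged sketch of how this looks in practice (the column family names below are made up, and the exact comment syntax may differ between MyRocks builds): a column family is assigned per index through the index comment, with a 'rev:' prefix requesting a reverse-ordered column family.

CREATE TABLE user_events (
  id BIGINT NOT NULL,
  user_id BIGINT NOT NULL,
  payload VARCHAR(255),
  PRIMARY KEY (id) COMMENT 'cf_events',          -- stored in column family cf_events
  KEY idx_user (user_id) COMMENT 'rev:cf_user'   -- reverse-ordered column family
) ENGINE=ROCKSDB;
-- Per-CF attributes (block size, compression, ...) are then tuned through the
-- rocksdb_*_cf_options server settings / my.cnf rather than per table.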

Reads

So far we’ve only been talking about writing the data. Reading it is also quite different in MyRocks due to its structure. Since the data is leveled, to find a value for a key, you need to look at memtables, L0, L1 … L6. This is an LSM read penalty. However, you don’t always have to scan the whole dataset to find the row, and not all scans go to disk. The read path starts in memtables, which will by definition have the most recent data. Then the block cache will be used, which might contain the recently-accessed data.

Once in-memory options are exhausted, reads will spill to disk and start traversing SSTs on consecutive levels. L0 has to be scanned whole since data in SSTs overlaps, but only a subset of SSTs on other levels has to be scanned, as we know key ranges of data inside each SST. To further improve this scanning, bloom filters are utilized, which helps the scan operation answer a question: “is key present in given SST?” – but only if we are sure it’s not present. Thus, we can avoid reading some SSTs, whose key range covers the key we look for. Unfortunately, for now, there’s no BF-like technique for range scans, though prefix bloom filters might help.

Each time we find the data we’re looking for, we populate the block cache for future use. In addition to that, index and bloom filter data is also cached, thus speeding up the SST scans even if the data is not in block cache. Even with all of these improvements, you can see that in general, the reads are more involved than they are in regular b-tree storage engines. The negative effects, however, become less pronounced the more data there’s in the data set.

Tools and Utilities

Production readiness of a solution is defined not only by its own maturity but also by the ecosystem around it. Let’s review how MyRocks fits with existing tools and regular maintenance activities.

First and foremost, can we back it up online with minimal locking, as we can with InnoDB? The answer is yes (with some catches). Facebook's original MySQL 5.6 includes the myrocks_hotbackup script, which enables hot backups of MyRocks, but no other engines. Starting with Percona XtraBackup version 8.0.6 and Mariabackup 10.2.16/10.3.8, we have the ability to use a single tool to back up heterogeneous clusters.

One of the significant MyRocks limitations is that it doesn’t support online DDL as InnoDB does. You can use solutions like pt-online-schema-change and gh-ost, which are preferred anyway when doing large table changes. For pt-osc, there are some details to note. Global transaction isolation should be set to Read Committed, or pt-osc will fail when a target table is already in RocksDB engine. It also needs binlog_format to be set to ROW. Both of these settings are usually advisable for MyRocks anyway, as it doesn’t support gap locking yet, and so its repeatable read implementation differs.
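A minimal sketch of the settings mentioned above (they can also, and usually should, be persisted in my.cnf; SET GLOBAL only affects new sessions):

SET GLOBAL TRANSACTION ISOLATION LEVEL READ COMMITTED;
SET GLOBAL binlog_format = 'ROW';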

Because we’re limited to ROW-level replication, tools like pt-table-checksum and pt-table-sync will not work, so be careful with the data consistency.

Monitoring is another important consideration for production use. MyRocks is quite well-instrumented internally, providing more than a hundred metrics, extensive show engine output, and verbose logging. Here’s an overview of some of the available metrics: MyRocks Information Schema. With Percona Monitoring and Management, you get a dedicated dashboard for MyRocks, providing an overview of the internals of the engine.

Partitioning in MyRocks is supported and has an interesting feature where you can assign partitions to different column families: Column Families on Partitioned Tables.

Unfortunately, for now, encryption does not work with MyRocks, even though RocksDB supports pluggable encryption.

Load Test and Comparison Versus InnoDB

We have compiled a basic load test on MyRocks vs InnoDB with the following details. 

We downloaded the Ontime Performance Data Reporting for the year 2019 and loaded it into both engines. The test consisted of loading one year's worth of data (about 14 million rows) into a single table. Load scripts can be found in the GitHub repo.

AWS Instance : t2.large – 8Gb Ram – 16Gb SSD

| Engine                           | Size    | Duration | Rows       | Method      |
|----------------------------------|---------|----------|------------|-------------|
| innodb + log_bin off             | 5.6Gb   | 9m56     | 14,009,743 | Load Infile |
| innodb + log_bin on              | 5.6Gb **| 11m58    | 14,009,743 | Load Infile |
| innodb compressed + log_bin on   | 2.6Gb **| 17m9     | 14,009,743 | Load Infile |
| innodb compressed + log_bin off  | 2.6Gb   | 15m56    | 14,009,743 | Load Infile |
| myrocks/lz4 + log_bin on         | 1.4G*   | 9m24     | 14,009,743 | Load Infile |
| myrocks/lz4 + log_bin off        | 1.4G*   | 8m2      | 14,009,743 | Load Infile |

 

* MyRocks WAL files aren’t included (This is a configurable parameter) 

**InnoDB Redo logs aren’t included

Conclusion

As we’ve shown above, MyRocks can be a surprisingly versatile choice of the storage engine. While usually it’s sold on space efficiency and write load, benchmarks show that it’s quite good in TPC-C workload. So when would you use MyRocks?

 In the simplest terms:

  • You have extremely large data sets, much bigger than the memory available
  • The bulk of your load is write-only
  • You need to save on space

This best translates to servers with expensive storage (SSDs), and to the cloud, where these could be significant price points.

But real databases rarely consist of pure log data. We do selects, be it point lookups or range queries, we modify the data. As it happens, if you can sacrifice some database-side constraints, MyRocks can be surprisingly good as a general-purpose storage engine, more so the larger the data set you have. Give it a try, and let us know. 

Limitations to consider before moving forward:

  • Foreign Keys
  • Full-Text Keys
  • Spatial Keys
  • No Tablespaces (instead, Column Families)
  • No Online DDL (pt-osc and gh-ost help here)
  • Other limitations listed in the documentation
  • Not supported by Percona XtraDB Cluster/Galera
  • Only binary collations supported for indexes

Warnings:

It’s designed for small transactions, so configure for bulk operations. For loading data, use rocksdb_bulk_load=1, and for deleting large data sets use rocksdb-commit-in-the-middle.
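A rough sketch of what that looks like (the table and file names here are hypothetical):

-- Bulk load: rows are written out as SST files directly, bypassing the memtable;
-- data should arrive in primary key order unless rocksdb_bulk_load_allow_unsorted=1.
SET SESSION rocksdb_bulk_load = 1;
LOAD DATA INFILE '/tmp/ontime_2019.csv' INTO TABLE ontime;
SET SESSION rocksdb_bulk_load = 0;  -- turning it off finalizes the load

-- Large deletes: commit in chunks instead of building one huge transaction.
SET SESSION rocksdb_commit_in_the_middle = 1;
DELETE FROM ontime WHERE FlightDate < '2019-01-01';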

Mixing different storage engines in one transaction will work, but be aware of the differences of how isolation levels work between InnoDB and RocksDB engines, and limitations like the lack of Savepoints. Another important thing to note when mixing storage engines is that they use different memory structures, so plan carefully.

Corrupted immutable files are not recoverable.

References 

MyRocks Deep Dive

Exposing MyRocks Internals Via System Variables: Part 1, Data Writing

Webinar: How to Rock with MyRocks

MyRocks Troubleshooting

MyRocks Introduction

Optimizer Statistics in MyRocks

MyRocks and InnoDB: a summary

RocksDB Is Eating the Database World

by Alkin Tezuysal at February 20, 2020 04:47 PM

SeveralNines

How to Protect your MySQL or MariaDB Database From SQL Injection: Part Two

In the first part of this blog we described how ProxySQL can be used to block incoming queries that were deemed dangerous. As you saw in that blog, achieving this is very easy. This is not a full solution, though. You may need to design an even more tightly secured setup - you may want to block all of the queries and then allow just some select ones to pass through. It is possible to use ProxySQL to accomplish that. Let’s take a look at how it can be done.

There are two ways to implement whitelist in ProxySQL. First, the historical one, would be to create a catch-all rule that will block all the queries. It should be the last query rule in the chain. An example below:

We match every string and generate an error message. As this is the only rule existing at this time, it prevents any query from being executed.
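A catch-all rule along these lines, executed on the ProxySQL admin interface, produces the behaviour shown below (a sketch; the rule_id is arbitrary, as long as the rule stays last in the chain):

INSERT INTO mysql_query_rules (rule_id, active, match_digest, error_msg, apply)
VALUES (1000, 1, '.',
        'This query is not on the whitelist, you have to create a query rule before you''ll be able to execute it.',
        1);
LOAD MYSQL QUERY RULES TO RUNTIME;
SAVE MYSQL QUERY RULES TO DISK;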

mysql> USE sbtest;

Database changed

mysql> SELECT * FROM sbtest1 LIMIT 10;

ERROR 1148 (42000): This query is not on the whitelist, you have to create a query rule before you'll be able to execute it.

mysql> SHOW TABLES FROM sbtest;

ERROR 1148 (42000): This query is not on the whitelist, you have to create a query rule before you'll be able to execute it.

mysql> SELECT 1;

ERROR 1148 (42000): This query is not on the whitelist, you have to create a query rule before you'll be able to execute it.

As you can see, we can’t run any queries. In order for our application to work we would have to create query rules for all of the queries that we want to allow to execute. It can be done per query, based on the digest or pattern. You can also allow traffic based on the other factors: username, client host, schema. Let’s allow SELECTs to one of the tables:
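A whitelisting rule could look roughly like this (a sketch; the rule_id must be lower than the catch-all rule's, and the destination hostgroup depends on your setup):

INSERT INTO mysql_query_rules (rule_id, active, username, schemaname,
                               match_digest, destination_hostgroup, apply)
VALUES (10, 1, 'sbtest', 'sbtest', '^SELECT .* FROM sbtest1', 20, 1);
LOAD MYSQL QUERY RULES TO RUNTIME;
SAVE MYSQL QUERY RULES TO DISK;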

Now we can execute queries on this table, but not on any other:

mysql> SELECT id, k FROM sbtest1 LIMIT 2;

+------+------+

| id   | k |

+------+------+

| 7615 | 1942 |

| 3355 | 2310 |

+------+------+

2 rows in set (0.01 sec)

mysql> SELECT id, k FROM sbtest2 LIMIT 2;

ERROR 1148 (42000): This query is not on the whitelist, you have to create a query rule before you'll be able to execute it.

The problem with this approach is that it is not handled efficiently in ProxySQL; therefore ProxySQL 2.0.9 comes with a new firewalling mechanism, which includes a new algorithm focused on this particular use case and, as such, more efficient. Let's see how we can use it.

First, we have to install ProxySQL 2.0.9. You can download packages manually from https://github.com/sysown/proxysql/releases/tag/v2.0.9 or you can set up the ProxySQL repository.

 

Once this is done, we can start looking into it and try to configure it to use SQL firewall. 

The process itself is quite easy. First of all, you have to add a user to the mysql_firewall_whitelist_users table. It contains all the users for which firewall should be enabled.

mysql> INSERT INTO mysql_firewall_whitelist_users (username, client_address, mode, comment) VALUES ('sbtest', '', 'DETECTING', '');

Query OK, 1 row affected (0.00 sec)

mysql> LOAD MYSQL FIREWALL TO RUNTIME;

Query OK, 0 rows affected (0.00 sec)

In the query above we added the ‘sbtest’ user to the list of users which should have the firewall enabled. It is possible to specify that only connections from a given host are tested against the firewall rules. There are three modes: ‘OFF’, when the firewall is not used; ‘DETECTING’, where incorrect queries are logged but not blocked; and ‘PROTECTING’, where queries that are not allowed will not be executed.

Let’s enable our firewall:

mysql> SET mysql-firewall_whitelist_enabled=1;

Query OK, 1 row affected (0.00 sec)

mysql> LOAD MYSQL VARIABLES TO RUNTIME;

Query OK, 0 rows affected (0.00 sec)

The ProxySQL firewall is based on query digests; it does not allow regular expressions to be used. The best way to collect data about which queries should be allowed is to use the stats.stats_mysql_query_digest table, where you can collect queries and their digests. On top of that, ProxySQL 2.0.9 comes with a new table: history_mysql_query_digest, which is a persistent extension to the previously mentioned in-memory table. You can configure ProxySQL to store data on disk from time to time:

mysql> SET admin-stats_mysql_query_digest_to_disk=30;

Query OK, 1 row affected (0.00 sec)

Every 30 seconds data about queries will be stored on disk. Let's see how it goes. We'll execute a couple of queries and then check their digests:

mysql> SELECT schemaname, username, digest, digest_text FROM history_mysql_query_digest;

+------------+----------+--------------------+-----------------------------------+

| schemaname | username | digest             | digest_text |

+------------+----------+--------------------+-----------------------------------+

| sbtest     | sbtest | 0x76B6029DCBA02DCA | SELECT id, k FROM sbtest1 LIMIT ? |

| sbtest     | sbtest | 0x1C46AE529DD5A40E | SELECT ?                          |

| sbtest     | sbtest | 0xB9697893C9DF0E42 | SELECT id, k FROM sbtest2 LIMIT ? |

+------------+----------+--------------------+-----------------------------------+

3 rows in set (0.00 sec)

As we set the firewall to ‘DETECTING’ mode, we’ll also see entries in the log:

2020-02-14 09:52:12 Query_Processor.cpp:2071:process_mysql_query(): [WARNING] Firewall detected unknown query with digest 0xB9697893C9DF0E42 from user sbtest@10.0.0.140

2020-02-14 09:52:17 Query_Processor.cpp:2071:process_mysql_query(): [WARNING] Firewall detected unknown query with digest 0x76B6029DCBA02DCA from user sbtest@10.0.0.140

2020-02-14 09:52:20 Query_Processor.cpp:2071:process_mysql_query(): [WARNING] Firewall detected unknown query with digest 0x1C46AE529DD5A40E from user sbtest@10.0.0.140

Now, if we want to start blocking queries, we should update our user and set the mode to ‘PROTECTING’. This will block all the traffic so let’s start by whitelisting queries above. Then we’ll enable the ‘PROTECTING’ mode:

mysql> INSERT INTO mysql_firewall_whitelist_rules (active, username, client_address, schemaname, digest, comment) VALUES (1, 'sbtest', '', 'sbtest', '0x76B6029DCBA02DCA', ''), (1, 'sbtest', '', 'sbtest', '0xB9697893C9DF0E42', ''), (1, 'sbtest', '', 'sbtest', '0x1C46AE529DD5A40E', '');

Query OK, 3 rows affected (0.00 sec)

mysql> UPDATE mysql_firewall_whitelist_users SET mode='PROTECTING' WHERE username='sbtest' AND client_address='';

Query OK, 1 row affected (0.00 sec)

mysql> LOAD MYSQL FIREWALL TO RUNTIME;

Query OK, 0 rows affected (0.00 sec)

mysql> SAVE MYSQL FIREWALL TO DISK;

Query OK, 0 rows affected (0.08 sec)

That’s it. Now we can execute whitelisted queries:

mysql> SELECT id, k FROM sbtest1 LIMIT 2;

+------+------+

| id   | k |

+------+------+

| 7615 | 1942 |

| 3355 | 2310 |

+------+------+

2 rows in set (0.00 sec)

But we cannot execute non-whitelisted ones:

mysql> SELECT id, k FROM sbtest3 LIMIT 2;

ERROR 1148 (42000): Firewall blocked this query

ProxySQL 2.0.9 comes with yet another interesting security feature. It has embedded libsqlinjection and you can enable the detection of possible SQL injections. Detection is based on the algorithms from the libsqlinjection. This feature can be enabled by running:

mysql> SET mysql-automatic_detect_sqli=1;

Query OK, 1 row affected (0.00 sec)

mysql> LOAD MYSQL VARIABLES TO RUNTIME;

Query OK, 0 rows affected (0.00 sec)

It works with the firewall in the following way:

  • If the firewall is enabled and the user is in PROTECTING mode, SQL injection detection is not used as only explicitly whitelisted queries can pass through.
  • If the firewall is enabled and the user is in DETECTING mode, whitelisted queries are not tested for SQL injection, all others will be tested.
  • If the firewall is enabled and the user is in ‘OFF’ mode, all queries are assumed to be whitelisted and none will be tested for SQL injection.
  • If the firewall is disabled, all queries will be tested for SQL injection.

Basically, it is used only if the firewall is disabled or for users in ‘DETECTING’ mode. SQL injection detection, unfortunately, comes with quite a lot of false positives. You can use table mysql_firewall_whitelist_sqli_fingerprints to whitelist fingerprints for queries which were detected incorrectly. Let’s see how it works. First, let’s disable firewall:

mysql> set mysql-firewall_whitelist_enabled=0;

Query OK, 1 row affected (0.00 sec)

mysql> LOAD MYSQL VARIABLES TO RUNTIME;

Query OK, 0 rows affected (0.00 sec)

Then, let’s run some queries.

mysql> SELECT id, k FROM sbtest2 LIMIT 2;

ERROR 2013 (HY000): Lost connection to MySQL server during query

Indeed, there are false positives. In the log we could find:

2020-02-14 10:11:19 MySQL_Session.cpp:3393:handler(): [ERROR] SQLinjection detected with fingerprint of 'EnknB' from client sbtest@10.0.0.140 . Query listed below:

SELECT id, k FROM sbtest2 LIMIT 2

Ok, let’s add this fingerprint to the whitelist table:

mysql> INSERT INTO mysql_firewall_whitelist_sqli_fingerprints VALUES (1, 'EnknB');

Query OK, 1 row affected (0.00 sec)

mysql> LOAD MYSQL FIREWALL TO RUNTIME;

Query OK, 0 rows affected (0.00 sec)

Now we can finally execute this query:

mysql> SELECT id, k FROM sbtest2 LIMIT 2;

+------+------+

| id   | k |

+------+------+

|   84 | 2456 |

| 6006 | 2588 |

+------+------+

2 rows in set (0.01 sec)

We tried to run a sysbench workload; this resulted in two more fingerprints being added to the whitelist table:

2020-02-14 10:15:55 MySQL_Session.cpp:3393:handler(): [ERROR] SQLinjection detected with fingerprint of 'Enknk' from client sbtest@10.0.0.140 . Query listed below:

SELECT c FROM sbtest21 WHERE id=49474

2020-02-14 10:16:02 MySQL_Session.cpp:3393:handler(): [ERROR] SQLinjection detected with fingerprint of 'Ef(n)' from client sbtest@10.0.0.140 . Query listed below:

SELECT SUM(k) FROM sbtest32 WHERE id BETWEEN 50053 AND 50152

We wanted to see if this automated SQL injection detection can protect us against our good friend, Bobby Tables.

mysql> CREATE TABLE school.students (id INT, name VARCHAR(40));

Query OK, 0 rows affected (0.07 sec)

mysql> INSERT INTO school.students VALUES (1, 'Robert');DROP TABLE students;--

Query OK, 1 row affected (0.01 sec)

Query OK, 0 rows affected (0.04 sec)

mysql> SHOW TABLES FROM school;

Empty set (0.01 sec)

Unfortunately, not really. Please keep in mind this feature is based on automated forensic algorithms and it is far from perfect. It may serve as an additional layer of defence, but it will never be able to replace a properly maintained firewall created by someone who knows the application and its queries.

We hope that after reading this short, two-part series you have a better understanding of how you can protect your database against SQL injection and malicious attempts (or just plainly user errors) using ProxySQL. If you have more ideas, we’d love to hear from you in the comments.

by krzysztof at February 20, 2020 10:45 AM

February 19, 2020

SeveralNines

When Should I Add an Extra Database Node?

The fact that people are not easily convinced to have an additional database node in production due to cost is somewhat absurd and is an idea that should be put aside. While adding a new node would bring more complexity to the current database infrastructure, there is a plethora of automation and helper tools in the market that can help you manage the scalability and continuity of the database layer. 

There are diverse reasons that may influence this somewhat costly decision, and you will probably realize it only when something is going south or starting to fall apart. This blog post provides common reasons when you should add an extra database node into your existing database infrastructure, whether you are running on a standalone or a clustered setup.

Faster Recovery Time

The ultimate reason for having an extra database node for redundancy is to achieve better availability and faster recovery time when something goes wrong. It's a protection against malfunctions that could occur on the primary database node and you would have a standby node which is ready to take over the primary role from the problematic node at any given time. 

A standby node replicating to a primary node is probably the most cost-effective solution that you can have to improve the recovery time. When the primary database node is down, promote the standby node as the new master and change the database connection string in the applications to connect to the new master and you are pretty much back in business. The failover process can then be automated and fine tuned over time, or you could introduce a reverse proxy tier which acts as the gateway on top of the database tier.
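As a minimal sketch, assuming a classic MySQL asynchronous replication pair (other database technologies have their own equivalents), a manual promotion boils down to something like:

-- On the standby, once the primary is confirmed dead and the relay logs are applied:
STOP SLAVE;
RESET SLAVE ALL;           -- this node no longer replicates from anyone
SET GLOBAL read_only = 0;  -- start accepting writes
-- Then repoint the application connection string (or the VIP / reverse proxy) here.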

Improved Performance

Applications grow more demanding over time. The magnitude of growth could be exponential, depending on the success of your business. Scaling out your database tier to cater for bigger workloads is commonly necessary to improve the performance and responsiveness of your applications. 

Database workloads can be categorized into two types - reads and writes. For a read-intensive workload, adding more database replicas will help to spread the load out over multiple database servers. For a write-intensive workload, adding more database masters will likely reduce the contention that commonly happens on a single node and improve processing parallelism. Just make sure that the multi-master clustering technology that you use supports conflict detection and resolution, otherwise the application has to handle this part separately.

Approaching the Thresholds

As your database usage grows, there will be a point of time where the database node is fast approaching the defined threshold for the server and database resources. Resources like CPU clock, RAM, disk I/O and disk space are frequently becoming the limiting factors for your database to keep up with the demand.

For example, one would probably hit the limit of storage allocated for the database while also approaching the maximum number of connections allowed to the database. In this case, partitioning your data across multiple nodes makes more sense, because you get more storage space and I/O capacity along with the ability to process bigger write workloads for the database, just like killing two birds with one stone.

Upgrade Testing

Before upgrading to another major version, it's recommended to test out your current dataset on the new version just to make sure you can operate smoothly and eliminate the element of surprise later on. It's pretty common for the new major version to deprecate some legacy options or parameters that we have been using in the current version and some incompatibilities where application programming changes might be required. Also, you can measure the performance improvement (or regression) that you will get after upgrading, which could justify the reason for this exercise.

A major version upgrade commonly requires extra attention to the upgrade steps, compared to minor version patching, which can usually be performed in a few steps. Minor releases never change the internal storage format and are always compatible with earlier and later minor releases of the same major version number.

Generally, there are 3 ways to perform database major version upgrade:

  • In-place
  • Logical upgrade
  • Replication

In-place, where you use the existing data directory against the new database major version, just running the upgrade script after the binaries are upgraded. For a logical upgrade, take a logical backup on the old version and then restore it on the new version. This usually requires an additional database node, unless you would like to restore the logical backup on the new version installed on the same server as the old one.

For replication, create a standby server with the updated database version and replicate from the old major version. Once everything is synced up, connect your application to the standby (or slave) server and verify whether any adjustments are required. Then, you can promote the standby server as the new master and your database server is officially upgraded, with very minimal downtime.

Backup Verification

We have stressed this a couple of times in older blog posts - a backup is not a backup if it is not restorable. Backup verification is an important process to ensure you meet your RTO, which basically represents how long it takes to restore from the incident until normal operations are available to the users.

You can measure the amount of time it takes to recover by observing the backup verification process, which is best performed on a separate node, as you don't want to increase the burden on, or put at risk, the production database servers.

The ClusterControl backup verification feature allows you to estimate your total mean recovery time, and the extra database node used for the verification process can be configured to shut down automatically right after the verification process completes. Check out this blog post if you want to learn more about how ClusterControl performs this job.

Conclusion

As your database grows, scaling out your database nodes is going to be necessary and must be well thought out from the beginning. The actual cost of having more database nodes in your environment is sometimes justified by your requirements, and could be more than worth it to keep up with the growth of your business.

 

by ashraf at February 19, 2020 10:45 AM

February 18, 2020

SeveralNines

What to Check if PostgreSQL Memory Utilization is High

Reading from memory will always be more performant than going to disk, so for all database technologies you would want to use as much memory as possible. If you are not sure about the configuration, or you have an error, this could generate high memory utilization or even an out-of-memory issue.

In this blog, we’ll look at how to check your PostgreSQL memory utilization and which parameter you should take into account to tune it. For this, let’s start by seeing an overview of PostgreSQL's architecture.

PostgreSQL Architecture

PostgreSQL's architecture is based on three fundamental parts: Processes, Memory, and Disk.

The memory can be classified into two categories:

  • Local Memory: It is loaded by each backend process for its own use for queries processing. It is divided into sub-areas:
    • Work mem: The work mem is used for sorting tuples by ORDER BY and DISTINCT operations, and for joining tables.
    • Maintenance work mem: Some kinds of maintenance operations use this area. For example, VACUUM, if you’re not specifying autovacuum_work_mem.
    • Temp buffers: It is used to store temporary tables.
  • Shared Memory: It is allocated by the PostgreSQL server when it is started, and it is used by all the processes. It is divided into sub-areas:
    • Shared buffer pool: Where PostgreSQL loads pages with tables and indexes from disk, to work directly from memory, reducing the disk access.
    • WAL buffer: The WAL data is the transaction log in PostgreSQL and contains the changes in the database. The WAL buffer is the area where the WAL data is stored temporarily before writing it to disk into the WAL files. This is done at a predefined interval, called a checkpoint. This is very important to avoid the loss of information in the event of a server failure.
    • Commit log: It saves the status of all transactions for concurrency control.

How to Know What is Happening

If you are having high memory utilization, first, you should confirm which process is generating the consumption.

Using the “Top” Linux Command

The top linux command is probably the best option here (or even a similar one like htop). With this command, you can see the process/processes that are consuming too much memory. 

When you confirm that PostgreSQL is responsible for this issue, the next step is to check why.

Using the PostgreSQL Log

Checking both the PostgreSQL and system logs is definitely a good way to get more information about what is happening in your database/system. You could see messages like:

Resource temporarily unavailable

Out of memory: Kill process 1161 (postgres) score 366 or sacrifice child

If you don’t have enough free memory.

Or even multiple database message errors like:

FATAL:  password authentication failed for user "username"

ERROR:  duplicate key value violates unique constraint "sbtest21_pkey"

ERROR:  deadlock detected

when you are having some unexpected behavior on the database side. So, the logs are useful to detect these kinds of issues, and more. You can automate this monitoring by parsing the log files looking for words like “FATAL”, “ERROR” or “Kill”, so you will receive an alert when it happens.

Using Pg_top

If you know that the PostgreSQL process is having a high memory utilization, but the logs didn’t help, you have another tool that can be useful here, pg_top.

This tool is similar to the top linux tool, but it's specifically for PostgreSQL. So, using it, you will have more detailed information about what is running in your database, and you can even kill queries or run an explain job if you detect something wrong. You can find more information about this tool here.

But what happens if you can't detect any error and the database is still using a lot of RAM? Then you will probably need to check the database configuration.

Which Configuration Parameters to Take into Account

If everything looks fine but you still have the high utilization problem, you should check the configuration to confirm if it is correct. So, the following are parameters that you should take into account in this case.

shared_buffers

This is the amount of memory that the database server uses for shared memory buffers. If this value is too low, the database would use more disk, which would cause more slowness, but if it is too high, it could generate high memory utilization. According to the documentation, if you have a dedicated database server with 1GB or more of RAM, a reasonable starting value for shared_buffers is 25% of the memory in your system.

work_mem

It specifies the amount of memory that will be used by the ORDER BY, DISTINCT and JOIN before writing to the temporary files on disk. As with the shared_buffers, if we configure this parameter too low, we can have more operations going into disk, but too high is dangerous for the memory usage. The default value is 4 MB.

max_connections

work_mem also goes hand in hand with the max_connections value, as each connection will be executing these operations at the same time, and each operation will be allowed to use as much memory as specified by this value before it starts to write data to temporary files. This parameter determines the maximum number of simultaneous connections to our database; if we configure a high number of connections and don't take this into account, we can start having resource issues. The default value is 100.

temp_buffers

The temporary buffers are used to store the temporary tables used in each session. This parameter sets the maximum amount of memory for this task. The default value is 8 MB.

maintenance_work_mem

This is the maximum memory that an operation like vacuuming, adding indexes, or adding foreign keys can consume. The good thing is that only one operation of this type can run per session, and it is not common to have several of these running at the same time on the system. The default value is 64 MB.

autovacuum_work_mem

The vacuum uses the maintenance_work_mem by default, but we can separate it using this parameter. We can specify the maximum amount of memory to be used by each autovacuum worker here.

wal_buffers

The amount of shared memory used for WAL data that has not yet been written to disk. The default setting is 3% of shared_buffers, but not less than 64kB nor more than the size of one WAL segment, typically 16MB. 
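To see where you stand, a quick sketch of checking the parameters discussed above from psql (and adjusting one of the reloadable ones) could be:

SELECT name, setting, unit, source
  FROM pg_settings
 WHERE name IN ('shared_buffers', 'work_mem', 'max_connections', 'temp_buffers',
                'maintenance_work_mem', 'autovacuum_work_mem', 'wal_buffers');

-- work_mem can be changed with a reload; shared_buffers needs a restart.
ALTER SYSTEM SET work_mem = '8MB';
SELECT pg_reload_conf();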

Conclusion

There are different reasons to have a high memory utilization, and detecting the root issue could be a time-consuming task. In this blog, we mentioned different ways to check your PostgreSQL memory utilization and which parameter should you take into account to tune it, to avoid excessive memory usage.

by Sebastian Insausti at February 18, 2020 07:20 PM

Shlomi Noach

The state of Orchestrator, 2020 (spoiler: healthy)

This post serves as a pointer to my previous announcement about The state of Orchestrator, 2020.

Thank you to Tom Krouper who applied his operational engineer expertise to content publishing problems.

by shlomi at February 18, 2020 07:14 PM

Oli Sennhauser

InnoDB Page Cleaner intended loop takes too long

Recently we migrated a database system from MySQL 5.7 to MariaDB 10.3. Everything went fine so far just the following message started to pop-up in the MariaDB Error Log File with the severity Note:

InnoDB: page_cleaner: 1000ms intended loop took 4674ms. The settings might not be optimal. (flushed=102 and evicted=0, during the time.)

I remember that this message also appeared in earlier MySQL 5.7 releases but somehow disappeared in later releases. I assume MySQL has just disabled the Note?

You can find various pieces of advice on the Internet on how to get rid of this Note:

innodb_lru_scan_depth        = 1024, 256
innodb_buffer_pool_instances = 1, 8
innodb_io_capacity           = 100, 200 or 1000
innodb_page_cleaners         = 1, 4 or 8
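For what it is worth, two of these can be tried at runtime, the other two require a restart (a sketch, with values as suggested above):

SET GLOBAL innodb_lru_scan_depth = 256;
SET GLOBAL innodb_io_capacity    = 1000;
-- innodb_buffer_pool_instances and innodb_page_cleaners are not dynamic and have
-- to be changed in the configuration file, followed by a restart.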

But none of these changes made the Note go away in our case. I only found one voice claiming that it could be an external reason which makes this message appear. Because we are actually running on a Cloud machine, the appearance of this message could really be an effect of the Cloud and not caused by the Database or the Application.

We further know that our MariaDB Database has a more or less uniform workload over the day. Further it is a Master/Master (active/passive) set-up. So both nodes should see more or less the same write load at the same time.

But our investigation clearly shows that the Note does not appear at the same time on both nodes. So I strongly assume it is a noisy-neighbour problem.

First we tried to find any trend or correlation between these 2 Master/Master Databases maas1 and maas2:

What we can see here is that the message appeared on different days on maas1 and maas2. The database maas1 had a problem at the beginning of December and at the end of January. Database maas1 had far fewer problems in general, but at the end of December there was a problem.

During the night both instances seem to have fewer problems than during the day. And maas2 has more problems in the afternoon and evening.

If we look at the distribution per minute we can see that maas2 has some problems around xx:45 to xx:50 and maas1 more at xx:15.

Then we had a closer look at 28 January at about 12:00 to 15:00 on maas2:

We cannot see any anomalies which would explain a huge increase of dirty pages and a stuck page_cleaner.

The only thing we could see at the specified time is that I/O latency increased significantly on the server side. Because we did not cause more load or over-saturate the system ourselves, it must have been triggered externally:

This correlates quite well to the Notes we see in the MariaDB Error Log on maas2:

2020-01-28 12:45:27 0 [Note] InnoDB: page_cleaner: 1000ms intended loop took 5760ms. The settings might not be optimal. (flushed=101 and evicted=0, during the time.)
2020-01-28 12:46:00 0 [Note] InnoDB: page_cleaner: 1000ms intended loop took 6908ms. The settings might not be optimal. (flushed=4 and evicted=0, during the time.)
2020-01-28 12:46:32 0 [Note] InnoDB: page_cleaner: 1000ms intended loop took 5339ms. The settings might not be optimal. (flushed=17 and evicted=0, during the time.)
2020-01-28 12:47:36 0 [Note] InnoDB: page_cleaner: 1000ms intended loop took 4379ms. The settings might not be optimal. (flushed=101 and evicted=0, during the time.)
2020-01-28 12:48:08 0 [Note] InnoDB: page_cleaner: 1000ms intended loop took 5053ms. The settings might not be optimal. (flushed=7 and evicted=0, during the time.)
2020-01-28 12:48:42 0 [Note] InnoDB: page_cleaner: 1000ms intended loop took 5760ms. The settings might not be optimal. (flushed=102 and evicted=0, during the time.)
2020-01-28 12:49:38 0 [Note] InnoDB: page_cleaner: 1000ms intended loop took 4202ms. The settings might not be optimal. (flushed=100 and evicted=0, during the time.)
2020-01-28 12:57:28 0 [Note] InnoDB: page_cleaner: 1000ms intended loop took 4615ms. The settings might not be optimal. (flushed=18 and evicted=0, during the time.)
2020-01-28 12:58:01 0 [Note] InnoDB: page_cleaner: 1000ms intended loop took 5593ms. The settings might not be optimal. (flushed=4 and evicted=0, during the time.)
2020-01-28 12:58:34 0 [Note] InnoDB: page_cleaner: 1000ms intended loop took 5442ms. The settings might not be optimal. (flushed=101 and evicted=0, during the time.)
2020-01-28 12:59:31 0 [Note] InnoDB: page_cleaner: 1000ms intended loop took 4327ms. The settings might not be optimal. (flushed=101 and evicted=0, during the time.)
2020-01-28 13:00:05 0 [Note] InnoDB: page_cleaner: 1000ms intended loop took 5154ms. The settings might not be optimal. (flushed=82 and evicted=0, during the time.)
2020-01-28 13:08:01 0 [Note] InnoDB: page_cleaner: 1000ms intended loop took 4321ms. The settings might not be optimal. (flushed=4 and evicted=0, during the time.)
2020-01-28 13:10:46 0 [Note] InnoDB: page_cleaner: 1000ms intended loop took 21384ms. The settings might not be optimal. (flushed=100 and evicted=0, during the time.)
2020-01-28 13:14:16 0 [Note] InnoDB: page_cleaner: 1000ms intended loop took 4180ms. The settings might not be optimal. (flushed=20 and evicted=0, during the time.)
2020-01-28 13:14:49 0 [Note] InnoDB: page_cleaner: 1000ms intended loop took 4935ms. The settings might not be optimal. (flushed=4 and evicted=0, during the time.)
2020-01-28 13:15:20 0 [Note] InnoDB: page_cleaner: 1000ms intended loop took 4472ms. The settings might not be optimal. (flushed=25 and evicted=0, during the time.)
2020-01-28 13:15:47 0 [Note] InnoDB: page_cleaner: 1000ms intended loop took 4358ms. The settings might not be optimal. (flushed=9 and evicted=0, during the time.)
2020-01-28 13:48:31 0 [Note] InnoDB: page_cleaner: 1000ms intended loop took 6212ms. The settings might not be optimal. (flushed=9 and evicted=0, during the time.)
2020-01-28 13:55:44 0 [Note] InnoDB: page_cleaner: 1000ms intended loop took 4280ms. The settings might not be optimal. (flushed=101 and evicted=0, during the time.)
2020-01-28 13:59:43 0 [Note] InnoDB: page_cleaner: 1000ms intended loop took 5817ms. The settings might not be optimal. (flushed=4 and evicted=0, during the time.)
2020-01-28 14:00:16 0 [Note] InnoDB: page_cleaner: 1000ms intended loop took 5384ms. The settings might not be optimal. (flushed=100 and evicted=0, during the time.)
2020-01-28 14:00:52 0 [Note] InnoDB: page_cleaner: 1000ms intended loop took 9460ms. The settings might not be optimal. (flushed=5 and evicted=0, during the time.)
2020-01-28 14:01:25 0 [Note] InnoDB: page_cleaner: 1000ms intended loop took 7727ms. The settings might not be optimal. (flushed=103 and evicted=0, during the time.)
2020-01-28 14:01:57 0 [Note] InnoDB: page_cleaner: 1000ms intended loop took 7154ms. The settings might not be optimal. (flushed=5 and evicted=0, during the time.)
2020-01-28 14:02:29 0 [Note] InnoDB: page_cleaner: 1000ms intended loop took 7501ms. The settings might not be optimal. (flushed=5 and evicted=0, during the time.)
2020-01-28 14:03:00 0 [Note] InnoDB: page_cleaner: 1000ms intended loop took 4322ms. The settings might not be optimal. (flushed=78 and evicted=0, during the time.)
2020-01-28 14:32:02 0 [Note] InnoDB: page_cleaner: 1000ms intended loop took 4927ms. The settings might not be optimal. (flushed=4 and evicted=0, during the time.)
2020-01-28 14:32:34 0 [Note] InnoDB: page_cleaner: 1000ms intended loop took 4506ms. The settings might not be optimal. (flushed=101 and evicted=0, during the time.)

by Shinguz at February 18, 2020 04:50 PM

Shlomi Noach

The state of Orchestrator, 2020 (spoiler: healthy)

Yesterday was my last day at GitHub, and this post explains what this means for orchestrator. First, a quick historical review:

  • 2014: I began work on orchestrator at Outbrain, as https://github.com/outbrain/orchestrator. I authored several open source projects while working for Outbrain, and created orchestrator to solve discovery, visualization and simple refactoring needs. Outbrain was happy to have the project developed as a public, open source repo from day 1, and it was released under the Apache 2 license. Interestingly, the idea to develop orchestrator came after I attended Percona Live Santa Clara 2014 and watched "ChatOps: How GitHub Manages MySQL" by one Sam Lambert.
  • 2015: Joined Booking.com where my main focus was to redesign and solve issues with the existing high availability setup. With Booking.com's support, I continued work on orchestrator, pursuing better failure detection and recovery processes. Booking.com was an incredible playground and testbed for orchestrator, a massive deployment of multiple MySQL/MariaDB flavors and configuration.
  • 2016 - 2020: Joined GitHub. GitHub adopted orchestrator and I developed it under GitHub's own org, at https://github.com/github/orchestrator. It became a core component in github.com's high availability design, running failure detection and recoveries across sites and geographical regions, with more to come. These 4+ years have been critical to orchestrator's development and saw its widespread use. At this time I'm aware of multiple large-scale organizations using orchestrator for high availability and failovers. Some of these are GitHub, Booking.com, Shopify, Slack, Wix, Outbrain, and more. orchestrator is the underlying failover mechanism for vitess, and is also included in Percona's PMM. These years saw a significant increase in community adoption and contributions, in published content, such as Pythian and Percona technical blog posts, and, not surprisingly, increase in issues and feature requests.


2020

GitHub was very kind to support moving the orchestrator repo under my own https://github.com/openark org. This means all issues, pull requests, releases, forks, stars and watchers have automatically transferred to the new location: https://github.com/openark/orchestrator. The old links do a "follow me" and implicitly direct to the new location. All external links to code and docs still work. I'm grateful to GitHub for supporting this transfer.

I'd like to thank all the above companies for their support of orchestrator and of open source in general. Being able to work on the same product throughout three different companies is mind blowing and an incredible opportunity. orchestrator of course remains open source and licensed with Apache 2. Existing Copyrights are unchanged.

As for what's next: some personal time off, please understand if there's delays to reviews/answers. My intention is to continue developing orchestrator. Naturally, the shape of future development depends on how orchestrator meets my future work. Nothing changes in that respect: my focus on orchestrator has always been first and foremost the pressing business needs, and then community support as possible. There are some interesting ideas by prominent orchestrator users and adopters and I'll share more thoughts in due time.

 

by shlomi at February 18, 2020 08:09 AM

February 17, 2020

Oli Sennhauser

FromDual Ops Center for MariaDB and MySQL 0.9.3 has been released

FromDual has the pleasure to announce the release of the new version 0.9.3 of its popular FromDual Ops Center focmm, a Graphical User Interface (GUI) for MariaDB and MySQL.

The FromDual Ops Center for MariaDB and MySQL (focmm) helps DBAs and System Administrators to better manage their MariaDB and MySQL database farms. Ops Center makes the lives of DBAs and Admins easier!

The main task of Ops Center is to support you in your daily MySQL and MariaDB operation tasks. More information about FromDual Ops Center you can find here.

Download

The new FromDual Ops Center for MariaDB and MySQL (focmm) can be downloaded from here. How to install and use focmm is documented in the Ops Center User Guide.

In the inconceivable case that you find a bug in the FromDual Ops Center for MariaDB and MySQL please report it to the FromDual bug tracker or just send us an email.

Any feedback, statements and testimonials are welcome as well! Please send them to feedback@fromdual.com.

Installation of Ops Center 0.9.3

A complete guide on how to install FromDual Ops Center you can find in the Ops Center User Guide.

Upgrade from 0.9.x to 0.9.3

Upgrade from 0.9.x to 0.9.3 should happen automatically. Please do a backup of your Ops Center Instance before you upgrade! Please also check Upgrading.

Changes in Ops Center 0.9.3

Machine

  • Machine without a usergroup is not allowed, fixed.
  • Machine name number is incremented by one if server already exists.
  • Delete machine fixed by setting server_id on instance to null.
  • Refresh repo automatization added.
  • Monitoring link added to menu.
  • Machine performance graphs added to menu.

Instance

  • Node renamed to Instance.
  • Create instance for CentOS and Ubuntu integrated.
  • Instance version comment was too long for PXC. Fixed.
  • Data Dictionary version added.
  • Special case for error log to syslog added (MariaDB packages on CentOS).
  • performance_schema last seen queries added.
  • Instance show overview made nicer.
  • Bug in instance check fixed.
  • Generate password improved thus bad special characters are not suggested any more.
  • Instance edit, eye added and field length shortened.
  • Instance name checking improved for creating and add.
  • Various minor bugs fixed.
  • Monitoring link added to menu.

Cluster

  • Cluster is clickable now in instance overview.
  • Minor Cluster bugs fixed.
  • Galera cluster added.
  • Master/Slave replication made smoother.

Load-Balancer

  • HAproxy and glb load-balancer added.

Tools

  • Jobs: Error logging improved to get more info about aborted jobs.
  • Crontab: Run job now icon added.
  • Schema compare: Schema drop-down sorted ascending and related code cleaned and refactored.

Configuration

  • Crontab: start_jobs.php was removed from crontab and is now started by run_crontab.php.

Database-as-a-Service (DBaaS)

  • Pricing plan added.
  • Database pricing added.
  • Machine cost added.
  • Resource cost included into job structure.

Building and Packaging

  • Installer: Repository installation hint added to installer.
  • Upgrade: Table fixed for MySQL 5.7.
  • Packaging: Session folder included into packaging.
  • Packaging: DEB and RPM improved for Upgrade.

Themes / UI

  • Default theme made a bit nicer.
  • Link more visible in default theme.

General

  • Disable license warning.
  • Link to fromdual license information added.
  • Jquery upgraded from 1.12.0 to 1.12.1.
  • http authentication brought to Apache version 2.4.
  • Session store changed because we were losing our sessions very often.
  • Session path also for all frag adapted.
  • Function implode syntax made compatible with newer PHP versions.
  • Minor typos fixed.
  • Minor errors fixed.

by Shinguz at February 17, 2020 03:37 PM

SeveralNines

Migrating PostgreSQL to the Cloud - Comparing Solutions from Amazon, Google & Microsoft

From a bird’s eye view, it would appear that when it comes to migrating PostgreSQL workloads into the cloud, the choice of cloud provider should make no difference. Out of the box, PostgreSQL makes it easy to replicate data, with no downtime, via Logical Replication, although with some restrictions. In order to make their service offering more attractive, cloud providers may work out some of those restrictions. As we start thinking about differences in the available PostgreSQL versions, compatibility, limits, limitations, and performance, it becomes clear that the migration services are key factors in the overall service offering. It is no longer a case of “we offer it, we migrate it”. It’s become more like “we offer it, we migrate it, with the least limitations”.
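As a rough sketch (host names and credentials below are placeholders), the built-in logical replication needs wal_level = logical on the source plus a publication/subscription pair; the usual restrictions include no DDL replication, no sequence replication, and tables needing a primary key or replica identity:

-- On the source (publisher), PostgreSQL 10 or later:
ALTER SYSTEM SET wal_level = 'logical';  -- requires a restart
CREATE PUBLICATION cloud_migration FOR ALL TABLES;

-- On the target (subscriber), e.g. the managed cloud instance:
CREATE SUBSCRIPTION cloud_migration
  CONNECTION 'host=onprem.example.com dbname=nextcloud_prod user=repl password=secret'
  PUBLICATION cloud_migration;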

Migration is important to small and large organizations alike. It is not as much about the size of the PostgreSQL cluster, as it is about the acceptable downtime and post-migration effort.

Selecting a Strategy

The migration strategy should take into consideration the size of the database, the network link between the source and the target, as well as the migration tools offered by the cloud provider.

Hardware or Software?

Just as with mailing USB keys and DVDs back in the early days of the Internet, in cases where the network bandwidth isn't enough for transferring data at the desired speed, cloud providers offer hardware solutions able to carry up to hundreds of petabytes of data. Below are the current solutions from each of the big three:

A handy table provided by Google showing the available options:

GCP migration options

GCP appliance is Transfer Appliance

A similar recommendation from Azure based on the data size vs network bandwidth:

Azure migration options

Azure appliance is Data box

Towards the end of its data migrations page, AWS provides a glimpse of what we can expect, along with their recommendation of the solution:

AWS migration choices: managed or unmanaged.

In cases where database sizes exceed 100GB and network bandwidth is limited, AWS suggests a hardware solution.

AWS appliance is Snowball Edge

Data Export/Import

Organizations that can tolerate downtime can benefit from the simplicity of the common tools provided by PostgreSQL out of the box. However, when migrating data from one cloud (or hosting) provider to another cloud provider, beware of the egress cost.
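For organizations in that situation, a minimal dump-and-restore sketch could look like the following (host names, the user, and the database name mydb are illustrative assumptions):

$ pg_dump -Fc -h onprem-host -U myuser mydb > mydb.dump
$ pg_restore --no-owner -h cloud-host -U myuser -d mydb mydb.dump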

AWS

For testing the migrations I used a local installation of my Nextcloud database running on one of my home network servers:

postgres=# select pg_size_pretty(pg_database_size('nextcloud_prod'));

pg_size_pretty

----------------

58 MB

(1 row)



nextcloud_prod=# \dt

                     List of relations

Schema |             Name | Type  | Owner

--------+-------------------------------+-------+-----------

public | awsdms_ddl_audit              | table | s9sdemo

public | oc_accounts                   | table | nextcloud

public | oc_activity                   | table | nextcloud

public | oc_activity_mq                | table | nextcloud

public | oc_addressbookchanges         | table | nextcloud

public | oc_addressbooks               | table | nextcloud

public | oc_appconfig                  | table | nextcloud

public | oc_authtoken                  | table | nextcloud

public | oc_bruteforce_attempts        | table | nextcloud

public | oc_calendar_invitations       | table | nextcloud

public | oc_calendar_reminders         | table | nextcloud

public | oc_calendar_resources         | table | nextcloud

public | oc_calendar_resources_md      | table | nextcloud

public | oc_calendar_rooms             | table | nextcloud

public | oc_calendar_rooms_md          | table | nextcloud

...

public | oc_termsofservice_terms       | table | nextcloud

public | oc_text_documents             | table | nextcloud

public | oc_text_sessions              | table | nextcloud

public | oc_text_steps                 | table | nextcloud

public | oc_trusted_servers            | table | nextcloud

public | oc_twofactor_backupcodes      | table | nextcloud

public | oc_twofactor_providers        | table | nextcloud

public | oc_users                      | table | nextcloud

public | oc_vcategory                  | table | nextcloud

public | oc_vcategory_to_object        | table | nextcloud

public | oc_whats_new                  | table | nextcloud

(84 rows)

The database is running PostgreSQL version 11.5:

postgres=# select version();

                                                version

------------------------------------------------------------------------------------------------------------

PostgreSQL 11.5 on x86_64-redhat-linux-gnu, compiled by gcc (GCC) 9.1.1 20190503 (Red Hat 9.1.1-1), 64-bit

(1 row)

I have also created a PostgreSQL user to be used by AWS DMS which is Amazon’s service for importing PostgreSQL into Amazon RDS:

postgres=# \du s9sdemo

            List of roles

Role name | Attributes |  Member of

-----------+------------+-------------

s9sdemo   |   | {nextcloud}

AWS DMS provides many advantages, just as we’d expect from a managed solution in the cloud: 

  • auto-scaling (storage only, as compute instance must be right sized)
  •  automatic provisioning
  •  pay-as-you-go model
  •  automatic failover

However, maintaining data consistency for a live database is a best effort. A 100% consistency is achieved only when the database is in read-only mode — that is a consequence of how table changes are captured.

In other words, tables have a different point-in-time cutover:

AWS DMS: tables have different point in time cutover.

Just as with everything in the cloud, there is a cost associated with the migration service.

In order to create the migration environment, follow the Getting Started guide to setup a replication instance, a source, a target endpoint, and one or more tasks.

Replication Instance

Creating the replication instance is straightforward to anyone familiar with EC2 instances on AWS:

The only change from the defaults was in selecting AWS DMS 3.3.0 or later due to my local PostgreSQL engine being 11.5:

AWS DMS: Supported PostgreSQL versions.

And here’s the list of currently available AWS DMS versions:

Current AWS DMS versions.

Large installations should also take note of the AWS DMS Limits:

AWS DMS limits.

There is also a set of limitations that are a consequence of PostgreSQL logical replication restrictions. For example, AWS DMS will not migrate secondary objects:

AWS DMS: secondary objects are not migrated.

It is worth mentioning that in PostgreSQL all indexes are secondary indexes, and that is not a bad thing, as noted in this more detailed discussion.

Source Endpoint

Follow the wizard to create the Source Endpoint:

AWS DMS: Source Endpoint configuration.

In the setup scenario Configuration for a Network to a VPC Using the Internet, my home network required a few tweaks in order to allow the source endpoint IP address to access my internal server. First, I created a port forwarding rule on the edge router (173.180.222.170) to send traffic on port 30485 to my internal gateway (10.11.11.241) on port 5432, where I can fine-tune access based on the source IP address via iptables rules. From there, network traffic flows through an SSH tunnel to the web server running the PostgreSQL database. With the described configuration, the client_addr in the output of pg_stat_activity will show up as 127.0.0.1.
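A rough sketch of the gateway-side setup described above (the SSH user, host name, and exact rule placement are assumptions; the replication instance IP and port 5432 come from this exercise):

# Allow the DMS replication instance to reach the forwarded PostgreSQL port
$ iptables -A INPUT -p tcp -s 3.227.167.58 --dport 5432 -j ACCEPT
# Forward the gateway's port 5432 to the PostgreSQL host through an SSH tunnel
$ ssh -N -L 0.0.0.0:5432:localhost:5432 tunneluser@webserver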

Before allowing incoming traffic, the iptables logs show 12 attempts from the replication instance at ip=3.227.167.58:

Jan 19 17:35:28 mha.can.local kernel: filter/INPUT: IN=enp0s29f7u2 OUT= MAC=00:24:9b:17:3a:fa:9c:1e:95:e5:ad:b0:08:00 SRC=3.227.167.58 DST=10.11.11.241 LEN=60 TOS=0x00 PREC=0x00 TTL=39 ID=23973 DF PROTO=TCP SPT=54662 DPT=5432 WINDOW=26880 RES=0x00 SYN URGP=0

Jan 19 17:35:29 mha.can.local kernel: filter/INPUT: IN=enp0s29f7u2 OUT= MAC=00:24:9b:17:3a:fa:9c:1e:95:e5:ad:b0:08:00 SRC=3.227.167.58 DST=10.11.11.241 LEN=60 TOS=0x00 PREC=0x00 TTL=39 ID=23974 DF PROTO=TCP SPT=54662 DPT=5432 WINDOW=26880 RES=0x00 SYN URGP=0

Jan 19 17:35:31 mha.can.local kernel: filter/INPUT: IN=enp0s29f7u2 OUT= MAC=00:24:9b:17:3a:fa:9c:1e:95:e5:ad:b0:08:00 SRC=3.227.167.58 DST=10.11.11.241 LEN=60 TOS=0x00 PREC=0x00 TTL=39 ID=23975 DF PROTO=TCP SPT=54662 DPT=5432 WINDOW=26880 RES=0x00 SYN URGP=0

Jan 19 17:35:35 mha.can.local kernel: filter/INPUT: IN=enp0s29f7u2 OUT= MAC=00:24:9b:17:3a:fa:9c:1e:95:e5:ad:b0:08:00 SRC=3.227.167.58 DST=10.11.11.241 LEN=60 TOS=0x00 PREC=0x00 TTL=39 ID=23976 DF PROTO=TCP SPT=54662 DPT=5432 WINDOW=26880 RES=0x00 SYN URGP=0

Jan 19 17:35:48 mha.can.local kernel: filter/INPUT: IN=enp0s29f7u2 OUT= MAC=00:24:9b:17:3a:fa:9c:1e:95:e5:ad:b0:08:00 SRC=3.227.167.58 DST=10.11.11.241 LEN=60 TOS=0x00 PREC=0x00 TTL=39 ID=4328 DF PROTO=TCP SPT=54667 DPT=5432 WINDOW=26880 RES=0x00 SYN URGP=0

Jan 19 17:35:49 mha.can.local kernel: filter/INPUT: IN=enp0s29f7u2 OUT= MAC=00:24:9b:17:3a:fa:9c:1e:95:e5:ad:b0:08:00 SRC=3.227.167.58 DST=10.11.11.241 LEN=60 TOS=0x00 PREC=0x00 TTL=39 ID=4329 DF PROTO=TCP SPT=54667 DPT=5432 WINDOW=26880 RES=0x00 SYN URGP=0

Jan 19 17:35:51 mha.can.local kernel: filter/INPUT: IN=enp0s29f7u2 OUT= MAC=00:24:9b:17:3a:fa:9c:1e:95:e5:ad:b0:08:00 SRC=3.227.167.58 DST=10.11.11.241 LEN=60 TOS=0x00 PREC=0x00 TTL=39 ID=4330 DF PROTO=TCP SPT=54667 DPT=5432 WINDOW=26880 RES=0x00 SYN URGP=0

Jan 19 17:35:55 mha.can.local kernel: filter/INPUT: IN=enp0s29f7u2 OUT= MAC=00:24:9b:17:3a:fa:9c:1e:95:e5:ad:b0:08:00 SRC=3.227.167.58 DST=10.11.11.241 LEN=60 TOS=0x00 PREC=0x00 TTL=39 ID=4331 DF PROTO=TCP SPT=54667 DPT=5432 WINDOW=26880 RES=0x00 SYN URGP=0

Jan 19 17:36:08 mha.can.local kernel: filter/INPUT: IN=enp0s29f7u2 OUT= MAC=00:24:9b:17:3a:fa:9c:1e:95:e5:ad:b0:08:00 SRC=3.227.167.58 DST=10.11.11.241 LEN=60 TOS=0x00 PREC=0x00 TTL=39 ID=8298 DF PROTO=TCP SPT=54670 DPT=5432 WINDOW=26880 RES=0x00 SYN URGP=0

Jan 19 17:36:09 mha.can.local kernel: filter/INPUT: IN=enp0s29f7u2 OUT= MAC=00:24:9b:17:3a:fa:9c:1e:95:e5:ad:b0:08:00 SRC=3.227.167.58 DST=10.11.11.241 LEN=60 TOS=0x00 PREC=0x00 TTL=39 ID=8299 DF PROTO=TCP SPT=54670 DPT=5432 WINDOW=26880 RES=0x00 SYN URGP=0

Jan 19 17:36:11 mha.can.local kernel: filter/INPUT: IN=enp0s29f7u2 OUT= MAC=00:24:9b:17:3a:fa:9c:1e:95:e5:ad:b0:08:00 SRC=3.227.167.58 DST=10.11.11.241 LEN=60 TOS=0x00 PREC=0x00 TTL=39 ID=8300 DF PROTO=TCP SPT=54670 DPT=5432 WINDOW=26880 RES=0x00 SYN URGP=0

Jan 19 17:36:16 mha.can.local kernel: filter/INPUT: IN=enp0s29f7u2 OUT= MAC=00:24:9b:17:3a:fa:9c:1e:95:e5:ad:b0:08:00 SRC=3.227.167.58 DST=10.11.11.241 LEN=60 TOS=0x00 PREC=0x00 TTL=39 ID=8301 DF PROTO=TCP SPT=54670 DPT=5432 WINDOW=26880 RES=0x00 SYN URGP=0

Once the source endpoint IP address (3.227.167.58) is allowed, the connection test succeeds and the source endpoint configuration is complete. We also have an SSL connection in order to encrypt the traffic through public networks. This can be confirmed on the PostgreSQL server using the query below, as well as in the AWS console:

postgres=# SELECT datname, usename, client_addr, ssl, cipher, query, query_start FROM pg_stat_activity a, pg_stat_ssl s where a.pid=s.pid and usename = 's9sdemo';

datname | usename | client_addr | ssl | cipher | query | query_start

---------+---------+-------------+-----+--------+-------+-------------

(0 rows)

…and then watch while running the connection test from the AWS console. The results should look similar to the following:

postgres=# \watch



                                                                           Sun 19 Jan 2020 06:50:51 PM PST (every 2s)



    datname     | usename | client_addr | ssl |           cipher |                 query | query_start

----------------+---------+-------------+-----+-----------------------------+------------------------------------------------------------------------------------+-------------------------------

 nextcloud_prod | s9sdemo | 127.0.0.1   | t | ECDHE-RSA-AES256-GCM-SHA384 | select cast(setting as integer) from pg_settings where name = 'server_version_num' | 2020-01-19 18:50:51.463496-08

(1 row)

…while AWS console should report a success:

AWS DMS: Source Endpoint connection test successful.

As indicated in the prerequisites section, if we choose the migration option Full load, ongoing replication, we will need to alter the permissions for the PostgreSQL user. This migration option requires superuser privileges, therefore I adjusted the settings for the PostgreSQL user created earlier:

nextcloud_prod=# \du s9sdemo

         List of roles

Role name | Attributes | Member of

-----------+------------+-----------

s9sdemo   | Superuser  | {}

The same document contains instructions for modifying postgresql.conf. Here’s a diff from the original one:

--- a/var/lib/pgsql/data/postgresql.conf

+++ b/var/lib/pgsql/data/postgresql.conf

@@ -95,7 +95,7 @@ max_connections = 100                 # (change requires restart)



# - SSL -



-#ssl = off

+ssl = on

#ssl_ca_file = ''

#ssl_cert_file = 'server.crt'

#ssl_crl_file = ''

@@ -181,6 +181,7 @@ dynamic_shared_memory_type = posix  # the default is the first option



# - Settings -



+wal_level = logical

#wal_level = replica                   # minimal, replica, or logical

                                       # (change requires restart)

#fsync = on                            # flush data to disk for crash safety

@@ -239,6 +240,7 @@ min_wal_size = 80MB

#max_wal_senders = 10          # max number of walsender processes

                              # (change requires restart)

#wal_keep_segments = 0         # in logfile segments; 0 disables

+wal_sender_timeout = 0

#wal_sender_timeout = 60s      # in milliseconds; 0 disables



#max_replication_slots = 10    # max number of replication slots

@@ -451,6 +453,7 @@ log_rotation_size = 0                       # Automatic rotation of logfiles will

#log_duration = off

#log_error_verbosity = default         # terse, default, or verbose messages

Lastly, don’t forget to adjust the pg_hba.conf settings in order to allow SSL connection from the replication instance IP address.
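A minimal sketch of such a pg_hba.conf entry, using the database, user, and replication instance address from this setup (the authentication method is an assumption):

hostssl    nextcloud_prod    s9sdemo    3.227.167.58/32    md5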

We are now ready for the next step.

Target Endpoint

Follow the wizard to create the Target Endpoint:

AWS DMS: Target Endpoint configuration.

This step assumes that the RDS instance with the specified endpoint already exists along with the empty database nextcloud_awsdms. The database can be created during the RDS instance setup.

At this point, if the AWS networking is correctly setup, we should be ready to run the connection test:

AWS DMS: Target Endpoint connection test successful.

With the environment in place, it is now time to create the migration task:

Migration Task

Once the wizard has completed, the configuration looks like this:

AWS DMS: Migration Task configuration - part 1.

...and the second part of the same view:

AWS DMS: Migration Task configuration - part 2.

Once the task is started, we can monitor the progress: open up the task details and scroll down to Table Statistics:

AWS DMS: Table Statistics for running tasks.

AWS DMS uses the cached schema in order to migrate the database tables. While the migration progresses, we can continue "watching" the queries on the source database and the PostgreSQL error log, in addition to the AWS console:

psql: `\watch'-ing the AWS DMS queries.

In case of errors, the failure state is displayed in the console:

AWS DMS: failed task display.

One place to look for clues is CloudWatch, although during my tests the logs didn't end up being published, which, as it turned out towards the end of this exercise, is likely just another glitch in the beta version of AWS DMS 3.3.0:

AWS DMS: logs not published to CloudWatch - 3.3.0 beta version glitch?

The migration progress is nicely displayed in the AWS DMS console:

AWS DMS: migration progress displayed in console.

Once the migration is complete, reviewing one more time, the PostgreSQL error log, reveals a surprising message:

PostgreSQL error log: relhaspkey error - another AWS DMS 3.3.0 beta version glitch?

What seems to happen is that in PostgreSQL 9.6 and 10 the pg_class table contains a column named relhaspkey, but that column no longer exists in PostgreSQL 11. And that is the glitch in the beta version of AWS DMS 3.3.0 that I was referring to earlier.

GCP

Google’s approach is based on the open source tool PgBouncer. The excitement was short lived, as the official documentation talks about migrating PostgreSQL into a compute engine environment.

Further attempts to find a migration solution to Cloud SQL that resembles AWS DMS failed. The Database migration strategies contain no reference to PostgreSQL:

GCP: migrating to Cloud SQL - not available for PostgreSQL.

On-prem PostgreSQL installations can be migrated to Cloud SQL by using the services of one of the Google Cloud partners.

A potential solution may be PgBouncer to Cloud SQL, but that is not within the scope of this blog.

Microsoft Cloud Services (Azure)

In order to facilitate the migration of PostgreSQL workloads from on-prem to the managed Azure Database for PostgreSQL, Microsoft provides Azure DMS which, according to the documentation, can be used to migrate with minimal downtime. The tutorial Migrate PostgreSQL to Azure Database for PostgreSQL online using DMS describes these steps in detail.

The Azure DMS documentation discusses in great detail the issues and limitations associated with migrating the PostgreSQL workloads into Azure.

One notable difference from AWS DMS is the requirement to manually create the schema:

Azure DMS: schema must be migrated manually.

A demo of this will be the topic of a future blog. Stay tuned.

 

by Viorel Tabara at February 17, 2020 10:45 AM

February 14, 2020

MariaDB Foundation

MariaDB 10.5.1 now available

The MariaDB Foundation is pleased to announce the availability of MariaDB 10.5.1, the first beta release in the MariaDB 10.5 development series.
See the release notes and changelogs for details. […]

The post MariaDB 10.5.1 now available appeared first on MariaDB.org.

by Ian Gilfillan at February 14, 2020 05:25 PM

SeveralNines

What's New in MongoDB 4.2

Database updates come with improved features for performance, security, and with new integrated features. It is always advisable to test a new version before deploying it into production, just to ensure that it suits your needs and there is no possibility of crashes. 

With many products, the first minor versions following a new major release carry the most important fixes. For instance, I would rather run MongoDB version 4.2.1 in production a few days after its release than version 4.2.0.

In this blog we are going to discuss what has been included and what improvements have been made in MongoDB version 4.2.

What’s New in MongoDB 4.2

  1. Distributed transactions
  2. Wildcard indexes
  3. Retryable reads and writes
  4. Automatic Client-Side Field-level encryption.
  5. Improved query language for expressive updates
  6. On-demand materialized Views
  7. Modern maintenance operations

Distributed Transactions

Transactions are important database features that ensure data consistency and integrity, especially those that guarantee ACID properties. MongoDB version 4.2 now supports multi-document transactions on replica sets and sharded clusters through the distributed transaction approach. The same transaction syntax as in the previous 4.0 version has been maintained.
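For reference, a minimal mongo shell sketch of that syntax (the database and collection names are illustrative assumptions):

// Start a session and run two writes atomically inside one transaction
session = db.getMongo().startSession();
session.startTransaction();
try {
    session.getDatabase("shop").orders.insertOne({ item: "book", qty: 1 });
    session.getDatabase("shop").inventory.updateOne({ item: "book" }, { $inc: { qty: -1 } });
    session.commitTransaction();
} catch (error) {
    session.abortTransaction();
    throw error;
}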

However, the client driver specs have changed a bit, so if you intend to use transactions in MongoDB 4.2, you must upgrade the drivers to versions that are compatible with 4.2 servers.

This version does not limit the size of a transaction in terms of memory usage; it is only limited by the size and handling capability of your hardware.

Global cluster locale reassignment is now possible with version 4.2. This is to say, for a geo zone sharding implementation, if a user residing in region A moves to region B, by changing the value of their location field, the data can be automatically moved from region A to B through a transaction.

The sharding system now allows one to change a shard key, contrary to the previous version. Literally, when a shard key is changed, it is equivalent to moving the document to another shard. In this version, MongoDB wraps this update and, if the document needs to be moved from one shard to another, the update is executed inside a transaction in the background.

Overusing transactions is not advisable since they degrade database performance, especially if they occur frequently. During a transaction there is an extended window for operations that may conflict with writes to an affected document. Although a transaction can be retried, an update may have been made to the document before the retry, and whenever the retry happens it may deal with the old rather than the latest document version. Retries obviously add processing cost and increase application latency.

A good practice around using transactions include:

  1. Avoid using unindexed queries inside a transaction, to ensure the operation will not be slow.
  2. Your transaction should involve only a few documents.

With the MongoDB dynamic schema format and embedding feature, you can opt to put all related fields in the same document to avoid the need for transactions as a first measure.

Wildcard Indexes

Wildcard indexes were introduced in MongoDB version 4.2 to enhance queries against arbitrary fields, or fields whose names are not known in advance, by indexing the entire document or subdocument. They are not intended to replace workload-based indexes, but they suit data that follows a polymorphic pattern. A polymorphic pattern is one where all documents in a collection are similar but do not have an identical structure. Polymorphic data patterns can be generated from applications involving product catalogs or social data. Below is an example of polymorphic collection data:

{
  Sport: 'Chess',
  playerName: 'John Mah',
  Career_earning: { amount: NumberDecimal("3000"), currency: "EUR" },
  gamesPlayed: 25,
  career_titles: 10
},
{
  Sport: 'Tennis',
  playerName: 'Semenya Jones',
  Career_earning: { amount: NumberDecimal("34545"), currency: "USD" },
  Event: {
    name: "Olympics",
    career_titles: 10,
    career_tournaments: 14
  }
}

By indexing the entire document using Wildcard indexes, you can make a query using any arbitrary field as an index.

To create a Wildcard index:

db.collection.createIndex({ "fieldA.$**": 1 })

If the selected field is a nested document or an array, the Wildcard index recurses into the document and stores the value for all the fields in the document or array.
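As a quick sketch, assuming a players collection holding documents like the polymorphic example above, a wildcard index over all fields lets any arbitrary field be used in a query:

// Index every field of every document, then query a nested field
db.players.createIndex({ "$**": 1 })
db.players.find({ "Event.career_tournaments": 14 })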

Retryable Reads and Writes

Normally a database may incur frequent transient network outages that can leave a query partially or fully unexecuted. These network errors may not be that serious, and hence offer a chance to retry these queries once reconnected. Starting with MongoDB 4.2, the retry configuration is enabled by default. The MongoDB drivers can retry failed reads and writes for certain operations whenever they encounter minor network errors, or when they are unable to find a healthy primary in the sharded cluster/replica set. However, if you don't want retryable writes, you can explicitly disable them in your configuration, although I don't find a compelling reason why one should disable them.
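If you do need to opt out, a connection string sketch disabling the behaviour could look like this (hosts, credentials, and the replica set name are illustrative assumptions):

mongodb://appuser:secret@node1:27017,node2:27017,node3:27017/mydb?replicaSet=rs0&retryWrites=false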

This feature ensures that if the MongoDB infrastructure changes, the application code shouldn't be affected. Regarding an example explained by Eliot Horowitz, the co-founder of MongoDB: for a webpage that does 20 different database operations, instead of reloading the entire thing or having to wrap the entire web page in some sort of loop, the driver under the covers can just decide to retry the operation. Whenever a write fails, it will retry automatically and has a contract with the server to guarantee that every write happens only once.

Retryable writes make only a single retry attempt, which helps to address replica set elections and transient network errors, but not persistent network errors.

Retryable writes do not address instances where the failover period exceeds the serverSelectionTimeoutMS value in the parameter configuration.

With this MongoDB version, one can update document shard key values (unless the shard key is the immutable _id field) by issuing a single-document findAndModify/update operation, either in a transaction or as a retryable write.

MongoDB version 4.2 can now retry a single-document upsert operation (i.e. upsert: true and multi: false) that failed because of a duplicate key error, if the operation meets these key conditions:

  1. The target collection contains a unique index that caused the duplicate key error.
  2. The update operation will not modify any of the fields in the query predicate.
  3. The update match condition is either a single equality predicate {field: "value"} or a logical AND of equality predicates {field: "value", field0: "value0"}
  4. Set of fields in the unique index key pattern matches the set of fields in the update query predicate.

Automatic Client-Side Field-level Encryption

MongoDB version 4.2 comes with the Automatic Client-Side Field Level encryption (CSFLE), a feature that allows  developers to selectively encrypt individual fields of a document on the client side before it is sent to the server. The encrypted data is thus kept private from the providers hosting the database and any user that may have direct access to the database.

Only applications with the access to the correct encryption keys can decrypt and read the protected data. In case the encryption key is deleted, all data that was encrypted will be rendered unreadable.

Note: this feature is only available with MongoDB Enterprise.

Improved Query Language for Expressive Updates

MongoDB version 4.2 provides a richer query language than its predecessors. It now supports aggregations and modern use-case operations along the lines of geo-based searches, graph search and text search. It has integrated a third-party search engine, which makes searches faster considering that the search engine runs in a different process/server. This generally improves database performance compared to having all searches made against the mongod process, which would make database operation latency volatile whenever the search engine reindexes.

With this version, you can now handle arrays, and do sums and other math operations, directly through an update statement.
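A small sketch of such an expressive update, reusing fields from the polymorphic example above (the totalAchievements field is an illustrative addition):

// Update one document using an aggregation pipeline to compute a new field
db.players.updateOne(
  { playerName: "John Mah" },
  [ { $set: { totalAchievements: { $add: [ "$gamesPlayed", "$career_titles" ] } } } ]
)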

On-Demand Materialized Views

The data aggregation pipeline framework in MongoDB is a great feature, with different stages for transforming a document into some desired state. MongoDB version 4.2 introduces a new stage, $merge, which I will say saved me some time when working with final output that needed to be stored in a collection. Initially, the $out stage allowed creating a new collection based on an aggregation and populating it with the results obtained. If the collection already exists, $out overwrites it with the new results, contrary to the $merge stage, which only incorporates the pipeline results into an existing output rather than fully replacing the collection. Regenerating an entire collection every time with the $out stage consumes a lot of CPU and IO, which may degrade database performance. The output content can therefore be updated in a timely way, enabling users to create on-demand materialized views.
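A minimal $merge sketch, assuming a sales collection and an on-demand materialized view collection named monthly_sales:

// Aggregate per month and merge the results into the materialized view
db.sales.aggregate([
  { $group: { _id: "$month", total: { $sum: "$amount" } } },
  { $merge: { into: "monthly_sales", whenMatched: "replace", whenNotMatched: "insert" } }
])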

Modern Maintenance Operations

Developers can now have a great operational experience with MongoDB version 4.2, with integrated features that enhance high availability, a cloud-managed backup strategy, improved monitoring power and alerting systems. MongoDB Atlas and MongoDB Ops Manager are the platforms providing these features. The latter has been labeled as the best option for running MongoDB in enterprise environments. It has also been integrated with a Kubernetes operator for on-premise users who are moving to a private cloud; this interface enables one to directly control Ops Manager.

There are some internal changes made to MongoDB version 4.2 which include:

  1. Listing open cursors.
  2. Removal of the MMAPv1 storage engine.
  3. Improvement on the WiredTiger data file repair.
  4. Diagnostic fields can now have queryHash
  5. Auto-splitting thread for mongos nodes has been removed.

Conclusion

MongoDB version 4.2 comes with improvements along the lines of security and database performance. It includes Automatic Client-Side Field Level Encryption, which ensures data is protected from the client side. Features like the third-party search engine and the inclusion of the $merge stage in the aggregation framework bring further improvements in database performance. Before putting this version in production, please ensure that all your needs are fully addressed.

by Onyancha Brian Henry at February 14, 2020 10:45 AM

February 13, 2020

SeveralNines

Steps to Take if You Have a MySQL Outage

A MySQL outage simply means your MySQL service is not accessible or is unresponsive from the others' perspective. Outages can originate from a number of possible causes:

  • Network issue - Connectivity issue, switch, routing, resolver, load-balancer tier.
  • Resource issue - Whether you have reached resources limit or bottleneck.
  • Misconfiguration - Wrong permission or ownership, unknown variable, wrong password, privilege changed.
  • Locking - Global or table lock prevent others from accessing the data.

In this blog post, we’ll look at some steps to take if you’re having a MySQL outage (Linux environment).

Step One: Get the Error Code

When you have an outage, your application will throw out some errors and exceptions. These errors commonly come with an error code that will give you a rough idea of what you're facing and what to do next to troubleshoot the issue and recover from the outage.

To get more details on the error, check the MySQL Error Code or MariaDB Error Code pages respectively to figure out what the error means.
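The perror utility that ships with MySQL can also decode an error number locally; a quick sketch (the exact output wording may vary between versions):

$ perror 1045
MySQL error code 1045 (ER_ACCESS_DENIED_ERROR): Access denied for user '%s'@'%s' (using password: %s)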

Step Two: Is the MySQL Server Running?

Log into the server via a terminal and see if the MySQL daemon is running and listening on the correct port. In Linux, one would do the following:

Firstly, check the MySQL process:

$ ps -ef | grep -i mysql

You should get something in return. Otherwise, MySQL is not running. If MySQL is not running, try to start it up:

$ systemctl start mysql # systemd

$ service mysql start # sysvinit/upstart

$ mysqld_safe # manual

If you see an error in the above step, you should look at the MySQL error log, whose location varies depending on the operating system and the log_error setting in the MySQL configuration file. For RedHat-based servers, the file is commonly located at:

$ cat /var/log/mysqld.log

Pay attention to the most recent lines with log level "[Error]". Some lines labelled with "[Warning]" could indicate some problems, but those are pretty uncommon. Most of the time, misconfiguration and resource issues can be detected from here.

If MySQL is running, check whether it's listening to the correct port:

$ netstat -tulpn | grep -i mysql

tcp6       0 0 :::3306                 :::* LISTEN   1089/mysqld

You would get the process name "mysqld", listening on all interfaces (:::3306 or 0.0.0.0:3306) on port 3306 with PID 1089 and state "LISTEN". If the above line shows 127.0.0.1:3306, MySQL is only listening locally. You might need to change the bind_address value in the MySQL configuration file to listen on all IP addresses, or simply comment out the line.

Step Three: Check for Connectivity Issues

If the MySQL server is running fine without error inside the MySQL error log, the chance that connectivity issues are happening is pretty high. Start by checking connectivity to the host via ping (if ICMP is enabled) and telnet to the MySQL server from the application server:

(application-server)$ ping db1.mydomain.com

(application-server)$ telnet db1.mydomain.com 3306

Trying db1.mydomain.com...

Connected to 192.168.0.16.

Escape character is '^]'.

O

5.6.46-86.2sN&nz9NZ�32?&>H,EV`_;mysql_native_password

You should see some lines in the telnet output if you can get connected to the MySQL port. Now, try once more by using MySQL client from the application server:

(application-server)$ mysql -u db_user -p -h db1.mydomain.com -P3306

ERROR 1045 (28000): Access denied for user 'db_user'@'db1.mydomain.com' (using password: YES)

In the above example, the error gives us a bit of information on what to do next. The above is probably because someone has changed the password for "db_user", or the password for this user has expired. This is rather normal behaviour in MySQL 5.7.4 and above, where the automatic password expiration policy is enabled by default with a 360-day threshold - meaning that all passwords expire once a year.
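If an expired password is indeed the culprit, a hedged sketch of checking and resetting it (the account host '%' and the new password are illustrative assumptions):

mysql> SELECT user, host, password_expired, password_last_changed FROM mysql.user WHERE user = 'db_user';
mysql> ALTER USER 'db_user'@'%' IDENTIFIED BY 'new_secure_password' PASSWORD EXPIRE NEVER;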

Step Four: Check the MySQL Processlist

If MySQL is running fine without connectivity issues, check the MySQL process list to see what processes are currently running:

mysql> SHOW FULL PROCESSLIST;

+-----+------+-----------+------+---------+------+-------+-----------------------+-----------+---------------+

| Id  | User | Host      | db | Command | Time | State | Info                  | Rows_sent | Rows_examined |

+-----+------+-----------+------+---------+------+-------+-----------------------+-----------+---------------+

| 117 | root | localhost | NULL | Query   | 0 | init | SHOW FULL PROCESSLIST |       0 | 0 |

+-----+------+-----------+------+---------+------+-------+-----------------------+-----------+---------------+

1 row in set (0.01 sec)

Pay attention to the Info and Time columns. Some MySQL operations can be destructive enough to make the database stall and become unresponsive. The following SQL statements, if running, could block others from accessing the database or table (which could cause a brief outage of the MySQL service from the application perspective):

  • FLUSH TABLES WITH READ LOCK
  • LOCK TABLE ...
  • ALTER TABLE ...

Some long running transactions could also stall others, which eventually causes timeouts for other transactions waiting to access the same resources. You may either kill the offending transaction to let others access the same rows, or retry the queued transactions after the long transaction finishes.
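A quick sketch of killing an offending connection found in the process list (the thread ID 117 is taken from the example output above):

mysql> KILL QUERY 117;   -- terminate only the running statement
mysql> KILL 117;         -- or terminate the whole connection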

Conclusion

Proactive monitoring is really important to minimize the risk of a MySQL outage. If your database is managed by ClusterControl, all the mentioned aspects are monitored automatically without any additional configuration from the user. You will receive alarms in your inbox for anomaly detections like long running queries, server misconfiguration, resources exceeding thresholds and many more. Plus, ClusterControl will automatically attempt to recover your database service if something goes wrong with the host or network.

You can also learn more about MySQL & MariaDB Disaster Recovery by reading our whitepaper.

by ashraf at February 13, 2020 10:45 AM

February 12, 2020

SeveralNines

What to Look for if Your MySQL Replication is Lagging

A master/slave replication cluster setup is a common use case in most organizations. Using MySQL Replication enables your data to be replicated across different environments and guarantees that the information gets copied. It is asynchronous and single-threaded (by default), but replication also allows you to configure it to be synchronous (or actually "semi-synchronous"), and the slave can apply events using multiple threads in parallel.

This idea is very common and usually arrives with a simple setup, making the slave serve as a recovery or backup solution. However, this always comes at a price, especially when bad queries (such as those against tables lacking primary or unique keys) are replicated, or when there is trouble with the hardware (such as network or disk IO issues). When these issues occur, the most common problem to face is replication lag.

Replication lag is the delay for a transaction or operation, calculated as the difference between its execution time on the primary/master and on the standby/slave node. The most common causes in MySQL are bad queries being replicated, such as those lacking primary keys or good indexes, poor or malfunctioning network hardware, a distant location between different regions or zones, or processes such as physical backups running, all of which can cause your MySQL database to delay applying the current replicated transaction. In this blog, we'll check how to deal with these cases and what to look for if you are experiencing MySQL replication lag.

The "SHOW SLAVE STATUS": The MySQL DBA's Mantra

In some cases, this is the silver bullet when dealing with replication lag, and it reveals most of what causes an issue in your MySQL database. Simply run this SQL statement on the slave node that is suspected of experiencing replication lag.

The initial fields that are commonly examined when tracing problems are:

  • Slave_IO_State - It tells you what the thread is doing. This field provides good insight into whether replication is running normally, facing network problems such as reconnecting to the master, or taking too much time to commit data, which can indicate disk problems when syncing data to disk. You can also determine this state value when running SHOW PROCESSLIST.
  • Master_Log_File - The master's binlog file name from which the I/O thread is currently fetching.
  • Read_Master_Log_Pos - The position in the master's binlog file up to which the replication I/O thread has already read.
  • Relay_Log_File - The relay log file name from which the SQL thread is currently executing events.
  • Relay_Log_Pos - The position in the file specified in Relay_Log_File up to which the SQL thread has already executed.
  • Relay_Master_Log_File - The master's binlog file that the SQL thread has already executed; it corresponds to the Exec_Master_Log_Pos value.
  • Seconds_Behind_Master - This field shows an approximation of the difference between the current timestamp on the slave and the timestamp on the master for the event currently being processed on the slave. However, this field might not tell you the exact lag if the network is slow, because the difference in seconds is taken between the slave SQL thread and the slave I/O thread. So there can be cases where the SQL thread has caught up with a slow-reading slave I/O thread, while the master is already further ahead.
  • Slave_SQL_Running_State - state of the SQL thread and the value is identical to the state value displayed in SHOW PROCESSLIST.
  • Retrieved_Gtid_Set - Available when using GTID replication. This is the set of GTID's corresponding to all transactions received by this slave. 
  • Executed_Gtid_Set - Available when using GTID replication. It's the set of GTID's written in the binary log.

For example, let's take the example below, which uses GTID replication and is experiencing replication lag:

mysql> show slave status\G

*************************** 1. row ***************************

               Slave_IO_State: Waiting for master to send event

                  Master_Host: 192.168.10.70

                  Master_User: cmon_replication

                  Master_Port: 3306

                Connect_Retry: 10

              Master_Log_File: binlog.000038

          Read_Master_Log_Pos: 826608419

               Relay_Log_File: relay-bin.000004

                Relay_Log_Pos: 468413927

        Relay_Master_Log_File: binlog.000038

             Slave_IO_Running: Yes

            Slave_SQL_Running: Yes

              Replicate_Do_DB: 

          Replicate_Ignore_DB: 

           Replicate_Do_Table: 

       Replicate_Ignore_Table: 

      Replicate_Wild_Do_Table: 

  Replicate_Wild_Ignore_Table: 

                   Last_Errno: 0

                   Last_Error: 

                 Skip_Counter: 0

          Exec_Master_Log_Pos: 826608206

              Relay_Log_Space: 826607743

              Until_Condition: None

               Until_Log_File: 

                Until_Log_Pos: 0

           Master_SSL_Allowed: No

           Master_SSL_CA_File: 

           Master_SSL_CA_Path: 

              Master_SSL_Cert: 

            Master_SSL_Cipher: 

               Master_SSL_Key: 

        Seconds_Behind_Master: 251

Master_SSL_Verify_Server_Cert: No

                Last_IO_Errno: 0

                Last_IO_Error: 

               Last_SQL_Errno: 0

               Last_SQL_Error: 

  Replicate_Ignore_Server_Ids: 

             Master_Server_Id: 45003

                  Master_UUID: 36272880-a7b0-11e9-9ca6-525400cae48b

             Master_Info_File: mysql.slave_master_info

                    SQL_Delay: 0

          SQL_Remaining_Delay: NULL

      Slave_SQL_Running_State: copy to tmp table

           Master_Retry_Count: 86400

                  Master_Bind: 

      Last_IO_Error_Timestamp: 

     Last_SQL_Error_Timestamp: 

               Master_SSL_Crl: 

           Master_SSL_Crlpath: 

           Retrieved_Gtid_Set: 36272880-a7b0-11e9-9ca6-525400cae48b:7631-9192

            Executed_Gtid_Set: 36272880-a7b0-11e9-9ca6-525400cae48b:1-9191,

864dd532-a7af-11e9-85f2-525400cae48b:1-173,

df68c807-a7af-11e9-9b56-525400cae48b:1-4

                Auto_Position: 1

         Replicate_Rewrite_DB: 

                 Channel_Name: 

           Master_TLS_Version: 

1 row in set (0.00 sec)

When diagnosing issues like this, mysqlbinlog can also be your tool for identifying which query has been running at a specific binlog position. To determine this, let's take the Retrieved_Gtid_Set, Relay_Log_Pos, and the Relay_Log_File. See the command below:

[root@testnode5 mysql]# mysqlbinlog --base64-output=DECODE-ROWS --include-gtids="36272880-a7b0-11e9-9ca6-525400cae48b:9192" --start-position=468413927 -vvv relay-bin.000004

/*!50530 SET @@SESSION.PSEUDO_SLAVE_MODE=1*/;

/*!50003 SET @OLD_COMPLETION_TYPE=@@COMPLETION_TYPE,COMPLETION_TYPE=0*/;

DELIMITER /*!*/;

# at 468413927

#200206  4:36:14 server id 45003  end_log_pos 826608271 CRC32 0xc702eb4c        GTID last_committed=1562 sequence_number=1563    rbr_only=no

SET @@SESSION.GTID_NEXT= '36272880-a7b0-11e9-9ca6-525400cae48b:9192'/*!*/;

# at 468413992

#200206  4:36:14 server id 45003  end_log_pos 826608419 CRC32 0xe041ec2c        Query thread_id=24 exec_time=31 error_code=0

use `jbmrcd_date`/*!*/;

SET TIMESTAMP=1580963774/*!*/;

SET @@session.pseudo_thread_id=24/*!*/;

SET @@session.foreign_key_checks=1, @@session.sql_auto_is_null=0, @@session.unique_checks=1, @@session.autocommit=1/*!*/;

SET @@session.sql_mode=1436549152/*!*/;

SET @@session.auto_increment_increment=1, @@session.auto_increment_offset=1/*!*/;

/*!\C utf8 *//*!*/;

SET @@session.character_set_client=33,@@session.collation_connection=33,@@session.collation_server=8/*!*/;

SET @@session.lc_time_names=0/*!*/;

SET @@session.collation_database=DEFAULT/*!*/;

ALTER TABLE NewAddressCode ADD INDEX PostalCode(PostalCode)

/*!*/;

SET @@SESSION.GTID_NEXT= 'AUTOMATIC' /* added by mysqlbinlog */ /*!*/;

DELIMITER ;

# End of log file

/*!50003 SET COMPLETION_TYPE=@OLD_COMPLETION_TYPE*/;

/*!50530 SET @@SESSION.PSEUDO_SLAVE_MODE=0*/;

It tells us that the slave was trying to replicate and execute a DDL statement (the ALTER TABLE ... ADD INDEX above), which is the source of the lag. This is a huge table containing 13M rows.

Check SHOW PROCESSLIST and SHOW ENGINE INNODB STATUS, Combined with ps, top, and iostat

In some cases, SHOW SLAVE STATUS is not enough to tell us the culprit. It's possible that the replicated statements are affected by internal processes running in the MySQL database slave. Running the statements SHOW [FULL] PROCESSLIST and SHOW ENGINE INNODB STATUS also provides informative data that gives you insights about the source of the problem. 

For example, let's say a benchmarking tool is running, saturating the disk IO and CPU. You can check by running both SQL statements, and combine them with the ps and top commands.

You can also determine bottlenecks in your disk storage by running iostat, which provides statistics for the volume you are trying to diagnose. Running iostat can show how busy or loaded your server is. For example, here is output taken from a slave that is lagging while also experiencing high IO utilization:

[root@testnode5 ~]# iostat -d -x 10 10

Linux 3.10.0-693.5.2.el7.x86_64 (testnode5)     02/06/2020 _x86_64_ (2 CPU)



Device:         rrqm/s wrqm/s     r/s w/s rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await svctm  %util

sda               0.00 0.42 3.71   60.65 218.92 568.39   24.47 0.15 2.31 13.79    1.61 0.12 0.76

dm-0              0.00 0.00 3.70   60.48 218.73 568.33   24.53 0.15 2.36 13.85    1.66 0.12 0.76

dm-1              0.00 0.00 0.00    0.00 0.04 0.01 21.92     0.00 63.29 2.37 96.59 22.64   0.01



Device:         rrqm/s wrqm/s     r/s w/s rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await svctm  %util

sda               0.00 0.20 392.30 7983.60  2135.60 49801.55 12.40 36.70    3.84 13.01 3.39 0.08 69.02

dm-0              0.00 0.00 392.30 7950.20  2135.60 50655.15 12.66 36.93    3.87 13.05 3.42 0.08 69.34

dm-1              0.00 0.00 0.00    0.30 0.00 1.20   8.00 0.06 183.67 0.00  183.67 61.67 1.85



Device:         rrqm/s wrqm/s     r/s w/s rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await svctm  %util

sda               0.00 1.40 370.93 6775.42  2557.04 46184.22 13.64 43.43    6.12 11.60 5.82 0.10 73.25

dm-0              0.00 0.00 370.93 6738.76  2557.04 47029.62 13.95 43.77    6.20 11.64 5.90 0.10 73.41

dm-1              0.00 0.00 0.00    0.30 0.00 1.20   8.00 0.03 107.00 0.00  107.00 35.67 1.07



Device:         rrqm/s wrqm/s     r/s w/s rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await svctm  %util

sda               0.00 0.00 299.80 7253.35  1916.88 52766.38 14.48 30.44    4.59 15.62 4.14 0.10 72.09

dm-0              0.00 0.00 299.80 7198.60  1916.88 51064.24 14.13 30.68    4.66 15.70 4.20 0.10 72.57

dm-1              0.00 0.00 0.00    0.00 0.00 0.00   0.00 0.00 0.00 0.00    0.00 0.00 0.00



Device:         rrqm/s wrqm/s     r/s w/s rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await svctm  %util

sda               0.00 0.10 215.50 8939.60  1027.60 67497.10 14.97 59.65    6.52 27.98 6.00 0.08 72.50

dm-0              0.00 0.00 215.50 8889.20  1027.60 67495.90 15.05 60.07    6.60 28.09 6.08 0.08 72.75

dm-1              0.00 0.00 0.00    0.30 0.00 1.20   8.00 0.01 32.33 0.00   32.33 30.33 0.91



Device:         rrqm/s wrqm/s     r/s w/s rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await svctm  %util

sda               0.00 0.90 140.40 8922.10   625.20 54709.80 12.21 11.29    1.25 9.88 1.11 0.08 68.60

dm-0              0.00 0.00 140.40 8871.50   625.20 54708.60 12.28 11.39    1.26 9.92 1.13 0.08 68.83

dm-1              0.00 0.00 0.00    0.30 0.00 1.20   8.00 0.01 27.33 0.00   27.33 9.33 0.28



Device:         rrqm/s wrqm/s     r/s w/s rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await svctm  %util

sda               0.00 1.70 284.50 8621.30 24228.40 51535.75    17.01 34.14 3.27 8.19 3.11 0.08 72.78

dm-0              0.00 0.00 290.90 8587.10 25047.60 53434.95    17.68 34.28 3.29 8.02 3.13 0.08 73.47

dm-1              0.00 0.00 0.00    2.00 0.00 8.00   8.00 0.83 416.45 0.00  416.45 63.60 12.72



Device:         rrqm/s wrqm/s     r/s w/s rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await svctm  %util

sda               0.00 0.30 851.60 11018.80 17723.60 85165.90    17.34 142.59 12.44 7.61 12.81 0.08 99.75

dm-0              0.00 0.00 845.20 10938.90 16904.40 83258.70    17.00 143.44 12.61 7.67 12.99 0.08 99.75

dm-1              0.00 0.00 0.00    0.00 0.00 0.00   0.00 0.00 0.00 0.00    0.00 0.00 0.00



Device:         rrqm/s wrqm/s     r/s w/s rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await svctm  %util

sda               0.00 1.10 24.60 12965.40   420.80 51114.45 7.93 39.44    3.04 0.33 3.04 0.07 93.39

dm-0              0.00 0.00 24.60 12890.20   420.80 51114.45 7.98 40.23    3.12 0.33 3.12 0.07 93.35

dm-1              0.00 0.00 0.00    0.00 0.00 0.00   0.00 0.00 0.00 0.00    0.00 0.00 0.00



Device:         rrqm/s wrqm/s     r/s w/s rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await svctm  %util

sda               0.00 0.00 3.60 13420.70    57.60 51942.00 7.75 0.95   0.07 0.33 0.07 0.07 92.11

dm-0              0.00 0.00 3.60 13341.10    57.60 51942.00 7.79 0.95   0.07 0.33 0.07 0.07 92.08

dm-1              0.00 0.00 0.00    0.00 0.00 0.00   0.00 0.00 0.00 0.00    0.00 0.00 0.00

The result above displays high IO utilization and a high volume of writes. It also reveals that the average queue size and average request size are fluctuating, which is an indication of a heavy workload. In these cases, you need to determine whether there are external processes that cause MySQL to choke the replication threads.
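A quick sketch for spotting external processes competing with mysqld for CPU and IO on the slave:

$ ps -eo pid,user,%cpu,%mem,cmd --sort=-%cpu | head
$ top -b -n 1 | head -n 20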

How Can ClusterControl Help?

With ClusterControl, dealing with slave lag and determining the culprit is very easy and efficient. It directly tells you in the web UI, see below:

It shows you the current slave lag your slave nodes are experiencing. Not only that, the SCUMM dashboards, if enabled, provide you with more insight into the health of your slave node, or even the whole cluster:

ClusterControl Replication Slave Dashboard
ClusterControl Cluster Overview Dashboard
ClusterControl Cluster Overview Dashboard

Not only are these things available in ClusterControl, it also provides the capability to prevent bad queries from occurring, with the features seen below.

The redundant indexes feature allows you to determine indexes that can cause performance issues for incoming queries that reference the duplicate indexes. It also shows you tables that have no primary keys, which are a common cause of slave lag when an SQL query or transaction referencing big tables without primary or unique keys is replicated to the slaves.

Conclusion

Dealing with MySQL replication lag is a frequent problem in a master-slave replication setup. It can be easy to diagnose, but difficult to solve. Make sure your tables have a primary key or unique key, and determine the steps and tools for troubleshooting and diagnosing the cause of slave lag. Efficiency is always the key when solving problems though.

by Paul Namuag at February 12, 2020 07:34 PM

February 11, 2020

SeveralNines

How Do I Know if My PostgreSQL Backup is Good?

Backups are a must in any Disaster Recovery Plan. They might not always be enough to guarantee an acceptable Recovery Point Objective, but they are a good first approach. The problem is what happens if, in case of failure, you need to use this backup and it's not usable for some reason? You probably don't want to be in that situation, so, in this blog, we'll see how to confirm that your backup is good to use.

Types of PostgreSQL Backups

Let’s start talking about the different types of backups. There are different types, but in general, we can separate it in two simple categories:

  • Logical: The backup is stored in a human-readable format like SQL.
  • Physical: The backup contains binary data.

Why are we mentioning this? Because we’ll see that there are some checks we can do for one type and not for the other one.

Checking the Backup Logs

The first way to confirm if everything goes fine is by checking the backup logs.

The simplest command to run a PostgreSQL backup could be for example:

$ pg_dumpall > /path/to/dump.sql

But how can I know if there was an error while the command was running? You can redirect the error output to a specific log file:

$ pg_dumpall > /path/to/dump.sql 2> /var/log/postgres/pg_dump.log

So, you can add this line in the server cron to run it every day:

30 0 * * * pg_dumpall > /path/to/dump.sql 2> /var/log/postgres/pg_dump.log

And you should monitor the log file to look for errors, for example, adding it into some monitoring tool like Nagios.
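As a simple sketch, a follow-up cron entry could flag errors found in that log (the schedule and mail recipient are illustrative assumptions):

45 0 * * * grep -qiE 'error|fatal' /var/log/postgres/pg_dump.log && echo "pg_dumpall reported errors" | mail -s "PostgreSQL backup check" dba@example.com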

Checking logs is not enough to confirm that the backup will work, because for example, if the backup file is corrupted for some reason, you probably won’t see that in the log file.

Checking the Backup Content

If you are using logical backups, you can verify the content of the backup file, to confirm you have all databases there.

You can list your current PostgreSQL databases using, for example, this command:

$ psql -l | awk '{ print $1 }'| awk 'FNR > 3' |grep '^[a-zA-Z0-9]' |grep -v 'template0'

postgres

template1

world

And check which databases you have in the backup file:

$ grep '^[\]connect' /path/to/dump.sql |awk '{print $2}'

template1

postgres

world

The problem with this check is that you don't check the size or the data, so it's possible that you have some data loss if there was an error when the backup was executed.

Restoring to Check the Backup Manually

The most secure way to confirm that a backup is working is to restore it and access the database.

After the backup is completed, you can restore it manually in another host by copying the dump file and running for example:

$ psql -f /path/to/dump.sql postgres

Then, you can access it and check the databases:

$ psql

postgres=# \l

                                  List of databases

   Name    | Owner   | Encoding |   Collate | Ctype    | Access privileges

-----------+----------+----------+-------------+-------------+-----------------------

 postgres  | postgres | UTF8     | en_US.utf-8 | en_US.utf-8 |

 template0 | postgres | UTF8     | en_US.utf-8 | en_US.utf-8 | =c/postgres          +

           |          | |             | | postgres=CTc/postgres

 template1 | postgres | UTF8     | en_US.utf-8 | en_US.utf-8 | =c/postgres          +

           |          | |             | | postgres=CTc/postgres

 world     | postgres | UTF8     | en_US.utf-8 | en_US.utf-8 |

(4 rows)

The problem with this method is, of course, you should run it manually, or find a way to automate this, which could be a time-consuming task.

Automatic ClusterControl Backup Verification

Now, let’s see how ClusterControl can automate the verification of PostgreSQL backups and help avoid any surprises or manual tasks.

In ClusterControl, select your cluster and go to the "Backup" section, then, select “Create Backup”.

The automatic verify backup feature is available for the scheduled backups. So, let’s choose the “Schedule Backup” option.

When scheduling a backup, in addition to selecting the common options like method or storage, you also need to specify schedule/frequency.

In the next step, you can compress and encrypt your backup and specify the retention period. Here, you also have the “Verify Backup” feature.

To use this feature, you need a dedicated host (or VM) that is not part of the cluster.

ClusterControl will install the software and it’ll restore the backup in this host. After restoring, you can see the verification icon in the ClusterControl Backup section.

Conclusion

As we mentioned, backups are mandatory in any environment, but backup is not a backup if you can’t use it. So, you should make sure that your backup is useful in case you need it one day. In this blog, we showed different ways to check your backup to avoid problems when you want to restore it.

 

by Sebastian Insausti at February 11, 2020 06:51 PM

Federico Razzoli

Use cases for MariaDB Invisible Columns

Invisible columns are columns that are not returned by a SELECT *. Their use cases are not obvious.

The post Use cases for MariaDB Invisible Columns appeared first on Federico Razzoli.

by Federico Razzoli at February 11, 2020 09:47 AM

February 10, 2020

SeveralNines

How to Protect your MySQL or MariaDB Database From SQL Injection: Part One

Security is one of the most important elements of a properly designed database environment. There are numerous attack vectors, with SQL injection probably being the most popular one. You can design layers of defence in the application code, but what can you do on the database layer? Today we would like to show you how easily you can implement an SQL firewall on top of MySQL using ProxySQL. In the second part of this blog we will explain how you can create a whitelist of queries that are allowed to access the database.

First, we want to deploy ProxySQL. The easiest way to do it is to use ClusterControl. With a couple of clicks you can deploy it to your cluster.

Deploy ProxySQL to Database Cluster

Define where to deploy it; you can either pick an existing host in the cluster or just write down any IP or hostname. Set credentials for the administrative and monitoring users.

Then you can create a new user in the database to be used with ProxySQL, or you can import one of the existing ones. You also need to define the database nodes you want to include in ProxySQL. Answer whether you use implicit transactions or not, and you are all set to deploy ProxySQL. In a couple of minutes, a ProxySQL instance with a configuration prepared based on your input is ready to use.

Given our issue is security, we want to be able to tell ProxySQL how to handle inappropriate queries. Let’s take a look at the query rules, the core mechanism that governs how ProxySQL handles the traffic that passes through it. The list of query rules may look like this:

They are being applied from the lowest ID onwards.

Let’s try to create a query rule which will allow only SELECT queries for a particular user:

We are adding a query rule at the beginning of the rules list. We are going to match anything that is not SELECTs (please note Negate Match Pattern is enabled). The query rule will be used only when the username is ‘devuser’. If all the conditions are matched, the user will see the error as in the “Error Msg” field.
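For reference, a sketch of an equivalent rule created directly through the ProxySQL admin interface (the rule_id value is illustrative):

Admin> INSERT INTO mysql_query_rules (rule_id, active, username, match_pattern, negate_match_pattern, error_msg, apply) VALUES (1, 1, 'devuser', '^SELECT', 1, 'The query is not allowed', 1);
Admin> LOAD MYSQL QUERY RULES TO RUNTIME;
Admin> SAVE MYSQL QUERY RULES TO DISK;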

root@vagrant:~# mysql -u devuser -h 10.0.0.144 -P6033 -ppass

mysql: [Warning] Using a password on the command line interface can be insecure.

Welcome to the MySQL monitor.  Commands end with ; or \g.

Your MySQL connection id is 3024

Server version: 5.5.30 (ProxySQL)



Copyright (c) 2009-2019 Percona LLC and/or its affiliates

Copyright (c) 2000, 2019, Oracle and/or its affiliates. All rights reserved.



Oracle is a registered trademark of Oracle Corporation and/or its

affiliates. Other names may be trademarks of their respective

owners.



Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.



mysql> create schema myschema;

ERROR 1148 (42000): The query is not allowed

mysql> SELECT 1;

+---+

| 1 |

+---+

| 1 |

+---+

1 row in set (0.01 sec)



mysql> SELECT * FROM sbtest.sbtest1 LIMIT 1\G

*************************** 1. row ***************************

 id: 1

  k: 503019

  c: 18034632456-32298647298-82351096178-60420120042-90070228681-93395382793-96740777141-18710455882-88896678134-41810932745

pad: 43683718329-48150560094-43449649167-51455516141-06448225399

1 row in set (0.00 sec)

Another example: this time we will try to prevent accidents related to the Bobby Tables situation.

With this query rule in place, your ‘students’ table won’t be dropped by Bobby:

mysql> use school;

Reading table information for completion of table and column names

You can turn off this feature to get a quicker startup with -A



Database changed

mysql> INSERT INTO students VALUES (1, 'Robert');DROP TABLE students;--

Query OK, 1 row affected (0.01 sec)



ERROR 1148 (42000): Only superuser can execute DROP TABLE;

As you can see, Bobby was not able to remove our ‘students’ table. He was only nicely inserted into the table.
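
For reference, a rule producing this behaviour could be defined in the ProxySQL admin interface roughly as follows; the rule_id and the regular expression are only a sketch and should be tuned to your own schema and privilege model:

mysql> INSERT INTO mysql_query_rules (rule_id, active, match_pattern, error_msg, apply) VALUES (2, 1, 'DROP\s+TABLE', 'Only superuser can execute DROP TABLE;', 1);

mysql> LOAD MYSQL QUERY RULES TO RUNTIME;

mysql> SAVE MYSQL QUERY RULES TO DISK;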

 

by krzysztof at February 10, 2020 08:58 PM

MariaDB Foundation

FOSDEM 2020: Some memories and todos

FOSDEM gives energy. FOSDEM gives ideas. FOSDEM opens up opportunities, FOSDEM allows you to connect with old friends and colleagues. Hence, no big surprise that MariaDB Foundation attended FOSDEM, in order to promote Open Source and to get ourselves closer to the community. […]

The post FOSDEM 2020: Some memories and todos appeared first on MariaDB.org.

by Kaj Arnö at February 10, 2020 03:10 PM

February 08, 2020

Valeriy Kravchuk

Fun with Bugs #93 - On MySQL Bug Reports I am Subscribed to, Part XXVII

No matter what I write and present about dynamic tracing, blog posts about MySQL bugs are more popular based on statistics. So, to make more readers happy, I'd like to continue my review of interesting bugs reported in November with this post on bugs reported during December, 2019.

As usual, I'll try to concentrate on bug reports related to InnoDB, replication and optimizer, but some other categories also got my attention:
  • Bug #97911 - "[FATAL] Semaphore wait has lasted > 600 seconds. We intentionally crash the serv...". This bug got marked as a duplicate of another, older long semaphore wait bug (in "Can't repeat" status!) without much analysis. I think all Oracle engineers who added comments to that bug missed one interesting point:
    ... [ERROR] [MY-012872] [InnoDB] Semaphore wait has lasted > 39052544 seconds.
    even though the bug reporter highlighted it in a comment. The reported wait time is a problem and is surely a bug, no matter how the long wait itself is reproduced and what its root cause is!
  • Bug #97913 - "Undo logs growing during partitioning ALTER queries". This bug (affecting only MySQL 5.7.x) was reported by Przemyslaw Malkowski from Percona, who also presented useful examples of monitoring queries to the information_schema.innodb_metrics and performance_schema. Check also comments that may explain why 8.0 is not affected in a similar way.
  • Bug #97935 - "Memory leak in client connection using information_schema". It took some effort (starting from, but not limited to, Valgrind Massif profiling of heap memory usage) and time for Daniel Nichter to prove the point and get this bug "Verified". It is also not clear whether MySQL 8 is affected.
  • Bug #97950 - "buf_read_page_handle_error can trigger assert failure". The bug reporter, Shu Lin, tried his best to make the point. It's clear enough how to repeat this, and one could use one of the documented test synchronisation methods if gdb is too much for bug verification. I do not think this bug was handled properly or got the level of attention it truly deserved.
  • Bug #97966 - "XA COMMIT in another session will not write binlog event". This bug was reported by Lujun Wang and immediately verified, but again with no documented check if MySQL 8 is affected. This happens too often, unfortunately.
  • Bug #97971 - "Roles not handling column level privileges correctly; Can SELECT, but not UPDATE". Clear and simple bug report with a workaround from Travis Bement. It was immediately verified.
  • Bug #98014 - "Lossy implicit conversion in conditional breaks ONLY_FULL_GROUP_BY". Yet another case of (IMHO) improper bug processing. The argument presented (from the manual):
    "MySQL 5.7.5 and later also permits a nonaggregate column not named in a GROUP BY clause when ONLY_FULL_GROUP_BY SQL mode is enabled, provided that this column is limited to a single value"
    does not apply, as a "single value" for = 0 is NOT selected; we have multiple Host values matching it due to conversion. This is how the proper version (guess what it is) works:
    mysql> SELECT User, Host, COUNT(*) FROM mysql.user WHERE Host = 0 GROUP BY 1;
    ERROR 1055 (42000): 'mysql.user.Host' isn't in GROUP BY
    mysql> select @@sql_mode;
    +--------------------+
    | @@sql_mode         |
    +--------------------+
    | ONLY_FULL_GROUP_BY |
    +--------------------+
    1 row in set (0.001 sec)

    mysql> set session sql_mode='';
    Query OK, 0 rows affected (0.029 sec)

    mysql> SELECT User, Host, COUNT(*) FROM mysql.user WHERE Host = 0 GROUP BY 1;
    +---------------+-----------+----------+
    | User          | Host      | COUNT(*) |
    +---------------+-----------+----------+
    | data_engineer |           |        1 |
    | en            | localhost |        1 |
    | ro1           |           |        1 |
    | ro2           |           |        1 |
    | role-1        |           |        1 |
    | root          | ::1       |        3 |
    ...
    | user1         | %         |        1 |
    +---------------+-----------+----------+
    13 rows in set, 17 warnings (0.003 sec)
    I think this bug reported by Joshua Varner must be verified.
  • Bug #98046 - "Inconsistent behavior while logging a killed query in the slow query log". Bug reporter, Pranay Motupalli, provided a clear test case and a detailed analysis, including the gdb debugging session that proves the point. Nice bug report.
  • Bug #98055 - "MySQL Optimizer Bug not picking right index". Both the bug reporter (Sudheer Gadipathi) and the engineer who verified the bug stated that MySQL 8.0.x is similarly affected (a UNIQUE key is preferred for the partitioned table, even though there is a better non-unique index). But 8.0.x is NOT listed in the "Version:" field. Weird.
  • Bug #98068 - "SELECT FOR UPDATE not-exist table in PERFORMANCE SCHEMA reports confusing error". This is a funny (but still a regression) bug report by William ZHANG. Proper versions work like this:
    mysql> select database();
    +--------------------+
    | database()         |
    +--------------------+
    | performance_schema |
    +--------------------+
    1 row in set (0.001 sec)

    mysql> select * from not_exist_table;
    ERROR 1146 (42S02): Table 'performance_schema.not_exist_table' doesn't exist
    mysql> select * from not_exist_table for update;
    ERROR 1146 (42S02): Table 'performance_schema.not_exist_table' doesn't exist
  • Bug #98072 - "innochecksum summary shows blob pages as other type of page for 8.0 tables". The bug was reported by SERGEY KUZMICHEV. This time the "regression" tag is missing, even though it is clearly stated that MySQL 5.7 worked differently. This is from the proper version:
    ...
    File::..\data\test\blob_test.ibd
    ================PAGE TYPE SUMMARY==============
    #PAGE_COUNT     PAGE_TYPE
    ===============================================
           1        Index page
           0        Undo log page
           1        Inode page
           0        Insert buffer free list page
         508        Freshly allocated page
           1        Insert buffer bitmap
           0        System page
           0        Transaction system page
           1        File Space Header
           0        Extent descriptor page
          64        BLOB page
           0        Compressed BLOB page
           0        Page compressed page
           0        Page compressed encrypted page
           0        Other type of page

    ...
  • Bug #98083 - "Restarting the computer when deleting the database will cause directory residues". One would expect that MySQL 8 with a data dictionary should have some means to figure out the remaining database directory for a dropped database upon startup (as it stores information about databases elsewhere) and do proper cleanup. I think this bug reported by Jinming Liao must be verified and fixed. There is no "... manual creation or deletion of tables or databases..." involved in this case.
  • Bug #98091 - "InnoDB does not initialize raw disk partitions". As simple as that, and both 5.7.29 and 8.0.19 are surely affected. It was not always the case; I've used raw devices myself with older MySQL versions, so this bug reported by Saverio M is a regression. Still, the "regression" tag is missing.
That's all bugs reported in December, 2019 that I cared to subscribe to  and mention here. Next time I'll check bugs reported in January, 2020. There are at least 16 in my list already, so stay tuned.

Follow the links in this post to get more details about profiling and creating off-CPU FlameGraphs for MySQL. This post is devoted to bugs, though :)


To summarize:
  1. I am happy to see bug reports from people whom I never noticed before. MySQL Community is alive.
  2. Some flexibility in following common-sense bug verification procedures is still visible. Bugs reported for 5.7 are not checked on 8.0 (or the results of this check are not documented in public), nobody cares to read carefully what the bug reporter says or go the extra mile, the "regression" tag is not added, and so on. 
  3. Probably at this stage my writings are mostly ignored by Oracle's decision makers. But I keep watching them all anyway.

by Valerii Kravchuk (noreply@blogger.com) at February 08, 2020 05:25 PM

February 07, 2020

SeveralNines

How to Identify MySQL Performance Issues with Slow Queries

Performance issues are common problems when administering MySQL databases. Sometimes these problems are, in fact, due to slow queries. In this blog, we'll deal with slow queries and how to identify them.

Checking Your Slow Query Logs

MySQL has the capability to filter and log slow queries. There are various ways you can investigate these, but the most common and efficient way is to use the slow query logs. 

You need to determine first if your slow query logs are enabled. To deal with this, you can go to your server and query the following variable:

MariaDB [(none)]> show global variables like 'slow%log%';

+---------------------+-------------------------------+

| Variable_name       | Value                         |

+---------------------+-------------------------------+

| slow_query_log      | ON                            |

| slow_query_log_file | /var/log/mysql/mysql-slow.log |

+---------------------+-------------------------------+

2 rows in set (0.001 sec)

You must ensure that the variable slow_query_log is set to ON, while slow_query_log_file determines the path where your slow query log is written. If this variable is not set, the slow query log is written to the MySQL data directory by default (as host_name-slow.log).

Accompanying the slow_query_log variable are long_query_time and min_examined_row_limit, which impact how slow query logging works. Basically, the slow query log captures SQL statements that take more than long_query_time seconds to execute and examine at least min_examined_row_limit rows. It can be used to find queries that take a long time to execute and are therefore candidates for optimization, and then you can use external tools to generate a report for you, which we will discuss later.

By default, administrative statements (ALTER TABLE, ANALYZE TABLE, CHECK TABLE, CREATE INDEX, DROP INDEX, OPTIMIZE TABLE, and REPAIR TABLE) do not get logged to the slow query log. To include them, you need to enable the variable log_slow_admin_statements.
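
Assuming a reasonably recent MySQL or MariaDB server and sufficient privileges, enabling slow query logging at runtime could look like the following; the threshold and the file path are only examples:

mysql> SET GLOBAL slow_query_log = ON;
mysql> SET GLOBAL slow_query_log_file = '/var/log/mysql/mysql-slow.log';
mysql> SET GLOBAL long_query_time = 2;
mysql> SET GLOBAL log_slow_admin_statements = ON;

Remember to put the same settings under the [mysqld] section of your configuration file so they survive a restart.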

Querying Process List and InnoDB Status Monitor

In a normal DBA routine, this step is the most common way to determine long-running or active queries that cause performance degradation. Such queries can even cause your server to get stuck, with queues slowly piling up behind a lock held by a running query. You can simply run,

SHOW [FULL] PROCESSLIST;

or

SHOW ENGINE INNODB STATUS \G

If you are using ClusterControl, you can find it by using <select your MySQL cluster> → Performance → InnoDB Status just like below,

or using <select your MySQL cluster> → Query Monitor → Running Queries (which we will discuss later) to view the active processes, just like SHOW PROCESSLIST but with better control of the queries.

Analyzing MySQL Queries

The slow query log will show you a list of queries that have been identified as slow, based on the values of the system variables mentioned earlier. The definition of a slow query might differ from case to case, since there are occasions when even a 10-second query is acceptable and still not slow. However, if your application is OLTP, it's very common that a 10-second or even a 5-second query is an issue or causes performance degradation in your database. The MySQL slow query log does help you here, but it's not enough to just open the log file, as it does not provide an overview of what those queries are, how they perform, and how frequently they occur. Hence, third-party tools can help you with this.

pt-query-digest

The most common approach is to use pt-query-digest from Percona Toolkit, which I would say is the most widely used DBA toolset. pt-query-digest provides a clean overview report generated from your slow query log. For example, this specific report gives a clear perspective of the slow queries on a specific node:

# A software update is available:



# 100ms user time, 100ms system time, 29.12M rss, 242.41M vsz

# Current date: Mon Feb  3 20:26:11 2020

# Hostname: testnode7

# Files: /var/log/mysql/mysql-slow.log

# Overall: 24 total, 14 unique, 0.00 QPS, 0.02x concurrency ______________

# Time range: 2019-12-12T10:01:16 to 2019-12-12T15:31:46

# Attribute          total min max     avg 95% stddev median

# ============     ======= ======= ======= ======= ======= ======= =======

# Exec time           345s 1s 98s   14s 30s 19s 7s

# Lock time             1s 0 1s 58ms    24ms 252ms 786us

# Rows sent          5.72M 0 1.91M 244.14k   1.86M 629.44k 0

# Rows examine      15.26M 0 1.91M 651.23k   1.86M 710.58k 961.27k

# Rows affecte       9.54M 0 1.91M 406.90k 961.27k 546.96k       0

# Bytes sent       305.81M 11 124.83M  12.74M 87.73M 33.48M 56.92

# Query size         1.20k 25 244   51.17 59.77 40.60 38.53



# Profile

# Rank Query ID                         Response time Calls R/Call V/M   

# ==== ================================ ============= ===== ======= ===== 

#    1 0x00C8412332B2795DADF0E55C163... 98.0337 28.4%     1 98.0337 0.00 UPDATE sbtest?

#    2 0xDEF289292EA9B2602DC12F70C7A... 74.1314 21.5%     3 24.7105 6.34 ALTER TABLE sbtest? sbtest3

#    3 0x148D575F62575A20AB9E67E41C3... 37.3039 10.8%     6 6.2173 0.23 INSERT SELECT sbtest? sbtest

#    4 0xD76A930681F1B4CC9F748B4398B... 32.8019  9.5% 3 10.9340 4.24 SELECT sbtest?

#    5 0x7B9A47FF6967FD905289042DD3B... 20.6685  6.0% 1 20.6685 0.00 ALTER TABLE sbtest? sbtest3

#    6 0xD1834E96EEFF8AC871D51192D8F... 19.0787  5.5% 1 19.0787 0.00 CREATE

#    7 0x2112E77F825903ED18028C7EA76... 18.7133  5.4% 1 18.7133 0.00 ALTER TABLE sbtest? sbtest3

#    8 0xC37F2569578627487D948026820... 15.0177  4.3% 2 7.5088 0.00 INSERT SELECT sbtest? sbtest

#    9 0xDE43B2066A66AFA881D6D45C188... 13.7180  4.0% 1 13.7180 0.00 ALTER TABLE sbtest? sbtest3

# MISC 0xMISC                           15.8605 4.6% 5 3.1721 0.0 <5 ITEMS>



# Query 1: 0 QPS, 0x concurrency, ID 0x00C8412332B2795DADF0E55C1631626D at byte 5319

# Scores: V/M = 0.00

# Time range: all events occurred at 2019-12-12T13:23:15

# Attribute    pct total min     max avg 95% stddev  median

# ============ === ======= ======= ======= ======= ======= ======= =======

# Count          4 1

# Exec time     28 98s 98s     98s 98s 98s   0 98s

# Lock time      1 25ms 25ms    25ms 25ms 25ms       0 25ms

# Rows sent      0 0 0       0 0 0 0       0

# Rows examine  12 1.91M 1.91M   1.91M 1.91M 1.91M       0 1.91M

# Rows affecte  20 1.91M 1.91M   1.91M 1.91M 1.91M       0 1.91M

# Bytes sent     0 67 67      67 67 67   0 67

# Query size     7 89 89      89 89 89   0 89

# String:

# Databases    test

# Hosts        localhost

# Last errno   0

# Users        root

# Query_time distribution

#   1us

#  10us

# 100us

#   1ms

#  10ms

# 100ms

#    1s

#  10s+  ################################################################

# Tables

#    SHOW TABLE STATUS FROM `test` LIKE 'sbtest3'\G

#    SHOW CREATE TABLE `test`.`sbtest3`\G

update sbtest3 set c=substring(MD5(RAND()), -16), pad=substring(MD5(RAND()), -16) where 1\G

# Converted for EXPLAIN

# EXPLAIN /*!50100 PARTITIONS*/

select  c=substring(MD5(RAND()), -16), pad=substring(MD5(RAND()), -16) from sbtest3 where  1\G



# Query 2: 0.00 QPS, 0.01x concurrency, ID 0xDEF289292EA9B2602DC12F70C7A041A9 at byte 3775

# Scores: V/M = 6.34

# Time range: 2019-12-12T12:41:47 to 2019-12-12T15:25:14

# Attribute    pct total min     max avg 95% stddev  median

# ============ === ======= ======= ======= ======= ======= ======= =======

# Count         12 3

# Exec time     21 74s 6s     36s 25s 35s 13s     30s

# Lock time      0 13ms 1ms     8ms 4ms 8ms   3ms 3ms

# Rows sent      0 0 0       0 0 0 0       0

# Rows examine   0 0 0       0 0 0 0       0

# Rows affecte   0 0 0       0 0 0 0       0

# Bytes sent     0 144 44      50 48 49.17   3 49.17

# Query size     8 99 33      33 33 33   0 33

# String:

# Databases    test

# Hosts        localhost

# Last errno   0 (2/66%), 1317 (1/33%)

# Users        root

# Query_time distribution

#   1us

#  10us

# 100us

#   1ms

#  10ms

# 100ms

#    1s ################################

#  10s+  ################################################################

# Tables

#    SHOW TABLE STATUS FROM `test` LIKE 'sbtest3'\G

#    SHOW CREATE TABLE `test`.`sbtest3`\G

ALTER TABLE sbtest3 ENGINE=INNODB\G
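
A report like the one above can typically be generated by pointing pt-query-digest at the slow query log and redirecting the output to a file for review; the paths below are just examples:

$ pt-query-digest /var/log/mysql/mysql-slow.log > /tmp/slow-report.txt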

Using performance_schema

Slow query logs might be an issue if you don't have direct access to the log file, for example when using RDS or fully-managed database services such as Google Cloud SQL or Azure SQL. Although you might need to enable some variables for these features, performance_schema comes in handy when querying for statements logged by your system. You can use a standard SQL statement to retrieve a partial result. For example,

mysql> SELECT SCHEMA_NAME, DIGEST, DIGEST_TEXT, COUNT_STAR, SUM_TIMER_WAIT/1000000000000 SUM_TIMER_WAIT_SEC, MIN_TIMER_WAIT/1000000000000 MIN_TIMER_WAIT_SEC, AVG_TIMER_WAIT/1000000000000 AVG_TIMER_WAIT_SEC, MAX_TIMER_WAIT/1000000000000 MAX_TIMER_WAIT_SEC, SUM_LOCK_TIME/1000000000000 SUM_LOCK_TIME_SEC, FIRST_SEEN, LAST_SEEN FROM events_statements_summary_by_digest;

+--------------------+----------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+------------+--------------------+--------------------+--------------------+--------------------+-------------------+---------------------+---------------------+

| SCHEMA_NAME        | DIGEST               | DIGEST_TEXT                                                                                                                                                                                                                                                                                                                               | COUNT_STAR | SUM_TIMER_WAIT_SEC | MIN_TIMER_WAIT_SEC | AVG_TIMER_WAIT_SEC | MAX_TIMER_WAIT_SEC | SUM_LOCK_TIME_SEC | FIRST_SEEN | LAST_SEEN |

+--------------------+----------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+------------+--------------------+--------------------+--------------------+--------------------+-------------------+---------------------+---------------------+

| NULL               | 390669f3d1f72317dab6deb40322d119 | SELECT @@`skip_networking` , @@`skip_name_resolve` , @@`have_ssl` = ? , @@`ssl_key` , @@`ssl_ca` , @@`ssl_capath` , @@`ssl_cert` , @@`ssl_cipher` , @@`ssl_crl` , @@`ssl_crlpath` , @@`tls_version`                                                                                                                                                             | 1 | 0.0373 | 0.0373 | 0.0373 | 0.0373 | 0.0000 | 2020-02-03 20:22:54 | 2020-02-03 20:22:54 |

| NULL               | fba95d44e3d0a9802dd534c782314352 | SELECT `UNIX_TIMESTAMP` ( )                                                                                                                                                                                                                                                                                                                                     | 2 | 0.0002 | 0.0001 | 0.0001 | 0.0001 | 0.0000 | 2020-02-03 20:22:57 | 2020-02-03 20:23:00 |

| NULL               | 18c649da485456d6cdf12e4e6b0350e9 | SELECT @@GLOBAL . `SERVER_ID`                                                                                                                                                                                                                                                                                                                                   | 2 | 0.0001 | 0.0001 | 0.0001 | 0.0001 | 0.0000 | 2020-02-03 20:22:57 | 2020-02-03 20:23:00 |

| NULL               | dd356b8a5a6ed0d7aee2abd939cdb6c9 | SET @? = ?                                                                                                                                                                                                                                                                                                                                                      | 6 | 0.0003 | 0.0000 | 0.0001 | 0.0001 | 0.0000 | 2020-02-03 20:22:57 | 2020-02-03 20:23:00 |

| NULL               | 1c5ae643e930af6d069845d74729760d | SET @? = @@GLOBAL . `binlog_checksum`                                                                                                                                                                                                                                                                                                                           | 2 | 0.0001 | 0.0001 | 0.0001 | 0.0001 | 0.0000 | 2020-02-03 20:22:57 | 2020-02-03 20:23:00 |

| NULL               | ad5208ffa004a6ad7e26011b73cbfb4c | SELECT @?                                                                                                                                                                                                                                                                                                                                                       | 2 | 0.0001 | 0.0000 | 0.0000 | 0.0001 | 0.0000 | 2020-02-03 20:22:57 | 2020-02-03 20:23:00 |

| NULL               | ed0d1eb982c106d4231b816539652907 | SELECT @@GLOBAL . `GTID_MODE`                                                                                                                                                                                                                                                                                                                                   | 2 | 0.0001 | 0.0000 | 0.0000 | 0.0001 | 0.0000 | 2020-02-03 20:22:57 | 2020-02-03 20:23:00 |

| NULL               | cb47e22372fdd4441486b02c133fb94f | SELECT @@GLOBAL . `SERVER_UUID`                                                                                                                                                                                                                                                                                                                                 | 2 | 0.0001 | 0.0000 | 0.0000 | 0.0001 | 0.0000 | 2020-02-03 20:22:57 | 2020-02-03 20:23:00 |

| NULL               | 73301368c301db5d2e3db5626a21b647 | SELECT @@GLOBAL . `rpl_semi_sync_master_enabled`                                                                                                                                                                                                                                                                                                                | 2 | 0.0001 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 2020-02-03 20:22:57 | 2020-02-03 20:23:00 |

| NULL               | 0ff7375c5f076ba5c040e78a9250a659 | SELECT @@`version_comment` LIMIT ?                                                                                                                                                                                                                                                                                                                              | 1 | 0.0001 | 0.0001 | 0.0001 | 0.0001 | 0.0000 | 2020-02-03 20:45:59 | 2020-02-03 20:45:59 |

| NULL               | 5820f411e67a393f987c6be5d81a011d | SHOW TABLES FROM `performance_schema`                                                                                                                                                                                                                                                                                                                           | 1 | 0.0008 | 0.0008 | 0.0008 | 0.0008 | 0.0002 | 2020-02-03 20:46:11 | 2020-02-03 20:46:11 |

| NULL               | a022a0ab966c51eb820da1521349c7ef | SELECT SCHEMA ( )                                                                                                                                                                                                                                                                                                                                               | 1 | 0.0005 | 0.0005 | 0.0005 | 0.0005 | 0.0000 | 2020-02-03 20:46:29 | 2020-02-03 20:46:29 |

| performance_schema | e4833a7c1365b0b4492e9a514f7b3bd4 | SHOW SCHEMAS                                                                                                                                                                                                                                                                                                                                                    | 1 | 0.1167 | 0.1167 | 0.1167 | 0.1167 | 0.0001 | 2020-02-03 20:46:29 | 2020-02-03 20:46:29 |

| performance_schema | 1107f048fe6d970cb6a553bd4727a1b4 | SHOW TABLES                                                                                                                                                                                                                                                                                                                                                     | 1 | 0.0002 | 0.0002 | 0.0002 | 0.0002 | 0.0000 | 2020-02-03 20:46:29 | 2020-02-03 20:46:29 |

...

You can use the table performance_schema.events_statements_summary_by_digest. Although there is a chance that the entries in the performance_schema tables will be flushed, you can decide to save the data in a dedicated table. Take a look at this external post from Percona: MySQL query digest with Performance Schema.

In case you're wondering why we need to divide the wait time columns (SUM_TIMER_WAIT, MIN_TIMER_WAIT, AVG_TIMER_WAIT, and so on), these columns use picoseconds, so you might need to do some math or rounding to make them more readable. Dividing by 1,000,000,000,000 (10^12) converts the values to seconds, as done in the query above.

Analyzing Slow Queries Using ClusterControl

If you are using ClusterControl, there are different ways to deal with this. For example, in the MariaDB Cluster I have below, it shows you the Query Monitor tab and its drop-down items (Top Queries, Running Queries, Query Outliers):

  • Top Queries - aggregated list of all your top queries running on all the nodes of your database cluster
  • Running Queries - view the current running queries on your database cluster, similar to the SHOW FULL PROCESSLIST command in MySQL
  • Query Outliers - shows queries that are outliers. An outlier is a query taking a longer time than the normal query of that type.

On top of that, ClusterControl also captures query performance using graphs, which give you a quick overview of how your database system performs in relation to query performance. See below,

Wait, it's not over yet. ClusterControl also offers high-resolution metrics using Prometheus, showcasing very detailed metrics and capturing real-time statistics from the server. We have discussed this in our previous blogs, which are divided into a two-part series. Check out part 1 and then part 2. They show you how to efficiently monitor not only the slow queries but the overall performance of your MySQL, MariaDB, or Percona database servers. 

There are also other tools in ClusterControl which provide pointers and hints about what can cause slow query performance, even before it has occurred or been captured by the slow query log. Check out the Performance tab as seen below,

These items provide you with the following:

  • Overview - You can view graphs of different database counters on this page
  • Advisors - Lists of scheduled advisors’ results created in ClusterControl > Manage > Developer Studio using ClusterControl DSL.
  • DB Status - DB Status provides a quick overview of MySQL status across all your database nodes, similar to SHOW STATUS statement
  • DB Variables - DB Variables provide a quick overview of MySQL variables that are set across all your database nodes, similar to SHOW GLOBAL VARIABLES statement
  • DB Growth - Provides a summary of your database and table growth on a daily basis for the last 30 days. 
  • InnoDB Status - Fetches the current InnoDB monitor output for selected host, similar to SHOW ENGINE INNODB STATUS command.
  • Schema Analyzer - Analyzes your database schemas for missing primary keys, redundant indexes and tables using the MyISAM storage engine. 
  • Transaction Log - Lists out long-running transactions and deadlocks across database cluster where you can easily view what transactions are causing the deadlocks. The default query time threshold is 30 seconds.

Conclusion

Tracing MySQL performance issues is not really difficult. There are various external tools that provide the efficiency and capabilities you are looking for. The most important thing is that they are easy to use and keep you productive, so you can fix the most outstanding issues or even avoid a disaster before it happens.

by Paul Namuag at February 07, 2020 08:53 PM

February 06, 2020

SeveralNines

My MySQL Database is Out of Disk Space

When the MySQL server runs out of disk space, you would see one of the following errors in your application (as well as in the MySQL error log):

ERROR 3 (HY000) at line 1: Error writing file '/tmp/AY0Wn7vA' (Errcode: 28 - No space left on device)

For binary log:

[ERROR] [MY-000035] [Server] Disk is full writing './binlog.000019' (OS errno 28 - No space left on device). Waiting for someone to free space... Retry in 60 secs. Message reprinted in 600 secs.

For relay log:

[ERROR] [MY-000035] [Server] Disk is full writing './relay-bin.000007' (OS errno 28 - No space left on device). Waiting for someone to free space... Retry in 60 secs. Message reprinted in 600 secs.

For slow query log:

[ERROR] [MY-011263] [Server] Could not use /var/log/mysql/mysql-slow.log for logging (error 28 - No space left on device). Turning logging off for the server process. To turn it on again: fix the cause, then either restart the query logging by using "SET GLOBAL SLOW_QUERY_LOG=ON" or restart the MySQL server.

For InnoDB:

[ERROR] [MY-012144] [InnoDB] posix_fallocate(): Failed to preallocate data for file ./#innodb_temp/temp_8.ibt, desired size 16384 bytes. Operating system error number 28. Check that the disk is not full or a disk quota exceeded. Make sure the file system supports this function. Some operating system error numbers are described at http://dev.mysql.com/doc/refman/8.0/en/operating-system-error-codes.html
[Warning] [MY-012638] [InnoDB] Retry attempts for writing partial data failed.
[ERROR] [MY-012639] [InnoDB] Write to file ./#innodb_temp/temp_8.ibt failed at offset 81920, 16384 bytes should have been written, only 0 were written. Operating system error number 28. Check that your OS and file system support files of this size. Check also that the disk is not full or a disk quota exceeded.
[ERROR] [MY-012640] [InnoDB] Error number 28 means 'No space left on device'
[Warning] [MY-012145] [InnoDB] Error while writing 16384 zeroes to ./#

They all report the same error code number, which is 28. Alternatively, we can use the error code to see the actual error with the perror command:

$ perror 28
OS error code  28: No space left on device

The above simply means the MySQL server is out of disk space, and most of the time MySQL is stopped or stalled at this point. In this blog post, we are going to look into ways to solve this issue for MySQL running in a Linux-based environment.

Troubleshooting

First of all, we have to determine which disk partition is full. MySQL can be configured to store data on a different disk or partition. Look at the path stated in the error to start with. In this example, our directory is located in the default location, /var/lib/mysql, which is under the / partition. We can use the df command and specify the full path to the datadir to get the partition the data is stored on:

$ df -h /var/lib/mysql
Filesystem      Size Used Avail Use% Mounted on
/dev/sda1        40G 40G 20K 100% /

The above means we have to clear up some space in the root partition.
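
To see what is actually consuming the space inside the data directory (or any other suspicious path), a quick du listing of the largest entries can help; the path below is simply the default datadir location:

$ du -sh /var/lib/mysql/* | sort -hr | head -10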

Temporary Workarounds

The temporary workaround is to clear up some disk space so MySQL can write to the disk and resume the operation. Things we can do if we face this kind of problem are:

  • Remove unnecessary files
  • Purge binary logs
  • Drop old tables, or rebuild a very big table

Remove Unnecessary Files

This is commonly the first step to take if the MySQL server is down or unresponsive, or if you have no binary logs enabled. For example, files under /var/log/ are commonly the first place to look for unnecessary files:

$ cd /var/log
$ find . -type f -size +5M -exec du -sh {} +
8.1M ./audit/audit.log.6
8.1M ./audit/audit.log.5
8.1M ./audit/audit.log.4
8.1M ./audit/audit.log.3
8.1M ./audit/audit.log.2
8.1M ./audit/audit.log.1
11M ./audit/audit.log
8.5M ./secure-20190429
8.0M ./wtmp

The above example shows how to retrieve files bigger than 5MB. We can safely remove the rotated log files, which are usually in {filename}.{number} format, for example audit.log.1 to audit.log.6. The same goes for any huge old backups stored on the server. If you have performed a restoration via Percona XtraBackup or MariaDB Backup, all files prefixed with xtrabackup_ can be removed from the MySQL datadir, as they are no longer necessary for the restoration. The xtrabackup_logfile is usually the biggest file since it contains all transactions executed while the xtrabackup process was copying the datadir to the destination. The following example shows all the related files in the MySQL datadir:

$ ls -lah /var/lib/mysql | grep xtrabackup_
-rw-r-----.  1 mysql root   286 Feb 4 11:30 xtrabackup_binlog_info
-rw-r--r--.  1 mysql root    24 Feb 4 11:31 xtrabackup_binlog_pos_innodb
-rw-r-----.  1 mysql root    83 Feb 4 11:31 xtrabackup_checkpoints
-rw-r-----.  1 mysql root   808 Feb 4 11:30 xtrabackup_info
-rw-r-----.  1 mysql root  179M Feb 4 11:31 xtrabackup_logfile
-rw-r--r--.  1 mysql root     1 Feb 4 11:31 xtrabackup_master_key_id
-rw-r-----.  1 mysql root   248 Feb 4 11:31 xtrabackup_tablespaces

Therefore, the mentioned files are safe to delete. Start the MySQL service once at least 10% of free space is available.

Purge the Binary Logs

If the MySQL server is still responsive and has the binary log enabled, e.g. for replication or point-in-time recovery, we can purge old binary log files by using the PURGE statement and providing the interval. In this example, we are deleting all binary logs from before 3 days ago:

mysql> SHOW BINARY LOGS;
mysql> PURGE BINARY LOGS BEFORE DATE(NOW() - INTERVAL 3 DAY);
mysql> SHOW BINARY LOGS;

For MySQL replication, it's safe to delete all logs that have already been replicated and applied on the slaves. Check the Relay_Master_Log_File value on the slave:

mysql> SHOW SLAVE STATUS\G
...
        Relay_Master_Log_File: binlog.000008
...

Then delete the older log files, for example binlog.000007 and older. It's good practice to restart the MySQL server to make sure that it has enough resources. We can also let the binary log rotation happen automatically via the expire_logs_days variable (< MySQL 8.0). For example, to keep only 3 days of binary logs, run the following statement:

mysql> SET GLOBAL expire_logs_days = 3;

Then, add the following line into MySQL configuration file under [mysqld] section:

expire_logs_days=3

In MySQL 8.0, use binlog_expire_logs_seconds instead, where the default value is 2592000 seconds (30 days). In this example, we reduce it to only 3 days (60 seconds x 60 minutes x 24 hours x 3 days):

mysql> SET GLOBAL binlog_expire_logs_seconds = (60*60*24*3);
mysql> SET PERSIST binlog_expire_logs_seconds = (60*60*24*3);

SET PERSIST will make sure the configuration is loaded on the next restart. Configuration set by this command is stored inside /var/lib/mysql/mysqld-auto.cnf.

Drop Old Tables / Rebuild Tables

Note that a DELETE operation won't free up disk space unless OPTIMIZE TABLE is executed afterward. Thus, if you have deleted many rows and would like to return the free space back to the OS after a huge DELETE operation, run OPTIMIZE TABLE on the table, or rebuild it. For example:

mysql> DELETE FROM tbl_name WHERE id < 100000; -- remove 100K rows
mysql> OPTIMIZE TABLE tbl_name;

We can also force to rebuild a table by using ALTER statement:

mysql> ALTER TABLE tbl_name FORCE;
mysql> ALTER TABLE tbl_name; -- a.k.a "null" rebuild

Note that the above DDL operation is performed via online DDL, meaning MySQL permits concurrent DML operations while the rebuilding is ongoing. Another way to perform a defragmentation operation is to use mysqldump to dump the table to a text file, drop the table, and reload it from the dump file; a minimal sketch of this approach is shown below. Ultimately, we can also use DROP TABLE to remove an unused table or TRUNCATE TABLE to clear all rows in the table, which consequently returns the space back to the OS.
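
The dump-and-reload approach could look like this, assuming the table lives in a database called mydb, a maintenance window is acceptable, and the dump target has enough free space (the database, table and file names are illustrative):

$ mysqldump mydb tbl_name > /tmp/tbl_name.sql      # dump the table definition and data
$ mysql -e "DROP TABLE mydb.tbl_name"              # drop the fragmented table
$ mysql mydb < /tmp/tbl_name.sql                   # recreate and reload it

Unlike the online ALTER shown above, the table is unavailable for the duration of the reload.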

Permanent Solutions to Disk Space Issues

The permanent solution is of course adding more space to the corresponding disk or partition, or applying a shorter retention rule so unnecessary files are not kept on the server. If you are running on top of a scalable file storage system, you should be able to scale the resource up without too much hassle, or with minimal disruption and downtime to the MySQL service. To learn more about how to dimension your storage and understand MySQL and MariaDB capacity planning, check out this blog post.

You can worry less with ClusterControl proactive monitoring, where you get a warning notification when disk usage has reached 80%, and a critical notification when it is 90% or higher.

by ashraf at February 06, 2020 08:19 PM

February 05, 2020

SeveralNines

Is My Database Vulnerable to Attack? A Security Checklist

Data is probably the most important asset in a company, so you should make sure your database is secured to avoid any possible data theft. It’s hard to create an environment that is 100% secure, but in this blog we’ll share a checklist to help you make your database as secure as possible.

Controlling Database Access

You should always restrict both physical and remote access.

  • Physical access (on-prem): Restrict unauthorized physical access to the database server.
  • Remote access: Limit remote access to only the necessary people, and from the fewest possible sources. Using a VPN to access it is definitely a must here.

Managing Database User Accounts

Depending on the technology, there are many ways to improve security for your user accounts; a short SQL sketch of these points follows the list below.

  • Remove inactive users.
  • Grant only the necessary privileges.
  • Restrict the source for each user connection.
  • Define a secure password policy (or, depending on the technology, enable a plugin for this if there is one).
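
As a minimal sketch of those points for MySQL or MariaDB, user management could look like the statements below; the user names, network ranges and privileges are purely illustrative:

mysql> DROP USER IF EXISTS 'old_app'@'%';        -- remove an inactive or overly permissive account
mysql> CREATE USER 'app'@'10.0.0.%' IDENTIFIED BY 'S3cure_Pass!';
mysql> GRANT SELECT, INSERT, UPDATE, DELETE ON appdb.* TO 'app'@'10.0.0.%';  -- only the necessary privileges, from a restricted source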

Secure Installations and Configurations

There are several changes you can make to secure your database installation; a sample configuration fragment follows the list below.

  • Install only the necessary packages and services on the server.
  • Change the default admin user password and restrict its usage to localhost only.
  • Change the default port and specify the interface to listen on.
  • Enable a password security policy plugin.
  • Configure SSL certificates to encrypt data in-transit.
  • Encrypt data at-rest (if it’s possible).
  • Configure the local firewall to allow access to the database port only from the local network (if it’s possible).
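
As an illustration of some of these points, a my.cnf fragment along these lines restricts the listening interface, moves the port away from the default and enables TLS; the address, port and certificate paths are only examples:

[mysqld]
bind-address = 10.0.0.10                         # listen only on the internal interface
port         = 3456                              # non-default port
ssl-ca       = /etc/mysql/certs/ca.pem
ssl-cert     = /etc/mysql/certs/server-cert.pem
ssl-key      = /etc/mysql/certs/server-key.pem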

Employ a WAF to Avoid SQL Injections or DoS attack (Denial of Service)

These are the most common attacks on a database, and the most secure way to avoid them is by using a WAF (Web Application Firewall) to catch these kinds of SQL queries, or an SQL proxy to analyze the traffic.

Keep Your OS and Database Up-to-Date

There are several fixes and improvements that the database vendor or the operating system releases in order to fix or avoid vulnerabilities. It's important to keep your system as up-to-date as possible by applying patches and security upgrades.

Check CVE (Common Vulnerabilities and Exposures) Frequently

Every day, new vulnerabilities are detected in database servers. You should check frequently to know whether you need to apply a patch or change something in your configuration. One way is to review the CVE website, where you can find a list of vulnerabilities with descriptions, and look up your database version and vendor to confirm whether there is something critical to fix ASAP.

Conclusion

Following the tips above, your server will be safer, but unfortunately, there is always a risk of being hacked.

To minimize this risk, you should have a good monitoring system like ClusterControl, and periodically run a security scanning tool such as Nessus to look for vulnerabilities.

by Sebastian Insausti at February 05, 2020 07:40 PM

Percona

Observability Differences Between MySQL 8 and MariaDB 10.4

I did a MariaDB Observability talk at MariaDB Day in Brussels, which I roughly based on the MySQL 8 Observability talk I gave earlier in the year. This process pushed me to contrast MySQL and MariaDB observability.

In summary, there are a lot of differences that have accumulated through the years; a lot more than I expected.  Here are some highlights.

SHOW STATUS and SHOW VARIABLES

If you want to access SHOW [GLOBAL] STATUS output through tables, they have been moved to performance_schema in MySQL 8 but they are in  information_schema in MariaDB 10.4, meaning you need to use different queries.

mysql> select * from performance_schema.global_status where variable_name='questions';
+---------------+----------------+
| VARIABLE_NAME | VARIABLE_VALUE |
+---------------+----------------+
| Questions     | 401146958      |
+---------------+----------------+
1 row in set (0.00 sec)

MariaDB [(none)]> select * from information_schema.global_status where variable_name='questions';
+---------------+----------------+
| VARIABLE_NAME | VARIABLE_VALUE |
+---------------+----------------+
| QUESTIONS     | 21263834       |
+---------------+----------------+
1 row in set (0.002 sec)

The other difference you may notice is how VARIABLE_NAME is capitalized. It is all capitals for MariaDB and leading capital in MySQL, which can be a problem if you store data in a case-sensitive datastore.

The same applies to SHOW VARIABLES tables which are exposed as information_schema.global_variables in MariaDB 10.4 and performance_schema.global_variables in MySQL 8.

MariaDB 10.4 also exposes more variables in SHOW STATUS (542), while in the current version of MySQL 8 it is fewer than 500.
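
If you want to verify those counts on your own servers, a simple count over the respective status tables (note the different schemas, as discussed above) is enough:

mysql> SELECT COUNT(*) FROM performance_schema.global_status;

MariaDB [(none)]> SELECT COUNT(*) FROM information_schema.global_status;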

INFORMATION_SCHEMA

Besides the location of the named tables, there are a lot of other differences in INFORMATION_SCHEMA.  For example, MariaDB 10.4 has INNODB_MUTEXES to expose “SHOW ENGINE INNODB MUTEX” in a table format which is easier to extract and report rather than parsing strings.  MySQL 8 does not have an INFORMATION_SCHEMA.INNODB_MUTEXES table.

MariaDB [information_schema]> select * from innodb_mutexes;
+------+-------------+-------------+----------+
| NAME | CREATE_FILE | CREATE_LINE | OS_WAITS |
+------+-------------+-------------+----------+
|      | log0log.cc  |         578 |        1 |
|      | btr0sea.cc  |         243 |      232 |
+------+-------------+-------------+----------+
2 rows in set (0.008 sec)

Other examples of tables that MariaDB 10.4 provides are INNODB_SYS_SEMAPHORE_WAITS, showing current InnoDB semaphore waits, and USER_VARIABLES, showing currently set user variables:

MariaDB [information_schema]> select * from user_variables;
+---------------+----------------+---------------+--------------------+
| VARIABLE_NAME | VARIABLE_VALUE | VARIABLE_TYPE | CHARACTER_SET_NAME |
+---------------+----------------+---------------+--------------------+
| a             | 2              | INT           | utf8               |
+---------------+----------------+---------------+--------------------+
1 row in set (0.001 sec)

MySQL 8 does not have this particular table but provides similar functionality via the USER_VARIABLES_BY_THREAD table in PERFORMANCE_SCHEMA.

mysql> select *  from performance_schema.user_variables_by_thread;
+-----------+---------------+----------------+
| THREAD_ID | VARIABLE_NAME | VARIABLE_VALUE |
+-----------+---------------+----------------+
|    202312 | a             | 2              |
+-----------+---------------+----------------+
1 row in set (0.00 sec)

Note that quite different information is provided in those tables!

There is also a lot of difference in what is available from the MariaDB 10.4 processlist table. Most significantly, you can discover how many rows were accessed (EXAMINED_ROWS) as well as the memory used by the query:

MariaDB [performance_schema]> select * from information_schema.processlist \G
*************************** 1. row ***************************
             ID: 118
           USER: root
           HOST: localhost
             DB: performance_schema
        COMMAND: Query
           TIME: 0
          STATE: Filling schema table
           INFO: select * from information_schema.processlist
        TIME_MS: 0.696
          STAGE: 0
      MAX_STAGE: 0
       PROGRESS: 0.000
    MEMORY_USED: 106592
MAX_MEMORY_USED: 2267712
  EXAMINED_ROWS: 0
       QUERY_ID: 21264066
    INFO_BINARY: select * from information_schema.processlist
            TID: 9977

Compare this to MySQL 8:

mysql> select * from information_schema.processlist \G
*************************** 1. row ***************************
           ID: 202266
         USER: root
         HOST: localhost
           DB: performance_schema
      COMMAND: Query
         TIME: 0
        STATE: executing
         INFO: select * from information_schema.processlist

I like how MariaDB adds a couple of practical fields here which are available simply and efficiently. MySQL provides a much more extended sys.processlist table as part of the sys schema (driven by data from Performance Schema), but it is a lot more difficult to query.

mysql> select * from sys.processlist \G
*************************** 13. row ***************************
                thd_id: 202312
               conn_id: 202266
                  user: root@localhost
                    db: performance_schema
               command: Query
                 state: NULL
                  time: 0
     current_statement: select * from sys.processlist
     statement_latency: 83.48 ms
              progress: NULL
          lock_latency: 789.00 us
         rows_examined: 0
             rows_sent: 0
         rows_affected: 0
            tmp_tables: 4
       tmp_disk_tables: 0
             full_scan: YES
        last_statement: NULL
last_statement_latency: NULL
        current_memory: 1.38 MiB
             last_wait: NULL
     last_wait_latency: NULL
                source: NULL
           trx_latency: 82.71 ms
             trx_state: ACTIVE
        trx_autocommit: YES
                   pid: 24746
          program_name: mysql

There are many more differences than outlined above, so take it as an example of what amount of information available through INFORMATION_SCHEMA is substantially different in MySQL 8 and MariaDB 10.4, not as a complete list.

PERFORMANCE_SCHEMA

MySQL 8 is focused on observability through Performance Schema which is where all the new information is being exposed in a consistent manner.  MariaDB 10.4 does not place as high a value on Performance Schema.

Also, MySQL 8 has Performance Schema enabled by default while MariaDB 10.4 has it disabled. MariaDB is also missing a lot of the instrumentation added in later MySQL series, and its Performance Schema looks similar to the one in MySQL 5.6.
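
If you do want it on MariaDB 10.4, note that performance_schema is a startup-only variable in both servers, so it has to be set in the configuration file and the server restarted; a minimal example:

[mysqld]
performance_schema = ON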

Performance Schema Tables in MySQL 8

mysql> show tables;
+------------------------------------------------------+
| Tables_in_performance_schema                         |
+------------------------------------------------------+
| accounts                                             |
| cond_instances                                       |
| data_lock_waits                                      |
| data_locks                                           |
| events_errors_summary_by_account_by_error            |
| events_errors_summary_by_host_by_error               |
| events_errors_summary_by_thread_by_error             |
| events_errors_summary_by_user_by_error               |
| events_errors_summary_global_by_error                |
| events_stages_current                                |
| events_stages_history                                |
| events_stages_history_long                           |
| events_stages_summary_by_account_by_event_name       |
| events_stages_summary_by_host_by_event_name          |
| events_stages_summary_by_thread_by_event_name        |
| events_stages_summary_by_user_by_event_name          |
| events_stages_summary_global_by_event_name           |
| events_statements_current                            |
| events_statements_histogram_by_digest                |
| events_statements_histogram_global                   |
| events_statements_history                            |
| events_statements_history_long                       |
| events_statements_summary_by_account_by_event_name   |
| events_statements_summary_by_digest                  |
| events_statements_summary_by_host_by_event_name      |
| events_statements_summary_by_program                 |
| events_statements_summary_by_thread_by_event_name    |
| events_statements_summary_by_user_by_event_name      |
| events_statements_summary_global_by_event_name       |
| events_transactions_current                          |
| events_transactions_history                          |
| events_transactions_history_long                     |
| events_transactions_summary_by_account_by_event_name |
| events_transactions_summary_by_host_by_event_name    |
| events_transactions_summary_by_thread_by_event_name  |
| events_transactions_summary_by_user_by_event_name    |
| events_transactions_summary_global_by_event_name     |
| events_waits_current                                 |
| events_waits_history                                 |
| events_waits_history_long                            |
| events_waits_summary_by_account_by_event_name        |
| events_waits_summary_by_host_by_event_name           |
| events_waits_summary_by_instance                     |
| events_waits_summary_by_thread_by_event_name         |
| events_waits_summary_by_user_by_event_name           |
| events_waits_summary_global_by_event_name            |
| file_instances                                       |
| file_summary_by_event_name                           |
| file_summary_by_instance                             |
| global_status                                        |
| global_variables                                     |
| host_cache                                           |
| hosts                                                |
| keyring_keys                                         |
| log_status                                           |
| memory_summary_by_account_by_event_name              |
| memory_summary_by_host_by_event_name                 |
| memory_summary_by_thread_by_event_name               |
| memory_summary_by_user_by_event_name                 |
| memory_summary_global_by_event_name                  |
| metadata_locks                                       |
| mutex_instances                                      |
| objects_summary_global_by_type                       |
| performance_timers                                   |
| persisted_variables                                  |
| prepared_statements_instances                        |
| replication_applier_configuration                    |
| replication_applier_filters                          |
| replication_applier_global_filters                   |
| replication_applier_status                           |
| replication_applier_status_by_coordinator            |
| replication_applier_status_by_worker                 |
| replication_connection_configuration                 |
| replication_connection_status                        |
| replication_group_member_stats                       |
| replication_group_members                            |
| rwlock_instances                                     |
| session_account_connect_attrs                        |
| session_connect_attrs                                |
| session_status                                       |
| session_variables                                    |
| setup_actors                                         |
| setup_consumers                                      |
| setup_instruments                                    |
| setup_objects                                        |
| setup_threads                                        |
| socket_instances                                     |
| socket_summary_by_event_name                         |
| socket_summary_by_instance                           |
| status_by_account                                    |
| status_by_host                                       |
| status_by_thread                                     |
| status_by_user                                       |
| table_handles                                        |
| table_io_waits_summary_by_index_usage                |
| table_io_waits_summary_by_table                      |
| table_lock_waits_summary_by_table                    |
| threads                                              |
| user_defined_functions                               |
| user_variables_by_thread                             |
| users                                                |
| variables_by_thread                                  |
| variables_info                                       |
+------------------------------------------------------+
103 rows in set (0.01 sec)

Performance Schema Tables in MariaDB 10.4

MariaDB [performance_schema]> show tables;
+----------------------------------------------------+
| Tables_in_performance_schema                       |
+----------------------------------------------------+
| accounts                                           |
| cond_instances                                     |
| events_stages_current                              |
| events_stages_history                              |
| events_stages_history_long                         |
| events_stages_summary_by_account_by_event_name     |
| events_stages_summary_by_host_by_event_name        |
| events_stages_summary_by_thread_by_event_name      |
| events_stages_summary_by_user_by_event_name        |
| events_stages_summary_global_by_event_name         |
| events_statements_current                          |
| events_statements_history                          |
| events_statements_history_long                     |
| events_statements_summary_by_account_by_event_name |
| events_statements_summary_by_digest                |
| events_statements_summary_by_host_by_event_name    |
| events_statements_summary_by_thread_by_event_name  |
| events_statements_summary_by_user_by_event_name    |
| events_statements_summary_global_by_event_name     |
| events_waits_current                               |
| events_waits_history                               |
| events_waits_history_long                          |
| events_waits_summary_by_account_by_event_name      |
| events_waits_summary_by_host_by_event_name         |
| events_waits_summary_by_instance                   |
| events_waits_summary_by_thread_by_event_name       |
| events_waits_summary_by_user_by_event_name         |
| events_waits_summary_global_by_event_name          |
| file_instances                                     |
| file_summary_by_event_name                         |
| file_summary_by_instance                           |
| host_cache                                         |
| hosts                                              |
| mutex_instances                                    |
| objects_summary_global_by_type                     |
| performance_timers                                 |
| rwlock_instances                                   |
| session_account_connect_attrs                      |
| session_connect_attrs                              |
| setup_actors                                       |
| setup_consumers                                    |
| setup_instruments                                  |
| setup_objects                                      |
| setup_timers                                       |
| socket_instances                                   |
| socket_summary_by_event_name                       |
| socket_summary_by_instance                         |
| table_io_waits_summary_by_index_usage              |
| table_io_waits_summary_by_table                    |
| table_lock_waits_summary_by_table                  |
| threads                                            |
| users                                              |
+----------------------------------------------------+
52 rows in set (0.000 sec)

MariaDB also lacks the “sys schema” shipped with the server, which means it does not provide a built-in, human-friendly interface to Performance Schema data. In the end, for me, it all points to Performance Schema not being a priority for MariaDB.
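If you need sys-schema-like insight on MariaDB, a workable (if less convenient) approach is to query Performance Schema directly. A minimal sketch, assuming performance_schema is enabled in the MariaDB configuration:

-- Timers are in picoseconds, hence the division to get seconds
SELECT digest_text,
       count_star AS calls,
       ROUND(sum_timer_wait/1000000000000, 3) AS total_latency_sec
FROM performance_schema.events_statements_summary_by_digest
ORDER BY sum_timer_wait DESC
LIMIT 10;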

SLOW QUERY LOG

Both MySQL 8 and MariaDB 10.4 support the basic Slow Query Log. When it comes to additional options, though, there is quite a divergence. MariaDB supports quite a few extended slow query logging options from Percona Server for MySQL, both for enriching the data logged and for filtering; it also supports logging the query EXPLAIN plan. On the other hand, MySQL 8 can log additional information:

MariaDB 10.4 Slow Query Log (with Explain)

# Time: 200201 22:32:37
# User@Host: root[root] @ localhost []
# Thread_id: 113  Schema: sbtest  QC_hit: No
# Query_time: 0.000220  Lock_time: 0.000091  Rows_sent: 1  Rows_examined: 1
# Rows_affected: 0  Bytes_sent: 190
#
# explain: id   select_type     table   type    possible_keys   key     key_len ref     rows    r_rows  filtered        r_filtered      Extra
# explain: 1    SIMPLE  sbtest1 const   PRIMARY PRIMARY 4       const   1       NULL    100.00  NULL
#
SET timestamp=1580596357;
SELECT c FROM sbtest1 WHERE id=101985;

MySQL 8 Slow Query Log with Extended Metrics

# Time: 2019-06-14T14:14:22.980797Z
# User@Host: root[root] @ localhost []  Id:     8
# Query_time: 0.005342  Lock_time: 0.000451 Rows_sent: 33  Rows_examined: 197 Thread_id: 8 Errno: 0 Killed: 0 Bytes_received: 0 Bytes_sent: 664 Read_first: 1 Read_last: 0 Read_key: 71 Read_next: 127 Read_prev: 0 Read_rnd: 33 Read_rnd_next: 34 Sort_merge_passes: 0 Sort_range_count: 0 Sort_rows: 33 Sort_scan_count: 1 Created_tmp_disk_tables: 0 Created_tmp_tables: 1 Start: 2019-06-14T14:14:22.975455Z End: 2019-06-14T14:14:22.980797Z
SET timestamp=1560521662;
show tables;
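For reference, a minimal sketch of how this kind of extended logging can be switched on at runtime; the MariaDB side assumes the log_slow_verbosity variable (with the explain option), and the MySQL side assumes version 8.0.14 or later for log_slow_extra:

-- MariaDB 10.4
SET GLOBAL slow_query_log = 1;
SET GLOBAL long_query_time = 0.1;
SET GLOBAL log_slow_verbosity = 'query_plan,explain';

-- MySQL 8 (log_slow_extra exists from 8.0.14 onwards)
SET GLOBAL slow_query_log = 'ON';
SET GLOBAL long_query_time = 0.1;
SET GLOBAL log_slow_extra = 'ON';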

EXPLAIN

Both MySQL and MariaDB support the classic “Table” EXPLAIN output, although even here there may be format differences. This is to be expected: the optimizers in MySQL and MariaDB have different features and optimizations, so their EXPLAIN outputs differ as well:

MySQL 8.0 EXPLAIN

mysql> explain select count(*) from sbtest1 s1,sbtest1 s2 \G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: s1
   partitions: NULL
         type: index
possible_keys: NULL
          key: k_1
      key_len: 4
          ref: NULL
         rows: 987292
     filtered: 100.00
        Extra: Using index
*************************** 2. row ***************************
           id: 1
  select_type: SIMPLE
        table: s2
   partitions: NULL
         type: index
possible_keys: NULL
          key: k_1
      key_len: 4
          ref: NULL
         rows: 987292
     filtered: 100.00
        Extra: Using index
2 rows in set, 1 warning (0.00 sec)

 

MariaDB 10.4  EXPLAIN

MariaDB [sbtest]> explain select count(*) from sbtest1 s1,sbtest1 s2 \G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: s1
         type: index
possible_keys: NULL
          key: k_1
      key_len: 4
          ref: NULL
         rows: 986499
        Extra: Using index
*************************** 2. row ***************************
           id: 1
  select_type: SIMPLE
        table: s2
         type: index
possible_keys: NULL
          key: k_1
      key_len: 4
          ref: NULL
         rows: 986499
        Extra: Using index; Using join buffer (flat, BNL join)
2 rows in set (0.001 sec)

Where things get more interesting, though, is the advanced EXPLAIN features. If you want to explain a running query you need to use SHOW EXPLAIN FOR <thread_id> in MariaDB but EXPLAIN FOR CONNECTION <connection_id> in MySQL.

EXPLAIN FORMAT=JSON works both with MariaDB 10.4 and MySQL 8 but the output is so different you would surely need to handle it separately.

EXPLAIN FORMAT=TREE is only supported in MySQL 8.  It is a very new feature so it may appear in MariaDB sometime in the future. TREE format strives to provide an easier-to-read output, especially for users not familiar with MySQL query execution details or terminology.  For example, for this query it gives this output:

mysql> explain FORMAT=TREE select count(*) from sbtest1 s1,sbtest1 s2 \G
*************************** 1. row ***************************
EXPLAIN: -> Count rows in s1

1 row in set (0.00 sec)

This leaves a lot of questions unanswered but is very human-readable.

Finally, both MySQL and MariaDB allow you to analyze (profile) a query to see how it is really executed. Both the syntax and the output of this feature differ significantly between MySQL 8 and MariaDB 10.4.

MySQL 8.0  EXPLAIN ANALYZE

mysql> explain analyze  select count(*) from sbtest1 where k>2 \G
*************************** 1. row ***************************
EXPLAIN: -> Aggregate: count(0)  (actual time=506.084..506.085 rows=1 loops=1)
    -> Filter: (sbtest1.k > 2)  (cost=99211.38 rows=493646) (actual time=0.037..431.186 rows=999997 loops=1)
        -> Index range scan on sbtest1 using k_1  (cost=99211.38 rows=493646) (actual time=0.035..312.929 rows=999997 loops=1)

1 row in set (0.51 sec)

MariaDB 10.4  ANALYZE

MariaDB [sbtest]> analyze select count(*) from sbtest1 where k>2 \G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: sbtest1
         type: range
possible_keys: k_1
          key: k_1
      key_len: 4
          ref: NULL
         rows: 493249
       r_rows: 999997.00
     filtered: 100.00
   r_filtered: 100.00
        Extra: Using where; Using index
1 row in set (0.365 sec)

Summary

I’ve been saying for a while now that “MariaDB is not MySQL” and you need to treat MySQL and MariaDB as separate databases.  It is even more important when you’re looking at observability functionality, as this space is where MySQL and MariaDB are unconstrained by SQL standards and can innovate as they like, which they really have been doing a lot of and diverging rapidly as a result.

by Peter Zaitsev at February 05, 2020 04:56 PM

February 04, 2020

SeveralNines

What to Check if the MySQL I/O Utilisation is High

The I/O performance is vital for MySQL databases. Data is read from and written to disk in numerous places: redo logs, tablespaces, binary and relay logs. With the rise of solid state drives, I/O performance has increased significantly, allowing users to push their databases even faster, but even then I/O may become a bottleneck and a limiting factor for the performance of the whole database. In this blog post we will take a look at the things you want to check if you notice that I/O utilisation is high on your MySQL instance.

What does “High” I/O utilisation mean? In short, if the performance of your database is affected by it, it is high. Typically you would notice it as writes slowing down in the database. It will also clearly manifest as high I/O wait on your system. Please keep in mind, though, that on hosts with 32 or more CPU cores, even if one core shows 100% I/O wait, you may not notice it in an aggregated view - it will represent only 1/32 of the whole load. That seems insignificant, but in fact some single-threaded I/O operation is saturating one CPU, and some application is waiting for that I/O activity to finish.

Let’s say we did notice an increase in the I/O activity, just as in the screenshot above. What to look at if you noticed high I/O activity? First, check the list of the processes in the system. Which one is responsible for an I/O wait? You can use iotop to check that:
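As a rough sketch, an iotop invocation like the one below (only processes actually doing I/O, per-process view, accumulated totals) is a reasonable starting point; adjust the flags to taste:

$ sudo iotop -o -P -a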

In our case it is quite clear that it is MySQL which is responsible for most of it. We should start with the simplest check - what exactly is running in the MySQL right now?
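The process list is the quickest way to see that; for example (the second form simply filters out idle threads):

SHOW FULL PROCESSLIST;

-- or, filtering out idle threads:
SELECT id, user, host, db, command, time, state, LEFT(info, 80) AS query
FROM information_schema.processlist
WHERE command != 'Sleep'
ORDER BY time DESC;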

We can see there is replication activity on our slave. What is happening to the master?

We can clearly see some batch load job is running. This sort of ends our journey here as we managed to pinpoint the problem quite easily.

There are other cases, though, which may not be that easy to understand and track. MySQL comes with some instrumentation, which is intended to help with understanding the I/O activity in the system. As we mentioned, I/O can be generated in numerous places in the system. Writes are the most clear ones but we may also have on-disk temporary tables - it’s good to see if your queries do use such tables or not.

If you have performance_schema enabled, one way to check which files are responsible for the I/O load is to query ‘file_summary_by_instance’:
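A query sketch for that (the column names are taken from the output below; ordering by total wait time is one reasonable interpretation of “responsible for the I/O load”):

SELECT file_name, event_name,
       count_read, sum_number_of_bytes_read,
       count_write, sum_number_of_bytes_write
FROM performance_schema.file_summary_by_instance
ORDER BY sum_timer_wait DESC
LIMIT 20\G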

*************************** 13. row ***************************
                FILE_NAME: /tmp/MYfd=68
               EVENT_NAME: wait/io/file/sql/io_cache
    OBJECT_INSTANCE_BEGIN: 140332382801216
               COUNT_STAR: 17208
           SUM_TIMER_WAIT: 23332563327000
           MIN_TIMER_WAIT: 1596000
           AVG_TIMER_WAIT: 1355913500
           MAX_TIMER_WAIT: 389600380500
               COUNT_READ: 10888
           SUM_TIMER_READ: 20108066180000
           MIN_TIMER_READ: 2798750
           AVG_TIMER_READ: 1846809750
           MAX_TIMER_READ: 389600380500
 SUM_NUMBER_OF_BYTES_READ: 377372793
              COUNT_WRITE: 6318
          SUM_TIMER_WRITE: 3224434875000
          MIN_TIMER_WRITE: 16699500
          AVG_TIMER_WRITE: 510356750
          MAX_TIMER_WRITE: 223219960500
SUM_NUMBER_OF_BYTES_WRITE: 414000000
               COUNT_MISC: 2
           SUM_TIMER_MISC: 62272000
           MIN_TIMER_MISC: 1596000
           AVG_TIMER_MISC: 31136000
           MAX_TIMER_MISC: 60676000
*************************** 14. row ***************************
                FILE_NAME: /tmp/Innodb Merge Temp File
               EVENT_NAME: wait/io/file/innodb/innodb_temp_file
    OBJECT_INSTANCE_BEGIN: 140332382780800
               COUNT_STAR: 1128
           SUM_TIMER_WAIT: 16465339114500
           MIN_TIMER_WAIT: 8490250
           AVG_TIMER_WAIT: 14596931750
           MAX_TIMER_WAIT: 583930037500
               COUNT_READ: 540
           SUM_TIMER_READ: 15103082275500
           MIN_TIMER_READ: 111663250
           AVG_TIMER_READ: 27968670750
           MAX_TIMER_READ: 583930037500
 SUM_NUMBER_OF_BYTES_READ: 566231040
              COUNT_WRITE: 540
          SUM_TIMER_WRITE: 1234847420750
          MIN_TIMER_WRITE: 286167500
          AVG_TIMER_WRITE: 2286754250
          MAX_TIMER_WRITE: 223758795000
SUM_NUMBER_OF_BYTES_WRITE: 566231040
               COUNT_MISC: 48
           SUM_TIMER_MISC: 127409418250
           MIN_TIMER_MISC: 8490250
           AVG_TIMER_MISC: 2654362750
           MAX_TIMER_MISC: 43409881500

As you can see above, it also shows temporary tables that are in use.

To double-check whether a particular query uses a temporary table you can use EXPLAIN FOR CONNECTION:

mysql> EXPLAIN FOR CONNECTION 3111\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: sbtest1
   partitions: NULL
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 986400
     filtered: 100.00
        Extra: Using temporary; Using filesort
1 row in set (0.16 sec)

In the example above a temporary table is used for the filesort.

Another way of tracking disk activity, if you happen to use Percona Server for MySQL, is to enable full slow log verbosity:

mysql> SET GLOBAL log_slow_verbosity='full';
Query OK, 0 rows affected (0.00 sec)

Then, in the slow log, you may see entries like this:

# Time: 2020-01-31T12:05:29.190549Z
# User@Host: root[root] @ localhost []  Id: 12395
# Schema:   Last_errno: 0  Killed: 0
# Query_time: 43.260389  Lock_time: 0.031185 Rows_sent: 1000000  Rows_examined: 2000000 Rows_affected: 0
# Bytes_sent: 197889110  Tmp_tables: 0 Tmp_disk_tables: 0  Tmp_table_sizes: 0
# InnoDB_trx_id: 0
# Full_scan: Yes  Full_join: No Tmp_table: No  Tmp_table_on_disk: No
# Filesort: Yes  Filesort_on_disk: Yes  Merge_passes: 141
#   InnoDB_IO_r_ops: 9476  InnoDB_IO_r_bytes: 155254784  InnoDB_IO_r_wait: 5.304944
#   InnoDB_rec_lock_wait: 0.000000  InnoDB_queue_wait: 0.000000
#   InnoDB_pages_distinct: 8191
SET timestamp=1580472285;
SELECT * FROM sbtest.sbtest1 ORDER BY RAND();

As you can see, you can tell if there was a temporary table on disk or if the data was sorted on disk. You can also check the number of I/O operations and amount of data accessed.
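If you are not on Percona Server, the server-wide status counters give a coarser but still useful picture of on-disk temporary tables and sorting activity; for example:

SHOW GLOBAL STATUS LIKE 'Created_tmp%';   -- compare Created_tmp_disk_tables vs Created_tmp_tables
SHOW GLOBAL STATUS LIKE 'Sort_merge_passes';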

We hope this blog post will help you understand the I/O activity in the system and let you manage it better.

 

by krzysztof at February 04, 2020 08:59 PM

February 03, 2020

SeveralNines

An Overview of Job Scheduling Tools for PostgreSQL

Unlike other database management systems that have their own built-in scheduler (like Oracle, MSSQL or MySQL), PostgreSQL still doesn’t have this kind of feature.

In order to provide scheduling functionality in PostgreSQL you will need to use an external tool like...

  • Linux crontab
  • Agent pgAgent
  • Extension pg_cron

In this blog we will explore these tools and highlight how to operate them and their main features.

Linux crontab

It’s the oldest of the three, but still an efficient and useful way to execute scheduled tasks. This program is based on a daemon (cron) that allows tasks to be run automatically in the background at regular intervals; the daemon periodically checks the configuration files (called crontab files) in which the scripts/commands to be executed and their schedules are defined.

Each user can have their own crontab file; on the newest Ubuntu releases they are located in /var/spool/cron/crontabs (for other Linux distributions the location could be different):

root@severalnines:/var/spool/cron/crontabs# ls -ltr
total 12
-rw------- 1 dbmaster crontab 1128 Jan 12 12:18 dbmaster
-rw------- 1 slonik   crontab 1126 Jan 12 12:22 slonik
-rw------- 1 nines    crontab 1125 Jan 12 12:23 nines

The syntax of the configuration file is the following:

mm hh dd MM day <<command or script to execute>>

mm:  Minute (0-59)
hh:  Hour (0-23)
dd:  Day of the month (1-31)
MM:  Month (1-12)
day: Day of the week (0-7; 7 or 0 == Sunday)

A few operators can be used within these fields to streamline the scheduling definition; they allow specifying multiple values in a field:

Asterisk (*) - matches all possible values for a field

Comma (,) - used to define a list of values

Dash (-) - used to define a range of values

Slash (/) - specifies a step value

The script all_db_backup.sh will be executed according to each of the following scheduling expressions:

0 6 * * * /home/backup/all_db_backup.sh

At 6 am every day

20 22 * * 1,2,3,4,5 /home/backup/all_db_backup.sh

At 10:20 PM, every weekday

0 23 * * 1-5 /home/backup/all_db_backup.sh

At 11 pm during the week

*/5 14 * * * /home/backup/all_db_backup.sh

Every five minutes between 2:00 p.m. and 2:55 p.m., every day

Although the syntax is not very difficult, it can also be generated automatically using one of the many online crontab generators.

If the crontab file doesn’t exist for a user it can be created by the following command:

slonik@severalnines:~$ crontab -e

or displayed using the -l parameter:

slonik@severalnines:~$ crontab -l

If you need to remove this file, the appropriate parameter is -r:

slonik@severalnines:~$ crontab -r

The status of the cron daemon can be checked by executing the following command:
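On systemd-based distributions this is typically something like the following (the service is named cron on Debian/Ubuntu and crond on RHEL/CentOS):

slonik@severalnines:~$ systemctl status cron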

Agent pgAgent

The pgAgent is a job scheduling agent available for PostgreSQL that allows the execution of stored procedures, SQL statements, and shell scripts. Its configuration is stored in the postgres database of the cluster.

The idea is to have this agent running as a daemon on Linux systems; it periodically connects to the database and checks whether there are any jobs to execute.

This scheduling is easily managed from pgAdmin 4, but the agent is not installed by default along with pgAdmin; it needs to be downloaded and installed separately.

Hereafter are described all the necessary steps to have the pgAgent working properly:

Step One

Installation of pgAdmin 4

$ sudo apt install pgadmin4 pgadmin4-apache

Step Two

Creation of plpgsql procedural language if not defined

CREATE TRUSTED PROCEDURAL LANGUAGE 'plpgsql'
     HANDLER plpgsql_call_handler
     VALIDATOR plpgsql_validator;

Step Three

Installation of  pgAgent

$ sudo apt-get install pgagent

Step Four

Creation of the pgagent extension

CREATE EXTENSION pgagent;

This extension will create all the tables and functions needed for pgAgent operation; the data model used by the extension is shown below:

Now the pgAdmin interface already has the option “pgAgent Jobs” in order to manage the pgAgent: 

To define a new job, right-click on “pgAgent Jobs”, select “Create”, give the job a name and define the steps to execute:

In the “Schedules” tab, the scheduling for this new job must be defined:

Finally, to have the agent running in the background it’s necessary to launch the following process manually:

/usr/bin/pgagent host=localhost dbname=postgres user=postgres port=5432 -l 1

Nevertheless, the best option is to run the previous command as a daemon, for example as a systemd service (a sketch follows below).
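A minimal sketch of such a unit file, assuming the pgagent binary lives in /usr/bin and supports the -f (run in foreground) flag; paths, user and connection options will need to be adapted to your installation:

# /etc/systemd/system/pgagent.service
[Unit]
Description=pgAgent job scheduler for PostgreSQL
After=postgresql.service

[Service]
Type=simple
User=postgres
ExecStart=/usr/bin/pgagent -f -l 1 host=localhost dbname=postgres user=postgres port=5432
Restart=on-failure

[Install]
WantedBy=multi-user.target

Then enable it with:

$ sudo systemctl daemon-reload
$ sudo systemctl enable --now pgagent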

Extension pg_cron

The pg_cron is a cron-based job scheduler for PostgreSQL that runs inside the database as an extension (similar to DBMS_SCHEDULER in Oracle) and allows the execution of database tasks directly from the database, using a background worker.

The tasks to perform can be any of the following ones:

  • stored procedures
  • SQL statements
  • PostgreSQL commands (as VACUUM, or VACUUM ANALYZE)

pg_cron can run several jobs in parallel, but only one instance of a program can be running at a time. 

If a second run should be started before the first one finishes, then it is queued and will be started as soon as the first run completes.

This extension requires PostgreSQL version 9.5 or higher.

Installation of pg_cron

The installation of this extension only requires the following command:

slonik@sveralnines:~$ sudo apt-get -y install postgresql-10-cron

Updating of Configuration Files

In order to start the pg_cron background worker once the PostgreSQL server starts, it’s necessary to add pg_cron to the shared_preload_libraries parameter in postgresql.conf:

shared_preload_libraries = 'pg_cron'

It’s also necessary to define in this file the database in which the pg_cron extension will be created, by adding the following parameter:

cron.database_name = 'postgres'

On the other hand, in the pg_hba.conf file that manages authentication, it’s necessary to define the postgres login as trust for IPv4 connections, because pg_cron requires this user to be able to connect to the database without providing a password, so the following line needs to be added to this file:

host postgres postgres 192.168.100.53/32 trust

The trust method of authentication allows anyone to connect to the database(s) specified in the pg_hba.conf file, in this case the postgres database. It’s a method often used to allow connections over a Unix domain socket on a single-user machine, and it should only be used when there is adequate operating-system-level protection on connections to the server.

Both changes require a PostgreSQL service restart:

slonik@sveralnines:~$ sudo systemctl restart postgresql.service

It’s important to take into account that pg_cron does not run any jobs as long as the server is in hot standby mode, but it automatically starts when the server is promoted.

Creation of pg_cron extension

This extension will create the meta-data and the procedures to manage it, so the following command should be executed on psql:

postgres=#CREATE EXTENSION pg_cron;

CREATE EXTENSION

Now, the needed objects to schedule jobs are already defined on the cron schema:

This extension is very simple, only the job table is enough to manage all this functionality:

Definition of New Jobs

The scheduling syntax used to define jobs in pg_cron is the same as the one used by the cron tool, and defining new jobs is very simple: it’s only necessary to call the function cron.schedule:

select cron.schedule('*/5 * * * *','CALL reporting.p_compute_client_data(12356,''DAILY_DATA'');')

select cron.schedule('*/5 * * * *','CALL reporting.p_compute_client_data(998934,''WEEKLY_DATA'');')

select cron.schedule('*/5 * * * *','CALL reporting.p_compute_client_data(45678,''DAILY_DATA'');')

select cron.schedule('*/5 * * * *','CALL reporting.p_compute_client_data(1010,''WEEKLY_DATA'');')

select cron.schedule('*/5 * * * *','CALL reporting.p_compute_client_data(1001,''MONTHLY_DATA'');')

select cron.schedule('*/5 * * * *','select reporting.f_reset_client_data(0,''DATA'')')

select cron.schedule('*/5 * * * *','VACUUM')

select cron.schedule('*/5 * * * *', $$DELETE FROM reporting.rep_request WHERE create_dt < now() - interval '60 DAYS'$$)

The job setup is stored in the cron.job table:
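You can also inspect it directly with a simple query (the active column is part of the table definition in recent pg_cron versions):

postgres=# SELECT jobid, schedule, command, database, username, active FROM cron.job;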

Another way to define a job is by inserting the data directly into the cron.job table:

INSERT INTO cron.job (schedule, command, nodename, nodeport, database, username)
VALUES ('0 11 * * *','call loader.load_data();','postgresql-pgcron',5442,'staging', 'loader');

Using custom values for nodename and nodeport makes it possible to connect to a different machine (as well as to other databases).

Deactivation of a Job

On the other hand, to deactivate (remove) a job it’s only necessary to execute the following function, passing the job id:

select cron.unschedule(8);

Jobs Logging

The logging of these jobs can be found in the PostgreSQL log file /var/log/postgresql/postgresql-12-main.log:

by Hugo Dias at February 03, 2020 08:01 PM

February 02, 2020

MariaDB Foundation

2020 MariaDB Day presentations

Our first MariaDB Day in Brussels is seeing some interesting presentations. Slides and videos are posted below.
This post will be updated as more slides and presentations become available. […]

The post 2020 MariaDB Day presentations appeared first on MariaDB.org.

by Ian Gilfillan at February 02, 2020 10:19 AM

January 30, 2020

Percona

Webinar 2/6: MySQL 8 vs. MariaDB 10.4

MySQL 8 vs. MariaDB 10.4

At the moment, MySQL 8 and MariaDB 10.4 are the latest versions of the corresponding database management systems. Each of these DBMS has a unique set of features. For example, specific MariaDB features might be unavailable in MySQL, and vice versa. In this presentation, we’ll cover these new features and provide recommendations regarding which will work best on which DBMS.

Please join Percona Senior Technical Manager Alkin Tezuysal on Thursday, February 6, 2020, at 9 am EST for his webinar “MySQL 8 vs MariaDB 10.4”.

View the Recording

If you can’t attend, sign up anyway and we’ll send you the slides and recording afterward.

by David Quilty at January 30, 2020 02:46 PM

SeveralNines

Managing Database Backup Retention Schedules

Attention: Skip reading this blog post if you can afford unlimited storage space. 

If you could afford unlimited storage space, you wouldn't have to worry about backup retention at all, since you could store your backups indefinitely without any restriction, provided your storage provider can assure the data won't go missing. Database backup retention is commonly overlooked because it doesn't seem important at first, and only gets real attention once you stumble upon a resource limit or hit a bottleneck.

In this blog post, we are going to look into database backup retention management and scheduling and how we can manage them efficiently with ClusterControl.

Database Backup Retention Policy

Database backup retention policy refers to how long the database backups are kept within our possession. Some examples would be:

  • daily backups for big databases are kept for one week using local storage, 
  • weekly backups for small databases are kept for eight weeks on disk storage (both local and remote),
  • monthly backups for all databases are kept for 3 months on cloud storage,
  • no backups are saved beyond 3 months.

The main advantage of having a database backup retention policy is to make sure we manage our storage resources efficiently, without impacting our database recovery process if something goes wrong. You don't want to get caught out when an urgent recovery is needed and the necessary backup file is no longer there to help you perform the restoration, because it was deleted to clear up some space.

To build a good backup retention policy, we need to consider the two most important aspects:

  • Backup storage size.
  • Database backup size.

Backup Storage Size

The first priority is to ensure that we have enough space to store our backups as a starter. A simple rule of thumb is the storage space must at least have the same size as the data directory size for the database server. Generally, the bigger the storage size, the bigger the cost is. If you can opt for a bigger storage space, you can keep older backups longer. This aspect hugely influences your retention policy in terms of the number of backups that you can store. 

Storing the backups off-site, in the cloud, can be a good way to secure your backups against disaster. It comes with a higher price per GB, but it's still affordable considering the advantages that you will get from it. Most cloud storage providers now offer secure, scalable, highly available storage with decent IO performance. Either way, ClusterControl supports storing your backup in local storage, remote storage or in the cloud.

Database Backup Size

The size of a backup is directly affected by the following factors:

  • Backup tools - Physical backup is commonly bigger than logical backup.
  • Backup method - Incremental and partial backups are smaller than a full backup.
  • Compression ratio - Higher compression level produces smaller backup, with a tradeoff of processing power.

Mixing and matching these 3 factors will allow you to get a backup size that fits your backup storage and restoration policy. If storing a full backup is considered too big and costly, we can combine incremental backups with a full backup to form one particular backup set. Incremental backups commonly store the delta between two points, and usually only take a relatively small amount of disk space compared to a full backup. Or you can opt for a partial backup, which backs up only the chosen databases or tables that could potentially impact business operations.

If a full physical backup at a 50% compression level produces a 100MB backup, you could increase the compression level further in order to reduce the disk space usage, at the cost of a slower backup creation time. Just make sure that you are complying with your database recovery policy when deciding which backup tools, method and compression level to use.

Managing Retention Schedules Using ClusterControl

ClusterControl's sophisticated backup management features include retention management for all database backup methods when creating or scheduling a backup:

The default value is 31 days, which means the backup will be kept in possession for 31 days, and will be automatically deleted on the 32nd day after it was successfully created. The default retention value (in day) can be changed under Backup Settings. One can customize this value for every backup schedule or on-demand backup creation job to any number of days or keep it forever. ClusterControl also supports retention for backup that is stored in the supported cloud platforms (AWS S3, Google Cloud Storage and Azure Blob Storage).

When a backup is successfully created, you will see the retention period in the backup list, as highlighted in the following screenshot:

For the backup purging process, ClusterControl triggers a backup purge thread every time a backup process for that particular cluster completes. The purge thread looks for all "expired" backups and performs the necessary deletion automatically. The purging interval may sound a bit excessive in some environments, but this is the best purge scheduling we have figured out for most configurations so far. To understand this easily, consider the following backup retention setting for a cluster:

  1. One creates a weekly backup, with a retention period of 14 days.
  2. One creates an hourly backup, with a retention period of 7 days.
  3. One creates a monthly backup, without a retention period (keep forever).

For the above configuration, ClusterControl will initiate a backup purge thread for (1) and (2) every hour because of (2), although the retention period for (1) is 14 days. Created backups that have been marked as "Keep Forever" (3) will be skipped by the purge thread. This configuration protects ClusterControl from excessive purging compared to scheduling the job daily. Thus, don't be surprised if you see the following lines in the job messages after any of the backup jobs is completed:

Advanced Retention Management with ClusterControl CLI

The ClusterControl CLI, a.k.a. s9s, can be used to perform advanced retention management operations like deleting old backup files while keeping a number of copies for safety purposes. This can be very useful when you need to clear up some space and have no idea which backups will be purged by ClusterControl, but you want to make sure that a number of copies of the backup exist regardless of their expiration, as a precaution. We can easily achieve this with the following command:

$ s9s backup \
--delete-old \
--cluster-id=4 \
--backup-retention=60 \
--cloud-retention=180 \
--safety-copies=3 \
--log

Deleting old backups.
Local backup retention is 60 day(s).
Cloud backup retention is 180 day(s).
Kept safety backup copies 3.
Querying records older than 60 day(s).
Checking for backups to purge.
No old backup records found, nothing to delete.
Checking for old backups is finished.

The above job will force ClusterControl to look for local backups that have been created which are older than 60 days and backups that are stored in the cloud which are older than 180 days. If ClusterControl finds something that matches this query, ClusterControl will make sure only the 4th copy and older will be deleted, regardless of the retention period.

The --backup-retention and --cloud-retention parameters accept a number of values:

  • A positive value controls how long (in days) the taken backups will be preserved.
  • -1 has a very special meaning: the backup will be kept forever.
  • 0 is the default; it means prefer the global setting, which can be configured from the UI.

Apart from the above, the standard backup creation job can be triggered directly from the command line. The following command creates a mysqldump backup for cluster ID 4 on node 192.168.1.24, and keeps the backup forever:

$ s9s backup --create \
--backup-method=mysqldump \
--cluster-id=4 \
--nodes=192.168.1.24:3306 \
--backup-retention=-1 \
--log

192.168.1.24:3306: Preparing for backup - host state (MYSQL_OK) is acceptable.
192.168.1.24:3306: Verifying connectivity and credentials.
Checking backup creation job.
192.168.1.24:3306: Timezone of backup host is UTC.
Backup title is     ''.
Backup host is      192.168.1.24:3306.
Backup directory is /backups/production/mysqldump/.
Backup method is    mysqldump.
PITR compatible     no.
Backup record created.
Backup record saved.
192.168.1.24: Creating backup dir '/backups/production/mysqldump/BACKUPPERDB-190-mysqldump-2020-01-25_093526'.
Using gzip to compress archive.
192.168.1.24:3306: detected version 5.7.28-31-log.
Extra-arguments be passed to mysqldump:  --set-gtid-purged=OFF
Backup (mysqldump, storage node): '192.168.1.24: /usr/bin/mysqldump --defaults-file=/etc/my.cnf  --flush-privileges --hex-blob --opt --master-data=2 --single-transaction --skip-lock-tables --triggers --routines --events   --set-gtid-purged=OFF --databases mysql backupninja backupninja_doc proxydemo severalnines_prod severalnines_service --ignore-table='mysql.innodb_index_stats'  --ignore-table='mysql.innodb_table_stats' |gzip -c > /backups/production/mysqldump/BACKUPPERDB-190-mysqldump-2020-01-25_093526/mysqldump_2020-01-25_093546_dbdumpfile.sql.gz'.
192.168.1.24: MySQL >= 5.7.6 detected, enabling 'show_compatibility_56'
192.168.1.24: A progress message will be written every 1 minutes
192.168.1.24: Backup 190 completed and is stored in 192.168.1.24:/backups/production/mysqldump/BACKUPPERDB-190-mysqldump-2020-01-25_093526.
192.168.1.24:/backups/production/mysqldump/BACKUPPERDB-190-mysqldump-2020-01-25_093526: Custom retention period: never delete.
Checking for backup retention (clearing old backups).
Local backup retention is 31 day(s).
Cloud backup retention is 180 day(s).
Kept safety backup copies 1.
Querying records older than 31 day(s).
Checking for backups to purge.
Found 4 backups older than 31 day(s).
We have 9 completed full backups.

For more explanation and examples, check out the s9s backup guide.

Conclusion

ClusterControl backup retention management allows you to manage your backup storage space efficiently, without compromising your database recovery policy.

by ashraf at January 30, 2020 10:45 AM

Federico Razzoli

Practical advice for MySQL/MariaDB live migrations

Modifying table structures is sometimes necessary, or desirable. Modifying them online can be a pain, especially with big tables. Migrations should be run properly in production.

The post Practical advice for MySQL/MariaDB live migrations appeared first on Federico Razzoli.

by Federico Razzoli at January 30, 2020 10:42 AM

January 29, 2020

MariaDB Foundation

MariaDB Day Brussels 02.02.2020 – Introducing speakers – Sveta Smirnova on How to Avoid Pitfalls in Schema Upgrade with Galera

Galera Cluster for MySQL is a 100% synchronized cluster in regards to data modification operations (DML). It is ensured by the optimistic locking model and ability to rollback a transaction, which cannot be applied on all nodes. […]

The post MariaDB Day Brussels 02.02.2020 – Introducing speakers – Sveta Smirnova on How to Avoid Pitfalls in Schema Upgrade with Galera appeared first on MariaDB.org.

by Anna Widenius at January 29, 2020 03:59 PM

SeveralNines

What to Monitor in MySQL 8.0

Monitoring is a must in all environments, and databases aren’t the exception. Once you have your database infrastructure up and running, you’ll need to keep tabs on what’s happening. Monitoring is a must if you want to be sure everything is going fine, and also to make the necessary adjustments while your system grows and evolves. It will enable you to identify trends, plan for upgrades or improvements, and react adequately to any problems or errors that may arise with new versions, different purposes, and so on.

For each database technology, there are different things to monitor. Some of these are specific to the database engine, vendor, or even the particular version that you’re using. Database clusters heavily depend on the underlying infrastructure, so network and operating stats are interesting to see by the database administrators too. 

When running multiple database systems, the monitoring of these systems can become quite a chore. 

In this blog, we’ll take a look at what you need to monitor in a MySQL 8.0 environment. We will also take a look at ClusterControl monitoring features, which may help you track the health of your databases for free.

OS and Database System Monitoring

When observing a database cluster or node, there are two main points to take into account: the operating system and the MySQL instance itself. You will need to define which metrics you are going to monitor from both sides and how you are going to do it. You need to track these parameters in the context of your system, and look for deviations from the normal behavior pattern.

Keep in mind that when one of your parameters is affected, it can also affect others, making troubleshooting of the issue more complicated. Having a proper monitoring and alerting system is essential to make this task as simple as possible.

In most cases, you will need to use some tools, as it is difficult to find one to cover all the wanted metrics. 

OS System Monitoring

One major thing (which is common to all database engines and even to all systems) is to monitor Operating System behavior. Below you can find the top system resources to watch on a database server; this is also the list of the very first things to check.

CPU Usage

High CPU usage is not a bad thing as long as you don’t reach the limit. An excessive percentage of CPU usage could be a problem if it isn’t the usual behavior. In this case, it is essential to identify the process or processes that are generating the issue. If the problem is the database process, you will need to check what is happening inside the database.

RAM Memory or SWAP Usage

Ideally, your entire database should be stored in memory, but this is not always possible. Give MySQL as much as you can afford but leave enough for other processes to function.

If you see a high value for this metric and nothing has changed in your system, you probably need to check your database configuration. Parameters like innodb_buffer_pool_size and the per-session buffers (for example sort_buffer_size and join_buffer_size) affect this directly, as they define the amount of memory the MySQL database can use. Swap is for emergencies only and should not be used; also make sure your operating system swap settings (for example vm.swappiness) are configured so that MySQL memory is not swapped out unnecessarily.

Disk Usage 

Disk usage is one of the key metrics to monitor and alert. Make sure you always have free space for new data, temporary files, snapshots, or backups.

Monitoring hard metric values is not good enough. An abnormal increase in the use of disk space or excessive disk access are essential things to watch, as you could have a high number of errors logged in the MySQL log file, or a poor cache configuration could be generating heavy disk access instead of using memory to process the queries. Make sure you are able to catch abnormal behavior even if your warning and critical thresholds have not been reached yet.

Along with monitoring space we also should monitor disk activity.  The top values to monitor are:

  • Read/Write requests
  • IO Queue length
  • Average IO wait
  • Average Read/Write time
  • Read/Write bandwidth

You can use iostat or pt-diskstats from Percona to see all these details. 
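For example, a sketch of typical invocations (sampling every 5 seconds; iostat shown with extended per-device statistics in MB):

$ iostat -dxm 5
$ pt-diskstats --interval 5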

Things that can affect your disk performance are often related to data transfer from and towards your disk, so monitor abnormal processes that can be started by other users.

Load Average

An all-in-one performance metric. Understanding Linux Load is a key to monitor OS and database dependent systems.

The load average is related to the three points mentioned above. A high load average could be generated by excessive CPU, RAM, or disk usage.

Network

Unless doing backups or transferring vast amounts of data, it shouldn’t be the bottleneck.

A network issue can affect all the systems, as the application can’t connect (or connects but loses packets) to the database, so this is an important metric to monitor indeed. You can monitor latency or packet loss, and the main issue could be network saturation, a hardware issue, or just a poor network configuration.

Database Monitoring

While monitoring is a must, it’s not typically free. There is always a cost on the database performance, depending on how much you are monitoring, so you should avoid monitoring things that you won’t use.

In general, there are two ways to monitor your databases, from the logs or from the database side by querying.

In the case of logs, to be able to use them, you need to have a high logging level, which generates high disk access and it can affect the performance of your database.

For the querying mode, each connection to the database uses resources, so depending on the activity of your database and the assigned resources, it may affect the performance too.

Of course, there are many metrics in MySQL. Here we will focus on the top important.

Monitoring Active Sessions

You should also track the number of active sessions and the DB up/down status. Often, to understand a problem you need to see how long the database has been running, so uptime can be used to detect restarts.

The next thing to watch is the number of sessions. If you are near the limit, you need to check whether something is wrong or whether you simply need to increase the max_connections value. A change in this number can be an increase or a decrease in connections. Improper use of connection pooling, locking, or network issues are the most common problems related to the number of connections.

The key values here are (a query sketch follows this list):

  • Uptime
  • Threads_connected
  • Max_used_connections
  • Aborted_connects
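A quick sketch of how to pull these values (plus the configured limit) in one go:

SHOW GLOBAL STATUS
WHERE Variable_name IN ('Uptime', 'Threads_connected', 'Max_used_connections', 'Aborted_connects');

SHOW GLOBAL VARIABLES LIKE 'max_connections';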

Database Locks

If you have a query waiting for another query, you need to check whether that other query is a normal process or something new. In some cases, if somebody is making an update on a big table, for example, this action can affect the normal behavior of your database, generating a high number of locks.
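In MySQL 8, with the sys schema available, a quick way to see who is blocking whom is a sketch like this:

SELECT waiting_pid, waiting_query, blocking_pid, blocking_query, wait_age
FROM sys.innodb_lock_waits;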

Monitoring Replication

The key metrics to monitor for replication are the lag and the replication state: not only the up/down status but also the lag, because a continuous increase in this value is not a good sign, as it means that the slave is not able to catch up with its master.

The most common issues are networking issues, hardware resource issues, or under-dimensioned resources. If you are facing a replication issue you will need to know this ASAP, as you will need to fix it to ensure a high availability environment.

Replication is best monitored by checking SHOW SLAVE STATUS and the following fields (a command sketch follows this list):

  • SLAVE_RUNNING
  • SLAVE_IO_Running
  • SLAVE_SQL_RUNNING
  • LAST_SQL_ERRNO
  • SECONDS_BEHIND_MASTER
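For example, a minimal check might look like:

SHOW SLAVE STATUS\G
-- key fields in the output: Slave_IO_Running, Slave_SQL_Running,
-- Seconds_Behind_Master and Last_SQL_Errno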

Backups

Unfortunately, the vanilla community edition doesn't come with a backup manager. You should know whether the backup was completed, and whether it's usable. Usually, this last point is not taken into account, but it's probably the most critical check in a backup process. Here we would have to use external tools like Percona XtraBackup or ClusterControl.

Database Logs

You should monitor your database log for errors like FATAL or deadlock, and even for common errors like authentication issues or long-running queries. Most errors are written to the log file with detailed, useful information to fix them. Common failure points you need to keep an eye on are errors and log file sizes. The location of the error log can be found via the log_error variable.
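For example, to locate and then tail the error log (the path below is just an illustration; yours comes from the variable):

SHOW GLOBAL VARIABLES LIKE 'log_error';

$ sudo tail -f /var/log/mysql/error.log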

External Tools

Last but not least you can find a list of useful tools to monitor your database activity. 

Percona Toolkit - a set of Linux command-line tools from Percona to analyze MySQL and OS activity. It supports the most popular 64-bit Linux distributions like Debian, Ubuntu, and Red Hat.

mysqladmin - mysqladmin is an administration program for the MySQL daemon. It can be used to check server health (ping), list the processes, see the values of the variables, but also do some administrative work like create/drop databases, flush (reset) logs, statistics, and tables, kill running queries, stop the server and control replication.

innotop - offers an extended view of SHOW statements. It's very powerful and can significantly reduce investigation time. Besides vanilla MySQL support, it can show the Galera view and master-slave replication details.

mtop - monitors a MySQL server showing the queries which are taking the most amount of time to complete. Features include 'zooming' in on a process to show the complete query, 'explaining' the query optimizer information for a query and 'killing' queries. In addition, server performance statistics, configuration information, and tuning tips are provided.

Mytop - runs in a terminal and displays statistics about threads, queries, slow queries, uptime, load, etc. in tabular format, much like the Linux top utility.

Conclusion

This blog is not intended to be an exhaustive guide to how to enhance database monitoring, but it hopefully gives a clearer picture of what things can become essential and some of the basic parameters that can be watched. Do not hesitate to let us know if we’ve missed any important ones in the comments below.

 

by Bart Oles at January 29, 2020 09:09 AM

MariaDB Foundation

MariaDB Day Brussels 02.02.2020 – Introducing speakers – Seppo Jaakola on MariaDB 10.5 new Galera features

Galera R&D team is currently finalizing new features targeted for the next MariaDB 10.5 release. This presentation is a high level overview of the most prominent Galera clustering features under work, such as:
* Non Blocking DDL – […]

The post MariaDB Day Brussels 02.02.2020 – Introducing speakers – Seppo Jaakola on MariaDB 10.5 new Galera features appeared first on MariaDB.org.

by Anna Widenius at January 29, 2020 08:45 AM

January 28, 2020

MariaDB Foundation

MariaDB day Brussels 02.02.2020 – Introducing speakers – Vicențiu Ciorbaru on comparing MariaDB and MySQL Roles.

MySQL 8.0 has introduced roles, a feature that was present since MariaDB 10.0. There are quite a number of differences between the two databases.
During the MariaDB day Vicențiu will present a comparison between them and see how roles are useful for your application and what are the key differences to consider when working with both databases. […]

The post MariaDB day Brussels 02.02.2020 – Introducing speakers – Vicențiu Ciorbaru on comparing MariaDB and MySQL Roles. appeared first on MariaDB.org.

by Anna Widenius at January 28, 2020 03:47 PM