Planet MariaDB

January 23, 2020

SeveralNines

An Introduction to MySQL Deployment Using an Ansible Role

Ansible automates and simplifies repetitive, complex, and tedious operations. It is an IT automation engine that automates cloud provisioning, configuration management, application deployment, intra-service orchestration, and many other IT needs. It requires no agents, using only SSH to push changes from a single source to multiple remote resources with no additional custom security infrastructure configuration, and it uses a simple language format (YAML) to describe the automation jobs.

Installing a standalone MySQL server is a simple straightforward task, but this can be problematic if you have multiple database servers, versions, platforms and environments to support. Thus, having a configuration management tool is the way to go to improve efficiency, remove repetitiveness and reduce human errors.

In this blog post, we are going to walk you through the basics of Ansible automation for MySQL, as well as configuration management, with examples and explanations. We will start with a simple standalone MySQL deployment.

Installing Ansible

For this walkthrough, we need at least two hosts: one host for Ansible (you could use a workstation instead of a server) and another as the target host on which we want to deploy a MySQL server.

To install Ansible on CentOS 7, simply run the following commands:

(ansible-host)$ yum install -y epel-release

(ansible-host)$ yum install -y ansible

For other OS distributions, check out the Ansible installation guide.

Setting up Passwordless SSH

Using a password during SSH is supported, but passwordless SSH keys with ssh-agent are one of the best ways to use Ansible. The initial step is to configure passwordless SSH, since Ansible will perform the deployment solely over this channel. First, generate an SSH key on the Ansible host:

(ansible-host)$ whoami

root

(ansible-host)$ ssh-keygen -t rsa -N '' -f ~/.ssh/id_rsa

You should get at least the following files generated:

(ansible-host)$ ls -al ~/.ssh/

-rw-------. 1 root root 1679 Jan 14 03:40 id_rsa

-rw-r--r--. 1 root root  392 Jan 14 03:40 id_rsa.pub

To allow passwordless SSH, we need to copy the SSH public key (id_rsa.pub) to the remote host that we want to access. We can use a tool called ssh-copy-id to do this task for us. However, you must know the password of the target host's user, and password authentication must be allowed on the target host:

(ansible-host)$ whoami

root

(ansible-host)$ ssh-copy-id root@192.168.0.221

The above command will prompt for the root password of 192.168.0.221. Simply enter the password and the SSH key of the current user on the Ansible host will be copied over to the target host, 192.168.0.221, into ~/.ssh/authorized_keys, meaning we authorize that particular key to access this server remotely. To test it out, you should be able to run the following remote command from the Ansible host without any password:

(ansible-host)$ ssh root@192.168.0.221 "hostname -I"

192.168.0.221

In cases where you are not allowed to use the root user for SSH (e.g., "PermitRootLogin no" in the SSH configuration), you can use a sudo user instead. In the following example, we set up passwordless SSH for a sudo user called "vagrant":

(ansible-host)$ whoami

vagrant

(ansible-host)$ ssh-keygen -t rsa -N '' -f ~/.ssh/id_rsa

(ansible-host)$ ls -al ~/.ssh/

-rw-------. 1 vagrant vagrant 1679 Jan 14 03:45 id_rsa

-rw-r--r--. 1 vagrant vagrant  392 Jan 14 03:45 id_rsa.pub

(ansible-host)$ ssh-copy-id vagrant@192.168.0.221

If the target server doesn't allow password authentication via SSH, simply copy the content of the SSH public key at ~/.ssh/id_rsa.pub manually into the target host's ~/.ssh/authorized_keys file. For example, on the Ansible host, retrieve the public key content:

(ansible-host)$ cat ~/.ssh/id_rsa.pub

ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQC5MZjufN0OiKyKa2OG0EPBEF/w23FnOG2x8qpAaYYuqHlVc+ZyRugtGm+TdTJDfLA1Sr/rtZpXmPDuLUdlAvPmmwqIhgiatKiDw5t2adNUwME0sVgAlBv/KvbusTTdtpFQ1o+Z9CltGiENDCFytr2nVeBFxImoZu2H0ilZed/1OY2SZejUviXTQ0Dh0QYdIeiQHkMf1CiV2sNYs8j8+ULV26OOKCd8c1h1O9M5Dr4P6kt8E1lVSl9hbd4EOHQmeZ3R3va5zMesLk1A+iadIGJCJNCVOA2RpxDHmmaX28zQCwrpCliH00g9iCRixlK+cB39d1coUWVGy7SeaI8bzfv3 vagrant@cc

Connect to the target host and paste the Ansible's host public key into ~/.ssh/authorized_keys:

(target-host)$ whoami

root

(target-host)$ vi ~/.ssh/authorized_keys

ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQC5MZjufN0OiKyKa2OG0EPBEF/w23FnOG2x8qpAaYYuqHlVc+ZyRugtGm+TdTJDfLA1Sr/rtZpXmPDuLUdlAvPmmwqIhgiatKiDw5t2adNUwME0sVgAlBv/KvbusTTdtpFQ1o+Z9CltGiENDCFytr2nVeBFxImoZu2H0ilZed/1OY2SZejUviXTQ0Dh0QYdIeiQHkMf1CiV2sNYs8j8+ULV26OOKCd8c1h1O9M5Dr4P6kt8E1lVSl9hbd4EOHQmeZ3R3va5zMesLk1A+iadIGJCJNCVOA2RpxDHmmaX28zQCwrpCliH00g9iCRixlK+cB39d1coUWVGy7SeaI8bzfv3 vagrant@cc

You may now try to run a remote command from the Ansible host to verify, and you should not be prompted for any password. At this point, our passwordless SSH is configured.
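
For example, a quick check with the vagrant user from the setup above (the IP and output shown are just illustrative):

(ansible-host)$ ssh vagrant@192.168.0.221 "hostname -I"

192.168.0.221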

Defining the Target Host

Next we need to define the target host, the host that we want to manage using Ansible. Based on our architecture, we are going to deploy only one MySQL server which is 192.168.0.221. Add the following lines into /etc/ansible/hosts:

[db-mysql]

192.168.0.221

The above simply means we defined a group called "db-mysql", which will be the identifier when we refer to the target host in the Ansible playbook. We can also list out all IP addresses or hostnames of the target hosts under this group. At this point, we only have one MySQL server to deploy, thus only one entry is there. You can also specify a matching rule to match hosts under one group, for example:

[db-mysql]

192.168.0.[221:223]

The above definition means we have 3 hosts under this group, with the following IP addresses:

  • 192.168.0.221
  • 192.168.0.222
  • 192.168.0.223

There are a lot of ways and rules to match and group the target hosts as shown in the Ansible inventory guide.
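
To confirm that the inventory and passwordless SSH are working before writing any playbook, you can run Ansible's built-in ping module against the group (the output below is illustrative):

(ansible-host)$ ansible db-mysql -m ping

192.168.0.221 | SUCCESS => {
    "changed": false,
    "ping": "pong"
}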

Choosing an Ansible Role

To tell Ansible what to deploy, we need to define the deployment steps in a YAML-formatted file called a playbook. As you might know, installing a complete MySQL server requires multiple steps to satisfy all MySQL dependencies, post-installation configuration, user and schema creation and so on. Ansible provides a number of MySQL modules that can help us out, but we still have to write a playbook for the deployment steps.

To simplify the deployment steps, we can use existing Ansible roles. An Ansible role is an independent component which allows reuse of common configuration steps. An Ansible role has to be used within a playbook. There are a number of MySQL Ansible roles available on Ansible Galaxy, a repository for Ansible roles that can be dropped directly into your playbooks.

If you look up "mysql", you will get plenty of Ansible roles for MySQL.

We will use the most popular one, named "mysql", by geerlingguy. You can opt to use other roles, but the most downloaded one tends to be general purpose, which usually works fine in most cases.

On the Ansible host, run the following command to download the Ansible role:

(ansible-host)$ ansible-galaxy install geerlingguy.mysql

The role will be downloaded into ~/.ansible/roles/geerlingguy.mysql/ of the current user.

Writing the Ansible Playbook

By looking at the Readme of the Ansible role, we can follow the example playbook that is being provided. Firstly, create a playbook file called deploy-mysql.yml and add the following lines:

(ansible-host)$ vim ~/deploy-mysql.yml

- hosts: db-mysql

  become: yes

  vars_files:

    - vars/main.yml

  roles:

    - { role: geerlingguy.mysql }

In the above lines, we define the target host, which is all hosts under the db-mysql entries in /etc/ansible/hosts. The next line (become) tells Ansible to execute the playbook as the root user, which is necessary for the role (it is stated in the Readme file). Next, we define the location of the variables file (vars_files), located at vars/main.yml, relative to the playbook path.

Let's create the variable directory and file and specify the following line:

(ansible-host)$ mkdir vars

(ansible-host)$ vim vars/main.yml

mysql_root_password: "theR00tP455w0rd"

For more information check out the Role Variables section in the Readme file of this role.
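
Before applying anything, you can optionally validate the playbook syntax or do a dry run using the standard ansible-playbook flags (note that a dry run may not be fully accurate for every task in the role):

(ansible-host)$ ansible-playbook deploy-mysql.yml --syntax-check

(ansible-host)$ ansible-playbook deploy-mysql.yml --check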

Start the Deployment

Now we are ready to start the MySQL deployment. Use the ansible-playbook command to execute our playbook definitions:

(ansible-host)$ ansible-playbook deploy-mysql.yml

You should see a bunch of lines appear in the output. Focus on the last line where it summarizes the deployment:

PLAY RECAP ***************************************************************************************************************************************

192.168.0.221              : ok=36 changed=8 unreachable=0    failed=0 skipped=16 rescued=0 ignored=0

If everything turns up green and OK, you can verify on the database host that our MySQL server is already installed and running:

(mysql-host)$ rpm -qa | grep -i maria

mariadb-server-5.5.64-1.el7.x86_64

mariadb-libs-5.5.64-1.el7.x86_64

mariadb-5.5.64-1.el7.x86_64



(mysql-host)$ mysqladmin -uroot -p ping

Enter password:

mysqld is alive

As you can see from the above, for CentOS 7, the default MySQL installation is MariaDB 5.5, as part of the standard package repository. At this point, our deployment is considered complete. However, we would like to further customize our deployment, as shown in the next sections.

Customizing the Deployment

The simplest definition in the playbook gives us a very basic installation that uses all default configuration options. We can further customize the MySQL installation by extending/modifying/appending the playbook to do the following:

  • modify MySQL configuration options
  • add database user
  • add database schema
  • configure user privileges
  • configure MySQL replication
  • install MySQL from other vendors
  • import a custom MySQL configuration file

Installing MySQL from Oracle repository

By default, the role will install the default MySQL package that comes with the OS distribution. For CentOS 7, you would get MariaDB 5.5 installed by default. Suppose we want to install MySQL from another vendor; we can extend the playbook with pre_tasks, tasks which Ansible executes before any of the tasks mentioned in any .yml file, as shown in the following example:

(ansible-host)$ vim deploy-mysql.yml

- hosts: db-mysql

  become: yes

  vars_files:

    - vars/main.yml

  roles:

    - { role: geerlingguy.mysql }

  pre_tasks:

    - name: Install the MySQL repo.

      yum:

        name: http://repo.mysql.com/mysql-community-release-el7-5.noarch.rpm

        state: present

      when: ansible_os_family == "RedHat"

    - name: Override variables for MySQL (RedHat).

      set_fact:

        mysql_daemon: mysqld

        mysql_packages: ['mysql-server']

        mysql_log_error: /var/lib/mysql/error.log

        mysql_syslog_tag: mysqld

        mysql_pid_file: /var/run/mysqld/mysqld.pid

        mysql_socket: /var/lib/mysql/mysql.sock

      when: ansible_os_family == "RedHat"

Execute the playbook:

(ansible-host)$ ansible-playbook deploy-mysql.yml

The above will install MySQL from the Oracle repository instead. The default version you would get is MySQL 5.6. Executing the above playbook on a target host that already has a running older version of MySQL/MariaDB would likely fail because of the incompatibility.
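
If you need to reprovision such a host anyway, one possible (and destructive) approach is an extra pre_task that removes the stock MariaDB packages first. This is only a sketch, assuming CentOS 7 package names and that the existing data directory is disposable:

    - name: Remove stock MariaDB packages (destroys the existing installation).
      yum:
        name: ['mariadb-server', 'mariadb', 'mariadb-libs']
        state: absent
      when: ansible_os_family == "RedHat"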

Creating MySQL Databases and Users

Inside vars/main.yml, we can define the MySQL databases and users that we want Ansible to configure on our MySQL server by using the mysql_databases and mysql_users role variables, right after our previous definition of mysql_root_password:

(ansible-host)$ vim vars/main.yml

mysql_root_password: "theR00tP455w0rd"

mysql_databases:

  - name: myshop

    encoding: latin1

    collation: latin1_general_ci

  - name: sysbench

    encoding: latin1

    collation: latin1_general_ci

mysql_users:

  - name: myshop_user

    host: "%"

    password: mySh0pPassw0rd

    priv: "myshop.*:ALL"

  - name: sysbench_user

    host: "192.168.0.%"

    password: sysBenchPassw0rd

    priv: "sysbench.*:ALL"

The definition instructs Ansible to create two databases, "myshop" and "sysbench", each followed by its respective MySQL user with the proper privileges, allowed host and password.

Re-execute the playbook to apply the change into our MySQL server:

(ansible-host)$ ansible-playbook deploy-mysql.yml

This time, Ansible will pick up all the changes we made in vars/main.yml to be applied to our MySQL server. We can verify in the MySQL server with the following commands:

(mysql-host)$ mysql -uroot -p -e 'SHOW DATABASES'

Enter password:

+--------------------+

| Database           |

+--------------------+

| information_schema |

| myshop             |

| mysql              |

| performance_schema |

| sysbench           |

+--------------------+

(mysql-host)$ mysql -uroot -p -e 'SHOW GRANTS FOR sysbench_user@"192.168.0.%"'

Enter password:

+------------------------------------------------------------------------------------------------------------------------+

| Grants for sysbench_user@192.168.0.%                                                                                   |

+------------------------------------------------------------------------------------------------------------------------+

| GRANT USAGE ON *.* TO 'sysbench_user'@'192.168.0.%' IDENTIFIED BY PASSWORD '*4AC2E8AD02562E8FAAF5A958DC2AEA4C47451B5C' |

| GRANT ALL PRIVILEGES ON `sysbench`.* TO 'sysbench_user'@'192.168.0.%'                                                  |

+------------------------------------------------------------------------------------------------------------------------+

Enabling Slow Query Log

This role supports enabling the MySQL slow query log; we can define the location of the log file as well as the slow query time. Add the necessary variables inside the vars/main.yml file:

mysql_root_password: "theR00tP455w0rd"

mysql_databases:

  - name: example_db

    encoding: latin1

    collation: latin1_general_ci

  - name: sysbench

    encoding: latin1

    collation: latin1_general_ci

mysql_users:

  - name: example_user

    host: "%"

    password: similarly-secure-password

    priv: "example_db.*:ALL"

  - name: sysbench_user

    host: "192.168.0.%"

    password: sysBenchPassw0rd

    priv: "sysbench.*:ALL"

mysql_slow_query_log_enabled: true

mysql_slow_query_log_file: 'slow_query.log'

mysql_slow_query_time: '5.000000'

Re-run the playbook to apply the changes:

(ansible-host)$ ansible-playbook deploy-mysql.yml

The playbook will make necessary changes to MySQL slow query related options and restart the MySQL server automatically to load the new configurations. We can then verify if the new configuration options are loaded correctly on the MySQL server:

(mysql-host)$ mysql -uroot -p -e 'SELECT @@slow_query_log, @@slow_query_log_file, @@long_query_time'

+------------------+-----------------------+-------------------+

| @@slow_query_log | @@slow_query_log_file | @@long_query_time |

+------------------+-----------------------+-------------------+

|                1 | slow_query.log        |          5.000000 |

+------------------+-----------------------+-------------------+

Including Custom MySQL Configuration File

Ansible role variables and MySQL variables are two different things. The author of this role has created a number of MySQL related variables that can be represented with Ansible role variables. Taken from the Readme file, here are some of them:

mysql_port: "3306"

mysql_bind_address: '0.0.0.0'

mysql_datadir: /var/lib/mysql

mysql_socket: *default value depends on OS*

mysql_pid_file: *default value depends on OS*

mysql_log_file_group: mysql *adm on Debian*

mysql_log: ""

mysql_log_error: *default value depends on OS*

mysql_syslog_tag: *default value depends on OS*

If the generated configuration does not satisfy our MySQL requirements, we can include custom MySQL configuration files in the deployment by using the mysql_config_include_files variable. It accepts an array of values separated by commas, with "src" as the key pointing to the actual path on the Ansible host.

First of all, we have to prepare the custom configuration files on the Ansible host. Create a directory and a simple MySQL configuration file:

(ansible-host)$ mkdir /root/custom-config/

(ansible-host)$ vim /root/custom-config/my-severalnines.cnf

[mysqld]

max_connections=250

log_bin=binlog

expire_logs_days=7

Let's say we have another configuration file specifically for mysqldump configuration:

(ansible-host)$ vim /root/custom-config/mysqldump.cnf

[mysqldump]

max_allowed_packet=128M

To import these configuration files into our deployment, define them in the mysql_config_include_files array in vars/main.yml file:

mysql_root_password: "theR00tP455w0rd"

mysql_databases:

  - name: example_db

    encoding: latin1

    collation: latin1_general_ci

  - name: sysbench

    encoding: latin1

    collation: latin1_general_ci

mysql_users:

  - name: example_user

    host: "%"

    password: similarly-secure-password

    priv: "example_db.*:ALL"

  - name: sysbench_user

    host: "192.168.0.%"

    password: sysBenchPassw0rd

    priv: "sysbench.*:ALL"

mysql_slow_query_log_enabled: true

mysql_slow_query_log_file: slow_query.log

mysql_slow_query_time: 5

mysql_config_include_files: [

  src: '/root/custom-config/my-severalnines.cnf',

  src: '/root/custom-config/mysqldump.cnf'

]

Note that /root/custom-config/my-severalnines.cnf and /root/custom-config/mysqldump.cnf must exist on the Ansible host.

Re-run the playbook:

(ansible-host)$ ansible-playbook deploy-mysql.yml

The playbook will import those configuration files and put them into the include directory (depending on the OS), which is /etc/my.cnf.d/ for CentOS 7. The playbook will automatically restart the MySQL server to load the new configuration options. We can then verify whether the new configuration options are loaded correctly:

(mysql-host)$ mysql -uroot -p -e 'select @@max_connections'

250

(mysql-host)$ mysqldump --help | grep ^max-allowed-packet

max-allowed-packet                134217728

Conclusion

Ansible can be used to automate database deployment and configuration management with a little knowledge of scripting. Meanwhile, ClusterControl uses a similar passwordless SSH approach to deploy, monitor, manage and scale your database cluster from A to Z, with a user interface, and requires no additional skills to achieve the same result.

by ashraf at January 23, 2020 04:35 PM

MariaDB Foundation

MariaDB Day Brussels 0202 2020 Provisional Schedule

A provisional schedule for the first MariaDB Day, to be held as part of the FOSDEM Fringe in Brussels at the Bedford Hotel and Congress Centre on Sunday February 2, is now available. […]

The post MariaDB Day Brussels 0202 2020 Provisional Schedule appeared first on MariaDB.org.

by Ian Gilfillan at January 23, 2020 06:05 AM

January 22, 2020

SeveralNines

Using PostgreSQL Replication Slots

What are Replication Slots?

Back in the days when "Replication Slots" were not yet introduced, managing the WAL segments was a challenge. In standard streaming replication, the master has no knowledge of the slave status. Take the example of a master that executes a large transaction while a standby node is in maintenance mode for a couple of hours (such as upgrading the system packages, adjusting network security, hardware upgrade, etc.). At some point, the master removes its transaction log (WAL segments) as the checkpoint passes. Once the slave is out of maintenance, it possibly has a huge slave lag and has to catch up with the master. Eventually, the slave will hit a fatal issue like the one below:

LOG:  started streaming WAL from primary at 0/73000000 on timeline 1

FATAL:  could not receive data from WAL stream: ERROR:  requested WAL segment 000000010000000000000073 has already been removed

The typical approach is to specify in your postgresql.conf a WAL archival script that will copy WAL files to one or more long-term archive locations. If you don't have any standbys or other streaming replication clients, then basically the server can discard the WAL file once the archive script is done or responds OK. But you'll still need some recent WAL files for crash recovery (data from recent WAL files is replayed during crash recovery). In our example of a standby node which is placed under a long maintenance period, problems arise when it comes back online and asks the primary for a WAL file that the primary no longer has; the replication then fails.

This problem was addressed in PostgreSQL 9.4 via "Replication Slots".

If not using replication slots, a common way to reduce the risk of failing replication is to set wal_keep_segments high enough so that WAL files that might be needed won't be rotated or recycled. The disadvantage of this approach is that it's hard to determine what value is best for your setup: you may not need maintenance on a daily basis, and you won't want to retain a large pile of WAL files that eats up your disk storage. While this works, it's not an ideal solution, as running low on disk space on the master can cause incoming transactions to fail.
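
As a rough illustration, this pre-slot approach boils down to a single line in postgresql.conf (the value here is arbitrary and entirely workload-dependent):

# Without replication slots: keep extra WAL segments around (16 MB each by default)
wal_keep_segments = 256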

Another approach when not using replication slots is to configure PostgreSQL with continuous archiving and provide a restore_command to give the replica access to the archive. To avoid WAL build-up on the primary, you may use a separate volume or storage device for the WAL files, e.g., SAN or NFS. Another consideration is synchronous replication, since it requires that the primary wait for standby nodes to commit the transaction; this assures that WAL files have been applied to the standby nodes. But it's still best to provide archiving commands on the primary so that once WALs are recycled on the primary, you can rest assured that you have WAL backups in case recovery is needed. In some situations, though, synchronous replication is not an ideal solution, as it comes with some performance overhead compared with asynchronous replication.

Types of Replication Slots

There are two types of replication slots. These are:

Physical Replication Slots 

Can be used for standard streaming replication. They will make sure that data is not recycled too early. 

Logical Replication Slots

Logical replication slots do the same thing as physical replication slots, but they are used for logical replication, specifically for logical decoding. The idea behind logical decoding is to give users a chance to attach to the transaction log and decode it with a plugin. It allows changes made to the database (and therefore recorded in the transaction log) to be extracted in any format and for any purpose.

In this blog, we'll be using physical replication slots and showing how to achieve this using ClusterControl.

Advantages and Disadvantages of Using Replication Slots

Replication slots are definitely beneficial once enabled. By default, replication slots are not enabled and have to be set up manually. Among the advantages of using replication slots are:

  • Ensures master retains enough WAL segments for all replicas to receive them
  • Prevents the master from removing rows that could cause recovery conflict on the replicas
  • A master can only recycle the transaction log once it has been consumed by all replicas. The advantage here is that a slave can never fall behind so much that a re-sync is needed.

Replication slots also come with some caveats.

  • An orphan replication slot can cause unbounded disk growth due to piled up WAL files from the master
  • Slave nodes placed under long maintenance (such as days or weeks) and that are tied to a replication slot will have unbounded disk growth due to piled up WAL files from the master

You can monitor this by querying pg_replication_slots to determine the slots that are not used. We'll check back on this a bit later.
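
For example, a quick way to list the slots that are currently inactive is:

postgres=# SELECT slot_name, slot_type, active FROM pg_replication_slots WHERE NOT active;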

Using Replication Slots 

As stated earlier, there are two types of replication slots. For this blog, we'll use physical replication slots for streaming replication.

Creating A Replication Slot

Creating a replication slot is simple. You need to invoke the existing function pg_create_physical_replication_slot to do this, and it has to be run on the master node. The function is simple:

maximus_db=# \df pg_create_physical_replication_slot

Schema              | pg_catalog

Name                | pg_create_physical_replication_slot

Result data type    | record

Argument data types | slot_name name, immediately_reserve boolean DEFAULT false, OUT slot_name name, OUT xlog_position pg_lsn

Type                | normal

For example, creating a replication slot named slot1:

postgres=# SELECT pg_create_physical_replication_slot('slot1');

-[ RECORD 1 ]-----------------------+---------

pg_create_physical_replication_slot | (slot1,)

Replication slot names and their underlying configuration are per-server only, not cluster-wide. For example, if you have nodeA (the current master) and standby nodes nodeB and nodeC, creating a slot named "slot1" on the master nodeA means that slot will not be available on nodeB and nodeC. Therefore, when a failover/switchover is about to happen, you need to re-create the slots you have created.

Dropping A Replication Slot

Unused replication slots have to be dropped or deleted. As stated earlier, when there are orphaned replication slots or slots that have not been assigned to any client or standby node, it can lead to boundless disk space issues if left undropped. So it is very important that these are dropped when no longer in use. To drop a slot, simply invoke pg_drop_replication_slot. This function has the following definition:

maximus_db=# \df pg_drop_replication_slot

Schema              | pg_catalog

Name                | pg_drop_replication_slot

Result data type    | void

Argument data types | name

Type                | normal

Dropping it is simple:

maximus_db=# select pg_drop_replication_slot('slot2');

-[ RECORD 1 ]------------+-

pg_drop_replication_slot |

Monitoring Your PostgreSQL Replication Slots

Monitoring your replication slots is something that you don't want to miss. Just collect the information from the view pg_replication_slots on the primary/master node, as below:

postgres=# select * from pg_replication_slots;

-[ RECORD 1 ]-------+-----------

slot_name           | main_slot

plugin              |

slot_type           | physical

datoid              |

database            |

active              | t

active_pid          | 16297

xmin                |

catalog_xmin        |

restart_lsn         | 2/F4000108

confirmed_flush_lsn |

-[ RECORD 2 ]-------+-----------

slot_name           | main_slot2

plugin              |

slot_type           | physical

datoid              |

database            |

active              | f

active_pid          |

xmin                |

catalog_xmin        |

restart_lsn         |

confirmed_flush_lsn |

The above result shows that the main_slot has been taken, but not main_slot2.

Another thing you can do is monitor how far behind the slots are. To achieve this, you can simply use a query like the one in the sample result below:

postgres=# SELECT redo_lsn, slot_name,restart_lsn, 

round((redo_lsn-restart_lsn) / 1024 / 1024 / 1024, 2) AS GB_behind 

FROM pg_control_checkpoint(), pg_replication_slots;

redo_lsn    | slot_name | restart_lsn | gb_behind 

------------+-----------+-------------+-----------

 1/8D400238 |     slot1 | 0/9A000000 | 3.80

But redo_lsn is not present in 9.6; you should use redo_location instead, so in 9.6:

imbd=# SELECT redo_location, slot_name,restart_lsn, 

round((redo_location-restart_lsn) / 1024 / 1024 / 1024, 2) AS GB_behind 

FROM pg_control_checkpoint(), pg_replication_slots;

-[ RECORD 1 ]-+-----------

redo_location | 2/F6008BE0

slot_name     | main_slot

restart_lsn   | 2/F6008CC0

gb_behind     | 0.00

-[ RECORD 2 ]-+-----------

redo_location | 2/F6008BE0

slot_name     | main_slot2

restart_lsn   | 2/F6008CC0

gb_behind     | 0.00

System Variable Requirements

Implementing replication slots requires manual configuration. There are variables that you have to keep in mind which require changes and must be specified in your postgresql.conf. See below:

  • max_replication_slots – If set to 0, replication slots are totally disabled. If you're using PostgreSQL versions < 10, this variable has to be set to a value other than 0 (the default). Since PostgreSQL 10, the default is 10. This variable specifies the maximum number of replication slots. Setting it to a value lower than the number of currently existing replication slots will prevent the server from starting.
  • wal_level – must be replica or higher (replica is the default since PostgreSQL 10). Setting hot_standby or archive will map to replica. For a physical replication slot, replica is enough; for logical replication slots, logical is required.
  • max_wal_senders – set to 10 by default since PostgreSQL 10, and to 0 in 9.6, which means replication is disabled there. We suggest you set this to at least 16, especially when running with ClusterControl.
  • hot_standby – in versions < 10, you need to set this to on (it is off by default). This is important for standby nodes: when on, you can connect and run queries during recovery or in standby mode.
  • primary_slot_name – this variable is set via recovery.conf on the standby node. This is the slot to be used by the receiver or standby node when connecting to the sender (or primary/master).

Take note that these variables mostly require a database service restart in order to load the new values.
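
Putting it together, a minimal postgresql.conf fragment on the primary could look like the following (the values are only illustrative and match the ones used later in this post):

max_replication_slots = 5
wal_level = replica
max_wal_senders = 16
hot_standby = on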

Using Replication Slots in a ClusterControl PostgreSQL Environment

Now, let’s see how we can use physical replication slots and implement them within a Postgres setup managed by ClusterControl.

Deploying of PostgreSQL Database Nodes

Let's start by deploying a 3-node PostgreSQL cluster using ClusterControl, this time with PostgreSQL 9.6.

ClusterControl will deploy nodes with the following system variables defined according to their defaults or tuned-up values:

postgres=# select name, setting from pg_settings where name in ('max_replication_slots', 'wal_level', 'max_wal_senders', 'hot_standby');

         name          | setting 

-----------------------+---------

 hot_standby           | on

 max_replication_slots | 0

 max_wal_senders       | 16

 wal_level             | replica

(4 rows)

In PostgreSQL versions > 9.6, the default value of max_replication_slots is 10, so replication slots are enabled by default, but in 9.6 or lower versions the default is 0, meaning they are disabled. You need to set max_replication_slots higher than 0. In this example, I set max_replication_slots to 5.

root@debnode10:~# grep 'max_replication_slots' /etc/postgresql/9.6/main/postgresql.conf 

# max_replication_slots = 0                     # max number of replication slots

max_replication_slots = 5

and restarted the service:

root@debnode10:~# pg_lsclusters 

Ver Cluster Port Status Owner    Data directory Log file

9.6 main    5432 online postgres /var/lib/postgresql/9.6/main pg_log/postgresql-%Y-%m-%d_%H%M%S.log



root@debnode10:~# pg_ctlcluster 9.6 main restart

Setting The Replication Slots For Primary and Standby Nodes

There's no option in ClusterControl to do this, so you have to create the slots manually. In this example, I created the slots on the primary, host 192.168.30.100:

192.168.10.100:5432 pgdbadmin@maximus_db=# SELECT pg_create_physical_replication_slot('slot1'), pg_create_physical_replication_slot('slot2');

 pg_create_physical_replication_slot | pg_create_physical_replication_slot 

-------------------------------------+-------------------------------------

 (slot1,)                            | (slot2,)

(1 row)

Checking what we have just created shows:

192.168.10.100:5432 pgdbadmin@maximus_db=# select * from pg_replication_slots;

 slot_name | plugin | slot_type | datoid | database | active | active_pid | xmin | catalog_xmin | restart_lsn | confirmed_flush_lsn 

-----------+--------+-----------+--------+----------+--------+------------+------+--------------+-------------+---------------------

 slot1     | | physical  | | | f      | | |       | | 

 slot2     | | physical  | | | f      | | |       | | 

(2 rows)

Now on the standby nodes, we need to update recovery.conf, add the variable primary_slot_name, and change the application_name so it's easier to identify the node. Here's how it looks in recovery.conf on host 192.168.30.110:

root@debnode11:/var/lib/postgresql/9.6/main/pg_log# cat ../recovery.conf 

standby_mode = 'on'

primary_conninfo = 'application_name=node11 host=192.168.30.100 port=5432 user=cmon_replication password=m8rLmZxyn23Lc2Rk'

recovery_target_timeline = 'latest'

primary_slot_name = 'slot1'

trigger_file = '/tmp/failover_5432.trigger'

Do the same thing on host 192.168.30.120, but change the application_name and set primary_slot_name = 'slot2'.
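
For reference, the resulting recovery.conf on 192.168.30.120 would look like this (the application_name value is just an assumption for illustration):

standby_mode = 'on'
primary_conninfo = 'application_name=node12 host=192.168.30.100 port=5432 user=cmon_replication password=m8rLmZxyn23Lc2Rk'
recovery_target_timeline = 'latest'
primary_slot_name = 'slot2'
trigger_file = '/tmp/failover_5432.trigger'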

Checking the replication slot health:

192.168.10.100:5432 pgdbadmin@maximus_db=# select * from pg_replication_slots;

 slot_name | plugin | slot_type | datoid | database | active | active_pid | xmin | catalog_xmin | restart_lsn | confirmed_flush_lsn 

-----------+--------+-----------+--------+----------+--------+------------+------+--------------+-------------+---------------------

 slot1     | | physical  | | | t      | 24252 | |       | 0/CF0A4218 | 

 slot2     | | physical  | | | t      | 11635 | |       | 0/CF0A4218 | 

(2 rows)

What Else Do You Need?

Since ClusterControl doesn't support replication slots as of this time, there are things that you need to take into account. What are these? Let's go into detail.

Failover/Switchover Process

When an auto failover or switchover via ClusterControl has been attempted, the slots will not be retained from the primary on the standby nodes. You need to re-create the slots manually, check that the variables are set correctly, and modify recovery.conf accordingly.

Rebuilding a Slave from a Master

When rebuilding a slave, recovery.conf will not be retained. This means that your recovery.conf settings containing primary_slot_name will be erased. You need to specify this manually again and check the pg_replication_slots view to determine whether slots are properly used or left orphaned.

If you want to rebuild the slave/standby node from a master, you might have to consider specifying the PGAPPNAME environment variable, just like in the command below:

$ export PGAPPNAME="app_repl_testnode15"; /usr/pgsql-9.6/bin/pg_basebackup -h 192.168.10.190 -U cmon_replication -D /var/lib/pgsql/9.6/data -p5434 -W -S main_slot -X s -R -P

Specifying the -R parameter is very important, as it re-creates recovery.conf, while -S specifies which slot name to use when rebuilding the standby node.

Conclusion

Implementing replication slots in PostgreSQL is straightforward, yet there are certain caveats that you must remember. When deploying with ClusterControl, you'll need to update some settings during failover or slave rebuilds.

by Paul Namuag at January 22, 2020 05:15 PM

January 21, 2020

SeveralNines

Moving from MySQL 5.7 to MySQL 8.0 - What You Should Know

April 2018 is not just another date for the MySQL world. MySQL 8.0 was released then, and more than a year later, it's probably time to consider migrating to this new version.

MySQL 8.0 has important performance and security improvements and, as with any migration to a new database version, there are several things to take into account before going into production to avoid serious issues like data loss, excessive downtime, or even a rollback during the migration task.

In this blog, we’ll mention some of the new MySQL 8.0 features, some deprecated stuff, and what you need to keep in mind before migrating.

What’s New in MySQL 8.0?

Let’s now summarize some of the most important features mentioned in the official documentation for this new MySQL version.

  • MySQL incorporates a transactional data dictionary that stores information about database objects.
  • An atomic DDL statement combines the data dictionary updates, storage engine operations, and binary log writes associated with a DDL operation into a single, atomic transaction.
  • The MySQL server automatically performs all necessary upgrade tasks at the next startup to upgrade the system tables in the mysql schema, as well as objects in other schemas such as the sys schema and user schemas. It is not necessary for the DBA to invoke mysql_upgrade.
  • It supports the creation and management of resource groups, and permits assigning threads running within the server to particular groups so that threads execute according to the resources available to the group. 
  • Table encryption can now be managed globally by defining and enforcing encryption defaults. The default_table_encryption variable defines an encryption default for newly created schemas and general tablespace. Encryption defaults are enforced by enabling the table_encryption_privilege_check variable. 
  • The default character set has changed from latin1 to utf8mb4.
  • It supports the use of expressions as default values in data type specifications. This includes the use of expressions as default values for the BLOB, TEXT, GEOMETRY, and JSON data types.
  • Error logging was rewritten to use the MySQL component architecture. Traditional error logging is implemented using built-in components, and logging using the system log is implemented as a loadable component.
  • A new type of backup lock permits DML during an online backup while preventing operations that could result in an inconsistent snapshot. The new backup lock is supported by LOCK INSTANCE FOR BACKUP and UNLOCK INSTANCE syntax. The BACKUP_ADMIN privilege is required to use these statements.
  • MySQL Server now permits a TCP/IP port to be configured specifically for administrative connections. This provides an alternative to the single administrative connection that is permitted on the network interfaces used for ordinary connections even when max_connections connections are already established.
  • It supports invisible indexes. This index is not used by the optimizer and makes it possible to test the effect of removing an index on query performance, without removing it.
  • Document Store for developing both SQL and NoSQL document applications using a single database.
  • MySQL 8.0 makes it possible to persist global, dynamic server variables using the SET PERSIST command instead of the usual SET GLOBAL one (see the example below).
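
A minimal illustration of SET PERSIST (the variable and value are only an example):

SET PERSIST max_connections = 500;

The persisted setting is written to mysqld-auto.cnf in the data directory and survives a server restart.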

MySQL Security and Account Management

As there are many improvements related to security and user management, we'll list them in a separate section.

  • The grant tables in the mysql system database are now InnoDB tables. 
  • The new caching_sha2_password authentication plugin is now the default authentication method in MySQL 8.0. It implements SHA-256 password hashing, but uses caching to address latency issues at connect time. It provides more secure password encryption than the mysql_native_password plugin, and provides better performance than sha256_password.
  • MySQL now supports roles, which are named collections of privileges. Roles can have privileges granted to and revoked from them, and they can be granted to and revoked from user accounts. 
  • MySQL now maintains information about password history, enabling restrictions on reuse of previous passwords. 
  • It enables administrators to configure user accounts such that too many consecutive login failures due to incorrect passwords cause temporary account locking. 

InnoDB enhancements

As the previous point, there are also many improvements related to this topic, so we'll list them in a separate section too.

  • The current maximum auto-increment counter value is written to the redo log each time the value changes, and saved to an engine-private system table on each checkpoint. These changes make the current maximum auto-increment counter value persistent across server restarts
  • When encountering index tree corruption, InnoDB writes a corruption flag to the redo log, which makes the corruption flag crash-safe. InnoDB also writes in-memory corruption flag data to an engine-private system table on each checkpoint. During recovery, InnoDB reads corruption flags from both locations and merges results before marking in-memory table and index objects as corrupt.
  • A new dynamic variable, innodb_deadlock_detect, may be used to disable deadlock detection. On high concurrency systems, deadlock detection can cause a slowdown when numerous threads wait for the same lock. At times, it may be more efficient to disable deadlock detection and rely on the innodb_lock_wait_timeout setting for transaction rollback when a deadlock occurs.
  • InnoDB temporary tables are now created in the shared temporary tablespace, ibtmp1.
  • mysql system tables and data dictionary tables are now created in a single InnoDB tablespace file named mysql.ibd in the MySQL data directory. Previously, these tables were created in individual InnoDB tablespace files in the mysql database directory.
  • By default, undo logs now reside in two undo tablespaces that are created when the MySQL instance is initialized. Undo logs are no longer created in the system tablespace.
  • The new innodb_dedicated_server variable, which is disabled by default, can be used to have InnoDB automatically configure the following options according to the amount of memory detected on the server: innodb_buffer_pool_size, innodb_log_file_size, and innodb_flush_method. This option is intended for MySQL server instances that run on a dedicated server. 
  • Tablespace files can be moved or restored to a new location while the server is offline using the innodb_directories option. 

Now, let’s take a look at some of the features that you shouldn’t use anymore in this new MySQL version.

What is Deprecated in MySQL 8.0?

The following features are deprecated and will be removed in a future version.

  • The utf8mb3 character set is deprecated. Please use utf8mb4 instead.
  • Because caching_sha2_password is the default authentication plugin in MySQL 8.0 and provides a superset of the capabilities of the sha256_password authentication plugin, sha256_password is deprecated.
  • The validate_password plugin has been reimplemented to use the server component infrastructure. The plugin form of validate_password is still available but is deprecated.
  • The ENGINE clause for the ALTER TABLESPACE and DROP TABLESPACE statements.
  • The PAD_CHAR_TO_FULL_LENGTH SQL mode.
  • AUTO_INCREMENT support is deprecated for columns of type FLOAT and DOUBLE (and any synonyms). Consider removing the AUTO_INCREMENT attribute from such columns, or convert them to an integer type.
  • The UNSIGNED attribute is deprecated for columns of type FLOAT, DOUBLE, and DECIMAL (and any synonyms). Consider using a simple CHECK constraint instead for such columns (see the example after this list).
  • FLOAT(M,D) and DOUBLE(M,D) syntax to specify the number of digits for columns of type FLOAT and DOUBLE (and any synonyms) is a nonstandard MySQL extension. This syntax is deprecated.
  • The nonstandard C-style &&, ||, and ! operators that are synonyms for the standard SQL AND, OR, and NOT operators, respectively, are deprecated. Applications that use the nonstandard operators should be adjusted to use the standard operators.
  • The mysql_upgrade client is deprecated because its capabilities for upgrading the system tables in the mysql system schema and objects in other schemas have been moved into the MySQL server.
  • The mysql_upgrade_info file, which is created in the data directory and used to store the MySQL version number.
  • The relay_log_info_file system variable and --master-info-file option are deprecated. Previously, these were used to specify the name of the relay log info log and master info log when relay_log_info_repository=FILE and master_info_repository=FILE were set, but those settings have been deprecated. The use of files for the relay log info log and master info log has been superseded by crash-safe slave tables, which are the default in MySQL 8.0.
  • The use of the MYSQL_PWD environment variable to specify a MySQL password is deprecated.
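
As a hypothetical illustration of the CHECK constraint suggestion above (the table and column names are made up; MySQL 8.0.16 and later actually enforce CHECK constraints):

ALTER TABLE products MODIFY price DOUBLE, ADD CONSTRAINT chk_price_nonnegative CHECK (price >= 0);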

And now, let’s take a look at some of the features that you must stop using in this MySQL version.

What Was Removed in MySQL 8.0?

The following features have been removed in MySQL 8.0.

  • The innodb_locks_unsafe_for_binlog system variable was removed. The READ COMMITTED isolation level provides similar functionality.
  • Using GRANT to create users. Instead, use CREATE USER. Following this practice makes the NO_AUTO_CREATE_USER SQL mode immaterial for GRANT statements, so it too is removed, and an error now is written to the server log when the presence of this value for the sql_mode option in the options file prevents mysqld from starting.
  • Using GRANT to modify account properties other than privilege assignments. This includes authentication, SSL, and resource-limit properties. Instead, establish such properties at account-creation time with CREATE USER or modify them afterward with ALTER USER.
  • IDENTIFIED BY PASSWORD 'auth_string' syntax for CREATE USER and GRANT. Instead, use IDENTIFIED WITH auth_plugin AS 'auth_string' for CREATE USER and ALTER USER, where the 'auth_string' value is in a format compatible with the named plugin. 
  • The PASSWORD() function. Additionally, PASSWORD() removal means that SET PASSWORD ... = PASSWORD('auth_string') syntax is no longer available.
  • The old_passwords system variable.
  • The FLUSH QUERY CACHE and RESET QUERY CACHE statements.
  • These system variables: query_cache_limit, query_cache_min_res_unit, query_cache_size, query_cache_type, query_cache_wlock_invalidate.
  • These status variables: Qcache_free_blocks, Qcache_free_memory, Qcache_hits, Qcache_inserts, Qcache_lowmem_prunes, Qcache_not_cached, Qcache_queries_in_cache, Qcache_total_blocks.
  • These thread states: checking privileges on cached query, checking query cache for a query, invalidating query cache entries, sending cached result to the client, storing result in the query cache, Waiting for query cache lock.
  • The tx_isolation and tx_read_only system variables have been removed. Use transaction_isolation and transaction_read_only instead.
  • The sync_frm system variable has been removed because .frm files have become obsolete.
  • The secure_auth system variable and --secure-auth client option have been removed. The MYSQL_SECURE_AUTH option for the mysql_options() C API function was removed.
  • The log_warnings system variable and --log-warnings server option have been removed. Use the log_error_verbosity system variable instead.
  • The global scope for the sql_log_bin system variable was removed. sql_log_bin has session scope only, and applications that rely on accessing @@GLOBAL.sql_log_bin should be adjusted.
  • The unused date_format, datetime_format, time_format, and max_tmp_tables system variables are removed.
  • The deprecated ASC or DESC qualifiers for GROUP BY clauses are removed. Queries that previously relied on GROUP BY sorting may produce results that differ from previous MySQL versions. To produce a given sort order, provide an ORDER BY clause.
  • The parser no longer treats \N as a synonym for NULL in SQL statements. Use NULL instead. This change does not affect text file import or export operations performed with LOAD DATA or SELECT ... INTO OUTFILE, for which NULL continues to be represented by \N. 
  • The client-side --ssl and --ssl-verify-server-cert options have been removed. Use --ssl-mode=REQUIRED instead of --ssl=1 or --enable-ssl. Use --ssl-mode=DISABLED instead of --ssl=0, --skip-ssl, or --disable-ssl. Use --ssl-mode=VERIFY_IDENTITY instead of --ssl-verify-server-cert options.
  • The mysql_install_db program has been removed from MySQL distributions. Data directory initialization should be performed by invoking mysqld with the --initialize or --initialize-insecure option instead. In addition, the --bootstrap option for mysqld that was used by mysql_install_db was removed, and the INSTALL_SCRIPTDIR CMake option that controlled the installation location for mysql_install_db was removed.
  • The mysql_plugin utility was removed. Alternatives include loading plugins at server startup using the --plugin-load or --plugin-load-add option, or at runtime using the INSTALL PLUGIN statement.
  • The resolveip utility is removed. nslookup, host, or dig can be used instead.

There are a lot of new, deprecated, and removed features. You can check the official website for more detailed information.

Considerations Before Migrating to MySQL 8.0

Let’s mention now some of the most important things to consider before migrating to this MySQL version.

Authentication Method

As we mentioned, caching_sha2_password is now the default authentication method, so you should check if your application/connector supports it. If not, let's see how you can change the default authentication method and the user authentication plugin back to 'mysql_native_password'.

To change the default authentication method, edit the my.cnf configuration file and add/edit the following line:

$ vi /etc/my.cnf

[mysqld]

default_authentication_plugin=mysql_native_password

To change the user authentication plugin, run the following command with a privileged user:

$ mysql -p

ALTER USER ‘username’@’hostname’ IDENTIFIED WITH ‘mysql_native_password’ BY ‘password’;

Anyway, these changes aren’t a permanent solution as the old authentication could be deprecated soon, so you should take it into account for a future database upgrade.

Roles are also an important feature here. You can reduce the individual privileges by assigning them to a role and adding the corresponding users to it.

For example, you can create a new role for the marketing and the developers teams:

$ mysql -p

CREATE ROLE 'marketing', 'developers';

Assign privileges to these new roles:

GRANT SELECT ON *.* TO 'marketing';

GRANT ALL PRIVILEGES ON *.* TO 'developers';

And then, assign the role to the users:

GRANT 'marketing' TO 'marketing1'@'%';

GRANT 'marketing' TO 'marketing2'@'%';

GRANT 'developers' TO 'developer1'@'%';

And that’s it. You’ll have the following privileges:

SHOW GRANTS FOR 'marketing1'@'%';

+-------------------------------------------+

| Grants for marketing1@%                   |

+-------------------------------------------+

| GRANT USAGE ON *.* TO `marketing1`@`%`    |

| GRANT `marketing`@`%` TO `marketing1`@`%` |

+-------------------------------------------+

2 rows in set (0.00 sec)

SHOW GRANTS FOR 'marketing';

+----------------------------------------+

| Grants for marketing@%                 |

+----------------------------------------+

| GRANT SELECT ON *.* TO `marketing`@`%` |

+----------------------------------------+

1 row in set (0.00 sec)

Character Sets

As the new default character set is utf8mb4, you should make sure you're not relying on the default one, as it will change.

To avoid some issues, you should specify the character_set_server and the collation_server variables in the my.cnf configuration file.

$ vi /etc/my.cnf

[mysqld]

character_set_server=latin1

collation_server=latin1_swedish_ci
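
After restarting the server, you can confirm the effective values with a standard check (the output will simply reflect whatever your server reports):

mysql> SELECT @@character_set_server, @@collation_server;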

MyISAM Engine

The MySQL privilege tables in the mysql schema have been moved to InnoDB. You can create a table with engine=MyISAM, and it will work as before, but copying a MyISAM table into a running MySQL server will not work, because it will not be discovered.

Partitioning

There must be no partitioned tables that use a storage engine that does not have native partitioning support. You can run the following query to verify this point.

$ mysql -p

SELECT TABLE_SCHEMA, TABLE_NAME FROM INFORMATION_SCHEMA.TABLES WHERE ENGINE NOT IN ('innodb', 'ndbcluster') AND CREATE_OPTIONS LIKE '%partitioned%';

If you need to change the engine of a table, you can run:

ALTER TABLE table_name ENGINE = INNODB;

Upgrade Check

As a last step, you can run the mysqlcheck command with the --check-upgrade flag to confirm that everything looks fine.

$ mysqlcheck -uroot -p --all-databases --check-upgrade

Enter password:

mysql.columns_priv                                 OK

mysql.component                                    OK

mysql.db                                           OK

mysql.default_roles                                OK

mysql.engine_cost                                  OK

mysql.func                                         OK

mysql.general_log                                  OK

mysql.global_grants                                OK

mysql.gtid_executed                                OK

mysql.help_category                                OK

mysql.help_keyword                                 OK

mysql.help_relation                                OK

mysql.help_topic                                   OK

mysql.innodb_index_stats                           OK

mysql.innodb_table_stats                           OK

mysql.password_history                             OK

mysql.plugin                                       OK

mysql.procs_priv                                   OK

mysql.proxies_priv                                 OK

mysql.role_edges                                   OK

mysql.server_cost                                  OK

mysql.servers                                      OK

mysql.slave_master_info                            OK

mysql.slave_relay_log_info                         OK

mysql.slave_worker_info                            OK

mysql.slow_log                                     OK

mysql.tables_priv                                  OK

mysql.time_zone                                    OK

mysql.time_zone_leap_second                        OK

mysql.time_zone_name                               OK

mysql.time_zone_transition                         OK

mysql.time_zone_transition_type                    OK

mysql.user                                         OK

sys.sys_config                                     OK

world_x.city                                       OK

world_x.country                                    OK

world_x.countryinfo                                OK

world_x.countrylanguage                            OK

There are several things to check before performing the upgrade. You can check the official MySQL documentation for more detailed information.

Upgrade Methods

There are different ways to upgrade from MySQL 5.7 to 8.0. You can upgrade in-place, or create a replication slave on the new version so you can promote it later.

But before upgrading, step 0 must be backing up your data. The backup should include all the databases, including the system databases. So, if there is any issue, you can roll back as soon as possible.

Another option, depending on the available resources, is creating a cascading replication chain MySQL 5.7 -> MySQL 8.0 -> MySQL 5.7, so that after promoting the new version, if something goes wrong, you can promote the slave node with the old version back. This could be dangerous if there was some issue with the data, however, so a backup is a must beforehand.

Whichever method is used, a test environment is necessary to verify that the application works without any issues on the new MySQL 8.0 version.
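
As a very rough sketch of the in-place path on CentOS 7 with the MySQL community repository (the repository and package names are assumptions; always take a full backup first and follow the official upgrade documentation):

$ mysqldump -uroot -p --all-databases --routines --events > full_backup.sql
$ systemctl stop mysqld
$ yum-config-manager --disable mysql57-community
$ yum-config-manager --enable mysql80-community
$ yum update -y mysql-community-server
$ systemctl start mysqld   # since 8.0.16, the server upgrades the system tables itself at startup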

Conclusion

More than a year after the MySQL 8.0 release, it is time to start thinking about migrating from your old MySQL version. Luckily, as the end of support for MySQL 5.7 is 2023, you have time to create a migration plan and test the application behavior without rushing. Spending some time on that testing step is necessary to avoid issues after migrating.

by Sebastian Insausti at January 21, 2020 08:57 PM

January 20, 2020

Valeriy Kravchuk

Dynamic Tracing of MariaDB Server With bcc trace - Basic Example

This is yet another blog post in my series about dynamic tracing of MySQL server (and friends) on Linux. Logically it had to appear after this one about perf and another one about bpftrace. For older Linux systems, or when you are in a hurry with a customer and have no time to upgrade, build from source, etc., perf just works and is really flexible (but it comes with some cost of writing many samples to disk and then processing them). For happy users of Linux with kernels 4.9+ (the newer the better), like recent Ubuntu, RHEL 8, Debian 9+ or Fedora, the entire world of new, efficient tracing with bpftrace is open and extending with every new kernel release.

For those in between, like me with this Ubuntu 16.04:
openxs@ao756:~/git/bcc/build$ cat /etc/lsb-release
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=16.04
DISTRIB_CODENAME=xenial
DISTRIB_DESCRIPTION="Ubuntu 16.04.6 LTS"
openxs@ao756:~/git/bcc/build$ uname -a
Linux ao756 4.4.0-171-generic #200-Ubuntu SMP Tue Dec 3 11:04:55 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
the fancy world of eBPF and more efficient dynamic tracing is still mostly open, as we can try to use BCC tools. BCC is a toolkit for creating efficient kernel tracing and manipulation programs that includes several potentially useful tools and examples for MySQL DBAs. It makes use of extended BPF (Berkeley Packet Filters), formally known as eBPF.

I have had a draft of this blog post hanging around since October 2019, but every time I tried to complete it I was not happy with the content. I wanted to get back to it, test more, try to present more tools, find out how to access structure members in probes as easily as I can do it with gdb or perf, but then I hit some problem and put the draft aside...

When I started again some time later I often hit some new problem, so today I just decided to finally write down what I already know for sure, and provide at least a very basic example of dynamic tracing along the lines of those used in earlier posts (capturing queries executed by different threads using dynamic probes).

The first problem in the case of Ubuntu 16.04 is to get the binaries of the BCC tools. One of the ways is to build them from the GitHub source. The INSTALL.md document is clear enough when describing build dependencies and steps:
git clone https://github.com/iovisor/bcc.git
mkdir bcc/build; cd bcc/build
cmake .. -DCMAKE_INSTALL_PREFIX=/usr
make
sudo make install
but still there is something to note. In recent versions you surely have to update the libbpf submodule, or you'll end up with compilation errors at an early stage. My steps today were the following:
openxs@ao756:~/dbs/maria10.3$ cd ~/git/bcc/
openxs@ao756:~/git/bcc$ git pull
Already up-to-date.
openxs@ao756:~/git/bcc$ git log -1
commit dce8e9daf59f44dec4e3500d39a82a8ce59e43ba
Author: Yonghong Song <yhs@fb.com>
Date:   Fri Jan 17 22:06:52 2020 -0800

    sync with latest libbpf repo

    sync libbpf submodule upto the following commit:
        commit 033ad7ee78e8f266fdd27ee2675090ccf4402f3f
        Author: Andrii Nakryiko <andriin@fb.com>
        Date:   Fri Jan 17 16:22:23 2020 -0800

            sync: latest libbpf changes from kernel

    Signed-off-by: Yonghong Song <yhs@fb.com>

openxs@ao756:~/git/bcc$ git submodule init
openxs@ao756:~/git/bcc$ git submodule update
Submodule path 'src/cc/libbpf': checked out '033ad7ee78e8f266fdd27ee2675090ccf4402f3f'
Now I can proceed to the build subdirectory and complete the build:
openxs@ao756:~/git/bcc/build$ cmake .. -DCMAKE_INSTALL_PREFIX=/usr
...
openxs@ao756:~/git/bcc/build$ make
...
[ 99%] Building CXX object tests/cc/CMakeFiles/test_libbcc.dir/test_usdt_probes.cc.o
[100%] Building CXX object tests/cc/CMakeFiles/test_libbcc.dir/utils.cc.o
[100%] Linking CXX executable test_libbcc
[100%] Built target test_libbcc
It's always interesting to check if tests pass:
openxs@ao756:~/git/bcc/build$ make test
Running tests...
Test project /home/openxs/git/bcc/build
      Start  1: style-check
 1/40 Test  #1: style-check ......................   Passed    0.01 sec
      Start  2: c_test_static
 2/40 Test  #2: c_test_static ....................   Passed    0.30 sec
...
40/40 Test #40: lua_test_standalone ..............***Failed    0.06 sec

75% tests passed, 10 tests failed out of 40

Total Test time (real) = 450.78 sec

The following tests FAILED:
          3 - test_libbcc (Failed)
          4 - py_test_stat1_b (Failed)
          5 - py_test_bpf_log (Failed)
          6 - py_test_stat1_c (Failed)
          7 - py_test_xlate1_c (Failed)
          8 - py_test_call1 (Failed)
         16 - py_test_brb (Failed)
         17 - py_test_brb2 (Failed)
         18 - py_test_clang (Failed)
         40 - lua_test_standalone (Failed)
Errors while running CTest
Makefile:105: recipe for target 'test' failed
make: *** [test] Error 8
I've always had some tests fail, and one day I will probably have to report the issue to the project, but for the purpose of this post (based on previous experience with older code) I expect at least the trace tool to work as expected. So, I decided to proceed with installation:
openxs@ao756:~/git/bcc/build$ sudo make install
...
-- Up-to-date: /usr/share/bcc/tools/old/stackcount
-- Up-to-date: /usr/share/bcc/tools/old/oomkill
The tools are installed by default to /usr/share/bcc/tools.

For adding dynamic probes I'll use the trace tool, which probes functions you specify and displays trace messages if a particular condition is met. You can control the message format to display function arguments and return values.

Brendan Gregg explains the usage of this and other tools here in a lot of detail. I'll just add a nice chart from that page here:
There is a separate tutorial with examples. You may want to check the section on the trace tool there.

For the purpose of this blog post I think it's enough to quickly check help output:
openxs@ao756:~/git/bcc/build$ sudo /usr/share/bcc/tools/trace
usage: trace [-h] [-b BUFFER_PAGES] [-p PID] [-L TID] [-v] [-Z STRING_SIZE]
             [-S] [-M MAX_EVENTS] [-t] [-u] [-T] [-C] [-c CGROUP_PATH]
             [-n NAME] [-f MSG_FILTER] [-B] [-s SYM_FILE_LIST] [-K] [-U] [-a]
             [-I header]
             probe [probe ...]
trace: error: too few arguments
and note the following basic syntax used to define probes (see man /usr/share/bcc/man/man8/trace.8.gz after building the tools from source as described above):
PROBE SYNTAX
       The general probe syntax is as follows:

       [{p,r}]:[library]:function[(signature)]      [(predicate)]     ["format
       string"[, arguments]]

       {t:category:event,u:library:probe}  [(predicate)]  ["format   string"[,
       arguments]]

       {[{p,r}],t,u}
              Probe  type  -  "p" for function entry, "r" for function return,
              "t" for kernel tracepoint, "u" for USDT probe. The default probe
              type is "p".
...
For simplicity, here we do not consider conditional probes, so the predicate is skipped. At the moment we are not interested in kernel or user-defined static tracepoints (they are not defined in default recent builds of MySQL or MariaDB server anyway and require -DENABLE_DTRACE=ON to be explicitly added to the cmake command line used). For user-defined dynamic probes in the mysqld process we need p (for a probe at function entry) and maybe r (for function return).

We need to refer to a library, and in our case this is the full path name of the mysqld binary (or just mysqld if it's in PATH). We also need to refer to some function by name. A quick test will show you that by default trace does NOT accept plain function names in MySQL or MariaDB code (as perf does), and requires mangled ones to be used (same as bpftrace). We can find the names with the nm command:
openxs@ao756:~/git/bcc/build$ nm -na /home/openxs/dbs/maria10.3/bin/mysqld | grep dispatch_command
00000000004a1eef t _Z16dispatch_command19enum_server_commandP3THDPcjbb.cold.344
00000000005c5180 T _Z16dispatch_command19enum_server_commandP3THDPcjbb
00000000005c5180 t _Z16dispatch_command19enum_server_commandP3THDPcjbb.localalias.256
In the example above I was specifically looking for the dispatch_command() function of MariaDB server version 10.3.x, which I assume (see the previous post) has a string with the SQL statement as its third argument, packet. So, I can refer to this function in a probe as "_Z16dispatch_command19enum_server_commandP3THDPcjbb".

The "format string" that define how to output arguments of probe  is a printf-style format string.  You  can  use the following format specifiers: %s, %d%u, ...  with the same semantics as printf's. In our case for zero-terminating string we'll use "%s".

Arguments of the traced function are named arg1, arg2, ... argN (unless we provide a signature for the function), and are numbered starting from 1. So, in our case we can add a probe to print the third argument of the dispatch_command() function upon entry as follows:
openxs@ao756:~/dbs/maria10.3$ sudo /usr/share/bcc/tools/trace -T 'p:/home/openxs/dbs/maria10.3/bin/mysqld:_Z16dispatch_command19enum_server_commandP3THDPcjbb "%s" arg3'
[sudo] password for openxs:
TIME     PID     TID     COMM            FUNC             -
16:16:53 26585   29133   mysqld          _Z16dispatch_command19enum_server_commandP3THDPcjbb select @@version_comment limit 1
16:17:02 26585   29133   mysqld          _Z16dispatch_command19enum_server_commandP3THDPcjbb select 1
16:17:05 26585   29133   mysqld          _Z16dispatch_command19enum_server_commandP3THDPcjbb select 2
16:17:07 26585   29133   mysqld          _Z16dispatch_command19enum_server_commandP3THDPcjbb
^C
I've got the output above for this sample session:
openxs@ao756:~/dbs/maria10.3$ bin/mysql -uroot --socket=/tmp/mariadb.sock
Welcome to the MariaDB monitor.  Commands end with ; or \g.
Your MariaDB connection id is 16
Server version: 10.3.22-MariaDB Source distribution

Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

MariaDB [(none)]> select 1;
+---+
| 1 |
+---+
| 1 |
+---+
1 row in set (0,000 sec)

MariaDB [(none)]> select 2;
+---+
| 2 |
+---+
| 2 |
+---+
1 row in set (0,001 sec)

MariaDB [(none)]> exit
Bye
Note that I've added the -T option to the command line to output timestamps.
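As a possible next step, an entry probe can be combined with a return probe on the same function, so each query is logged both when it starts and when dispatch_command() returns. A sketch along the same lines (not benchmarked here, reusing the same mangled name and binary path):

sudo /usr/share/bcc/tools/trace -T \
    'p:/home/openxs/dbs/maria10.3/bin/mysqld:_Z16dispatch_command19enum_server_commandP3THDPcjbb "%s" arg3' \
    'r:/home/openxs/dbs/maria10.3/bin/mysqld:_Z16dispatch_command19enum_server_commandP3THDPcjbb "dispatch_command returned"'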
  
It's a bit more complex with recent versions of Percona Server or MySQL (or any function parameters that are complex structures). It's also more complex if we want to process prepared statements and process packet content depending on the first argument (the above is correct only for COM_QUERY) and so on. But these are basic steps to get a log of SQL queries with timestamps by adding a dynamic probe with BCC trace tool. Enjoy!

by Valerii Kravchuk (noreply@blogger.com) at January 20, 2020 02:25 PM

SeveralNines

Rebuilding a MySQL 8.0 Replication Slave Using a Clone Plugin

With MySQL 8.0 Oracle adopted a new approach to development. Instead of pushing features with major versions, almost every minor MySQL 8.0 version comes with new features or improvements. One of these new features is what we would like to focus on in this blog post. 

Historically MySQL did not come with good tools for provisioning. Sure, you had mysqldump, but it is just a logical backup tool, not really suitable for larger environments. MySQL Enterprise users could benefit from MySQL Enterprise Backup, while community users could use xtrabackup. Neither of those came with clean MySQL Community deployments though. It was quite annoying, as provisioning is a task you do quite often. You may need to build a new slave or rebuild a failed one - all of this requires some sort of data transfer between separate nodes.

MySQL 8.0.17 introduced a new way of provisioning MySQL data - the clone plugin. It was designed with MySQL Group Replication in mind, to introduce a way of automatically provisioning and rebuilding failed nodes, but its usefulness is not limited to that area. We can just as well use it to rebuild a slave node or provision a new server. In this blog post we would like to show you how to set up the MySQL Clone plugin and how to rebuild a replication slave.

First of all, the plugin has to be enabled as it is disabled by default. Once you do this, it will stay enabled through restarts. Ideally, you will do it on all of the nodes in the replication topology.

mysql> INSTALL PLUGIN clone SONAME 'mysql_clone.so';

Query OK, 0 rows affected (0.00 sec)
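To confirm that the plugin is indeed active (for example after a restart), a quick check like the following should show it as ACTIVE:

mysql> SELECT PLUGIN_NAME, PLUGIN_STATUS FROM INFORMATION_SCHEMA.PLUGINS WHERE PLUGIN_NAME = 'clone';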

The clone plugin requires a MySQL user with proper privileges. On the donor it has to have the BACKUP_ADMIN privilege, while on the joiner it has to have the CLONE_ADMIN privilege. Assuming you want to use the clone plugin extensively, you can just create a user with both privileges. Do it on the master so the user will also be created on all of the slaves. After all, you never know which node will be a master some time in the future, therefore it's more convenient to have everything prepared upfront.

mysql> CREATE USER clone_user@'%' IDENTIFIED BY 'clonepass';

Query OK, 0 rows affected (0.01 sec)

mysql> GRANT BACKUP_ADMIN, CLONE_ADMIN ON *.* to clone_user@'%';

Query OK, 0 rows affected (0.00 sec)

The MySQL Clone plugin has some prerequisites, thus sanity checks should be performed. You should ensure that both the donor and the joiner have the same values for the following configuration variables:

mysql> SHOW VARIABLES LIKE 'innodb_page_size';

+------------------+-------+

| Variable_name    | Value |

+------------------+-------+

| innodb_page_size | 16384 |

+------------------+-------+

1 row in set (0.01 sec)

mysql> SHOW VARIABLES LIKE 'innodb_data_file_path';

+-----------------------+-------------------------+

| Variable_name         | Value                   |

+-----------------------+-------------------------+

| innodb_data_file_path | ibdata1:100M:autoextend |

+-----------------------+-------------------------+

1 row in set (0.01 sec)

mysql> SHOW VARIABLES LIKE 'max_allowed_packet';

+--------------------+-----------+

| Variable_name      | Value     |

+--------------------+-----------+

| max_allowed_packet | 536870912 |

+--------------------+-----------+

1 row in set (0.00 sec)

mysql> SHOW GLOBAL VARIABLES LIKE '%character%';

+--------------------------+--------------------------------+

| Variable_name            | Value                          |

+--------------------------+--------------------------------+

| character_set_client     | utf8mb4                        |

| character_set_connection | utf8mb4                        |

| character_set_database   | utf8mb4                        |

| character_set_filesystem | binary                         |

| character_set_results    | utf8mb4                        |

| character_set_server     | utf8mb4                        |

| character_set_system     | utf8                           |

| character_sets_dir       | /usr/share/mysql-8.0/charsets/ |

+--------------------------+--------------------------------+

8 rows in set (0.00 sec)



mysql> SHOW GLOBAL VARIABLES LIKE '%collation%';

+-------------------------------+--------------------+

| Variable_name                 | Value              |

+-------------------------------+--------------------+

| collation_connection          | utf8mb4_0900_ai_ci |

| collation_database            | utf8mb4_0900_ai_ci |

| collation_server              | utf8mb4_0900_ai_ci |

| default_collation_for_utf8mb4 | utf8mb4_0900_ai_ci |

+-------------------------------+--------------------+

4 rows in set (0.00 sec)

Then, on the master, we should double-check that undo tablespaces have unique names:

mysql> SELECT TABLESPACE_NAME, FILE_NAME FROM INFORMATION_SCHEMA.FILES

    ->        WHERE FILE_TYPE LIKE 'UNDO LOG';

+-----------------+------------+

| TABLESPACE_NAME | FILE_NAME  |

+-----------------+------------+

| innodb_undo_001 | ./undo_001 |

| innodb_undo_002 | ./undo_002 |

+-----------------+------------+

2 rows in set (0.12 sec)

The default verbosity level does not show too much data regarding the cloning process, therefore we recommend increasing it to have better insight into what is happening:

mysql> SET GLOBAL log_error_verbosity=3;

Query OK, 0 rows affected (0.00 sec)

To be able to start the process on our joiner, we have to configure a valid donor:

mysql> SET GLOBAL clone_valid_donor_list ='10.0.0.101:3306';

Query OK, 0 rows affected (0.00 sec)

mysql> SHOW VARIABLES LIKE 'clone_valid_donor_list';

+------------------------+-----------------+

| Variable_name          | Value           |

+------------------------+-----------------+

| clone_valid_donor_list | 10.0.0.101:3306 |

+------------------------+-----------------+

1 row in set (0.00 sec)

Once it is in place, we can use it to copy the data from the donor:

mysql> CLONE INSTANCE FROM 'clone_user'@'10.0.0.101':3306 IDENTIFIED BY 'clonepass';

Query OK, 0 rows affected (18.30 sec)
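While the clone is running (and after it completes), its progress can also be inspected on the joiner from the Performance Schema; a simple check, assuming MySQL 8.0.17 or later:

mysql> SELECT STAGE, STATE, END_TIME FROM performance_schema.clone_progress;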

That’s it. The progress can also be tracked in the MySQL error log on the joiner. Once everything is ready, all you have to do is set up the replication:

mysql> CHANGE MASTER TO MASTER_HOST='10.0.0.101', MASTER_AUTO_POSITION=1;

Query OK, 0 rows affected (0.05 sec)

mysql> START SLAVE USER='rpl_user' PASSWORD='afXGK2Wk8l';

Query OK, 0 rows affected, 1 warning (0.01 sec)
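At this point it is worth verifying that both replication threads are running on the freshly cloned node, for example with:

mysql> SHOW SLAVE STATUS\G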

Please keep in mind that the Clone plugin comes with a set of limitations. For starters, it transfers only InnoDB tables, so if you happen to use any other storage engines, you would have to either convert them to InnoDB or use another provisioning method. It also interferes with Data Definition Language - ALTERs will block and be blocked by cloning operations.

By default cloning is not encrypted, so it should be used only in a secure environment. If needed, you can set up SSL encryption for the cloning process by ensuring that the donor has SSL configured and then defining the following variables on the joiner:

clone_ssl_ca=/path/to/ca.pem

clone_ssl_cert=/path/to/client-cert.pem

clone_ssl_key=/path/to/client-key.pem

Then, you need to add “REQUIRE SSL;” at the end of the CLONE command and the process will be executed with SSL encryption. Please keep in mind this is the only method to clone databases with data-at-rest encryption enabled.
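Reusing the earlier example, the encrypted clone command would then look like this:

mysql> CLONE INSTANCE FROM 'clone_user'@'10.0.0.101':3306 IDENTIFIED BY 'clonepass' REQUIRE SSL;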

As we mentioned at the beginning, cloning was, most likely, designed with MySQL Group Replication/InnoDB Cluster in mind but, as long as the limitations do not affect a particular use case, it can be used as a native way of provisioning any MySQL instance. We will see how broad its adoption will be - the possibilities are numerous. What's already great is that we now have another hardware-agnostic method we can use to provision servers, in addition to Xtrabackup. Competition is always good and we are looking forward to seeing what the future holds.

 

by krzysztof at January 20, 2020 10:45 AM

January 18, 2020

Valeriy Kravchuk

Fun with Bugs #92 - On MySQL Bug Reports I am Subscribed to, Part XXVI

I'd like to continue reviewing MySQL bug reports from Community users that I considered interesting and subscribed to. Unlike in the previous post in this series, I am not going to check test cases on any competitor product, but will use only recently released MySQL 5.7.29 and 8.0.19 for checks, if any. This time I'll concentrate on bugs reported in November 2019.

As usual, I mostly care about optimizer, InnoDB and replication related bugs. Here is the list:
  • Bug #97476 - "Range optimizer skips rows". This bug reported by Ilya Raudsepp looks like a clear regression in MySQL 8.0.x comparing to MySQL 5.7.x at least. I get the following correct results with 5.7.29:
    mysql> SELECT t.id
        -> FROM Test t
        -> JOIN (
        ->     SELECT item_id, MAX(created_at) AS created_at
        ->     FROM Test t
        ->     WHERE (platform_id = 2) AND (item_id IN (3,2,111)) AND (type = 'Default')
        ->     GROUP BY item_id
        -> ) t2 ON t.item_id = t2.item_id
        ->   AND t.created_at = t2.created_at
        ->   AND t.type = 'Default'
        -> WHERE t.platform_id = 2;
    +----+
    | id |
    +----+
    |  6 |
    |  3 |
    |  5 |
    +----+
    3 rows in set (0,03 sec)

    mysql> select version();
    +-----------+
    | version() |
    +-----------+
    | 5.7.29    |
    +-----------+
    1 row in set (0,02 sec)
  • Bug #97531 - "5.7 replication breakage with syntax error with GRANT management". This tricky bug reported by Simon Mudd also applies to MySQL 8.0.x. It is closed as fixed, but the fix had not made it into the recent 5.7.29 and 8.0.19 releases, so you'll have to wait for a few more months.
  • Bug #97552 - "Regression: LEFT JOIN with Impossible ON condition performs slowly". Yet another optimizer regression in MySQL 8 (comparing to 5.7.x) that is fixed only in MySQL 8.0.20+. The bug was reported by Fredric Johansson.
  • Bug #97648 - "Bug in order by clause in union clause". Yet another regression (at least from user's point of view) in recent MySQL 5.7.x and 8.0.x comparing to 5.6.x. This time without a "regression" tag. The bug was reported by Andrei Mart.
  • Bug #97662 - "MySQL v8.0.18 FIPS mode is no longer supported". According to Ryan L, MySQL 8.0.18+ no longer supports ssl_fips_mode=STRICT, as OpenSSL 1.1.1 is not FIPS-compatible and MySQL Server must be compiled using OpenSSL 1.1.1 or higher. That's interesting. Check also this link.
  • Bug #97682 - "Handler fails to trigger on Error 1049 or SQLSTATE 42000 or plain sqlexception". This regression (comparing to MySQL 5.7) was reported by Jericho Rivera. It is fixed in MySQL 8.0.20. The patch was provided by Kamil Holubicki.
  • Bug #97692 - "Querying information_schema.TABLES issue". I do not see any documented attempt to check this on MySQL 8.0, so I had to add a comment to the bug report. From what I see, in MySQL 8.0.19 we still get a different (empty) result from the second query, but at least now we have a warning:
    mysql> SELECT ts.TABLE_SCHEMA
        -> FROM information_schema.TABLES ts
        -> WHERE ts.TABLE_TYPE ='VIEW'
        -> AND ts.TABLE_SCHEMA NOT IN ('sys')
        -> AND ts.TABLE_COMMENT LIKE '%invalid%';
    +--------------+
    | TABLE_SCHEMA |
    +--------------+
    | test         |
    +--------------+
    1 row in set, 1 warning (0,00 sec)

    mysql> show warnings\G
    *************************** 1. row ***************************
      Level: Warning
       Code: 1356
    Message: View 'test.v' references invalid table(s) or column(s) or function(s) or definer/invoker of view lack rights to use them
    1 row in set (0,00 sec)

    mysql> select version();
    +-----------+
    | version() |
    +-----------+
    | 8.0.19    |
    +-----------+
    1 row in set (0,00 sec)
    The bug was reported by Vinicius Malvestio Grippa.
  • Bug #97693 - "ALTER USER user IDENTIFIED BY 'password' broken by invalid authentication_string". The bug was reported by Nikolai Ikhalainen. MySQL 8.0.19 is still affected.
  • Bug #97694 - "MySQL 8.0.18 fails on STOP SLAVE/START SLAVE stress test". For some reason I do not see any documented attempt to verify this on MySQL 5.7 also. The bug was reported by Przemysław Skibiński, who also suggested a fix.
  • Bug #97734 - "Document the correct method to stop slaving with MTS without a warning or error". I can only agree with this request from Buchan Milne. Please, do :)
  • Bug #97735 - "ALTER USER IF EXISTS ... WITH_MAX_USER_CONNECTIONS 9999 not applied correctly". Yet another bug report by Simon Mudd in this list. For some reason, again, I do not see any documented attempt to verify the bug on MySQL 8.0.x, while there is no clear reason to think it is not affected.
  • Bug #97742 - "bad item ref from correlated subquery to outer distinct table". This bug was reported by Song Zhibai, who also contributed a patch. Based on further comments from Øystein Grøvlen and these results:
    mysql> EXPLAIN SELECT f3 FROM t1 HAVING (SELECT 1 FROM t2 HAVING f2 LIMIT 1);
    +----+--------------------+-------+------------+-------+---------------+---------+---------+------+------+----------+-------------+
    | id | select_type        | table | partitions | type  | possible_keys | key     | key_len | ref  | rows | filtered | Extra       |
    +----+--------------------+-------+------------+-------+---------------+---------+---------+------+------+----------+-------------+
    |  1 | PRIMARY            | t1    | NULL       | ALL   | NULL          | NULL    | NULL    | NULL |    3 |   100.00 | NULL        |
    |  2 | DEPENDENT SUBQUERY | t2    | NULL       | index | NULL          | PRIMARY | 4       | NULL |    1 |   100.00 | Using index |
    +----+--------------------+-------+------------+-------+---------------+---------+---------+------+------+----------+-------------+
    2 rows in set, 2 warnings (0,00 sec)

    mysql> show warnings\G
    *************************** 1. row ***************************
      Level: Note
       Code: 1276
    Message: Field or reference 'f2' of SELECT #2 was resolved in SELECT #1
    *************************** 2. row ***************************
      Level: Note
       Code: 1003
    Message: /* select#1 */ select `test`.`t1`.`f3` AS `f3` from `test`.`t1` having (/* select#2 */ select 1 from `test`.`t2` having `test`.`t1`.`f2` limit 1)
    2 rows in set (0,00 sec)

    mysql> select version();
    +-----------+
    | version() |
    +-----------+
    | 5.7.29    |
    +-----------+
    1 row in set (0,00 sec)
    I'd say that MySQL 5.7.x is also affected, but for some reason nobody documented any attempt to verify it there. So, I've added a comment.
  • Bug #97777 - "separate global variables (from hot variables) using linker script (ELF)". A beautiful bug report from Daniel Black. With a lot of details, perf and readelf outputs, and a patch contributed. See also his Bug #97822 - "buf_page_get_gen buf_pool->stat.n_page_gets++ is a cpu waste", with perf analysis down to the level of a single assembler instruction and a fix suggested.
  • Bug #97825 - "dd_mdl_acquire in dd_table_open with dict_sys->mutex hold may cause deadlock". Here I am really puzzled by the lack of any visible attempt to check the arguments of the bug reporter, Dave Do, who tried to perform lock order analysis by code review. All we see as a result is this:
    "Lock order could be different, but it is irrelevant, since these are locks on totally different levels and can't, in themselves, cause any deadlock."
    What a great argument! Not a bug, surely... We trust you.
    "What bugs are you talking about? I have no bugs, neither does MySQL 8!"
    To summarize:
    1. MySQL 8 introduces some optimizer (and other) regressions. They seem to be fixed fast enough, but I wonder why only Community users were able to find them and not Oracle's QA...
    2. MySQL 8.0.19 is surely great, but I see many serious bugs fixed only in 8.0.20+.
    3. Percona, Booking and Facebook engineers still continue contributing high quality bug reports, comments/verification details and patches. Oracle is lucky to have such nice partners in making MySQL better.
    4. I still see problems with following proper verification procedures and documenting the results. Too often a bug reported for 8.0.x is NOT checked on 5.7.x as well, the regression tag is not set, and so on. Sometimes reports are closed as "Not a bug" without any attempt to follow the analysis provided or prove the point. This is sad and wrong.

    by Valerii Kravchuk (noreply@blogger.com) at January 18, 2020 07:49 PM

    January 17, 2020

    SeveralNines

    MongoDB 4.2 Management & Monitoring Without Vendor Lockin

    With the release of a new version of ClusterControl (1.7.5), we can see several new features, one of the main ones being the support for MongoDB 4.2.

    MongoDB 4.2 has been on the market for a while. It was initially announced at MongoDB World in June 2019 and went GA in August. Since then, a lot of you have been putting it through its paces. It brings many awaited features, which make NoSQL a more straightforward choice over RDBMS.

    The most significant feature in the 4.x line was transaction support, which dramatically reduces the gap between RDBMS and NoSQL systems. MongoDB transactions were added in version 4.0, but they didn't work with the most powerful feature of MongoDB clusters - sharding. Now MongoDB extends multi-document ACID guarantees from replica sets to sharded clusters, enabling you to serve an even broader range of use cases.

    The most prominent features of version 4.2 are:

    • On-Demand Materialized Views using the new $merge operator. 
    • Distributed transactions
    • Wildcard Indexes
    • Server-side updates 
    • MongoDB Query Language enhancements
    • Field-level encryption to selectively protect sensitive fields

    To install MongoDB 4.2 manually, we must first add the repositories or download the necessary packages for the installation, install them, and configure them correctly, depending on our infrastructure. All these steps take time, so let's see how we could speed it up.

    In this blog, we will see how to deploy this new MongoDB version with a few clicks using ClusterControl and how to manage it. As a prerequisite, please install the 1.7.5 version of ClusterControl on a dedicated host or VM.

    Deploying a MongoDB 4.2 ReplicaSet

    To perform a new installation from ClusterControl, select the option "Deploy" and follow the instructions that appear. Note that if you already have a MongoDB 4.2 instance running, then you need to choose the 'Import Existing Server/Database' instead.

    Deploy MongoDB 4.2

    ClusterControl Deployment Options

    When selecting MongoDB, we must specify User, Key or Password and port to connect by SSH to our MongoDB nodes. We also need the name for our new cluster and if we want ClusterControl to install the corresponding software and configurations for us.

    After setting up the SSH access information, we must define the database user, version, and datadir (optional). We can also specify which repository to use. In this case, we want to deploy MongoDB 4.2, so select it and continue.

    In the next step, we need to add our servers to the cluster we are going to create.

    ClusterControl Percona 4.2 MongoDB Deployment

    When adding our servers, we can enter IP or hostname.

    ClusterControl MongoDB 4.2 Deployment

    We can monitor the status of the creation of our new cluster from the ClusterControl activity monitor.

    ClusterControl Job Details

    Once the task is finished, we can see our new MongoDB replicaSet in the main ClusterControl screen.

    ClusterContorol Dashboard Status

    Once we have our cluster created, we can perform several tasks on it, like adding a backup job.

    Scaling MongoDB 4.2 

    If we go to cluster actions and select "Add  Node", we can either create a new replica from scratch or add an existing MongoDB database as a replica.

    ClusterControl MongoDB 4.2 Add a Node

    As you can see in the image, we only need to choose our new or existing server, enter the IP address for our new slave server and the database port. Then, we can choose if we want ClusterControl to install the software for us and configure the cluster.

    The other option is to convert a replica set cluster to a MongoDB shard. ClusterControl will walk you through the process. We need to provide details about the Configuration Server and Routers, as you can see on the screen below.

    ClusterControl Convert MongoDB 4.2 ReplicaSet to Shard

    Conclusion

    As we have seen above, you can now deploy the latest MongoDB (version 4.2) using ClusterControl. Once deployed, ClusterControl provides a whole range of features, from monitoring, alerting, automatic failover, backup, point-in-time recovery and backup verification, to scaling of read replicas.

    by Bart Oles at January 17, 2020 10:45 AM

    January 16, 2020

    SeveralNines

    Why Did My MySQL Database Crash? Get Insights with the New MySQL Freeze Frame

    In case you haven't seen it, we just released ClusterControl 1.7.5 with major improvements and new useful features. Some of the features include Cluster Wide Maintenance, support for CentOS 8 and Debian 10, PostgreSQL 12 support, MongoDB 4.2 and Percona MongoDB v4.0 support, as well as the new MySQL Freeze Frame.

    Wait, but What is a MySQL Freeze Frame? Is This Something New to MySQL? 

    Well, it's not something new within the MySQL kernel itself. It's a new feature we added to ClusterControl 1.7.5 that is specific to MySQL databases. The MySQL Freeze Frame in ClusterControl 1.7.5 covers the following things:

    • Snapshot MySQL status before cluster failure.
    • Snapshot MySQL process list before cluster failure (coming soon).
    • Inspect cluster incidents in operational reports or from the s9s command line tool.

    These are valuable sets of information that can help trace bugs and fix your MySQL/MariaDB clusters when things go south. In the future, we are planning to also include snapshots of the SHOW ENGINE INNODB STATUS values. So please stay tuned to our future releases.

    Note that this feature is still in a beta state; we expect to collect more datasets as we work with our users. In this blog, we will show you how to leverage this feature, especially when you need further information while diagnosing your MySQL/MariaDB cluster.

    ClusterControl on Handling Cluster Failure

    For cluster failures, ClusterControl does nothing unless Auto Recovery (Cluster/Node) is enabled just like below:

    Once enabled, ClusterControl will try to recover a node or recover the cluster by bringing up the entire cluster topology. 

    For MySQL, for example in a master-slave replication, it must have at least one master alive at any given time, regardless of the number of available slave/s. ClusterControl attempts to correct the topology at least once for replication clusters, but provides more retries for multi-master replication like NDB Cluster and Galera Cluster. Node recovery attempts to recover a failing database node, e.g. when the process was killed (abnormal shutdown), or the process suffered an OOM (Out-of-Memory). ClusterControl will connect to the node via SSH and try to bring up MySQL. We have previously blogged about How ClusterControl Performs Automatic Database Recovery and Failover, so please visit that article to learn more about the scheme for ClusterControl auto recovery.

    In previous versions of ClusterControl (< 1.7.5), those attempted recoveries triggered alarms. But one thing our customers missed was a more complete incident report with state information just before the cluster failure. We realized this shortfall and added this feature in ClusterControl 1.7.5. We called it the "MySQL Freeze Frame". The MySQL Freeze Frame, as of this writing, offers a brief summary of incidents leading to cluster state changes just before the crash. Most importantly, it includes at the end of the report the list of hosts and their MySQL Global Status variables and values.

    How Does MySQL Freeze Frame Differ From Auto Recovery?

    The MySQL Freeze Frame is not part of the auto recovery of ClusterControl. Whether Auto Recovery is disabled or enabled, the MySQL Freeze Frame will always do its work as long as a cluster or node failure has been detected.

    How Does MySQL Freeze Frame Work?

    In ClusterControl, there are certain states that we classify as different types of Cluster Status. MySQL Freeze Frame will generate an incident report when these two states are triggered:

    • CLUSTER_DEGRADED
    • CLUSTER_FAILURE

    In ClusterControl, a CLUSTER_DEGRADED is when you can write to a cluster, but one or more nodes are down. When this happens, ClusterControl will generate the incident report.

    For CLUSTER_FAILURE, though its nomenclature explains itself, it is the state where your cluster fails and is no longer able to process reads or writes. Then that is a CLUSTER_FAILURE state. Regardless of whether an auto-recovery process is attempting to fix the problem or whether it's disabled, ClusterControl will generate the incident report.

    How Do You Enable MySQL Freeze Frame?

    ClusterControl's MySQL Freeze Frame is enabled by default and only generates an incident report when the states CLUSTER_DEGRADED or CLUSTER_FAILURE are triggered or encountered. So there's no need for the user to set any ClusterControl configuration setting; ClusterControl will do it for you automagically.

    Locating the MySQL Freeze Frame Incident Report

    As of this writing, there are four ways you can locate the incident report. They are described in the following sections.

    Using the Operational Reports Tab

    The Operational Reports from the previous versions are used only to create, schedule, or list the operational reports that have been generated by users. Since version 1.7.5, we included the incident report generated by our MySQL Freeze Frame feature. See the example below:

    The checked items or items with Report type == incident_report, are the incident reports generated by MySQL Freeze Frame feature in ClusterControl.

    Using Error Reports

    By selecting the cluster and generating an error report, i.e. going through this process: <select the cluster> → Logs → Error Reports → Create Error Report. This will include the incident report under the ClusterControl host.

    Using s9s CLI Command Line

    A generated incident report includes instructions or hints on how you can use it with the s9s CLI command. Below is what's shown in the incident report:

    Hint! Using the s9s CLI tool allows you to easily grep data in this report, e.g:

    s9s report --list --long
    
    s9s report --cat --report-id=N

    So if you want to locate and generate an error report, you can use this approach:

    [vagrant@testccnode ~]$ s9s report --list --long --cluster-id=60
    
    ID CID TYPE            CREATED TITLE                            
    
    19  60 incident_report 16:50:27 Incident Report - Cluster Failed
    
    20  60 incident_report 17:01:55 Incident Report

    If I want to grep the wsrep_* variables on a specific host, I can do the following:

    [vagrant@testccnode ~]$ s9s report --cat --report-id=20 --cluster-id=60|sed -n '/WSREP.*/p'|sed 's/  */ /g'|grep '192.168.10.80'|uniq -d
    
    | WSREP_APPLIER_THREAD_COUNT | 4 | 192.168.10.80:3306 | 2020-01-09 08:50:24 |
    
    | WSREP_CLUSTER_CONF_ID | 18446744073709551615 | 192.168.10.80:3306 | 2020-01-09 08:50:24 |
    
    | WSREP_CLUSTER_SIZE | 1 | 192.168.10.80:3306 | 2020-01-09 08:50:24 |
    
    | WSREP_CLUSTER_STATE_UUID | 7c7a9d08-2d72-11ea-9ef3-a2551fd9f58d | 192.168.10.80:3306 | 2020-01-09 08:50:24 |
    
    | WSREP_EVS_DELAYED | 27ac86a9-3254-11ea-b104-bb705eb13dde:tcp://192.168.10.100:4567:1,9234d567-3253-11ea-92d3-b643c178d325:tcp://192.168.10.90:4567:1,9234d567-3253-11ea-92d4-b643c178d325:tcp://192.168.10.90:4567:1,9e93ad58-3241-11ea-b25e-cfcbda888ea9:tcp://192.168.10.90:4567:1,9e93ad58-3241-11ea-b25f-cfcbda888ea9:tcp://192.168.10.90:4567:1,9e93ad58-3241-11ea-b260-cfcbda888ea9:tcp://192.168.10.90:4567:1,9e93ad58-3241-11ea-b261-cfcbda888ea9:tcp://192.168.10.90:4567:1,9e93ad58-3241-11ea-b262-cfcbda888ea9:tcp://192.168.10.90:4567:1,9e93ad58-3241-11ea-b263-cfcbda888ea9:tcp://192.168.10.90:4567:1,b0b7cb15-3241-11ea-bdbc-1a21deddc100:tcp://192.168.10.100:4567:1,b0b7cb15-3241-11ea-bdbd-1a21deddc100:tcp://192.168.10.100:4567:1,b0b7cb15-3241-11ea-bdbe-1a21deddc100:tcp://192.168.10.100:4567:1,b0b7cb15-3241-11ea-bdbf-1a21deddc100:tcp://192.168.10.100:4567:1,b0b7cb15-3241-11ea-bdc0-1a21deddc100:tcp://192.168.10.100:4567:1,dea553aa-32b9-11ea-b321-9a836d562a47:tcp://192.168.10.100:4567:1,dea553aa-32b9-11ea-b322-9a836d562a47:tcp://192.168.10.100:4567:1,e27f4eff-3256-11ea-a3ab-e298880f3348:tcp://192.168.10.100:4567:1,e27f4eff-3256-11ea-a3ac-e298880f3348:tcp://192.168.10.100:4567:1 | 192.168.10.80:3306 | 2020-01-09 08:50:24 |
    
    | WSREP_GCOMM_UUID | 781facbc-3241-11ea-8a22-d74e5dcf7e08 | 192.168.10.80:3306 | 2020-01-09 08:50:24 |
    
    | WSREP_LAST_COMMITTED | 443 | 192.168.10.80:3306 | 2020-01-09 08:50:24 |
    
    | WSREP_LOCAL_CACHED_DOWNTO | 98 | 192.168.10.80:3306 | 2020-01-09 08:50:24 |
    
    | WSREP_LOCAL_RECV_QUEUE_MAX | 2 | 192.168.10.80:3306 | 2020-01-09 08:50:24 |
    
    | WSREP_LOCAL_STATE_UUID | 7c7a9d08-2d72-11ea-9ef3-a2551fd9f58d | 192.168.10.80:3306 | 2020-01-09 08:50:24 |
    
    | WSREP_PROTOCOL_VERSION | 10 | 192.168.10.80:3306 | 2020-01-09 08:50:24 |
    
    | WSREP_PROVIDER_VERSION | 26.4.3(r4535) | 192.168.10.80:3306 | 2020-01-09 08:50:24 |
    
    | WSREP_RECEIVED | 112 | 192.168.10.80:3306 | 2020-01-09 08:50:24 |
    
    | WSREP_RECEIVED_BYTES | 14413 | 192.168.10.80:3306 | 2020-01-09 08:50:24 |
    
    | WSREP_REPLICATED | 86 | 192.168.10.80:3306 | 2020-01-09 08:50:24 |
    
    | WSREP_REPLICATED_BYTES | 40592 | 192.168.10.80:3306 | 2020-01-09 08:50:24 |
    
    | WSREP_REPL_DATA_BYTES | 31734 | 192.168.10.80:3306 | 2020-01-09 08:50:24 |
    
    | WSREP_REPL_KEYS | 86 | 192.168.10.80:3306 | 2020-01-09 08:50:24 |
    
    | WSREP_REPL_KEYS_BYTES | 2752 | 192.168.10.80:3306 | 2020-01-09 08:50:24 |
    
    | WSREP_ROLLBACKER_THREAD_COUNT | 1 | 192.168.10.80:3306 | 2020-01-09 08:50:24 |
    
    | WSREP_THREAD_COUNT | 5 | 192.168.10.80:3306 | 2020-01-09 08:50:24 |
    
    | WSREP_EVS_REPL_LATENCY | 4.508e-06/4.508e-06/4.508e-06/0/1 | 192.168.10.80:3306 | 2020-01-09 08:50:24 |

    Manually Locating via System File Path

    ClusterControl generates these incident reports on the host where ClusterControl runs. ClusterControl creates a directory in /home/<OS_USER>/s9s_tmp, or /root/s9s_tmp if you are using the root system user. The incident reports can be located, for example, by going to /home/vagrant/s9s_tmp/60/galera/cmon-reports/incident_report_2020-01-09_085027.html, where the format is /home/<OS_USER>/s9s_tmp/<CLUSTER_ID>/<CLUSTER_TYPE>/cmon-reports/<INCIDENT_FILE_NAME>.html. The full path of the file is also displayed when you hover your mouse over the item or file you want to check under the Operational Reports tab, just like below:

    Are There Any Dangers or Caveats When Using MySQL Freeze Frame?

    ClusterControl does not change or modify anything in your MySQL nodes or cluster. MySQL Freeze Frame will just read SHOW GLOBAL STATUS (as of this time) at specific intervals to save records, since we cannot predict the state of a MySQL node or cluster when it crashes or when it has hardware or disk issues. It's not possible to predict this, so we save the values and therefore we can generate an incident report in case a particular node goes down. In that case, the danger of having this is close to none. It can theoretically add a series of client requests to the server(s) in case some locks are held within MySQL, but we have not noticed it yet. Our series of tests doesn't show this, so we would be glad if you could let us know or file a support ticket in case problems arise.

    There are certain situations where an incident report might not be able to gather global status variables if a network issue was the problem prior to ClusterControl freezing a specific frame to gather data. That's completely reasonable because there's no way ClusterControl can collect data for further diagnosis as there's no connection to the node in the first place.

    Lastly, you might wonder why not all variables are shown in the GLOBAL STATUS section. For the time being, we set a filter that excludes empty or 0 values from the incident report. The reason is that we want to save some disk space. Once these incident reports are no longer needed, you can delete them via the Operational Reports tab.

    Testing the MySQL Freeze Frame Feature

    We believe that you are eager to try this one and see how it works. But please, make sure you are not running or testing this in a live or production environment. We'll cover two test scenarios for MySQL/MariaDB, one for a master-slave setup and one for a Galera-type setup.

    Master-Slave Setup Test Scenario

    In a master-slave(s) setup, it's easy and simple to try. 

    Step One

    Make sure that you have disabled the Auto Recovery modes (Cluster and Node), like below:

    so it won't attempt to fix the test scenario.

    Step Two

    Go to your master node and try setting it to read-only:

    root@node1[mysql]> set @@global.read_only=1;
    
    Query OK, 0 rows affected (0.000 sec)

    Step Three

    This time, an alarm was raised and an incident report was generated. See below what my cluster looks like:

    and the alarm was triggered:

    and the incident report was generated:
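    Once you are done reviewing the report, the master can be made writable again from the same session:

    root@node1[mysql]> set @@global.read_only=0;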

    Galera Cluster Setup Test Scenario

    For a Galera-based setup, we need to make sure that the cluster is no longer available, i.e., a cluster-wide failure. Unlike the master-slave test, you can leave Auto Recovery enabled since we'll play around with network interfaces.

    Note: For this setup, ensure that you have multiple interfaces if you are testing the nodes on remote instances, since you cannot bring an interface back up if it is the one you are connected through.

    Step One

    Create a 3-node Galera cluster (for example using vagrant)

    Step Two

    Issue the command (just like below) to simulate a network issue, and do this on all the nodes:

    [root@testnode10 ~]# ifdown eth1
    
    Device 'eth1' successfully disconnected.

    Step Three

    Now, this took my cluster down, which is now in this state:

    raised an alarm,

    and it generates an incident report:

    For a sample incident report, you can use this raw file and save it as html.

    It's quite simple to try but again, please do this only in a non-live and non-prod environment.
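    To bring the cluster back after the test, simply re-enable the network interface on each node (the interface name is specific to this test environment):

    [root@testnode10 ~]# ifup eth1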

    Conclusion

    MySQL Freeze Frame in ClusterControl can be helpful when diagnosing crashes. When troubleshooting, you need a wealth of information in order to determine the cause, and that is exactly what MySQL Freeze Frame provides.

    by Paul Namuag at January 16, 2020 10:45 AM

    January 15, 2020

    SeveralNines

    Database Management & Monitoring for PostgreSQL 12

    A few months ago we blogged about the release of PostgreSQL 12, with notable improvements to query performance (particularly over larger data sets) and overall space utilization, among other important features. Now, with the ClusterControl 1.7.5 version, we're glad to announce support for this new PostgreSQL version.

    This new ClusterControl 1.7.5 version comes with many new features for managing and monitoring your database cluster. In this blog, we’ll take a look at these features and see how to deploy PostgreSQL 12 easily.

    Easily Deploy PostgreSQL 12

    To perform a new installation of PostgreSQL 12 from ClusterControl, just select the “Deploy” option and follow the instructions that appear. Note that if you already have a PostgreSQL 12 instance running, then you need to select the “Import Existing Server/Database” instead.

    Deploy PostgreSQL 12

    When selecting PostgreSQL, you must specify User, Key or Password, and port to connect by SSH to your PostgreSQL hosts. You also need the name for your new cluster and if you want ClusterControl to install the corresponding software and configurations for you.

    Deploy PostgreSQL 12

    Please check the ClusterControl user requirement for this step here.

    Deploy PostgreSQL 12

    After setting up the SSH access information, you must define the database user, version, and datadir (optional). You can also specify which repository to use. In this case, we want to deploy PostgreSQL 12, so just select it and continue.

    In the next step, you need to add your servers to the cluster you’re going to create.

    When adding your servers, you can enter IP or hostname.

    In the last step, you can choose if your replication will be Synchronous or Asynchronous.

    Deploy Postgres 12

    You can monitor the status of the creation of your new cluster from the ClusterControl Activity Monitor.

    Once the task is finished, you can see your new PostgreSQL 12 cluster in the main ClusterControl screen.

    Once you have your cluster created, you can perform several tasks on it, like adding a load balancer (HAProxy, Keepalived) or a new replica, and also different management or monitoring tasks.

    PostgreSQL 12 Database Management

    As you probably know, using ClusterControl you can perform different management tasks like add/remove load balancers, add/remove slave nodes, automatic fail-over and recovery, backups, create/modify advisors, and even more.

    Schedule Maintenance Mode

    One of the new ClusterControl management features is the option to schedule maintenance mode for the database cluster. If you need to modify something in your environment or if for some reason you need to schedule a maintenance window, you can set it with ClusterControl.

    Go to ClusterControl -> Cluster Actions -> Schedule Maintenance Mode to enable the maintenance window for the whole cluster.

    After enabling it, you won’t receive alarms and notifications from this cluster during the specified period.

    If you will be working on one specific node, you can enable this maintenance mode just for that node, and not for the whole cluster, by using the "Schedule Maintenance Mode" option in the Node Actions section.

    PostgreSQL User Management

    Now, in the ClusterControl 1.7.5 version, you’ll be able to manage users/roles for your PostgreSQL cluster. Go to ClusterControl -> Select Cluster -> Manage -> User Management.

    PostgreSQL GUI User Management

    Here you can see all the accounts with the privileges assigned, and you can create a new one, or modify/edit an existing account.

    Now, let’s see how to monitor this new PostgreSQL version by using ClusterControl.

    PostgreSQL 12 Database Monitoring

    Monitoring is a must in all environments, and databases aren’t the exception. If you select your cluster in the ClusterControl main screen, you’ll see an overview of it with some basic metrics.

    PostgreSQL 12 Monitoring

    But probably this is not enough to see what is happening in your database cluster. So if you go to ClusterControl -> Select your Cluster -> Dashboards, you can enable this agent-based dashboard to monitor your database in more detail.

    Once it is enabled, you’ll have detailed information from both the database and the operating system side.

    Postgres 12 Monitoring

    This dashboard method is useful to see, in a friendly way,  if everything is going fine.

    You can also take advantage of the old monitoring features like query monitor, performance, advisors, and more features for PostgreSQL or different database technologies.

    Conclusion

    PostgreSQL 12 comes with many improvements to query performance and new features. If you’re looking for a quick way to give it a try, ClusterControl can help you to deploy, manage and monitor it in an easy way.

    by Sebastian Insausti at January 15, 2020 10:45 AM

    January 14, 2020

    Henrik Ingo

    Automatic retries in MongoDB

    At work we were discussing whether MongoDB will retry operations in some circumstances or whether the client needs to be prepared to do so. After a while we realized different participants in the discussion were discussing different retries.

    So I sat down to get to the bottom of all the retries that can happen in MongoDB, and write a blog post about them. But after googling a bit it turns out someone has already written that blog post, so this will be a short post for me linking to other posts.

    Retries by the driver

    If you set retryWrites=true in your MongoDB connection string, then the driver will automatically retry some write operations for some types of failures. Ok, can I be more specific? Yes I can...

    read more

    by hingo at January 14, 2020 11:10 AM

    SeveralNines

    Cluster-Wide Database Maintenance and Why You Need It

    Undoubtedly, there is a long list of maintenance tasks that have to be performed by system administrators, especially when it comes to critical systems. Some of the tasks have to be performed at regular intervals, like daily, weekly, monthly and yearly. Some have to be done right away, urgently. Nevertheless, any maintenance operation should not lead to another bigger problem, and any maintenance has to be handled with extra care to avoid any interruption to the business. Therefore, planning, scheduling and reporting are important aspects. 

    ClusterControl, as a cluster automation and management tool, is smart enough to plan and schedule maintenance windows in advance. This can help avoid unpleasant surprises during production operations, for instance an unnecessary recovery procedure, failovers or alarms being triggered. This blog showcases some of the new maintenance mode features that come with ClusterControl 1.7.5.

    Maintenance Mode pre v1.7.5

    Maintenance mode has been part of the ClusterControl logic since v1.4.0, where one could set a maintenance duration for an individual node, which allows ClusterControl to disable recovery/failover and alarms on that node during the set period. The maintenance mode can be activated immediately or scheduled to run in the future. Alarms and notifications will be turned off when maintenance mode is active, which is expected in an environment where the corresponding node is undergoing maintenance.

    Some of the weaknesses that we found out and also reported by our users:

    • Maintenance mode was bound per node. This means if one would want to perform maintenance on all nodes in the cluster, one had to repeatedly configure the maintenance mode for every node in the cluster. For larger environments, scheduling a major maintenance window for all nodes on multiple clusters could be repetitive.
    • Activating maintenance mode did not deactivate the automatic recovery feature. This would cause an unhealthy node to be recovered automatically while maintenance is ongoing. False alarms might be raised.
    • Maintenance mode could not be activated periodically per schedule. Therefore, regular maintenance had to be defined manually for every approaching date. There was no way to schedule a cron-based (with iteration) maintenance mode.

    ClusterControl new maintenance mode and job implementations solve all of the key problems mentioned, which are shown in the next sections.

    Database Cluster-Wide Maintenance Mode

    Cluster-wide maintenance mode comes in handy in an environment where you have multiple clusters, and multiple nodes per cluster, managed by a single ClusterControl instance. For example, a common production setup of a MySQL Galera Cluster could have up to 7 nodes - a three-node Galera Cluster could have one additional host for an asynchronous slave, with two ProxySQL/Keepalived nodes and one backup verification server. For older ClusterControl versions where only node maintenance was supported, if major maintenance is required, for example upgrading the OS kernel on all hosts, the scheduling had to be repeated 7 times, once for every monitored node. We have covered this issue in detail in this blog post, with some workarounds.

    Cluster-wide maintenance mode is a super-set of the node maintenance mode from previous versions. An activated cluster-wide maintenance mode activates maintenance mode on all nodes in the particular cluster. Simply click on Cluster Actions > Schedule Maintenance Mode and you will be presented with the following dialog:

    The fields in this dialog are almost identical with scheduling maintenance dialog for single node, except its domain is the particular cluster, as highlighted in the red oval. You can activate the maintenance immediately, or schedule it to run in the future. Once scheduled, you should see the following notification under the summary bar with status "Scheduled" for all clusters:

    Once the maintenance mode is activated, you should see the blue maintenance icon on the summary bar of the cluster, together with the green 'Active' icon notification in the ClusterControl UI:

    All active maintenance mode can be deactivated at any time via the UI, just go to the Cluster Actions > Disable Maintenance Mode.

    Advanced Maintenance Management via ClusterControl CLI

    The ClusterControl CLI, a.k.a. s9s, comes with extended maintenance management functionality, allowing users to improve the existing maintenance operation flow as a whole. The CLI works by sending commands as JSON messages to the ClusterControl Controller (CMON) RPC interface over TLS encryption, which requires port 9501 to be open on the controller and the client host.

    With a bit of scripting knowledge, we can fully automate and synchronize the maintenance process flow, especially if the exercise involves another layer/party/domain outside of ClusterControl (a small scripted example is sketched in the Create a Maintenance Mode section below). Note that we always incorporate our changes via the CLI first before bringing them to the UI. This is one of the ways to test out new functionality and find out if it would be useful to our users.

    The following sections will give you a walkthrough on advanced management for maintenance mode via command line.

    View Maintenance Mode

    To list out all maintenance that has been scheduled for all clusters and nodes:

    $ s9s maintenance --list --long
    ST UUID    OWNER          GROUP  START               END                 HOST/CLUSTER REASON
    Ah 460a97b dba            admins 02:31:32            04:31:32            192.168.0.22 Switching to different racks
    -h e3bf19f user@email.com        2020-01-17 02:35:00 2020-01-17 03:00:00 192.168.0.23 Change network cable - Clark Kent
    -c 8f55f76 user@email.com        2020-01-17 02:34:00 2020-01-17 03:59:00 PXC 57       Kernel upgrade and system reboot - John Doe
    Ac 4f4d73c dba            admins 02:30:01            02:31:01            MariaDB 10.3 Test maintenance job creation every 5 minutes

    An owner with an email address means the maintenance mode was created by a ClusterControl UI user, while owners with groups are users coming from the CLI with our new user/group permissions, currently supported on the CLI only. The leftmost column is the maintenance mode status:

    • The first character: 'A' stands for active and '-' stands for inactive.
    • The second character: 'h' stands for host-related maintenance and 'c' stands for cluster-related maintenance.

    To list out the current active maintenance mode:

    $ s9s maintenance --current --cluster-id=32
    Cluster 32 is under maintenance: Kernel upgrade and system reboot - John Doe

    Use the job command to get the timestamp and status of past maintenance modes:

    $ s9s job --list | grep -i maintenance
    5979  32 SCHEDULED dba            admins 2020-01-09 05:29:34   0% Registering Maintenance
    5980  32 FINISHED  dba            admins 2020-01-09 05:30:01   0% Registering Maintenance
    5981  32 FINISHED  dba            admins 2020-01-09 05:35:00   0% Registering Maintenance
    5982  32 FINISHED  dba            admins 2020-01-09 05:40:00   0% Registering Maintenance

    'Registering Maintenance' is the job name to schedule or activate the maintenance mode.

    Create a Maintenance Mode

    To create a new maintenance mode for a node, specify the host under the --nodes parameter, with --begin and --end in ISO 8601 date format (with fractional seconds, UTC only, hence the 'Z' suffix):

    $ s9s maintenance --create \
    --nodes="192.168.0.21" \
    --begin="2020-01-09T08:50:58.000Z" \
    --end="2020-01-09T09:50:58.000Z" \
    --reason="Upgrading RAM"

    However, the above requires extra effort to figure out the correct start and end times. We can use the "date" command to translate the date and time into the supported format relative to the current time, similar to below:

    $ s9s maintenance --create \
    --nodes="192.168.0.21" \
    --begin="$(date +%FT%T.000Z -d 'now')" \
    --end="$(date +%FT%T.000Z -d 'now + 2 hours')" \
    --reason="Upgrading RAM"
    b348f2ac-9daa-4481-9a95-e8cdf83e81fc

    The above will activate a maintenance mode for node 192.168.0.21 immediately, ending 2 hours from the moment it was created. An accepted command returns a UUID, as in the above example: 'b348f2ac-9daa-4481-9a95-e8cdf83e81fc'. An invalid command simply returns blank output.

    The following command will schedule a maintenance mode for cluster ID 32 on the next day:

    $ s9s maintenance --create \
    --cluster-id=32 \
    --begin="$(date +%FT%T.000Z -d 'now + 1 day')" \
    --end="$(date +%FT%T.000Z -d 'now + 1 day + 2 hours')" \
    --reason="Replacing old network cable"
    85128b1a-a1cd-450e-b381-2a92c03db7a0

    We can also see what is coming up next in the scheduled maintenance for a particular node or cluster:

    $ date -d 'now'
    Wed Jan  8 07:41:57 UTC 2020
    
    $ s9s maintenance --next --cluster-id=32 --nodes='192.168.0.22'
    Host 192.168.0.22 maintenance starts Jan 09 07:41:23: Replacing old network cable

    Omit --nodes if you just want to see the upcoming maintenance details for a particular cluster.
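
    For example (a sketch reusing the same cluster ID used above):

    $ s9s maintenance --next --cluster-id=32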

    Delete Maintenance Mode

    Firstly, retrieve the maintenance job UUID:

    $ s9s maintenance --list --long
    ST UUID    OWNER          GROUP START               END                 HOST/CLUSTER             REASON
    -h 7edeabb user@email.com       04:59:00            06:59:00            192.168.0.21             Changing network cable - John Doe
    -c 82b13d3 user@email.com       2020-01-10 05:02:00 2020-01-10 06:27:00 MariaDB 10.3 Replication Upgrading RAM
    Total: 2

    Use the --uuid and specify the corresponding maintenance mode to delete:

    $ s9s maintenance --delete --uuid=82b13d3
    Deleted.

    At this point the maintenance mode has been deleted for the corresponding node or cluster.

    Maintenance Mode Scheduling with Iteration

    In ClusterControl 1.7.5, maintenance mode can be scheduled and iterated just like a cron job. For example, you can now schedule a maintenance mode daily, weekly, monthly or yearly. This iteration automates the maintenance mode job creation and simplifies the maintenance workflow, especially if you are running a fully automated infrastructure, where maintenance happens automatically and at regular intervals.

    There is a special flag that we have to use called --create-with-job, which registers the maintenance as a new job for the controller to execute. The following is a simple example where we activate maintenance mode by registering a new job:

    $ s9s maintenance \
    --create-with-job \
    --cluster-id=32 \
    --reason="testmaintenance" \
    --minutes=60 \
    --log
    
    Preparing to register maintenance.
    The owner of the maintenance will be 'dba'.
    The reason is: testmaintenance
    The maintenance starts NOW.
    Maintenance will be 60 minute(s) long.
    Registering maintenance for cluster 32.
    Maintenance registered.

    To schedule a periodic maintenance, use the --create-with-job flag, with --minutes for the maintenance duration and --recurrence flag in cron-style formatting. The following command schedules a maintenance job every Friday at 3 AM for cluster ID 32:

    $ s9s maintenance \
    --create-with-job \
    --cluster-id=32 \
    --reason="Weekly OS patch at 3 AM every Friday" \
    --minutes=120 \
    --recurrence="0 3 * * 5" \
    --job-tags="maintenance"
    
    Job with ID 5978 registered.

    You should get a job ID in the response. We can then verify if the job has been created correctly:

    $ s9s job --list --job-id=5978
    ID   CID STATE     OWNER GROUP  CREATED  RDY TITLE
    5978  32 SCHEDULED dba   admins 05:21:07 0%  Registering Maintenance

    We can also use the --show-scheduled flag together with --long flag to get extended information on the scheduled job:

    $ s9s job --show-scheduled --list --long
    --------------------------------------------------------------------------------------------------------------------------
    Registering Maintenance
    Scheduled
    
    Created   : 2020-01-09 05:21:07    ID : 5978      Status : SCHEDULED
    Started   :                      User : dba         Host : 127.0.0.1
    Ended     :                      Group: admins    Cluster: 32
    Tags      : #maintenance
    RPC       : 2.0
    --------------------------------------------------------------------------------------------------------------------------

    A recurring job created by the scheduled job will be tagged as "recurrence":

    --------------------------------------------------------------------------------------------------------------------------
    Registering Maintenance
    Job finished.                                                                                                [ ]
                                                                                                                     0.00%
    Created   : 2020-01-09 05:40:00    ID : 5982        Status : FINISHED
    Started   : 2020-01-09 05:40:01    User : dba         Host : 127.0.0.1
    Ended     : 2020-01-09 05:40:01    Group: admins    Cluster: 32
    Tags      : #recurrence
    RPC       : 2.0
    --------------------------------------------------------------------------------------------------------------------------

    Thus, to list out the recurring job, we can use the --job-tags flag. The following example shows executed recurring jobs scheduled to run every 5 minutes:

    $ s9s job --list --job-tags=recurrence
    ID   CID STATE    OWNER GROUP  CREATED  RDY TITLE
    5980  32 FINISHED dba   admins 05:30:01 0%  Registering Maintenance
    5981  32 FINISHED dba   admins 05:35:00 0%  Registering Maintenance
    5982  32 FINISHED dba   admins 05:40:00 0%  Registering Maintenance

    Automatic Recovery as a Job

    In previous versions, the automatic recovery feature could only be enabled or disabled at runtime via the UI, through a simple switch button in the cluster's summary bar, as shown in the following screenshot:

    In ClusterControl 1.7.5, automatic recovery is also part of an internal job, where the configuration can be controlled via the CLI and is persistent across restarts. This means the job can be scheduled, iterated and controlled with an expiration period via the ClusterControl CLI, and allows users to incorporate automatic recovery management into their maintenance automation scripts when necessary.

    When a cluster-wide maintenance is ongoing, it is pretty common to see some questionable states of database hosts, which is totally acceptable during this period. The common practice is to ignore these questionable states and not interrupt the node while maintenance is happening. If ClusterControl automatic recovery is turned on, it will automatically attempt to recover the problematic host back to a healthy state, regardless of the maintenance mode state. Thus, disabling ClusterControl automatic recovery during the maintenance operation is highly recommended so that ClusterControl will not interrupt the maintenance as it carries on.

    To disable cluster automatic recovery, simply use the --disable-recovery flag with respective cluster ID:

    $ s9s cluster --disable-recovery --log --cluster-id=32
    Cluster ID is 32.
    Cluster recovery is currently enabled.
    Node recovery is currently enabled.
    Disabling cluster auto recovery.
    Disabling node auto recovery.

    To reverse the above, use --enable-recovery flag to enable it again:

    $ s9s cluster --enable-recovery --log --cluster-id=32
    Cluster ID is 32.
    Cluster recovery is currently disabled.
    Node recovery is currently disabled.
    Enabling cluster auto recovery.
    Enabling node auto recovery.

    The CLI also supports disabling recovery together with activating maintenance mode in the same command. One has to use the --maintenance-minutes flag and optionally provide a reason:

    $ s9s cluster \
    --disable-recovery \
    --log \
    --cluster-id=29 \
    --maintenance-minutes=60 \
    --reason='Disabling recovery for 1 hour to update kernel'
    
    Registering maintenance for 60 minute(s) for cluster 29.
    Cluster ID is 29.
    Cluster recovery is currently enabled.
    Node recovery is currently enabled.
    Disabling cluster auto recovery.
    Disabling node auto recovery.

    From the above output, we can tell that ClusterControl has disabled automatic recovery for the cluster and its nodes, and also registered a maintenance mode for the cluster. We can then verify with the list maintenance command:

    $ s9s maintenance --list --long
    ST UUID    OWNER     GROUP  START    END      HOST/CLUSTER             REASON
    Ac 687e255 system    admins 06:09:57 07:09:57 MariaDB 10.3 Replication Disabling recovery for 1 hour to update kernel

    Similarly, it will appear in the UI as shown in the following screenshot:

    You can re-enable the automatic recovery feature using the --enable-recovery flag once it is needed again. The maintenance mode will still be active as defined by the --maintenance-minutes option, unless you explicitly delete or deactivate the maintenance mode via the GUI or CLI.
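
    Putting the pieces together, a minimal automation sketch (assuming cluster ID 32, a 60-minute window, and only the s9s flags demonstrated above) could wrap the maintenance work like this:

    #!/bin/bash
    # Sketch of an automated maintenance wrapper using the s9s commands shown above.
    # Assumes cluster ID 32 and a 60-minute window; adjust to your environment.
    CLUSTER_ID=32

    # Disable automatic recovery and register a maintenance window in one command
    s9s cluster --disable-recovery --log --cluster-id=$CLUSTER_ID \
        --maintenance-minutes=60 --reason="Automated kernel update window"

    # ... perform the actual maintenance work here (patching, reboot, etc.) ...

    # Re-enable automatic recovery once the work is done
    s9s cluster --enable-recovery --log --cluster-id=$CLUSTER_ID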

    Conclusion

    ClusterControl allows you to manage your maintenance window efficiently, by discarding possible false alarms and controlling the automatic recovery behaviour while maintenance is ongoing. Maintenance mode is available for free in all ClusterControl editions, so give it a try.

    by ashraf at January 14, 2020 10:45 AM

    January 13, 2020

    SeveralNines

    Announcing ClusterControl 1.7.5: Advanced Cluster Maintenance & Support for PostgreSQL 12 and MongoDB 4.2

    We’re excited to announce the 1.7.5 release of ClusterControl - the only database management system you’ll ever need to take control of your open source database infrastructure. 

    This new version features support for the latest MongoDB & PostgreSQL general releases as well as new operating system support allowing you to install ClusterControl on CentOS 8 and Debian 10.

    ClusterControl 1.7.4 provided the ability to place a node into Maintenance Mode. 1.7.5 now allows you to place (or schedule) the entire database cluster in Maintenance Mode, giving you more control over your database operations.

    In addition, we are excited to announce a brand new function in ClusterControl we call “Freeze Frame.” This new feature will take snapshots of your MySQL or MariaDB setups right before a detected failure, providing you with invaluable troubleshooting information about what caused the issue. 

    Release Highlights

    Database Cluster-Wide Maintenance

    • Perform tasks in Maintenance-Mode across the entire database cluster.
    • Enable/disable cluster-wide maintenance mode with a cron-based scheduler.
    • Enable/disable recurring jobs such as cluster or node recovery with automatic maintenance mode.

    MySQL Freeze Frame (BETA)

    • Snapshot MySQL status before cluster failure.
    • Snapshot MySQL process list before cluster failure (coming soon).
    • Inspect cluster incidents in operational reports or from the s9s command line tool.

    New Operating System & Database Support

    • CentOS 8 and Debian 10 support.
    • PostgreSQL 12 support.
    • MongoDB 4.2 and Percona MongoDB v4.0 support.

    Additional Misc Improvements

    • Synchronize time range selection between the Overview and Node pages.
    • Improvements to the nodes status updates to be more accurate and with less delay.
    • Enable/Disable Cluster and Node recovery are now regular CMON jobs.
    • Topology view for Cluster-to-Cluster Replication.

    View Release Details and Resources

    Release Details

    Cluster-Wide Maintenance 

    The ability to place a database node into Maintenance Mode was implemented in the last version of ClusterControl (1.7.4). In this release we now offer the ability to place your entire database cluster into Maintenance Mode to allow you to perform updates, patches, and more.

    MySQL & MariaDB Freeze Frame

    This new ClusterControl feature allows you to get a snapshot of your MySQL statuses and related processes immediately before a failure is detected. This allows you to better understand what happened when troubleshooting, and provides you with actionable information on how you can prevent this type of failure from happening in the future.

    This new feature is not part of the auto-recovery features in ClusterControl. Should your database cluster go down, those functions will still run to attempt to get you back online; it’s just that now you’ll have a better idea of what caused the outage.

    Support for PostgreSQL 12

    Released in October 2019, PostgreSQL 12 featured major improvements to indexing, partitioning, new SQL & JSON functions, and improved security features, mainly around authentication. ClusterControl now allows you to deploy a preconfigured Postgres 12 database cluster with the ability to fully monitor and manage it.

    PostgreSQL GUI - ClusterControl

    Support for MongoDB 4.2

    MongoDB 4.2 offers unique improvements such as new ACID transaction guarantees, new query and analytics functions including new charts for rich data visualizations. ClusterControl now allows you to deploy a preconfigured MongoDB 4.2 or Percona Server for MongoDB 4.2 ReplicaSet with the ability to fully monitor and manage it.

    MongoDB GUI - ClusterControl

    by fwlymburner at January 13, 2020 03:59 PM

    January 12, 2020

    Valeriy Kravchuk

    Fun with Bugs #91 - On MySQL Bug Reports I am Subscribed to, Part XXV

    Not sure if it's still interesting to anybody else, but MySQL users keep finding and reporting new problems that may be caused by genuine bugs in the code. I keep checking these reports and subscribing to those I consider interesting. Let me start blogging in the New Year of 2020 with a review of some replication, InnoDB and (many!) optimizer bugs reported in September and October, 2019.

    As usual, I start from the oldest and care to mention bug reporters by names and add links to their other bug reports, if any. So, here is the new list:
    • Bug #96827 - "mysqlbinlog needs options to abort if invalid events are found on in-use binlogs". I had never checked myself, but I see no reasons not to trust Yoshinori Matsunobu in this case, based on code fragments shared. All current MySQL versions, from 5.6.x to 8.0.x, are affected. From what I see here, MariaDB is also affected.
    • Bug #96853 - "Inconsistent super_read_only status when changing the variable is blocked". Nice bug report by Przemyslaw Malkowski from Percona. For some reason I do not see a clear statement of whether MySQL 8 is affected.
    • Bug #96874 - "The write_notifier_mutex in log_sys is useless". This bug was reported by Chen Zongzhi on MySQL 8.0.17 (see also his other similar Bug #97358 - "The log.flush_notifier_mutex in log_sys is useless"), but the "Version" field is empty even though the bug is "Verified". This is NOT acceptable.
    • Bug #96946 - "Outer reference in join condition isn't allowed". This bug (that affects all MySQL versions) was reported by Laurents Meyer. See also older Bug #35242 (still "Verified" and affects MariaDB 10.3.x as well).
    • Bug #96950 - "CONCAT() can generate corrupted output". I wish we'd see the exact test case, but at least based on code review this bug (reported by Jay Edgar) was verified for MySQL 5.6 and 5.7. I see the same code in MariaDB, unfortunately.
    • Bug #97001 - "Dangerous optimization reconsidering_access_paths_for_index_ordering". The problem is with queries like this:
      SELECT ... WHERE [secondary key conditions] ORDER BY `id` ASC LIMIT n
      The bug reporter, Jeremy Cole, listed a lot of potentially related older bug reports. He also suggested a patch. I'd be happy to see the fix in MySQL soon.
    • Bug #97113 - "BIT column serialized incorrectly in CASE expression". This bug report was created by Bradley Grainger. It is stated there that MySQL 5.7 (not only 8.0) is affected, but the "Version:" field of this verified bug does NOT list 5.7.x. Absolutely the wrong way to process bugs. MariaDB also seems to be inconsistent, even though the result for one of the queries is different:
      MariaDB [test]> SELECT CASE WHEN name IS NOT NULL THEN value ELSE NULL END FROM query_bit;
      Field   1:  `CASE WHEN name IS NOT NULL THEN value ELSE NULL END`
      Catalog:    `def`
      Database:   ``
      Table:      ``
      Org_table:  ``
      Type:       NEWDECIMAL
      Collation:  binary (63)
      Length:     2
      Max_length: 1
      Decimals:   0
      Flags:      BINARY NUM


      +-----------------------------------------------------+
      | CASE WHEN name IS NOT NULL THEN value ELSE NULL END |
      +-----------------------------------------------------+
      | 1                                                   |
      +-----------------------------------------------------+
      1 row in set (0.021 sec)
    • Bug #97150 - "rwlock: refine lock->recursive with C11 atomics". Patch for MySQL 8.0.x was contributed by Cai Yibo. See also his another contribution, Bug #97228 - "rwlock: refine lock->lock_word with C11 atomics".
    • Bug #97299 - "Improve the explain informations for Hash Joins". A simple EXPLAIN (unlike the one with format=tree) does not give a hint that the new MySQL 8.0.18+ feature, hash join, was used. A simple and useful feature request from Tibor Korocz.
    • Bug #97345 - "IO Thread not detecting failed master with relay_log_space_limit." Nice bug report from Jean-François Gagné, but no documented attempt to check if MySQL 5.6.x and 8.0.x are also affected.
    • Bug #97347 - "In some cases queries with ST_CONTAINS do not return any results". A simple and easy to check bug report from Christian Koinig. Note that based on a quick test MariaDB is NOT affected:
      MariaDB [test]> select version(), count(*) FROM test
          -> WHERE ST_CONTAINS(
          ->  geo_footprint,
          ->  ST_GeomFromGeoJSON('{"type":"Polygon","coordinates":[[[15.11333480819996
      6,48.1337532388],[15.113329984100005,48.1337371609],[15.113411697200036,48.13371
      66354],[15.113673777399981,48.1336819199],[15.114544787600039,48.1335464618],[15
      .115336574000025,48.1334415189],[15.116374992200008,48.1332937084],[15.117266346
      799966,48.1330924824],[15.11769786879995,48.1329803459],[15.118129375299986,48.1
      32868199],[15.118515258099933,48.1327388086],[15.118597296700045,48.1327141533],
      [15.118635348899943,48.132702717],[15.11867907729993,48.1327796282],[15.11876276
      890007,48.13290987],[15.118805112699988,48.1330357889],[15.118850101500016,48.13
      33486685],[15.118823191700017,48.1334777297],[15.118820984299987,48.1334784295],
      [15.118821076099948,48.1334779691],[15.113334808199966,48.1337532388],[15.113334
      808199966,48.1337532388]]]}'));
      +--------------------+----------+
      | version()          | count(*) |
      +--------------------+----------+
      | 10.3.7-MariaDB-log |        1 |
      +--------------------+----------+
      1 row in set (0.003 sec)
    • Bug #97372 - "Constructor Query_event must check enough space". Contribution to 5.7 and 8.0 by Pengbo Shi. Waiting for the OCI signed by the contributor...
    • Bug #97418 - "MySQL chooses different execution plan in 5.7". Interesting bug report from Vinodh Krish. I am again not sure if versions affected match the results of tests presented here.
    • Bug #97421 - "Replace into affected row count not match with events in binlog". Not sure if MySQL 8 was checked, but MariaDB 10.3.7 also uses single Update_rows event in the binary log. Thanks to Ke Lu for noticing and reporting this!
    Also, on a separate note, this claim of a MySQL 8.0 performance regression from Mark Callaghan, Bug #86215, is still being analyzed, it seems. No further comments for 2.5 years already!

    Autumn of 2019 was fruitful. A lot of interesting MySQL bug reports also, not just grapes on my balcony...
    To summarize:
    1. For some reason I often do not see explicit documented attempts by Oracle MySQL engineers from the bugs verification team to check a bug on different MySQL versions. Sometimes an obviously affected version (like MySQL 8.0.x) is not listed in the field, so the "Version" field becomes useless. This is absolutely wrong. Maybe I should submit yet another talk to some conference on how to process bugs properly?
    2. Some regression bugs are still not marked with "regression" tag when verified.
    3. MySQL optimizer still requires a lot of work to become decent.
    4. I see a lot of interesting new bug reports both from well known old community members and from users I had never noticed before by name. This is great and proves that MySQL is still alive and uses all kinds of contributions from the Community.
    Next time I'll review interesting bugs reported in November and December, 2019. Stay tuned!

    by Valerii Kravchuk (noreply@blogger.com) at January 12, 2020 06:51 PM

    January 10, 2020

    SeveralNines

    A SOx Compliance Checklist for PostgreSQL

    The United States SOx (Sarbanes-Oxley) Act, 2002, addresses a broad spectrum of fundamental information security principles for commercial enterprises, ensuring their functions are rooted and consistently applied, based on concepts of CIA (Confidentiality, Integrity, and Availability).

    Accomplishing these goals requires commitment from many individuals, all of whom must be aware of their responsibilities in maintaining the secure state of the enterprise assets, understanding policies, procedures, standards, and guidelines, and the possible losses involved with their duties.

    CIA aims at ensuring that the alignment of the business strategy, goals, mission, and objectives is supported by security controls, approved in consideration of senior management's due diligence and tolerance for risks and costs.

    PostgreSQL Database Clusters

    The PostgreSQL Server has a broad collection of features offered for free, making it one of the most popular DBMS (Database Management Systems), enabling its adoption on a wide range of projects in different social and economic spheres.

    The main advantage of its adoption is the open source license, removing concerns around copyright infringement within an organization, possibly caused by an IT administrator inadvertently exceeding the number of permitted licenses.

    The implementation of information security for PostgreSQL (From an organizational context) will not be successful without carefully constructed and uniformly applied security policies and procedures which cover all aspects of business continuity planning.

    BCP (Business Continuity Planning)

    Leadership must agree prior to starting the BCP program to ensure they understand the expected deliverables, as well as their personal liability (financial and even criminal) if it is determined that they did not use due care to adequately protect the organization and its resources.

    The senior management's expectations are communicated through policies, developed and maintained by security officers, responsible for establishing procedures and adherence to standards, baselines, and guidelines, and for discovering SPoFs (Single Points of Failure) that can compromise an entire system from working securely and reliably.

    The classification of these potential disruptive events is done using BIA (Business Impact Analysis), which is a sequential approach of identifying the assets and business processes, determining the criticality of each one, estimating MTD (Maximum Tolerable Downtime) based on their time sensitivity for recovery, and finally calculating the recovery objectives, RTO (Recovery Time Objective) and RPO (Recovery Point Objective), considering the cost of achieving the objective versus the benefit.

    Data Access Roles and Responsibilities

    Commercial businesses commonly hire outside firms that specialize in background checks in order to gather more information on prospective new employees, assisting the hiring manager with verifying solid work records, validating education degrees and certifications, criminal history, and reference checks.

    Outdated operating systems and poor or written-down passwords are just a couple of the many ways unauthorized individuals can find vulnerabilities and attack an organization's information systems, through the network or via social engineering.

    Third-party services, hired by the organization, can represent a threat as well, especially if employees are not trained to use proper security procedures. Their interactions must be rooted in strong security foundations in order to prevent information disclosure.

    Least privilege refers to granting users only the access they need to do their jobs, nothing more. Some employees (based upon their job functions) have higher “need-to-know” access; consequently, their workstations must be continuously monitored and kept up-to-date with security standards.

    Some Resources That Can Help

    Logos of frameworks and organizations responsible for providing cybersecurity guidelines.

    COSO (Committee of Sponsoring Organizations of the Treadway Commission)

    Formed in 1985 to sponsor the US (United States) National Commission on Fraudulent Financial Reporting, which studied causal factors that lead to fraudulent financial reporting, and produced recommendations for public companies, their auditors, the SEC (Securities and Exchange Commission), other regulators, and law enforcement bodies.

    ITIL (Information Technology Infrastructure Library)

    Built by the British government’s Stationery Office, ITIL is a framework composed of a set of books which demonstrates best practices for the specific IT needs of an organization, such as management of core operational processes, incidents and availability, and financial considerations.

    COBIT (Control Objectives for Information and Related Technology)

    Published by the ITGI (IT Governance Institute), COBIT is a framework that provides an overall structure for IT controls, including examination of efficiency, effectiveness, CIA, reliability, and compliance, in alignment with the business needs. ISACA (Information Systems Audit and Control Association) provides deep instructions about COBIT, as well as certifications recognized globally, such as CISA (Certified Information Systems Auditor).

    ISO/IEC 27002:2013 (International Organization for Standardization/International Electrotechnical Commission)

    Previously known as ISO/IEC 17799:2005, ISO/IEC 27002:2013 contains detailed instructions for organizations, covering information security controls such as policies, compliance, access controls, operations and HR (Human Resources) security, cryptography, management of incidents, risks, BC (Business Continuity), assets, and many more. There is also a preview of the document.

    VERIS (Vocabulary of Event Recording and Incident Sharing)

    Available on GitHub, VERIS is a project in continuous development, intended to help organizations collect useful incident-related information and share it anonymously and responsibly, expanding the VCDB (VERIS Community Database). The cooperation of users, resulting in an excellent reference for risk management, is then translated into an annual report, the VDBIR (Verizon Data Breach Investigation Report).

    OECD Guidelines (Organization for Economic Cooperation and Development)

    The OECD, in cooperation with partners around the globe, promotes RBCs (Responsible Business Conduct) for multinational enterprises, ensuring privacy to individuals upon their PII (Personally Identifiable Information), and establishing principles of how their data must be retained and maintained by enterprises.

    NIST SP 800 Series (National Institute of Standards and Technology Special Publication)

    The US NIST provides, on its CSRC (Computer Security Resource Center), a collection of publications for cybersecurity, covering all kinds of topics, including databases. The most important one, from a database perspective, is SP 800-53 Revision 4.

    Conclusion

    The Information Security Triad, versus its opposite.

    Achieving SOx goals is a daily concern for many organizations, even those not limited to accounting activities. Frameworks containing instructions for risk assessment and internal controls must be in place for the enterprise's security practitioners, as well as software for preventing destruction, alteration, and disclosure of sensitive data.


    by thiagolopes at January 10, 2020 04:28 PM

    MariaDB Foundation

    MariaDB Day Brussels 0202 2020

    The first MariaDB Day will be held in Brussels at the Bedford Hotel and Congress Centre on Sunday February 2. This is a complementary event to the MySQL, MariaDB and Friends Day at FOSDEM, which is far-oversubscribed, and gives an opportunity for other speakers and more in-depth coverage of MariaDB-related topics. […]

    The post MariaDB Day Brussels 0202 2020 appeared first on MariaDB.org.

    by Ian Gilfillan at January 10, 2020 06:10 AM

    January 09, 2020

    SeveralNines

    Tips for Delivering MySQL Database Performance - Part Two

    The management of database performance is an area that businesses and administrators often find themselves devoting more time to than they expected.

    Monitoring and reacting to production database performance issues is one of the most critical tasks in a database administrator's job. It is an ongoing process that requires constant care. Applications and the underlying databases usually evolve with time: they grow in size, number of users and workload, and undergo schema changes that come with code changes.

    Long-running queries are seldom avoidable in a MySQL database, and in some circumstances a long-running query may be a harmful event. If you care about your database, optimizing query performance and detecting long-running queries must be performed regularly.

    In this blog, we are going to take a more in-depth look at the actual database workload, especially on the running queries side. We will check how to track queries, what kind of information we can find in MySQL metadata, what tools to use to analyze such queries.

    Handling The Long-Running Queries

    Let’s start with checking long-running queries. First of all, we have to know the nature of the query, whether it is expected to be long-running or short-running. Some analytic and batch operations are supposed to be long-running queries, so we can skip those for now. Also, depending on the table size, modifying a table structure with the ALTER command can be a long-running operation (especially in MySQL Galera Clusters). Beyond that, there are a number of things that may cause a query to take longer than usual to execute:

    • Table lock - The table is locked by a global lock or explicit table lock when the query is trying to access it.
    • Inefficient query - The query uses non-indexed columns for lookups or joins, so MySQL takes a longer time to match the condition.
    • Deadlock - A query is waiting to access the same rows that are locked by another request.
    • Dataset does not fit into RAM - If your working set fits into the buffer pool, SELECT queries will usually be relatively fast; otherwise, reads have to go to disk.
    • Suboptimal hardware resources - This could be slow disks, RAID rebuilding, saturated network, etc.

    If you see that a query takes longer than usual to execute, do investigate it.

    Using the MySQL Show Process List

    mysql> SHOW PROCESSLIST;

    This is usually the first thing you run in the case of performance issues. SHOW PROCESSLIST is an internal MySQL command which shows you which threads are running. You can also see this information from the information_schema.PROCESSLIST table or the mysqladmin processlist command. If you have the PROCESS privilege, you can see all threads. You can see information like the query ID, execution time, who runs it, the client host, etc. The information varies slightly depending on the MySQL flavor and distribution (Oracle, MariaDB, Percona).

    SHOW PROCESSLIST;
    +----+-----------------+-----------+------+---------+------+------------------------+------------------+----------+
    | Id | User            | Host      | db   | Command | Time | State                  | Info             | Progress |
    +----+-----------------+-----------+------+---------+------+------------------------+------------------+----------+
    |  2 | event_scheduler | localhost | NULL | Daemon  | 2693 | Waiting on empty queue | NULL             |    0.000 |
    |  4 | root            | localhost | NULL | Query   |    0 | Table lock             | SHOW PROCESSLIST |    0.000 |
    +----+-----------------+-----------+------+---------+------+------------------------+------------------+----------+

    We can see the offending query right away in the output. In the above example that could be a table lock. But how often do we stare at those processes? This is only useful if you are aware of a long-running transaction. Otherwise, you wouldn't know until something happens - like connections piling up, or the server getting slower than usual.
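
    If you prefer to filter the noise programmatically, a simple query against the information_schema.PROCESSLIST table can surface only the long runners; the 30-second threshold below is just an illustration:

    SELECT id, user, host, db, time, state, info
    FROM information_schema.PROCESSLIST
    WHERE command <> 'Sleep' AND time > 30
    ORDER BY time DESC;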

    Using MySQL Pt-query-digest

    If you would like to see more information about a particular workload, use pt-query-digest. pt-query-digest is a Linux tool from Percona to analyze MySQL queries. It’s part of the Percona Toolkit, which you can find here. It supports the most popular 64-bit Linux distributions like Debian, Ubuntu, and Red Hat.

    To install it you must configure the Percona repositories and then install the percona-toolkit package.

    Install Percona Toolkit using your package manager:

    Debian or Ubuntu:

    sudo apt-get install percona-toolkit

    RHEL or CentOS:

    sudo yum install percona-toolkit

    Pt-query-digest accepts data from the process list, general log, binary log, slow log or tcpdump. In addition to that, it’s possible to poll the MySQL process list at a defined interval - a process that can be resource-intensive and far from ideal, but can still be used as an alternative.

    The most common source for pt-query-digest is the slow query log. You can control how much data goes there with the parameter log_slow_verbosity.

    The log_slow_verbosity parameter accepts the following values:

    • microtime - queries with microsecond precision.
    • query_plan - information about the query’s execution plan.
    • innodb  - InnoDB statistics.
    • minimal - Equivalent to enabling just microtime.
    • standard - Equivalent to enabling microtime,innodb.
    • full - Equivalent to all other values OR’ed together without the profiling and profiling_use_getrusage options.
    • profiling - Enables profiling of all queries in all connections.
    • profiling_use_getrusage - Enables usage of the getrusage function.

    source: Percona documentation

    For completeness, use log_slow_verbosity=full, which is a common choice.
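
    As a sketch, in Percona Server this could be set in my.cnf alongside the slow log settings covered next (log_slow_verbosity is a Percona Server variable; MariaDB's equivalent accepts a different set of values):

    [mysqld]
    log_slow_verbosity=full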

    Slow Query Log

    The slow query log can be used to find queries that take a long time to execute and are therefore candidates for optimization. It captures slow queries (SQL statements that take more than long_query_time seconds to execute), or queries that do not use indexes for lookups (log_queries_not_using_indexes). This feature is not enabled by default; to enable it, set the following lines and restart the MySQL server:

    [mysqld]
    slow_query_log=1
    log_queries_not_using_indexes=1
    long_query_time=0.1
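
    On recent MySQL and MariaDB versions these variables are also dynamic, so, assuming you have the required privileges, a sketch of enabling the slow log at runtime without a restart would be:

    SET GLOBAL slow_query_log = 1;
    SET GLOBAL log_queries_not_using_indexes = 1;
    SET GLOBAL long_query_time = 0.1;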

    However, examining a long slow query log can be a time-consuming task. There are tools to parse MySQL slow query log files and summarize their contents, like mysqldumpslow and pt-query-digest.

    Performance Schema

    Performance Schema is a great tool for monitoring MySQL Server internals and execution details at a lower level. It had a bad reputation in an early version (5.6) because enabling it often caused performance issues; however, recent versions do not harm performance. The following tables in Performance Schema can be used to find slow queries:

    • events_statements_current
    • events_statements_history
    • events_statements_history_long
    • events_statements_summary_by_digest
    • events_statements_summary_by_user_by_event_name
    • events_statements_summary_by_host_by_event_name

    MySQL 5.7.7 and higher include the sys schema, a set of objects that helps DBAs and developers interpret data collected by the Performance Schema into a more easily understandable form. Sys schema objects can be used for typical tuning and diagnosis use cases.
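
    For example, a quick way to spot the statements with the highest average latency from the digest table (timer values are in picoseconds, hence the division) would be:

    SELECT DIGEST_TEXT,
           COUNT_STAR AS exec_count,
           AVG_TIMER_WAIT/1000000000000 AS avg_latency_sec
    FROM performance_schema.events_statements_summary_by_digest
    ORDER BY AVG_TIMER_WAIT DESC
    LIMIT 10;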

    Network Tracking

    What if we don’t have access to the query log or direct application logs? In that case, we could use a combination of tcpdump and pt-query-digest which could help to capture queries.

    $ tcpdump -s 65535 -x -nn -q -tttt -i any port 3306 > mysql.tcp.txt

    Once the capture process ends, we can proceed with processing the data:

    $ pt-query-digest --limit=100% --type tcpdump mysql.tcp.txt > ptqd_tcp.out

    ClusterControl Query Monitor

    ClusterControl Query Monitor is a module in ClusterControl that provides combined information about database activity. It can gather information from multiple sources, like the process list or the slow query log, and present it in a pre-aggregated way.

    ClusterControl Top Queries

    SQL Monitoring is divided into three sections.

    Top Queries

    Presents information about queries that take a significant chunk of resources.

    ClusterControl Top Queries

    Running Queries

    A process list of information combined from all database cluster nodes into one view. You can use it to kill queries that affect your database operations.

    ClusterControl Running Queries

    Query Outliers

    Presents the list of queries with execution time longer than average.

    ClusterControl Query Outliers

    Conclusion

    This is all for part two. This blog is not intended to be an exhaustive guide to how to enhance database performance, but it hopefully gives a clearer picture of what things can become essential and some of the basic parameters that can be configured. Do not hesitate to let us know if we’ve missed any important ones in the comments below.


    by Bart Oles at January 09, 2020 08:02 PM

    January 08, 2020

    SeveralNines

    Database Performance Tuning for MariaDB

    Ever since MySQL was originally forked to form MariaDB, it has been widely supported and quickly adopted by a large audience in the open source database community. Originally a drop-in replacement, MariaDB has started to create distinction from MySQL, especially with the release of MariaDB 10.2.

    Despite this, however, there's still no real telltale difference between MariaDB and MySQL, as both have engines that are compatible and can run natively with one another. So don't be surprised if the tuning of your MariaDB setup has a similar approach to tuning MySQL.

    This blog will discuss the tuning of MariaDB, specifically those systems running in a Linux environment.

    MariaDB Hardware and System Optimization

    MariaDB recommends that you improve your hardware in the following priority order...

    Memory

    Memory is the most important factor for databases as it allows you to adjust the Server System Variables. More memory means larger key and table caches, which are stored in memory so that disk access, an order of magnitude slower, is subsequently reduced.

    Keep in mind though, simply adding more memory may not result in drastic improvements if the server variables are not set to make use of the extra available memory.

    Note that using more RAM slots on the motherboard increases the bus frequency, and there will be more latency between the RAM and the CPU. This means that using the highest RAM size per slot is preferable.

    Disks

    Fast disk access is critical, as ultimately it's where the data resides. The key figure is the disk seek time (a measurement of how fast the physical disk can move to access the data) so choose disks with as low a seek time as possible. You can also add dedicated disks for temporary files and transaction logs.

    Fast Ethernet

    With the appropriate internet bandwidth, fast Ethernet means faster responses to client requests and better replication response time when reading binary logs across the slaves. Fast response times are also very important, especially on Galera-based clusters.

    CPU

    Although hardware bottlenecks often fall elsewhere, faster processors allow calculations to be performed more quickly, and the results sent back to the client more quickly. Besides processor speed, the processor's bus speed and cache size are also important factors to consider.

    Setting Your Disk I/O Scheduler

    I/O schedulers exist as a way to optimize disk access requests. They merge I/O requests to similar locations on the disk, meaning that the disk drive doesn’t need to seek as often, which greatly improves overall response time and saves disk operations. The recommended values for I/O performance are noop and deadline.

    noop is useful for checking whether complex I/O scheduling decisions of other schedulers are causing I/O performance regressions. In some cases it can be helpful for devices that do I/O scheduling themselves, such as intelligent storage, or devices that do not depend on mechanical movement, like SSDs. Usually, the DEADLINE I/O scheduler is a better choice for these devices, but due to less overhead NOOP may produce better performance on certain workloads.

    deadline is a latency-oriented I/O scheduler. Each I/O request is assigned a deadline. Usually, requests are stored in queues (read and write) sorted by sector numbers. The DEADLINE algorithm maintains two additional queues (read and write) where the requests are sorted by deadline. As long as no request has timed out, the “sector” queue is used. If timeouts occur, requests from the “deadline” queue are served until there are no more expired requests. Generally, the algorithm prefers reads over writes.

    For PCIe devices (NVMe SSD drives), they have their own large internal queues along with fast service and do not require or benefit from setting an I/O scheduler. It is recommended to have no explicit scheduler-mode configuration parameter.

    You can check your scheduler setting with:

    cat /sys/block/${DEVICE}/queue/scheduler

    For instance, it should look like this output:

    cat /sys/block/sda/queue/scheduler
    
    [noop] deadline cfq

    To make it permanent, edit the /etc/default/grub configuration file, look for the variable GRUB_CMDLINE_LINUX and add the elevator option just like below:

    GRUB_CMDLINE_LINUX="elevator=noop"
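
    After saving the file, regenerate the GRUB configuration and reboot for the setting to take effect (on RHEL/CentOS the equivalent command is typically grub2-mkconfig -o /boot/grub2/grub.cfg):

    $ sudo update-grub
    $ sudo reboot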

    Increase Open Files Limit

    To ensure good server performance, the total number of client connections, database files, and log files must not exceed the maximum file descriptor limit on the operating system (ulimit -n). Linux systems limit the number of file descriptors that any one process may open to 1,024 per process. On active database servers (especially production ones) it can easily reach the default system limit.

    To increase this, edit /etc/security/limits.conf and specify or add the following:

    mysql soft nofile 65535
    
    mysql hard nofile 65535

    This requires a system restart. Afterwards, you can confirm by running the following:

    $ ulimit -Sn
    
    65535
    
    $ ulimit -Hn
    
    65535

    Optionally, you can set this via mysqld_safe if you are starting the mysqld process through mysqld_safe:

    [mysqld_safe]
    
    open_files_limit=4294967295

    or if you are using systemd,

    sudo tee /etc/systemd/system/mariadb.service.d/limitnofile.conf <<EOF
    [Service]
    LimitNOFILE=infinity
    EOF
    
    sudo systemctl daemon-reload

    Setting Swappiness on Linux for MariaDB

    Linux swap plays a big role in database systems. It acts like the spare tire in your vehicle: when nasty memory leaks interfere with your work, the machine will slow down... but in most cases it will still be usable to finish its assigned task.

    To apply changes to your swappiness, simply run,

    sysctl -w vm.swappiness=1

    This happens dynamically, with no need to reboot the server. To make it persistent, edit /etc/sysctl.conf and add the line,

    vm.swappiness=1

    It's pretty common to set swappiness=0, but since the release of new kernels (i.e. kernels > 2.6.32-303), changes have been made so you need to set vm.swappiness=1.
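
    You can verify the value currently in effect with:

    $ sysctl vm.swappiness
    vm.swappiness = 1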

    Filesystem Optimizations for MariaDB

    The most common file systems used in Linux environments running MariaDB are ext4 and XFS. There are also certain setups available for implementing an architecture using ZFS and BTRFS (as referenced in the MariaDB documentation).

    In addition to this, most database setups do not need to record file access time. You might want to disable this when mounting the volume into the system. To do this, edit your /etc/fstab file. For example, on a volume named /dev/md2, this is how it looks:

    /dev/md2 / ext4 defaults,noatime 0 0
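
    Assuming the same example volume mounted on /, the new option can be applied without a reboot by remounting the filesystem:

    $ mount -o remount /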

    Creating an Optimal MariaDB Instance

    Store Data On A Separate Volume

    It is always ideal to keep your database data on a separate volume, preferably on fast storage such as SSD, NVMe, or PCIe cards. That way, if your entire system volume fails, your database volume is safe and unaffected.

    Tune Up MariaDB To Utilize Memory Efficiently

    innodb_buffer_pool_size

    This is the primary value to adjust on a database server with entirely/primarily XtraDB/InnoDB tables; it can be set up to 80% of the total memory in these environments. If set to 2 GB or more, you will probably want to adjust innodb_buffer_pool_instances as well. You can set this dynamically if you are using MariaDB >= 10.2.2; otherwise, changing it requires a server restart.
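
    For example, on MariaDB >= 10.2.2 the buffer pool could be resized at runtime; the 8GB figure below is purely illustrative:

    SET GLOBAL innodb_buffer_pool_size = 8 * 1024 * 1024 * 1024;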

    tmp_memory_table_size/max_heap_table_size

    For tmp_memory_table_size (tmp_table_size), if you're dealing with large temporary tables, setting this higher provides performance gains as it will be stored in the memory. This is common on queries that are heavily using GROUP BY, UNION, or sub-queries. Although if max_heap_table_size is smaller, the lower limit will apply. If a table exceeds the limit, MariaDB converts it to a MyISAM or Aria table. You can see if it's necessary to increase by comparing the status variables Created_tmp_disk_tables and Created_tmp_tables to see how many temporary tables out of the total created needed to be converted to disk. Often complex GROUP BY queries are responsible for exceeding the limit.

    max_heap_table_size, on the other hand, is the maximum size for user-created MEMORY tables. The value set on this variable is only applicable to newly created or re-created tables, not existing ones. The smaller of max_heap_table_size and tmp_table_size also limits internal in-memory tables. When the maximum size is reached, any further attempts to insert data will receive a "table ... is full" error. Temporary tables created with CREATE TEMPORARY will not be converted to Aria, as occurs with internal temporary tables, but will also receive a table full error.

    innodb_log_file_size

    Large memory, high-speed processing, and fast I/O disks aren't new and are reasonably priced. If you are after more performance gains, especially for handling your InnoDB transactions, setting the variable innodb_log_file_size to a larger value such as 5GiB or even 10GiB is reasonable. Increasing it means that larger transactions can run without needing to perform disk I/O before committing.

    join_buffer_size

    In some cases, your queries lack proper indexing, or you simply need a particular query to run as-is. Unless it's going to be heavily called or invoked from the client side, setting this variable is best done at the session level. Increase it to get faster full joins when adding indexes is not possible, although be aware of memory issues, since joins will always allocate the minimum size.

    Set Your max_allowed_packet

    MariaDB has the same nature as MySQL when handling packets. It splits data into packets and the client must be aware of the max_allowed_packet variable value. The server will have a buffer to store the body with a maximum size corresponding to this max_allowed_packet value. If the client sends more data than max_allowed_packet size, the socket will be closed. The max_allowed_packet directive defines the maximum size of packet that can be sent.

    Setting this value too low can cause a query to stop and close its client connection, which commonly produces errors like ER_NET_PACKET_TOO_LARGE or "Lost connection to MySQL server during query". Ideally, especially for most application demands today, you can start by setting this to 512MiB. If it's a low-demand type of application, just use the default value and set this variable only per session when needed, if the data to be sent or received is larger than the default value (16MiB since MariaDB 10.2.4). In certain workloads that demand large packets to be processed, you need to adjust this higher according to your needs, especially for replication. If max_allowed_packet is too small on the slave, this also causes the slave to stop the I/O thread.
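
    Putting the memory-related variables above together, a my.cnf fragment could look like the following sketch; the numbers are illustrative starting points for a dedicated server with plenty of RAM, not recommendations for every workload:

    [mysqld]
    innodb_buffer_pool_size=12G
    innodb_buffer_pool_instances=8
    innodb_log_file_size=5G
    tmp_table_size=64M
    max_heap_table_size=64M
    max_allowed_packet=512M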

    Using Threadpool

    In some cases, this tuning might not be necessary or recommended for you. Threadpools are most efficient in situations where queries are relatively short and the load is CPU bound (OLTP workloads). If the workload is not CPU bound, you might still want to limit the number of threads to save memory for the database memory buffers.

    Using threadpool is an ideal solution especially if your system is experiencing context switching and you are finding ways to reduce this and maintain a lower number of threads than the number of clients. However, this number should also not be too low, since we also want to make maximum use of the available CPUs. Therefore there should be, ideally, a single active thread for each CPU on the machine.

    You can set thread_pool_max_threads and thread_pool_min_threads for the maximum and minimum number of threads. Unlike MySQL, these are only present in MariaDB.

    Set the variable thread_handling which determines how the server handles threads for client connections. In addition to threads for client connections, this also applies to certain internal server threads, such as Galera slave threads.
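
    As a minimal sketch, enabling the MariaDB thread pool in my.cnf could look like this; the maximum thread count is an illustrative value, not a recommendation:

    [mysqld]
    thread_handling=pool-of-threads
    thread_pool_max_threads=500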

    Tune Your Table Cache + max_connections

    If you are seeing occasional occurrences of Opening tables and Closing tables statuses in the processlist, it can signify that you need to increase your table cache. You can also monitor this via the mysql client prompt by running SHOW GLOBAL STATUS LIKE 'Open%table%'; and watching the status variables.

    For max_connections, if your application requires a lot of concurrent connections, you can start by setting this to 500.

    For table_open_cache, it should at least cover the total number of your tables, but it's best to add more depending on the type of queries you serve, since temporary tables are cached as well. For example, if you have 500 tables, it would be reasonable to start with 1500.

    For table_open_cache_instances, start by setting it to 8. This can improve scalability by reducing contention among sessions; the open tables cache can be partitioned into several smaller cache instances of size table_open_cache / table_open_cache_instances.

    For InnoDB, table_definition_cache acts as a soft limit for the number of open table instances in the InnoDB data dictionary cache. The value to be defined will set the number of table definitions that can be stored in the definition cache. If you use a large number of tables, you can create a large table definition cache to speed up opening of tables. The table definition cache takes less space and does not use file descriptors, unlike the normal table cache. The minimum value is 400. The default value is based on the following formula, capped to a limit of 2000:

    MIN(400 + table_open_cache / 2, 2000)

    If the number of open table instances exceeds the table_definition_cache setting, the LRU mechanism begins to mark table instances for eviction and eventually removes them from the data dictionary cache. The limit helps address situations in which significant amounts of memory would be used to cache rarely used table instances until the next server restart. The number of table instances with cached metadata could be higher than the limit defined by table_definition_cache, because parent and child table instances with foreign key relationships are not placed on the LRU list and are not subject to eviction from memory.

    Unlike the table_open_cache, the table_definition_cache doesn't use file descriptors, and is much smaller.
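
    A hypothetical starting point for the cache- and connection-related variables discussed above (using the 500-table example, with table_definition_cache derived from the formula given earlier) could be:

    [mysqld]
    max_connections=500
    table_open_cache=1500
    table_open_cache_instances=8
    table_definition_cache=1150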

    Dealing with Query Cache

    Preferably, we recommend disabling the query cache in all of your MariaDB setups. You need to ensure that query_cache_type=OFF and query_cache_size=0 to completely disable the query cache. Unlike MySQL, MariaDB still fully supports the query cache and has no plans to withdraw support for it. Some people claim that the query cache still provides performance benefits for them. However, this post from Percona, The MySQL query cache: Worst enemy or best friend, reveals that the query cache, if enabled, introduces overhead and can result in bad server performance.
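
    A minimal my.cnf fragment to disable it, as recommended, would be:

    [mysqld]
    query_cache_type=OFF
    query_cache_size=0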

    If you intend to use the query cache, make sure that you monitor it by running SHOW GLOBAL STATUS LIKE 'Qcache%';. Qcache_inserts contains the number of queries added to the query cache, Qcache_hits contains the number of queries that have made use of the query cache, while Qcache_lowmem_prunes contains the number of queries that were dropped from the cache due to lack of memory. Over time, the query cache may become fragmented. A high Qcache_free_blocks relative to Qcache_total_blocks may indicate fragmentation. To defragment it, run FLUSH QUERY CACHE. This will defragment the query cache without dropping any queries.

    Always Monitor Your Servers

    It is highly important that you properly monitor your MariaDB nodes. Common monitoring tools out there (like Nagios, Zabbix, or PMM) are available if you tend to prefer free and open-source tools. For corporate and fully-packed tools we suggest you give ClusterControl a try, as it does not only provide monitoring, but it also offers performance advisors, alerts and alarms which helps you improve your system performance and stay up-to-date with the current trends as you engage with the Support team. Database monitoring with ClusterControl is free and part of the Community Edition.

    Conclusion

    Tuning your MariaDB setup follows almost the same approach as MySQL, but with some disparities, since it differs in some of its features and in the versions it supports. MariaDB is now a distinct entity in the database world and has quickly gained the trust of the community. It has its own reasons for implementing things the way it does, so it is very important to know how to tune and optimize your MariaDB server(s).

    by Paul Namuag at January 08, 2020 07:09 PM

    January 07, 2020

    SeveralNines

    Using OpenVPN to Secure Access to Your Database Cluster in the Cloud

    The internet is a dangerous place, especially if you’re leaving your data unencrypted or without proper security. There are several ways to secure your data; all at different levels. You should always have a strong firewall policy,  data encryption, and a strong password policy. Another way to secure your data is by accessing it using a VPN connection. 

    Virtual Private Network (or VPN) is a connection method used to add security and privacy to private and public networks, protecting your data.

    OpenVPN is a fully-featured, open source, SSL VPN solution to secure communications. It can be used for remote access or communication between different servers or data centers. It can be installed on-prem or in the cloud, in different operating systems, and can be configured with many security options.

    In this blog, we’ll create a VPN connection to access a database in the cloud. There are different ways to achieve this goal, depending on your infrastructure and how much hardware resources you want to use for this task. 

    For example, you can create two VMs, one on-prem and another one in the cloud, and they could act as a bridge to connect your local network to the database cloud network through a Peer-to-Peer VPN connection.

    Another simpler option could be connecting to a VPN server installed in the database node using a VPN client connection configured in your local machine. In this case, we’ll use this second option. You’ll see how to configure an OpenVPN server in the database node running in the cloud, and you’ll be able to access it using a VPN client.

    For the database node, we’ll use an Amazon EC2 instance with the following configuration:

    • OS: Ubuntu Server 18.04
    • Public IP Address: 18.224.138.210
    • Private IP Address: 172.31.30.248/20
    • Opened TCP ports: 22, 3306, 1194

    How to Install OpenVPN on Ubuntu Server 18.04

    The first task is to install the OpenVPN server on your database node. Actually, the database technology used doesn’t matter as we’re working at the networking layer, but for testing purposes after configuring the VPN connection, let’s say we’re running Percona Server 8.0.

    So let’s start by installing the OpenVPN packages.

    $ apt install openvpn easy-rsa

    As OpenVPN uses certificates to encrypt your traffic, you’ll need EasyRSA for this task. It’s a CLI utility to create a root certificate authority, and request and sign certificates, including sub-CAs and certificate revocation lists.

    Note: There is a newer EasyRSA version available, but to keep the focus on the OpenVPN installation, let’s use the EasyRSA version available in the Ubuntu 18.04 repository at the moment (EasyRSA version 2.2.2-2).

    The previous command will create the directory /etc/openvpn/ for the OpenVPN configuration, and the directory /usr/share/easy-rsa/ with the EasyRSA scripts and configuration.

    To make this task easier, let’s create a symbolic link to the EasyRSA path in the OpenVPN directory (or you can just copy it):

    $ ln -s /usr/share/easy-rsa /etc/openvpn/

    Now, you need to configure EasyRSA and create your certificates. Go to the EasyRSA location and create a backup for the “vars” file:

    $ cd /etc/openvpn/easy-rsa
    
    $ cp vars vars.bak

    Edit this file, and change the following lines according to your information:

    $ vi vars
    
    export KEY_COUNTRY="US"
    
    export KEY_PROVINCE="CA"
    
    export KEY_CITY="SanFrancisco"
    
    export KEY_ORG="Fort-Funston"
    
    export KEY_EMAIL="me@myhost.mydomain"
    
    export KEY_OU="MyOrganizationalUnit"

    Then, create a new symbolic link to the openssl file:

    $ cd /etc/openvpn/easy-rsa
    
    $ ln -s openssl-1.0.0.cnf openssl.cnf

    Now, apply the vars file:

    $ cd /etc/openvpn/easy-rsa
    
    $ . vars

    NOTE: If you run ./clean-all, I will be doing a rm -rf on /etc/openvpn/easy-rsa/keys

    Run the clean-all script:

    $ ./clean-all

    And create the Diffie-Hellman key (DH):

    $ ./build-dh
    
    Generating DH parameters, 2048 bit long safe prime, generator 2
    
    This is going to take a long time
    
    .....................................................................................................................................................................+

    This last action could take a while, and when it’s finished, you will have a new DH file inside the “keys” directory in the EasyRSA directory.

    $ ls /etc/openvpn/easy-rsa/keys
    
    dh2048.pem

    Now, let’s create the CA certificates.

    $ ./build-ca
    
    Generating a RSA private key
    
    ..+++++
    
    ...+++++
    
    writing new private key to 'ca.key'
    
    -----
    
    You are about to be asked to enter information that will be incorporated
    
    into your certificate request.
    
    What you are about to enter is what is called a Distinguished Name or a DN.
    
    There are quite a few fields but you can leave some blank
    
    For some fields there will be a default value,
    
    If you enter '.', the field will be left blank.
    
    ...

    This will create the ca.crt (public certificate) and ca.key (private key). The public certificate will be required in all servers to connect to the VPN.

    $ ls /etc/openvpn/easy-rsa/keys
    
    ca.crt  ca.key

    Now that you have your CA created, let’s create the server certificate. In this case, we’ll call it “openvpn-server”:

    $ ./build-key-server openvpn-server
    
    Generating a RSA private key
    
    .......................+++++
    
    ........................+++++
    
    writing new private key to 'openvpn-server.key'
    
    -----
    
    You are about to be asked to enter information that will be incorporated
    
    into your certificate request.
    
    What you are about to enter is what is called a Distinguished Name or a DN.
    
    There are quite a few fields but you can leave some blank
    
    For some fields there will be a default value,
    
    If you enter '.', the field will be left blank.
    
    ...
    
    Certificate is to be certified until Dec 23 22:44:02 2029 GMT (3650 days)
    
    Sign the certificate? [y/n]:y
    
    
    
    1 out of 1 certificate requests certified, commit? [y/n]y
    
    
    
    Write out database with 1 new entries
    
    Data Base Updated

    This will create the CRT, CSR, and Key files for the OpenVPN server:

    $ ls /etc/openvpn/easy-rsa/keys
    
    openvpn-server.crt  openvpn-server.csr openvpn-server.key

    Now, you need to create the client certificate, and the process is pretty similar:

    $ ./build-key openvpn-client-1
    
    Generating a RSA private key
    
    .........................................................................................+++++
    
    .....................+++++
    
    writing new private key to 'openvpn-client-1.key'
    
    -----
    
    You are about to be asked to enter information that will be incorporated
    
    into your certificate request.
    
    What you are about to enter is what is called a Distinguished Name or a DN.
    
    There are quite a few fields but you can leave some blank
    
    For some fields there will be a default value,
    
    If you enter '.', the field will be left blank.
    
    ...
    
    Certificate is to be certified until Dec 24 01:45:39 2029 GMT (3650 days)
    
    Sign the certificate? [y/n]:y
    
    
    
    1 out of 1 certificate requests certified, commit? [y/n]y
    
    
    
    Write out database with 1 new entries
    
    Data Base Updated

    This will create the CRT, CSR, and Key files for the OpenVPN client:

    $ ls /etc/openvpn/easy-rsa/keys
    
    openvpn-client-1.csr  openvpn-client-1.crt openvpn-client-1.key

    At this point, you have all the certificates ready. The next step will be to create both server and client OpenVPN configuration.

    Configuring the OpenVPN Server

    As we mentioned, the OpenVPN installation creates the /etc/openvpn directory, where you will add the configuration files for both server and client roles. There is a sample configuration file for each role in /usr/share/doc/openvpn/examples/sample-config-files/, so you can copy the files to the mentioned location and modify them as you wish.

    In this case, we’ll only use the server configuration file, as it’s an OpenVPN server:

    $ cp /usr/share/doc/openvpn/examples/sample-config-files/server.conf.gz /etc/openvpn/
    
    $ gunzip /etc/openvpn/server.conf.gz

    Now, let’s see a basic server configuration file:

    $ cat /etc/openvpn/server.conf
    
    port 1194  
    
    # Which TCP/UDP port should OpenVPN listen on?
    
    proto tcp  
    
    # TCP or UDP server?
    
    dev tun  
    
    # "dev tun" will create a routed IP tunnel,"dev tap" will create an ethernet tunnel.
    
    ca /etc/openvpn/easy-rsa/keys/ca.crt  
    
    # SSL/TLS root certificate (ca).
    
    cert /etc/openvpn/easy-rsa/keys/openvpn-server.crt  
    
    # Certificate (cert).
    
    key /etc/openvpn/easy-rsa/keys/openvpn-server.key  
    
    # Private key (key). This file should be kept secret.
    
    dh /etc/openvpn/easy-rsa/keys/dh2048.pem  
    
    # Diffie hellman parameters.
    
    server 10.8.0.0 255.255.255.0  
    
    # Configure server mode and supply a VPN subnet.
    
    push "route 172.31.16.0 255.255.240.0"
    
    # Push routes to the client to allow it to reach other private subnets behind the server.
    
    keepalive 20 120  
    
    # The keepalive directive causes ping-like messages to be sent back and forth over the link so that each side knows when the other side has gone down.
    
    cipher AES-256-CBC  
    
    # Select a cryptographic cipher.
    
    persist-key  
    
    persist-tun
    
    # The persist options will try to avoid accessing certain resources on restart that may no longer be accessible because of the privilege downgrade.
    
    status /var/log/openvpn/openvpn-status.log  
    
    # Output a short status file.
    
    log /var/log/openvpn/openvpn.log  
    
    # Use log or log-append to override the default log location.
    
    verb 3  
    
    # Set the appropriate level of log file verbosity.

    Note: Change the certificate paths according to your environment. 

    And then, start the OpenVPN service using the created configuration file:

    $ systemctl start openvpn@server

    Check if the service is listening in the correct port:

    $ netstat -pltn |grep openvpn
    
    tcp        0 0 0.0.0.0:1194            0.0.0.0:* LISTEN   20002/openvpn

    Finally, in the OpenVPN server, you need to add the IP forwarding line in the sysctl.conf file to allow the VPN traffic:

    $ echo "net.ipv4.ip_forward=1" >> /etc/sysctl.conf

    And run:

    $ sysctl -p
    
    net.ipv4.ip_forward = 1

    Now, let’s see how to configure an OpenVPN client to connect to this new VPN.

    Configuring the OpenVPN Client

    In the previous point, we mentioned the OpenVPN sample configuration files, and we used the server one, so now let’s do the same but using the client configuration file.

    Copy the file client.conf from /usr/share/doc/openvpn/examples/sample-config-files/ to the corresponding location and change it as you wish.

    $ cp /usr/share/doc/openvpn/examples/sample-config-files/client.conf /etc/openvpn/

    You’ll also need the following certificates created previously to configure the VPN client:

    ca.crt
    
    openvpn-client-1.crt
    
    openvpn-client-1.key

    So, copy these files to your local machine or VM. You’ll need to add these files’ locations to the VPN client configuration file.
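
    For example, assuming you still have SSH access as root to the public IP of the database node, you could pull them down with scp (the paths shown are just an example):

    $ scp root@18.224.138.210:/etc/openvpn/easy-rsa/keys/ca.crt .
    
    $ scp root@18.224.138.210:/etc/openvpn/easy-rsa/keys/openvpn-client-1.crt .
    
    $ scp root@18.224.138.210:/etc/openvpn/easy-rsa/keys/openvpn-client-1.key .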

    Now, let’s see a basic client configuration file:

    $ cat /etc/openvpn/client.conf
    
    client  
    
    # Specify that we are a client
    
    dev tun  
    
    # Use the same setting as you are using on the server.
    
    proto tcp  
    
    # Use the same setting as you are using on the server.
    
    remote 18.224.138.210 1194  
    
    # The hostname/IP and port of the server.
    
    resolv-retry infinite  
    
    # Keep trying indefinitely to resolve the hostname of the OpenVPN server.
    
    nobind  
    
    # Most clients don't need to bind to a specific local port number.
    
    persist-key  
    
    persist-tun
    
    # Try to preserve some state across restarts.
    
    ca /Users/sinsausti/ca.crt  
    
    cert /Users/sinsausti/openvpn-client-1.crt
    
    key /Users/sinsausti/openvpn-client-1.key
    
    # SSL/TLS parms.
    
    remote-cert-tls server  
    
    # Verify server certificate.
    
    cipher AES-256-CBC  
    
    # Select a cryptographic cipher.
    
    verb 3  
    
    # Set log file verbosity.

    Note: Change the certificate paths according to your environment. 

    You can use this file to connect to the OpenVPN server from different Operating Systems like Linux, macOS, or Windows.

    In this example, we’ll use the application Tunnelblick to connect from a macOS client. Tunnelblick is a free, open source graphic user interface for OpenVPN on macOS. It provides easy control of OpenVPN clients. It comes with all the necessary packages like OpenVPN, EasyRSA, and tun/tap drivers.

    As the OpenVPN configuration files have extensions of .tblk, .ovpn, or .conf, Tunnelblick can read all of them.

    To install a configuration file, drag and drop it on the Tunnelblick icon in the menu bar or on the list of configurations in the 'Configurations' tab of the 'VPN Details' window.

    And then, press on “Connect”.

    Now, you should have some new routes in your client machine:

    $ netstat -rn # or route -n on Linux OS
    
    Destination        Gateway Flags        Netif Expire
    
    10.8.0.1/32        10.8.0.5 UGSc         utun5
    
    10.8.0.5           10.8.0.6 UH           utun5
    
    172.31.16/20       10.8.0.5 UGSc         utun5

    As you can see, there is a route to the local database network via the VPN interface, so you should be able to  access the database service using the Private Database IP Address.

    $ mysql -p -h172.31.30.248
    
    Enter password:
    
    Welcome to the MySQL monitor.  Commands end with ; or \g.
    
    Your MySQL connection id is 13
    
    Server version: 8.0.18-9 Percona Server (GPL), Release '9', Revision '53e606f'
    
    
    
    Copyright (c) 2000, 2019, Oracle and/or its affiliates. All rights reserved.
    
    
    
    Oracle is a registered trademark of Oracle Corporation and/or its
    
    affiliates. Other names may be trademarks of their respective
    
    owners.
    
    
    
    Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
    
    
    
    mysql>

    It’s working. Now you have your traffic secured using a VPN to connect to your database node.

    Conclusion

    Protecting your data is a must if you’re accessing it over the internet, on-prem, or on a mixed environment. You must know how to encrypt and secure your remote access. 

    As you could see, with OpenVPN you can reach the remote database using the local network through an encrypted connection using self-signed certificates. So, OpenVPN looks like a great option for this task. It’s an open source solution, and the installation/configuration is pretty easy. We used a basic OpenVPN server configuration, so you can look for more complex configuration in the OpenVPN official documentation to improve your OpenVPN server.

    by Sebastian Insausti at January 07, 2020 07:42 PM

    January 06, 2020

    SeveralNines

    How to Configure ClusterControl to Run on NGINX

    ClusterControl uses the Apache HTTP Server to serve its web interface, but it is also possible to use nginx. nginx + PHP FastCGI is well-known for its ability to run with a small memory footprint compared to standard Apache + PHP DSO.

    In this post, we will show you how to run ClusterControl 1.7.5 and later on the nginx web server by swapping out the default Apache web server installed during the initial deployment. This blog post does not mean that we officially support nginx; it is just an alternative approach that a portion of our users have been interested in.

    Apache Configuration

    Before we jump into nginx configurations, let’s look at how the ClusterControl web application is configured with the Apache web server. ClusterControl consists of a number of components, and some of them require specific Apache modules to run properly:

    • ClusterControl UI - Requires Apache rewrite module + PHP 5.4 and later
    • ClusterControl Controller
    • ClusterControl Notifications - Requires Apache rewrite module
    • ClusterControl SSH - Requires Apache 2.4 proxy module (wstunnel for web socket)
    • ClusterControl Cloud

    ClusterControl UI is located in Apache’s document root, which might vary depending on the operating system. For legacy OS distributions running Apache 2.2, like Ubuntu 12.04 LTS, the Apache document root is located at /var/www. More recent OS distributions run Apache 2.4 with /var/www/html as the default document root.

    Step One

    Make sure ClusterControl UI exists in the Apache document root. The document root for RedHat/CentOS and Ubuntu 14.04 LTS and later (Apache 2.4) is /var/www/html, while older Debian releases and Ubuntu 12.04 and lower use /var/www. ClusterControl UI will be installed under this document root directory and you should see something like this:

    $ ls -al /var/www/html
    
    total 16
    
    drwxr-xr-x 4 root   root 4096 Aug 8 11:42 .
    
    drwxr-xr-x 4 root   root 4096 Dec 19 03:32 ..
    
    dr-xr-xr-x 6 apache apache 4096 Dec 19 03:38 clustercontrol
    
    drwxrwx--- 3 apache apache 4096 Dec 19 03:29 cmon

    Step Two

    Apache must be able to read custom configuration files (.htaccess) under the document root directory. Thus, the installer script generates a configuration file and sets the global AllowOverride option to All. Example in /etc/httpd/conf.d/s9s.conf:

        <Directory />
    
                Options +FollowSymLinks
    
                AllowOverride All
    
        </Directory>
    
        <Directory /var/www/html>
    
                Options +Indexes +FollowSymLinks +MultiViews
    
                AllowOverride All
    
                Require all granted
    
        </Directory>

    Step Three

    ClusterControl also requires the following rewrite rules:

        RewriteEngine On
    
        RewriteRule ^/clustercontrol/ssh/term$ /clustercontrol/ssh/term/ [R=301]
    
        RewriteRule ^/clustercontrol/ssh/term/ws/(.*)$ ws://127.0.0.1:9511/ws/$1 [P,L]
    
        RewriteRule ^/clustercontrol/ssh/term/(.*)$ http://127.0.0.1:9511/$1 [P]
    
        RewriteRule ^/clustercontrol/sse/events/(.*)$ http://127.0.0.1:9510/events/$1 [P,L]

    The first 3 URL rewrite rules indicate that ClusterControl SSH URL will be rewritten to use WebSocket tunneling on port 9511. This allows ClusterControl users to access the monitored nodes via SSH directly inside the ClusterControl UI.

    You may also notice another line with "sse/events" where the URL is rewritten to port 9510 for cmon-events integration. The cmon-events application is a binary that comes with the ClusterControl Notifications package for notification integration with third-party software like Slack, Telegram, PagerDuty and webhooks.

    Step Four

    The ClusterControl suite requires the following PHP/Apache modules to be installed and enabled:

    • common
    • mysql
    • ldap
    • gd
    • curl
    • mod_proxy (websocket)

    The standard Apache installation via the package manager will install PHP to run as a dynamic shared object (DSO). Running in this mode requires you to restart Apache whenever the PHP configuration changes.

    The following command should install all required packages for ClusterControl:

    $ yum install httpd php php-mysql php-ldap php-gd php-curl mod_ssl #RHEL/CentOS
    
    $ apt-get install apache2 php5-common php5-mysql php5-ldap php5-gd libapache2-mod-php5 php5-json php5-curl #Debian/Ubuntu

    Step Five

    The ClusterControl web components must be owned by Apache web server user ("apache" for RHEL/CentOS and "www-data" for Debian/Ubuntu).
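
    If the ownership is ever wrong (for example, after copying files around manually), it can be corrected along these lines (RHEL/CentOS shown; use the www-data user and the appropriate document root on Debian/Ubuntu):

    $ chown -R apache:apache /var/www/html/clustercontrol
    
    $ chown -R apache:apache /var/www/html/cmon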

    Switching from Apache to nginx

    We would need to configure nginx to behave similarly to our Apache configuration, as most of the Severalnines tools assume that ClusterControl is running on Apache. 

    Step One

    Install ClusterControl via the installer script:

    $ wget https://severalnines.com/downloads/cmon/install-cc
    
    $ chmod 755 install-cc
    
    $ ./install-cc

    The above will install ClusterControl and its components on top of Apache web server.

    Step Two

    Enable nginx repository. Depending on your operating system, please refer to this installation guide for details.

    Step Three

    Install nginx and PHP FPM:

    $ yum install nginx php-fpm -y #RHEL/CentOS
    
    $ sudo apt-get install nginx php5-fpm -y #Debian/Ubuntu

    Step Four

    Take note that removing Apache2 directly might cause dependent PHP packages to be uninstalled as well. So we take a safer approach by just stopping it and disabling it from starting on boot:

    Systemd:

    $ systemctl stop httpd
    
    $ systemctl disable httpd

    Sysvinit RHEL/CentOS:

    $ chkconfig httpd off
    
    $ service httpd stop

    Sysvinit Debian/Ubuntu:

    $ sudo update-rc.d -f apache2 remove
    
    $ sudo service apache2 stop

    Step Five

    Open the nginx default virtual host configuration file (RHEL/CentOS: /etc/nginx/conf.d/default.conf, Debian/Ubuntu: /etc/nginx/sites-available/default) and make sure it contains the following lines:

    server {
    
            listen       0.0.0.0:80;
    
            server_name  localhost;
    
    
    
            access_log /var/log/nginx/localhost-access.log;
    
            error_log /var/log/nginx/localhost-error.log;
    
    
    
            root /var/www/html;
    
            index index.php;
    
    
    
            location ~ \.htaccess {
    
                    deny all;
    
            }
    
    
    
            location ~ \.php$ {
    
                    fastcgi_pass 127.0.0.1:9000;
    
                    fastcgi_index index.php;
    
                    fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
    
                    include /etc/nginx/fastcgi_params;
    
            }
    
    
    
            # Handle requests to /clustercontrol
    
            location /clustercontrol {
    
                    alias /var/www/html/clustercontrol/app/webroot;
    
                    try_files $uri $uri/ /clustercontrol/app/webroot/index.php;
    
            }
    
    
    
            # Equivalent of $is_args but adds an & character
    
            set $is_args_amp "";
    
            if ($is_args != "") {
    
                    set $is_args_amp "&";
    
            }
    
    
    
            # Handle requests to /clustercontrol/access
    
            location ~ "^/clustercontrol/access/(.*)$" {
    
                    try_files $uri $uri/ /clustercontrol/app/webroot/access/index.php?url=$1$is_args_amp$args;
    
            }
    
    
    
            # Handle requests to /clustercontrol/access2
    
            location ~ "^/clustercontrol/access2/(.*)$" {
    
                    try_files $uri $uri/ /clustercontrol/app/webroot/access2/index.php?url=$1$is_args_amp$args;
    
            }
    
    
    
            # Pass to cmon-events module
    
            location /clustercontrol/sse/events/ {
    
                    proxy_pass http://127.0.0.1:9510/events/;
    
            }
    
    
    
            # Pass to cmon-ssh module
    
            location /clustercontrol/ssh/term/ {
    
                    proxy_pass http://127.0.0.1:9511/;
    
            }
    
    
    
            # Pass cmon-ssh module via websocket
    
            location /clustercontrol/ssh/term/ws/ {
    
                    proxy_set_header X-Forwarded-Host $host:$server_port;
    
                    proxy_set_header X-Forwarded-Server $host;
    
                    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    
                    proxy_http_version 1.1;
    
                    proxy_set_header Upgrade $http_upgrade;
    
                    proxy_set_header Connection "upgrade";
    
                    proxy_pass http://127.0.0.1:9511/ws/;
    
            }
    
    
    
            # Handle requests to /clustercontrol/ssh
    
            location /clustercontrol/ssh/ {
    
                    try_files $uri $uri/ /clustercontrol/app/webroot/index.php?url=$1$is_args_amp$args;
    
            }
    
    
    
            # Redirect /clustercontrol/ssh/term to /term/
    
            rewrite ^/clustercontrol/ssh/term$ /clustercontrol/ssh/term/$1 permanent;
    
    
    
    }

    The above configuration example is specifically written to run ClusterControl UI on nginx in RHEL/CentOS. For other OS distributions, replace any occurrences of /var/www/html with the respective document root.

    Step Six

    Create a new virtual host configuration for HTTPS (optional):

    $ vim /etc/nginx/conf.d/s9s-ssl.conf #RHEL/CentOS
    
    $ vim /etc/nginx/sites-available/s9s-ssl #Debian/Ubuntu

    And make sure it contains the following lines:

    server {
    
            listen       443 ssl;
    
            server_name  localhost;
    
    
    
            access_log /var/log/nginx/localhost-access.log;
    
            error_log /var/log/nginx/localhost-error.log;
    
    
    
            # SSL cert and key path
    
            ssl_certificate      /etc/pki/tls/certs/s9server.crt;
    
            ssl_certificate_key  /etc/pki/tls/private/s9server.key;
    
    
    
            ssl_session_cache shared:SSL:1m;
    
            ssl_session_timeout  5m;
    
            ssl_ciphers  HIGH:!aNULL:!MD5;
    
            ssl_prefer_server_ciphers   on;
    
    
    
    
            root /var/www/html;
    
            index index.php;
    
    
    
            location ~ \.htaccess {
    
                    deny all;
    
            }
    
    
    
            location ~ \.php$ {
    
                    fastcgi_pass 127.0.0.1:9000;
    
                    fastcgi_index index.php;
    
                    fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
    
                    include /etc/nginx/fastcgi_params;
    
            }
    
    
    
            # Handle requests to /clustercontrol
    
            location /clustercontrol {
    
                    alias /var/www/html/clustercontrol/app/webroot;
    
                    try_files $uri $uri/ /clustercontrol/app/webroot/index.php;
    
            }
    
    
    
            # Equivalent of $is_args but adds an & character
    
            set $is_args_amp "";
    
            if ($is_args != "") {
    
                    set $is_args_amp "&";
    
            }
    
    
    
            # Handle requests to /clustercontrol/access
    
            location ~ "^/clustercontrol/access/(.*)$" {
    
                    try_files $uri $uri/ /clustercontrol/app/webroot/access/index.php?url=$1$is_args_amp$args;
    
            }
    
    
    
            # Handle requests to /clustercontrol/access2
    
            location ~ "^/clustercontrol/access2/(.*)$" {
    
                    try_files $uri $uri/ /clustercontrol/app/webroot/access2/index.php?url=$1$is_args_amp$args;
    
            }
    
    
    
            # Pass to cmon-events module
    
            location /clustercontrol/sse/events/ {
    
                    proxy_pass http://127.0.0.1:9510/events/;
    
            }
    
    
    
            # Pass to cmon-ssh module
    
            location /clustercontrol/ssh/term/ {
    
                    proxy_pass http://127.0.0.1:9511/;
    
            }
    
    
    
            # Pass cmon-ssh module via websocket
    
            location /clustercontrol/ssh/term/ws/ {
    
                    proxy_set_header X-Forwarded-Host $host:$server_port;
    
                    proxy_set_header X-Forwarded-Server $host;
    
                    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    
                    proxy_http_version 1.1;
    
                    proxy_set_header Upgrade $http_upgrade;
    
                    proxy_set_header Connection "upgrade";
    
                    proxy_pass http://127.0.0.1:9511/ws/;
    
            }
    
    
    
            # Handle requests to /clustercontrol/ssh
    
            location /clustercontrol/ssh/ {
    
                    try_files $uri $uri/ /clustercontrol/app/webroot/index.php?url=$1$is_args_amp$args;
    
            }
    
    
    
            # Redirect /clustercontrol/ssh/term to /term/
    
            rewrite ^/clustercontrol/ssh/term$ /clustercontrol/ssh/term/$1 permanent;
    
    
    
    }

    The above configuration example is specifically written to run ClusterControl UI on nginx in RHEL/CentOS. Replace any occurrences of the following:

    • /var/www/html with the respective document root for other OS distributions
    • /etc/pki/tls/certs/s9server.crt with /etc/ssl/certs/s9server.crt for Debian/Ubuntu
    • /etc/pki/tls/private/s9server.key with /etc/ssl/private/s9server.key for Debian/Ubuntu

    For Debian/Ubuntu, an extra step is needed to enable the new virtual host by creating a symlink under /etc/nginx/sites-enabled:

    $ sudo ln -sf /etc/nginx/sites-available/s9s-ssl /etc/nginx/sites-enabled/s9s-ssl
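
    Before moving on, it’s a good idea to validate the nginx configuration syntax:

    $ nginx -t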

    Step Seven

    Enable and start nginx and php-fpm:

    Systemd:

    $ systemctl enable php-fpm
    
    $ systemctl enable nginx
    
    $ systemctl restart php-fpm
    
    $ systemctl restart nginx

    Sysvinit RHEL/CentOS:

    $ chkconfig php-fpm on
    
    $ chkconfig nginx on
    
    $ service php-fpm start
    
    $ service nginx start

    Sysvinit Debian/Ubuntu:

    $ sudo update-rc.d -f php-fpm defaults
    
    $ sudo update-rc.d -f nginx defaults
    
    $ sudo service php-fpm start
    
    $ sudo service nginx start

    Installation is now complete. At this point, PHP should run under fastcgi mode and nginx has taken over the web server role from Apache to serve ClusterControl UI. We can verify that with any web server detector extension on your preferred web browser:
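
    Alternatively, a quick check of the response headers from the command line should now report nginx as the serving web server (the exact version in the output will vary):

    $ curl -sI http://localhost/clustercontrol/ | grep -i '^Server'
    
    Server: nginx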

    Caveats

    • Severalnines’ s9s_error_reporter might not get a complete error report on the ClusterControl UI since it doesn’t collect any nginx-related log files.
    • ClusterControl is built around a common Apache configuration. There might be some features that do not function well (although we have not encountered any malfunctions so far).
    • If you want to install ClusterControl manually on nginx (without using the ClusterControl installer script), we recommend following the Manual Installation documentation and installing ClusterControl on Apache first. Then, follow the steps under the "Switching from Apache to nginx" section to run it on nginx.

    by ashraf at January 06, 2020 08:05 PM

    Federico Razzoli

    Understanding tables usage with User Statistics (Percona Server, MariaDB)

    Let's use Percona User Statistics to analyse our most used tables, and to look for problems where they mostly matter.

    by Federico Razzoli at January 06, 2020 01:32 PM

    January 03, 2020

    SeveralNines

    Tips for Delivering MySQL Database Performance - Part One

    The database backend affects the application, which can then impact organizational performance. When this happens, those in charge tend to want a quick fix. There are many different roads to improve performance in MySQL. As a very popular choice for many organizations, it's pretty common to find a MySQL installation with the default configuration. This might not, however, be appropriate for your workload and setup needs.

    In this blog, we will help you to better understand your database workload and the things that may cause harm to it. Knowledge of how to use limited resources is essential for anyone managing the database, especially if you run your production system on MySQL DB.

    To ensure that the database performs as expected, we will start with the free MySQL monitoring tools. We will then look at the related MySQL parameters you can tweak to improve the database instance. We will also take a look at indexing as a factor in database performance management. 

    To be able to achieve optimal usage of hardware resources, we’ll take a look into kernel optimization and other crucial OS settings. Finally, we will look into trendy setups based on MySQL Replication and how it can be examined in terms of performance lag. 

    Identifying MySQL Performance Issues

    This analysis helps you to understand the health and performance of your database better. The tools listed below can help to capture and understand every transaction, letting you stay on top of its performance and resource consumption.

    PMM (Percona Monitoring and Management)

    Percona Monitoring and Management (PMM) is an open-source collection of tools dedicated to MySQL, MongoDB, and MariaDB databases (on-premise or in the cloud). PMM is free to use, and it's based on the well-known Grafana and Prometheus time series DB. It provides a thorough time-based analysis for MySQL and offers preconfigured dashboards that help you understand your database workload.

    PMM uses a client/server model. You'll have to download and install both the client and the server. For the server, you can use Docker Container. It's as easy as pulling the PMM server docker image, creating a container, and launching PMM.

    Pull PMM Server Image

    docker pull percona/pmm-server:2
    
    2: Pulling from percona/pmm-server
    
    ab5ef0e58194: Downloading  2.141MB/75.78MB
    
    cbbdeab9a179: Downloading  2.668MB/400.5MB

    Create PMM Container

    docker create \
    
       -v /srv \
    
       --name pmm-data \
    
       percona/pmm-server:2 /bin/true

    Run Container

    docker run -d \
    
       -p 80:80 \
    
       -p 443:443 \
    
       --volumes-from pmm-data \
    
       --name pmm-server \
    
       --restart always \
    
       percona/pmm-server:2

    You can also check how it looks without an installation. A demo of PMM is available here.

    Another tool that is part of the PMM tool set is Query Analytics (QAN). The QAN tool stays on top of the execution time of queries. You can even get details of individual SQL queries. It also gives a historical view of the different parameters that are critical for the optimal performance of a MySQL database server. This often helps to understand if any changes in the code could harm your performance, for example, when new code was introduced without your knowledge. A simple use would be to display current SQL queries and highlight issues to help you improve the performance of your database.

    PMM offers point-in-time and historical visibility of MySQL database performance. Dashboards can be customized to meet your specific requirements. You can even expand a particular panel to find the information you want about a past event.

    Free Database Monitoring with ClusterControl

    ClusterControl provides real-time monitoring of the entire database infrastructure. It supports various database systems starting with MySQL, MariaDB, PerconaDB, MySQL NDB Cluster, Galera Cluster (both Percona and MariaDB), MongoDB, PostgreSQL and TimescaleDB. The monitoring and deployment modules are free to use.

    ClusterControl consists of several modules. In the free ClusterControl Community Edition we can use monitoring, performance advisors, and operational reports.

    Performance advisors offer specific advice on how to address database and server issues, such as performance, security, log management, configuration, and capacity planning. Operational reports can be used to ensure compliance across hundreds of instances. However, monitoring is not management; ClusterControl also has features like backup management, automated recovery/failover, deployment/scaling, rolling upgrades, security/encryption, load balancer management, and so on.

    Monitoring & Advisors

    The ClusterControl Community Edition offers free database monitoring which provides a unified view of all of your deployments across data centers and lets you drill down into individual nodes. Similar to PMM, we can find dashboards based on real-time data. It lets you know what is happening now, with high-resolution metrics for better accuracy, pre-configured dashboards, and a wide range of third-party notification services for alerting.

    On-premises and cloud systems can be monitored and managed from one single point. Intelligent health-checks are implemented for distributed topologies, for instance, detection of network partitioning by leveraging the load balancer’s view of the database nodes.

    ClusterControl Workload Analytics is one of the monitoring components which can easily help you to track your database activities. It provides clarity into transactions/queries from applications. Performance exceptions are never expected, but they do occur and are easy to miss in a sea of data. Outlier discovery will catch any queries that suddenly start to execute much slower than usual. It tracks the moving average and standard deviation of query execution times and alerts when a value deviates from the mean by more than two standard deviations.

    As we can see from the picture below, we were able to catch some queries whose execution time changed at specific times during the day.

    To install ClusterControl click here and download the installation script. The install script will take care of the necessary installation steps. 

    You should also check out the ClusterControl Demo to see it in action.

    You can also get a docker image with ClusterControl.

    $ docker pull severalnines/clustercontrol

    For more information on this, follow this article.

    MySQL Database Indexing

    Without an index, running a query results in a scan of every row for the needed data. Creating an index on a field in a table creates an extra data structure, which holds the field value and a pointer to the record it relates to. In other words, indexing produces a shortcut, with much faster query times on expansive tables. Without an index, MySQL must begin with the first row and then read through the entire table to find the relevant rows.

    Generally speaking, indexing works best on those columns that are the subject of the WHERE clauses in your commonly executed queries.

    Tables can have multiple indexes. Managing indexes will inevitably require being able to list the existing indexes on a table. The syntax for viewing an index is below.

    To check indexes on MySQL table run:

    SHOW INDEX FROM table_name;

    Since indices are only used to speed up the search for a matching field within the records, it stands to reason that indexing fields used only for output would simply be a waste of disk space. Another side effect is that indexes can slow down insert, update, or delete operations, and thus, when not needed, should be avoided.
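
    As a simple illustration (the table and column names here are hypothetical), adding an index on a column used in a WHERE clause and verifying that the optimizer picks it up could look like this:

    CREATE INDEX idx_orders_customer_id ON orders (customer_id);
    
    EXPLAIN SELECT * FROM orders WHERE customer_id = 1001;
    
    SHOW INDEX FROM orders;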

    MySQL Database Swappiness

    On servers where MySQL is the only service running, it’s a good practice to set vm.swappiness = 1. The default value is 60, which is not appropriate for a database system.

    vi /etc/sysctl.conf
    vm.swappiness = 1
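
    The setting in /etc/sysctl.conf takes effect on the next boot; to apply it immediately you can also set it at runtime:

    sysctl -w vm.swappiness=1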

    Transparent Huge Pages

    If you are running your MySQL on RedHat, make sure that Transparent Huge Pages is disabled.

    This can be checked with the following command:

    cat /sys/kernel/mm/transparent_hugepage/enabled
    always madvise [never]

    (The value in square brackets is the active setting; [never] means that transparent huge pages are disabled.)
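
    If it is still enabled, one common way to disable it at runtime (assuming a sysfs-based kernel; make it permanent via your boot configuration or a systemd unit) is:

    echo never > /sys/kernel/mm/transparent_hugepage/enabled
    echo never > /sys/kernel/mm/transparent_hugepage/defrag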

    MySQL I/O Scheduler 

    In most distributions, the noop or deadline I/O schedulers should be enabled by default. To check, run:

    cat /sys/block/sdb/queue/scheduler 
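
    The scheduler shown in square brackets is the active one, for example "noop [deadline] cfq". If needed, you can switch it at runtime (assuming the data volume is sdb; the change does not persist across reboots unless set via kernel boot parameters or udev rules):

    echo noop > /sys/block/sdb/queue/scheduler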

    MySQL Filesystem Options

    It’s recommended to use a journaled file system like XFS, ext4, or Btrfs. MySQL works fine with all of them, and the differences will more likely come down to the supported maximum file and filesystem sizes.

    • XFS (maximum filesystem size 8EB, maximum file size 8EB)
    • EXT4 (maximum filesystem size 1EB, maximum file size 16TB)
    • BTRFS (maximum filesystem size 16EB, maximum file size 16EB)

    The default file system settings should apply fine.

    NTP Daemon

    It’s good practice to install an NTP time daemon on database servers to keep their clocks synchronized. Use one of the following system commands.

    #Red Hat
    yum install ntp
    #Debian
    sudo apt-get install ntp

    Conclusion

    This is all for part one. In the next article, we will continue with MySQL variables, operating system settings, and useful queries to gather database performance status.

    by Bart Oles at January 03, 2020 07:24 PM

    January 02, 2020

    SeveralNines

    Full MariaDB Encryption At-Rest and In-Transit for Maximum Data Protection - Part Two

    In the first part of this series, we covered in-transit encryption configuration for MariaDB replication servers, where we configured client-server and replication encryption. Picking up from the first post, where we had partially configured our full encryption (as indicated by the green arrows on the left in the diagram), in this blog post we are going to complete the encryption setup with at-rest encryption to create a fully encrypted MariaDB replication setup.

    The following diagram illustrates our current setup and the final setup that we are going to achieve:

    At-Rest Encryption

    At-rest encryption means that data-at-rest, like data files and logs, is encrypted on disk, making it almost impossible for someone to access or steal a hard disk and get access to the original data (provided that the key is secured and not stored locally). Data-at-rest encryption, also known as Transparent Data Encryption (TDE), is supported in MariaDB 10.1 and later. Note that using encryption has an overhead of roughly 5-10%, depending on the workload and cluster type.

    For MariaDB, the following MariaDB components can be encrypted at-rest:

    • InnoDB data file (shared tablespace or individual tablespace, e.g, *.ibd and ibdata1)
    • Aria data and index files.
    • Undo/redo logs (InnoDB log files, e.g, ib_logfile0 and ib_logfile1).
    • Binary/relay logs.
    • Temporary files and tables.

    The following files can not be encrypted at the moment:

    • Metadata file (for example .frm files).
    • File-based general log/slow query log. Table-based general log/slow query log can be encrypted.
    • Error log.

    MariaDB's data-at-rest encryption requires the use of a key management and encryption plugins. In this blog post, we are going to use File Key Management Encryption Plugin, which is provided by default since MariaDB 10.1.3. Note that there are a number of drawbacks using this plugin, e.g, the key can still be read by root and MySQL user, as explained in the MariaDB Data-at-Rest Encryption page.

    Generating Key File

    Let's create a dedicated directory to store our at-rest encryption stuff:

    $ mkdir -p /etc/mysql/rest
    $ cd /etc/mysql/rest

    Create a keyfile. This is the core of encryption:

    $ openssl rand -hex 32 > /etc/mysql/rest/keyfile

    Prepend the string "1;" as the key identifier to the keyfile:

    $ sed -i '1s/^/1;/' /etc/mysql/rest/keyfile

    Thus, when reading the keyfile, it should look something like this:

    $ cat /etc/mysql/rest/keyfile
    1;4eb5770dcfa691bc634cbcd3c6bed9ed4ccd0111f3d3b1dae2c51a90fbf16ed7

    The above simply means for key identifier 1, the key is 4eb... The key file needs to contain two pieces of information for each encryption key. First, each encryption key needs to be identified with a 32-bit integer as the key identifier. Second, the encryption key itself needs to be provided in hex-encoded form. These two pieces of information need to be separated by a semicolon.

    Create a password to encrypt the above key. Here we are going to store the password inside a file called "keyfile.passwd":

    $ echo -n 'mySuperStrongPassword' > /etc/mysql/rest/keyfile.passwd

    You could skip the above step if you would like to specify the password directly in the configuration file using file_key_management_filekey option. For example: file_key_management_filekey=mySuperStrongPassword

    But in this example, we are going to read the password that is stored in a file, thus we have to define the following line in the configuration file later on: 

    file_key_management_filekey=FILE:/etc/mysql/rest/keyfile.passwd

    We are going to encrypt the clear text keyfile into another file called keyfile.enc, using password inside the password file:

    $  openssl enc -aes-256-cbc -md sha1 -pass file:/etc/mysql/rest/keyfile.passwd -in /etc/mysql/rest/keyfile -out /etc/mysql/rest/keyfile.enc

    When listing out the directory, we should see these 3 files:

    $ ls -1 /etc/mysql/rest/
    keyfile
    keyfile.enc
    keyfile.passwd

    The content of the keyfile.enc is simply an encrypted version of keyfile:

    To test out, we can decrypt the encrypted file using OpenSSL by providing the password file (keyfile.passwd):

    $ openssl aes-256-cbc -d -md sha1 -pass file:/etc/mysql/rest/keyfile.passwd -in /etc/mysql/rest/keyfile.enc
    1;4eb5770dcfa691bc634cbcd3c6bed9ed4ccd0111f3d3b1dae2c51a90fbf16ed7

    We can then remove the plain key because we are going to use the encrypted one (.enc) together with the password file:

    $ rm -f /etc/mysql/rest/keyfile

    We can now proceed to configure MariaDB at-rest encryption.

    Configuring At-Rest Encryption

    We have to copy the encrypted key file and password file to the slaves so MariaDB can use them to encrypt/decrypt the data. Otherwise, an encrypted table backed up from the master using a physical backup tool like MariaDB Backup would be unreadable by the slaves (due to a different key/password combination). A logical backup like mysqldump should work with different keys and passwords.

    On the slaves, create a directory to store at-rest encryption stuff:

    (slave1)$ mkdir -p /etc/mysql/rest
    (slave2)$ mkdir -p /etc/mysql/rest

    On the master, copy the encrypted keyfile and password file to the other slaves:

    (master)$ cd /etc/mysql/rest
    (master)$ scp keyfile.enc keyfile.passwd root@slave1:/etc/mysql/rest/
    (master)$ scp keyfile.enc keyfile.passwd root@slave2:/etc/mysql/rest/

    Protect the files from global access and assign "mysql" user as the ownership:

    $ chown mysql:mysql /etc/mysql/rest/*
    $ chmod 600 /etc/mysql/rest/*

    Add the following into MariaDB configuration file under [mysqld] or [mariadb] section:

    # at-rest encryption
    plugin_load_add              = file_key_management
    file_key_management_filename = /etc/mysql/rest/keyfile.enc
    file_key_management_filekey  = FILE:/etc/mysql/rest/keyfile.passwd
    file_key_management_encryption_algorithm = AES_CBC
    
    innodb_encrypt_tables            = ON
    innodb_encrypt_temporary_tables  = ON
    innodb_encrypt_log               = ON
    innodb_encryption_threads        = 4
    innodb_encryption_rotate_key_age = 1
    encrypt-tmp-disk-tables          = 1
    encrypt-tmp-files                = 1
    encrypt-binlog                   = 1
    aria_encrypt_tables              = ON

    Take note on the file_key_management_filekey variable, if the password is in a file, you have to prefix the path with "FILE:". Alternatively, you could also specify the password string directly (not recommended due to its verbosity): 

    file_key_management_filekey=mySuperStrongPassword

    Restart MariaDB server one node at a time, starting with the slaves:

    (slave1)$ systemctl restart mariadb
    (slave2)$ systemctl restart mariadb
    (master)$ systemctl restart mariadb

    Observe the error log and make sure MariaDB encryption is activated during start up:

    $ tail -f /var/log/mysql/mysqld.log
    ...
    2019-12-17  6:44:47 0 [Note] InnoDB: Encrypting redo log: 2*67108864 bytes; LSN=143311
    2019-12-17  6:44:48 0 [Note] InnoDB: Starting to delete and rewrite log files.
    2019-12-17  6:44:48 0 [Note] InnoDB: Setting log file ./ib_logfile101 size to 67108864 bytes
    2019-12-17  6:44:48 0 [Note] InnoDB: Setting log file ./ib_logfile1 size to 67108864 bytes
    2019-12-17  6:44:48 0 [Note] InnoDB: Renaming log file ./ib_logfile101 to ./ib_logfile0
    2019-12-17  6:44:48 0 [Note] InnoDB: New log files created, LSN=143311
    2019-12-17  6:44:48 0 [Note] InnoDB: 128 out of 128 rollback segments are active.
    2019-12-17  6:44:48 0 [Note] InnoDB: Creating shared tablespace for temporary tables
    2019-12-17  6:44:48 0 [Note] InnoDB: Setting file './ibtmp1' size to 12 MB. Physically writing the file full; Please wait ...
    2019-12-17  6:44:48 0 [Note] InnoDB: File './ibtmp1' size is now 12 MB.
    2019-12-17  6:44:48 0 [Note] InnoDB: Waiting for purge to start
    2019-12-17  6:44:48 0 [Note] InnoDB: 10.4.11 started; log sequence number 143311; transaction id 222
    2019-12-17  6:44:48 0 [Note] InnoDB: Creating #1 encryption thread id 139790011840256 total threads 4.
    2019-12-17  6:44:48 0 [Note] InnoDB: Creating #2 encryption thread id 139790003447552 total threads 4.
    2019-12-17  6:44:48 0 [Note] InnoDB: Creating #3 encryption thread id 139789995054848 total threads 4.
    2019-12-17  6:44:48 0 [Note] InnoDB: Creating #4 encryption thread id 139789709866752 total threads 4.
    2019-12-17  6:44:48 0 [Note] InnoDB: Loading buffer pool(s) from /var/lib/mysql/ib_buffer_pool
    2019-12-17  6:44:48 0 [Note] Plugin 'FEEDBACK' is disabled.
    2019-12-17  6:44:48 0 [Note] Using encryption key id 1 for temporary files
    ...

    You should see lines indicating encryption initialization in the error log. At this point, the majority of the encryption configuration is now complete.

    Testing Your Encryption

    Create a test database to test on the master:

    (master)MariaDB> CREATE SCHEMA sbtest;
    (master)MariaDB> USE sbtest;

    Create a standard table without encryption and insert a row:

    MariaDB> CREATE TABLE tbl_plain (id INT AUTO_INCREMENT PRIMARY KEY, data VARCHAR(255));
    MariaDB> INSERT INTO tbl_plain SET data = 'test data';

    We can see the stored data in clear text when browsing the InnoDB data file using a hexdump tool:

    $ xxd /var/lib/mysql/sbtest/tbl_plain.ibd | less
    000c060: 0200 1c69 6e66 696d 756d 0002 000b 0000  ...infimum......
    000c070: 7375 7072 656d 756d 0900 0000 10ff f180  supremum........
    000c080: 0000 0100 0000 0000 0080 0000 0000 0000  ................
    000c090: 7465 7374 2064 6174 6100 0000 0000 0000  test data.......
    000c0a0: 0000 0000 0000 0000 0000 0000 0000 0000  ................

    Create an encrypted table and insert a row:

    MariaDB> CREATE TABLE tbl_enc (id INT AUTO_INCREMENT PRIMARY KEY, data VARCHAR(255)) ENCRYPTED=YES;
    MariaDB> INSERT INTO tbl_enc SET data = 'test data';

    We can't tell what is stored in InnoDB data file for encrypted tables:

    $ xxd /var/lib/mysql/sbtest/tbl_enc.ibd | less
    000c060: 0c2c 93e4 652e 9736 e68a 8b69 39cb 6157  .,..e..6...i9.aW
    000c070: 3cd1 581c 7eb9 84ca d792 7338 521f 0639  <.X.~.....s8R..9
    000c080: d279 9eb3 d3f5 f9b0 eccb ed05 de16 f3ac  .y..............
    000c090: 6d58 5519 f776 8577 03a4 fa88 c507 1b31  mXU..v.w.......1
    000c0a0: a06f 086f 28d9 ac17 8923 9412 d8a5 1215  .o.o(....#......

    Note that the metadata file tbl_enc.frm is not encrypted at-rest. Only the InnoDB data file (.ibd) is encrypted.
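
    You can also verify which InnoDB tablespaces are encrypted from within MariaDB by querying the information_schema (this assumes the default behaviour of the encryption plugin; a non-zero ENCRYPTION_SCHEME indicates an encrypted tablespace):

    MariaDB> SELECT NAME, ENCRYPTION_SCHEME, CURRENT_KEY_ID FROM information_schema.INNODB_TABLESPACES_ENCRYPTION WHERE NAME LIKE 'sbtest/%';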

    When comparing the "plain" binary or relay logs, we can clearly see the content of it using hexdump tool:

    $ xxd binlog.000002 | less
    0000560: 0800 0800 0800 0b04 726f 6f74 096c 6f63  ........root.loc
    0000570: 616c 686f 7374 0047 5241 4e54 2052 454c  alhost.GRANT REL
    0000580: 4f41 442c 4c4f 434b 2054 4142 4c45 532c  OAD,LOCK TABLES,
    0000590: 5245 504c 4943 4154 494f 4e20 434c 4945  REPLICATION CLIE
    00005a0: 4e54 2c45 5645 4e54 2c43 5245 4154 4520  NT,EVENT,CREATE
    00005b0: 5441 424c 4553 5041 4345 2c50 524f 4345  TABLESPACE,PROCE
    00005c0: 5353 2c43 5245 4154 452c 494e 5345 5254  SS,CREATE,INSERT
    00005d0: 2c53 454c 4543 542c 5355 5045 522c 5348  ,SELECT,SUPER,SH
    00005e0: 4f57 2056 4945 5720 4f4e 202a 2e2a 2054  OW VIEW ON *.* T

    While for an encrypted binary log, the content looks gibberish:

    $ xxd binlog.000004 | less
    0000280: 4a1d 1ced 2f1b db50 016a e1e9 1351 84ba  J.../..P.j...Q..
    0000290: 38b6 72e7 8743 7713 afc3 eecb c36c 1b19  8.r..Cw......l..
    00002a0: 7b3f 6176 208f 0000 00dc 85bf 6768 e7c6  {?av .......gh..
    00002b0: 6107 5bea 241c db12 d50c 3573 48e5 3c3d  a.[.$.....5sH.<=
    00002c0: 3179 1653 2449 d408 1113 3e25 d165 c95b  1y.S$I....>%.e.[
    00002d0: afb0 6778 4b26 f672 1bc7 567e da96 13f5  ..gxK&.r..V~....
    00002e0: 2ac5 b026 3fb9 4b7a 3ef4 ab47 6c9f a686  *..&?.Kz>..Gl...

    Encrypting Aria Tables

    The Aria storage engine does not support the ENCRYPTED option in CREATE/ALTER TABLE statements, since it follows the aria_encrypt_tables global option instead. Therefore, when creating an Aria table, simply create the table with the ENGINE=Aria option:

    MariaDB> CREATE TABLE tbl_aria_enc (id INT AUTO_INCREMENT PRIMARY KEY, data VARCHAR(255)) ENGINE=Aria;
    MariaDB> INSERT INTO tbl_aria_enc(data) VALUES ('test data');
    MariaDB> FLUSH TABLE tbl_aria_enc;

    We can then verify the content of the table's data file (tbl_aria_enc.MAD) or index file (tbl_aria_enc.MAI) with hexdump tool. To encrypt an existing Aria table, the table needs to be re-built:

    MariaDB> ALTER TABLE db.aria_table ENGINE=Aria ROW_FORMAT=PAGE;

    This statement causes Aria to rebuild the table using the ROW_FORMAT table option. In the process, with the new default setting, it encrypts the table when it writes to disk.

    Encrypting General Log/Slow Query Log

    To encrypt general and slow query logs, we can set MariaDB log_output option to 'TABLE' instead of the default 'FILE':

    MariaDB> SET GLOBAL log_output = 'TABLE';

    However, MariaDB will by default create the necessary tables using CSV storage engine, which is not encrypted by MariaDB. No engines other than CSV, MyISAM or Aria are legal for the log tables. The trick is to rebuild the default CSV table with Aria storage engine, provided that aria_encrypt_tables option is set to ON. However, the respective log option must be turned off for the table alteration to succeed.

    Thus, the steps to encrypt general log table is:

    MariaDB> SET GLOBAL general_log = OFF;
    MariaDB> ALTER TABLE mysql.general_log ENGINE=Aria;
    MariaDB> SET GLOBAL general_log = ON;

    Similarly, for slow query log:

    MariaDB> SET GLOBAL slow_query_log = OFF;
    MariaDB> ALTER TABLE mysql.slow_log ENGINE=Aria;
    MariaDB> SET GLOBAL slow_query_log = ON;

    Verify the output of general logs within the server:

    MariaDB> SELECT * FROM mysql.general_log;
    +----------------------------+---------------------------+-----------+-----------+--------------+------------------------------+
    | event_time                 | user_host                 | thread_id | server_id | command_type | argument                     |
    +----------------------------+---------------------------+-----------+-----------+--------------+------------------------------+
    | 2019-12-17 07:45:53.109558 | root[root] @ localhost [] |        19 |     28001 |        Query | select * from sbtest.tbl_enc |
    | 2019-12-17 07:45:55.504710 | root[root] @ localhost [] |        20 |     28001 |        Query | select * from general_log    |
    +----------------------------+---------------------------+-----------+-----------+--------------+------------------------------+

    As well as the encrypted content of the Aria data file inside data directory using hexdump tool:

    $ xxd /var/lib/mysql/mysql/general_log.MAD | less
    0002040: 1d45 820d 7c53 216c 3fc6 98a6 356e 1b9e  .E..|S!l?...5n..
    0002050: 6bfc e193 7509 1fa7 31e2 e22a 8f06 3c6f  k...u...1..*..<o
    0002060: ae71 bb63 e81b 0b08 7120 0c99 9f82 7c33  .q.c....q ....|3
    0002070: 1117 bc02 30c1 d9a7 c732 c75f 32a6 e238  ....0....2._2..8
    0002080: d1c8 5d6f 9a08 455a 8363 b4f4 5176 f8a1  ..]o..EZ.c..Qv..
    0002090: 1bf8 113c 9762 3504 737e 917b f260 f88c  ...<.b5.s~.{.`..
    00020a0: 368e 336f 9055 f645 b636 c5c1 debe fbe7  6.3o.U.E.6......
    00020b0: d01e 028f 8b75 b368 0ef0 8889 bb63 e032  .....u.h.....c.2

    MariaDB at-rest encryption is now complete. Combined with the in-transit encryption we configured in the first post, our final architecture now looks like this:

    Conclusion

    It's now possible to fully secure your MariaDB databases via encryption for protection against physical and virtual breaches or theft. ClusterControl can help you maintain this type of security as well and you can download it for free here.

     

    by ashraf at January 02, 2020 10:45 AM

    January 01, 2020

    SeveralNines

    Full MariaDB Encryption At-Rest and In-Transit for Maximum Data Protection - Part One

    In this blog series, we are going to give you a complete walkthrough on how to configure a fully encrypted MariaDB server for at-rest and in-transit encryption, to ensure maximum protection of the data from being stolen physically or while transferring and communicating with other hosts. The basic idea is that we are going to turn our "plain" deployment into a fully encrypted MariaDB replication setup, as simplified in the following diagram:

    We are going to configure a number of encryption components:

    • In-transit encryption, which consists of:
      • Client-server encryption
      • Replication encryption
    • At-rest encryption, which consists of:
      • Data file encryption
      • Binary/relay log encryption.

    Note that this blog post only covers in-transit encryption. We are going to cover at-rest encryption in the second part of this blog series.

    This deployment walkthrough assumes that we already have a running MariaDB replication setup. If you don't have one, you can use ClusterControl to deploy a new MariaDB replication setup within minutes, with fewer than 5 clicks. All servers are running MariaDB 10.4.11 on CentOS 7.

    In-Transit Encryption

    Data can be exposed to risks both in transit and at rest, and requires protection in both states. In-transit encryption protects your data if communications are intercepted while data moves between hosts through the network, whether between your site and the cloud provider, between services, or between clients and the server.

    For MySQL/MariaDB, data is in motion when a client connects to a database server, or when a slave node replicates data from a master node. MariaDB supports encrypted connections between clients and the server using the TLS (Transport Layer Security) protocol. TLS is sometimes referred to as SSL (Secure Sockets Layer), but MariaDB does not actually use the SSL protocol for encrypted connections because its encryption is weak. More details on this are available on the MariaDB documentation page.

    Client-Server Encryption

    In this setup we are going to use self-signed certificates, which means we do not use external parties like Google, Comodo or any popular Certificate Authority provider out there to verify our identity. In SSL/TLS, identity verification is the first step that must be passed before the server and client exchange their certificates and keys.

    MySQL provides a very handy tool called mysql_ssl_rsa_setup which takes care of the key and certificate generation automatically. Unfortunately, there is no such tool for MariaDB server yet. Therefore, we have to manually prepare and generate the SSL-related files for our MariaDB TLS needs.

    The following is a list of the files that we will generate using OpenSSL tool:

    • CA key - RSA private key in PEM format. Must be kept secret.
    • CA certificate - X.509 certificate in PEM format. Contains public key and certificate metadata.
    • Server CSR - Certificate signing request. The Common Name (CN) when filling the form is important, for example CN=mariadb-server
    • Server key - RSA private key. Must be kept secret.
    • Server cert - X.509 certificate signed by CA key. Contains public key and certificate metadata.
    • Client CSR - Certificate signing request. Must use a different Common Name (CN) than Server's CSR, for example CN=client1 
    • Client key - RSA private key. Must be kept secret.
    • Client cert - X.509 certificate signed by CA key. Contains public key and certificate metadata.

    First and foremost, create a directory to store our certs and keys for in-transit encryption:

    $ mkdir -p /etc/mysql/transit/
    $ cd /etc/mysql/transit/

    We name the directory "transit" because, in the next part of this blog series, we will create another directory for at-rest encryption at /etc/mysql/rest.

    Certificate Authority

    Generate a key file for our own Certificate Authority (CA):

    $ openssl genrsa 2048 > ca-key.pem
    Generating RSA private key, 2048 bit long modulus
    .......................+++
    ...............................................................................................................................................................................................................................................+++
    e is 65537 (0x10001)

    Generate a certificate for our own Certificate Authority (CA) based on the ca-key.pem generated before, with an expiration of 3650 days:

    $ openssl req -new -x509 -nodes -days 3650 -key ca-key.pem -out ca.pem
    You are about to be asked to enter information that will be incorporated
    into your certificate request.
    What you are about to enter is what is called a Distinguished Name or a DN.
    There are quite a few fields but you can leave some blank
    For some fields there will be a default value,
    If you enter '.', the field will be left blank.
    -----
    Country Name (2 letter code) [XX]:SE
    State or Province Name (full name) []:Stockholm
    Locality Name (eg, city) [Default City]:Stockholm
    Organization Name (eg, company) [Default Company Ltd]:Severalnines
    Organizational Unit Name (eg, section) []:
    Common Name (eg, your name or your server's hostname) []:CA
    Email Address []:info@severalnines.com

    Now we should have ca-key.pem and ca.pem under this working directory.

    Key and Certificate for Server

    Next, generate private key for the MariaDB server:

    $ openssl genrsa 2048 > server-key.pem
    Generating RSA private key, 2048 bit long modulus
    .............................................................................................................+++
    ..................................................................................................................+++
    e is 65537 (0x10001)

    A trusted certificate must be signed by a Certificate Authority; here we are going to use our own CA, because we trust the hosts in the network. Before we can create a signed certificate, we need to generate a request for it, called a Certificate Signing Request (CSR).

    Create a CSR for the MariaDB server. We are going to call this request server-req.pem. This is not the certificate that we are going to use for the MariaDB server. The final certificate is the one that will be signed by our own CA private key (as shown in the next step):

    $ openssl req -new -key server-key.pem -out server-req.pem
    You are about to be asked to enter information that will be incorporated
    into your certificate request.
    What you are about to enter is what is called a Distinguished Name or a DN.
    There are quite a few fields but you can leave some blank
    For some fields there will be a default value,
    If you enter '.', the field will be left blank.
    -----
    Country Name (2 letter code) [XX]:SE
    State or Province Name (full name) []:Stockholm
    Locality Name (eg, city) [Default City]:Stockholm
    Organization Name (eg, company) [Default Company Ltd]:Severalnines
    Organizational Unit Name (eg, section) []:
    Common Name (eg, your name or your server's hostname) []:MariaDBServer
    Email Address []:info@severalnines.com
    
    Please enter the following 'extra' attributes
    to be sent with your certificate request
    A challenge password []:
    An optional company name []:

    Take note of the Common Name, where we specified "MariaDBServer". This can be any name, but the value must not be the same as the client certificate's. Commonly, if the applications connect to the MariaDB server via FQDN or hostname (skip-name-resolve=OFF), you probably want to specify the MariaDB server's FQDN as the Common Name. Doing so allows you to connect with server certificate verification enabled, since the client can then check that the certificate actually belongs to the host it is connecting to.
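
    For example (a hedged sketch; "mariadb1.mydomain.com" is a placeholder FQDN and we assume the server certificate's CN matches it), the client could then verify the server's identity during the handshake:

    (client)$ mysql -usbtest -p -hmariadb1.mydomain.com --ssl-ca ca.pem --ssl-verify-server-cert -e 'status'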

    We can then generate the final X.509 certificate (server-cert.pem) and sign the CSR (server-req.pem) with CA's certificate (ca.pem) and CA's private key (ca-key.pem):

    $ openssl x509 -req -in server-req.pem -CA ca.pem -CAkey ca-key.pem -CAcreateserial -out server-cert.pem -days 3650 -sha256
    Signature ok
    subject=/C=SE/ST=Stockholm/L=Stockholm/O=Severalnines/CN=MariaDBServer/emailAddress=info@severalnines.com
    Getting CA Private Key

    At this point, this is what we have now:

    $ ls -1 /etc/mysql/transit
    ca-key.pem
    ca.pem
    server-cert.pem
    server-key.pem
    server-req.pem

    We only need the signed certificate (server-cert.pem) and the private key (server-key.pem) for the MariaDB server. The CSR (server-req.pem) is no longer required.
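
    If you want to double-check what was just signed (optional, but a useful sanity check), you can inspect the server certificate's subject, issuer and validity dates with openssl:

    $ openssl x509 -in server-cert.pem -noout -subject -issuer -dates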

    Key and Certificate for the Client

    Next, we need to generate key and certificate files for the MariaDB client. The MariaDB server will only accept remote connections from clients that present these certificate files.

    Start by generating a 2048-bit key for the client:

    $ openssl genrsa 2048 > client-key.pem
    Generating RSA private key, 2048 bit long modulus
    .............................................................................................................+++
    ..................................................................................................................+++
    e is 65537 (0x10001)

    Create CSR for the client called client-req.pem:

    $ openssl req -new -key client-key.pem -out client-req.pem
    You are about to be asked to enter information that will be incorporated
    into your certificate request.
    What you are about to enter is what is called a Distinguished Name or a DN.
    There are quite a few fields but you can leave some blank
    For some fields there will be a default value,
    If you enter '.', the field will be left blank.
    -----
    Country Name (2 letter code) [XX]:SE
    State or Province Name (full name) []:Stockholm
    Locality Name (eg, city) [Default City]:Stockholm
    Organization Name (eg, company) [Default Company Ltd]:Severalnines
    Organizational Unit Name (eg, section) []:
    Common Name (eg, your name or your server's hostname) []:Client1
    Email Address []:info@severalnines.com
    
    Please enter the following 'extra' attributes
    to be sent with your certificate request
    A challenge password []:
    An optional company name []:

    Pay attention to the Common Name, where we specify "Client1". Specify any name that represents the client. This value must be different from the server's Common Name. For advanced usage, you can use this Common Name to allow only a user whose certificate subject matches this value, for example:

    MariaDB> GRANT SELECT ON schema1.* TO 'client1'@'192.168.0.93' IDENTIFIED BY 's' REQUIRE SUBJECT '/CN=Client1';

    We can then generate the final X.509 certificate (client-cert.pem) and sign the CSR (client-req.pem) with CA's certificate (ca.pem) and CA's private key (ca-key.pem):

    $ openssl x509 -req -in client-req.pem -CA ca.pem -CAkey ca-key.pem -CAcreateserial -out client-cert.pem -days 3650 -sha256
    Signature ok
    subject=/C=SE/ST=Stockholm/L=Stockholm/O=Severalnines/CN=Client1/emailAddress=info@severalnines.com
    Getting CA Private Key

    All certificates that we need for in-transit encryption setup are generated. Verify both certificates are correctly signed by the CA:

    $ openssl verify -CAfile ca.pem server-cert.pem client-cert.pem
    server-cert.pem: OK
    client-cert.pem: OK

    Configuring SSL for MariaDB

    Create a new directory on every slave:

    (slave1)$ mkdir -p /etc/mysql/transit/
    (slave2)$ mkdir -p /etc/mysql/transit/

    Copy the encryption files to all slaves:

    $ scp -r /etc/mysql/transit/* root@slave1:/etc/mysql/transit/
    $ scp -r /etc/mysql/transit/* root@slave2:/etc/mysql/transit/

    Change the owner of the certs directory contents to the "mysql" user and change the permissions of all key files so they won't be readable globally:

    $ cd /etc/mysql/transit
    $ chown -R mysql:mysql *
    $ chmod 600 client-key.pem server-key.pem ca-key.pem

    Here is what you should see when listing out files under "transit" directory:

    $ ls -al /etc/mysql/transit
    total 32
    drwxr-xr-x. 2 root  root 172 Dec 14 04:42 .
    drwxr-xr-x. 3 root  root 24 Dec 14 04:18 ..
    -rw-------. 1 mysql mysql 1675 Dec 14 04:19 ca-key.pem
    -rw-r--r--. 1 mysql mysql 1383 Dec 14 04:22 ca.pem
    -rw-r--r--. 1 mysql mysql 1383 Dec 14 04:42 client-cert.pem
    -rw-------. 1 mysql mysql 1675 Dec 14 04:42 client-key.pem
    -rw-r--r--. 1 mysql mysql 1399 Dec 14 04:42 client-req.pem
    -rw-r--r--. 1 mysql mysql 1391 Dec 14 04:34 server-cert.pem
    -rw-------. 1 mysql mysql 1679 Dec 14 04:28 server-key.pem
    -rw-r--r--. 1 mysql mysql 1415 Dec 14 04:31 server-req.pem

    Next, we will enable the SSL connection for MariaDB. On every MariaDB host (master and slaves) edit the configuration file and add the following lines under [mysqld] section:

    ssl-ca=/etc/mysql/transit/ca.pem
    ssl-cert=/etc/mysql/transit/server-cert.pem
    ssl-key=/etc/mysql/transit/server-key.pem

    Restart MariaDB server one node at a time, starting from slaves and finally on the master:

    (slave1)$ systemctl restart mariadb
    (slave2)$ systemctl restart mariadb
    (master)$ systemctl restart mariadb

    After the restart, MariaDB is capable of accepting both plain connections (when you connect without any SSL-related parameters) and encrypted connections (when you specify SSL-related parameters in the connection string).
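
    To confirm that the server has picked up the certificates after the restart, you can check the SSL-related variables (a quick sanity check; the output shown is illustrative):

    MariaDB> SHOW GLOBAL VARIABLES LIKE 'have_ssl';
    +---------------+-------+
    | Variable_name | Value |
    +---------------+-------+
    | have_ssl      | YES   |
    +---------------+-------+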

    For ClusterControl users, you can enable client-server encryption in a matter of clicks. Just go to ClusterControl -> Security -> SSL Encryption -> Enable -> Create Certificate -> Certificate Expiration -> Enable SSL:

    ClusterControl will generate the required keys, X.509 certificate and CA certificate and set up SSL encryption for client-server connections for all the nodes in the cluster. For MySQL/MariaDB replication, the SSL files will be located under /etc/ssl/replication/cluster_X, where X is the cluster ID on every database node. The same certificates will be used on all nodes and the existing ones might be overwritten. The nodes must be restarted individually after this job completes. We recommend that you first restart a replication slave and verify that the SSL settings work.

    To restart every node, go to ClusterControl -> Nodes -> Node Actions -> Restart Node. Restart one node at a time, starting with the slaves. The last node should be the master node, with the force stop flag enabled:

    You can tell if a node is able to handle client-server encryption by looking at the green lock icon right next to the database node in the Overview grid:

    At this point, our cluster is now ready to accept SSL connection from MySQL users.

    Connecting via Encrypted Connection

    The MariaDB client requires all the client-related SSL files that we generated on the server. Copy the generated client certificate, CA certificate and client key to the client host:

    $ cd /etc/mysql/transit
    $ scp client-cert.pem client-key.pem ca.pem root@client-host:~

    Note: ClusterControl generates the client SSL files under /etc/ssl/replication/cluster_X/ on every database node, where X is the cluster ID.

    Create a database user that requires SSL on the master:

    MariaDB> CREATE SCHEMA sbtest;
    MariaDB> CREATE USER sbtest@'%' IDENTIFIED BY 'mysecr3t' REQUIRE SSL;
    MariaDB> GRANT ALL PRIVILEGES ON sbtest.* to sbtest@'%';

    From the client host, connect to the MariaDB server with SSL-related parameters. We can verify the connection status by using "STATUS" statement:

    (client)$ mysql -usbtest -p -h192.168.0.91 -P3306 --ssl-cert client-cert.pem --ssl-key client-key.pem --ssl-ca ca.pem -e 'status'
    ...
    Current user: sbtest@192.168.0.19
    SSL: Cipher in use is DHE-RSA-AES256-GCM-SHA384
    ...

    Pay attention to the SSL line, which shows the cipher used for the encryption. This means the client has successfully connected to the MariaDB server via an encrypted connection.
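
    Alternatively, from within the connected session you can inspect the Ssl_cipher status variable, which is empty for plain connections and populated for encrypted ones:

    MariaDB> SHOW SESSION STATUS LIKE 'Ssl_cipher';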

    At this point, we have encrypted the client-server connection to the MariaDB server, as represented by the green two-headed arrow in the following diagram:

    In the next part, we are going to encrypt replication connections between nodes.

    Replication Encryption

    Setting up encrypted connections for replication is similar to doing so for client-server connections. We can use the same client certificates, key and CA certificate to let the replication user access the master over an encrypted channel. This will indirectly enable encryption between nodes when the slave IO thread pulls replication events from the master.

    Let's configure this on one slave at a time. For the first slave, 192.168.0.92, add the following lines under the [client] section inside the MariaDB configuration file:

    [client]
    ssl-ca=/etc/mysql/transit/ca.pem
    ssl-cert=/etc/mysql/transit/client-cert.pem
    ssl-key=/etc/mysql/transit/client-key.pem

    Stop the replication thread on the slave:

    (slave)MariaDB> STOP SLAVE;

    On the master, alter the existing replication user to force it to connect using SSL:

    (master)MariaDB> ALTER USER rpl_user@192.168.0.92 REQUIRE SSL;
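
    You can confirm the requirement took effect by checking the grants for the replication user (a quick sanity check; the output should include REQUIRE SSL):

    (master)MariaDB> SHOW GRANTS FOR 'rpl_user'@'192.168.0.92';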

    On the slave, test the connectivity to the master, 192.168.0.91, via the mysql command line with the --ssl flag:

    (slave)$ mysql -urpl_user -p -h192.168.0.91 -P 3306 --ssl -e 'status'
    ...
    Current user: rpl_user@192.168.0.92
    SSL: Cipher in use is DHE-RSA-AES256-GCM-SHA384
    ...

    Make sure you can get connected to the master host without error. Then, on the slave, specify the CHANGE MASTER statement with SSL parameters as below:

    (slave)MariaDB> CHANGE MASTER TO MASTER_SSL = 1, MASTER_SSL_CA = '/etc/mysql/transit/ca.pem', MASTER_SSL_CERT = '/etc/mysql/transit/client-cert.pem', MASTER_SSL_KEY = '/etc/mysql/transit/client-key.pem';

    Start the replication slave:

    (slave)MariaDB> START SLAVE;

    Verify that the replication is running okay with related SSL parameters:

    MariaDB> SHOW SLAVE STATUS\G
    ...
                  Slave_IO_Running: Yes
                 Slave_SQL_Running: Yes
                Master_SSL_Allowed: Yes
                Master_SSL_CA_File: /etc/mysql/transit/ca.pem
                   Master_SSL_Cert: /etc/mysql/transit/client-cert.pem
                    Master_SSL_Key: /etc/mysql/transit/client-key.pem
    ...

    The slave is now replicating from the master securely via TLS encryption.

    Repeat all of the above steps on the remaining slave, 192.168.0.93. The only difference is the ALTER USER statement executed on the master, where we have to specify the remaining slave's host:

    (master)MariaDB> ALTER USER rpl_user@192.168.0.93 REQUIRE SSL;

    At this point we have completed in-transit encryption as illustrated by the green lines from master to slaves in the following diagram:

    You can verify the encrypted connection by looking at the tcpdump output for interface eth1 on the slave. The following is an example of standard replication without encryption:

    (plain-slave)$ tcpdump -i eth1 -s 0 -l -w - 'src port 3306 or dst port 3306' | strings
    tcpdump: listening on eth1, link-type EN10MB (Ethernet), capture size 262144 bytes
    H"-'
    binlog.000008Ulw
    binlog.000008Ulw
    sbtest
    sbtest
    create table t1 (id INT AUTO_INCREMENT PRIMARY KEY, data VARCHAR(255))
    binlog.000008
    sbtest
    BEGIN3
    sbtest
    test data3
    Ok*Z
    binlog.000008*Z
    
    ^C11 packets captured
    11 packets received by filter
    0 packets dropped by kernel

    We can clearly see the text as read by the slave from the master. On an encrypted connection, you should instead see gibberish characters like below:

    (encrypted-slave)$ tcpdump -i eth1 -s 0 -l -w - 'src port 3306 or dst port 3306' | strings
    tcpdump: listening on eth1, link-type EN10MB (Ethernet), capture size 262144 bytes
    :|f^yb#
    O5~_
    @#PFh
    k)]O
    jtk3c
    @NjN9_a
    !\-@
    NrF
    ?7&Y
    
    ^C6 packets captured
    6 packets received by filter
    0 packets dropped by kernel

    Conclusion

    In the next part of this blog series we are going to look into completing our fully encrypted setup with MariaDB at-rest encryption. Stay tuned!

    by ashraf at January 01, 2020 10:45 AM

    December 31, 2019

    SeveralNines

    An Overview of Multi-Document ACID Transactions in MongoDB and How to Use Them

    Database systems have a mandate to guarantee data consistency and integrity, especially when critical data is involved. These aspects are enforced through ACID transactions in MongoDB. An ACID transaction should meet defined rules for data validity before making any updates to the database; otherwise it should be aborted and no changes made to the database. Each database transaction is treated as a single logical operation, and during execution the database may be put in an inconsistent state until the changes have been committed. Operations that successfully change the state of the database are termed write transactions, whereas those that do not update the database but only retrieve data are referred to as read-only transactions. ACID is an acronym for Atomicity, Consistency, Isolation, and Durability.

    A database is a shared resource that can be accessed by different users at different times or at the same time. For this reason, concurrent transactions may happen and, if not well managed, they may result in system crashes, hardware failure, deadlocks, slow database performance or repetition in the execution of the same transaction.

    What Are ACID Rules?

    All database systems must meet the ACID properties in order to guarantee data integrity.

    Atomicity

    A transaction is considered a single unit of operation which can either succeed completely or fail completely. A transaction cannot be executed partially. If any operation constituting a transaction fails, the entire transaction fails and the database remains unchanged. For example, if you want to transfer funds from account X to Y, there are two operations: the first removes funds from X and the second records the funds in Y. If the first operation fails, the whole transaction will be aborted.

    Consistency

    When an operation is issued, before execution, the database is in a consistent state and it should remain so after every transaction. Even if there is an update, the transaction should always bring the database to a valid state, maintaining the database invariants. For instance, you cannot delete a primary key which has been referenced as a foreign key in another collection.  All data must meet the defined constraints to prevent data corruption from an illegal transaction.

    Isolation

    Multiple transactions running concurrently are executed without affecting each other and their result should be the same if they were to be executed sequentially. When two or more transactions modify the same documents in MongoDB, there may be a conflict. The database will detect a conflict immediately before it is committed. The first operation to acquire a lock on the document will continue whereas the other will fail and a conflict error message will be presented.

    Durability

    This dictates that, once the transaction has been committed, the changes should be upheld at all times, even in the event of a system failure, for example due to power outages or network disconnection.

    MongoDB ACID Transactions

    MongoDB is a document-based NoSQL database with a flexible schema. Transactions are not operations that should be executed for every write, since they incur a greater performance cost than single-document writes. With a document-based structure and a denormalized data model, the need for transactions is minimized. Since MongoDB allows document embedding, you don't necessarily need a multi-document transaction to satisfy a write operation.

    MongoDB version 4.0 provides multi-document transaction support for replica set deployments only; version 4.2 will probably extend support to sharded deployments (per their release notes).

    Example of a transaction:

    Ensure you have a replica set in place first. Assuming you have a database called app and a collection users, run the following commands in the Mongo Shell:

    $mongo and you should see something like username:PRIMARY>

    $use app
    
    $db.users.insert([{_id:1, name: 'Brian'}, {_id:2, name: 'Sheila'}, {_id:3, name: 'James'}])

    We need to start a session for our transaction:

    $session = db.getMongo().startSession() and you should see something like 
    
    session { "id" : UUID("dcfa8de5-627d-3b1c-a890-63c9a355520c") }

    Using this session, we can add more users using a transaction with the following commands:

    $session.startTransaction()
    
    session.getDatabase('app').users.insert({_id:4, name: 'Hitler'})

    You will be presented with WriteResult({"nInserted": 1})

    The transaction has not yet been committed and the normal $db.users.find({}) will give us the previously saved users only. But if we run the 

    $session.getDatabase(“app”).users.find()

    the last added record will be available in the returned results. To commit this transaction, we run the command below

    $session.commitTransaction()

    Until commit, the transaction's modifications are kept in memory; once committed, they are made durable, which is why the data will still be available on recovery even after a failure.

    Multi-Document ACID Transactions in MongoDB

    These are multi-statement operations that need to be executed sequentially without affecting each other. For the sample above, we can run one transaction with two operations: one to add a user and another to update a user with an age field, i.e.

    $session.startTransaction()
    
       session.getDatabase('app').users.insert({_id:6, name: 'Ibrahim'})
    
       session.getDatabase('app').users.updateOne({_id:3}, {$set: {age: 50}})
    
    session.commitTransaction()

    Transactions can be applied to operations against multiple documents contained in one or many collections/databases. Changes made by a transaction do not impact performance for workloads that are unrelated to it or do not require it. Until the transaction is committed, uncommitted writes are neither replicated to the secondary nodes nor readable outside the transaction.

    Best Practices for MongoDB Transactions

    Multi-document transactions are only supported by the WiredTiger storage engine. As mentioned before, very few applications require transactions and, when they do, we should try to keep them short. Otherwise, performing an excessive number of operations in a single ACID transaction can put high pressure on the WiredTiger cache. The cache must maintain state for all subsequent writes since the oldest snapshot was created. This means new writes will accumulate in the cache throughout the duration of the transaction and will be flushed only after transactions currently running on old snapshots are committed or aborted. For the best database performance with transactions, developers should consider:

    1. Always modify a small number of documents in a transaction. Otherwise, you will need to break the transaction into different parts and process the documents in different batches. At most, process 1000 documents at a time.
    2. Temporary exceptions, such as waiting for a primary to be elected and transient network hiccups, may result in the transaction being aborted. Developers should establish logic to retry the transaction if the defined errors are presented.
    3. Configure optimal duration for the execution of the transaction from the default 60 seconds provided by MongoDB. Besides, employ indexing so that it can allow fast data access within the transaction.  You also have the flexibility to fine-tune the transaction in addressing timeouts by breaking it into batches that allow its execution within the time limits.
    4. Decompose your transaction into a small set of operations so that it fits the 16MB size constraint. Otherwise, if the operations together with the oplog description exceed this limit, the transaction will be aborted.
    5. All data relating to an entity should be stored in a single, rich document structure. This is to reduce the number of documents that are to be cached when different fields are going to be changed.

    Limitations of Transactions

    1. You cannot create or drop a collection inside a transaction.
    2. Transactions cannot make writes to a capped collection
    3. Transactions add execution-time overhead and can slow down the performance of the database.
    4. Transaction size is limited to 16MB requiring one to split any that tends to exceed this size into smaller transactions.
    5. Subjecting a large number of documents to a transaction may exert excessive pressure on the WiredTiger engine and since it relies on the snapshot capability, there will be a retention of large unflushed operations in memory. This renders some performance cost on the database.

    Conclusion

    MongoDB version 4.0 introduced multi-document transaction support for replica sets as a feature for improving data integrity and consistency. However, very few applications require transactions when using MongoDB. There are limitations to this feature that make it somewhat immature as far as the transaction concept is concerned. For instance, transactions for a sharded cluster are not supported and they cannot exceed the 16MB size limit. Data modeling provides a better structure for reducing transactions in your database. Unless you are dealing with special cases, it is better practice to avoid transactions in MongoDB.

    by Onyancha Brian Henry at December 31, 2019 10:45 AM

    December 30, 2019

    SeveralNines

    Cloud Vendor Deep-Dive: PostgreSQL on DigitalOcean

    DigitalOcean is a cloud service provider, more of an IaaS (Infrastructure-as-a-Service) provider, which is most suitable for small to medium scale businesses. You can get to know more about DigitalOcean here. What it does is a bit different from other cloud vendors like AWS or Azure, and it is not heavily global yet; take a look at this video which compares DigitalOcean with AWS.

    They provide a geographically distributed computing platform in the form of virtual machines wherein businesses can deploy their applications on cloud infrastructure in an easy, fast and flexible manner. Their core focus is to provide cloud environments which are highly flexible, easy to set up and can scale for various types of workloads.

    What attracted me to DigitalOcean is the "droplets" service. Droplets are Linux-based VMs which can be created standalone or as part of a larger cloud infrastructure, with a choice of Linux operating systems like CentOS, Ubuntu, etc.

    PostgreSQL on DigitalOcean

    With DigitalOcean, building PostgreSQL environments can be done in two ways: one is to build manually from scratch using droplets (Linux-based VMs only), and the other is to use managed services.

    DigitalOcean started managed services for PostgreSQL with the intention of speeding up the provisioning of database servers in the form of VMs on a large cloud infrastructure. Otherwise, the only way to build PostgreSQL environments is manually by using droplets. The supported capabilities with managed services are high availability, automatic failover, logging, and monitoring. Alerting capability does not exist yet.

    The managed services are more or less similar to AWS RDS. The PostgreSQL instances can only be accessed using the UI; there is no access to the host running the database instance. Management, monitoring and parameter configuration must all be done from the UI.

    PostgreSQL Compatibility with DigitalOcean

    You can build PostgreSQL environments on DigitalOcean with droplets or go for managed services (similar to AWS RDS), which can really save your time. The only versions supported by managed services are 10 and 11. This means businesses willing to leverage DigitalOcean's PostgreSQL managed services will need to use or upgrade to either version 10 or 11. Also, note that there is no support for the Windows operating system.

    This blog will focus on managed services.

    Managed PostgreSQL Services

    DigitalOcean has provided managed PostgreSQL database services since February 2019. The intention was to introduce a faster way of provisioning infrastructure with PostgreSQL instances, which can save valuable time for infrastructure and database professionals. Provisioning a PostgreSQL instance is rather simple.

    This can be done by logging in to the DO account → going to the create database cluster page → choosing the PostgreSQL version → choosing the specs based on pricing → choosing the location → clicking create. You are all good. Watch this video here for a better understanding.

    High Availability

    High availability is one of the critical requirements for databases to ensure business continuity. It is imperative to ensure that high availability meets the SLAs defined for RTO and RPO. DigitalOcean provides high-availability services in a fast and reliable manner.

    Pricing

    The pricing model in DigitalOcean is not complex. The price of the instance is directly proportional to the capacity and architecture of the instance. Below is an example of pricing for a standalone instance -

    The capacity and pricing which suit the requirement can be chosen from the available options. The minimum is $15 per month for 10GB of disk and 1 vCPU. If high availability is a requirement, a standby node can be configured as well. The limitation is that a standby node can be added only if the primary database has at least 25 GB of disk, and only a maximum of 5 standby nodes can be added. Below are the standby options available.

    As you can observe above, standby pricing is pretty simple and does not depend on the capacity. Adding one standby node costs $20 irrespective of size.

    Access

    PostgreSQL instances built using managed services can be accessed using GUIs and remotely via CLI in SSL mode only. However, PostgreSQL instances manually installed on droplets can be accessed via SSH.

    Data Centres

    DigitalOcean is not heavily global yet. The data centres are located in only a few countries, as shown below. This means it is not possible to deploy or run services for businesses operating in countries other than the ones shown below.

    Advantages of PostgreSQL Managed Services

    Managed services for PostgreSQL are advantageous for various reasons. In my experience as a DBA, the requirement often arises to build environments for developers as fast as possible to perform functional, regression, and performance testing for releases. Generally, the approach would be to use tools like Chef or Puppet to build automation modules for applications and database environments, and then use those templates to build cloud VMs. DigitalOcean's managed services can be a great, efficient, and cost-effective option for such requirements, as they are bound to save time. Let us take a look at the advantages in detail -

    • Opting for managed services can save a lot of time for DBAs and Developers in building PostgreSQL environments from scratch. This means, there is no database administration and maintenance overhead.
    • PostgreSQL environments can be equipped with High-availability with automatic failover capability. 
    • Managed instances are designed to sustain disaster. Daily backups can be configured with the PITR (point-in-time-recovery) capability. Importantly, backups are free.
    • Managed PostgreSQL instances are designed to be highly scalable. DigitalOcean’s customers were able to achieve higher scalability with PostgreSQL instances and TimescaleDB extensions.
    • Dashboard can be configured to monitor log files and query performance.
    • Cost model of DigitalOcean is pretty simple.
    • As it is a cloud infrastructure, vertical scaling can be seamless.
    • Managed database instances are highly secured and optimized. A big part of the data retrieval is only possible via SSL based connections.
    • Documentation is available in good detail.

    Limitations of Running PostgreSQL on DigitalOcean

    • PostgreSQL versions 10 and 11 are supported, no other versions can be used.
    • Data centres of DigitalOcean are only available at limited geographical locations.
    • The number of standby nodes cannot exceed 5.
    • PITR cannot go beyond 7 days.
    • Not all extensions for PostgreSQL are supported, only selected extensions can be used.
    • The instances can only be up-sized. They cannot be downsized.
    • Superuser access is not allowed.
    • Alerting on certain thresholds is not available yet.
    • Managed database instances can only be restored to a new node when restoring from backups.

    Conclusion

    The managed PostgreSQL service offered by DigitalOcean is a great option for businesses looking for DevOps-type solutions for PostgreSQL environments, which can really help reduce the time, planning, administration and maintenance overhead involved in building high-scale and secured PostgreSQL environments for various workloads. Its pricing model is very simple and it can be a cost-effective option. It cannot, however, really be compared to the massive cloud service providers like AWS or Azure. DigitalOcean can surely benefit businesses with its innovative cloud solutions.

    by Venkata Nagothi at December 30, 2019 10:45 AM

    December 24, 2019

    Oli Sennhauser

    FromDual Performance Monitor for MariaDB and MySQL 1.1.0 has been released

    FromDual has the pleasure to announce the release of the new version 1.1.0 of its popular Database Performance Monitor for MariaDB, MySQL and Galera Cluster fpmmm.

    The FromDual Performance Monitor for MariaDB and MySQL (fpmmm) enables DBAs and System Administrators to monitor what is going on inside their MariaDB and MySQL databases and on their machines where the databases reside.

    More detailed information you can find in the fpmmm Installation Guide.

    Download

    The new FromDual Performance Monitor for MariaDB and MySQL (fpmmm) can be downloaded from here. How to install and use fpmmm is documented in the fpmmm Installation Guide.

    In case you find a bug in the FromDual Performance Monitor for MariaDB and MySQL please report it to the FromDual Bugtracker or just send us an email.

    Any feedback, statements and testimonials are welcome as well! Please send them to us.

    Monitoring as a Service (MaaS)

    You do not want to set up your database monitoring yourself? No problem: Choose our MariaDB and MySQL Monitoring as a Service (MaaS) program to save time and costs!

    Installation of Performance Monitor 1.1.0

    A complete guide on how to install FromDual Performance Monitor you can find in the fpmmm Installation Guide.

    Upgrade from 1.0.x to 1.1.0

    shell> cd /opt
    shell> tar xf /download/fpmmm-1.1.0.tar.gz
    shell> rm -f fpmmm
    shell> ln -s fpmmm-1.1.0 fpmmm
    

    Changes in FromDual Performance Monitor for MariaDB and MySQL 1.1.0

    This release contains various bug fixes.

    You can verify your current FromDual Performance Monitor for MariaDB and MySQL version with the following command:

    shell> fpmmm --version
    

    General

    • fpmmm is now available for CentOS with RPM packages and for Ubuntu with DEB packages.
    • MariaDB 10.4 seems to work and thus is officially declared as supported.
    • TimeZone made configurable.
    • Error printed to STDOUT changed to STDERR.
    • Return codes made unique.
    • De-support PHP versions older than 7.0.
    • All old PHP 5.5 stuff removed, we need now at least PHP 7.0.
    • Cosmetic fixes and error handling improved.

    fpmmm agent

    • Error message typo fixed.
    • All mpm remnants removed.
    • Upload: Error exit handling improved.

    fpmmm Templates

    • InnoDB Template: Links to mysql-forum replaced by links to fromdual.com.
    • Templates: Zabbix 4.0 templates added and tpl directory restructured.

    fpmmm Modules

    • Backup: Backup hook added to templates as example.
    • InnoDB: InnoDB buffer pool flushing data and graph added.
    • InnoDB: innodb_metrics replacing mostly SHOW ENGINE INNODB STATUS.
    • InnoDB: Started replacing SHOW ENGINE INNODB STATUS by I_S.innodb_metrics with Adaptive Hash Index (AHI).
    • InnoDB: innodb_file_format removed.
    • InnoDB: InnoDB files items and graph added.
    • InnoDB: Negative values of innodb_buffer_pool_pages_misc_b fixed.
    • InnoDB: Bug report of Wang Chao about InnoDB Adaptive Hash Index (AHI) size fixed.
    • Memcached: Memcached module fixed.
    • MySQL: MariaDB thread pool items and graph added.
    • MySQL: Slow Queries item fixed and graph added.
    • Server: Smartmon monitor added to monitor HDD/SSD.
    • Server: Server module made more robust and numactl replaced by cpuinfo.
    • Server: Server free function adapted according to Linux free command.
    • Server: Function getFsStatsLinux added for global file descriptor limits.
    • Aria: Aria cleaned-up, old mariadb_* variables removed, Aria transaction log graph added.
    • Aria: Aria pagecache blocks converted to bytes.

    fpmmm agent installer

    • No changes.

    For subscriptions of commercial use of fpmmm please get in contact with us.

    by Shinguz at December 24, 2019 11:34 AM

    December 20, 2019

    SeveralNines

    Maximizing Database Query Efficiency for MySQL - Part Two

    This is the second part of a two-part series blog for Maximizing Database Query Efficiency In MySQL. You can read part one here.

    Using Single-Column, Composite, Prefix, and Covering Index

    Tables that frequently receive high traffic must be properly indexed. It's not only important to index your table; you also need to determine and analyze the types of queries and types of retrieval you need for the specific table before you decide which indexes are required. Let's go over these types of indexes and how you can use them to maximize your query performance.

    Single-Column Index

    An InnoDB table can contain a maximum of 64 secondary indexes. A single-column index (or full-column index) is an index assigned only to a particular column. A column that contains mostly distinct values is a good candidate for an index. A good index must have high cardinality and up-to-date statistics so the optimizer can choose the right query plan. To view the distribution of indexes, you can check with the SHOW INDEXES syntax, just like below:

    root[test]#> SHOW INDEXES FROM users_account\G
    *************************** 1. row ***************************
            Table: users_account
       Non_unique: 0
         Key_name: PRIMARY
     Seq_in_index: 1
      Column_name: id
        Collation: A
      Cardinality: 131232
         Sub_part: NULL
           Packed: NULL
             Null: 
       Index_type: BTREE
          Comment: 
    Index_comment: 
    *************************** 2. row ***************************
            Table: users_account
       Non_unique: 1
         Key_name: name
     Seq_in_index: 1
      Column_name: last_name
        Collation: A
      Cardinality: 8995
         Sub_part: NULL
           Packed: NULL
             Null: 
       Index_type: BTREE
          Comment: 
    Index_comment: 
    *************************** 3. row ***************************
            Table: users_account
       Non_unique: 1
         Key_name: name
     Seq_in_index: 2
      Column_name: first_name
        Collation: A
      Cardinality: 131232
         Sub_part: NULL
           Packed: NULL
             Null: 
       Index_type: BTREE
          Comment: 
    Index_comment: 
    3 rows in set (0.00 sec)

    You can also inspect the information_schema.index_statistics or mysql.innodb_index_stats tables.
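
    For example (a minimal sketch; the schema name "test" is an assumption based on the prompt used in this post), you can pull the persisted index statistics for this table from mysql.innodb_index_stats:

    root[test]#> SELECT index_name, stat_name, stat_value FROM mysql.innodb_index_stats WHERE database_name = 'test' AND table_name = 'users_account';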

    Compound (Composite) or Multi-Part Indexes

    A compound index (commonly called a composite index) is a multi-part index composed of multiple columns. MySQL allows up to 16 columns in a composite index. Exceeding the limit returns an error like below:

    ERROR 1070 (42000): Too many key parts specified; max 16 parts allowed

    A composite index provides a boost to your queries, but it requires that you have a clear understanding of how you are retrieving the data. For example, a table with a DDL of...

    CREATE TABLE `users_account` (
      `id` int(11) NOT NULL AUTO_INCREMENT,
      `last_name` char(30) NOT NULL,
      `first_name` char(30) NOT NULL,
      `dob` date DEFAULT NULL,
      `zip` varchar(10) DEFAULT NULL,
      `city` varchar(100) DEFAULT NULL,
      `state` varchar(100) DEFAULT NULL,
      `country` varchar(50) NOT NULL,
      `tel` varchar(16) DEFAULT NULL,
      PRIMARY KEY (`id`),
      KEY `name` (`last_name`,`first_name`)
    ) ENGINE=InnoDB DEFAULT CHARSET=latin1

    ...which contains the composite index `name`. The composite index improves query performance once these keys are referenced as used key parts. For example, see the following:

    root[test]#> explain format=json select * from users_account where last_name='Namuag' and first_name='Maximus'\G
    *************************** 1. row ***************************
    EXPLAIN: {
      "query_block": {
        "select_id": 1,
        "cost_info": {
          "query_cost": "1.20"
        },
        "table": {
          "table_name": "users_account",
          "access_type": "ref",
          "possible_keys": [
            "name"
          ],
          "key": "name",
          "used_key_parts": [
            "last_name",
            "first_name"
          ],
          "key_length": "60",
          "ref": [
            "const",
            "const"
          ],
          "rows_examined_per_scan": 1,
          "rows_produced_per_join": 1,
          "filtered": "100.00",
          "cost_info": {
            "read_cost": "1.00",
            "eval_cost": "0.20",
            "prefix_cost": "1.20",
            "data_read_per_join": "352"
          },
          "used_columns": [
            "id",
            "last_name",
            "first_name",
            "dob",
            "zip",
            "city",
            "state",
            "country",
            "tel"
          ]
        }
      }
    }
    1 row in set, 1 warning (0.00 sec)

    The used_key_parts show that the query plan has perfectly selected our desired columns covered in our composite index.

    Composite indexing has its limitations as well. Certain conditions in the query cannot take all columns part of the key.

    The documentation says, "The optimizer attempts to use additional key parts to determine the interval as long as the comparison operator is =, <=>, or IS NULL. If the operator is >, <, >=, <=, !=, <>, BETWEEN, or LIKE, the optimizer uses it but considers no more key parts. For the following expression, the optimizer uses = from the first comparison. It also uses >= from the second comparison but considers no further key parts and does not use the third comparison for interval construction…". Basically, this means that even though you have a composite index on two columns, the sample query below does not cover both fields:

    root[test]#> explain format=json select * from users_account where last_name>='Zu' and first_name='Maximus'\G
    *************************** 1. row ***************************
    EXPLAIN: {
      "query_block": {
        "select_id": 1,
        "cost_info": {
          "query_cost": "34.61"
        },
        "table": {
          "table_name": "users_account",
          "access_type": "range",
          "possible_keys": [
            "name"
          ],
          "key": "name",
          "used_key_parts": [
            "last_name"
          ],
          "key_length": "60",
          "rows_examined_per_scan": 24,
          "rows_produced_per_join": 2,
          "filtered": "10.00",
          "index_condition": "((`test`.`users_account`.`first_name` = 'Maximus') and (`test`.`users_account`.`last_name` >= 'Zu'))",
          "cost_info": {
            "read_cost": "34.13",
            "eval_cost": "0.48",
            "prefix_cost": "34.61",
            "data_read_per_join": "844"
          },
          "used_columns": [
            "id",
            "last_name",
            "first_name",
            "dob",
            "zip",
            "city",
            "state",
            "country",
            "tel"
          ]
        }
      }
    }
    1 row in set, 1 warning (0.00 sec)

    In cases like this (when your query uses ranges rather than constant or reference lookups on the leading column), avoid relying on the composite index. It just wastes your memory and buffer pool and increases the performance degradation of your queries.
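
    If this range-on-last_name plus equality-on-first_name pattern is common in your workload, one option (an illustrative alternative, not part of the original example) is to reorder the composite index so the equality column comes first, which lets the optimizer use both key parts:

    root[test]#> ALTER TABLE users_account DROP INDEX name, ADD INDEX name (first_name, last_name);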

    Prefix Indexes

    Prefix indexes are indexes that reference a column but only store a defined starting length of that column's values; that portion (the prefix data) is the only part kept in the index. Prefix indexes can help lessen your buffer pool usage and also your disk space, as the index does not need to hold the full length of the column. What does this mean? Let's take an example and compare the impact of a full-length index versus a prefix index.

    root[test]#> create index name on users_account(last_name, first_name);
    Query OK, 0 rows affected (0.42 sec)
    Records: 0  Duplicates: 0  Warnings: 0

    root[test]#> \! du -hs /var/lib/mysql/test/users_account.*
    12K     /var/lib/mysql/test/users_account.frm
    36M     /var/lib/mysql/test/users_account.ibd

    We created a full-length composite index which consumes a total of 36MiB of tablespace for the users_account table. Let's drop it and then add a prefix index.

    root[test]#> drop index name on users_account;
    Query OK, 0 rows affected (0.01 sec)
    Records: 0  Duplicates: 0  Warnings: 0

    root[test]#> alter table users_account engine=innodb;
    Query OK, 0 rows affected (0.63 sec)
    Records: 0  Duplicates: 0  Warnings: 0

    root[test]#> \! du -hs /var/lib/mysql/test/users_account.*
    12K     /var/lib/mysql/test/users_account.frm
    24M     /var/lib/mysql/test/users_account.ibd

    root[test]#> create index name on users_account(last_name(5), first_name(5));
    Query OK, 0 rows affected (0.42 sec)
    Records: 0  Duplicates: 0  Warnings: 0

    root[test]#> \! du -hs /var/lib/mysql/test/users_account.*
    12K     /var/lib/mysql/test/users_account.frm
    28M     /var/lib/mysql/test/users_account.ibd

    Using the prefix index, the tablespace takes up only 28MiB, which is 8MiB less than with the full-length index. That's great to hear, but it doesn't mean that it is performant and serves what you need.

    If you decide to add a prefix index, you must first identify the type of query you need for data retrieval. Creating a prefix index helps you use the buffer pool more efficiently, so it does help with your query performance, but you also need to know its limitations. For example, let's compare the performance when using a full-length index and a prefix index.

    Let's create a full-length index using a composite index,

    root[test]#> create index name on users_account(last_name, first_name);
    Query OK, 0 rows affected (0.45 sec)
    Records: 0  Duplicates: 0  Warnings: 0

    root[test]#> EXPLAIN format=json select last_name from users_account where last_name='Namuag' and first_name='Maximus Aleksandre' \G
    *************************** 1. row ***************************
    EXPLAIN: {
      "query_block": {
        "select_id": 1,
        "cost_info": {
          "query_cost": "1.61"
        },
        "table": {
          "table_name": "users_account",
          "access_type": "ref",
          "possible_keys": [
            "name"
          ],
          "key": "name",
          "used_key_parts": [
            "last_name",
            "first_name"
          ],
          "key_length": "60",
          "ref": [
            "const",
            "const"
          ],
          "rows_examined_per_scan": 3,
          "rows_produced_per_join": 3,
          "filtered": "100.00",
          "using_index": true,
          "cost_info": {
            "read_cost": "1.02",
            "eval_cost": "0.60",
            "prefix_cost": "1.62",
            "data_read_per_join": "1K"
          },
          "used_columns": [
            "last_name",
            "first_name"
          ]
        }
      }
    }
    1 row in set, 1 warning (0.00 sec)

    root[test]#> flush status;
    Query OK, 0 rows affected (0.02 sec)

    root[test]#> pager cat -> /dev/null; select last_name from users_account where last_name='Namuag' and first_name='Maximus Aleksandre' \G
    PAGER set to 'cat -> /dev/null'
    3 rows in set (0.00 sec)

    root[test]#> nopager; show status like 'Handler_read%';
    PAGER set to stdout
    +-----------------------+-------+
    | Variable_name         | Value |
    +-----------------------+-------+
    | Handler_read_first    | 0     |
    | Handler_read_key      | 1     |
    | Handler_read_last     | 0     |
    | Handler_read_next     | 3     |
    | Handler_read_prev     | 0     |
    | Handler_read_rnd      | 0     |
    | Handler_read_rnd_next | 0     |
    +-----------------------+-------+
    7 rows in set (0.00 sec)

    The result reveals that it is, in fact, using a covering index (i.e. "using_index": true) and that indexes are used properly: Handler_read_key is incremented, and it performs an index scan as Handler_read_next is incremented.

    Now, let's try using prefix index of the same approach,

    root[test]#> create index name on users_account(last_name(5), first_name(5));
    
    Query OK, 0 rows affected (0.22 sec)
    
    Records: 0  Duplicates: 0  Warnings: 0
    
    
    
    root[test]#>  EXPLAIN format=json select last_name from users_account where last_name='Namuag' and first_name='Maximus Aleksandre' \G
    
    *************************** 1. row ***************************
    
    EXPLAIN: {
    
      "query_block": {
    
        "select_id": 1,
    
        "cost_info": {
    
          "query_cost": "3.60"
    
        },
    
        "table": {
    
          "table_name": "users_account",
    
          "access_type": "ref",
    
          "possible_keys": [
    
            "name"
    
          ],
    
          "key": "name",
    
          "used_key_parts": [
    
            "last_name",
    
            "first_name"
    
          ],
    
          "key_length": "10",
    
          "ref": [
    
            "const",
    
            "const"
    
          ],
    
          "rows_examined_per_scan": 3,
    
          "rows_produced_per_join": 3,
    
          "filtered": "100.00",
    
          "cost_info": {
    
            "read_cost": "3.00",
    
            "eval_cost": "0.60",
    
            "prefix_cost": "3.60",
    
            "data_read_per_join": "1K"
    
          },
    
          "used_columns": [
    
            "last_name",
    
            "first_name"
    
          ],
    
          "attached_condition": "((`test`.`users_account`.`first_name` = 'Maximus Aleksandre') and (`test`.`users_account`.`last_name` = 'Namuag'))"
    
        }
    
      }
    
    }
    
    1 row in set, 1 warning (0.00 sec)
    
    
    
    root[test]#> flush status;
    
    Query OK, 0 rows affected (0.01 sec)
    
    
    
    root[test]#> pager cat -> /dev/null; select last_name from users_account where last_name='Namuag' and first_name='Maximus Aleksandre' \G
    
    PAGER set to 'cat -> /dev/null'
    
    3 rows in set (0.00 sec)
    
    
    
    root[test]#> nopager; show status like 'Handler_read%';
    
    PAGER set to stdout
    
    +-----------------------+-------+
    
    | Variable_name         | Value |
    
    +-----------------------+-------+
    
    | Handler_read_first    | 0     |
    
    | Handler_read_key      | 1     |
    
    | Handler_read_last     | 0     |
    
    | Handler_read_next     | 3     |
    
    | Handler_read_prev     | 0     |
    
    | Handler_read_rnd      | 0     |
    
    | Handler_read_rnd_next | 0     |
    
    +-----------------------+-------+
    
    7 rows in set (0.00 sec)

    MySQL reveals that it does use the index properly, but there is a noticeable cost overhead compared to the full-length index. That is expected and explainable, since the prefix index does not cover the whole length of the field values (note that the covering-index optimization is lost and the server has to re-check the attached condition). A prefix index is not a replacement for, or an alternative to, full-length indexing, and it can produce poor results when used inappropriately. You need to determine the type of queries and data you need to retrieve.

    Covering Indexes

    Covering indexes don't require any special syntax in MySQL. A covering index in InnoDB refers to the case where all fields selected by a query are covered by an index. The server does not need to do a sequential read over the disk to fetch the data from the table; it only uses the data in the index, which significantly speeds up the query. For example, our earlier query, i.e.

    select last_name from users_account where last_name='Namuag' and first_name='Maximus Aleksandre' \G

    is, as mentioned earlier, served by a covering index. When your tables are well planned and your indexes are created properly, try to design your queries so that they leverage covering indexes. This helps you maximize the efficiency of your queries and results in great performance.
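
    A quick way to confirm this is to look for "Using index" in the Extra column of EXPLAIN. A minimal sketch, assuming the full-length composite index on (last_name, first_name) from the earlier example is still in place (not the prefix version):

    EXPLAIN SELECT last_name, first_name
    FROM users_account
    WHERE last_name = 'Namuag';
    -- The Extra column should show "Using index" when the query is fully covered by the index.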

    Leverage Tools That Offer Advisors or Query Performance Monitoring

    Organizations often start by looking on GitHub for open-source software that can offer great benefits. For simple advisories that help you optimize your queries, you can leverage the Percona Toolkit. For a MySQL DBA, the Percona Toolkit is like a Swiss Army knife.

    For operations where you need to analyze how your indexes are being used, you can use pt-index-usage.

    pt-query-digest is also available; it can analyze MySQL queries from logs, the processlist, and tcpdump. In fact, it is the most important tool for analyzing and inspecting bad queries. Use it to aggregate similar queries together and report on the ones that consume the most execution time.

    For archiving old records, you can use pt-archiver. To inspect your database for duplicate indexes, leverage pt-duplicate-key-checker. You might also take advantage of pt-deadlock-logger; although deadlocks are caused by poor implementation rather than by underperforming queries themselves, they still contribute to query inefficiency. If table maintenance requires you to add indexes online without affecting the traffic going to a particular table, you can use pt-online-schema-change. Alternatively, you can use gh-ost, which is also very useful for schema migrations.

    If you are looking for enterprise features bundled together, from query performance monitoring, alarms and alerts, dashboards and metrics that help you optimize your queries, to advisors, ClusterControl may be the tool for you. It shows you Top Queries, Running Queries, and Query Outliers. Check out the blog MySQL Query Performance Tuning, which guides you through monitoring your queries with ClusterControl.

    Conclusion

    You have arrived at the end of our two-part blog series. We covered the factors that cause query degradation and how to resolve them in order to maximize your database queries, and we shared some tools that can help you solve these problems.

     

    by Paul Namuag at December 20, 2019 10:45 AM

    December 19, 2019

    MariaDB Foundation

    Five Cities in India

    A trip to Chennai, Bengaluru, Hyderabad, Pune, and Mumbai taught MariaDB Foundation the importance of India. Government and Fintech lead the pack. India has a huge supply of highly educated IT specialists, and their decision power in selecting tools (including databases) is growing. […]

    The post Five Cities in India appeared first on MariaDB.org.

    by Kaj Arnö at December 19, 2019 04:51 PM

    SeveralNines

    Maximizing Database Query Efficiency for MySQL - Part One

    Slow queries, inefficient queries, and long-running queries are problems that regularly plague DBAs. They are ubiquitous and an inevitable part of life for anyone responsible for managing a database.

    Poor database design can affect the efficiency of a query and its performance. Lack of knowledge or improper use of function calls, stored procedures, or routines can also cause database performance degradation and can even harm the entire MySQL database cluster.

    In master-slave replication, a very common cause of these issues is tables which lack primary or secondary indexes. This causes slave lag which, in the worst-case scenario, can last for a very long time.

    In this two-part blog series, we'll give you a refresher course on how to maximize your database queries in MySQL to drive better efficiency and performance.

    Always Add a Unique Index To Your Table

    Tables that do not have primary or unique keys typically create huge problems when data grows. When this happens, a simple data modification can stall the database. If proper indices are lacking and an UPDATE or DELETE statement is applied to the particular table, MySQL will choose a full table scan as the query plan. That can cause high disk I/O for reads and writes and degrade the performance of your database. See an example below:

    root[test]> show create table sbtest2\G
    
    *************************** 1. row ***************************
    
           Table: sbtest2
    
    Create Table: CREATE TABLE `sbtest2` (
    
      `id` int(10) unsigned NOT NULL,
    
      `k` int(10) unsigned NOT NULL DEFAULT '0',
    
      `c` char(120) NOT NULL DEFAULT '',
    
      `pad` char(60) NOT NULL DEFAULT ''
    
    ) ENGINE=InnoDB DEFAULT CHARSET=latin1
    
    1 row in set (0.00 sec)
    
    
    
    root[test]> explain extended update sbtest2 set k=52, pad="xx234xh1jdkHdj234" where id=57;
    
    +----+-------------+---------+------------+------+---------------+------+---------+------+---------+----------+-------------+
    
    | id | select_type | table   | partitions | type | possible_keys | key  | key_len | ref  | rows    | filtered | Extra       |
    
    +----+-------------+---------+------------+------+---------------+------+---------+------+---------+----------+-------------+
    
    |  1 | UPDATE      | sbtest2 | NULL       | ALL  | NULL          | NULL | NULL    | NULL | 1923216 | 100.00   | Using where |
    
    +----+-------------+---------+------------+------+---------------+------+---------+------+---------+----------+-------------+
    
    1 row in set, 1 warning (0.06 sec)

    Whereas a table with a primary key has a very good query plan:

    root[test]> show create table sbtest3\G
    
    *************************** 1. row ***************************
    
           Table: sbtest3
    
    Create Table: CREATE TABLE `sbtest3` (
    
      `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
    
      `k` int(10) unsigned NOT NULL DEFAULT '0',
    
      `c` char(120) NOT NULL DEFAULT '',
    
      `pad` char(60) NOT NULL DEFAULT '',
    
      PRIMARY KEY (`id`),
    
      KEY `k` (`k`)
    
    ) ENGINE=InnoDB AUTO_INCREMENT=2097121 DEFAULT CHARSET=latin1
    
    1 row in set (0.00 sec)
    
    
    
    root[test]> explain extended update sbtest3 set k=52, pad="xx234xh1jdkHdj234" where id=57;
    
    +----+-------------+---------+------------+-------+---------------+---------+---------+-------+------+----------+-------------+
    
    | id | select_type | table   | partitions | type  | possible_keys | key     | key_len | ref   | rows | filtered | Extra       |
    
    +----+-------------+---------+------------+-------+---------------+---------+---------+-------+------+----------+-------------+
    
    |  1 | UPDATE      | sbtest3 | NULL       | range | PRIMARY       | PRIMARY | 4       | const |    1 | 100.00   | Using where |
    
    +----+-------------+---------+------------+-------+---------------+---------+---------+-------+------+----------+-------------+
    
    1 row in set, 1 warning (0.00 sec)

    Primary or unique keys provide a vital component of a table's structure, especially when performing maintenance on the table. For example, tools from the Percona Toolkit (such as pt-online-schema-change or pt-table-sync) require that you have unique keys. Keep in mind that a PRIMARY KEY is already a unique key, and that a primary key cannot hold NULL values while a unique key can. Assigning a NULL value to a primary key causes an error like:

    ERROR 1171 (42000): All parts of a PRIMARY KEY must be NOT NULL; if you need NULL in a key, use UNIQUE instead

    For slave nodes, it is also common that on certain occasions the primary/unique key is not present on the table, resulting in a discrepancy in the table structure. You can use mysqldiff to detect this, or you can run mysqldump --no-data … against both nodes and diff the output to compare the table structures and check for any discrepancy.
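
    If you find a keyless table like sbtest2 in the example above, adding a key is usually a one-statement fix. A minimal sketch, assuming the id values are already unique:

    ALTER TABLE sbtest2 ADD PRIMARY KEY (id);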

    Scan Tables With Duplicate Indexes, Then Drop Them

    Duplicate indices can also cause performance degradation, especially when the table contains a huge number of records. MySQL has to make multiple attempts to optimize the query and has more query plans to evaluate. This includes scanning large index distributions or statistics, which adds performance overhead and can cause memory contention or high memory and I/O utilization.

    Query degradation caused by duplicate indices on a table also contributes to saturating the buffer pool. It can also affect MySQL performance when checkpointing flushes the transaction logs to disk, due to the processing and storing of an unwanted index (which is, in fact, a waste of space in that table's tablespace). Take note that duplicate indices are stored in the tablespace, and they also end up in the buffer pool.

    Take a look at the table below which contains multiple duplicate keys:

    root[test]#> show create table sbtest3\G
    
    *************************** 1. row ***************************
    
           Table: sbtest3
    
    Create Table: CREATE TABLE `sbtest3` (
    
      `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
    
      `k` int(10) unsigned NOT NULL DEFAULT '0',
    
      `c` char(120) NOT NULL DEFAULT '',
    
      `pad` char(60) NOT NULL DEFAULT '',
    
      PRIMARY KEY (`id`),
    
      KEY `k` (`k`,`pad`,`c`),
    
      KEY `kcp2` (`id`,`k`,`c`,`pad`),
    
      KEY `kcp` (`k`,`c`,`pad`),
    
      KEY `pck` (`pad`,`c`,`id`,`k`)
    
    ) ENGINE=InnoDB AUTO_INCREMENT=2048561 DEFAULT CHARSET=latin1
    
    1 row in set (0.00 sec)

    and has a size of 2.3GiB

    root[test]#> \! du -hs /var/lib/mysql/test/sbtest3.ibd
    
    2.3G    /var/lib/mysql/test/sbtest3.ibd

    Let's drop the duplicate indices and rebuild the table with a no-op alter,

    root[test]#> drop index kcp2 on sbtest3; drop index kcp on sbtest3; drop index pck on sbtest3;
    
    Query OK, 0 rows affected (0.01 sec)
    
    Records: 0  Duplicates: 0  Warnings: 0
    
    Query OK, 0 rows affected (0.01 sec)
    
    Records: 0  Duplicates: 0  Warnings: 0
    
    Query OK, 0 rows affected (0.01 sec)
    
    Records: 0  Duplicates: 0  Warnings: 0
    
    
    
    root[test]#> alter table sbtest3 engine=innodb;
    
    Query OK, 0 rows affected (28.23 sec)
    
    Records: 0  Duplicates: 0  Warnings: 0
    
    
    
    root[test]#> \! du -hs /var/lib/mysql/test/sbtest3.ibd
    
    945M    /var/lib/mysql/test/sbtest3.ibd

    We were able to save about 59% of the old tablespace size, which is really significant.

    To determine duplicate indexes, you can use pt-duplicate-key-checker to handle the job for you.

    Tune Up your Buffer Pool

    For this section I’m referring only to the InnoDB storage engine. 

    The buffer pool is an important component within the InnoDB kernel space. It is where InnoDB caches table and index data as they are accessed. It speeds up processing because frequently used data is stored efficiently in memory. For instance, if you have multiple tables totalling >= 100GiB that are accessed heavily, consider provisioning fast volatile memory starting from a size of 128GiB and assigning about 80% of the physical memory to the buffer pool. That 80% has to be monitored efficiently: you can use SHOW ENGINE INNODB STATUS \G, or you can leverage monitoring software such as ClusterControl, which offers fine-grained monitoring that includes the buffer pool and its relevant health metrics. Also set the innodb_buffer_pool_instances variable accordingly; you might set it larger than 8 (the default when innodb_buffer_pool_size >= 1GiB), such as 16, 24, 32, 64, or higher if necessary.

    When monitoring the buffer pool, check the global status variable Innodb_buffer_pool_pages_free, which gives you an idea of whether the buffer pool needs adjusting, or whether there are unwanted or duplicate indexes consuming the buffer. SHOW ENGINE INNODB STATUS \G also offers more detailed buffer pool information, including a breakdown per buffer pool instance based on the number of innodb_buffer_pool_instances you have set.
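
    As a minimal sketch (the 8GiB value below is only an example, and resizing online assumes MySQL 5.7 or later), you can check the free pages and adjust the buffer pool at runtime:

    SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_pages_free';
    SHOW GLOBAL VARIABLES LIKE 'innodb_buffer_pool%';
    -- Resize online; also persist the value in my.cnf so it survives a restart.
    SET GLOBAL innodb_buffer_pool_size = 8 * 1024 * 1024 * 1024;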

    Use FULLTEXT Indexes (But Only If Applicable)

    Using queries like,

    SELECT bookid, page, context FROM books WHERE context like '%for dummies%';

    where context is a string-type (CHAR, VARCHAR, TEXT) column, is an example of a really bad query! Pulling a large set of records with a filter that has to be greedy ends up in a full table scan. Consider using a FULLTEXT index instead. FULLTEXT indexes have an inverted index design: inverted indexes store a list of words and, for each word, a list of documents that the word appears in. To support proximity search, positional information for each word is also stored, as a byte offset.

    In order to use FULLTEXT for searching or filtering data, you need to use the MATCH() ... AGAINST syntax instead of LIKE as in the query above. Of course, the field you search needs to be your FULLTEXT-indexed field.

    To create a FULLTEXT index, just specify FULLTEXT as the index type. See the example below:

    root[minime]#> CREATE FULLTEXT INDEX aboutme_fts ON users_info(aboutme);
    
    Query OK, 0 rows affected, 1 warning (0.49 sec)
    
    Records: 0  Duplicates: 0  Warnings: 1
    
    
    
    root[jbmrcd_date]#> show warnings;
    
    +---------+------+--------------------------------------------------+
    
    | Level   | Code | Message                                          |
    
    +---------+------+--------------------------------------------------+
    
    | Warning |  124 | InnoDB rebuilding table to add column FTS_DOC_ID |
    
    +---------+------+--------------------------------------------------+
    
    1 row in set (0.00 sec)
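
    With the index in place, the lookup itself goes through MATCH() ... AGAINST. A minimal sketch against the users_info table above (the search phrase is just an example):

    SELECT aboutme
    FROM users_info
    WHERE MATCH(aboutme) AGAINST ('database administrator' IN NATURAL LANGUAGE MODE);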

    Although using FULLTEXT indexes can offer benefits when searching words within a very large context inside a column, it also creates issues when used incorrectly. 

    When doing FULLTEXT searches against a large table that is constantly accessed (where a number of client requests are searching for different, unique keywords), it can be very CPU intensive.

    There are also certain occasions where FULLTEXT is not applicable; see this external blog post. Although I haven't tried this with 8.0, I don't see any changes relevant to it. We suggest that you do not use FULLTEXT for searching in a big-data environment, especially for high-traffic tables. Otherwise, try to leverage other technologies such as Apache Lucene, Apache Solr, tsearch2, or Sphinx.

    Avoid Using NULL in Columns

    Columns that contain NULL values are totally fine in MySQL. However, if columns with NULL values are part of an index, it can affect query performance, as the optimizer cannot provide the right query plan due to poor index distribution. There are certain ways to optimize queries that involve NULL values, but of course only if they suit the requirements. Please check the MySQL documentation on IS NULL optimization. You may also check this external post, which is helpful as well.

    Design Your Schema Topology and Tables Structure Efficiently

    To some extent, normalizing your database tables from 1NF (First Normal Form) to 3NF (Third Normal Form) benefits query efficiency, because normalized tables tend to avoid redundant records. Proper planning and design of your tables is very important, because this determines how you retrieve or pull data, and every one of these actions has a cost. With normalized tables, the goal is to ensure that every non-key column in every table is directly dependent on the key; the whole key and nothing but the key. If this goal is reached, it pays off in the form of reduced redundancy, fewer anomalies, and improved efficiency.

    While normalizing your tables has many benefits, it doesn't mean you need to normalize all your tables this way. You can also design parts of your database using a Star Schema. Designing tables with a Star Schema brings simpler queries (avoiding complex cross joins), easier data retrieval for reporting, performance gains because there's no need for unions or complex joins, and fast aggregations. A Star Schema is simple to implement, but you need to plan carefully, because it can create big problems and disadvantages when your tables get bigger and require maintenance. Star Schemas (and their underlying tables) are prone to data integrity issues, so there is a high probability that a bunch of your data will be redundant. If the table has a constant structure and design and is built to support query efficiency, then it's an ideal case for this approach.

    Mixing your database designs (as long as you are able to determine and identify what kind of data has to be pulled from your tables) is very important, since you can benefit from more efficient queries, and it also helps the DBA with backups, maintenance, and recovery.

    Get Rid of Constant and Old Data

    We recently wrote about Best Practices for Archiving Your Database in the Cloud, which covers how you can take advantage of data archiving before it goes to the cloud. So how does getting rid of old data, or archiving your constant and old data, help query efficiency? As stated in my previous blog, larger tables that are constantly modified and take new inserts can see their tablespace grow quickly. MySQL and InnoDB perform efficiently when records are contiguous to each other and have significance to the next rows in the table. In other words, if you have old records that no longer need to be used, the optimizer can skip them in its statistics, producing a much more efficient result. Also, query efficiency is not only about the application side; you also need to consider efficiency when performing a backup, during maintenance, or on failover. For example, a bad, long-running query that affects your maintenance window or a failover can be a problem.

    Enable Query Logging As Needed

    Always set up your MySQL slow query log in accordance with your needs. If you are using Percona Server, you can take advantage of its extended slow query logging, which allows you to define additional variables. You can filter types of queries in combination, such as full_scan, full_join, tmp_table, etc. You can also dictate the rate of slow query logging through variables such as log_slow_rate_type and log_slow_rate_limit, and many others.

    Enabling query logging in MySQL (such as the slow query log) is beneficial for inspecting your queries, so that you can optimize or tune MySQL by adjusting certain variables to suit your requirements. To enable the slow query log, ensure that these variables are set up:

    • long_query_time - assign the right value for how long queries can take. Queries that take longer than this threshold (10 seconds by default) will be written to the slow query log file you assigned.
    • slow_query_log - to enable it, set it to 1.
    • slow_query_log_file - this is the destination path for your slow query log file.

    The slow query log is very helpful for analyzing and diagnosing bad queries that cause stalls, slave delays, long run times, high memory or CPU usage, or even server crashes. If you use pt-query-digest or pt-index-usage, use the slow query log file as the source for reporting on these queries.
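
    As a minimal sketch, these can be set at runtime (the log file path is just an example; persist the same settings in my.cnf so they survive a restart):

    SET GLOBAL slow_query_log = 1;
    SET GLOBAL long_query_time = 2;
    SET GLOBAL slow_query_log_file = '/var/lib/mysql/mysql-slow.log';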

    Conclusion

    We have discussed some ways to maximize database query efficiency in this blog. In the next part we'll discuss even more factors that can help you maximize performance. Stay tuned!

     

    by Paul Namuag at December 19, 2019 10:45 AM

    December 18, 2019

    MariaDB Foundation

    Shanghai MariaDB Unconference Nov 2019

    In our quest to promote development of MariaDB Server, what we have come to call Unconferences form a key part. These developer meetings have traditionally been organised twice a year, and 2019 is no exception. […]

    The post Shanghai MariaDB Unconference Nov 2019 appeared first on MariaDB.org.

    by Kaj Arnö at December 18, 2019 01:40 PM

    SeveralNines

    ClusterControl CMON HA for Distributed Database High Availability - Part Two (GUI Access Setup)

    In the first part, we ended up with a working cmon HA cluster:

    root@vagrant:~# s9s controller --list --long
    
    S VERSION    OWNER GROUP NAME            IP PORT COMMENT
    
    l 1.7.4.3565 system admins 10.0.0.101      10.0.0.101 9501 Acting as leader.
    
    f 1.7.4.3565 system admins 10.0.0.102      10.0.0.102 9501 Accepting heartbeats.
    
    f 1.7.4.3565 system admins 10.0.0.103      10.0.0.103 9501 Accepting heartbeats.
    
    Total: 3 controller(s)

    We have three nodes up and running; one is acting as a leader and the remaining ones are followers, which are accessible (they receive heartbeats and reply to them). The remaining challenge is to configure UI access in a way that always lets us reach the UI on the leader node. In this blog post we will present one possible solution which will allow you to accomplish just that.

    Setting up HAProxy

    This problem is not new to us. With every replication cluster, MySQL or PostgreSQL, it doesn’t matter, there’s a single node where we should send our writes. One way of accomplishing that is to use HAProxy and add external checks that test the state of the node and, based on that, return proper values. This is basically what we are going to use to solve our problem. We will use HAProxy as a well-tested layer 4 proxy and combine it with layer 7 HTTP checks that we will write precisely for our use case. First things first, let’s install HAProxy. We will collocate it with ClusterControl, but it can just as well be installed on a separate node (ideally nodes, to remove HAProxy as a single point of failure).

    apt install haproxy

    This sets up HAProxy. Once it’s done, we have to introduce our configuration:

    global
    
            pidfile /var/run/haproxy.pid
    
            daemon
    
            user haproxy
    
            group haproxy
    
            stats socket /var/run/haproxy.socket user haproxy group haproxy mode 600 level admin
    
            node haproxy_10.0.0.101
    
            description haproxy server
    
    
    
            #* Performance Tuning
    
            maxconn 8192
    
            spread-checks 3
    
            quiet
    
    defaults
    
            #log    global
    
            mode    tcp
    
            option  dontlognull
    
            option tcp-smart-accept
    
            option tcp-smart-connect
    
            #option dontlog-normal
    
            retries 3
    
            option redispatch
    
            maxconn 8192
    
            timeout check   10s
    
            timeout queue   3500ms
    
            timeout connect 3500ms
    
            timeout client  10800s
    
            timeout server  10800s
    
    
    
    userlist STATSUSERS
    
            group admin users admin
    
            user admin insecure-password admin
    
            user stats insecure-password admin
    
    
    
    listen admin_page
    
            bind *:9600
    
            mode http
    
            stats enable
    
            stats refresh 60s
    
            stats uri /
    
            acl AuthOkay_ReadOnly http_auth(STATSUSERS)
    
            acl AuthOkay_Admin http_auth_group(STATSUSERS) admin
    
            stats http-request auth realm admin_page unless AuthOkay_ReadOnly
    
            #stats admin if AuthOkay_Admin
    
    
    
    listen  haproxy_10.0.0.101_81
    
            bind *:81
    
            mode tcp
    
            tcp-check connect port 80
    
            timeout client  10800s
    
            timeout server  10800s
    
            balance leastconn
    
            option httpchk
    
    #        option allbackups
    
            default-server port 9201 inter 20s downinter 30s rise 2 fall 2 slowstart 60s maxconn 64 maxqueue 128 weight 100
    
            server 10.0.0.101 10.0.0.101:443 check
    
            server 10.0.0.102 10.0.0.102:443 check
    
            server 10.0.0.103 10.0.0.103:443 check

    You may want to change some of the things here, like the node or backend names, which include the IP of our node. You will definitely want to change the servers that you are going to include in your HAProxy configuration.

    The most important bits are:

            bind *:81

    HAProxy will listen on port 81.

            option httpchk

    We have enabled layer 7 check on the backend nodes.

            default-server port 9201 inter 20s downinter 30s rise 2 fall 2 slowstart 60s maxconn 64 maxqueue 128 weight 100

    The layer 7 check will be executed on port 9201.

    Once this is done, start HAProxy.

    Setting up xinetd and Check Script

    We are going to use xinetd to execute the check and return correct responses to HAProxy. Steps described in this paragraph should be executed on all cmon HA cluster nodes.

    First, install xinetd:

    root@vagrant:~# apt install xinetd

    Once this is done, we have to add the following line:

    cmonhachk       9201/tcp

    to /etc/services - this will allow xinetd to open a service that will listen on port 9201. Then we have to add the service file itself. It should be located in /etc/xinetd.d/cmonhachk:

    # default: on
    
    # description: cmonhachk
    
    service cmonhachk
    
    {
    
            flags           = REUSE
    
            socket_type     = stream
    
            port            = 9201
    
            wait            = no
    
            user            = root
    
            server          = /usr/local/sbin/cmonhachk.py
    
            log_on_failure  += USERID
    
            disable         = no
    
            #only_from       = 0.0.0.0/0
    
            only_from       = 0.0.0.0/0
    
            per_source      = UNLIMITED
    
    }

    Finally, we need the check script that’s called by xinetd. As defined in the service file, it is located in /usr/local/sbin/cmonhachk.py.

    #!/usr/bin/python3.5
    
    
    
    import subprocess
    
    import re
    
    import sys
    
    from pathlib import Path
    
    import os
    
    
    
    def ret_leader():
    
        leader_str = """HTTP/1.1 200 OK\r\n
    
    Content-Type: text/html\r\n
    
    Content-Length: 48\r\n
    
    \r\n
    
    <html><body>This node is a leader.</body></html>\r\n
    
    \r\n"""
    
        print(leader_str)
    
    
    
    def ret_follower():
    
        follower_str = """
    
    HTTP/1.1 503 Service Unavailable\r\n
    
    Content-Type: text/html\r\n
    
    Content-Length: 50\r\n
    
    \r\n
    
    <html><body>This node is a follower.</body></html>\r\n
    
    \r\n"""
    
        print(follower_str)
    
    
    
    def ret_unknown():
    
        unknown_str = """
    
    HTTP/1.1 503 Service Unavailable\r\n
    
    Content-Type: text/html\r\n
    
    Content-Length: 59\r\n
    
    \r\n
    
    <html><body>This node is in an unknown state.</body></html>\r\n
    
    \r\n"""
    
        print(unknown_str)
    
    
    
    lockfile = "/tmp/cmonhachk_lockfile"
    
    
    
    if os.path.exists(lockfile):
    
        print("Lock file {} exists, exiting...".format(lockfile))
    
        sys.exit(1)
    
    
    
    Path(lockfile).touch()
    
    try:
    
        with open("/etc/default/cmon", 'r') as f:
    
            lines  = f.readlines()
    
    
    
        pattern1 = "RPC_BIND_ADDRESSES"
    
        pattern2 = "[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}"
    
        m1 = re.compile(pattern1)
    
        m2 = re.compile(pattern2)
    
    
    
        for line in lines:
    
            res1 = m1.match(line)
    
            if res1 is not None:
    
                res2 = m2.findall(line)
    
                i = 0
    
                for r in res2:
    
                    if r != "127.0.0.1" and i == 0:
    
                        i += 1
    
                        hostname = r
    
    
    
        command = "s9s controller --list --long | grep {}".format(hostname)
    
        output = subprocess.check_output(command, shell=True)  # shell=True so the pipe to grep is interpreted by the shell
    
        state = output.splitlines()[0].decode('UTF-8')[0]  # grep returns the single matching line; its first character is the node state flag
    
        if state == "l":
    
            ret_leader()
    
        elif state == "f":
    
            ret_follower()
    
        else:
    
            ret_unknown()
    
    finally:
    
        os.remove(lockfile)

    Once you create the file, make sure it is executable:

    chmod u+x /usr/local/sbin/cmonhachk.py

    The idea behind this script is that it tests the status of the nodes using the “s9s controller --list --long” command and then checks the line of output relevant to the IP it finds on the local node. This allows the script to determine whether the host on which it is executed is a leader or not. If the node is the leader, the script returns an “HTTP/1.1 200 OK” response, which HAProxy interprets as the node being available, and routes traffic to it. Otherwise it returns “HTTP/1.1 503 Service Unavailable”, which is treated as an unhealthy node, and traffic will not be routed there. As a result, no matter which node becomes the leader, HAProxy will detect it and mark it as available in the backend:

    You may need to restart HAProxy and xinetd to apply the configuration changes before all the parts start working correctly.

    Having more than one HAProxy ensures we have a way to access the ClusterControl UI even if one of the HAProxy nodes fails, but it also means two (or more) different hostnames or IPs to connect to the ClusterControl UI. To make it more convenient, we will deploy Keepalived on top of HAProxy. It will monitor the state of the HAProxy services and assign a Virtual IP to one of them. If that HAProxy becomes unavailable, the VIP will be moved to another available HAProxy. As a result, we’ll have a single point of entry (the VIP or a hostname associated with it). The steps we’ll take here have to be executed on all of the nodes where HAProxy has been installed.

    First, let’s install keepalived:

    apt install keepalived

    Then we have to configure it. We’ll use the following config file:

    vrrp_script chk_haproxy {
    
       script "killall -0 haproxy"   # verify the pid existence
    
       interval 2                    # check every 2 seconds
    
       weight 2                      # add 2 points of prio if OK
    
    }
    
    vrrp_instance VI_HAPROXY {
    
       interface eth1                # interface to monitor
    
       state MASTER
    
       virtual_router_id 51          # Assign one ID for this route
    
       priority 102                   
    
       unicast_src_ip 10.0.0.101
    
       unicast_peer {
    
          10.0.0.102
    
          10.0.0.103
    
    
    
       }
    
       virtual_ipaddress {
    
           10.0.0.130                        # the virtual IP
    
       } 
    
       track_script {
    
           chk_haproxy
    
       }
    
    #    notify /usr/local/bin/notify_keepalived.sh
    
    }

    You should modify this file on each node: IP addresses have to be configured properly and the priority should be different on all of the nodes. Please also configure a VIP that makes sense in your network. You may also want to change the interface; we used eth1, which is where the IP is assigned on virtual machines created by Vagrant.

    Start keepalived with this configuration file and you should be good to go. As long as the VIP is up on one of the HAProxy nodes, you should be able to use it to connect to the proper ClusterControl UI:

    This completes our two-part introduction to ClusterControl highly available clusters. As we stated at the beginning, this is still in a beta state, but we are looking forward to feedback from your tests.

    by krzysztof at December 18, 2019 10:45 AM

    December 17, 2019

    SeveralNines

    ClusterControl CMON HA for Distributed Database High Availability - Part One (Installation)

    High availability is paramount nowadays, and there’s no better way to introduce it than to build on top of a quorum-based cluster. Such a cluster can easily handle failures of individual nodes and ensure that nodes which have disconnected from the cluster will not continue to operate. There are several protocols that allow you to solve the consensus problem, examples being Paxos or Raft, and you can always write your own code.

    With this in mind, we would like to introduce you to CMON HA, a solution we created which allows you to build highly available clusters of cmon daemons to achieve ClusterControl high availability. Please keep in mind this is a beta feature; it works, but we are adding better debugging and more usability features. Having said that, let’s take a look at how it can be deployed, configured and accessed.

    Prerequisites

    CMON, the daemon that executes tasks in ClusterControl, works with a MySQL database to store some of its data: configuration settings, metrics, backup schedules and many others. In the typical setup this is a standalone MySQL instance. As we want to build a highly available solution, we have to consider a highly available database backend as well. One of the common solutions for that is MySQL Galera Cluster. Since the ClusterControl installation script sets up a standalone database, we have to deploy our Galera Cluster first, before we attempt to install highly available ClusterControl. And what better way to deploy a Galera cluster than using ClusterControl? We will use a temporary ClusterControl to deploy Galera, on top of which we will deploy the highly available version of ClusterControl.

    Deploying a MySQL Galera Cluster

    We won’t cover the installation of the standalone ClusterControl here. It’s as easy as downloading it for free and following the steps you are provided with. Once it is ready, you can use the deployment wizard to deploy a 3-node Galera Cluster in a couple of minutes.

    Pick the deployment option, you will be then presented with a deployment wizard.

    Define SSH connectivity details. You can use either root or a sudo user, with or without a password. Make sure you correctly set the SSH port and the path to the SSH key.

    Then you should pick a vendor, a version and a few of the configuration details, including the server port and root password. Finally, define the nodes you want to deploy your cluster on. Once this is done, ClusterControl will deploy a Galera cluster on the nodes you picked. From this point you can even remove this ClusterControl instance; it won’t be needed anymore.

    Deploying a Highly Available ClusterControl Installation

    We are going to start with one node, configure it to start the cluster and then we will proceed with adding additional nodes.

    Enabling Clustered Mode on the First Node

    What we want to do is deploy a normal ClusterControl instance, so we are going to proceed with the typical installation steps: download the installation script and run it. The main difference, compared to the steps we took when we installed the temporary ClusterControl to deploy the Galera Cluster, is that in this case there is already an existing MySQL database. The script will detect it, ask if we want to use it and, if so, request the password for the superuser. Other than that, the installation is basically the same.

    The next step is to reconfigure cmon to listen not only on localhost but also to bind to IPs that can be accessed from outside. Communication between nodes in the cluster will happen on that IP, on port 9501 by default. We can accomplish this by editing the file /etc/default/cmon and adding the IP to the RPC_BIND_ADDRESSES variable:

    RPC_BIND_ADDRESSES="127.0.0.1,10.0.0.101"

    Afterwards we have to restart cmon service:

    service cmon restart

    The following step is to configure the s9s CLI tools, which we will use to create and monitor the cmon HA cluster. As per the documentation, these are the steps to take:

    wget http://repo.severalnines.com/s9s-tools/install-s9s-tools.sh
    
    chmod 755 install-s9s-tools.sh
    
    ./install-s9s-tools.sh

    Once we have s9s tools installed, we can enable the clustered mode for cmon:

    s9s controller --enable-cmon-ha

    We can then verify the state of the cluster:

    s9s controller --list --long
    
    S VERSION    OWNER GROUP NAME            IP PORT COMMENT
    
    l 1.7.4.3565 system admins 10.0.0.101      10.0.0.101 9501 Acting as leader.
    
    Total: 1 controller(s)

    As you can see, we have one node up and it is acting as a leader. Obviously, we need at least three nodes to be fault-tolerant, so the next step will be to set up the remaining nodes.

    Enabling Clustered Mode on Remaining Nodes

    There are a couple of things we have to keep in mind while setting up additional nodes. First of all, ClusterControl creates tokens that “link” the cmon daemon with clusters. That information is stored in several locations, including in the cmon database, so we have to ensure every place contains the same token. Otherwise cmon nodes won’t be able to collect information about clusters and execute RPC calls. To do that we should copy the existing configuration files from the first node to the other nodes. In this example we’ll use the node with IP 10.0.0.103, but you should do this for every node you plan to include in the cluster.

    We’ll start by copying the cmon configuration files to new node:

    scp -r /etc/cmon* 10.0.0.103:/etc/

    We may need to edit /etc/cmon.cnf and set the proper hostname:

    hostname=10.0.0.103

    Then we’ll proceed with the regular installation of cmon, just like we did on the first node. There is one main difference though: the script will detect the configuration files and ask if we want to install the controller:

    => An existing Controller installation detected!
    
    => A re-installation of the Controller will overwrite the /etc/cmon.cnf file
    
    => Install the Controller? (y/N):

    We don’t want to do that for now. As on the first node, we will be asked if we want to use the existing MySQL database; we do. Then we’ll be asked to provide passwords:

    => Enter your MySQL root user's password:
    
    => Set a password for ClusterControl's MySQL user (cmon) [cmon]
    
    => Supported special characters: ~!@#$%^&*()_+{}<>?
    
    => Enter a CMON user password:
    
    => Enter the CMON user password again: => Creating the MySQL cmon user ...

    Please make sure you use exactly the same password for the cmon user as you did on the first node.

    As the next step, we want to install s9s tools on new nodes:

    wget http://repo.severalnines.com/s9s-tools/install-s9s-tools.sh
    
    chmod 755 install-s9s-tools.sh
    
    ./install-s9s-tools.sh

    We want them configured exactly as on the first node, so we’ll copy the config:

    scp -r ~/.s9s/ 10.0.0.103:/root/
    
    scp /etc/s9s.conf 10.0.0.103:/etc/

    There’s one more place where ClusterControl stores the token: /var/www/clustercontrol/bootstrap.php. We want to copy that file as well:

    scp /var/www/clustercontrol/bootstrap.php 10.0.0.103:/var/www/clustercontrol/

    Finally, we want to install the controller (as we skipped this when we ran the installation script):

    apt install clustercontrol-controller

    Make sure you do not overwrite existing configuration files. Default options should be safe and leave correct configuration files in place.

    There is one more piece of configuration you may want to copy: /etc/default/cmon. You want to copy it to other nodes:

    scp /etc/default/cmon 10.0.0.103:/etc/default

    Then edit RPC_BIND_ADDRESSES to point to the correct IP of the node:

    RPC_BIND_ADDRESSES="127.0.0.1,10.0.0.103"

    Then we can start the cmon service on the nodes, one by one, and see if they managed to join the cluster. If everything went well, you should see something like this:

    s9s controller --list --long
    
    S VERSION    OWNER GROUP NAME            IP PORT COMMENT
    
    l 1.7.4.3565 system admins 10.0.0.101      10.0.0.101 9501 Acting as leader.
    
    f 1.7.4.3565 system admins 10.0.0.102      10.0.0.102 9501 Accepting heartbeats.
    
    f 1.7.4.3565 system admins 10.0.0.103      10.0.0.103 9501 Accepting heartbeats.
    
    Total: 3 controller(s)

    In case of any issues, please check if all the cmon services are bound to the correct IP addresses. If not, kill them and start again, to re-read the proper configuration:

    root      8016 0.4 2.2 658616 17124 ?        Ssl 09:16 0:00 /usr/sbin/cmon --rpc-port=9500 --bind-addr='127.0.0.1,10.0.0.103' --events-client='http://127.0.0.1:9510' --cloud-service='http://127.0.0.1:9518'

    If you see output from ‘s9s controller --list --long’ like the above, this means that, technically, we have a running cmon HA cluster of three nodes. We could end here, but it’s not over yet. The main problem that remains is UI access. Only the leader node can execute jobs. Some of the s9s commands handle this, but as of now the UI does not. This means that the UI will only work on the leader node; in our current situation that is the UI accessible via https://10.0.0.101/clustercontrol

    In the second part we will show you one of the ways in which you could solve this problem.

     

    by krzysztof at December 17, 2019 07:23 PM

    December 16, 2019

    SeveralNines

    PostgreSQL Database Monitoring: Tips for What to Monitor

    Once you have your database infrastructure up-and-running, you’ll need to keep tabs on what’s happening. Monitoring is a must if you want to be sure everything is going fine or if you might need to change something.

    For each database technology there are several things to monitor. Some of these are specific to the database engine or the vendor or even the specific version that you’re using.

    In this blog, we’ll take a look at what you need to monitor in a PostgreSQL environment.

    What to Monitor in PostgreSQL

    When monitoring a database cluster or node, there are two main things to take into account: the operating system and the database itself. You will need to define which metrics you are going to monitor from both sides and how you are going to do it. You need to monitor the metric always in the context of your system, and you should look for alterations on the behavior pattern.

    In most cases, you will need to use several tools (as it is nearly impossible to find one that covers all the desired metrics).

    Keep in mind that when one of your metrics is affected, it can also affect others, making troubleshooting of the issue more complex. Having a good monitoring and alerting system is important to making this task as simple as possible.

    Operating System Monitoring

    One important thing (which is common to all database engines and even to all systems) is to monitor the Operating System behavior. Here are some points to check here.

    CPU Usage

    An excessive percentage of CPU usage could be a problem if it’s not usual behavior. In this case, it is important to identify the process or processes that are generating the issue. If the problem is the database process, you will need to check what is happening inside the database.

    RAM Memory or SWAP Usage

    If you’re seeing a high value for this metric and nothing has changed in your system, you probably need to check your database configuration. Parameters like shared_buffers and work_mem affect this directly, as they define the amount of memory the PostgreSQL database is able to use.

    Disk Usage

    An abnormal increase in disk space usage or excessive disk access are important things to monitor, as you could have a high number of errors logged in the PostgreSQL log file, or a bad cache configuration that generates significant disk access instead of using memory to process the queries.

    Load Average

    This is related to the three points mentioned above: a high load average can be generated by excessive CPU, RAM or disk usage.

    Network

    A network issue can affect all systems, as the application may be unable to connect to the database (or connect while losing packets), so this is indeed an important metric to monitor. You can monitor latency or packet loss, and the main issue could be network saturation, a hardware issue or just a bad network configuration.

    PostgreSQL Database Monitoring

    Monitoring your PostgreSQL database is important not only to see if you’re having an issue, but also to know whether you need to change something to improve database performance, which is probably one of the most important things to monitor. Let’s look at some metrics that matter here.

    Query Monitoring

    By default, PostgreSQL is configured with compatibility and stability in mind, so you need to know your queries and their patterns, and configure your databases depending on the traffic you have. Here, you can use the EXPLAIN command to check the query plan for a specific query, and you can also monitor the number of SELECTs, INSERTs, UPDATEs or DELETEs on each node. A long query or a high number of queries running at the same time can be a problem for the whole system.
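
    For instance, a minimal sketch against a hypothetical orders table (the table and filter are only examples):

    EXPLAIN (ANALYZE, BUFFERS)
    SELECT * FROM orders WHERE customer_id = 42;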

    Monitoring Active Sessions

    You should also monitor the number of active sessions. If you are near the limit, you need to check whether something is wrong or you simply need to increase the max_connections value. A change in the number can be an increase or decrease in connections. Bad usage of connection pooling, locking or network issues are the most common problems related to the number of connections.
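
    A minimal sketch for comparing the current session count against the configured limit:

    SELECT count(*) AS active_sessions FROM pg_stat_activity;
    SHOW max_connections;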

    Database Locks

    If you have a query waiting for another query, you need to check whether that other query is a normal process or something new. In some cases, if somebody is running an update on a big table, for example, this action can affect the normal behavior of your database, generating a high number of locks.
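
    As a minimal sketch (assuming PostgreSQL 9.6 or later, where the wait_event columns exist), you can list the sessions currently waiting on locks:

    SELECT pid, wait_event_type, wait_event, state, query
    FROM pg_stat_activity
    WHERE wait_event_type = 'Lock';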

    Monitoring Replication

    The key metrics to monitor for replication are the lag and the replication state. The most common causes of issues are networking problems, hardware resource problems, or under-dimensioning. If you are facing a replication issue you need to know it as soon as possible, since you will have to fix it to keep the environment highly available.
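
    A minimal sketch (column names assume PostgreSQL 10 or later) for checking the replication state on the primary and the apply lag on a standby:

    -- On the primary:
    SELECT client_addr, state, sent_lsn, replay_lsn FROM pg_stat_replication;
    -- On a standby:
    SELECT now() - pg_last_xact_replay_timestamp() AS replication_lag;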

    Monitoring Backups

    Avoiding data loss is one of the basic DBA tasks, so you not only need to take the backup, you should also know whether the backup completed and whether it’s usable. Usually this last point is not taken into account, but it’s probably the most important check in a backup process.

    Monitoring Database Logs

    You should monitor your database log for errors like FATAL or deadlock, or even for common issues like authentication problems or long-running queries. Most errors are written to the log file with detailed, useful information for fixing them.

    Impact of Monitoring on PostgreSQL Database Performance

    While monitoring is a must, it’s not typically free. There is always a cost on the database performance, depending on how much you are monitoring, so you should avoid monitoring things that you won’t use.

    In general, there are two ways to monitor your databases, from the logs or from the database side by querying.

    In the case of logs, to be able to use them you need a high logging level, which generates heavy disk access and can affect the performance of your database.

    For the querying mode, each connection to the database uses resources, so depending on the activity of your database and the assigned resources, it may affect the performance too.

    PostgreSQL Monitoring Tools

    There are several tool options for monitoring your database. It can be a built-in PostgreSQL tool, like extensions, or some external tool. Let’s see some examples of these tools.

    Extensions

    • Pg_stat_statements: This extension helps you know the query profile of your database. It tracks all the queries that are executed and stores a lot of useful information in a view called pg_stat_statements. By querying this view you can see which queries run in the system, how many times they have run, and how much time they have consumed, among other information (see the sketch after this list).
    • Pgbadger: A tool that performs an analysis of PostgreSQL logs and displays the results in an HTML file. It helps you understand the behavior of your database and identify which queries need to be optimized.
    • Pgstattuple: It can generate statistics for tables and indexes, showing how much of the space used by each table and index is consumed by live tuples or dead tuples, and how much unused space is available in each relation.
    • Pg_buffercache: With this, you can check what's happening in the shared buffer cache in real time, showing how many pages are currently held in the cache.
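
    As a minimal sketch of what you can get out of pg_stat_statements (the total_time and mean_time columns apply up to PostgreSQL 12; later versions rename them to total_exec_time and mean_exec_time):

    SELECT query, calls, total_time, mean_time
    FROM pg_stat_statements
    ORDER BY total_time DESC
    LIMIT 10;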

    External Monitoring Tools

    • ClusterControl: It’s a management and monitoring system that helps to deploy, manage, monitor and scale your databases from a friendly interface. ClusterControl has support for the top open-source database technologies and you can automate many of the database tasks you have to perform regularly like adding and scaling new nodes, running backups and restores, and more.
    • Nagios: It’s an open source system and network monitoring application. It monitors hosts or services and manages alerts for different states. With this tool, you can monitor network services, host resources, and more. For monitoring PostgreSQL, you can use a plugin or create your own script to check your database.
    • Zabbix: It’s a software that can monitor both networks and servers. It uses a flexible notification mechanism that allows users to configure alerts by email. It also offers reports and data visualization based on the stored data. All Zabbix reports and statistics, as well as configuration parameters, are accessed through a web interface.

    Dashboards

    Visibility is useful for fast issue detection. Reading command output is definitely more time-consuming than just watching a graph, so using a dashboard could be the difference between detecting a problem now or in the next 15 minutes, and that time can be really important for the company. For this task, tools like PMM or VividCortex, among others, could be the key to adding visibility to your database monitoring system.

    Percona Monitoring and Management (PMM): It’s an open-source platform for managing and monitoring your database performance. It provides thorough time-based analysis for MySQL, MariaDB, MongoDB, and PostgreSQL servers to ensure that your data works as efficiently as possible.

    VividCortex: It’s a cloud-hosted platform that provides deep database performance monitoring. It offers complete visibility into leading open source databases including MySQL, PostgreSQL, AWS Aurora, MongoDB, and Redis.

    Alerting

    Just monitoring a system doesn’t make sense if you don’t receive notifications about issues. Without an alerting system, you would have to go to the monitoring tool to see if everything is fine, and you could be having a big issue that started many hours ago. This alerting job can be done using email alerts, text alerts, or other tool integrations like Slack.

    It's really difficult to find a single tool to monitor all the necessary metrics for PostgreSQL; in general you will need to use more than one, and some scripting will be needed as well. One way to centralize the monitoring and alerting task is to use ClusterControl, which provides features like backup management, monitoring and alerting, deployment and scaling, automatic recovery, and other important features to help you manage your databases, all in the same system.

    Monitoring Your PostgreSQL Database with ClusterControl

    ClusterControl allows you to monitor your servers in real-time. It has a predefined set of dashboards for you, to analyze some of the most common metrics. 

    It allows you to customize the graphs available in the cluster, and you can enable the agent-based monitoring to generate more detailed dashboards. 

    You can also create alerts, which inform you of events in your cluster, or integrate with different services such as PagerDuty or Slack.

    Also, you can check the query monitor section, where you can find the top queries, the running queries, the query outliers, and the query statistics.

    With these features, you can see how your PostgreSQL database is going.

    For backup management, ClusterControl centralizes it to protect, secure, and recover your data, and with the backup verification feature you can confirm whether the backup is good to go.

    This backup verification job restores the backup on a separate standalone host, so you can make sure that the backup is working.

    Monitoring with the ClusterControl Command Line

    For scripting and automating tasks, or even if you just prefer the command line, ClusterControl has the s9s tool. It's a command-line tool for managing your database cluster.

    Cluster List

    Node List

    With the s9s tool you can perform all the tasks available in the ClusterControl UI (and even more), and you can integrate it with external tools like Slack to manage things from there.

    Conclusion

    In this blog, we mentioned some important metrics to monitor in your PostgreSQL environment, and some tools to make your life easier by having your systems under control. We also saw how to use ClusterControl for this task.

    As you can see, monitoring is absolutely necessary, and the best way to do it depends on the infrastructure and the system itself. You should reach a balance between what you need to monitor and how monitoring affects your database performance.

     

    by Sebastian Insausti at December 16, 2019 08:03 PM

    Federico Razzoli

    MySQL/MariaDB: Using views to grant or deny row-level privileges

    Relational DBMSs allow you to grant users permissions on certain tables or columns. Here we'll discuss how to restrict access to a certain set of rows.

    by Federico Razzoli at December 16, 2019 09:19 AM

    December 13, 2019

    SeveralNines

    Building a Monitoring Pipeline for InfluxDB with Sensu & Grafana

    Metrics are at the heart of any good monitoring strategy — including how to collect them, where to send them, and how you visualize that data. In this post, I’ll walk you through building your own open-source monitoring pipeline with Sensu, InfluxDB, and Grafana to monitor performance metrics (specifically, check output metric extraction). While I won’t go into step-by-step installation instructions for each of these tools, I’ll make sure to link out to the proper guides so you can follow along. 

    Checking Output Metric Extraction with Sensu

    Sensu is an open source monitoring solution that integrates with a wide ecosystem of complementary tooling (including InfluxDB and Grafana). Sensu Go is the latest and greatest version of Sensu — it’s designed to be more portable, faster to deploy, and (even more) friendly to containerized and ephemeral environments. To try it out (and get started quickly so you can follow along), download the Sensu sandbox. The Sensu sandbox comes pre-loaded with Sensu and related services up and running so you can skip the basic install steps and just focus on learning how to monitor performance metrics.

    Here’s what we’ll be doing with our metrics:

    Getting the Sandbox Setup

    First off, we'll collect metrics using the aforementioned Sensu check output metric extraction. To get started, you'll need to spin up your Sensu backend, agent, and CLI (sensuctl is our command-line tool for managing resources within Sensu - see this guide for more info). Below I'll give the commands necessary to get things up and running inside the sandbox. If you aren't using the sandbox, you'll still be able to follow along with some minor changes to your commands.

    Start up the sandbox:

    ENABLE_SENSU_SANDBOX_PORT_FORWARDING=1 vagrant up

    This will enable port forwarding for services running inside the sandbox so you can access them from the host machine.
    Enter the sandbox:

    vagrant ssh

    Start backend inside the sandbox:

    sudo systemctl start sensu-backend

    Start agent inside the sandbox:

    sudo systemctl start sensu-agent

    Configure CLI (if you’re not using the sandbox):

    sensuctl configure

    Confirm the Sensu agent is running

    sensuctl entity list

    You should see a listing for sensu-go-sandbox

    Collecting Metrics

    Now that you have Sensu running in the sandbox, it's time to create a check and configure it to extract metrics. The script I'm using in our examples is system-profile-linux, which prints metrics in Graphite Plaintext Format, but Sensu supports several other formats. Another note worth calling out: the example command is only compatible with Linux, because that's what the Sensu sandbox is using, but Sensu works with several operating systems (including OSX, Windows, and Docker). If you're using another OS, you'll have to adjust your check commands to make sure they're compatible. The main thing we want is for the check command to print at least one metric in the specified output-metric-format.
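
    For reference, Graphite plaintext output is simply one metric per line in the form "<metric path> <value> <unix timestamp>". A hedged, illustrative sample (the metric names and values below are made up, not actual system-profile-linux output):

    sensu-go-sandbox.cpu.user 4.2 1576238400
    sensu-go-sandbox.load_avg.one 0.35 1576238400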

    First let’s add the system-profile-linux asset to our system by making use of sensuctl’s Bonsai integration (introduced in Sensu Go 5.14).

    sensuctl asset add sensu/system-profile-linux  

    We'll be referencing that asset definition to ensure the system-profile-linux asset is downloaded by the Sensu agent running the metrics collection check.

    sensuctl check create collect-metrics --command system-profile-linux \
      --interval 10 --subscriptions entity:sensu-go-sandbox \
      --output-metric-format graphite_plaintext \
      --runtime-assets sensu/system-profile-linux

    After the check executes, enter the following to make sure that the check passed with a 0 status. Since the metrics are not stored in Sensu, you can validate that the metrics have been extracted properly by using a debug handler  — check out this guide for an example.

    sensuctl event info sensu-go-sandbox collect-metrics --format json

    Transforming Metrics

    Now it's time to handle the events we've received from our checks and metrics! I wrote the sensu-influxdb-handler to transform any metrics in a Sensu event and send them to InfluxDB, and it's available in the Bonsai asset index here. Instead of downloading it manually and installing it into the sandbox, you can add it to your Sensu assets with sensuctl:

    sensuctl asset add sensu/sensu-influxdb-handler  

    And then reference that new asset in an InfluxDB handler definition:

    sensuctl handler create sensu-influxdb-handler --type pipe --command \
      "sensu-influxdb-handler --addr http://localhost:8086 --username sensu \
      --password sandbox --db-name sensu" \
      --runtime-assets sensu/sensu-influxdb-handler

    Now assign the handler to the check we created earlier:

    sensuctl check set-output-metric-handlers \
      collect-metrics sensu-influxdb-handler

    Recording Metrics

    In order to record all these metrics, you'll want to have the InfluxDB daemon running, listening on the address and configured with the database and credentials recorded in your handler command (above).

    Note: the handler above is using the addr, username, password, and db-name configuration appropriate for the InfluxDB setup running locally in the Sensu sandbox. If you want to route the metrics to a different InfluxDB database, just edit the handler command definition accordingly. The sandbox also comes with the influx command-line client, so you can easily query InfluxDB to make sure that the metrics were handled and recorded.
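
    As a sanity check, a hedged example of querying InfluxDB with the influx command-line client (the database name and credentials match the handler definition above; measurement names depend on your check output, so the SELECT below is illustrative):

    $ influx -username sensu -password sandbox -database sensu -execute 'SHOW MEASUREMENTS'

    $ influx -username sensu -password sandbox -database sensu \
        -execute 'SELECT * FROM "sensu-go-sandbox.cpu.user" LIMIT 5'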

    Visualizing Metrics

    It's time to visualize the data you've collected. If you are running Grafana outside of the sandbox, make sure the Grafana configuration file you are using has no port collisions with Sensu (such as port 3000), then start your Grafana server. Sensu sandbox users should already have access to a running Grafana service, accessible from the sandbox host at http://localhost:4002 (if you enabled port forwarding when creating the sandbox).

    Don’t forget to customize your dashboard based on the output from your check. You can also use this dashboard configuration I created. You’ll also need to connect the InfluxDB data source in your Grafana dashboard — check out this guide to learn how. If all goes to plan, you should be able to see the metrics being collected, like in the example dashboards below.

    I hope this was a helpful guide to getting started building your own open-source monitoring pipeline. Questions, comments, or feedback? Find me on Twitter or in the Sensu Community. Thanks for reading, and happy monitoring! 

     

    by nikkiattea at December 13, 2019 10:45 AM

    December 12, 2019

    SeveralNines

    Basic Considerations for Taking a MongoDB Backup

    Database systems have a responsibility to store and ensure consistent availability of relevant data whenever needed, at any time in operation. Most companies fail to continue with business after cases of data loss resulting from a database failure, security breach, human error, or a catastrophic failure that completely destroys the operating nodes in production. Keeping databases in the same data center puts one at high risk of losing all the data in case of these outages.

    Replication and backup are the commonly used ways of ensuring high availability of data. The selection between the two depends on how frequently the data changes. A backup is best preferred where data does not change frequently and there is no expectation of accumulating many backup files. On the other hand, replication is preferred for frequently changing data, besides other associated merits such as serving data in a specific location and reducing the latency of requests. However, both replication and backup can be used for maximum data integrity and consistency during restoration in any case of failure.

    Database backups offer advantages beyond providing a restoration point: they also provide the basis for creating new environments for development, open access, and staging without tampering with production. The development team can quickly and easily test newly integrated features and accelerate their development. Backups can also be used as a checkpoint for code errors wherever the resulting data is not consistent.

    Considerations for Backing Up MongoDB

    Backups are created at certain points to reflect (acting as a snapshot of the database) what data the database hosts at that given moment. If the database fails at a given point, we can use the last backup file to roll back the DB to a point before it failed. However, one needs to take into consideration some factors before doing a recovery and they include:

    1. Recovery Point Objective
    2. Recovery Time Objective
    3. Database and Snapshot Isolation
    4. Complications with Sharding
    5. Restoration Process
    6. Performance Factors and Available Storage
    7. Flexibility
    8. Complexity of Deployment

    Recovery Point Objective

    This is carried out so as to determine how much data you are ready to lose during the backup and restoration process. For example, if we have user data and clickstream data, user data will be given priority over the clickstream analytics, since the latter can be regenerated by monitoring operations in your application after restoration. A continuous backup should be preferred for critical data such as bank information, production industry data, and communication systems information, and should be carried out at close intervals. If the data does not change frequently, it may be less expensive to lose much of it and restore a snapshot taken, for example, 6 months or 1 year earlier.

    Recovery Time Objective

    This is to analyze and determine how quickly the restoration operation can be done. During recovery, your applications will incur some downtime which is also directly proportional to the amount of data that needs to be recovered. If you are restoring a large set of data it will take longer.

    Database and Snapshot Isolation

    Isolation is a measure of how close backup snapshots are to the primary database servers, in terms of both logical configuration and physical location. If they happen to be too close, the recovery time is reduced at the expense of an increased likelihood of the backups being destroyed at the same time as the database. It is not advisable to host backups and the production environment on the same system, so as to avoid any disruption on the servers from propagating to the backups too.

    Complications with Sharding

    For a database system distributed through sharding, some backup complexity is introduced and write activities may have to be paused across the whole system. Different shards will finish different types of backups at different times. Consider logical backups and snapshot backups:

    Logical Backups

    • Shards are of different sizes and hence will finish at different times
    • MongoDB-based dumps will ignore the --oplog option, hence they won't be consistent at each shard
    • The balancer could be off while it is supposed to be on, just because some shards may not have finished the restoration process

    Snapshot Backups

    • Snapshot backups work well for a single replica set from version 3.2 and later. You should, therefore, consider updating your MongoDB version.

    Restoration Process

    Some people carry out backups without testing whether they will work in case of restoration. A backup, in essence, exists to provide restoration capability; otherwise it is useless. You should always try to run restores of your backups on different test servers to see if they are working.

    Performance Factors and Available Storage

    Backups also tend to take up as much space as the data in the database itself, so they need to be compressed enough not to occupy a lot of unnecessary space that may eat into the overall storage resources of the system. They can be archived into zip files, hence reducing their overall sizes. Besides, as mentioned before, one can archive the backups in a different data center from the database itself.

    Backups may also affect the performance of the database, and some can degrade it. In that case, continuous backups will cause some setback and should be converted to scheduled backups to avoid eating into maintenance windows. In most cases, secondary servers are deployed to support backups. In this case:

    • Single nodes cannot be consistently backed up, because MongoDB uses read-uncommitted without an oplog when using the mongodump command, and in that case backups will not be safe.
    • Use secondary nodes for backups, since the process itself takes time according to the amount of data involved and the connected applications would incur some downtime. If you use the primary, which also has to update the oplogs, then you may lose some data during that downtime (a hedged mongodump sketch follows this list).
    • The restore process takes a lot of time but the storage resources assigned are tiny.
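
    As mentioned above, a minimal mongodump sketch, assuming a replica set secondary reachable at 10.0.0.12:27017 and a dedicated backup user (host, credentials, and paths are placeholders to adjust to your environment):

    # Dump from a secondary and capture the oplog for point-in-time consistency;
    # --gzip compresses the output, --out sets the destination directory.
    mongodump --host 10.0.0.12 --port 27017 \
      --username backupuser --password "$MONGO_BACKUP_PASSWORD" \
      --authenticationDatabase admin \
      --oplog --gzip --out /backups/mongodb/$(date +%F)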

    Flexibility

    Many at times you may not want some of the data during backup, as for the example of Recovery Point Objective, one may want the recovery be done and filter out the user clicks data. To do so, you need a Partial backup strategy that will provide the flexibility to filter out the data that you won’t be interested in, hence reduce the recovery duration and resources that would have been wasted. Incremental backup can also be useful such that only data parts that have changed will be backed up from the last snapshot rather than taking entire backups for every snapshot.

    Complexity of Deployment

    Your backup strategy should be easy to set and maintain with time. You can also schedule your backups so that you don’t need to do them manually whenever you want to.

    Conclusion

    Database systems guarantee "life after death" only if there is a well-established backup system in place. The database could be destroyed by catastrophic factors, human error, or security attacks that can lead to loss or corruption of data. Before doing a backup, one has to consider the type of data in terms of size and importance. It is not advisable to keep your backups in the same data center as your database, so as to reduce the likelihood of the backups being destroyed simultaneously. Backups may alter the performance of the database, hence one should be careful about the strategy to use and when to carry out the backup. Do not carry out your backups on the primary node, since this may result in system downtime during the backup and consequently the loss of important data.


    by Onyancha Brian Henry at December 12, 2019 07:01 PM

    MariaDB Foundation

    MariaDB 10.4.11, 10.3.21 and 10.2.30 now available

    The MariaDB Foundation is pleased to announce the availability of MariaDB 10.4.11, MariaDB 10.3.21 and MariaDB 10.2.30, the latest stable releases in their respective series. […]

    The post MariaDB 10.4.11, 10.3.21 and 10.2.30 now available appeared first on MariaDB.org.

    by Ian Gilfillan at December 12, 2019 06:38 AM

    December 11, 2019

    SeveralNines

    Handling Replication Issues from non-GTID to GTID MariaDB Database Clusters

    We recently ran into an interesting customer support case involving a MariaDB replication setup. We spent a lot of time researching this problem and thought it would be worth sharing this with you in this blog post.

    Customer’s Environment Description

    The issue was as follows: an old (pre 10.x) MariaDB server was in use and an attempt was made to migrate data from it into a more recent MariaDB replication setup. This resulted in issues with using Mariabackup to rebuild slaves in the new replication cluster. For the purpose of the tests we recreated this behavior in the following environment:

    The data has been migrated from 5.5 to 10.4 using mysqldump:

    mysqldump --single-transaction --master-data=2 --events --routines sbtest > /root/dump.sql

    This allowed us to collect the master binary log coordinates and a consistent dump. As a result, we were able to provision the MariaDB 10.4 master node and set up replication between the old 5.5 master and the new 10.4 node. Traffic was still running on the 5.5 node. The 10.4 master was generating GTIDs as it had to replicate data to the 10.4 slave. Before we dig into the details, let's take a quick look at how GTIDs work in MariaDB.
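
    As a side note, with --master-data=2 the dump contains the master's coordinates as a commented-out CHANGE MASTER TO statement, so they can be read straight from the file (the values shown below are illustrative, not from the actual test environment):

    $ grep -m 1 "CHANGE MASTER TO" /root/dump.sql
    -- CHANGE MASTER TO MASTER_LOG_FILE='mysql-bin.000123', MASTER_LOG_POS=45678;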

    MariaDB and GTID

    For starters, MariaDB uses a different format of the GTID than Oracle MySQL. It consists of three numbers separated by dashes:

    0 - 1 - 345

    The first number is the replication domain, which allows multi-source replication to be properly handled. This is not relevant to our case as all the nodes are in the same replication domain. The second number is the server ID of the node that generated the GTID. The third one is the sequence number - it increases monotonically with every event stored in the binary logs.

    MariaDB uses several variables to store the information about GTID’s executed on a given node. The most interesting for us are:

    Gtid_binlog_pos - as per the documentation, this variable is the GTID of the last event group written to the binary log.

    Gtid_slave_pos - as per the documentation, this system variable contains the GTID of the last transaction applied to the database by the server's slave threads.

    Gtid_current_pos - as per the documentation, this system variable contains the GTID of the last transaction applied to the database. If the server_id of the corresponding GTID in gtid_binlog_pos is equal to the servers own server_id, and the sequence number is higher than the corresponding GTID in gtid_slave_pos, then the GTID from gtid_binlog_pos will be used. Otherwise the GTID from gtid_slave_pos will be used for that domain.

    So, to make it clear: gtid_binlog_pos stores the GTID of the last locally executed event, gtid_slave_pos stores the GTID of the last event executed by the slave thread, and gtid_current_pos shows either the value from gtid_binlog_pos (if it has the highest sequence number and carries the local server ID) or the value from gtid_slave_pos otherwise. Please keep this in mind.
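
    To make that rule concrete, a hypothetical example (values invented for illustration): on a node whose server ID is 1001, if gtid_binlog_pos is 0-1001-300 and gtid_slave_pos is 0-55-200, then gtid_current_pos is 0-1001-300, since the binlog entry carries the local server ID and the higher sequence number. If gtid_binlog_pos were instead 0-55-500 (an event that merely passed through via replication), gtid_current_pos would fall back to gtid_slave_pos, i.e. 0-55-200.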

    An Overview of the Issue

    The initial state of the relevant variables are on 10.4 master:

    MariaDB [(none)]> show global variables like '%gtid%';
    
    +-------------------------+----------+
    
    | Variable_name           | Value |
    
    +-------------------------+----------+
    
    | gtid_binlog_pos         | 0-1001-1 |
    
    | gtid_binlog_state       | 0-1001-1 |
    
    | gtid_cleanup_batch_size | 64       |
    
    | gtid_current_pos        | 0-1001-1 |
    
    | gtid_domain_id          | 0 |
    
    | gtid_ignore_duplicates  | ON |
    
    | gtid_pos_auto_engines   | |
    
    | gtid_slave_pos          | 0-1001-1 |
    
    | gtid_strict_mode        | ON |
    
    | wsrep_gtid_domain_id    | 0 |
    
    | wsrep_gtid_mode         | OFF |
    
    +-------------------------+----------+
    
    11 rows in set (0.001 sec)

    Please note gtid_slave_pos which, theoretically, doesn't make sense - it came from the same node, but via the slave thread. This can happen if you made a master switch before. We did just that - having two 10.4 nodes, we switched the masters from the host with server ID 1001 to the host with server ID 1002 and then back to 1001.

    Afterwards we configured the replication from 5.5 to 10.4, and this is how things looked:

    MariaDB [(none)]> show global variables like '%gtid%';
    
    +-------------------------+-------------------------+
    
    | Variable_name           | Value |
    
    +-------------------------+-------------------------+
    
    | gtid_binlog_pos         | 0-55-117029 |
    
    | gtid_binlog_state       | 0-1001-1537,0-55-117029 |
    
    | gtid_cleanup_batch_size | 64                      |
    
    | gtid_current_pos        | 0-1001-1 |
    
    | gtid_domain_id          | 0 |
    
    | gtid_ignore_duplicates  | ON |
    
    | gtid_pos_auto_engines   | |
    
    | gtid_slave_pos          | 0-1001-1 |
    
    | gtid_strict_mode        | ON |
    
    | wsrep_gtid_domain_id    | 0 |
    
    | wsrep_gtid_mode         | OFF |
    
    +-------------------------+-------------------------+
    
    11 rows in set (0.000 sec)

    As you can see, the events replicated from MariaDB 5.5 have all been accounted for in the gtid_binlog_pos variable: all events have the server ID of 55. This results in a serious issue. As you may remember, gtid_binlog_pos should contain events executed locally on the host. Here it contains events replicated from another server with a different server ID.

    This makes things dicey when you want to rebuild the 10.4 slave; here's why. Mariabackup, just like Xtrabackup, works in a simple way. It copies the files from the MariaDB server while scanning the redo logs and storing any incoming transactions. When the files have been copied, Mariabackup freezes the database using either FLUSH TABLES WITH READ LOCK or backup locks, depending on the MariaDB version and the availability of the backup locks. Then it reads the latest executed GTID and stores it alongside the backup. Then the lock is released and the backup is completed. The GTID stored in the backup should be used as the latest executed GTID on a node. In case of rebuilding slaves it will be put in gtid_slave_pos and then used to start the GTID replication. This GTID is taken from gtid_current_pos, which makes perfect sense - after all it is the "GTID of the last transaction applied to the database". The astute reader can already see the problem. Let's show the output of the variables when 10.4 replicates from the 5.5 master:

    MariaDB [(none)]> show global variables like '%gtid%';
    
    +-------------------------+-------------------------+
    
    | Variable_name           | Value |
    
    +-------------------------+-------------------------+
    
    | gtid_binlog_pos         | 0-55-117029 |
    
    | gtid_binlog_state       | 0-1001-1537,0-55-117029 |
    
    | gtid_cleanup_batch_size | 64                      |
    
    | gtid_current_pos        | 0-1001-1 |
    
    | gtid_domain_id          | 0 |
    
    | gtid_ignore_duplicates  | ON |
    
    | gtid_pos_auto_engines   | |
    
    | gtid_slave_pos          | 0-1001-1 |
    
    | gtid_strict_mode        | ON |
    
    | wsrep_gtid_domain_id    | 0 |
    
    | wsrep_gtid_mode         | OFF |
    
    +-------------------------+-------------------------+
    
    11 rows in set (0.000 sec)

    Gtid_current_pos is set to 0-1001-1. This is definitely not the correct moment in time: it's taken from gtid_slave_pos, while we have a bunch of transactions that came from 5.5 after that. The problem is that those transactions are stored only in gtid_binlog_pos. On the other hand, gtid_current_pos is calculated in a way that requires GTIDs in gtid_binlog_pos to carry the local server ID before they can be used as the gtid_current_pos. In our case they have the server ID of the 5.5 node, so they will not be treated properly as events executed on the 10.4 master. After the backup is restored, if you set up the slave according to the GTID state stored in the backup, it would end up re-applying all the events that came from 5.5. This, obviously, would break the replication.

    The Solution

    A solution to this problem is to take several additional steps:

    1. Stop the replication from 5.5 to 10.4. Run STOP SLAVE on the 10.4 master
    2. Execute any transaction on 10.4 - CREATE SCHEMA IF NOT EXISTS bugfix - this will change the GTID situation like this:
    MariaDB [(none)]> show global variables like '%gtid%';
    
    +-------------------------+---------------------------+
    
    | Variable_name           | Value   |
    
    +-------------------------+---------------------------+
    
    | gtid_binlog_pos         | 0-1001-117122   |
    
    | gtid_binlog_state       | 0-55-117121,0-1001-117122 |
    
    | gtid_cleanup_batch_size | 64                        |
    
    | gtid_current_pos        | 0-1001-117122   |
    
    | gtid_domain_id          | 0   |
    
    | gtid_ignore_duplicates  | ON   |
    
    | gtid_pos_auto_engines   |   |
    
    | gtid_slave_pos          | 0-1001-1   |
    
    | gtid_strict_mode        | ON   |
    
    | wsrep_gtid_domain_id    | 0   |
    
    | wsrep_gtid_mode         | OFF   |
    
    +-------------------------+---------------------------+
    
    11 rows in set (0.001 sec)

    The latest GTID was executed locally, so it was stored as gtid_binlog_pos. As it has the local server ID, it's picked as the gtid_current_pos. Now you can take a backup and use it to rebuild slaves off the 10.4 master. Once this is done, start the slave thread again.
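
    For completeness, a hedged sketch of pointing the rebuilt slave at the 10.4 master using the GTID captured in the backup (the GTID value reuses the example above; the host and credentials are placeholders):

    -- On the freshly provisioned 10.4 slave, after restoring the backup:
    SET GLOBAL gtid_slave_pos = '0-1001-117122';
    CHANGE MASTER TO
      MASTER_HOST='10.4-master.example.com',
      MASTER_USER='repl',
      MASTER_PASSWORD='replpassword',
      MASTER_USE_GTID=slave_pos;
    START SLAVE;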

    MariaDB is aware that this kind of bug exists; one of the relevant bug reports we found is https://jira.mariadb.org/browse/MDEV-10279. Unfortunately, there's no fix so far. What we found is that this issue affects replication coming from MariaDB versions up to 5.5. Non-GTID events that come from MariaDB 10.0 are correctly accounted for on 10.4 as coming from the slave thread, and gtid_slave_pos is properly updated. MariaDB 5.5 is quite old (even though it is still supported), so you may still see setups running on it and attempts to migrate from 5.5 to more recent, GTID-enabled MariaDB versions. What's worse, according to the bug report we found, this also affects replication coming from non-MariaDB servers (one of the comments mentions the issue showing up on Percona Server 5.6) into MariaDB.

    Anyway, we hope you found this blog post useful and hopefully you will not run into the problem we just described.

     

    by krzysztof at December 11, 2019 07:34 PM

    Shlomi Noach

    Quick hack for GTID_OWN lack

    One of the benefits of MySQL GTIDs is that each server remembers all GTID entries ever executed. Normally these would be ranges, e.g. 0041e600-f1be-11e9-9759-a0369f9435dc:1-3772242 or multi-ranges, e.g. 24a83cd3-e30c-11e9-b43d-121b89fcdde6:1-103775793, 2efbcca6-7ee1-11e8-b2d2-0270c2ed2e5a:1-356487160, 46346470-6561-11e9-9ab7-12aaa4484802:1-26301153, 757fdf0d-740e-11e8-b3f2-0a474bcf1734:1-192371670, d2f5e585-62f5-11e9-82a5-a0369f0ed504:1-10047.

    One of the common problems in asynchronous replication is the issue of consistent reads. I've just written to the master. Is the data available on a replica yet? We have iterated on this, from reading on master, to heuristically finding up-to-date replicas based on heartbeats (see presentation and slides) via freno, and now settled, on some parts of our apps, to using GTID.

    GTIDs are reliable as any replica can give you a definitive answer to the question: have you applied a given transaction or not? Given a GTID entry, say f7b781a9-cbbd-11e9-affb-008cfa542442:12345, one may query for the following on a replica:

    mysql> select gtid_subset('f7b781a9-cbbd-11e9-affb-008cfa542442:12345', @@global.gtid_executed) as transaction_found;
    +-------------------+
    | transaction_found |
    +-------------------+
    |                 1 |
    +-------------------+
    
    mysql> select gtid_subset('f7b781a9-cbbd-11e9-affb-008cfa542442:123450000', @@global.gtid_executed) as transaction_found;
    +-------------------+
    | transaction_found |
    +-------------------+
    |                 0 |
    +-------------------+
    

    Getting OWN_GTID

    This is all well, but, given some INSERT or UPDATE on the master, how can I tell what the GTID associated with that transaction is? There's good news and bad news.

    • Good news is, you may SET SESSION session_track_gtids = OWN_GTID. This makes the MySQL protocol return the GTID generated by your transaction.
    • Bad news is, this isn't a standard SQL response, and the common MySQL drivers offer you no way to get that information!

    At GitHub we author our own Ruby driver, and have implemented the functionality to extract OWN_GTID, much like you'd extract LAST_INSERT_ID. But how does one solve that without modifying the drivers? Here's a poor person's solution which gives you inexact, but good enough, info. Following a write (insert, delete, create, ...), run:

    select gtid_subtract(concat(@@server_uuid, ':1-1000000000000000'), gtid_subtract(concat(@@server_uuid, ':1-1000000000000000'), @@global.gtid_executed)) as master_generated_gtid;
    

    The idea is to "clean" the executed GTID set from irrelevant entries, by filtering out all ranges that do not belong to the server you've just written to (the master). The number 1000000000000000 stands for "high enough value that will never be reached in practice" - set to your own preferred value, but this value should take you beyond 300 years assuming 100,000 transactions per second.

    The value you get is the range on the master itself. e.g.:

    mysql> select gtid_subtract(concat(@@server_uuid, ':1-1000000000000000'), gtid_subtract(concat(@@server_uuid, ':1-1000000000000000'), @@global.gtid_executed)) as master_generated_gtid;
    +-------------------------------------------------+
    | master_generated_gtid                           |
    +-------------------------------------------------+
    | dc103953-1598-11ea-82a7-008cfa5440e4:1-35807176 |
    +-------------------------------------------------+
    

    You may further parse the above to extract dc103953-1598-11ea-82a7-008cfa5440e4:35807176 if you want to hold on to the latest GTID entry. Now, this entry isn't necessarily your own. Between the time of your write and the time of your GTID query, other writes will have taken place. But the entry you get is either your own or a later one. If you can find that entry on a replica, that means your write is included on the replica.
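
    A hedged sketch of that parsing done in SQL, reusing the query above (it assumes the master's executed range is a single contiguous interval, as in the example; the cutoff constant carries the same assumption as before):

    select concat(@@server_uuid, ':',
             substring_index(
               gtid_subtract(concat(@@server_uuid, ':1-1000000000000000'),
                 gtid_subtract(concat(@@server_uuid, ':1-1000000000000000'), @@global.gtid_executed)),
               '-', -1)) as latest_master_gtid_entry;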

    One may wonder, why do we need to extract the value at all? Why not just select @@global.gtid_executed? Why filter on the master's UUID only? Logically, the answer is the same if you do that. But in practice, your query may be unfortunate enough to return something like this:

    select @@global.gtid_executed \G
    
    e71f0cdb-b8ef-11e9-9361-008cfa542442:1-83331,
    e742d87f-dea7-11e9-be6d-008cfa542c9e:1-18485,
    e7880c0e-ac54-11e9-865a-008cfa544064:1-7331973,
    e82043c6-c7d9-11e9-9413-008cfa5440e4:1-61692,
    e902678b-b046-11e9-a281-008cfa542c9e:1-83108,
    e90d7ff9-e35e-11e9-a9a0-008cfa544064:1-18468,
    e929a635-bb40-11e9-9c0d-008cfa5440e4:1-139348,
    e9351610-ef1b-11e9-9db4-008cfa5440e4:1-33460918,
    e938578d-dc41-11e9-9696-008cfa542442:1-18232,
    e947f165-cd53-11e9-b7a1-008cfa5440e4:1-18480,
    e9733f37-d537-11e9-8604-008cfa5440e4:1-18396,
    e97a0659-e423-11e9-8433-008cfa542442:1-18237,
    e98dc1f7-e0f8-11e9-9bbd-008cfa542c9e:1-18482,
    ea16027a-d20e-11e9-9845-008cfa542442:1-18098,
    ea1e1aa6-e74a-11e9-a7f2-008cfa544064:1-18450,
    ea8bc1bd-dd06-11e9-a10c-008cfa542442:1-18203,
    eae8c750-aaca-11e9-b17c-008cfa544064:1-85990,
    eb1e41e9-af81-11e9-9ceb-008cfa544064:1-86220,
    eb3c9b3b-b698-11e9-b67a-008cfa544064:1-18687,
    ec6daf7e-b297-11e9-a8a0-008cfa542c9e:1-80652,
    eca4af92-c965-11e9-a1f3-008cfa542c9e:1-18333,
    ecd110b9-9647-11e9-a48f-008cfa544064:1-24213,
    ed26890e-b10b-11e9-a79d-008cfa542c9e:1-83450,
    ed92b3bf-c8a0-11e9-8612-008cfa542442:1-18223,
    eeb60c82-9a3d-11e9-9ea5-008cfa544064:1-1943152,
    eee43e06-c25d-11e9-ba23-008cfa542442:1-105102,
    eef4a7fb-b438-11e9-8d4b-008cfa5440e4:1-74717,
    eefdbd3b-95b3-11e9-833d-008cfa544064:1-39415,
    ef087062-ba7b-11e9-92de-008cfa5440e4:1-9726172,
    ef507ff0-98b3-11e9-8b15-008cfa5440e4:1-928030,
    ef662471-9a3b-11e9-bd2e-008cfa542c9e:1-954800,
    f002e9f7-97ee-11e9-bed0-008cfa542c9e:1-5180743,
    f0233228-e9a1-11e9-a142-008cfa542c9e:1-18583,
    f04780c4-a864-11e9-9f28-008cfa542c9e:1-83609,
    f048acd9-b1d2-11e9-a0b6-008cfa544064:1-70663,
    f0573d8c-9978-11e9-9f73-008cfa542c9e:1-85642135,
    f0b0a37c-c89c-11e9-804c-008cfa5440e4:1-18488,
    f0cfe1ac-e5af-11e9-bc09-008cfa542c9e:1-18552,
    f0e4997c-cbc9-11e9-9179-008cfa542442:1-1655552,
    f24e481c-b5c4-11e9-aff0-008cfa5440e4:1-83015,
    f4578c4b-be6d-11e9-982e-008cfa5440e4:1-132701,
    f48bce80-e99f-11e9-94f4-a0369f9432f4:1-18460,
    f491adf1-9b04-11e9-bc71-008cfa542c9e:1-962823,
    f5d3db74-a929-11e9-90e8-008cfa5440e4:1-75379,
    f6696ba7-b750-11e9-b458-008cfa542c9e:1-83096,
    f714cb4c-dab7-11e9-adb9-008cfa544064:1-18413,
    f7b781a9-cbbd-11e9-affb-008cfa542442:1-18169,
    f81f7729-b10d-11e9-b29b-008cfa542442:1-86820,
    f88a3298-e903-11e9-88d0-a0369f9432f4:1-18548,
    f9467b29-d78c-11e9-b1a2-008cfa5440e4:1-18492,
    f9c08f5c-e4ea-11e9-a76c-008cfa544064:1-1667611,
    fa633abf-cee3-11e9-9346-008cfa542442:1-18361,
    fa8b0e64-bb42-11e9-9913-008cfa542442:1-140089,
    fa92234c-cc90-11e9-b337-008cfa544064:1-18324,
    fa9755eb-e425-11e9-907d-008cfa542c9e:1-1668270,
    fb7843d5-eb38-11e9-a1ff-a0369f9432f4:1-1668957,
    fb8ceae5-dd08-11e9-9ed3-008cfa5440e4:1-18526,
    fbf9970e-bc07-11e9-9e4f-008cfa5440e4:1-136157,
    fc0ffaee-98b1-11e9-8574-008cfa542c9e:1-940999,
    fc9bf1e4-ee54-11e9-9ce9-008cfa542c9e:1-18189,
    fca4672f-ac56-11e9-8a83-008cfa542442:1-82014,
    fcebaa05-dab5-11e9-8356-008cfa542c9e:1-18490,
    fd0c88b1-ad1b-11e9-bf3a-008cfa5440e4:1-75167,
    fd394feb-e4e4-11e9-bd09-008cfa5440e4:1-18574,
    fd687577-b048-11e9-b429-008cfa542442:1-83479,
    fdb18995-a79f-11e9-a28d-008cfa542442:1-82351,
    fdc72b7f-b696-11e9-ade9-008cfa544064:1-57674,
    ff1f3b6b-c967-11e9-ae04-008cfa544064:1-18503,
    ff6fe7dc-c186-11e9-9bb4-008cfa5440e4:1-103192,
    fff9dd94-ed95-11e9-90b7-008cfa544064:1-911039
    

    This can happen when you fail over to a new master multiple times; it happens when you don't recycle UUIDs, when you provision new hosts and let MySQL pick their UUIDs. Returning this amount of data per query is an excessive overhead, hence we extract only the master's UUID, which is guaranteed to be limited in size.

    by shlomi at December 11, 2019 08:00 AM

    December 10, 2019

    SeveralNines

    Creating a Cold Standby for PostgreSQL Using Amazon AWS

    The need to achieve database high availability is pretty common, and often a must. If your company has a limited budget, then maintaining a replication slave (or more than one) that is running on the same cloud provider (just waiting in case it's needed someday) can be expensive. Depending on the type of application, there are some cases where a replication slave is necessary to improve the RTO (Recovery Time Objective).

    There is another option, however, if your company can accept a short delay to get your systems back online.

    Cold standby is a redundancy method where you have a standby node (as a backup) for the primary one. This node is only used during a master failure. The rest of the time the cold standby node is shut down and only used to load a backup when needed.

    To use this method, it's necessary to have a predefined backup policy (with redundancy) according to an acceptable RPO (Recovery Point Objective) for the company. It may be that losing 12 hours of data is acceptable for the business, while losing just one hour could be a big problem. Every company and application must determine its own standard.

    In this blog, you'll learn how to create a backup policy and how to restore it to a cold standby server using ClusterControl and its integration with Amazon AWS.

    For this blog, we’ll assume that you already have an AWS account and ClusterControl installed. While we’re going to use AWS as the cloud provider in this example, you can use a different one. We’ll use the following PostgreSQL topology deployed using ClusterControl:

    • 1 PostgreSQL Primary Node
    • 2 PostgreSQL Hot-Standby Nodes
    • 2 Load Balancers (HAProxy + Keepalived)
    ClusterControl Topology View Section

    Creating an Acceptable Backup Policy

    The best practice for creating this type of policy is to store the backup files in three different places, one stored locally on the database server (for faster recovery), another one in a centralized backup server, and the last one in the cloud. 

    You can improve on this by also using full, incremental, and differential backups. With ClusterControl you can apply all the above best practices, all from the same system, with a friendly and easy-to-use UI. Let's start by creating the AWS integration in ClusterControl.

    Configuring the ClusterControl AWS Integration

    Go to ClusterControl -> Integrations -> Cloud Providers -> Add Cloud Credentials.

    Choose a cloud provider. We support AWS, Google Cloud, and Azure. In this case, choose AWS and continue.

    Here you need to add a Name, a Default region, and an AWS key ID and key secret. To get or create these last ones, you should go to the IAM (Identity and Access Management) section on the AWS management console. For more information, you can refer to our documentation or AWS documentation.

    Now that you have the integration created, let's schedule the first backup using ClusterControl.

    Scheduling a Backup with ClusterControl

    Go to ClusterControl -> Select the PostgreSQL Cluster -> Backup -> Create Backup.

    You can choose if you want to create a single backup instantly or schedule a new backup. So, let’s choose the second option and continue.

    When you're scheduling a backup, you first need to specify the schedule/frequency. Then, you must choose a backup method (pg_dumpall, pg_basebackup, pgBackRest), the server from which the backup will be taken, and where you want to store the backup. You can also upload your backup to the cloud (AWS, Google or Azure) by enabling the corresponding button.

    Then specify the use of compression, the compression level, encryption, and the retention period for your backup. There is another feature called “Verify Backup” that you'll see in more detail later in this blog post.

    If the “Upload Backup to the cloud” option was enabled, you'll see this step, where you must select the cloud credentials and create or select an S3 bucket in which to store the backup. You must also specify the retention period.

    Now you'll have the scheduled backup in the ClusterControl Schedule Backups section. To cover the best practices mentioned earlier, you can schedule a backup to store it on an external server (the ClusterControl server) and in the cloud, and then schedule another backup to store it locally on the database node for faster recovery.

    Restoring a Backup on Amazon EC2

    Once the backup is finished, you can restore it by using ClusterControl in the Backup section. 

    Creating the Amazon EC2 Instance

    First of all, to restore it, you'll need somewhere to do it, so let's create a basic Amazon EC2 instance. Go to "Launch Instance" in the EC2 section of the AWS management console, and configure your instance.

    When your instance is created, you’ll need to copy the SSH public key from the ClusterControl server. 

    Restoring the Backup Using ClusterControl

    Now that you have the new EC2 instance, let's use it to restore the backup there. For this, in ClusterControl go to the backup section (ClusterControl -> Select Cluster -> Backup), and there you can select "Restore Backup", or press "Restore" directly on the backup that you want to restore.

    You have three options to restore the backup. You can restore the backup on an existing database node, restore and verify the backup on a standalone host, or create a new cluster from the backup. As we want to create a cold standby node, let's use the second option, “Restore and Verify on standalone host”.

    You’ll need a dedicated host (or VM) that is not part of the cluster to restore the backup, so let’s use the EC2 instance created for this job. ClusterControl will install the software and it’ll restore the backup in this host. 

    If the option “Shutdown the server after the backup has been restored” is enabled, ClusterControl will stop the database node after finishing the restore job, and that is exactly what we need for this cold standby creation.

    You can monitor the backup progress in the ClusterControl Activity section.

    Using the ClusterControl Verify Backup Feature

    A backup is not a backup if it's not restorable. So, you should make sure that the backup is working and restore it on the cold standby node frequently.

    The ClusterControl Verify Backup feature is a way to automate the maintenance of a cold standby node by restoring a recent backup, keeping the node as up-to-date as possible and avoiding the manual restore backup job. Let's see how it works.

    Like the “Restore and Verify on standalone host” task, it requires a dedicated host (or VM) that is not part of the cluster to restore the backup, so let's use the same EC2 instance here.

    The automatic verify backup feature is available for the scheduled backups. So, go to ClusterControl -> Select the PostgreSQL Cluster -> Backup -> Create Backup and repeat the steps that you saw earlier to schedule a new backup.

    In the second step, you will have the “Verify Backup” feature available to enable it.

    Using the above options, ClusterControl will install the software and restore the backup on the host. After restoring it, if everything went fine, you will see the verification icon in the ClusterControl Backup section.

    Conclusion

    If you have a limited budget but require high availability, you can use a cold standby PostgreSQL node, which may or may not be a valid option depending on the company's RTO and RPO. In this blog, we showed you how to schedule a backup (according to your business policy) and how to restore it manually. We also showed how to restore the backup automatically on a cold standby server using ClusterControl, Amazon S3, and Amazon EC2.

    by Sebastian Insausti at December 10, 2019 06:57 PM

    December 09, 2019

    SeveralNines

    Using Database Backup Advisors to Automate Maintenance Tasks

    A disaster usually causes an outage, which means system downtime and potential loss of data. Once we have detected the blackout, we trigger our DR plan to recover from it. But it would be an unpleasant surprise to find there is no backup, or, after long hours of recovery, to see it's not the one you need.

    Outages can be costly - there is often a financial impact which can be harmful to the business, and data loss may even be a reason to close the company.

    To minimize data loss, we need to have multiple copies of data in various places. We can design our infrastructure in different layers and abstract each layer from the one below it. For instance, we build a layer of clusters of database instances to protect against hardware failure. We replicate databases across data centers so we can defend ourselves against a data center failure. Every additional layer adds complexity, which can become a nightmare to manage. But still, in essence, a backup takes the central place in disaster recovery.

    That's why it's crucial to be sure it's something we can rely on. But how do we achieve this? Well, one of the options is to verify whether backups were executed, based on the last few lines of the backup script.

    A simple example:

    #!/bin/sh
    
    mysqldump -h 192.168.1.1 -u user -ppassword dbname > filename.sql
    
    
    
    if [ "$?" -eq 0 ]; then
    
        echo "Success."
    
    else
    
        echo "Error."
    
    fi

    But what if the backup script did not start at all? Google offers quite a few search results for "Linux cron not running."
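
    A small, hedged safeguard for that scenario: schedule the script with its output captured, so a silent "cron not running" failure at least shows up as a gap in the log (the script path and schedule are assumptions):

    # Run the backup nightly at 02:00 and keep its output for later inspection
    0 2 * * * /usr/local/bin/mysql_backup.sh >> /var/log/mysql_backup.log 2>&1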

    Unfortunately, open-source databases often do not offer a backup repository.

    Another issue is backup testing. You may have heard about Schrödinger's cat. The well-known Schrödinger's Backup theory states: "The condition of any backup is unknown until a restore is attempted." It sounds like a simple approach, but such an attempt means you have to set up a test environment, copy files, and run a restore ... after every backup.

    In this article, we will see how you can use ClusterControl to make sure your backups are executed, helping you achieve enterprise-grade operations with open source databases.

    Backup Reports

    ClusterControl provides a set of operational reports. Operational reporting supports day-to-day enterprise activity monitoring and control. The backup report is one of many. You can find reports like:

    • Daily System Report
    • Package Upgrade Report
    • Schema Change Report
    • Availability 
    • Backup
    Create Operational Report

    But why would you need this?

    You may already have an excellent monitoring tool with all possible metrics/graphs, and you have probably also set up alerts based on metrics and thresholds (some will even have automated advisors providing recommendations or fixing things automatically). That's good - having visibility into your system is important; nevertheless, you need to be able to process a lot of information.

    ClusterControl Backup Report

    How does this work? ClusterControl collects information on the backup process, the systems, platforms, and devices in the backup infrastructure when the backup job is triggered. All of that information is aggregated and stored in CMON (the internal database), so there is no need to additionally query the individual databases. Moreover, when it discovers that you have a running cluster but there was no backup, that will be reported too.

    In the report details, you can track a backup ID with detailed data about the location, size, time, and backup method. Templates work with data for different database types, so when you manage a mixed environment, you will get the same look and feel. This helps to manage different database backups better.

    CLI Reports

    For those who prefer the command-line interface, a good option to track backups is the ClusterControl Command Line Interface (CLI).

    CLI lets you execute most of the functions available within ClusterControl using simple commands. Backup execution and backup reports are one of them. 

    Used in conjunction with the powerful GUI, it gives ClusterControl users alternative ways to manage their open-source database environments using whatever engine they prefer.

    $ s9s backup --list --cluster-id=1 --long --human-readable
    
    ID CID STATE     OWNER HOSTNAME CREATED  SIZE FILENAME
    
     1   1 COMPLETED dba   10.0.0.5 07:21:39 252K mysqldump_2017-05-09_072135_mysqldb.sql.gz
    
     1   1 COMPLETED dba   10.0.0.5 07:21:43 1014 mysqldump_2017-05-09_072135_schema.sql.gz
    
     1   1 COMPLETED dba   10.0.0.5 07:22:03 109M mysqldump_2017-05-09_072135_data.sql.gz
    
     1   1 COMPLETED dba   10.0.0.5 07:22:07 679 mysqldump_2017-05-09_072135_triggerseventsroutines.sql.gz
    
     2   1 COMPLETED dba   10.0.0.5 07:30:20 252K mysqldump_2017-05-09_073016_mysqldb.sql.gz
    
     2   1 COMPLETED dba   10.0.0.5 07:30:24 1014 mysqldump_2017-05-09_073016_schema.sql.gz
    
     2   1 COMPLETED dba   10.0.0.5 07:30:44 109M mysqldump_2017-05-09_073016_data.sql.gz
    
     2   1 COMPLETED dba   10.0.0.5 07:30:49 679 mysqldump_2017-05-09_073016_triggerseventsroutines.sql.gz

    Beginning with version 1.4.1, the installer script will automatically install this package on the ClusterControl node. The CLI is part of the s9s-tools package. You can also install it separately on a different machine to manage the database cluster remotely. Similar to ClusterControl, it uses secure SSH communication.

    Automatic Backup Verification

    A backup is not a backup if we are not able to retrieve the data. Verifying backups is something that is usually overlooked by many companies. Let’s see how ClusterControl can automate the verification of backups and help avoid any surprises.

    In ClusterControl, select your cluster and go to the "Backup" section, then, select “Create Backup”.

    The automatic verify backup feature is available for scheduled backups, so let's choose the “Schedule Backup” option.

    When scheduling a backup, in addition to selecting the common options like method or storage, we also need to specify the schedule/frequency. In this example, we are going to set up MySQL backup verification. However, the same can be achieved for PostgreSQL and Timescale databases.

    When backup verification is checked, another tab will appear.

    Here we can set all the necessary steps to prepare the environment. Once an IP is provided, we are good to go and can schedule the backup. Whenever a backup finishes, it will be copied to the temporary backup verification environment (the “restore backup on” option). After a successful refresh, you will see the status of the verification in the backup repository tab.

    Failed Backup Executions and Integration Services

    Another interesting option to get more clues about backup execution is to use ClusterControl Integration services. You can control the backup execution status with third-party services.

    Third-party tools integration enables you to automate alerts with other popular systems. Currently, ClusterControl supports ServiceNow, PagerDuty, VictorOps, OpsGenie, Slack, Telegram, and Webhooks. 

    ClusterControl Integration Services

    Below we can see an example of Slack channel integration. Whenever a backup event occurs, it will appear in the Slack channel.

    ClusterControl Integration Services

    Conclusion

    Backups are mandatory in any environment. They help you protect your data and are at the center of any disaster recovery scenario. ClusterControl can help automate the backup process for your databases and, in case of failure, restore them with a few clicks. Also, you can be sure backups are executed successfully and reliably, so in case of disaster you will not lose your data.

    by Bart Oles at December 09, 2019 10:45 AM

    December 08, 2019

    Henrik Ingo

    Let's rewrite everything from scratch (Drizzle eulogy)

    Earlier this year I performed my last act as Drizzle liaison to SPI by requesting that Drizzle be removed from the list of active SPI member projects and that about 6000 USD of donated funds be moved to the SPI general fund.

    The Drizzle project started in 2008, when Brian Aker and a few other MySQL employees were moved to Sun Microsystems' CTO labs. The background to why there was demand for such a project was, in my opinion, twofold:


    by hingo at December 08, 2019 12:24 PM

    December 06, 2019

    SeveralNines

    ClusterControl Takes the Spotlight as Top IT Management Software; Chosen as Rising Star of 2019 by B2B Review Platform

    ClusterControl is all about delivering robust, open source database management for the IT management needs of our clients. This goal drives us every day, so much so that it led us to receive two awards recently from CompareCamp: the Rising Star of 2019 Award and the Great User Experience Award.

    CompareCamp is a B2B Review Platform that delivers credible SaaS reviews and updated technology news from industry experts. Thousands of users rely on CompareCamp reviews that detail the pros and cons of software from different industries. 

    ClusterControl was given the Great User Experience Award because it effectively boosted users' rate of productivity through highly secured tools for real-time monitoring, failure detection, load balancing, data migration, and automated recovery. Our dedicated features for node recovery, SSL encryption, and performance reporting received raves from experts.

    We also received the Rising Star of 2019 Award following our initial review, with ClusterControl being recognized by CompareCamp as highly recommended IT management software.

    To read the full review, please visit CompareCamp.

    by fwlymburner at December 06, 2019 10:45 AM

    December 05, 2019

    SeveralNines

    How to Install ClusterControl to a Custom Location

    ClusterControl consists of a number of components, and one of the most important ones is the ClusterControl UI. This is the web application through which the user interacts with the other backend ClusterControl components like the controller, notification, web-ssh, and cloud modules. Each component is packaged independently with its own name, which makes it easier for potential issues to be fixed and delivered to the end user. For more info on ClusterControl components, check out the documentation page.

    In this blog post, we are going to look into ways to customize our ClusterControl installation, especially the ClusterControl UI which by default will be located under /var/www/html (default document root of Apache). Note that it's recommended to host ClusterControl on a dedicated server where it can use all the default paths which will simplify ClusterControl maintenance operations.

    Installing ClusterControl

    For a fresh installation, go to our Download page to get the installation link. Then, start installing ClusterControl using the installer script as root user:

    $ whoami
    root
    $ wget https://severalnines.com/downloads/cmon/install-cc
    $ chmod 755 install-cc
    $ ./install-cc

    Follow the installation wizard accordingly and the script will install all dependencies, configure the ClusterControl components, and start them up. The script will configure Apache 2.4 and use the package manager to install the ClusterControl UI, which by default is located under /var/www/html.

    Preparation

    Once ClusterControl is installed in its default location, we can then move the UI directories located under /var/www/html/clustercontrol and /var/www/html/cmon somewhere else. Let's prepare the new path first.

    Suppose we want to move the UI components to a user directory under /home. Firstly, create the user. In this example, the user name is "cc":

    $ useradd -m cc

    The above command will automatically create a home directory for user "cc" under /home/cc. Then, create the necessary directories for Apache usage for this user. We are going to create a directory called "logs" for Apache logs, "public_html" for the Apache document root of this user, and "www" as a symbolic link to public_html:

    $ cd /home/cc
    $ mkdir logs
    $ mkdir public_html
    $ ln -sf public_html www

    Make sure all of them are owned by Apache:

    $ chown apache:apache logs public_html # RHEL/CentOS (use www-data:www-data on Debian/Ubuntu)

    To allow the Apache process to access public_html under user cc, we have to allow global read on the home directory of user cc:

    $ chmod 755 /home/cc

    We are now good to move stuff.

    Customizing the Path

    Stop ClusterControl related services and Apache:

    $ systemctl stop httpd # RHEL/CentOS
    $ systemctl stop apache2 # Debian/Ubuntu
    $ systemctl stop cmon cmon-events cmon-ssh cmon-cloud

    We basically have two options in moving the directory into the user's directory:

    1. Move everything from /var/www/html into /home/cc/public_html.
    2. Create a symbolic link from /var/www/html/clustercontrol to /home/cc/public_html (recommended).

    If you opt for option #1, simply move the ClusterControl UI directories into the new path, /home/cc/public_html:

    $ mv /var/www/html/clustercontrol /home/cc/public_html/
    $ mv /var/www/html/cmon /home/cc/public_html/

    Make sure the ownership is correct:

    $ chown -R apache:apache /home/cc/public_html # RHEL/CentOS
    $ chown -R www-data:www-data /home/cc/public_html # Debian/Ubuntu

    However, there is a drawback, since the ClusterControl UI package will always get extracted under /var/www/html. This means if you upgrade the ClusterControl UI via the package manager, the new content will be available under /var/www/html. Refer to the "Potential Issues" section further down for more details.

    If you choose option #2, which is the recommended way, you just need to create a symlink (a link referencing another file or directory) under the user's public_html directory for both directories. When an upgrade happens, the DEB/RPM postinst script will replace the existing installation with the updated version under /var/www/html. To create the symlinks, simply:

    $ ln -sf /var/www/html/clustercontrol /home/cc/public_html/clustercontrol
    $ ln -sf /var/www/html/cmon /home/cc/public_html/cmon

    Another step is required for option #2, where we have to allow Apache to follow symbolic links outside of the user's directory. Create a .htaccess file under /home/cc/public_html and add the following line:

    # /home/cc/public_html/.htaccess
    Options +FollowSymlinks -SymLinksIfOwnerMatch

    Open the Apache site configuration file at /etc/httpd/conf.d/s9s.conf (RHEL/CentOS) or /etc/apache2/sites-enabled/001-s9s.conf (Debian/Ubuntu) using your favourite text editor and modify it as below (pay attention to the lines marked with ##):

    <VirtualHost *:80>
        ServerName cc.domain.com  ## 
    
        ServerAdmin webmaster@cc.domain.com
        DocumentRoot /home/cc/public_html  ##
    
        ErrorLog /home/cc/logs/error.log  ##
        CustomLog /home/cc/logs/access.log combined  ##
    
        # ClusterControl SSH & events
        RewriteEngine On
        RewriteRule ^/clustercontrol/ssh/term$ /clustercontrol/ssh/term/ [R=301]
        RewriteRule ^/clustercontrol/ssh/term/ws/(.*)$ ws://127.0.0.1:9511/ws/$1 [P,L]
        RewriteRule ^/clustercontrol/ssh/term/(.*)$ http://127.0.0.1:9511/$1 [P]
        RewriteRule ^/clustercontrol/sse/events/(.*)$ http://127.0.0.1:9510/events/$1 [P,L]
    
        # Main Directories
        <Directory />
                Options +FollowSymLinks
                AllowOverride All
        </Directory>
    
        <Directory /home/cc/public_html>  ##
                Options +Indexes +FollowSymLinks +MultiViews
                AllowOverride All
                Require all granted
        </Directory>
    
    </VirtualHost>

    Similar modifications apply to the HTTPS configuration at /etc/httpd/conf.d/ssl.conf (RHEL/CentOS) or /etc/apache2/sites-enabled/001-s9s-ssl.conf (Debian/Ubuntu). Pay attention to the lines marked with ##:

    <IfModule mod_ssl.c>
            <VirtualHost _default_:443>
    
                   ServerName cc.domain.com  ##
                   ServerAdmin webmaster@cc.domain.com ##
    
                   DocumentRoot /home/cc/public_html  ##
    
                    # ClusterControl SSH & events
    
                    RewriteEngine On
                    RewriteRule ^/clustercontrol/ssh/term$ /clustercontrol/ssh/term/ [R=301]
                    RewriteRule ^/clustercontrol/ssh/term/ws/(.*)$ ws://127.0.0.1:9511/ws/$1 [P,L]
                    RewriteRule ^/clustercontrol/ssh/term/(.*)$ http://127.0.0.1:9511/$1 [P]
                    RewriteRule ^/clustercontrol/sse/events/(.*)$ http://127.0.0.1:9510/events/$1 [P,L]
    
                    <Directory />
                            Options +FollowSymLinks
                            AllowOverride All
                    </Directory>
    
                    <Directory /home/cc/public_html>  ##
                            Options +Indexes +FollowSymLinks +MultiViews
                            AllowOverride All
                            Require all granted
                    </Directory>
    
                    SSLEngine on
                    SSLCertificateFile /etc/pki/tls/certs/s9server.crt # RHEL/CentOS
                    SSLCertificateKeyFile /etc/pki/tls/private/s9server.key # RHEL/CentOS
                    SSLCertificateFile /etc/ssl/certs/s9server.crt # Debian/Ubuntu
                    SSLCertificateKeyFile /etc/ssl/private/s9server.key # Debian/Ubuntu
    
                    <FilesMatch "\.(cgi|shtml|phtml|php)$">
                                    SSLOptions +StdEnvVars
                    </FilesMatch>
    
                    <Directory /usr/lib/cgi-bin>
                                    SSLOptions +StdEnvVars
                    </Directory>
                    BrowserMatch "MSIE [17-9]" ssl-unclean-shutdown
            </VirtualHost>
    </IfModule>

    Restart everything:

    $ systemctl restart httpd # RHEL/CentOS
    $ systemctl restart apache2 # Debian/Ubuntu
    $ systemctl restart cmon cmon-events cmon-ssh cmon-cloud

    Assuming the IP address of the ClusterControl server is 192.168.1.202 and the domain cc.domain.com resolves to 192.168.1.202, you can access the ClusterControl UI via http://cc.domain.com/clustercontrol/ or http://192.168.1.202/clustercontrol/ (and their HTTPS equivalents).

    You should see the ClusterControl login page at this point. The customization is now complete.
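    If you prefer to verify from the command line first, a simple header check against the new document root should return an HTTP 200 status line (or a redirect to the login page); the hostname below is just the example domain used above:

    $ curl -Is http://cc.domain.com/clustercontrol/ | head -n 1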

    Potential Issues

    Since the package manager simply executes the post-installation script during a package upgrade, the content of the new ClusterControl UI package (the actual package name is clustercontrol.x86_64) will be extracted into /var/www/html (this path is hard coded inside the post-installation script). The following is what would happen:

    $ ls -al /home/cc/public_html # our current installation
    clustercontrol
    cmon
    $ ls -al /var/www/html # empty
    $ yum upgrade clustercontrol -y
    $ ls -al /var/www/html # new files are extracted here
    clustercontrol
    cmon

    Therefore, if you use the symlink method, you may skip the following additional steps.

    To complete the upgrade process, one has to replace the existing installation under the custom path with the new installation manually. First, perform the upgrade operation:

    $ yum upgrade clustercontrol -y # RHEL/CentOS
    $ apt upgrade clustercontrol -y # Debian/Ubuntu

    Move the existing installation to somewhere safe. We will need the old bootstrap.php file later on:

    $ mv /home/cc/public_html/clustercontrol /home/cc/public_html/clustercontrol_old

    Move the new installation from the default path /var/www/html into user's document root:

    $ mv /var/www/html/clustercontrol /home/cc/public_html

    Move bootstrap.php from the old installation to the new one:

    $ mv /home/cc/public_html/clustercontrol_old/bootstrap.php /home/cc/public_html/clustercontrol

    Get the new version string from bootstrap.php.default:

    $ grep CC_UI_VERSION /home/cc/public_html/clustercontrol/bootstrap.php.default
    define('CC_UI_VERSION', '1.7.4.6537-#f427cb');

    Update the CC_UI_VERSION value inside bootstrap.php to the new version string using your favourite text editor:

    $ vim /home/cc/public_html/clustercontrol/bootstrap.php

    Save the file and the upgrade is now complete.
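    If you prefer to script this step instead of editing the file by hand, a rough sketch using GNU grep and sed could look like the following; it assumes the paths used in this example, and you should still verify bootstrap.php afterwards:

    $ NEW_VER=$(grep -oP "CC_UI_VERSION', '\K[^']+" /home/cc/public_html/clustercontrol/bootstrap.php.default)   # read the new version string
    $ sed -i "s/define('CC_UI_VERSION', '[^']*')/define('CC_UI_VERSION', '${NEW_VER}')/" /home/cc/public_html/clustercontrol/bootstrap.php   # write it into bootstrap.php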

    That's it, folks. Happy customizing!

     

    by ashraf at December 05, 2019 04:18 PM

    December 04, 2019

    SeveralNines

    Best Practices for Archiving Your Database in the Cloud

    With the technology available today there is no excuse for failing to recover your data due to lack of backup policies or understanding of how vital it is to take backups as part of your daily, weekly, or monthly routine. Database backups must be taken on a regular basis as part of your overall disaster recovery strategy. 

    The technology for handling backups has never been more efficient and many best practices have been adopted (or bundled) as part of a certain database technology or service that offers it.

    To some extent, people still don’t understand how to store data backups efficiently, nor do they understand the difference between data backups and archived data. 

    Archiving your data provides many benefits, especially in terms of efficiency such as storage costs, optimizing data retrieval, data facility expenses, or payroll for skilled people to maintain your backup storage and its underlying hardware. In this blog, we'll look at the best practices for archiving your data in the cloud.

    Data Backups vs Data Archives

    For some folks in the data tech industry, these topics are often confusing, especially for newcomers.

    Data backups are copies taken from your physical, raw data and stored locally or offsite so they can be accessed in an emergency. They are used to restore data in case it is lost, corrupted, or destroyed. 

    Archived data, on the other hand, is data (which may also come from a backup) that is no longer actively used or is less critical to your business, yet is not obsolete and still has value. In other words, the data is still important, but it doesn’t need to be accessed or modified frequently (if at all). Archiving typically serves one or more of the following purposes:

    • Reduce consumption of primary storage, since archived data can live on lower-performance machines; it does not have to be retrievable every day or immediately.
    • Keep the cost of maintaining your data infrastructure under control.
    • Worry less about ever-growing data sets, especially data that is old or only infrequently changed.
    • Avoid large expenses for backup appliances or software integrated into the backup system.
    • Meet regulatory standards such as HIPAA, PCI DSS, or GDPR that require legacy data to be kept.

    For databases specifically, archiving has very promising benefits:

    • It helps reduce data complexity; even when data grows drastically, archiving keeps the size of your working data set under control.
    • It helps your daily, weekly, or monthly backups perform optimally, because they no longer have to process old or rarely used data (data that isn't useless, just not needed for daily operations).
    • It helps your queries perform efficiently and more consistently, since they no longer have to scan large amounts of old data.
    • Storage space can be managed and controlled according to your data retention policy.

    An archive facility does not need the same power and resources as your backup storage. Tape drives, magnetic disks, or optical drives can all be used for archiving purposes. Since the data is infrequently accessed, retrieval does not need to be immediate, but the data must still be accessible when it is needed.

    Additionally, people involved in data archival need to identify what is worth archiving. Data archives should contain data that cannot be reproduced. If the records stored in the database are the result of deterministic calculations that can be predictably reproduced, they can simply be re-generated when needed and excluded from your archives.

    Data Retention Standards

    Pruning records from your database and moving them to your archives has great benefits. It doesn't mean, however, that you are free to do this however you like; it depends on your business requirements, and different countries have laws and regulations that you must follow (or at least implement). You will need to determine what archived data means for your business application and which data is infrequently accessed. 

    For example, healthcare providers are commonly required (depending on the country) to retain patient information for long periods of time, while in finance the rules again depend on the specific country. Verify which data you are required to retain so you can safely prune the rest for archival and store it in a safe, secure place.

    The Data Life-Cycle

    Data backups and data archives are usually handled together through a backup life-cycle process, which has to be defined within your backup policy. Most backup policies define the items listed below...

    • the schedule on which backups are taken (daily, weekly, monthly),
    • whether it is a full or an incremental backup,
    • the backup format, e.g. whether it is compressed or stored in an archive file format, 
    • whether the data is encrypted or not, 
    • the primary location where the backup is stored (locally on the same machine or over the local network), 
    • the secondary location (cloud storage, or a colocation facility), 
    • and the retention period, i.e. how old the data can get before it reaches end-of-life and is destroyed. 

    What Applications Need Data Archiving?

    While everyone can enjoy the benefits of data archiving, there are certain fields that regularly practice this process for managing and maintaining their data. 

    Government institutions fall into this category. Security and public safety data (such as video surveillance, and threats to personal, residential, social, and business safety) must be retained. This type of data has to be stored securely for years to come for forensic and investigative purposes.

    Digital media companies often have to store large amounts of content, and these files are often very large. Digital libraries also have to store tons of data for research or public information. 

    Healthcare providers, including insurers, are required to retain large amounts of information on their patients for many years. Data like this can grow quickly, and it can affect the efficiency of the database when it's not maintained properly. 

    Cloud Storage Options For Your Archived Data

    The top cloud providers are actively competing to offer great features for storing your archived data in the cloud. It starts with a low price and the flexibility to access your data off-site. Cloud storage is a useful and reliable off-site option for data backups and data archiving, especially because it's very cost efficient: there is no hardware or storage service to maintain at your local or primary site, and it saves on electricity bills as well. 

    These points are important, as you might not need to access your archived data in real time. On certain occasions, especially when a recovery or investigation has to be done, you might need access to your data urgently. Some providers offer their customers access to old data, but you may have to wait hours or days before the archived data is available to download.

    For example, AWS offers S3 Glacier, which provides great flexibility. You can store your data in S3, set up a life-cycle policy, and define when the data reaches end-of-life and is destroyed. Check out the documentation on How Do I Create a Lifecycle Policy for an S3 Bucket?. See their waterfall model below:

    Image Courtesy of Amazon's Documentation "Transitioning Objects Using Amazon S3 Lifecycle".

    At this level, you can store your backups in S3 and let the life-cycle rules defined on that bucket handle archival for you. 
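    As an illustration of such a rule, the following sketch uses the AWS CLI to transition objects under a backups/ prefix to Glacier after 30 days and expire them after a year; the bucket name, prefix, and day counts are placeholder values for this example:

    $ cat > lifecycle.json <<'EOF'
    {
      "Rules": [
        {
          "ID": "archive-db-backups",
          "Status": "Enabled",
          "Filter": { "Prefix": "backups/" },
          "Transitions": [ { "Days": 30, "StorageClass": "GLACIER" } ],
          "Expiration": { "Days": 365 }
        }
      ]
    }
    EOF
    $ aws s3api put-bucket-lifecycle-configuration --bucket my-backup-bucket \
        --lifecycle-configuration file://lifecycle.json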

    If you're using GCP (Google Cloud Platform), they offer a similar approach. Check out their documentation on Object Lifecycle Management. GCP uses a TTL (Time-to-Live) approach for retaining objects stored in Cloud Storage. The great thing about the GCP offering is their archival Cloud Storage classes, Nearline and Coldline. 

    Coldline is ideal for data that is modified or accessed only rarely during a year, whereas the Nearline storage class is for data accessed more often (roughly once a month), possibly multiple times throughout the year. Data stored under a life-cycle policy can still be accessed in sub-second time, which is fast.
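    A comparable configuration can be set with gsutil; in this sketch, objects older than 90 days are moved to Coldline and deleted after a year (the bucket name and age thresholds are example values only):

    $ cat > gcs-lifecycle.json <<'EOF'
    {
      "rule": [
        { "action": { "type": "SetStorageClass", "storageClass": "COLDLINE" }, "condition": { "age": 90 } },
        { "action": { "type": "Delete" }, "condition": { "age": 365 } }
      ]
    }
    EOF
    $ gsutil lifecycle set gcs-lifecycle.json gs://my-backup-bucket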

    With Microsoft Azure, the offering is plain and simple. It provides the same capabilities as GCP and AWS and lets you move archived data back into the hot or cool tiers. You may be able to prioritize a rehydration request when the archived data is needed urgently, but that comes at a higher price than a standard request. Check out their documentation on Rehydrate blob data from the archive tier.

    Overall, storing your archived data in the cloud is fairly hassle free. You do need to define your requirements, and of course the cost involved, when deciding which cloud provider to use.

    Best Practices for Your Archived Data in the Cloud

    Now that we have covered the difference between data backups and archived data (or data archives), and some of the top cloud vendor offerings, let's list the best practices to follow when storing archives in the cloud.

    • Identify the type of data to be archived. As stated earlier, a data backup is not an archive, although a backup can become archived data. Archives are stagnant, old, infrequently accessed data. Identify this data first and tag or label it so you can still identify it once it is stored off-site.
    • Determine data access frequency. Before archiving anything, identify how frequently you will need to access the archived data. Pricing can differ depending on how quickly you need the data back. For example, Amazon S3 charges more for Expedited retrievals using Provisioned capacity instead of On-Demand, and Microsoft Azure charges more when you rehydrate archived data with high priority.
    • Ensure multiple copies are spread out. Yes, you read that correctly. Even for archived or stagnant data, you still need to ensure your copies are highly available and durable when needed. The cloud vendors mentioned earlier offer SLAs that give you an overview of how they store data for efficiency and fast access. When configuring your life-cycle or backup policy, make sure you store archives in multiple regions or replicate them into a different region. Most of these tech-giant cloud vendors spread their archival storage across multiple zones so that it remains scalable and durable when a retrieval is requested.
    • Data compliance. Ensure that compliance and regulations are addressed from the initial phase, not later. If the data does not touch customer profiles and is just business logic data and history, destroying it may be harmless, but it's still better to keep things in accordance with the rules.
    • Provider standards. Choose the right cloud backup and data-retention provider. Walking the path of online data archiving and backup with an experienced service provider can save you from unrecoverable data loss. The top three cloud giants are the obvious choices, but you're free to consider other promising vendors such as Alibaba, IBM, or Oracle Archive Storage. It is best to try them out before making your final decision.

    Data Archiving Tools and Software

    Databases running MariaDB, MySQL, or Percona Server can benefit from pt-archiver. pt-archiver has been widely used for almost a decade and allows you to prune your data while archiving it at the same time. For example, removing orphan records can be done with the command below:

    pt-archiver --source h=host,D=db,t=child --purge \
      --where 'NOT EXISTS(SELECT * FROM parent WHERE col=child.col)'

    or send the rows to a different host, such as an OLAP server:

    pt-archiver --source h=oltp_server,D=test,t=tbl --dest h=olap_server \
      --file '/var/log/archive/%Y-%m-%d-%D.%t' \
      --where "1=1" --limit 1000 --commit-each

    For PostgreSQL or TimescaleDB, you can use a CTE (Common Table Expression) to achieve this. For example, first swap in a fresh table so that new writes go to it while the old rows stay behind in a table that can later be archived:

    CREATE TABLE public.user_info_new (LIKE public.user_info INCLUDING ALL);

    ALTER TABLE public.user_info_new OWNER TO sysadmin;

    GRANT select ON public.user_info_new TO read_only;
    GRANT select, insert, update, delete ON public.user_info TO user1;
    GRANT all ON public.user_info TO admin;

    ALTER TABLE public.user_info INHERIT public.user_info_new;

    BEGIN;
    LOCK TABLE public.user_info IN ACCESS EXCLUSIVE MODE;
    LOCK TABLE public.user_info_new IN ACCESS EXCLUSIVE MODE;
    ALTER TABLE public.user_info RENAME TO user_info_old;
    ALTER TABLE public.user_info_new RENAME TO user_info;
    COMMIT;  (or ROLLBACK; if there's a problem)

    Then move the recent rows (those you want to keep in the active table) in batches:

    WITH row_batch AS (
        SELECT id FROM public.user_info_old WHERE updated_at >= '2016-10-18 00:00:00'::timestamp LIMIT 20000 ),
    delete_rows AS (
        DELETE FROM public.user_info_old o USING row_batch b WHERE b.id = o.id RETURNING o.id, account_id, created_at, updated_at, resource_id, notifier_id, notifier_type)
    INSERT INTO public.user_info SELECT * FROM delete_rows;

    Using CTEs with PostgreSQL might incur performance issues, so you may have to run this during off-peak hours. See this external blog post on the caveats of using CTEs with PostgreSQL.

    For MongoDB, you can use mongodump with the --archive parameter as below:

    mongodump --archive=test.$(date +"%Y_%m_%d").archive --db=test

    This will dump an archive file named test.<current-date>.archive.
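    When the archived data is needed again, mongorestore can read the same archive format back; the file name below is just what the example above would produce on a given date:

    mongorestore --archive=test.2019_12_04.archive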

    Using ClusterControl for Data Archival

    ClusterControl allows you to set a backup policy and upload data off-site to your desired cloud storage location. ClusterControl supports the top three clouds (AWS, GCP, and Microsoft Azure). Please check out our previous blog on Best Practices for Database Backups to learn more.

    With ClusterControl you can take a backup by first defining the backup policy, choosing the database, and archiving the table as shown below...

    Make sure that the "Upload Backup to the cloud" is enabled or checked just like above. Define the backup settings and set retention,

    Then define the cloud settings just like below.

    For the selected bucket, ensure that you have set up lifecycle management; in this scenario, we're using AWS S3. To set up the lifecycle rule, select the bucket, then go to the Management tab as shown below,

    then set up the lifecycle rules as follows,

    then confirm its transitions,

    In the example above, we're ensuring the transition will go to Amazon S3 Glacier, which is our best choice to retain archived data.

    Once you are done setting this up, you're good to go to take the backup. Your archived data will follow the lifecycle you have set up within AWS in this example. If you use GCP or Microsoft Azure, the process is the same: set up the backup along with its lifecycle.

    Conclusion

    Adopting best practices for archiving your data in the cloud can be cumbersome at the beginning; however, with the right set of tools or bundled software, implementing the process becomes much easier.

     

    by Paul Namuag at December 04, 2019 09:16 PM

    December 03, 2019

    MariaDB Foundation

    MariaDB 10.5.0 now available

    The MariaDB Foundation is pleased to announce the availability of MariaDB 10.5.0, the first alpha release in the new MariaDB 10.5 development series. […]

    The post MariaDB 10.5.0 now available appeared first on MariaDB.org.

    by Ian Gilfillan at December 03, 2019 11:47 PM

    SeveralNines

    How to Backup an Encrypted Database with Percona Server for MySQL 8.0

    Production interruptions are nearly guaranteed to happen at some point in time. We know it, so we plan backups, create recovery standby databases, and convert single instances into clusters.

    Admitting the need for a proper recovery scenario, we must analyze the possible disaster timelines and failure scenarios and implement the steps needed to bring the database back up. Executing planned outages can help us prepare for, diagnose, and recover from the next real one. To mitigate the impact of downtime, organizations need an appropriate recovery plan covering all the factors required to bring the service back to life.

    Backup management is not as simple as just scheduling a backup job. There are many factors to consider, such as retention, storage, verification, whether the backups you are taking are physical or logical, and one that is easy to overlook: security. 

    Many organizations vary their approach to backups, trying to have a combination of server image backups (snapshots) and logical and physical backups stored in multiple locations. This is to avoid local or regional disasters that would wipe out both the databases and the backups stored in the same data center.

    We want to make it secure. Data and backups should be encrypted. But there are many implications when both options are in place. In this article, we will take a look at backup procedures when we deal with encrypted databases.

    Encryption-at-Rest for Percona Server for MySQL 8.0

    Starting from MySQL 5.7.11, the community version of MySQL began supporting InnoDB tablespace encryption. It is called Transparent Tablespace Encryption, also referred to as encryption-at-rest. 

    The main difference compared to the enterprise version is the way the keys are stored: keys are not located in a secure vault, which is required for regulatory compliance. The same applies to Percona Server: starting with version 5.7.11, it is possible to encrypt InnoDB tablespaces. In Percona Server 8.0, encryption support has been greatly extended. Version 8.0 added 

    (Per Percona 8.0 release doc):

    • Temporary File Encryption
    • InnoDB Undo Tablespace Encryption
    • InnoDB System Tablespace Encryption (InnoDB System Tablespace Encryption)
    • default_table_encryption  =OFF/ON (General Tablespace Encryption)
    • table_encryption_privilege_check =OFF/ON (Verifying the Encryption Settings)
    • InnoDB redo log encryption (for master key encryption only) (Redo Log Encryption)
    • InnoDB merge file encryption (Verifying the Encryption Setting)
    • Percona Parallel doublewrite buffer encryption (InnoDB Tablespace Encryption)

    For those interested in migrating from the MySQL Enterprise version to Percona: it is also possible to integrate with a HashiCorp Vault server via the keyring_vault plugin, matching the features available in Oracle's MySQL Enterprise Edition.

    Data-at-rest encryption requires a keyring plugin. There are two options here: keyring_file and keyring_vault.

    How to Enable Tablespace Encryption

    To enable encryption, start your database with the --early-plugin-load option:

    either by hand:

    $ mysqld --early-plugin-load="keyring_file=keyring_file.so"

    or by modifying the configuration file:

    [mysqld]
    early-plugin-load=keyring_file.so
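    Once the server is up, you can verify that the keyring plugin was loaded with a quick query (a generic sanity check, not specific to Percona Server):

    mysql> SELECT PLUGIN_NAME, PLUGIN_STATUS FROM INFORMATION_SCHEMA.PLUGINS WHERE PLUGIN_NAME LIKE 'keyring%';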

    Starting with Percona Server 8.0, two types of tablespaces can be encrypted: general tablespaces and the system tablespace. The system tablespace is controlled via the parameter innodb_sys_tablespace_encrypt. By default, the system tablespace is not encrypted, and if you already have one it's not possible to convert it to an encrypted state; a new instance must be created (start an instance with the --bootstrap option). 

    A general tablespace supports encryption either for all tables in the tablespace or for none; it's not possible to run encryption in mixed mode. To create a tablespace with encryption, use the ENCRYPTION='Y/N' flag. 

    Example:

    mysql> CREATE TABLESPACE severalnines ADD DATAFILE 'severalnines.ibd' ENCRYPTION='Y';
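    Tables created in that tablespace will then be encrypted. As a rough sketch (the table name here is just an example), you can create a table there and confirm the encryption flag through the Information Schema:

    mysql> CREATE TABLE t1 (id INT PRIMARY KEY) TABLESPACE severalnines;
    mysql> SELECT NAME, ENCRYPTION FROM INFORMATION_SCHEMA.INNODB_TABLESPACES WHERE NAME = 'severalnines';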

    Backing up an Encrypted Database

    When you have encrypted tablespaces, it's necessary to include the keyring file in the xtrabackup command. To do that, you must specify the path to the keyring file as the value of the --keyring-file-data option.

    $ xtrabackup --backup --target-dir=/u01/mysql/data/backup/ --user=root --keyring-file-data=/u01/secure_location/keyring_file

    Make sure to store the keyring file in a secure location. Also, make sure to always have a backup of the file. Xtrabackup will not copy the keyring file in the backup directory. To prepare the backup, you need to make a copy of the keyring file yourself.

    Preparing the Backup

    Once we have our backup, we should prepare it for recovery. Here you also need to specify --keyring-file-data:

    $ xtrabackup --prepare --target-dir=/u01/mysql/data/backup/ --keyring-file-data=/u01/secure_location/keyring_file

    The backup is now prepared and can be restored with the --copy-back option. In the case that the keyring has been rotated, you will need to restore the keyring (which was used to take and prepare the backup).

    To prepare the backup, xtrabackup needs access to the keyring. Xtrabackup doesn’t talk directly to the MySQL server and doesn’t read the default my.cnf configuration file during prepare, so keyring settings must be specified via the command line (here for keyring_vault):

    $ xtrabackup --prepare --target-dir=/data/backup --keyring-vault-config=/etc/vault.cnf

    The backup is now prepared and can be restored with the --copy-back option:

    $ xtrabackup --copy-back --target-dir=/u01/backup/ --datadir=/u01/mysql/data/

    Performing Incremental Backups

    The process of taking incremental backups with InnoDB tablespace encryption is similar to taking the same incremental backups with an unencrypted tablespace.

    To make an incremental backup, begin with a full backup. The xtrabackup binary writes a file called xtrabackup_checkpoints into the backup’s target directory. This file contains a line showing the to_lsn, which is the database’s LSN at the end of the backup.
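    For illustration, the contents of xtrabackup_checkpoints after a full backup look roughly like this (the LSN values here are made-up example numbers):

    backup_type = full-backuped
    from_lsn = 0
    to_lsn = 1626007
    last_lsn = 1626016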

    First, you need to create a full backup with the following command:

    $ xtrabackup --backup --target-dir=/data/backups/base --keyring-file-data=/var/lib/mysql-keyring/keyring

    Now that you have a full backup, you can make an incremental backup based on it. Use a command such as the following:

    $ xtrabackup --backup --target-dir=/data/backups/inc1 \
      --incremental-basedir=/data/backups/base \
      --keyring-file-data=/var/lib/mysql-keyring/keyring

    The /data/backups/inc1/ directory should now contain delta files, such as ibdata1.delta and test/table1.ibd.delta.

    The meaning should be self-evident. It’s now possible to use this directory as the base for yet another incremental backup:

    $ xtrabackup --backup --target-dir=/data/backups/inc2 \
      --incremental-basedir=/data/backups/inc1 \
      --keyring-file-data=/var/lib/mysql-keyring/keyring

    Preparing Incremental Backups

    So far, the process of backing up the database is similar to a regular backup, except for the flag where we specified the location of the keyring file. 

    Unfortunately, the --prepare step for incremental backups is not the same as for normal backups.

    In normal backups, two types of operations are performed to make the database consistent: committed transactions are replayed from the log file against the data files, and uncommitted transactions are rolled back. You must skip the rollback of uncommitted transactions when preparing a backup, because transactions that were uncommitted at the time of your backup may be in progress, and it’s likely that they will be committed in the next incremental backup. You should use the --apply-log-only option to prevent the rollback phase.

    If you do not use the --apply-log-only option to prevent the rollback phase, then your incremental backups will be useless. After transactions have been rolled back, further incremental backups cannot be applied.

    Beginning with the full backup you created, you can prepare it and then apply the incremental differences to it. Recall that you have the following backups:

    /data/backups/base
    /data/backups/inc1
    /data/backups/inc2

    To prepare the base backup, you need to run --prepare as usual, but prevent the rollback phase:

    $ xtrabackup --prepare --apply-log-only --target-dir=/data/backups/base --keyring-file-data=/var/lib/mysql-keyring/keyring

    To apply the first incremental backup to the full backup, you should use the following command:

    $ xtrabackup --prepare --apply-log-only --target-dir=/data/backups/base \
      --incremental-dir=/data/backups/inc1 \
      --keyring-file-data=/var/lib/mysql-keyring/keyring

    If the keyring has been rotated between the base and the incremental backup, you’ll need to use the keyring that was in use when the first incremental backup was taken.

    Preparing the second incremental backup is a similar process:

    $ xtrabackup --prepare --target-dir=/data/backups/base \
      --incremental-dir=/data/backups/inc2 \
      --keyring-file-data=/var/lib/mysql-keyring/keyring

    Note: --apply-log-only should be used when merging all incrementals except the last one. That’s why the previous command doesn’t contain the --apply-log-only option. Even if --apply-log-only were used on the last step, the backup would still be consistent, but in that case the server would perform the rollback phase.
    The last step is to restore the backup with the --copy-back option. In case the keyring has been rotated, you’ll need to restore the keyring which was used to take and prepare the backup.

    While the described restore method works, it requires access to the same keyring that the server is using. It may not be possible if the backup is prepared on a different server or at a much later time, when keys in the keyring have been purged, or, in the case of a malfunction, when the keyring vault server is not available at all.

    The --transition-key=<passphrase> option should be used to make it possible for xtrabackup to process the backup without access to the keyring vault server. In this case, xtrabackup derives the AES encryption key from the specified passphrase and will use it to encrypt tablespace keys of tablespaces that are being backed up.

    Creating a Backup with a Passphrase

    The following example illustrates how the backup can be created in this case:

    $ xtrabackup --backup --user=root -p --target-dir=/data/backup \
      --transition-key=MySecetKey

    Restoring the Backup with a Generated Key

    When restoring a backup you will need to generate a new master key. Here is the example for keyring_file:

    $ xtrabackup --copy-back --target-dir=/data/backup --datadir=/data/mysql \
      --transition-key=MySecetKey --generate-new-master-key \
      --keyring-file-data=/var/lib/mysql-keyring/keyring

    In case of keyring_vault, it will look like this:

    $ xtrabackup --copy-back --target-dir=/data/backup --datadir=/data/mysql \
      --transition-key=MySecetKey --generate-new-master-key \
      --keyring-vault-config=/etc/vault.cnf
     

    by Bart Oles at December 03, 2019 06:37 PM

    December 02, 2019

    SeveralNines

    Clustered Database Node Failure and its Impact on High Availability

    A node crash can happen at any time; it is unavoidable in any real-world situation. Back when giant, standalone databases roamed the data world, each fall of such a titan created ripples of issues that moved across the world. Nowadays the data world has changed. Few of the titans survived; they were replaced by swarms of small, agile, clustered database instances that can adapt to ever-changing business requirements. 

    One example of such a database is Galera Cluster, which is (typically) deployed in the form of a cluster of nodes. What changes if one of the Galera nodes fails? How does this affect the availability of the cluster as a whole? In this blog post we will dig into this and explain the Galera high availability basics.

    Galera Cluster and Database High Availability

    Galera Cluster is typically deployed in clusters of at least three nodes. This is due to the fact that Galera uses a quorum mechanism to ensure that the cluster state is clear to all of the nodes and that automated fault handling can happen. For that, three nodes are required: more than 50% of the nodes have to be alive after a node’s crash in order for the cluster to be able to operate.

    Galera Cluster

    Let’s assume you have a three-node Galera Cluster, just as in the diagram above. If one node crashes, the situation changes into the following:

    Node “3” is off, but nodes “1” and “2” remain, which make up 66% of all nodes in the cluster. This means those two nodes can continue to operate and form a cluster. Node “3” (if it happens to be alive but cannot connect to the other nodes in the cluster) will account for 33% of the nodes in the cluster, thus it will cease to operate.

    We hope this is now clear: three nodes are the minimum. With two nodes, each would be 50% of the cluster, so neither would have a majority; such a cluster does not provide HA. What if we added one more node?

    Such a setup also allows for one node to fail:

    In such a case we have three (75%) nodes up and running, which is the majority. What would happen if two nodes fail?

    Two nodes are up, two are down. Only 50% of the nodes are available; there is no majority, thus the cluster has to cease its operations. The minimal cluster size to support the failure of two nodes is five nodes:

    In the case above, two nodes are off and three remain, which means 60% are available; the majority is reached and the cluster can operate.

    To sum up, three nodes are the minimum cluster size to allow for one node to fail. A cluster should also have an odd number of nodes; this is not a requirement, but as we have seen, increasing the cluster size from three to four did not make any difference to high availability - still only one failure at a time is allowed. To make the cluster more resilient and support two node failures at the same time, the cluster size has to be increased from three to five. If you want to increase the cluster's ability to handle failures even further, you have to add another two nodes.
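    To see where a node stands in practice, you can check the current cluster size and whether the node is still part of the primary (quorum-holding) component using the standard wsrep status counters:

    mysql> SHOW STATUS LIKE 'wsrep_cluster_size';
    mysql> SHOW STATUS LIKE 'wsrep_cluster_status';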

    Impact of Database Node Failure on the Cluster Load

    In the previous section we have discussed the basic math of the high availability in Galera Cluster. One node can be off in a three node cluster, two off in a five node cluster. This is a basic requirement for Galera. 

    You have to keep other aspects in mind too. We’ll take a quick look at them now, starting with the load on the cluster. 

    Let’s assume all nodes have been created equal: same configuration, same hardware, able to handle the same load. Having load on one node only doesn’t make much sense cost-wise on a three-node cluster (not to mention five-node clusters or larger). You can safely expect that if you invest in three or five Galera nodes, you want to utilize all of them. This is quite easy: load balancers can distribute the load across all Galera nodes for you. You can send the writes to one node and balance reads across all nodes in the cluster. This poses an additional threat you have to keep in mind: what will the load look like if one node is taken out of the cluster? Let’s take a look at the following case of a five-node cluster.

    We have five nodes, each handling 50% load. This is quite OK: nodes are fairly loaded yet still have some capacity to accommodate unexpected spikes in the workload. As we discussed, such a cluster can handle up to two node failures. OK, let’s see how this would look:

    Two nodes are down; that’s OK, Galera can handle it. The full load has to be redistributed across the three remaining nodes, making a total of 250% of a single node’s load distributed across three nodes. As a result, each of them will be running at about 83% of its capacity. This may be acceptable, but 83% load on average means that response time will increase, queries will take longer, and any spike in the workload will most likely cause serious issues. 

    Will our five-node cluster (with 50% utilization on all nodes) really be able to handle the failure of two nodes? Well, not really, no. It will definitely not be as performant as the cluster before the crashes. It may survive, but its availability may be seriously affected by temporary spikes in the workload.

    You also have to keep in mind one more thing: failed nodes will have to be rebuilt. Galera has an internal mechanism that allows it to provision nodes which join the cluster after a crash. It can either be IST, incremental state transfer, when one of the remaining nodes has the required data in its gcache. If not, a full data transfer will have to happen: all data will be transferred from one node (the donor) to the joining node. This process is called SST - state snapshot transfer. Both IST and SST require some resources: data has to be read from disk on the donor and then transferred over the network. IST is more lightweight; SST is much heavier, as all the data has to be read from disk on the donor. No matter which method is used, some additional CPU cycles will be burnt. Will the 17% of free resources on the donor be enough to run the data transfer? It depends on the hardware. Maybe, maybe not. What doesn’t help is that most proxies, by default, remove the donor node from the pool of nodes to send traffic to. This makes perfect sense: a node in “Donor/Desync” state may lag behind the rest of the cluster. 

    When using Galera, which is a virtually synchronous cluster, we don’t expect nodes to lag, and lag could be a serious issue for the application. On the other hand, in our case, removing the donor from the pool of load-balanced nodes ensures that the cluster will be overloaded (250% of the load distributed across only two nodes means 125% of each node’s capacity, which is, well, more than it can handle). This would make the cluster definitely not available at all.

    Conclusion

    As you can see, high availability in the cluster is not just a matter of quorum calculation. You have to account for other factors like the workload, how it changes over time, and the handling of state transfers. When in doubt, test it yourself. We hope this short blog post helped you understand that high availability is quite a tricky subject, even when discussed based on only two variables - the number of nodes and each node’s capacity. Understanding this should help you design better and more reliable HA environments with Galera Cluster.

    by krzysztof at December 02, 2019 08:40 PM

    November 30, 2019

    Valeriy Kravchuk

    Fun with Bugs #90 - On MySQL Bug Reports I am Subscribed to, Part XXIV

    The previous post in this series was published 3 months ago, and the last bug from it, Bug #96340, is already closed as fixed in the upcoming MySQL 8.0.19. I've picked up 50+ more bugs to follow since that time, so I think I should send a quick status update about interesting public MySQL bug reports that are still active.

    As usual I concentrate mostly on InnoDB, replication and optimizer bugs. Here is the list, starting from the oldest:
    • Bug #96374  - "binlog rotation deadlock when innodb concurrency limit setted". This bug was reported by Jia Liu, who used gdb to show threads deadlock details. I admit that recently more bug reporters use gdb and sysbench with custom(ized) Lua scripts to prove the point, and I am happy to see this happening!
    • Bug #96378 - "Subquery with parameter is exponentially slower than hard-coded value". In my primitive test with user variables replaced by constants (on MariaDB 10.3.7) I get the same plan for the query, so I am not 100% sure that the analysis by my dear friend Sinisa Milivojevic was right and it's about optimization (and not comparing values with different collations, for example). But anyway, this problem reported by Jeff Johnson ended up as a verified feature request. Let's see what may happen to it next.
    • Bug #96379 - "First query successful, second - ERROR 1270 (HY000): Illegal mix of collations ". This really funny bug was reported by Владислав Сокол.
    • Bug #96400 - "MTS STOP SLAVE takes over a minute when master crashed during event logging". Nice bug report by Przemyslaw Malkowski from Percona, who used sysbench and dbdeployer to demonstrate the problem. Later Przemysław Skibiński (also from Percona) provided a patch to resolve the problem.
    • Bug #96412 - "Mess usages of latch meta data for InnoDB latches (mutex and rw_lock)". Fungo Wang had to make a detailed code analysis to get this bug verified. I am not sure why it ended up with severity S6 (Debug Builds) though.
    • Bug #96414 - "CREATE TABLE events in wrong order in a binary log.". This bug was reported by Iwo P. His test case to demonstrate the problem included a small source code modification, but (unlike with some other bug reports) this did NOT prevent accepting it as a true, verified bug. The bug does not affect MySQL 8.0.3+ thanks to WL#6049 "Meta-data locking for FOREIGN KEY tables" implemented there.
    • Bug #96472 - "Memory leak after 'innodb.alter_crash'". Yet another bug affecting only MySQL 5.7 and not MySQL 8.0. It was reported by Yura Sorokin from Percona.
    • Bug #96475 - "ALTER TABLE t IMPORT TABLESPACE blocks SELECT on I_S.tables.".  Clear and simple "How to repeat" instructions (using dbdeployer) by Jean-François Gagné. See also his related Bug #96477 - "FLUSH TABLE t FOR EXPORT or ALTER TABLE t2 IMPORT TABLESPACE broken in 8.0.17" for MySQL 8. The latter is a regression bug (without a regression tag), and I just do not get how the GA releases with such new bugs introduced may happen.
    • Bug #96504 - "Refine atomics and barriers for weak memory order platform". Detailed analysis, with links to code etc from Cai Yibo.
    • Bug #96525 - "Huge malloc when open file limit is high". Looks more like a systemd problem (in versions < 240) to me. Anyway, useful report from Andreas Hasenack.
    • Bug #96615 - "mysql server cannot handle write operations after set system time to the past". A lot of arguments were needed to get this verified, but Shangshang Yu was not going to give up. This is the first time I've seen gstack used in a bug report to get a stack trace quickly. It's part of the gdb RPM on CentOS 6+. I have to try it vs gdb and pstack one day and decide what is the easiest and most efficient way to get backtraces of all threads in production...
    • Bug #96637 - "Clone fails on just upgraded server from 5.7". I had not used MySQL 8 famous clone plugin yet in practice, but I already know that it has bugs. This bug was reported by Satya Bodapati, who also suggested a patch.
    • Bug #96644 - "Set read_only on a master waiting for semi-sync ACK blocked on global read lock". Yet another problem (documented limitation) report from Przemyslaw Malkowski. Not sure why it was not verified on MySQL 8.0. Without a workaround to set the master to read only, it is unsafe to use long rpl_semi_sync_master_timeout values, as we may end up with that much downtime.
    • Bug #96677 - ""SELECT ... INTO var_name FOR UPDATE" not working in MySQL 8". This regression bug was reported by Vinodh Krish. Some analysis and patch were later suggested by Zsolt Parragi.
    • Bug #96690 - "sql_require_primary_key should not apply to temporary tables". This bug was also reported by Przemyslaw Malkowski from Percona. It ended up as a verified feature request, but not everyone in the community is happy with this. Let me quote:
      "[30 Aug 8:08] Jean-François Gagné
      Could we know what was the original severity of this bug as reported by Przemyslaw ? This is now hidden as it has been reclassified as S4 (Feature Request).

      From my point of view, this is actually a bug, not a feature request and it should be classified as S2. A perfectly working application would break for no reason when a temporary table does not have a Primary Key, so this is actually a big hurdle for using sql_require_primary_key, hence serious bug in the implementation of this otherwise very nice and useful feature.
      "
    That's all about the bugs I subscribed to in summer.
    Winter is coming, so why not remember nice warm sunny days and the interesting MySQL bugs reported back then.
    To summarize:
    1. We still see some strange "games" played during bug processing and a trend to decrease the severity of reports. I think this is a waste of time for both Oracle engineers and community bug reporters.
    2. I am still not sure if Oracle's QA does not use ASan or just ignores problems reported for MTR test cases. Anyway, Percona engineers do this for them, and report related bugs :)
    3. dbdeployer and sysbench are really popular among MySQL bug reporters recently!
    4. Importing of InnoDB tablespaces is broken in MySQL 8.0.17+ at least.
    5. There are many interesting MySQL bugs reported during the last 3 months, so I expect more posts in this series soon.

    by Valerii Kravchuk (noreply@blogger.com) at November 30, 2019 04:53 PM

    Oli Sennhauser

    Migration from MySQL 5.7 to MariaDB 10.4

    Up to version 5.5, MariaDB and MySQL could be considered "the same" database. The official wording at those times was "drop-in replacement". But now, a few years later, times and features have changed, and the official wording has also slightly changed to just "compatible".
    FromDual recommends that you consider MariaDB 10.3 and MySQL 8.0 as completely different database products (with some common roots) nowadays. Thus you should work and act accordingly.

    Because more and more FromDual customers are considering a migration from MySQL to MariaDB, we tested some migration paths to find the pitfalls. One upgrade of some test schemas led to the following warnings:

    # mysql_upgrade --user=root
    MariaDB upgrade detected
    Phase 1/7: Checking and upgrading mysql database
    Processing databases
    mysql
    mysql.columns_priv                                 OK
    ...
    mysql.user                                         OK
    Phase 2/7: Installing used storage engines
    Checking for tables with unknown storage engine
    Phase 3/7: Fixing views from mysql
    sys.host_summary
    Error    : Table 'performance_schema.memory_summary_by_host_by_event_name' doesn't exist
    status   : Operation failed
    sys.host_summary_by_file_io
    Error    : Column count of mysql.proc is wrong. Expected 21, found 20. Created with MariaDB 50723, now running 100407. Please use mysql_upgrade to fix this error
    error    : Corrupt
    ...
    sys.x$host_summary
    Error    : Table 'performance_schema.memory_summary_by_host_by_event_name' doesn't exist
    Error    : View 'sys.x$host_summary' references invalid table(s) or column(s) or function(s) or definer/invoker of view lack rights to use them
    error    : Corrupt
    ...
    sys.x$waits_global_by_latency                      OK
    Phase 4/7: Running 'mysql_fix_privilege_tables'
    Phase 5/7: Fixing table and database names
    Phase 6/7: Checking and upgrading tables
    Processing databases
    staging
    staging.sales                                      OK
    staging.sugarcrm_contact_export
    Warning  : Row size too large (> 8126). Changing some columns to TEXT or BLOB or using ROW_FORMAT=DYNAMIC or ROW_FORMAT=COMPRESSED may help. In current row format, BLOB prefix of 768 bytes is stored inline.
    status   : OK
    Phase 7/7: Running 'FLUSH PRIVILEGES'
    OK
    

    If you run the mysql_upgrade utility a second time, all issues are gone...

    # mysql_upgrade --user=root --force
    

    Some hints for upgrading

    • Make a backup first before you start!
    • Dropping the MySQL sys Schema before the upgrade and installing the MariaDB sys Schema again afterwards reduces noise a bit and lets you have a working sys Schema again (a rough sequence is sketched after this list).
      The MariaDB sys Schema can be found on GitHub: FromDual / mariadb-sys .
    • It makes sense to read this document before you begin with the upgrade: MariaDB versus MySQL: Compatibility.
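    Putting the hints above together, a minimal sketch of the whole sequence could look like the one below. Package names and the sys Schema installation script depend on your distribution and on the mariadb-sys repository layout, so treat them as placeholders:

    $ mysqldump --all-databases --routines --events > full_backup.sql   # make a backup first!
    $ mysql --user=root --execute="DROP DATABASE sys"                   # drop the MySQL sys Schema
    # stop MySQL, install the MariaDB 10.4 packages, start MariaDB
    $ mysql_upgrade --user=root
    $ mysql_upgrade --user=root --force                                 # a 2nd run clears the remaining warnings
    # finally, install the MariaDB sys Schema from the FromDual/mariadb-sys repository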

    Literature


    by Shinguz at November 30, 2019 01:17 PM

    November 29, 2019

    Federico Razzoli

    The dangers of replication filters in MySQL

    MySQL supports replication filters and binlog filters. These features are powerful, but dangerous. Here you'll find out the risks, and how to mitigate them.

    by Federico Razzoli at November 29, 2019 11:43 AM

    SeveralNines

    Automating MongoDB with SaltStack

    Database deployment across multiple servers becomes more complex and time consuming over time as you add new resources or make changes. In addition, there is a likelihood of human errors that may lead to catastrophic outcomes whenever the system is configured manually.  

    A database deployment automation tool enables us to deploy a database across multiple servers, ranging from development to production environments. The results from an automated deployment are reliable, more efficient, and predictable, and it also provides current state information about your nodes, which can be used to plan the resources you will need to add to your servers. With a well-managed deployment, the productivity of both development and operations teams improves, enabling the business to develop faster and accomplish more; thanks to easy, frequent deployments, the overall software setup ends up better and functions more reliably for end users. 

    MongoDB can be deployed manually, but the task becomes more and more cumbersome when you have to configure a cluster of many members hosted on different servers. We therefore need an automation tool that can save us the stress. Some of the available tools include Puppet, Chef, Ansible, and SaltStack.

    The main benefits of deploying your MongoDB with any of these tools are:

    1. Time saving. Imagine having 50 nodes for your database and needing to update the MongoDB version on each. Going through the process manually will take you ages. However, with an automation tool, you just need to write some instructions and issue a command to do the rest of the update for you. Developers then have time to work on new features rather than fixing manual deployments.
    2. Reduced errors, hence customer satisfaction. Rolling out updates may introduce errors into a database system, especially if the configuration has to be done manually. With a tool like SaltStack, removing manual steps reduces human error, and frequent updates with new features address customer needs, keeping the organization competitive.
    3. Lower configuration cost. With a deployment tool, anyone can deploy, even yourself, since the process itself is much easier. This eliminates the need for experts to do the work and reduces errors.

    What is SaltStack

    SaltStack is an open-source remote execution tool and a configuration management system developed in Python. 

    The remote execution features are used to run commands on various machines in parallel with a flexible targeting system. If for example you have 3 server machines and you would like to install MongoDB for each, you can run the installation commands on these machines simultaneously from a master node. 

    In terms of configuration management, a client-server interface is established to easily and securely transform the infrastructure components into the desired state.

    SaltStack Architecture

    The basic setup model for SaltStack is client-server, where the server is referred to as the master and the clients as minions (slaves). The master, as the controlling system, issues commands or instructions that need to be executed by the clients/minions, which are the controlled systems.

    SaltStack Components

    The following are the components SaltStack is made of:

    1. Master: responsible for issuing instructions to the minions and bringing them to the desired state after execution.
    2. Minion: the controlled system, which needs to be transformed into some desired state.
    3. Salt Grains: static data or metadata about the minion, such as the model, serial number, memory capacity, and operating system. Grains are collected when the minion first connects to the master and can be used to target a certain group of minions by some attribute. For example, you can run a command stating: install MongoDB on all machines with a Windows operating system. 
    4. Execution Modules/instructions: ad hoc commands issued to one or more target minions and executed from the command line.
    5. Pillars: user-defined variables distributed among the minions. They are used for minion configuration, highly sensitive data, arbitrary data, and variables. Not all pillars are accessible to all minions; one can restrict which pillars belong to a certain group of minions.
    6. State files: this is the core of the Salt State System (SLS) and represents the state in which the system should be. It is the equivalent of an Ansible playbook, and is also written in YAML, i.e.

    #/srv/salt/mongodbInstall.sls   (file root)
    install_mongodb:                (task id)
      pkg.installed:                (state declaration)
        - name: mongodb             (name of package to install)
    7. Top file: Used to map a group of machines and define which state files should be applied to them, i.e.

    # /srv/salt/top.sls
    base:
      'minion1':
        - mongodb
    8. Salt Proxy: A feature that enables controlling devices that cannot run a standard salt-minion. These include network gear with an API running on a proprietary OS, devices with CPU and memory limitations, or devices that cannot run minions due to security reasons. A Junos proxy has to be used for discovery, control, remote execution and state management of these devices.

    SaltStack Installation

    We can use the pip command to install SaltStack as 

    $ pip install salt

    To confirm the installation, run the command $ salt --version and you should get something like salt 2019.2.2 (Fluorine).

    Before connecting to the master, the minion requires a minimum configuration of the master IP address and a minion id, which will be used by the master for its reference. These configurations can be done in the file /etc/salt/minion.
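    A minimal sketch of such a minion configuration; the master address 192.168.0.210 and the id minion1 are placeholders for your own values:

    # /etc/salt/minion
    master: 192.168.0.210   # IP address or hostname of the Salt master
    id: minion1             # identifier the master uses to reference this minion

    After editing the file, restart the salt-minion service so the new settings take effect.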

    We can then run the master in various modes, that is, as a daemon or in debug mode. For the daemon case you will have $ salt-master -d, and for debug mode, $ salt-master -l debug. You will need to accept the minion’s key before starting it by running $ salt-key -a nameOfMinion. To list the available keys, run $ salt-key -l.

    In the case of the minion, we can start it with $ salt-minion -l debug.

    For example, if we want to create a file on all the minions from the master, we can run the command

    $ salt '*' file.touch /tmp/salt_files/sample.text

    All nodes will have a new sample.text file in the salt_files folder. The * option is used to refer to all minions. To target, for example, all minions whose id starts with the string "minion", we can use a wildcard (glob) expression as below:

    $ salt 'minion*' file.touch /tmp/salt_files/sample.text

    To see the metadata collected for a given minion, run:

    $ salt 'minion1' grains.items

    Setting up MongoDB with SaltStack

    We can create a database called myAppdata with a setDatabase.sls state file with the contents below, and then apply it as shown after the listing:

    classes:
    
    - service.mongodb.server.cluster
    
    parameters:
    
       _param:
    
         mongodb_server_replica_set: myAppdata
    
         mongodb_myAppdata_password: myAppdataPasword
    
         mongodb_admin_password: cloudlab
    
         mongodb_shared_key: xxx
    
       mongodb:
    
         server:
    
           database:
    
             myAppdata:
    
               enabled: true
    
               password: ${_param:mongodb_myAppdata_password}
    
               users:
    
               -  name: myAppdata
    
                  password: ${_param:mongodb_myAppdata_password}
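
    Once the state files are mapped to the MongoDB minions in the top file, they can be applied from the master. A minimal sketch, assuming the minions are matched by the glob 'mongo*' (a placeholder for your own naming):

    $ salt 'mongo*' state.apply

    Running state.apply without arguments applies everything assigned to those minions in the top file; passing a specific SLS name (e.g. $ salt 'mongo*' state.apply setDatabase) limits the run to that single state file.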

    Starting a Single MongoDB Server 

    mongodb:
    
      server:
    
        enabled: true
    
        bind:
    
          address: 0.0.0.0
    
          port: 27017
    
        admin:
    
          username: admin
    
          password: myAppdataPasword
    
        database:
    
          myAppdata:
    
            enabled: true
    
            encoding: 'utf8'
    
            users:
    
            - name: 'username'
    
              password: 'password'

    Setting up a MongoDB Cluster with SaltStack

    mongodb:
    
      server:
    
        enabled: true
    
        logging:
    
          verbose: false
    
          logLevel: 1
    
          oplogLevel: 0
    
        admin:
    
          user: admin
    
          password: myAppdataPasword
    
        master: mongo01
    
        members:
    
          - host: 192.168.100.11
    
            priority: 2
    
          - host: 192.168.101.12
    
          - host: 192.168.48.13
    
        replica_set: default
    
        shared_key: myAppdataPasword

    Conclusion

    Like ClusterControl, SaltStack is an automation tool that can be used to ease deployment and operations tasks. With an automation tool, you get fewer errors, reduced configuration time, and more reliable results.

    by Onyancha Brian Henry at November 29, 2019 10:45 AM

    November 28, 2019

    SeveralNines

    How ClusterControl Performs Automatic Database Recovery and Failover

    ClusterControl is programmed with a number of recovery algorithms to automatically respond to different types of common failures affecting your database systems. It understands different types of database topologies and database-related process management to help you determine the best way to recover the cluster. In a way, ClusterControl improves your database availability.

    Some topology managers, like MHA, Orchestrator and mysqlfailover, only cover cluster recovery, and you have to handle node recovery by yourself. ClusterControl supports recovery at both the cluster and node level.

    Configuration Options

    There are two recovery components supported by ClusterControl, namely:

    • Cluster - Attempt to recover a cluster to an operational state
    • Node - Attempt to recover a node to an operational state

    These two components are the most important things for making sure service availability is as high as possible. If you already have a topology manager on top of ClusterControl, you can disable the automatic recovery feature and let the other topology manager handle it for you. You have all the possibilities with ClusterControl.

    The automatic recovery feature can be enabled and disabled with a simple ON/OFF toggle, and it works for cluster and node recovery. Green icons mean enabled and red icons mean disabled. The following screenshot shows where you can find it in the database cluster list:

    There are 3 ClusterControl parameters that can be used to control the recovery behaviour. All parameters default to true (set with a boolean integer, 0 or 1); a sample configuration snippet is shown after the list:

    • enable_autorecovery - Enable cluster and node recovery. This parameter is the superset of enable_cluster_recovery and enable_node_recovery. If it's set to 0, the subset parameters will be turned off.
    • enable_cluster_recovery - ClusterControl will perform cluster recovery if enabled.
    • enable_node_recovery - ClusterControl will perform node recovery if enabled.
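
    A minimal sketch of how these could look in the cmon configuration file; the path below follows the common ClusterControl convention of one file per cluster ID, so treat it as an assumption and adjust it to your installation:

    # /etc/cmon.d/cmon_1.cnf (assumed path, where 1 is the cluster ID)
    enable_autorecovery=1      # master switch for both recovery types
    enable_cluster_recovery=1  # allow ClusterControl to recover the cluster topology
    enable_node_recovery=1     # allow ClusterControl to recover individual nodes

    A cmon service restart is typically needed for configuration file changes to take effect.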

    Cluster recovery covers the attempt to bring up the entire cluster topology. For example, a master-slave replication setup must have at least one master alive at any given time, regardless of the number of available slaves. ClusterControl attempts to correct the topology at least once for replication clusters, but infinitely for multi-master setups like NDB Cluster and Galera Cluster.

    Node recovery covers node-level issues, like a node being stopped without ClusterControl's knowledge, e.g., via a system stop command from the SSH console, or being killed by the OOM killer.

    Node Recovery

    ClusterControl is able to recover a database node in case of intermittent failure by monitoring the process and connectivity to the database nodes. For the process, it works similarly to systemd, where it will make sure the MySQL service is started and running unless you intentionally stopped it via the ClusterControl UI.

    If the node comes back online, ClusterControl will establish a connection back to the database node and will perform the necessary actions. The following is what ClusterControl would do to recover a node:

    • It will wait for systemd/chkconfig/init to start up the monitored services/processes for 30 seconds
    • If the monitored services/processes are still down, ClusterControl will try to start the database service automatically.
    • If ClusterControl is unable to recover the monitored services/processes, an alarm will be raised.

    Note that if a database shutdown is initiated by user, ClusterControl will not attempt to recover the particular node. It expects the user to start it back via ClusterControl UI by going to Node -> Node Actions -> Start Node or use the OS command explicitly.

    The recovery includes all database-related services like ProxySQL, HAProxy, MaxScale, Keepalived, Prometheus exporters and garbd. Special attention goes to Prometheus exporters, where ClusterControl uses a program called "daemon" to daemonize the exporter process. ClusterControl will try to connect to the exporter's listening port for health check and verification. Thus, it's recommended to open the exporter ports from ClusterControl and the Prometheus server to make sure there are no false alarms during recovery.

    Cluster Recovery

    ClusterControl understands the database topology and follows best practices in performing the recovery. For a database cluster that comes with built-in fault tolerance like Galera Cluster, NDB Cluster and MongoDB Replica Set, the failover process will be performed automatically by the database server via quorum calculation, heartbeat and role switching (if any). ClusterControl monitors the process and makes the necessary adjustments to the visualization, like reflecting the changes under the Topology view and adjusting the monitoring and management components for the new role, e.g., a new primary node in a replica set.

    For database technologies that do not have built-in fault tolerance with automatic recovery like MySQL/MariaDB Replication and PostgreSQL/TimescaleDB Streaming Replication, ClusterControl will perform the recovery procedures by following the best-practices provided by the database vendor. If the recovery fails, user intervention is required, and of course you will get an alarm notification regarding this.

    In a mixed/hybrid topology, for example an asynchronous slave which is attached to a Galera Cluster or NDB Cluster, the node will be recovered by ClusterControl if cluster recovery is enabled.

    Cluster recovery does not apply to a standalone MySQL server. However, it's recommended to turn on both node and cluster recovery for this cluster type in the ClusterControl UI.

    MySQL/MariaDB Replication

    ClusterControl supports recovery of the following MySQL/MariaDB replication setup:

    • Master-slave with MySQL GTID
    • Master-slave with MariaDB GTID
    • Master-slave without GTID (both MySQL and MariaDB)
    • Master-master with MySQL GTID
    • Master-master with MariaDB GTID
    • Asynchronous slave attached to a Galera Cluster

    ClusterControl will respect the following parameters when performing cluster recovery:

    • enable_cluster_autorecovery
    • auto_manage_readonly
    • repl_password
    • repl_user
    • replication_auto_rebuild_slave
    • replication_check_binlog_filtration_bf_failover
    • replication_check_external_bf_failover
    • replication_failed_reslave_failover_script
    • replication_failover_blacklist
    • replication_failover_events
    • replication_failover_wait_to_apply_timeout
    • replication_failover_whitelist
    • replication_onfail_failover_script
    • replication_post_failover_script
    • replication_post_switchover_script
    • replication_post_unsuccessful_failover_script
    • replication_pre_failover_script
    • replication_pre_switchover_script
    • replication_skip_apply_missing_txs
    • replication_stop_on_error

    For more details on each of the parameters, refer to the documentation page.

    ClusterControl will obey the following rules when monitoring and managing a master-slave replication:

    • All nodes will be started with read_only=ON and super_read_only=ON (regardless of their role).
    • Only one master (read_only=OFF) is allowed to operate at any given time.
    • ClusterControl relies on the MySQL variable report_host to map the topology.
    • If there are two or more nodes that have read_only=OFF at a time, ClusterControl will automatically set read_only=ON on both masters, to protect them against accidental writes. User intervention is required to pick the actual master by disabling read-only. Go to Nodes -> Node Actions -> Disable Readonly.

    In case the active master goes down, ClusterControl will attempt to perform the master failover in the following order:

    1. After 3 seconds of master unreachability, ClusterControl will raise an alarm.
    2. Check the slave availability, at least one of the slaves must be reachable by ClusterControl.
    3. Pick the slave as a candidate to be a master.
    4. ClusterControl will calculate the probability of errant transactions if GTID is enabled. 
    5. If no errant transaction is detected, the chosen slave will be promoted as the new master.
    6. Create and grant replication user to be used by slaves.
    7. Change master for all slaves that were pointing to the old master to the newly promoted master.
    8. Start slave and enable read only.
    9. Flush logs on all nodes.
    10. If the slave promotion fails, ClusterControl will abort the recovery job. User intervention or a cmon service restart is required to trigger the recovery job again.
    11. When the old master is available again, it will be started as read-only and will not be part of the replication. User intervention is required.

    At the same time, the corresponding alarms will be raised.

    Check out Introduction to Failover for MySQL Replication - the 101 Blog and Automatic Failover of MySQL Replication - New in ClusterControl 1.4 to get further information on how to configure and manage MySQL replication failover with ClusterControl.

    PostgreSQL/TimescaleDB Streaming Replication

    ClusterControl supports recovery of the following PostgreSQL replication setup:

    ClusterControl will respect the following parameters when performing cluster recovery:

    • enable_cluster_autorecovery
    • repl_password
    • repl_user
    • replication_auto_rebuild_slave
    • replication_failover_whitelist
    • replication_failover_blacklist

    For more details on each of the parameters, refer to the documentation page.

    ClusterControl will obey the following rules for managing and monitoring a PostgreSQL streaming replication setup:

    • wal_level is set to "replica" (or "hot_standby" depending on the PostgreSQL version).
    • The variable archive_mode is set to ON on the master.
    • A recovery.conf file is set on the slave nodes, which turns the node into a hot standby with read-only enabled (a minimal sketch of such a file is shown below).
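
    A minimal recovery.conf sketch for a pre-PostgreSQL 12 hot standby; the host, user and password in primary_conninfo are placeholders, and ClusterControl generates its own version of this file, so this is only illustrative:

    # recovery.conf on the standby (PostgreSQL < 12)
    standby_mode = 'on'
    primary_conninfo = 'host=192.168.0.210 port=5432 user=repl_user password=secret'
    recovery_target_timeline = 'latest'

    From PostgreSQL 12 onwards these settings live in postgresql.conf together with a standby.signal file, but the idea is the same.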

    In case the active master goes down, ClusterControl will attempt to perform the cluster recovery in the following order:

    1. After 10 seconds of master unreachability, ClusterControl will raise an alarm.
    2. After 10 seconds of graceful waiting timeout, ClusterControl will initiate the master failover job.
    3. Sample the replayLocation and receiveLocation on all available nodes to determine the most advanced node.
    4. Promote the most advanced node as the new master.
    5. Stop slaves.
    6. Verify the synchronization state with pg_rewind.
    7. Restart slaves with the new master.
    8. If the slave promotion fails, ClusterControl will abort the recovery job. User intervention or a cmon service restart is required to trigger the recovery job again.
    9. When the old master is available again, it will be forced to shut down and will not be part of the replication. User intervention is required. See further down.

    When the old master comes back online, if the PostgreSQL service is running, ClusterControl will force a shutdown of the PostgreSQL service. This is to protect the server from accidental writes, since it would be started without a recovery file (recovery.conf), which means it would be writable. You should expect the following lines to appear in postgresql-{day}.log:

    2019-11-27 05:06:10.091 UTC [2392] LOG:  database system is ready to accept connections
    
    2019-11-27 05:06:27.696 UTC [2392] LOG:  received fast shutdown request
    
    2019-11-27 05:06:27.700 UTC [2392] LOG:  aborting any active transactions
    
    2019-11-27 05:06:27.703 UTC [2766] FATAL:  terminating connection due to administrator command
    
    2019-11-27 05:06:27.704 UTC [2758] FATAL:  terminating connection due to administrator command
    
    2019-11-27 05:06:27.709 UTC [2392] LOG:  background worker "logical replication launcher" (PID 2419) exited with exit code 1
    
    2019-11-27 05:06:27.709 UTC [2414] LOG:  shutting down
    
    2019-11-27 05:06:27.735 UTC [2392] LOG:  database system is shut down

    PostgreSQL was started after the server came back online around 05:06:10, but ClusterControl performed a fast shutdown 17 seconds after that, around 05:06:27. If this is not the behaviour you want, you can disable node recovery for this cluster momentarily.

    Check out Automatic Failover of Postgres Replication and Failover for PostgreSQL Replication 101 to get further information on how to configure and manage PostgreSQL replication failover with ClusterControl.

    Conclusion

    ClusterControl automatic recovery understands database cluster topology and is able to recover a down or degraded cluster to a fully operational cluster which will improve the database service uptime tremendously. Try ClusterControl now and achieve your nines in SLA and database availability. Don't know your nines? Check out this cool nines calculator.

    by ashraf at November 28, 2019 10:45 AM

    November 27, 2019

    SeveralNines

    How to Avoid PostgreSQL Cloud Vendor Lock-in

    Vendor lock-in is a well-known concept for database technologies. With cloud usage increasing, this lock-in has also expanded to include cloud providers. We can define vendor lock-in as a proprietary lock-in that makes a customer dependent on a vendor for their products or services. Sometimes this lock-in doesn’t mean that you can’t change the vendor/provider, but it could be an expensive or time-consuming task.

    PostgreSQL, an open source database technology, doesn’t have the vendor lock-in problem in itself, but if you’re running your systems in the cloud, it’s likely you’ll need to cope with that issue at some time.

    In this blog, we’ll share some tips about how to avoid PostgreSQL cloud lock-in and also look at how ClusterControl can help in avoiding it.

    Tip #1: Check for Cloud Provider Limitations or Restrictions

    Cloud providers generally offer a simple and friendly way (or even a tool) to migrate your data to the cloud. The problem is that when you want to leave them, it can be hard to find an easy way to migrate the data to another provider or to an on-prem setup. This task usually has a high cost (often based on the amount of traffic).

    To avoid this issue, you must always first check the cloud provider documentation and limitations to know the restrictions that may be inevitable when leaving.

    Tip #2: Pre-Plan for a Cloud Provider Exit

    The best recommendation that we can give you is don’t wait until the last minute to figure out how to leave your cloud provider. You should plan it long in advance so you know the best, fastest, and least expensive way to make your exit.

    Because this plan most likely depends on your specific business requirements, it will differ depending on whether you can schedule maintenance windows and whether the company will accept any downtime periods. Planning it beforehand, you will definitely avoid a headache at the end of the day.

    Tip #3: Avoid Using Any Exclusive Cloud Provider Products

    A cloud provider’s product will almost always run better than an open source product. This is due to the fact that it was designed and tested to run on the cloud provider’s infrastructure. The performance will often be considerably better than that of the open source alternative.

    If you need to migrate your databases to another provider, you’ll have the technology lock-in problem as the cloud provider product is only available in the current cloud provider environment. This means you won’t be able to migrate easily. You can probably find a way to do it by generating a dump file (or another backup method), but you'll probably have a long downtime period (depending on the amount of data and technologies that you want to use).

    If you are using Amazon RDS or Aurora, Azure SQL Database, or Google Cloud SQL, (to focus on the most currently used cloud providers) you should consider checking the alternatives to migrate it to an open source database. With this, we’re not saying that you should migrate it, but you should definitely have an option to do it if needed.

    Tip #4: Store Your Backups with Another Cloud Provider

    A good practice to decrease downtime, whether in the case of migration or for disaster recovery, is not only to store backups in the same place (for faster recovery), but also to store backups in a different cloud provider or even on-prem.

    By following this practice, when you need to restore or migrate your data, you only need to copy the data generated after the latest backup was taken. The amount of traffic and time will be considerably less than copying all the data without compression during the migration or failure event.

    Tip #5: Use a Multi-Cloud or Hybrid Model

    This is probably the best option if you want to avoid cloud lock-in. Storing the data in two or more places in real-time (or as close to real-time as you can get) allows you to migrate in a fast way and you can do it with the least downtime possible. If you have a PostgreSQL cluster in one cloud provider and you have a PostgreSQL standby node in another one, in case that you need to change your provider, you can just promote the standby node and send the traffic to this new primary PostgreSQL node. 

    A similar concept is applied to the hybrid model. You can keep your production cluster in the cloud, and then you can create a standby cluster or database node on-prem, which generates a hybrid (cloud/on-prem) topology, and in case of failure or migration necessities, you can promote the standby node without any cloud lock-in as you’re using your own environment.

    In this case, keep in mind that the cloud provider will probably charge you for the outbound traffic, so under heavy traffic, keeping this method working could generate an excessive cost for the company.

    How ClusterControl Can Help Avoid PostgreSQL Lock-in

    In order to avoid PostgreSQL lock-in, you can also use ClusterControl to deploy (or import), manage, and monitor your database clusters. This way you won’t depend on a specific technology or provider to keep your systems up and running.

    ClusterControl has a friendly and easy-to-use UI, so you don’t need to use a cloud provider management console to manage your databases; you just need to log in and you’ll have an overview of all your database clusters in the same system.

    It has three different versions (including a free community version). You can still use ClusterControl (without some paid features) even if your license has expired, and it won’t affect your database performance.

    You can deploy different open source database engines from the same system, and only SSH access and a privileged user are required to use it.

    ClusterControl can also help in managing your backup system. From here, you can schedule a new backup using different backup methods (depending on the database engine), and compress, encrypt, and verify your backups by restoring them on a different node. You can also store backups in multiple different locations at the same time (including the cloud).

    The multi-cloud or hybrid implementation is easily doable with ClusterControl by using the Cluster-to-Cluster Replication or the Add Replication Slave feature. You only need to follow a simple wizard to deploy a new database node or cluster in a different place. 

    Conclusion

    As data is probably the most important asset to the company, you’ll most probably want to keep it as controlled as possible. Cloud lock-in doesn’t help with this. If you’re in a cloud lock-in scenario, it means that you can’t manage your data as you wish, and that could be a problem.

    However, cloud lock-in is not always a problem. It could be that you’re running your entire system (databases, applications, etc.) in the same cloud provider using the provider’s products (Amazon RDS or Aurora, Azure SQL Database, or Google Cloud SQL), you’re not looking to migrate anything, and instead you’re taking advantage of all the benefits of the cloud provider. Avoiding cloud lock-in is not always a must, as it depends on each case.

    We hope you enjoyed our blog sharing the most common ways to avoid a PostgreSQL cloud lock-in and how ClusterControl can help.

    by Sebastian Insausti at November 27, 2019 08:21 PM

    MariaDB Foundation

    MariaDB Server’s continuous integration & testing available to community

    How MariaDB Server is tested
    MariaDB Foundation is committed to ensuring MariaDB Server has a thriving community of developers and contributors. A software project cannot be maintained without proper tests. […]

    The post MariaDB Server’s continuous integration & testing available to community appeared first on MariaDB.org.

    by Vicențiu Ciorbaru at November 27, 2019 06:13 AM

    November 26, 2019

    SeveralNines

    Comparing Percona XtraBackup to MySQL Enterprise Backup: Part One

    When it comes to backups and data archiving, IT departments are often under stress to meet stringent service level agreements as well as deliver more robust backup procedures that would minimize the downtime, speed up the backup process, cost less, and meet tight security requirements.

    There are multiple ways to take a backup of a MySQL database, but we can divide these methods into two groups - logical and physical.

    Logical Backups contain data that is exported using SQL commands and stored in a file. It can be, e.g., a set of SQL commands, that, when executed, will result in restoring the content of the database. With some modifications to the output file's syntax, you can store your backup in CSV files.

    Logical backups are easy to perform; with a single one-liner you can take a backup of a table, a database, or all MySQL databases in the instance, as the sketch below shows.
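
    A couple of hypothetical mysqldump one-liners; the user, database, table and file names are placeholders:

    $ mysqldump -u backup_user -p --single-transaction --all-databases > full_backup.sql
    $ mysqldump -u backup_user -p --single-transaction shop orders > shop_orders_backup.sql

    The first line dumps every database on the instance, the second only the orders table from the shop database; --single-transaction gives a consistent snapshot of InnoDB tables without locking them.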

    Unfortunately, logical backups have many limitations. They are usually slower than physical ones. This is due to the overhead needed to execute SQL commands to get the data out and then to execute another set of SQL commands to get the data back into the database. They are less flexible, unless you write complex backup workflows that include multiple steps. They don't work well in a parallel environment, provide less security, and so on and so forth.

    Physical Backups in MySQL World

    MySQL doesn't come with online physical backup in the community edition. You can either pay for the Enterprise version or use a third-party tool. The most popular third-party tool on the market is XtraBackup. These are the two we are going to compare in this blog article.

    Percona XtraBackup is the very popular, open-source, MySQL/MariaDB hot backup software that performs non-blocking backups for InnoDB and XtraDB databases. It falls into the physical backup category, which consists of exact copies of the MySQL data directory and files underneath it.

    One of the biggest advantages of XtraBackup is that it does not lock your database during the backup process. For large databases (100+ GB), it provides much better restoration time as compared to mysqldump. The restoration process involves preparing MySQL data from the backup files, before replacing or switching it with the current data directory on the target node.

    Percona XtraBackup works by remembering the log sequence number (LSN) when it starts and then copies away the data files to another location. Copying data takes time, and if the files are changing, they reflect the state of the database at different points in time. At the same time, XtraBackup runs a background process that keeps an eye on the transaction log (aka redo log) files, and copies changes from it. This has to be done continually because the transaction logs are written in a round-robin fashion, and can be reused after a while. XtraBackup needs the transaction log records for every change to the data files since it began execution.
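
    A minimal sketch of an XtraBackup run followed by the prepare phase; the target directory is a placeholder and, depending on your XtraBackup version and setup, you may need to add connection options such as --user and --password:

    $ xtrabackup --backup --target-dir=/data/backups/full
    $ xtrabackup --prepare --target-dir=/data/backups/full

    The --backup step copies the data files and the redo log changes, and the --prepare step applies those log records so the backup becomes consistent and ready to be restored.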

    By using this tool you can:

    • Create hot InnoDB backups, that complete quickly and reliably, without pausing your database or adding load to the server
    • Make incremental backups
    • Move tables between MySQL servers on-line
    • Create new MySQL replication slaves easily
    • Stream compressed MySQL backups to another server
    • Save on disk space and network bandwidth

    MySQL Enterprise Backup delivers hot, online, non-blocking backups on multiple platforms. It's not a free backup tool, but it offers a lot of features. The standard license cost is $5000 (but may vary depending on your agreement with Oracle).

    Backup Process Supported Platforms

    MySQL Enterprise

    It may run on Linux, Windows, Mac, and Solaris. What is essential is that it may also store backups to tape, which is usually a cheaper solution than writing to disks. Direct tape writes support integration with Veritas NetBackup, Tivoli Storage Manager, and EMC NetWorker.

    XtraBackup

    XtraBackup may run only on the Linux platform, which may undoubtedly be a show stopper for those running on Windows. A solution here may be to replicate to a slave running on Linux and run the backup from there.

    Backup Process Main Differences

    MySQL Enterprise Backup provides a rich set of backup and recovery features and functionality, including significant performance improvements over existing MySQL backup methods.

    Oracle shows Enterprise Backup to be up to 49x faster than mysqldump. That, of course, may vary depending on your data; however, there are many features to improve the backup process. Parallel backup is definitely one of the biggest differences between mysqldump and Enterprise Backup. It increases performance by multi-threaded processing. The most interesting feature, however, is compression.

    --compress

    Creates a backup in compressed format. For a regular backup, among all the storage engines supported by MySQL, only data files of the InnoDB format are compressed, and they bear the .ibz extension after the compression. Similarly, for a single-image backup, only data files of the InnoDB format inside the backup image are compressed. The binary log and relay log files are compressed and saved with the .bz extension when being included in a compressed backup.

    --compress-method=zlib, lz4 (default), lzma, punch-hole
    
    --compress-level=LEVEL(0-9)
    
    --include-tables=REGEXP

    MySQL Backups with ClusterControl

    ClusterControl allows you to schedule backups using XtraBackup and mysqldump. It can store the backup files locally on the node where the backup is taken, or the backup files can be streamed to the controller node and compressed on-the-fly. It does not support MySQL Enterprise Backup, however with the extended features of mysqldump and XtraBackup it may be a good option.

    ClusterControl is the all-inclusive open source database management system for users with mixed environments. It provides advanced backup management functionality for MySQL or MariaDB.

    ClusterControl Backup Repository

    With ClusterControl you can:

    • Create backup policies
    • Monitor backup status, executions, and servers without backups
    • Execute backups and restores (including a point in time recovery)
    • Control backup retention
    • Save backups in cloud storage
    • Validate backups (full test with the restore on the standalone server)
    • Encrypt backups
    • Compress backups
    • And many others
    ClusterControl Backup Recovery

    Conclusion

    As a DBA, you need to make sure that the databases are backed up regularly and that appropriate recovery procedures are in place and tested. Both Percona XtraBackup and MySQL Enterprise Backup provide DBAs with a high-performance, online backup solution with data compression and encryption technology to ensure your data is protected in the event of downtime or an outage.

    Backups should be planned according to the restoration requirements. Data loss can be full or partial. For instance, you do not always need to recover the whole data set. In some cases, you might just want to do a partial recovery by restoring missing tables or rows. With their rich feature sets, both solutions would be a great replacement for mysqldump, which is still a very popular method to do backups. Having mysqldump is also important for partial recovery, where corrupted databases can be corrected by analyzing the contents of the dump. Binary logs allow us to achieve point-in-time recovery, e.g., up to right before the MySQL server went down.

    This is all for part one, in the next part we are going to test the performance of both solutions and run some real case backup and recovery scenarios. 

     

    by Bart Oles at November 26, 2019 07:30 PM

    MariaDB Foundation

    MariaDB Foundation Endorses the SaveDotOrg Campaign to Protect the .org Domain

    The MariaDB Foundation is proud to put its name behind the SaveDotOrg campaign. We urge the Internet Society (ISOC) to cancel the sale of the Public Interest Registry (PIR) to Ethos Capital. […]

    The post MariaDB Foundation Endorses the SaveDotOrg Campaign to Protect the .org Domain appeared first on MariaDB.org.

    by Ian Gilfillan at November 26, 2019 12:19 PM

    November 25, 2019

    SeveralNines

    Top Ten Reasons to Migrate from Oracle to PostgreSQL

    Oracle Relational Database Management System (RDBMS) has been widely used by large organizations and is considered by far to be the most advanced database technology available in the market. It's typically the RDBMS that other database products are most often compared against, serving as the de-facto standard of what a product should offer. It is ranked by db-engines.com as the #1 RDBMS available in the market today.

    PostgreSQL is ranked as the #4 RDBMS, but that doesn't mean there aren't advantages to migrating to PostgreSQL. PostgreSQL has been around since 1989 and was open-sourced in 1996. PostgreSQL won DBMS of the Year two years in a row, in 2017 and 2018. That just indicates there are no signs of it slowing down in attracting a large number of users and big organizations.

    One of the reasons why PostgreSQL has attracted a lot of attention is that people are looking for an alternative to Oracle so they can cut the organization's high costs and escape vendor lock-in.

    Moving from a working and productive Oracle database can be a daunting task. Concerns such as the company's TCO (Total Cost of Ownership) are one of the reasons why companies delay their decision whether or not to ditch Oracle.

    In this blog we will take a look at some of the main reasons why companies are choosing to leave Oracle and migrate to PostgreSQL.

    Reason One: It’s a True Open Source Project

    PostgreSQL is open-source and is released under the PostgreSQL License, a liberal Open Source license, similar to the BSD or MIT licenses. Acquiring the product and support requires no fee. 

    If you want to leverage the database software, it means that you can get all the available features of the PostgreSQL database for free. PostgreSQL has more than 30 years of maturity in the database world and has been open source since 1996. It has enjoyed decades of developers working to create extensions. That, in itself, makes developers, institutions, and organizations choose PostgreSQL for enterprise applications, powering leading business and mobile applications.

    Once again, organizations are waking up to the realization that open source database solutions like Postgres offer greater capacity, flexibility, and support that isn’t entirely dependent on any one company or developer. Postgres, like Linux before it, has been (and continues to be) engineered by dedicated users solving day-to-day business problems who choose to return their solutions to the community. Unlike a large developer like Oracle, which may have different motives of developing products that are profitable or support a narrow but lucrative market, the Postgres community is committed to developing the best possible tools for everyday relational database users.

    PostgreSQL often carries out those tasks without adding too much complexity. Its design is focused strictly on handling the database without having to waste resources like managing additional IT environments through added features. It's one of the things that consumers of this open-source software like when migrating from Oracle to PostgreSQL. Spending hours studying the complex technology of how an Oracle database functions, or how to optimize and tune it, might end up requiring its expensive support. This lures institutions or organizations to find an alternative that causes fewer headaches on cost and brings profit and productivity. Please check out our previous blog about how capable PostgreSQL is of matching Oracle's SQL syntax.

    Reason Two: No License and a Large Community

    For users of the Oracle RDBMS platform, it's difficult to find any type of community support that is free or without a hefty fee. Institutions, organizations, and developers often end up looking for alternative information online that can offer answers or solutions to their problems for free.

    When using Oracle, it's difficult to decide on a specific product or whether to go with Product Support because (typically) a lot of money is involved. You might try a specific product to test it, end up buying it, just to realize it can't help you out. With PostgreSQL, the community is free and full of experts who have extensive experience and are happy to help you out with your current problems.

    You can subscribe to the mailing lists right here at https://lists.postgresql.org/ to start reaching out to the community. Newbies and prodigies of PostgreSQL alike touch base here to communicate, showcase, and share their solutions, technology, bugs, new findings, or even their emerging software. You may even ask for help on IRC chat using irc.freenode.net and joining the #postgresql channel. You can also reach out to the community through Slack by joining https://postgres-slack.herokuapp.com/ or https://postgresteam.slack.com/. There are a lot of options to take and lots of open source organizations that can answer your questions.

    For more details and information about where to start, go check out here https://www.postgresql.org/community/.

    If you want to check out Professional Services for PostgreSQL, there are tons of options to choose from. Checking their website at https://www.postgresql.org/support/professional_support/northamerica/, you can find a large list of companies, and some of these come at a cheap price. Even here at Severalnines, we also offer support for Postgres, which is part of the ClusterControl license or a DBA Consultancy.

    Reason Three:  Wide Support for SQL Conformance

    PostgreSQL has always been keen to adapt and conform to SQL as a de facto standard for its language. The formal name of the SQL standard is ISO/IEC 9075 “Database Language SQL”. Each successive revised version of the standard replaces the previous one, so claims of conformance to earlier versions have no official merit.

    Unlike PostgreSQL, Oracle still has keywords and operators that do not conform to ANSI-standard SQL (Structured Query Language). For example, the OUTER JOIN (+) operator can cause confusion for DBAs who have little or no familiarity with Oracle (see the join example below). PostgreSQL follows the ANSI-SQL standard for JOIN syntax, and that makes it easy to move to or from other open-source RDBMS databases such as MySQL/Percona/MariaDB.
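
    A short illustration of the difference; the employees and departments tables and their columns are hypothetical:

    -- Oracle's proprietary outer join operator
    SELECT e.name, d.dept_name
      FROM employees e, departments d
     WHERE e.dept_id = d.id (+);

    -- ANSI-standard equivalent, accepted by PostgreSQL (and MySQL/MariaDB)
    SELECT e.name, d.dept_name
      FROM employees e
      LEFT JOIN departments d ON e.dept_id = d.id;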

    Another syntax that is very common in Oracle is for hierarchical queries. Oracle uses the non-standard START WITH..CONNECT BY syntax, while in SQL:1999, hierarchical queries are implemented by way of recursive common table expressions. For example, the queries below express the same hierarchical query in both syntaxes:

    Oracle

    SELECT
    
        restaurant_name, 
    
        city_name 
    
    FROM
    
        restaurants rs 
    
    START WITH rs.city_name = 'TOKYO'
    
    CONNECT BY PRIOR rs.restaurant_name = rs.city_name;

    PostgreSQL

    WITH RECURSIVE tmp AS (SELECT restaurant_name, city_name
    
                                     FROM restaurants
    
                                    WHERE city_name = 'TOKYO'
    
                                    UNION
    
                                   SELECT m.restaurant_name, m.city_name
    
                                     FROM restaurants m
    
                                     JOIN tmp ON tmp.restaurant_name = m.city_name)
    
                      SELECT restaurant_name, city_name FROM tmp;

    PostgreSQL has a very similar approach to the other top open-source RDBMS like MySQL/MariaDB.

    According to the PostgreSQL manual, PostgreSQL development aims for conformance with the latest official version of the standard where such conformance does not contradict traditional features or common sense. Many of the features required by the SQL standard are supported, though sometimes with slightly differing syntax or function. This is, in fact, what is great about PostgreSQL, as it is also supported and contributed to by different organizations, whether small or large. The beauty lies in its SQL language conformance with what the standard pushes forward.

    Further moves towards conformance can be expected over time.

    Reason Four: Query Parallelism

    To be fair, PostgreSQL's query parallelism is not as rich when compared to Oracle's parallel execution for SQL statements. Amongst Oracle's parallelism features are statement queuing with hints, the ability to set the degree of parallelism (DOP), setting a parallel degree policy, and adaptive parallelism.

    PostgreSQL has a simpler degree of parallelism based on the plans supported, but that does not mean Oracle edges out the open source PostgreSQL.

    PostgreSQL's parallelism has been constantly improving and is continuously enhanced by the community. When PostgreSQL 10 was released, it added more appeal to the public, especially the improvements in parallelism support for merge join, bitmap heap scan, index scan and index-only scan, gather merge, etc. Improvements also added parallelism statistics to pg_stat_activity.

    In PostgreSQL versions < 10, parallelism is disabled by default; you need to set the variable max_parallel_workers_per_gather to enable it.

    postgres=# \timing
    
    Timing is on.
    
    postgres=# explain analyze select * from imdb.movies where birthyear >= 1980 and birthyear <=2005;
    
                                                       QUERY PLAN                                                   
    
    ----------------------------------------------------------------------------------------------------------------
    
     Seq Scan on movies  (cost=0.00..215677.28 rows=41630 width=68) (actual time=0.013..522.520 rows=84473 loops=1)
    
       Filter: ((birthyear >= 1980) AND (birthyear <= 2005))
    
       Rows Removed by Filter: 8241546
    
     Planning time: 0.039 ms
    
     Execution time: 525.195 ms
    
    (5 rows)
    
    
    
    Time: 525.582 ms
    
    postgres=# \o /dev/null 
    
    postgres=#  select * from imdb.movies where birthyear >= 1980 and birthyear <=2005;
    
    Time: 596.947 ms

    The query plan reveals that the actual time is around 522.5 ms, while the real query execution time is around 596.95 ms. Whereas with parallelism enabled:

    postgres=# set max_parallel_workers_per_gather=2;
    
    Time: 0.247 ms
    
    postgres=# explain analyze select * from imdb.movies where birthyear >= 1980 and birthyear <=2005;
    
                                                              QUERY PLAN                                                           
    
    -------------------------------------------------------------------------------------------------------------------------------
    
     Gather  (cost=1000.00..147987.62 rows=41630 width=68) (actual time=0.172..339.258 rows=84473 loops=1)
    
       Workers Planned: 2
    
       Workers Launched: 2
    
       ->  Parallel Seq Scan on movies  (cost=0.00..142824.62 rows=17346 width=68) (actual time=0.029..264.980 rows=28158 loops=3)
    
             Filter: ((birthyear >= 1980) AND (birthyear <= 2005))
    
             Rows Removed by Filter: 2747182
    
     Planning time: 0.096 ms
    
     Execution time: 342.735 ms
    
    (8 rows)
    
    
    
    Time: 343.142 ms
    
    postgres=# \o /dev/null
    
    postgres=#  select * from imdb.movies where birthyear >= 1980 and birthyear <=2005;
    
    Time: 346.020 ms

    The query plan determines that the query needs to use parallelism and then uses a Gather node. Its estimated actual time is 339 ms with 2 workers, and 264 ms per worker before being aggregated by the Gather node. The real execution time of the query was 346 ms, which is very close to the estimated actual time from the query plan.

    This just illustrates how fast and beneficial parallelism is in PostgreSQL. Although PostgreSQL has its own limits on when parallelism can occur, and the query planner may determine that a serial plan is faster, this feature is not hugely different from Oracle's. PostgreSQL's parallelism is flexible and can be enabled and utilized correctly as long as your query matches the requirements for query parallelism.

    Reason Five: Advanced JSON Support and is Always Improving

    JSON support in PostgreSQL has always been on par with, or ahead of, the other open source RDBMS. Take a look at this external blog from LiveJournal, where PostgreSQL's JSON support is shown to be consistently more advanced when compared to the other RDBMS. PostgreSQL has a large number of JSON functions and features.

    The JSON data type was introduced in PostgreSQL 9.2. Since then, it has had a lot of significant enhancements, and among the major additions was the JSONB data type, which arrived in PostgreSQL 9.4. PostgreSQL offers two data types for storing JSON data: json and jsonb. jsonb is an advanced version of the JSON data type, which stores the JSON data in a binary format. This is the major enhancement that made a big difference to the way JSON data is searched and processed in PostgreSQL.

    Oracle has extensive support for JSON as well. PostgreSQL likewise has extensive support, along with functions that can be used for data retrieval, data formatting, or conditional operations that affect the output of the data or even the data stored in the database. Data stored with the jsonb data type has a greater advantage with the ability to use GIN (Generalized Inverted Index), which can be used to efficiently search for keys or key/value pairs occurring within a large number of jsonb documents (see the short sketch below).
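
    A minimal SQL sketch of this; the orders table, its columns, and the sample document are hypothetical:

    -- store documents in a jsonb column and index it with GIN
    CREATE TABLE orders (id serial PRIMARY KEY, details jsonb);
    CREATE INDEX idx_orders_details ON orders USING GIN (details);

    -- containment search (@>) that can use the GIN index
    SELECT id FROM orders WHERE details @> '{"status": "shipped"}';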

    PostgreSQL has additional extensions that are helpful to implement TRANSFORM FOR TYPE for the jsonb type to its supported procedure languages. These extensions are jsonb_plperl and jsonb_plperlu for PL/Perl. Whereas for PL/Python, these are jsonb_plpythonu, jsonb_plpython2u, and jsonb_plpython3u. For example, using jsonb values to map Perl arrays, you can use jsonb_plperl or jsonb_plperlu extensions.

    ArangoDB posted a benchmark comparing PostgreSQL's JSON performance with other JSON-supporting databases. Although it's an old blog, it still showcases how PostgreSQL's JSON performs compared to databases where JSON is a core feature of their database kernel. This just shows that PostgreSQL holds its own, even though JSON is a side feature for it.

    Reason Six: DBaaS Support By Major Cloud Vendors

    PostgreSQL is widely supported as a DBaaS. These services come from Amazon with RDS for PostgreSQL, Microsoft with its Azure Database for PostgreSQL, and Google with Cloud SQL for PostgreSQL.

    In comparison, Oracle is only available as a managed database on Amazon RDS for Oracle. The services offered by the major players start at an affordable price and are very flexible to set up in accordance with your needs. This helps institutions and organizations to set up accordingly and get relief from the large costs tied up in the Oracle platform.

    Reason Seven:  Better Handling of Massive Amounts of Data

    Traditional row-oriented relational databases are not designed primarily for analytical and data warehousing workloads. PostgreSQL is a row-oriented database, but it has the capability to store large amounts of data. PostgreSQL has the following limits for dealing with data store:

    • Maximum Database Size: Unlimited
    • Maximum Table Size: 32 TB
    • Maximum Row Size: 1.6 TB
    • Maximum Field Size: 1 GB
    • Maximum Rows per Table: Unlimited
    • Maximum Columns per Table: 250-1600, depending on column types
    • Maximum Indexes per Table: Unlimited

    The major benefit with PostgreSQL is that there are plugins that can be incorporated to handle large amounts of data. TimescaleDB and CitusData's cstore_fdw are among the plugins that you can incorporate for a time-series database, for storing large data from mobile or IoT applications, or for data analytics and data warehousing. In fact, ClusterControl offers support for TimescaleDB, which makes it simple and easy to deploy.

    If you want to use only the core features of PostgreSQL, you may store large amounts of data using jsonb. For example, you could store a large number of documents (PDF, Word, spreadsheets) using the jsonb data type. For geolocation applications and systems, you can use PostGIS.

    Reason Eight: Scalability, High-Availability, Redundancy/Geo-Redundancy, and Fault-Tolerant Solutions on the Cheap

    Oracle offers similar, but powerful, solutions such as Oracle Grid, Oracle Real Application Clusters (RAC), Oracle Clusterware, and Oracle Data Guard, to name a few. These technologies can add to your increasing costs and are unpredictably expensive to deploy and make stable. It's hard to ditch these solutions. Training and skills must be enhanced for the people involved in the deployment and implementation process.

    PostgreSQL has massive support and a lot of options to choose from. PostgreSQL includes streaming and logical replication built into the core package of the software. You can also set up synchronous replication for PostgreSQL to have a more highly available cluster, while making a standby node process your read queries (a minimal configuration sketch follows). For high availability, we suggest you read our blog Top PG Clustering High Availability (HA) Solutions for PostgreSQL, which covers a lot of great tools and technology to choose from.
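
    A minimal sketch of the primary-side settings for synchronous streaming replication; the standby name and the number of WAL senders are placeholders and need to match your standby's configuration:

    # postgresql.conf on the primary
    wal_level = replica
    max_wal_senders = 5
    synchronous_commit = on
    synchronous_standby_names = 'standby1'   # must match application_name set on the standby

    With synchronous_standby_names set, a commit on the primary only returns once the listed standby has confirmed the WAL, which is what makes the replication synchronous rather than the default asynchronous mode.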

    There are enterprise features as well that offers high-availability, monitoring, and backup solutions. ClusterControl is one of this technology and offers at an affordable price compared to Oracle Solutions.

    Reason Nine:  Support for Several Procedural Languages: PL/pgSQL, PL/Tcl, PL/Perl, and PL/Python.

    Since version 9.4, PostgreSQL has a great feature where you can define a new procedural language of your choice. Although not every programming language is supported, a number of languages are. Currently, the base distribution includes PL/pgSQL, PL/Tcl, PL/Perl, and PL/Python. The externally maintained languages are:

    • PL/Java (Java): https://tada.github.io/pljava/
    • PL/Lua (Lua): https://github.com/pllua/pllua
    • PL/R (R): https://github.com/postgres-plr/plr
    • PL/sh (Unix shell): https://github.com/petere/plsh
    • PL/v8 (JavaScript): https://github.com/plv8/plv8

     

    The great thing about this is that, unlike with Oracle, developers who have newly jumped to PostgreSQL can quickly provide business logic to their application systems without spending extra time learning PL/SQL. PostgreSQL makes the environment for developers easier and more efficient. This nature of PostgreSQL contributes to the reason why developers love PostgreSQL and start to shift away from enterprise platform solutions to the open source environment.

    Reason Ten:  Flexible Indexes for Large and Textual Data (GIN, GiST, SP-GiST, and BRIN)

    PostgreSQL has a huge advantage when it comes to the support of indexes which are beneficial to handling large data. Oracle has a lot of index types that are beneficial for handling large data sets as well, especially for full text indexing. But for PostgreSQL, these types of indexes are made to be flexible according to your purpose. For example, these types of indexes are applicable for large data:

    GIN - (Generalized Inverted Indexes) 

    This type of index is applicable for jsonb, hstore, range, and arrays data type columns. It is useful when you have data types that contain multiple values in a single column. According to the PostgreSQL docs, “GIN is designed for handling cases where the items to be indexed are composite values, and the queries to be handled by the index need to search for element values that appear within the composite items. For example, the items could be documents, and the queries could be searches for documents containing specific words.”

    GiST - (Generalized Search Tree)

    A height-balanced search tree that consists of node pages. The nodes consist of index rows. Each row of a leaf node (leaf row), in general, contains some predicate (boolean expression) and a reference to a table row (TID). GiST indexes are best for geometrical data types, for example when you want to see whether two polygons contain some point: in one case a specific point may be contained within a box, while another point only exists within one polygon. The most common data types where you want to leverage GiST indexes are geometry types and text when dealing with full-text search.

    In choosing which index type to use, GiST or GIN, consider these performance differences:

    • GIN index lookups are about three times faster than GiST
    • GIN indexes take about three times longer to build than GiST
    • GIN indexes are moderately slower to update than GiST indexes, but about 10 times slower if fast-update support was disabled
    • GIN indexes are two-to-three times larger than GiST indexes

    As a rule of thumb, GIN indexes are best for static data because lookups are faster. For dynamic data, GiST indexes are faster to update.

    SP-GiST - (Space Partitioned GiST) 

    For larger datasets with natural but uneven clustering. This type of index leverages space-partitioning trees. SP-GiST indexes are most useful when your data has a natural clustering element to it and is also not an equally balanced tree. A great example of this is phone numbers; in the US, for example, they use the following format:

    • 3 digits for area code
    • 3 digits for prefix (historically related to a phone carrier’s switch)
    • 4 digits for line number

    This means that you have some natural clustering around the first set of 3 digits and around the second set of 3 digits, then the numbers may fan out in a more even distribution. But, with phone numbers, some area codes have a much higher saturation than others. The result may be that the tree is very unbalanced. Because of that natural clustering up front and the unequal distribution of the data, data like phone numbers could make a good case for SP-GiST.

    BRIN - (Block Range Index) 

    For really large datasets that line up sequentially. A block range is a group of pages adjacent to each other, where summary information about all those pages is stored in the index. Block range indexes can focus on some similar use cases to SP-GiST in that they're best when there is some natural ordering to the data, and the data tends to be very large. Have a billion-record table, especially if it's time series data? BRIN may be able to help. If you're querying against a large set of data that is naturally grouped together, such as data for several zip codes (which then roll up to some city), BRIN helps to ensure that similar zip codes are located near each other on disk.

    When you have very large datasets that are ordered, such as dates or zip codes, BRIN indexes allow you to skip or exclude a lot of the unnecessary data very quickly (see the sketch below). BRIN indexes are additionally maintained as smaller indexes relative to the overall data size, making them a big win when you have a large dataset.
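
    A minimal sketch of a BRIN index on a time-series style table; the table and column names are hypothetical:

    -- rows are appended roughly in time order, which is the sweet spot for BRIN
    CREATE TABLE sensor_readings (reading_time timestamptz, sensor_id int, value numeric);
    CREATE INDEX idx_readings_time_brin ON sensor_readings USING BRIN (reading_time);

    -- a range scan like this can skip entire block ranges that fall outside the interval
    SELECT count(*) FROM sensor_readings
     WHERE reading_time >= '2019-11-01' AND reading_time < '2019-12-01';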

    Conclusion

    PostgreSQL has some major advantages when competing against Oracle's enterprise platform and business solutions. It's easy to hail PostgreSQL as your go-to choice of open source RDBMS, as it is nearly as powerful as Oracle. 

    Oracle is hard to beat (and that is a hard truth to accept), and it's also not easy to ditch the tech giant's enterprise platform. When a system delivers power and productive results, moving away from it can be a dilemma.

    Sometimes, though, a decision has to be made: continued over-investment in your platform can outweigh the cost of your other business layers and priorities, which can hold back progress. 

    PostgreSQL and its surrounding platform solutions can be the choice that helps you cut down costs and relieve budgetary pressure, all with moderate to small changes.

    by Paul Namuag at November 25, 2019 08:06 PM

    November 22, 2019

    SeveralNines

    An Overview of VACUUM Processing in PostgreSQL

    PostgreSQL does not use an in-place update mechanism, so by the way the DELETE and UPDATE commands are designed:

    • Whenever DELETE operations are performed, it marks the existing tuple as DEAD instead of physically removing those tuples.
    • Similarly, whenever UPDATE operation is performed, it marks the corresponding existing tuple as DEAD and inserts a new tuple (i.e. UPDATE operation = DELETE + INSERT).

    So each DELETE and UPDATE command results in one DEAD tuple, which is never going to be used (unless there are parallel transactions). These dead tuples lead to unnecessary extra space usage even though the table holds the same or fewer effective records. This is also called space bloating in PostgreSQL. Since PostgreSQL is widely used as an OLTP-style relational database system, with frequent INSERT, UPDATE and DELETE operations, there will be many DEAD tuples and hence corresponding consequences. So PostgreSQL requires a strong maintenance mechanism to deal with these DEAD tuples. VACUUM is the maintenance process which takes care of DEAD tuples along with a few more activities useful for optimizing the VACUUM operation. Let’s understand some terminology to be used later in this blog.

    Visibility Map

    As the name implies, it maintains visibility info about pages containing only tuples that are known to be visible to all active transactions. For each page, one bit is used. If the bit is set to 1, all tuples of the corresponding page are visible and the page can be skipped by VACUUM. If the bit is set to 0, some tuples on the page may not be visible to all transactions, so the page still has to be processed.

    The visibility map is maintained for each table and gets stored alongside the main relation, i.e. if the relation file node name is 12345, then the visibility map gets stored in the parallel file 12345_vm.
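
    If you want to inspect the visibility map yourself, the pg_visibility contrib extension exposes it per block; for example, on a hypothetical table named mytable:

    CREATE EXTENSION pg_visibility;
    SELECT blkno, all_visible, all_frozen FROM pg_visibility_map('mytable') LIMIT 5;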

    Free Space Map

    It maintains free space info containing details about the available space in the relation. This is also stored in a file parallel to the relation's main file, i.e. if the relation file node name is 12345, then the free space map gets stored in the parallel file 12345_fsm.

    Freeze Tuple

    PostgreSQL uses 4 bytes for storing the transaction id, which means roughly 2 billion transactions can be generated before it wraps around. Now suppose a tuple still carries an old transaction id, say 100. For a new transaction that uses a wrapped-around id, say 5, transaction id 100 appears to be in the future, so the new transaction cannot see data added or modified by transaction 100 even though it actually committed in the past. In order to avoid this, the special transaction id FrozenTransactionId (equal to 2) is assigned. This special transaction id is always considered to be in the past and is visible to all transactions.
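
    To see how far each database is from transaction ID wraparound, you can check the age of its oldest unfrozen transaction id; the closer this value gets to 2 billion, the more urgent freezing becomes:

    SELECT datname, age(datfrozenxid) FROM pg_database ORDER BY age(datfrozenxid) DESC;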

    VACUUM

    VACUUM's primary job is to reclaim storage space occupied by DEAD tuples. The reclaimed storage space is not given back to the operating system; rather, the pages are just defragmented, so the space becomes available for re-use by future data insertion within the same table. While a VACUUM operation is going on for a particular table, other READ/WRITE operations can still be done concurrently on the same table, as an exclusive lock is not taken. If a table name is not specified, VACUUM is performed on all tables of the database. The VACUUM operation performs the following series of operations while holding a ShareUpdateExclusive lock:

    • Scan all pages of all tables (or specified table) of the database to get all dead tuples.
    • Freeze old tuples if required.
    • Remove the index tuple pointing to the respective DEAD tuples.
    • Remove the DEAD tuples of a page corresponding to a specific table and reallocate the live tuples in the page.
    • Update Free space Map (FSM) and Visibility Map (VM).
    • Truncate the last page if possible (if there were DEAD tuples which got freed).
    • Update all corresponding system tables.

    As we can see from the above steps, VACUUM is a very costly operation, as it needs to process all pages of the relation. So it is very desirable to skip pages which do not need to be vacuumed. Since the Visibility Map (VM) records which pages contain only tuples visible to all transactions, those pages do not need to be vacuumed and can be safely skipped.

    Since VACUUM traverses all pages and all of their tuples anyway, it takes the opportunity to perform another important task: freezing the qualifying tuples.

    Full VACUUM

    As discussed in the previous section, even though VACUUM removes all DEAD tuples and defragments the pages for future use, it does not help in reducing the overall storage of the table, as the space is not actually released to the operating system. Suppose a table tbl1 whose total storage has reached 1.5GB, with 1GB of this occupied by dead tuples; after VACUUM approximately 1GB will be available again for further tuple insertion, but the total storage will still remain at 1.5GB.

    Full VACUUM solves this problem by actually freeing space and returning it back to the operating system. But this comes at a cost. Unlike VACUUM, FULL VACUUM does not allow parallel operations, as it takes an exclusive lock on the relation being FULL VACUUMed. Below are the steps:

    • Takes exclusive lock on the relation.
    • Create a parallel empty storage file.
    • Copy all live tuples from current storage to newly allocated storage.
    • Then free up the original storage.
    • Free up the lock.

    As is clear from these steps, the relation will end up using only the storage required for the remaining data.

    Auto VACUUM

    Instead of doing VACUUM manually, PostgreSQL supports a daemon which automatically triggers VACUUM periodically. Every time the autovacuum launcher wakes up (by default every 1 minute), it invokes multiple workers (as many as configured by autovacuum_max_workers).

    Auto-vacuum workers do VACUUM processing concurrently for their respective designated tables. Since VACUUM does not take any exclusive lock on tables, it has no (or minimal) impact on other database work.

    The configuration of auto-vacuum should be done based on the usage pattern of the database. It should not be too frequent (wasting worker wake-ups when there are no or very few dead tuples) or too much delayed (letting a lot of dead tuples accumulate and hence causing table bloat).
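
    The relevant knobs live in postgresql.conf; as a reference, the defaults look roughly like this (the threshold and scale factor can also be tuned per table to match the write pattern):

    autovacuum = on
    autovacuum_naptime = 1min              # how often the launcher wakes up
    autovacuum_max_workers = 3             # worker processes available per cycle
    autovacuum_vacuum_threshold = 50       # minimum number of dead tuples
    autovacuum_vacuum_scale_factor = 0.2   # plus 20% of the table size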

    VACUUM or Full VACUUM

    Ideally, a database application should be designed in a way that there is no need for FULL VACUUM. As explained above, FULL VACUUM recreates the storage and copies back the data, so if there are only a few dead tuples, the whole table still gets rewritten just to reclaim a little space. Also, since FULL VACUUM takes an exclusive lock on the table, it blocks all operations on the corresponding table. So doing FULL VACUUM can sometimes slow down the overall database.

    In summary, Full VACUUM should be avoided unless it is known that the majority of the storage space is occupied by dead tuples. The PostgreSQL extension pg_freespacemap can be used to get a fair hint about free space.

    Let’s see an example of the explained VACUUM process.

    First, let’s create a table demo1:

    postgres=# create table demo1(id int, id2 int);
    
    CREATE TABLE

    And insert some data there:

    postgres=# insert into demo1 values(generate_series(1,10000), generate_series(1,
    
    10000));
    
    INSERT 0 10000
    
    postgres=# SELECT count(*) as npages, round(100 * avg(avail)/8192 ,2) as average_freespace_ratio FROM pg_freespace('demo1');
    
     npages | average_freespace_ratio
    
    --------+-------------------------
    
      45 |                0.00
    
    (1 row)

    Now, let’s delete data:

    postgres=# delete from demo1 where id%2=0;
    
    DELETE 5000

    And run a manual vacuum:

    postgres=# vacuum demo1;
    
    VACUUM
    
    postgres=# SELECT count(*) as npages, round(100 * avg(avail)/8192 ,2) as average_freespace_ratio FROM pg_freespace('demo1');
    
     npages | average_freespace_ratio
    
    --------+-------------------------
    
      45 |               45.07
    
    (1 row)

    This freespace is now available to be reused by PostgreSQL, but if you want to release that space to the operating system, run:

    postgres=# vacuum full demo1;
    
    VACUUM
    
    postgres=# SELECT count(*) as npages, round(100 * avg(avail)/8192 ,2) as average_freespace_ratio FROM pg_freespace('demo1');
    
     npages | average_freespace_ratio
    
    --------+-------------------------
    
      23 |                0.00
    
    (1 row)

    Conclusion

    And this was a short example of how the VACUUM process works. Luckily, thanks to the auto vacuum process, most of the time and in a common PostgreSQL environment, you don’t need to think about this because it’s managed by the engine itself.

    by Kumar Rajeev Rastogi at November 22, 2019 04:33 PM

    November 21, 2019

    SeveralNines

    MySQL InnoDB Cluster 8.0 - A Complete Operation Walk-through: Part Two

    In the first part of this blog, we covered a deployment walkthrough of MySQL InnoDB Cluster with an example on how the applications can connect to the cluster via a dedicated read/write port.

    In this operation walkthrough, we are going to show examples of how to monitor, manage and scale the InnoDB Cluster as part of the ongoing cluster maintenance operations. We’ll use the same cluster that we deployed in the first part of the blog. The following diagram shows our architecture:

    We have a three-node MySQL Group Replication and one application server running with MySQL router. All servers are running on Ubuntu 18.04 Bionic.

    MySQL InnoDB Cluster Command Options

    Before we move further with some examples and explanations, it's good to know that you can get an explanation of each function of the cluster object by using the help() function, as shown below:

    $ mysqlsh
    MySQL|localhost:3306 ssl|JS> shell.connect("clusteradmin@db1:3306");
    MySQL|db1:3306 ssl|JS> cluster = dba.getCluster();
    <Cluster:my_innodb_cluster>
    MySQL|db1:3306 ssl|JS> cluster.help()

    The following list shows the available functions on MySQL Shell 8.0.18, for MySQL Community Server 8.0.18:

    • addInstance(instance[, options])- Adds an Instance to the cluster.
    • checkInstanceState(instance)- Verifies the instance gtid state in relation to the cluster.
    • describe()- Describe the structure of the cluster.
    • disconnect()- Disconnects all internal sessions used by the cluster object.
    • dissolve([options])- Deactivates replication and unregisters the ReplicaSets from the cluster.
    • forceQuorumUsingPartitionOf(instance[, password])- Restores the cluster from quorum loss.
    • getName()- Retrieves the name of the cluster.
    • help([member])- Provides help about this class and it's members
    • options([options])- Lists the cluster configuration options.
    • rejoinInstance(instance[, options])- Rejoins an Instance to the cluster.
    • removeInstance(instance[, options])- Removes an Instance from the cluster.
    • rescan([options])- Rescans the cluster.
    • resetRecoveryAccountsPassword(options)- Reset the password of the recovery accounts of the cluster.
    • setInstanceOption(instance, option, value)- Changes the value of a configuration option in a Cluster member.
    • setOption(option, value)- Changes the value of a configuration option for the whole cluster.
    • setPrimaryInstance(instance)- Elects a specific cluster member as the new primary.
    • status([options])- Describe the status of the cluster.
    • switchToMultiPrimaryMode()- Switches the cluster to multi-primary mode.
    • switchToSinglePrimaryMode([instance])- Switches the cluster to single-primary mode.

    We are going to look into most of the functions available to help us monitor, manage and scale the cluster.

    Monitoring MySQL InnoDB Cluster Operations

    Cluster Status

    To check the cluster status, firstly use the MySQL shell command line and then connect as clusteradmin@{one-of-the-db-nodes}:

    $ mysqlsh
    MySQL|localhost:3306 ssl|JS> shell.connect("clusteradmin@db1:3306");

    Then, create an object called "cluster" and declare it as "dba" global object which provides access to InnoDB cluster administration functions using the AdminAPI (check out MySQL Shell API docs):

    MySQL|db1:3306 ssl|JS> cluster = dba.getCluster();
    <Cluster:my_innodb_cluster>

    Then, we can use the object name to call the API functions for "dba" object:

    MySQL|db1:3306 ssl|JS> cluster.status()
    {
        "clusterName": "my_innodb_cluster",
        "defaultReplicaSet": {
            "name": "default",
            "primary": "db1:3306",
            "ssl": "REQUIRED",
            "status": "OK",
            "statusText": "Cluster is ONLINE and can tolerate up to ONE failure.",
            "topology": {
                "db1:3306": {
                    "address": "db1:3306",
                    "mode": "R/W",
                    "readReplicas": {},
                    "replicationLag": null,
                    "role": "HA",
                    "status": "ONLINE",
                    "version": "8.0.18"
                },
                "db2:3306": {
                    "address": "db2:3306",
                    "mode": "R/O",
                    "readReplicas": {},
                    "replicationLag": "00:00:09.061918",
                    "role": "HA",
                    "status": "ONLINE",
                    "version": "8.0.18"
                },
                "db3:3306": {
                    "address": "db3:3306",
                    "mode": "R/O",
                    "readReplicas": {},
                    "replicationLag": "00:00:09.447804",
                    "role": "HA",
                    "status": "ONLINE",
                    "version": "8.0.18"
                }
            },
            "topologyMode": "Single-Primary"
        },
        "groupInformationSourceMember": "db1:3306"
    }

    The output is pretty long, but we can filter it by using the map structure. For example, if we would like to view the replication lag for db3 only, we could do the following:

    MySQL|db1:3306 ssl|JS> cluster.status().defaultReplicaSet.topology["db3:3306"].replicationLag
    00:00:09.447804

    Note that replication lag is something that will happen in group replication, depending on the write intensity of the primary member in the replica set and the group_replication_flow_control_* variables. We are not going to cover this topic in detail here. Check out this blog post to understand more about group replication performance and flow control.
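
    If you want a quick look at the current flow control settings on any member, you can query them directly, for example:

    mysql> SHOW GLOBAL VARIABLES LIKE 'group_replication_flow_control%';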

    Another similar function is describe(), but this one is a bit simpler. It describes the structure of the cluster including all its information, ReplicaSets and Instances:

    MySQL|db1:3306 ssl|JS> cluster.describe()
    {
        "clusterName": "my_innodb_cluster",
        "defaultReplicaSet": {
            "name": "default",
            "topology": [
                {
                    "address": "db1:3306",
                    "label": "db1:3306",
                    "role": "HA"
                },
                {
                    "address": "db2:3306",
                    "label": "db2:3306",
                    "role": "HA"
                },
                {
                    "address": "db3:3306",
                    "label": "db3:3306",
                    "role": "HA"
                }
            ],
            "topologyMode": "Single-Primary"
        }
    }

    Similarly, we can filter the JSON output using map structure:

    MySQL|db1:3306 ssl|JS> cluster.describe().defaultReplicaSet.topologyMode
    Single-Primary

    When the primary node goes down (in this case, db1), the output returns the following:

    MySQL|db1:3306 ssl|JS> cluster.status()
    {
        "clusterName": "my_innodb_cluster",
        "defaultReplicaSet": {
            "name": "default",
            "primary": "db2:3306",
            "ssl": "REQUIRED",
            "status": "OK_NO_TOLERANCE",
            "statusText": "Cluster is NOT tolerant to any failures. 1 member is not active",
            "topology": {
                "db1:3306": {
                    "address": "db1:3306",
                    "mode": "n/a",
                    "readReplicas": {},
                    "role": "HA",
                    "shellConnectError": "MySQL Error 2013 (HY000): Lost connection to MySQL server at 'reading initial communication packet', system error: 104",
                    "status": "(MISSING)"
                },
                "db2:3306": {
                    "address": "db2:3306",
                    "mode": "R/W",
                    "readReplicas": {},
                    "replicationLag": null,
                    "role": "HA",
                    "status": "ONLINE",
                    "version": "8.0.18"
                },
                "db3:3306": {
                    "address": "db3:3306",
                    "mode": "R/O",
                    "readReplicas": {},
                    "replicationLag": null,
                    "role": "HA",
                    "status": "ONLINE",
                    "version": "8.0.18"
                }
            },
            "topologyMode": "Single-Primary"
        },
        "groupInformationSourceMember": "db2:3306"
    }

    Pay attention to the status OK_NO_TOLERANCE, where the cluster is still up and running but it can't tolerate any more failures after one out of three nodes became unavailable. The primary role has been taken over by db2 automatically, and the database connections from the application will be rerouted to the correct node if they connect through MySQL Router. Once db1 comes back online, we should see the following status:

    MySQL|db1:3306 ssl|JS> cluster.status()
    {
        "clusterName": "my_innodb_cluster",
        "defaultReplicaSet": {
            "name": "default",
            "primary": "db2:3306",
            "ssl": "REQUIRED",
            "status": "OK",
            "statusText": "Cluster is ONLINE and can tolerate up to ONE failure.",
            "topology": {
                "db1:3306": {
                    "address": "db1:3306",
                    "mode": "R/O",
                    "readReplicas": {},
                    "replicationLag": null,
                    "role": "HA",
                    "status": "ONLINE",
                    "version": "8.0.18"
                },
                "db2:3306": {
                    "address": "db2:3306",
                    "mode": "R/W",
                    "readReplicas": {},
                    "replicationLag": null,
                    "role": "HA",
                    "status": "ONLINE",
                    "version": "8.0.18"
                },
                "db3:3306": {
                    "address": "db3:3306",
                    "mode": "R/O",
                    "readReplicas": {},
                    "replicationLag": null,
                    "role": "HA",
                    "status": "ONLINE",
                    "version": "8.0.18"
                }
            },
            "topologyMode": "Single-Primary"
        },
        "groupInformationSourceMember": "db2:3306"
    }

    It shows that db1 is now available but serves as a secondary with read-only enabled. The primary role is still assigned to db2 until something goes wrong with that node, at which point it will automatically fail over to the next available node.

    Check Instance State

    We can check the state of a MySQL node before planning to add it into the cluster by using the checkInstanceState() function. It compares the GTIDs executed on the instance with the GTIDs executed/purged on the cluster to determine whether the instance is valid for the cluster.

    The following shows the instance state of db3 when it was in standalone mode, before it was part of the cluster:

    MySQL|db1:3306 ssl|JS> cluster.checkInstanceState("db3:3306")
    Cluster.checkInstanceState: The instance 'db3:3306' is a standalone instance but is part of a different InnoDB Cluster (metadata exists, instance does not belong to that metadata, and Group Replication is not active).

    If the node is already part of the cluster, you should get the following:

    MySQL|db1:3306 ssl|JS> cluster.checkInstanceState("db3:3306")
    Cluster.checkInstanceState: The instance 'db3:3306' already belongs to the ReplicaSet: 'default'.

    Monitor Any "Queryable" State

    With MySQL Shell, we can now use the built-in \show and \watch commands to monitor any administrative query in real-time. For example, we can get the real-time value of threads connected by using:

    MySQL|db1:3306 ssl|JS> \show query SHOW STATUS LIKE '%thread%';

    Or get the current MySQL processlist:

    MySQL|db1:3306 ssl|JS> \show query SHOW FULL PROCESSLIST

    We can then use the \watch command to run a report in the same way as the \show command, but it refreshes the results at regular intervals until you cancel the command using Ctrl + C, as shown in the following examples:

    MySQL|db1:3306 ssl|JS> \watch query SHOW STATUS LIKE '%thread%';
    MySQL|db1:3306 ssl|JS> \watch query --interval=1 SHOW FULL PROCESSLIST

    The default refresh interval is 2 seconds. You can change the value by using the --interval flag and specifying a value from 0.1 up to 86400.

    MySQL InnoDB Cluster Management Operations

    Primary Switchover

    The primary instance is the node that can be considered the leader of the replication group, with the ability to perform read and write operations. Only one primary instance per cluster is allowed in single-primary topology mode. This topology is also known as a replica set and is the recommended topology mode for Group Replication, with protection against locking conflicts.

    To perform primary instance switchover, login to one of the database nodes as the clusteradmin user and specify the database node that you want to promote by using the setPrimaryInstance() function:

    MySQL|db1:3306 ssl|JS> shell.connect("clusteradmin@db1:3306");
    MySQL|db1:3306 ssl|JS> cluster.setPrimaryInstance("db1:3306");
    Setting instance 'db1:3306' as the primary instance of cluster 'my_innodb_cluster'...
    
    Instance 'db2:3306' was switched from PRIMARY to SECONDARY.
    Instance 'db3:3306' remains SECONDARY.
    Instance 'db1:3306' was switched from SECONDARY to PRIMARY.
    
    WARNING: The cluster internal session is not the primary member anymore. For cluster management operations please obtain a fresh cluster handle using <Dba>.getCluster().
    
    The instance 'db1:3306' was successfully elected as primary.

    We just promoted db1 as the new primary component, replacing db2 while db3 remains as the secondary node.

    Shutting Down the Cluster

    The best way to shut down the cluster gracefully is to stop the MySQL Router service first (if it's running) on the application server:

    $ myrouter/stop.sh

    The above step provides cluster protection against accidental writes by the applications. Then shut down one database node at a time using the standard MySQL stop command, or perform a system shutdown as you wish:

    $ systemctl stop mysql

    Starting the Cluster After a Shutdown

    If your cluster suffers from a complete outage, or you want to start the cluster after a clean shutdown, you can ensure it is reconfigured correctly by using the dba.rebootClusterFromCompleteOutage() function. It simply brings a cluster back ONLINE when all members are OFFLINE. In the event that a cluster has completely stopped, the instances must be started and only then can the cluster be started.

    Thus, ensure all MySQL servers are started and running. On every database node, see if the mysqld process is running:

    $ ps -ef | grep -i mysql

    Then, pick one database server to be the primary node and connect to it via MySQL shell:

    MySQL|JS> shell.connect("clusteradmin@db1:3306");

    Run the following command from that host to start them up:

    MySQL|db1:3306 ssl|JS> cluster = dba.rebootClusterFromCompleteOutage()

    You will be presented with the following questions:

    After the above completes, you can verify the cluster status:

    MySQL|db1:3306 ssl|JS> cluster.status()

    At this point, db1 is the primary node and the writer. The rest will be the secondary members. If you would like to start the cluster with db2 or db3 as the primary, you could use the shell.connect() function to connect to the corresponding node and perform the rebootClusterFromCompleteOutage() from that particular node.

    You can then start the MySQL Router service (if it's not started) and let the application connect to the cluster again.
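
    Assuming the router was bootstrapped into a self-contained directory as in the first part of this blog, the generated start script can be used, for example:

    (app-server)$ myrouter/start.sh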

    Setting Member and Cluster Options

    To get the cluster-wide options, simply run:

    MySQL|db1:3306 ssl|JS> cluster.options()

    The above will list out the global options for the replica set and also individual options per member in the cluster. To change a cluster-wide option, use the setOption() function, which changes an InnoDB Cluster configuration option on all members of the cluster. The supported options are:

    • clusterName: string value to define the cluster name.
    • exitStateAction: string value indicating the group replication exit state action.
    • memberWeight: integer value with a percentage weight for automatic primary election on failover.
    • failoverConsistency: string value indicating the consistency guarantees that the cluster provides.
    • consistency: string value indicating the consistency guarantees that the cluster provides.
    • expelTimeout: integer value to define the time period in seconds that cluster members should wait for a non-responding member before evicting it from the cluster.
    • autoRejoinTries: integer value to define the number of times an instance will attempt to rejoin the cluster after being expelled.
    • disableClone: boolean value used to disable the clone usage on the cluster.

    Similar to other functions, the output can be filtered using the map structure. The following command will only list out the options for db2:

    MySQL|db1:3306 ssl|JS> cluster.options().defaultReplicaSet.topology["db2:3306"]

    You can also get the above list by using the help() function:

    MySQL|db1:3306 ssl|JS> cluster.help("setOption")

    The following command shows an example to set an option called memberWeight to 60 (from 50) on all members:

    MySQL|db1:3306 ssl|JS> cluster.setOption("memberWeight", 60)
    Setting the value of 'memberWeight' to '60' in all ReplicaSet members ...
    
    Successfully set the value of 'memberWeight' to '60' in the 'default' ReplicaSet.

    We can also perform configuration management automatically via MySQL Shell by using the setInstanceOption() function and passing the database host, the option name and the value accordingly:

    MySQL|db1:3306 ssl|JS> cluster = dba.getCluster()
    MySQL|db1:3306 ssl|JS> cluster.setInstanceOption("db1:3306", "memberWeight", 90)

    The supported options are:

    • exitStateAction: string value indicating the group replication exit state action.
    • memberWeight: integer value with a percentage weight for automatic primary election on failover.
    • autoRejoinTries: integer value to define the number of times an instance will attempt to rejoin the cluster after being expelled.
    • label: a string identifier of the instance.

    Switching to Multi-Primary/Single-Primary Mode

    By default, InnoDB Cluster is configured in single-primary mode, with only one member capable of performing reads and writes at any given time. This is the safest and recommended way to run the cluster and is suitable for most workloads. 

    However, if the application logic can handle distributed writes, it's probably a good idea to switch to multi-primary mode, where all members in the cluster are able to process reads and writes at the same time. To switch from single-primary to multi-primary mode, simply use the switchToMultiPrimaryMode() function:

    MySQL|db1:3306 ssl|JS> cluster.switchToMultiPrimaryMode()
    Switching cluster 'my_innodb_cluster' to Multi-Primary mode...
    
    Instance 'db2:3306' was switched from SECONDARY to PRIMARY.
    Instance 'db3:3306' was switched from SECONDARY to PRIMARY.
    Instance 'db1:3306' remains PRIMARY.
    
    The cluster successfully switched to Multi-Primary mode.

    Verify with:

    MySQL|db1:3306 ssl|JS> cluster.status()
    {
        "clusterName": "my_innodb_cluster",
        "defaultReplicaSet": {
            "name": "default",
            "ssl": "REQUIRED",
            "status": "OK",
            "statusText": "Cluster is ONLINE and can tolerate up to ONE failure.",
            "topology": {
                "db1:3306": {
                    "address": "db1:3306",
                    "mode": "R/W",
                    "readReplicas": {},
                    "replicationLag": null,
                    "role": "HA",
                    "status": "ONLINE",
                    "version": "8.0.18"
                },
                "db2:3306": {
                    "address": "db2:3306",
                    "mode": "R/W",
                    "readReplicas": {},
                    "replicationLag": null,
                    "role": "HA",
                    "status": "ONLINE",
                    "version": "8.0.18"
                },
                "db3:3306": {
                    "address": "db3:3306",
                    "mode": "R/W",
                    "readReplicas": {},
                    "replicationLag": null,
                    "role": "HA",
                    "status": "ONLINE",
                    "version": "8.0.18"
                }
            },
            "topologyMode": "Multi-Primary"
        },
        "groupInformationSourceMember": "db1:3306"
    }

    In multi-primary mode, all nodes are primary and able to process reads and writes. When sending a new connection via MySQL Router on single-writer port (6446), the connection will be sent to only one node, as in this example, db1:

    (app-server)$ for i in {1..3}; do mysql -usbtest -p -h192.168.10.40 -P6446 -e 'select @@hostname, @@read_only, @@super_read_only'; done
    
    +------------+-------------+-------------------+
    | @@hostname | @@read_only | @@super_read_only |
    +------------+-------------+-------------------+
    | db1        | 0           | 0                 |
    +------------+-------------+-------------------+
    
    +------------+-------------+-------------------+
    | @@hostname | @@read_only | @@super_read_only |
    +------------+-------------+-------------------+
    | db1        | 0           | 0                 |
    +------------+-------------+-------------------+
    
    +------------+-------------+-------------------+
    | @@hostname | @@read_only | @@super_read_only |
    +------------+-------------+-------------------+
    | db1        | 0           | 0                 |
    +------------+-------------+-------------------+

    If the application connects to the multi-writer port (6447), the connection will be load balanced via a round-robin algorithm across all members:

    (app-server)$ for i in {1..3}; do mysql -usbtest -ppassword -h192.168.10.40 -P6447 -e 'select @@hostname, @@read_only, @@super_read_only'; done
    
    +------------+-------------+-------------------+
    | @@hostname | @@read_only | @@super_read_only |
    +------------+-------------+-------------------+
    | db2        | 0           | 0                 |
    +------------+-------------+-------------------+
    
    +------------+-------------+-------------------+
    | @@hostname | @@read_only | @@super_read_only |
    +------------+-------------+-------------------+
    | db3        | 0           | 0                 |
    +------------+-------------+-------------------+
    
    +------------+-------------+-------------------+
    | @@hostname | @@read_only | @@super_read_only |
    +------------+-------------+-------------------+
    | db1        | 0           | 0                 |
    +------------+-------------+-------------------+

    As you can see from the output above, all nodes are capable of processing reads and writes with read_only = OFF. You can distribute safe writes to all members by connecting to the multi-writer port (6447), and send the conflicting or heavy writes to the single-writer port (6446).

    To switch back to the single-primary mode, use the switchToSinglePrimaryMode() function and specify one member as the primary node. In this example, we chose db1:

    MySQL|db1:3306 ssl|JS> cluster.switchToSinglePrimaryMode("db1:3306");
    
    Switching cluster 'my_innodb_cluster' to Single-Primary mode...
    
    Instance 'db2:3306' was switched from PRIMARY to SECONDARY.
    Instance 'db3:3306' was switched from PRIMARY to SECONDARY.
    Instance 'db1:3306' remains PRIMARY.
    
    WARNING: Existing connections that expected a R/W connection must be disconnected, i.e. instances that became SECONDARY.
    
    The cluster successfully switched to Single-Primary mode.

    At this point, db1 is now the primary node configured with read-only disabled and the rest will be configured as secondary with read-only enabled.

    MySQL InnoDB Cluster Scaling Operations

    Scaling Up (Adding a New DB Node)

    When adding a new instance, a node has to be provisioned first before it's allowed to participate in the replication group. The provisioning process will be handled automatically by MySQL. Also, you can first check whether the instance is valid to join the cluster by using the checkInstanceState() function, as previously explained.

    To add a new DB node, use the addInstance() function and specify the host:

    MySQL|db1:3306 ssl|JS> cluster.addInstance("db3:3306")

    The following is what you would get when adding a new instance:

    Verify the new cluster size with:

    MySQL|db1:3306 ssl|JS> cluster.status() //or cluster.describe()

    MySQL Router will automatically include the added node, db3 into the load balancing set.

    Scaling Down (Removing a Node)

    To remove a node, connect to any of the DB nodes except the one that we are going to remove and use the removeInstance() function with the database instance name:

    MySQL|db1:3306 ssl|JS> shell.connect("clusteradmin@db1:3306");
    MySQL|db1:3306 ssl|JS> cluster = dba.getCluster()
    MySQL|db1:3306 ssl|JS> cluster.removeInstance("db3:3306")

    The following is what you would get when removing an instance:

    Verify the new cluster size with:

    MySQL|db1:3306 ssl|JS> cluster.status() //or cluster.describe()

    MySQL Router will automatically exclude the removed node, db3 from the load balancing set.

    Adding a New Replication Slave

    We can scale out the InnoDB Cluster with an asynchronous replication slave that replicates from any of the cluster nodes. A slave is loosely coupled to the cluster and will be able to handle a heavy load without affecting the performance of the cluster. The slave can also be a live copy of the database for disaster recovery purposes. In multi-primary mode, you can use the slave as a dedicated MySQL read-only processor to scale out the read workload, perform analytics operations, or serve as a dedicated backup server.

    On the slave server, download the latest APT config package, install it (choose MySQL 8.0 in the configuration wizard), install the APT key, update the repository list and install the MySQL server and shell:

    $ wget https://repo.mysql.com/apt/ubuntu/pool/mysql-apt-config/m/mysql-apt-config/mysql-apt-config_0.8.14-1_all.deb
    $ dpkg -i mysql-apt-config_0.8.14-1_all.deb
    $ apt-key adv --recv-keys --keyserver ha.pool.sks-keyservers.net 5072E1F5
    $ apt-get update
    $ apt-get -y install mysql-server mysql-shell

    Modify the MySQL configuration file to prepare the server as a replication slave. Open the configuration file via a text editor:

    $ vim /etc/mysql/mysql.conf.d/mysqld.cnf

    And append the following lines:

    server-id = 1044 # must be unique across all nodes
    gtid-mode = ON
    enforce-gtid-consistency = ON
    log-slave-updates = OFF
    read-only = ON
    super-read-only = ON
    expire-logs-days = 7

    Restart MySQL server on the slave to apply the changes:

    $ systemctl restart mysql

    On one of the InnoDB Cluster servers (we chose db3), create a replication slave user, followed by a full MySQL dump:

    $ mysql -uroot -p
    mysql> CREATE USER 'repl_user'@'192.168.0.44' IDENTIFIED BY 'password';
    mysql> GRANT REPLICATION SLAVE ON *.* TO 'repl_user'@'192.168.0.44';
    mysql> exit
    $ mysqldump -uroot -p --single-transaction --master-data=1 --all-databases --triggers --routines --events > dump.sql

    Transfer the dump file from db3 to the slave:

    $ scp dump.sql root@slave:~

    And perform the restoration on the slave:

    $ mysql -uroot -p < dump.sql

    With master-data=1, our MySQL dump file will automatically configure the GTID executed and purged values. We can verify them with the following statement on the slave server after the restoration:

    $ mysql -uroot -p
    mysql> show global variables like '%gtid_%';
    +----------------------------------+----------------------------------------------+
    | Variable_name                    | Value                                        |
    +----------------------------------+----------------------------------------------+
    | binlog_gtid_simple_recovery      | ON                                           |
    | enforce_gtid_consistency         | ON                                           |
    | gtid_executed                    | d4790339-0694-11ea-8fd5-02f67042125d:1-45886 |
    | gtid_executed_compression_period | 1000                                         |
    | gtid_mode                        | ON                                           |
    | gtid_owned                       |                                              |
    | gtid_purged                      | d4790339-0694-11ea-8fd5-02f67042125d:1-45886 |
    +----------------------------------+----------------------------------------------+

    Looks good. We can then configure the replication link and start the replication threads on the slave:

    mysql> CHANGE MASTER TO MASTER_HOST = '192.168.10.43', MASTER_USER = 'repl_user', MASTER_PASSWORD = 'password', MASTER_AUTO_POSITION = 1;
    mysql> START SLAVE;

    Verify the replication state and ensure the following status return 'Yes':

    mysql> show slave status\G
    ...
                 Slave_IO_Running: Yes
                Slave_SQL_Running: Yes
    ...

    At this point, our architecture is now looking like this:

     

    Common Issues with MySQL InnoDB Clusters

    Memory Exhaustion

    When using MySQL Shell with MySQL 8.0, we were constantly getting the following error when the instances were configured with 1GB of RAM:

    Can't create a new thread (errno 11); if you are not out of available memory, you can consult the manual for a possible OS-dependent bug (MySQL Error 1135)

    Upgrading each host to 2GB of RAM solved the problem. Apparently, MySQL 8.0 components require more RAM to operate efficiently.

    Lost Connection to MySQL Server

    In case the primary node goes down, you would probably see the "lost connection to MySQL server error" when trying to query something on the current session:

    MySQL|db1:3306 ssl|JS> cluster.status()
    Cluster.status: Lost connection to MySQL server during query (MySQL Error 2013)
    
    MySQL|db1:3306 ssl|JS> cluster.status()
    Cluster.status: MySQL server has gone away (MySQL Error 2006)

    The solution is to re-declare the object once more:

    MySQL|db1:3306 ssl|JS> cluster = dba.getCluster()
    <Cluster:my_innodb_cluster>
    MySQL|db1:3306 ssl|JS> cluster.status()

    At this point, it will connect to the newly promoted primary node to retrieve the cluster status.

    Node Eviction and Expulsion

    In the event that communication between nodes is interrupted, the problematic node will be evicted from the cluster without any delay, which is not good if you are running on an unstable network. This is what it looks like on db2 (the problematic node):

    2019-11-14T07:07:59.344888Z 0 [ERROR] [MY-011505] [Repl] Plugin group_replication reported: 'Member was expelled from the group due to network failures, changing member status to ERROR.'
    2019-11-14T07:07:59.371966Z 0 [ERROR] [MY-011712] [Repl] Plugin group_replication reported: 'The server was automatically set into read only mode after an error was detected.'

    Meanwhile, from db1's point of view, db2 was seen as offline:

    2019-11-14T07:07:44.086021Z 0 [Warning] [MY-011493] [Repl] Plugin group_replication reported: 'Member with address db2:3306 has become unreachable.'
    2019-11-14T07:07:46.087216Z 0 [Warning] [MY-011499] [Repl] Plugin group_replication reported: 'Members removed from the group: db2:3306'
    

    To tolerate a bit of delay on node eviction, we can set a higher timeout value before a node is being expelled from the group. The default value is 0, which means expel immediately. Use the setOption() function to set the expelTimeout value:
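
    For example, assuming a 30 second grace period is acceptable for your network:

    MySQL|db1:3306 ssl|JS> cluster.setOption("expelTimeout", 30)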

    Thanks to Frédéric Descamps from Oracle who pointed this out:

    Instead of relying on expelTimeout, it's recommended to set the autoRejoinTries option instead. The value represents the number of times an instance will attempt to rejoin the cluster after being expelled. A good number to start with is 3, which means the expelled member will try to rejoin the cluster 3 times; after an unsuccessful auto-rejoin attempt, the member waits 5 minutes before the next try.

    To set this value cluster-wide, we can use the setOption() function:

    MySQL|db1:3306 ssl|JS> cluster.setOption("autoRejoinTries", 3)
    WARNING: Each cluster member will only proceed according to its exitStateAction if auto-rejoin fails (i.e. all retry attempts are exhausted).
    
    Setting the value of 'autoRejoinTries' to '3' in all ReplicaSet members ...
    
    Successfully set the value of 'autoRejoinTries' to '3' in the 'default' ReplicaSet.

     

    Conclusion

    For MySQL InnoDB Cluster, most of the management and monitoring operations can be performed directly via MySQL Shell (only available from MySQL 5.7.21 and later).

    by ashraf at November 21, 2019 06:35 PM

    November 20, 2019

    SeveralNines

    PostgreSQL Deployment & Configuration with Puppet

    Puppet is open source software for configuration management and deployment. Founded in 2005, it’s multi-platform and even has its own declarative language for configuration.

    The tasks related to administration and maintenance of PostgreSQL (or other software, really) consist of daily, repetitive processes that require monitoring. This applies even to tasks operated by scripts or commands through a scheduling tool. The complexity of these tasks increases exponentially when executed on a massive infrastructure; however, using Puppet for these kinds of tasks can often solve these types of large scale problems, as Puppet centralizes and automates the performance of these operations in a very agile way.

    Puppet works in a client/server architecture where the configuration is defined on the master; these operations are then distributed to and executed on all the clients (also known as nodes).

    Typically running every 30 minutes, the agent nodes collect a set of information (type of processor, architecture, IP address, etc.), also called facts, and then send it to the master, which answers with any new configurations to apply. 

    These facts will allow the master to customize the same configuration for each node.
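
    You can inspect these facts on any node with the facter command that ships with the agent; the exact values will of course depend on the host, for example:

    $ facter os.family
    RedHat
    $ facter networking.ip
    192.168.1.85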

    In a very simplistic way, Puppet is one of the most important DevOps tools available today. In this blog we will take a look at the following...

    • The Use Case for Puppet & PostgreSQL
    • Installing Puppet
    • Configuring & Programming Puppet
    • Configuring Puppet for PostgreSQL 

    The installation and setup of Puppet (version 5.3.10) described below were performed on a set of hosts using CentOS 7.0 as the operating system.

    The Use Case for Puppet & PostgreSQL

    Suppose that there is an issue in your firewall on the machines that host all your PostgreSQL servers; it would then be necessary to deny all outbound connections to PostgreSQL, and to do it as soon as possible.

    Puppet is the perfect tool for this situation, especially because speed and efficiency are essential. We’ll talk about this example, presented in the section “Configuring Puppet for PostgreSQL”, by managing the parameter listen_addresses.

    Installing Puppet

    There are a set of common steps to perform either on master or agent hosts:

    Step One

    Update the /etc/hosts file with the host names and their IP addresses:

    192.168.1.85 agent agent.severalnines.com
    
    192.168.1.87 master master.severalnines.com puppet

    Step Two

    Adding the Puppet repositories on the system

    $ sudo rpm -Uvh https://yum.puppetlabs.com/puppet5/el/7/x86_64/puppet5-release-5.0.0-1-el7.noarch.rpm

    For other operating systems or CentOS versions, the most appropriate repository can be found in Puppet, Inc. Yum Repositories.

    Step Three

    Configuration of NTP (Network Time Protocol) server

    $ sudo yum -y install chrony

    Step Four

    Chrony is used to synchronize the system clock with different NTP servers and thus keeps the time synchronized between the master and agent servers.

    Once chrony is installed, it must be enabled and restarted:

    $ sudo systemctl enable chronyd.service
    
    $ sudo systemctl restart chronyd.service

    Step Five

    Disable the SELinux parameter

    In the file /etc/sysconfig/selinux, the SELINUX (Security-Enhanced Linux) parameter must be disabled so that it does not restrict access on either host.

    SELINUX=disabled

    Step Six

    Before the Puppet installation (either master or agent), the firewall on these hosts must be configured accordingly:

    $ sudo firewall-cmd --add-service=ntp --permanent 
    
    $ sudo firewall-cmd --reload 

    Installing the Puppet Master

    Once the package repository puppet5-release-5.0.0-1-el7.noarch.rpm is added to the system, the puppetserver installation can be done:

    $ sudo yum install -y puppetserver

    The maximum memory allocation parameter is an important setting to update in the /etc/sysconfig/puppetserver file; set it to 2GB (or to 1GB if the service doesn’t start):

    JAVA_ARGS="-Xms2g -Xmx2g"

    In the configuration file /etc/puppetlabs/puppet/puppet.conf it’s necessary to add the following parameterization:

    [master]
    
    dns_alt_names=master.severalnines.com,puppet
    
    
    
    [main]
    
    certname = master.severalnines.com
    
    server = master.severalnines.com
    
    environment = production
    
    runinterval = 1h

    The puppetserver service uses port 8140 to listen for node requests, thus it's necessary to ensure that this port is enabled:

    $ sudo firewall-cmd --add-port=8140/tcp --permanent
    
    $ sudo firewall-cmd --reload

    Once all settings are made on the puppet master, it’s time to start the service:

    $ sudo systemctl start puppetserver
    
    $ sudo systemctl enable puppetserver

    Installing the Puppet Agent

    With the package repository puppet5-release-5.0.0-1-el7.noarch.rpm also added to the agent system, the puppet-agent installation can be performed right away:

    $ sudo yum install -y puppet-agent

    The puppet-agent configuration file /etc/puppetlabs/puppet/puppet.conf also needs to be updated by adding the following parameters:

    [main]
    
    certname = agent.severalnines.com
    
    server = master.severalnines.com
    
    environment = production
    
    runinterval = 1h

    The next step consists of registering the agent node on the master host by executing the following command:

    $ sudo /opt/puppetlabs/bin/puppet resource service puppet ensure=running enable=true
    
    service { 'puppet':
      ensure => 'running',
      enable => 'true',
    }

    At this moment, on the master host, there is a pending request from the puppet agent to sign a certificate:

    It must be signed by executing one of the following commands:

    $ sudo /opt/puppetlabs/bin/puppet cert sign agent.severalnines.com

    or

    $ sudo /opt/puppetlabs/bin/puppet cert sign --all

    Finally (and once the puppet master has signed the certificate) it’s time to apply the configurations to the agent by retrieving the catalog from puppet master:

    $ sudo /opt/puppetlabs/bin/puppet agent --test

    In this command, the --test parameter doesn’t mean a dry run: the settings retrieved from the master will be applied to the local agent. In order to only test/check the configurations from the master, the following command must be executed:

    $ sudo /opt/puppetlabs/bin/puppet agent --noop

    Configuring & Programming Puppet

    Puppet uses a declarative programming approach in which the purpose is to specify what should be done; how it is achieved doesn't matter.

    The most elementary piece of code in Puppet is the resource, which specifies a system property such as a command, service, file, directory, user or package.

    Below is the syntax of a resource to create a user:

    user { 'admin_postgresql':
    
      ensure     => present,
    
      uid        => '1000',
    
      gid        => '1000',
    
      home       => '/home/admin/postgresql'
    
    }

    Different resources can be grouped into a class within a manifest, a file with the “pp” extension (it stands for Puppet Program). Several manifests and data (such as facts, files, and templates) then compose a module. All these logical hierarchies and rules are represented in the diagram below:

    The purpose of each module is to contain all the manifests needed to execute single tasks in a modular way. On the other hand, the concept of a class isn't the same as in object-oriented programming languages; in Puppet it works as an aggregator of resources.

    These files are organized in a specific directory structure:

    On which the purpose of each folder is the following:

    Folder      Description
    manifests   Puppet code
    files       Static files to be copied to nodes
    templates   Template files to be copied to managed nodes (can be customized with variables)
    examples    Manifest to show how to use the module

    Classes (manifests) can be used by other classes, as shown in the example below: the manifest init.pp in dev_accounts uses the groups manifest from the accounts module.

    class dev_accounts {
    
      $rootgroup = $osfamily ? {
    
        'Debian'  => 'sudo',
    
        'RedHat'  => 'wheel',
    
        default   => warning('This distribution is not supported by the Accounts module'),
    
      }
    
    
    
      include accounts::groups
    
    
    
      user { 'username':
    
        ensure      => present,
    
        home        => '/home/admin/postgresql',
    
        shell       => '/bin/bash',
    
        managehome  => true,
    
        gid         => 'admin_db',
    
        groups      => "$rootgroup",
    
        password    => '$1$7URTNNqb$65ca6wPFDvixURc/MMg7O1'
    
      }
    
    }

    In the next section, we’ll show you how to generate the contents of the examples folder, as well as the commands to test and publish each module.

    Configuring Puppet for PostgreSQL

    Before presenting the several configuration examples to deploy and maintain a PostgreSQL database, it’s necessary to install the PostgreSQL Puppet module (on the master host) in order to use all of its functionality:

    $ sudo /opt/puppetlabs/bin/puppet module install puppetlabs-postgresql

    Currently, thousands of modules ready to use on Puppet are available on the public module repository Puppet Forge.

    Step One

    Configure and deploy a new PostgreSQL instance. Here is all the necessary programming and configuration to install a new PostgreSQL instance on all nodes.

    The first step is to create a new module structure directory as shared previously:

    $ cd /etc/puppetlabs/code/environments/production/modules
    
    $ mkdir db_postgresql_admin
    
    $ cd db_postgresql_admin; mkdir{examples,files,manifests,templates}

    Then, in the manifest file manifests/init.pp, you need to include the class postgresql::server provided by the installed module:

    class db_postgresql_admin{
    
      include postgresql::server
    
    }

    To check the syntax of the manifest, it's a good practice to execute the following command:

    $ sudo /opt/puppetlabs/bin/puppet parser validate init.pp

    If nothing is returned, it means that the syntax is correct.

    To show how to use this module, create a new manifest file init.pp in the examples folder with the following content:

    include db_postgresql_admin

    The example manifest in the module must be tested against the master catalog:

    $ sudo /opt/puppetlabs/bin/puppet apply --modulepath=/etc/puppetlabs/code/environments/production/modules --noop init.pp

    Finally, it’s necessary to define which modules each node has access to in the file “/etc/puppetlabs/code/environments/production/manifests/site.pp”:

    node 'agent.severalnines.com','agent2.severalnines.com' {
    
     include db_postgresql_admin
    
    }

    Or a default configuration for all nodes:

    node default {
    
     include db_postgresql_admin
    
    }

    Usually the nodes check the master catalog every 30 minutes; nevertheless, this check can be forced on the node side with the following command:

    $ /opt/puppetlabs/bin/puppet agent -t

    Or, if the purpose is to simulate the differences between the master configuration and the current node settings, the --noop parameter (no operation) can be used:

    $ /opt/puppetlabs/bin/puppet agent -t --noop

    Step Two

    Update the PostgreSQL instance to listen on all interfaces. The previous installation defines the instance in a very restrictive mode: it only allows connections on localhost, as can be confirmed by the hosts associated with port 5432 (the default PostgreSQL port):

    $ sudo netstat -ntlp|grep 5432
    
    tcp        0 0 127.0.0.1:5432          0.0.0.0:* LISTEN   3237/postgres       
    
    tcp6       0 0 ::1:5432                :::* LISTEN   3237/postgres       

    In order to listen on all interfaces, it’s necessary to have the following content in the file /etc/puppetlabs/code/environments/production/modules/db_postgresql_admin/manifests/init.pp:

    class db_postgresql_admin {
      class { 'postgresql::server':
        listen_addresses => '*', # listen on all interfaces
      }
    }

    In the example above, the class postgresql::server is declared with the parameter listen_addresses set to '*', which means all interfaces.

    Now port 5432 is associated with all interfaces, which can be confirmed by the following IP address/port pair, “0.0.0.0:5432”:

    $ sudo netstat -ntlp|grep 5432
    
    tcp        0 0 0.0.0.0:5432            0.0.0.0:* LISTEN   1232/postgres       
    
    tcp6       0 0 :::5432                 :::* LISTEN   1232/postgres  

    To put back the initial setting (only allow database connections from localhost), the listen_addresses parameter must be set to 'localhost', or to a list of hosts if desired:

    listen_addresses = 'agent2.severalnines.com,agent3.severalnines.com,localhost'
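
    In the manifest, restoring the restrictive setting would look something like this (a minimal sketch; the values mirror the examples above):

    class db_postgresql_admin {
      class { 'postgresql::server':
        listen_addresses => 'localhost', # or e.g. 'agent2.severalnines.com,agent3.severalnines.com,localhost'
      }
    }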

    To retrieve the new configuration from the master host, all that is needed is to request it on the node:

    $ /opt/puppetlabs/bin/puppet agent -t

    Step Three

    Create a PostgreSQL Database. The PostgreSQL instance can be provisioned with a new database, a new user (with password) to use this database, and a rule in the pg_hba.conf file to allow the database connection for this new user:

    class db_postgresql_admin {
      class { 'postgresql::server':
        listen_addresses => '*', # listen on all interfaces
      }

      postgresql::server::db { 'nines_blog_db':
        user     => 'severalnines',
        password => postgresql_password('severalnines', 'passwd12'),
      }

      postgresql::server::pg_hba_rule { 'Authentication for severalnines':
        description => 'Open access to severalnines',
        type        => 'local',
        database    => 'nines_blog_db',
        user        => 'severalnines',
        address     => '127.0.0.1/32',
        auth_method => 'md5',
      }
    }

    This last resource is named “Authentication for severalnines”, and the pg_hba.conf file will gain one additional rule:

    # Rule Name: Authentication for severalnines
    
    # Description: Open access for severalnines
    
    # Order: 150
    
    local   nines_blog_db   severalnines 127.0.0.1/32    md5

    To retrieve the new configuration from the master host, all that is needed is to request it on the node:

    $ /opt/puppetlabs/bin/puppet agent -t
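
    At this point a quick sanity check can be done from the node itself (a hedged example; it assumes the pg_hba rule above is active and uses the password defined in the manifest, passwd12):

    $ psql -U severalnines -d nines_blog_db -W -c '\conninfo'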

    Step Four

    Create a Read-Only User. To create a new user with read-only privileges, the following resources need to be added to the previous manifest:

    postgresql::server::role { 'nines_reader':
      createdb      => false,
      createrole    => false,
      superuser     => false,
      password_hash => postgresql_password('nines_reader', 'passwd13'),
    }

    postgresql::server::pg_hba_rule { 'Authentication for nines_reader':
      description => 'Open access to nines_reader',
      type        => 'host',
      database    => 'nines_blog_db',
      user        => 'nines_reader',
      address     => '192.168.1.10/32',
      auth_method => 'md5',
    }

    To retrieve the new configuration from the master host, all that is needed is to request it on the node:

    $ /opt/puppetlabs/bin/puppet agent -t
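
    From the host allowed by the rule (192.168.1.10), a hedged connection test could look like this (the server hostname agent.severalnines.com is only an example):

    $ psql -h agent.severalnines.com -U nines_reader -d nines_blog_db -W -c 'SELECT current_user;'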

    Conclusion 

    In this blog post, we showed you the basic steps to deploy and start configuring your PostgreSQL database through an automatic and customized way on several nodes (which could even be virtual machines).

    These types of automation can help you to become more effective than doing it manually, and PostgreSQL configuration can easily be performed by using several of the classes available in the Puppet Forge repository.

    by Hugo Dias at November 20, 2019 08:17 PM

    November 19, 2019

    SeveralNines

    Converting from Asynchronous to Synchronous Replication in PostgreSQL

    High Availability is a requirement for just about every company around the world using PostgreSQL. It is well known that PostgreSQL uses Streaming Replication as its replication method. PostgreSQL Streaming Replication is asynchronous by default, so it is possible to have some transactions committed in the primary node which have not yet been replicated to the standby server. This means there is the possibility of some potential data loss.

    This delay in the commit process is supposed to be very small... if the standby server is powerful enough to keep up with the load. If this small data loss risk is not acceptable in the company, you can also use synchronous replication instead of the default.

    In synchronous replication, each commit of a write transaction will wait until the confirmation that the commit has been written to the write-ahead log on disk of both the primary and standby server.

    This method minimizes the possibility of data loss. For data loss to occur you would need both the primary and the standby to fail at the same time.

    The disadvantage of this method is the same as with any synchronous method: the response time for each write transaction increases, due to the need to wait for all confirmations that the transaction was committed. Luckily, read-only transactions are not affected by this; only write transactions are.

    In this blog, we'll show you how to install a PostgreSQL cluster from scratch and convert the asynchronous replication (the default) to a synchronous one. We'll also show you how to roll back if the response time is not acceptable, as you can easily go back to the previous state. Finally, you will see how to deploy, configure, and monitor PostgreSQL synchronous replication easily using ClusterControl, a single tool for the entire process.

    Installing a PostgreSQL Cluster

    Let’s start by installing and configuring asynchronous PostgreSQL replication, which is the usual replication mode used in a PostgreSQL cluster. We will use PostgreSQL 11 on CentOS 7.

    PostgreSQL Installation

    Following the PostgreSQL official installation guide, this task is pretty simple.

    First, install the repository:

    $ yum install https://download.postgresql.org/pub/repos/yum/reporpms/EL-7-x86_64/pgdg-redhat-repo-latest.noarch.rpm

    Install the PostgreSQL client and server packages:

    $ yum install postgresql11 postgresql11-server

    Initialize the database:

    $ /usr/pgsql-11/bin/postgresql-11-setup initdb
    
    $ systemctl enable postgresql-11
    
    $ systemctl start postgresql-11

    On the standby node, you can avoid the last command (start the database service) as you will restore a binary backup to create the streaming replication.

    Now, let’s see the configuration required by an asynchronous PostgreSQL replication.

    Configuring Asynchronous PostgreSQL Replication

    Primary Node Setup

    In the PostgreSQL primary node, you must use the following basic configuration to create an Async replication. The files that will be modified are postgresql.conf and pg_hba.conf. In general, they are in the data directory (/var/lib/pgsql/11/data/) but you can confirm it on the database side:

    postgres=# SELECT setting FROM pg_settings WHERE name = 'data_directory';
    
            setting
    
    ------------------------
    
     /var/lib/pgsql/11/data
    
    (1 row)

    Postgresql.conf

    Change or add the following parameters in the postgresql.conf configuration file.

    Here you need to add the IP address(es) to listen on. The default value is 'localhost', and for this example we’ll use '*' to listen on all IP addresses on the server.

    listen_addresses = '*' 

    Set the server port to listen on. The default is 5432.

    port = 5432 

    Determine how much information is written to the WALs. The possible values are minimal, replica, or logical. The hot_standby value is mapped to replica and it is used to keep the compatibility with previous versions.

    wal_level = hot_standby 

    Set the max number of walsender processes, which manage the connection with a standby server.

    max_wal_senders = 16

    Set the minimum amount of WAL files to be kept in the pg_wal directory.

    wal_keep_segments = 32
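
    Putting these together, the relevant postgresql.conf section on the primary node looks like this:

    listen_addresses = '*'
    port = 5432
    wal_level = hot_standby
    max_wal_senders = 16
    wal_keep_segments = 32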

    Changing these parameters requires a database service restart.

    $ systemctl restart postgresql-11

    Pg_hba.conf

    Change or add the following parameters in the pg_hba.conf configuration file.

    # TYPE  DATABASE        USER ADDRESS                 METHOD
    
    host  replication  replication_user  IP_STANDBY_NODE/32  md5
    
    host  replication  replication_user  IP_PRIMARY_NODE/32  md5

    As you can see, here you need to add the user access permissions. The first column is the connection type, which can be host or local. Then you need to specify the database (replication), user, source IP address, and authentication method. Changing this file requires a database service reload.

    $ systemctl reload postgresql-11

    You should add this configuration in both primary and standby nodes, as you will need it if the standby node is promoted to master in case of failure.

    Now, you must create a replication user.

    Replication Role

    The ROLE (user) must have the REPLICATION privilege to be used in streaming replication.

    postgres=# CREATE ROLE replication_user WITH LOGIN PASSWORD 'PASSWORD' REPLICATION;
    
    CREATE ROLE
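
    You can quickly confirm the role and its replication attribute from the psql prompt (a simple check using the role created above):

    postgres=# SELECT rolname, rolreplication FROM pg_roles WHERE rolname = 'replication_user';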

    After configuring the corresponding files and the user creation, you need to create a consistent backup from the primary node and restore it on the standby node.

    Standby Node Setup

    On the standby node, go to the /var/lib/pgsql/11/ directory and move or remove the current datadir:

    $ cd /var/lib/pgsql/11/
    
    $ mv data data.bk

    Then, run the pg_basebackup command to get the current primary datadir and assign the correct owner (postgres):

    $ pg_basebackup -h 192.168.100.145 -D /var/lib/pgsql/11/data/ -P -U replication_user --wal-method=stream
    
    $ chown -R postgres.postgres data

    Now, you must use the following basic configuration to create the async replication. The file that will be modified is postgresql.conf, and you need to create a new recovery.conf file. Both are located in the data directory, /var/lib/pgsql/11/data/.

    Recovery.conf

    Specify that this server will be a standby server. If it is on, the server will continue recovering by fetching new WAL segments when the end of archived WAL is reached.

    standby_mode = 'on'

    Specify a connection string to be used for the standby server to connect to the primary node.

    primary_conninfo = 'host=IP_PRIMARY_NODE port=5432 user=replication_user password=PASSWORD'

    Specify recovering into a particular timeline. The default is to recover along the same timeline that was current when the base backup was taken. Setting this to “latest” recovers to the latest timeline found in the archive.

    recovery_target_timeline = 'latest'

    Specify a trigger file whose presence ends recovery in the standby. 

    trigger_file = '/tmp/failover_5432.trigger'

    Postgresql.conf

    Change or add the following parameters in the postgresql.conf configuration file.

    Determine how much information is written to the WALs. The possible values are minimal, replica, or logical. The hot_standby value is mapped to replica and it is used to keep the compatibility with previous versions. Changing this value requires a service restart.

    wal_level = hot_standby

    Allow the queries during recovery. Changing this value requires a service restart.

    hot_standby = on

    Starting Standby Node

    Now that you have all the required configuration in place, you just need to start the database service on the standby node.

    $  systemctl start postgresql-11

    And check the database logs in /var/lib/pgsql/11/data/log/. You should have something like this:

    2019-11-18 20:23:57.440 UTC [1131] LOG:  entering standby mode
    
    2019-11-18 20:23:57.447 UTC [1131] LOG:  redo starts at 0/3000028
    
    2019-11-18 20:23:57.449 UTC [1131] LOG:  consistent recovery state reached at 0/30000F8
    
    2019-11-18 20:23:57.449 UTC [1129] LOG:  database system is ready to accept read only connections
    
    2019-11-18 20:23:57.457 UTC [1135] LOG:  started streaming WAL from primary at 0/4000000 on timeline 1

    You can also check the replication status in the primary node by running the following query:

    postgres=# SELECT pid,usename,application_name,state,sync_state FROM pg_stat_replication;
    
     pid  |     usename      | application_name |   state   | sync_state
    ------+------------------+------------------+-----------+------------
     1467 | replication_user | walreceiver      | streaming | async
    (1 row)

    As you can see, we are using an async replication.
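
    You can also verify the receiver side on the standby node (a hedged check; the pg_stat_wal_receiver view is available since PostgreSQL 9.6, and the sender_host/sender_port columns since PostgreSQL 11):

    postgres=# SELECT status, sender_host, sender_port FROM pg_stat_wal_receiver;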

    Converting Asynchronous PostgreSQL Replication to Synchronous Replication

    Now, it’s time to convert this async replication to a sync one, and for this, you will need to configure both the primary and the standby node.

    Primary Node

    In the PostgreSQL primary node, you must use this basic configuration in addition to the previous async configuration.

    Postgresql.conf

    Specify a list of standby servers that can support synchronous replication. This standby server name is the application_name setting in the standby’s recovery.conf file.

    synchronous_standby_names = 'pgsql_0_node_0'

    Specifies whether transaction commit will wait for WAL records to be written to disk before the command returns a “success” indication to the client. The valid values are on, remote_apply, remote_write, local, and off. The default value is on.

    synchronous_commit = on
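
    Note that synchronous_commit can also be changed per session, which can be useful if only some transactions need the synchronous guarantee (a hedged example):

    postgres=# SET synchronous_commit = local;  -- this session's commits will not wait for the standby
    postgres=# SET synchronous_commit = on;     -- back to the default behavior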

    Standby Node Setup 

    In the PostgreSQL standby node, you need to change the recovery.conf file, adding the application_name value to the primary_conninfo parameter.

    Recovery.conf

    standby_mode = 'on'
    
    primary_conninfo = 'application_name=pgsql_0_node_0 host=IP_PRIMARY_NODE port=5432 user=replication_user password=PASSWORD'
    
    recovery_target_timeline = 'latest'
    
    trigger_file = '/tmp/failover_5432.trigger'

    Restart the database service in both the primary and in the standby nodes:

    $ service postgresql-11 restart

    Now, you should have your sync streaming replication up and running:

    postgres=# SELECT pid,usename,application_name,state,sync_state FROM pg_stat_replication;
    
     pid  |     usename      | application_name |   state   | sync_state
    ------+------------------+------------------+-----------+------------
     1561 | replication_user | pgsql_0_node_0   | streaming | sync
    (1 row)

    Rollback from Synchronous to Asynchronous PostgreSQL Replication

    If you need to go back to asynchronous PostgreSQL replication, you just need to roll back the changes performed in the postgresql.conf file on the primary node:

    Postgresql.conf

    #synchronous_standby_names = 'pgsql_0_node_0'
    
    #synchronous_commit = on

    And restart the database service.

    $ service postgresql-11 restart

    So now, you should have asynchronous replication again.

    postgres=# SELECT pid,usename,application_name,state,sync_state FROM pg_stat_replication;
    
     pid  |     usename      | application_name |   state   | sync_state
    ------+------------------+------------------+-----------+------------
     1625 | replication_user | pgsql_0_node_0   | streaming | async
    (1 row)

    How to Deploy a PostgreSQL Synchronous Replication Using ClusterControl

    With ClusterControl you can perform the deployment, configuration, and monitoring tasks all-in-one from the same job and you will be able to manage it from the same UI.

    We will assume that you have ClusterControl installed and it can access the database nodes via SSH. For more information about how to configure the ClusterControl access please refer to our official documentation.

    Go to ClusterControl and use the “Deploy” option to create a new PostgreSQL cluster.

    When selecting PostgreSQL, you must specify User, Key or Password, and a port to connect by SSH to your servers. You also need a name for your new cluster and to choose whether you want ClusterControl to install the corresponding software and configurations for you.

    After setting up the SSH access information, you must enter the data to access your database. You can also specify which repository to use.

    In the next step, you need to add your servers to the cluster that you are going to create. When adding your servers, you can enter IP or hostname. 

    And finally, in the last step, you can choose the replication method, which can be asynchronous or synchronous replication.

    That’s it. You can monitor the job status in the ClusterControl activity section.

    And when this job finishes, you will have your PostgreSQL synchronous cluster installed, configured and monitored by ClusterControl.

    Conclusion

    As we mentioned at the beginning of this blog, High Availability is a requirement for all companies, so you should know the available options to achieve it for each technology in use. For PostgreSQL, you can use synchronous streaming replication as the safest way to implement it, but this method doesn’t work for all environments and workloads. 

    Be careful with the latency generated by waiting for the confirmation of each transaction; it could become a problem rather than a High Availability solution.

     

    by Sebastian Insausti at November 19, 2019 08:11 PM

    MariaDB Foundation

    2019 MariaDB Developers Unconference Shanghai Presentations

    The 2019 Shanghai MariaDB Developers Unconference is being hosted by Microsoft Shanghai, from 19 November. Slides will be added to this post as they become available. […]

    The post 2019 MariaDB Developers Unconference Shanghai Presentations appeared first on MariaDB.org.

    by Ian Gilfillan at November 19, 2019 09:43 AM

    November 18, 2019

    SeveralNines

    Database Load Balancing in the Cloud - MySQL Master Failover with ProxySQL 2.0: Part Two (Seamless Failover)

    In the previous blog we showed you how to set up an environment in Amazon AWS EC2 that consists of a Percona Server 8.0 Replication Cluster (in Master - Slave topology). We deployed ProxySQL and we configured our application (Sysbench). 

    We also used ClusterControl to make the deployment easier, faster and more stable. This is the environment we ended up with...

    This is how it looks in ClusterControl:

    In this blog post we are going to review the requirements and show you how, in this setup, you can seamlessly perform master switches.

    Seamless Master Switch with ProxySQL 2.0

    We are going to benefit from ProxySQL's ability to queue connections if there are no nodes available in a hostgroup. ProxySQL utilizes hostgroups to differentiate between backend nodes with different roles. You can see the configuration on the screenshot below.

    In our case we have two hostgroups - hostgroup 10 contains writers (the master) and hostgroup 20 contains slaves (it may also contain the master, depending on the configuration). As you may know, ProxySQL uses a SQL interface for configuration. ClusterControl exposes most of the configuration options in the UI, but some settings cannot be set up via ClusterControl (or they are configured automatically by ClusterControl). One such setting is how ProxySQL should detect and configure backend nodes in a replication environment.

    mysql> SELECT * FROM mysql_replication_hostgroups;
    
    +------------------+------------------+------------+-------------+
    
    | writer_hostgroup | reader_hostgroup | check_type | comment     |
    
    +------------------+------------------+------------+-------------+
    
    | 10               | 20               | read_only  | host groups |
    
    +------------------+------------------+------------+-------------+
    
    1 row in set (0.00 sec)

    Configuration stored in the mysql_replication_hostgroups table defines if and how ProxySQL will automatically assign the master and slaves to the correct hostgroups. In short, the configuration above tells ProxySQL to assign writers to HG10 and readers to HG20. Whether a node is a writer or a reader is determined by the state of the 'read_only' variable. If read_only is enabled, the node is marked as a reader and assigned to HG20. If not, the node is marked as a writer and assigned to HG10. On top of that we have the mysql-monitor_writer_is_also_reader variable, which determines whether the writer should also show up in the readers' hostgroup or not. In our case it is set to 'True', thus our writer (master) is also a part of HG20.

    ProxySQL does not manage backend nodes, but it does access them and check their state, including the state of the read_only variable. This is done by the monitoring user, which was configured by ClusterControl according to your input at ProxySQL deployment time. If the state of the variable changes, ProxySQL will reassign the node to the proper hostgroup, based on the value of the read_only variable and on the setting of the mysql-monitor_writer_is_also_reader variable in ProxySQL.
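
    For reference, this setting can be inspected from the ProxySQL admin interface (a hedged example; the admin credentials and port 6032 are ProxySQL defaults and may differ in your setup):

    $ mysql -u admin -padmin -h 127.0.0.1 -P6032
    mysql> SELECT variable_name, variable_value FROM global_variables WHERE variable_name = 'mysql-monitor_writer_is_also_reader';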

    Here enters ClusterControl. ClusterControl monitors the state of the cluster. Should the master become unavailable, failover will occur. It is more complex than that, and we explained this process in detail in one of our earlier blogs. What is important for us is that, as long as it is safe to do so, ClusterControl will execute the failover and in the process will reconfigure the read_only variables on the old and new master. ProxySQL will see the change and modify its hostgroups accordingly. This will also happen in the case of a regular slave promotion, which can easily be executed from ClusterControl by starting this job:

    The final outcome will be that the new master is promoted and assigned to HG10 in ProxySQL, while the old master is reconfigured as a slave (and becomes a part of HG20 in ProxySQL). The process of the master change may take a while depending on the environment, application and traffic (it is even possible to failover in 11 seconds, as my colleague has tested). During this time the database (master) will not be reachable in ProxySQL. This leads to some problems. For starters, the application will receive errors from the database and the user experience will suffer - no one likes to see errors. Luckily, under some circumstances, we can reduce the impact. The requirement for this is that the application does not use (at all or at that particular time) multi-statement transactions. This is quite expected - if you have a multi-statement transaction (so, BEGIN; … ; COMMIT;) you cannot move it from server to server because it would no longer be a transaction. In such cases the only safe way is to roll back the transaction and start over on the new master. Prepared statements are also a no-no: they are prepared on a particular host (the master) and they do not exist on the slaves, so once a slave is promoted to the new master, it cannot execute prepared statements which were prepared on the old master. On the other hand, if you run only auto-committed, single-statement transactions, you can benefit from the feature we are going to describe below.

    One of the great features ProxySQL has is the ability to queue incoming transactions if they are directed to a hostgroup that does not have any nodes available. This is defined by the following two variables:

    ClusterControl increases them to 20 seconds, allowing even quite long failovers to complete without any error being sent to the application.

    Testing the Seamless Master Switch

    We are going to run the test in our environment. As the application we are going to use SysBench started as:

    while true ; do sysbench /root/sysbench/src/lua/oltp_read_write.lua --threads=4 --events=0 --time=3600 --reconnect=1 --mysql-socket=/tmp/proxysql.sock --mysql-user=sbtest --mysql-password=sbtest --tables=32 --report-interval=1 --skip-trx=on --table-size=100000 --db-ps-mode=disable --rate=5 run ; done

    Basically, we will run sysbench in a loop (in case an error shows up). We will run it in 4 threads. Threads will reconnect after every transaction. There will be no multi-statement transactions and we will not use prepared statements. Then we will trigger the master switch by promoting a slave in the ClusterControl UI. This is how the master switch looks from the application standpoint:

    [ 560s ] thds: 4 tps: 5.00 qps: 90.00 (r/w/o: 70.00/20.00/0.00) lat (ms,95%): 18.95 err/s: 0.00 reconn/s: 5.00
    
    [ 560s ] queue length: 0, concurrency: 0
    
    [ 561s ] thds: 4 tps: 5.00 qps: 90.00 (r/w/o: 70.00/20.00/0.00) lat (ms,95%): 17.01 err/s: 0.00 reconn/s: 5.00
    
    [ 561s ] queue length: 0, concurrency: 0
    
    [ 562s ] thds: 4 tps: 7.00 qps: 126.00 (r/w/o: 98.00/28.00/0.00) lat (ms,95%): 28.67 err/s: 0.00 reconn/s: 7.00
    
    [ 562s ] queue length: 0, concurrency: 0
    
    [ 563s ] thds: 4 tps: 3.00 qps: 68.00 (r/w/o: 56.00/12.00/0.00) lat (ms,95%): 17.95 err/s: 0.00 reconn/s: 3.00
    
    [ 563s ] queue length: 0, concurrency: 1

    We can see that the queries are being executed with low latency.

    [ 564s ] thds: 4 tps: 0.00 qps: 42.00 (r/w/o: 42.00/0.00/0.00) lat (ms,95%): 0.00 err/s: 0.00 reconn/s: 0.00
    
    [ 564s ] queue length: 1, concurrency: 4

    Then the queries paused - you can see this by the latency being zero and transactions per second being equal to zero as well.

    [ 565s ] thds: 4 tps: 0.00 qps: 0.00 (r/w/o: 0.00/0.00/0.00) lat (ms,95%): 0.00 err/s: 0.00 reconn/s: 0.00
    
    [ 565s ] queue length: 5, concurrency: 4
    
    [ 566s ] thds: 4 tps: 0.00 qps: 0.00 (r/w/o: 0.00/0.00/0.00) lat (ms,95%): 0.00 err/s: 0.00 reconn/s: 0.00
    
    [ 566s ] queue length: 15, concurrency: 4

    Two seconds in, the queue is growing and there is still no response coming from the database.

    [ 567s ] thds: 4 tps: 20.00 qps: 367.93 (r/w/o: 279.95/87.98/0.00) lat (ms,95%): 3639.94 err/s: 0.00 reconn/s: 20.00
    
    [ 567s ] queue length: 1, concurrency: 4

    After three seconds the application was finally able to reach the database again. You can see the traffic is now non-zero and the queue length has been reduced. You can see the latency of around 3.6 seconds - this is how long the queries were paused for.

    [ 568s ] thds: 4 tps: 10.00 qps: 116.04 (r/w/o: 84.03/32.01/0.00) lat (ms,95%): 539.71 err/s: 0.00 reconn/s: 10.00
    
    [ 568s ] queue length: 0, concurrency: 0
    
    [ 569s ] thds: 4 tps: 4.00 qps: 72.00 (r/w/o: 56.00/16.00/0.00) lat (ms,95%): 16.12 err/s: 0.00 reconn/s: 4.00
    
    [ 569s ] queue length: 0, concurrency: 0
    
    [ 570s ] thds: 4 tps: 8.00 qps: 144.01 (r/w/o: 112.00/32.00/0.00) lat (ms,95%): 24.83 err/s: 0.00 reconn/s: 8.00
    
    [ 570s ] queue length: 0, concurrency: 0
    
    [ 571s ] thds: 4 tps: 5.00 qps: 98.99 (r/w/o: 78.99/20.00/0.00) lat (ms,95%): 21.50 err/s: 0.00 reconn/s: 5.00
    
    [ 571s ] queue length: 0, concurrency: 1
    
    [ 572s ] thds: 4 tps: 5.00 qps: 80.98 (r/w/o: 60.99/20.00/0.00) lat (ms,95%): 17.95 err/s: 0.00 reconn/s: 5.00
    
    [ 572s ] queue length: 0, concurrency: 0
    
    [ 573s ] thds: 4 tps: 2.00 qps: 36.01 (r/w/o: 28.01/8.00/0.00) lat (ms,95%): 14.46 err/s: 0.00 reconn/s: 2.00
    
    [ 573s ] queue length: 0, concurrency: 0

    Everything is stable again; the total impact of the master switch was a 3.6 second increase in latency and no traffic hitting the database for 3.6 seconds. Other than that, the master switch was transparent to the application. Of course, whether it takes 3.6 seconds or more depends on the environment, traffic and so on, but as long as the master switch can be performed in under 20 seconds, no error will be returned to the application.

    Conclusion

    As you can see, with ClusterControl and ProxySQL 2.0 you are just a couple of clicks from achieving a seamless failover and master switch for your MySQL Replication clusters.

    by krzysztof at November 18, 2019 04:20 PM

    November 17, 2019

    Valeriy Kravchuk

    Dynamic Tracing of MySQL Server with Timestamps Using gdb

    Some time ago I wanted a customer to trace some MariaDB function execution and make sure that when it is executed I get both timestamp of execution and some of the arguments printed into some log file. Our InnoDB guru in MariaDB, Marko Mäkelä, suggested to use gdb, set breakpoint on the function and use its command ... end syntax to print whatever we needed, and log the output.

    Adding a timestamp to each breakpoint "command" execution was the next step, and for this I suggested using the plain "shell date" command. Redirecting its output to the same file gdb uses to log everything indeed did the trick. Let me document it here for readers and myself. I'll consider the example of logging SQL queries I've used in the previous blog post on dynamic tracing with perf and use recent Percona Server 5.7.28-31 on Ubuntu 16.04 for tests.

    Skipping the details already explained in that previous post, let's assume I want to set a breakpoint at the dispatch_command() function and print the value of com_data->com_query.query every time it hit it. This is what I did with gdb after attaching it to the proper mysqld process:
    ...
    [New LWP 6562]
    [Thread debugging using libthread_db enabled]
    Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
    0x00007fc563d8774d in poll () at ../sysdeps/unix/syscall-template.S:84
    84      ../sysdeps/unix/syscall-template.S: No such file or directory.
    (gdb) set height 0
    (gdb) set log on
    Copying output to gdb.txt.
    (gdb) b dispatch_command
    Breakpoint 1 at 0xbe9660: file /mnt/workspace/percona-server-5.7-debian-binary-rocks-new/label_exp/min-xenial-x64/test/percona-server-5.7-5.7.28-31/sql/sql_parse.cc, line 1254.
    (gdb) command 1
    Type commands for breakpoint(s) 1, one per line.
    End with a line saying just "end".
    >shell date >> ./gdb.txt
    >p com_data->com_query.query
    >continue
    >end
    (gdb) continue
    Continuing.

    Key detail here is to use command and refer to the breakpoint by number, 1, and then append the output of the date command to the gdb.txt file in the current directory that is used by default to log the gdb output.

    Then in another shell I executed the following:
    openxs@ao756:~$ mysql -uroot test
    Reading table information for completion of table and column names
    You can turn off this feature to get a quicker startup with -A

    Welcome to the MySQL monitor.  Commands end with ; or \g.
    Your MySQL connection id is 2
    Server version: 5.7.28-31-log Percona Server (GPL), Release '31', Revision 'd14ef86'

    Copyright (c) 2009-2019 Percona LLC and/or its affiliates
    Copyright (c) 2000, 2019, Oracle and/or its affiliates. All rights reserved.

    Oracle is a registered trademark of Oracle Corporation and/or its
    affiliates. Other names may be trademarks of their respective
    owners.

    Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

    mysql> select 1;
    +---+
    | 1 |
    +---+
    | 1 |
    +---+
    1 row in set (0.04 sec)

    mysql> select 2;
    +---+
    | 2 |
    +---+
    | 2 |
    +---+
    1 row in set (0.04 sec)

    mysql> shutdown;
    Query OK, 0 rows affected (0.04 sec)

    mysql> quit
    Bye

    I had the output in the gdb window, but what's more important is that I've got it in the gdb.txt file as well. The content looked as follows:
    openxs@ao756:~$ ls -l gdb.txt
    -rw-r--r-- 1 root root 7123 лис 17 19:21 gdb.txt
    openxs@ao756:~$ cat gdb.txt
    Breakpoint 1 at 0xbe9660: file /mnt/workspace/percona-server-5.7-debian-binary-rocks-new/label_exp/min-xenial-x64/test/percona-server-5.7-5.7.28-31/sql/sql_parse.cc, line 1254.
    Type commands for breakpoint(s) 1, one per line.
    End with a line saying just "end".
    Continuing.
    [New Thread 0x7fc566411700 (LWP 6611)]
    [Switching to Thread 0x7fc566411700 (LWP 6611)]

    Thread 29 "mysqld" hit Breakpoint 1, dispatch_command (thd=thd@entry=
        0x7fc548631000, com_data=com_data@entry=0x7fc566410da0, command=COM_QUERY)
        at /mnt/workspace/percona-server-5.7-debian-binary-rocks-new/label_exp/min-xenial-x64/test/percona-server-5.7-5.7.28-31/sql/sql_parse.cc:1254
    1254    /mnt/workspace/percona-server-5.7-debian-binary-rocks-new/label_exp/min-xenial-x64/test/percona-server-5.7-5.7.28-31/sql/sql_parse.cc: No such file or directory.
    неділя, 17 листопада 2019 19:20:47 +0200
    $1 = 0x7fc54865b021 "show databases"
    Thread 29 "mysqld" hit Breakpoint 1, dispatch_command (
        thd=thd@entry=0x7fc548631000, com_data=com_data@entry=0x7fc566410da0,
        command=COM_QUERY)
        at /mnt/workspace/percona-server-5.7-debian-binary-rocks-new/label_exp/min-xenial-x64/test/percona-server-5.7-5.7.28-31/sql/sql_parse.cc:1254
    1254    in /mnt/workspace/percona-server-5.7-debian-binary-rocks-new/label_exp/min-xenial-x64/test/percona-server-5.7-5.7.28-31/sql/sql_parse.cc
    неділя, 17 листопада 2019 19:20:47 +0200
    $2 = 0x7fc54865b021 "show tables"
    ...
    Thread 29 "mysqld" hit Breakpoint 1, dispatch_command (
        thd=thd@entry=0x7fc548631000, com_data=com_data@entry=0x7fc566410da0,
        command=COM_QUERY)
        at /mnt/workspace/percona-server-5.7-debian-binary-rocks-new/label_exp/min-xenial-x64/test/percona-server-5.7-5.7.28-31/sql/sql_parse.cc:1254
    1254    in /mnt/workspace/percona-server-5.7-debian-binary-rocks-new/label_exp/min-xenial-x64/test/percona-server-5.7-5.7.28-31/sql/sql_parse.cc
    неділя, 17 листопада 2019 19:20:47 +0200
    $9 = 0x7fc54865b021 "select @@version_comment limit 1"

    Thread 29 "mysqld" hit Breakpoint 1, dispatch_command (
        thd=thd@entry=0x7fc548631000, com_data=com_data@entry=0x7fc566410da0,
        command=COM_QUERY)
        at /mnt/workspace/percona-server-5.7-debian-binary-rocks-new/label_exp/min-xenial-x64/test/percona-server-5.7-5.7.28-31/sql/sql_parse.cc:1254
    1254    in /mnt/workspace/percona-server-5.7-debian-binary-rocks-new/label_exp/min-xenial-x64/test/percona-server-5.7-5.7.28-31/sql/sql_parse.cc
    неділя, 17 листопада 2019 19:21:01 +0200
    $10 = 0x7fc54865b021 "select 1"

    Thread 29 "mysqld" hit Breakpoint 1, dispatch_command (
        thd=thd@entry=0x7fc548631000, com_data=com_data@entry=0x7fc566410da0,
        command=COM_QUERY)
        at /mnt/workspace/percona-server-5.7-debian-binary-rocks-new/label_exp/min-xenial-x64/test/percona-server-5.7-5.7.28-31/sql/sql_parse.cc:1254
    1254    in /mnt/workspace/percona-server-5.7-debian-binary-rocks-new/label_exp/min-xenial-x64/test/percona-server-5.7-5.7.28-31/sql/sql_parse.cc
    неділя, 17 листопада 2019 19:21:08 +0200
    $11 = 0x7fc54865b021 "select 2"

    Thread 29 "mysqld" hit Breakpoint 1, dispatch_command (
        thd=thd@entry=0x7fc548631000, com_data=com_data@entry=0x7fc566410da0,
        command=COM_QUERY)
        at /mnt/workspace/percona-server-5.7-debian-binary-rocks-new/label_exp/min-xenial-x64/test/percona-server-5.7-5.7.28-31/sql/sql_parse.cc:1254
    1254    in /mnt/workspace/percona-server-5.7-debian-binary-rocks-new/label_exp/min-xenial-x64/test/percona-server-5.7-5.7.28-31/sql/sql_parse.cc
    неділя, 17 листопада 2019 19:21:12 +0200
    $12 = 0x7fc54865b021 "shutdown"
    [Thread 0x7fc566411700 (LWP 6611) exited]

    Thread 1 "mysqld" received signal SIGUSR1, User defined signal 1.
    [Switching to Thread 0x7fc56645b780 (LWP 6525)]
    0x00007fc563d8774d in poll () at ../sysdeps/unix/syscall-template.S:84
    84      ../sysdeps/unix/syscall-template.S: No such file or directory.
    Detaching from program: /usr/sbin/mysqld, process 6525
    openxs@ao756:~$
    That's it. For basic tracing of any function calls in MySQL or MariaDB server (or any binary with debuginfo available), including timestamps and any arguments, variables or expressions printed, you do not strictly need anything but gdb. With different format options for the date command you can format the timestamp any way you need; for example, you can remove the day and month names and get nanosecond precision if you prefer:
    openxs@ao756:~$ for i in `seq 1 20`; do sleep 0.1; date +'%H:%M:%S.%N'; done
    20:11:41.126205172
    20:11:41.230087681
    ...
    20:11:42.996521734
    20:11:43.100229347
    openxs@ao756:~$
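
    If you prefer not to type the breakpoint commands interactively, the same tracing session can be scripted (a hedged sketch; the file name trace_query.gdb and the printf format are just illustrations):

    $ cat > trace_query.gdb <<'EOF'
    set pagination off
    set logging file gdb.txt
    set logging on
    break dispatch_command
    commands
      shell date +'%H:%M:%S.%N' >> ./gdb.txt
      printf "query: %s\n", com_data->com_query.query
      continue
    end
    continue
    EOF
    $ sudo gdb -p $(pidof mysqld) -x trace_query.gdb
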
    * * * 
    Sometimes even basic tools allow you to get useful results. This photo was taken with my dumb Nokia phone while running. Just a stop for a moment. A nice and useful result, IMHO, same as with the gdb trick discussed here.
    To summarize, for basic dynamic tracing with timestamps and arbitrary information printed you can use a command line debugger, like gdb. In some cases this simple approach is quite useful.

    This kind of tracing comes with a cost, both in terms of performance impact (awful) and additional steps to parse the output (we do not really care about all those breakpoint details, we just need a timestamp and some values printed). eBPF and related dynamic tracing tools (bcc trace and bpftrace) may help with both problems, as I am going to demonstrate in upcoming blog posts. Stay tuned!

    by Valerii Kravchuk (noreply@blogger.com) at November 17, 2019 06:37 PM

    November 15, 2019

    SeveralNines

    Tips for Migrating from MySQL Replication to MySQL Galera Cluster 4.0

    We have previously blogged about What’s New in MySQL Galera Cluster 4.0, Handling Large Transactions with Streaming Replication and MariaDB 10.4 and presented some guides about using the new Streaming Replication feature in a part 1 & part 2 series.

    Moving your database technology from MySQL Replication to MySQL Galera Cluster requires you to have the right skills and an understanding of what you are doing to be successful. In this blog we'll share some tips for migrating from a MySQL Replication setup to a MySQL Galera Cluster 4.0 one.

    The Differences Between MySQL Replication and Galera Cluster

    If you're not yet familiar with Galera, we suggest you go over our Galera Cluster for MySQL Tutorial. Galera Cluster uses a whole different level of replication, based on synchronous replication, in contrast to MySQL Replication which uses asynchronous replication (but can also be configured to achieve semi-synchronous replication).

    Galera Cluster also supports multi-master replication. It is capable of unconstrained parallel applying (i.e., “parallel replication”), multicast replication, and automatic node provisioning. 

    The primary focus of Galera Cluster is data consistency, whereas with MySQL Replication, it's prone to data inconsistency (which can be avoided with best practices and proper configuration such as enforcing read-only on the slaves to avoid unwanted writes within the slaves).

    Transactions received by Galera are either applied to every node or not at all. Each node certifies the replicated write-set in the applier queue (at transaction commit); the write-set also includes information on all of the locks that were held by the database during the transaction. Once no conflicting locks are identified, the write-set is applied. Up to this point, the transaction is considered committed and each node continues to apply it to the tablespace. This approach is also called virtually synchronous replication, since the writes and commits happen in a logically synchronous mode, but the actual writing and committing to the tablespace happens independently and asynchronously on each node.

    Unlike MySQL Replication, a Galera Cluster is a true multi-master, multi-threaded slave, a pure hot-standby, with no need for master failover or read-write splitting. However, migrating to Galera Cluster isn't an automatic answer to your problems. Galera Cluster supports only InnoDB, so there could be design modifications if you are using the MyISAM or Memory storage engines.

    Converting Non-InnoDB Tables to InnoDB

    Galera Cluster does allow you to use MyISAM, but this is not what Galera Cluster was designed for. Galera Cluster is designed to strictly implement data consistency across all of the nodes within the cluster, and this requires a strongly ACID-compliant database engine. InnoDB is an engine with strong capabilities in this area, and it is recommended that you use InnoDB, especially when dealing with transactions.
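
    A hedged way to spot tables that still need converting (and to convert one of them) directly from the MySQL client could look like this; the schema and table names are only examples:

    mysql> SELECT table_schema, table_name, engine
           FROM information_schema.tables
           WHERE engine <> 'InnoDB'
             AND table_type = 'BASE TABLE'
             AND table_schema NOT IN ('mysql','information_schema','performance_schema','sys');
    mysql> ALTER TABLE mydb.mytable ENGINE=InnoDB;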

    If you're using ClusterControl, you can easily check your database instance(s) for any MyISAM tables using the Performance Advisors. You can find this under the Performance → Advisors tab. For example,

    If you require MyISAM and MEMORY tables, you can still use them, but make sure they only hold data that does not need to be replicated. You can use them for read-only data and use "START TRANSACTION READONLY" wherever appropriate.

    Adding Primary Keys To your InnoDB Tables

    Since Galera Cluster only supports InnoDB, it is very important that all of your tables have a clustered index (also called a primary key or unique key). To get the best performance from queries, inserts, and other database operations, you must define every table with a unique key, since InnoDB uses the clustered index to optimize the most common lookup and DML operations for each table. This helps avoid long-running queries within the cluster and possible slowdowns of write/read operations in the cluster.
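
    A hedged information_schema query to list tables without a primary key could look like this:

    mysql> SELECT t.table_schema, t.table_name
           FROM information_schema.tables t
           LEFT JOIN information_schema.table_constraints c
             ON c.table_schema = t.table_schema
            AND c.table_name = t.table_name
            AND c.constraint_type = 'PRIMARY KEY'
           WHERE t.table_type = 'BASE TABLE'
             AND t.table_schema NOT IN ('mysql','information_schema','performance_schema','sys')
             AND c.constraint_type IS NULL;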

    In ClusterControl, there are advisors which can notify you of this. For example, in your MySQL Replication master/slave cluster, you'll get an alarm or can see it in the list of advisors. The example screenshot below shows the advisor reporting on tables that have no primary key:

    Identify a Master (or Active-Writer) Node

    Galera Cluster is a true multi-master replication setup. However, that doesn't mean you're free to write to whichever node you would like to target. When writing on different nodes and a conflicting transaction is detected, you'll get into a deadlock issue just like the one below:

    2019-11-14T21:14:03.797546Z 12 [Note] [MY-011825] [InnoDB] *** Priority TRANSACTION:
    
    TRANSACTION 728431, ACTIVE 0 sec starting index read
    
    mysql tables in use 1, locked 1
    
    MySQL thread id 12, OS thread handle 140504401893120, query id 1414279 Applying batch of row changes (update)
    
    2019-11-14T21:14:03.797696Z 12 [Note] [MY-011825] [InnoDB] *** Victim TRANSACTION:
    
    TRANSACTION 728426, ACTIVE 3 sec updating or deleting
    
    mysql tables in use 1, locked 1
    
    , undo log entries 11409
    
    MySQL thread id 57, OS thread handle 140504353195776, query id 1414228 localhost root updating
    
    update sbtest1_success set k=k+1 where id > 1000 and id < 100000
    
    2019-11-14T21:14:03.797709Z 12 [Note] [MY-011825] [InnoDB] *** WAITING FOR THIS LOCK TO BE GRANTED:
    
    RECORD LOCKS space id 1663 page no 11 n bits 144 index PRIMARY of table `sbtest`.`sbtest1_success` trx id 728426 lock_mode X
    
    Record lock, heap no 1 PHYSICAL RECORD: n_fields 1; compact format; info bits 0
    
     0: len 8; hex 73757072656d756d; asc supremum;;

    The problem with multiple nodes writing without an identified active-writer node is that you'll end up with these issues, which are very common problems I've seen with Galera Cluster when writing on multiple nodes at the same time. In order to avoid this, you can use a single-master setup approach:

    From the documentation,

    To relax flow control, you might use the settings below:

    wsrep_provider_options = "gcs.fc_limit = 256; gcs.fc_factor = 0.99; gcs.fc_master_slave = YES"

    The above requires a server restart since fc_master_slave is not dynamic.

    Enable Debugging Mode For Logging Conflicts or Deadlocks

    Debugging or tracing issues with your Galera Cluster is very important. Locking in Galera is implemented differently compared to MySQL Replication: it uses optimistic locking when dealing with transactions cluster-wide. MySQL Replication, in contrast, has only pessimistic locking, which doesn't know whether the same or a conflicting transaction is being executed on a co-master in a multi-master setup. Galera still uses pessimistic locking, but on the local node, since it is managed by InnoDB, the supported storage engine. Galera uses optimistic locking when it goes to the other nodes. This means that no checks are made with the other nodes in the cluster when local locks are acquired (pessimistic locking). Galera assumes that, once the transaction passes the commit phase within the storage engine and the other nodes are informed, everything will be okay and no conflicts will arise.

    In practice, it's best to enable wsrep_log_conflicts. This will log the details of conflicting MDL as well as InnoDB locks in the cluster. The variable can be set dynamically, but there's a caveat once it is enabled: it will verbosely populate your error-log file and can fill up your disk once the error-log file grows too large.
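
    A hedged example of enabling it at runtime (add the same setting under [mysqld] in my.cnf to persist it across restarts):

    mysql> SET GLOBAL wsrep_log_conflicts = ON;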

    Be Careful With Your DDL Queries

    With MySQL Replication, running an ALTER statement affects only incoming connections that need to access or reference the table targeted by your ALTER statement. It can also affect slaves if the table is large and can cause slave lag. However, writes to your master won't be blocked as long as your queries do not conflict with the current ALTER. This is not the case at all when running DDL statements such as ALTER with Galera Cluster. ALTER statements can bring problems such as the Galera Cluster getting stuck due to a cluster-wide lock, or flow control relaxing the replication while some nodes are recovering from large writes.

    In some situations, you might end up with downtime for your Galera Cluster if that table is too large and is a primary and vital table for your application. However, the change can be achieved without downtime. As Rick James pointed out in his blog, you can follow the recommendations below:

    RSU vs TOI

    • Rolling Schema Upgrade = manually do one node (offline) at a time
    • Total Order Isolation = Galera synchronizes so that it is done at the same time (in the replication sequence) on all nodes.

    Caution: Since there is no way to synchronize the clients with the DDL, you must make sure that the clients are happy with either the old or the new schema. Otherwise, you will probably need to take down the entire cluster while simultaneously switching over both the schema and the client code.

    A "fast" DDL may as well be done via TOI. This is a tentative list of such:

    • CREATE/DROP/RENAME DATABASE/TABLE
    • ALTER to change DEFAULT
    • ALTER to change definition of ENUM or SET (see caveats in manual)
    • Certain PARTITION ALTERs that are fast.
    • DROP INDEX (other than PRIMARY KEY)
    • ADD INDEX?
    • Other ALTERs on 'small' tables.
    • With 5.6 and especially 5.7 having a lot of ALTER ALGORITHM=INPLACE cases, check which ALTERs should be done which way.

    Otherwise, use RSU. Do the following separately for each node:

    SET GLOBAL wsrep_OSU_method='RSU';

    This also takes the node out of the cluster.

    ALTER TABLE ...

    SET GLOBAL wsrep_OSU_method='TOI';

    This puts the node back in, leading to a resync (hopefully a quick IST, not a slow SST).

    Preserve the Consistency Of Your Cluster

    Galera Cluster does not support replication filters such as binlog_do_db or binlog_ignore_db, since Galera does not rely on binary logging. It relies on the ring-buffer file, also called GCache, which stores the write-sets that are replicated across the cluster. You cannot apply any inconsistent behavior or state to such database nodes.

    Galera, on the other hand, strictly implements data consistency within the cluster. However, it's still possible to end up with inconsistency where rows or records cannot be found. For example, setting your wsrep_OSU_method variable to either RSU or TOI for your DDL ALTER statements might bring inconsistent behavior. Check this external blog from Percona discussing inconsistency with Galera with TOI vs RSU.

    Setting wsrep_on=OFF and subsequently running DML or DDL queries can be dangerous to your cluster. You must also review your stored procedures, triggers, functions, events, or views to check that their results are not dependent on a node's state or environment. When certain node(s) become inconsistent, it can potentially bring the entire cluster down. Once Galera detects inconsistent behavior, it will attempt to leave the cluster and terminate that node. Hence, it's possible that all of the nodes become inconsistent, leaving you in a dilemma.

    If a Galera Cluster node experiences a crash, especially during a high-traffic period, it's better not to start the node right away. Instead, perform a full SST or bring up a new instance as soon as possible, or once the traffic goes low. The node might otherwise bring inconsistent behavior or even corrupted data into the cluster.

    Segregate Large Transactions and Determine Whether to Use Streaming Replication 

    Let's get straight to the point on this one. One of the biggest new features in Galera Cluster 4.0 is streaming replication. Versions of Galera Cluster prior to 4.0 limit transactions to < 2GiB, which is typically controlled by the variables wsrep_max_ws_rows and wsrep_max_ws_size. Since Galera Cluster 4.0, you are able to send > 2GiB of transactions, but you must determine how large the fragments processed during replication have to be. This has to be set per session, and the only variables you need to take care of are wsrep_trx_fragment_unit and wsrep_trx_fragment_size. Disabling streaming replication is as simple as setting wsrep_trx_fragment_size = 0. Take note that replicating a large transaction also imposes overhead on the slave nodes (nodes that are replicating against the current active-writer/master node), since logs will be written to the wsrep_streaming_log table in the mysql database.

    Another thing to add: since you're dealing with large transactions, your transaction might take some time to finish, so setting the variable innodb_lock_wait_timeout to a high value has to be taken into account. Set this per session to a value larger than the time you estimate the transaction will take to finish, otherwise you'll hit a timeout.
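
    A hedged per-session sketch putting this together (the fragment unit, size, and timeout values are illustrative, not recommendations):

    mysql> SET SESSION wsrep_trx_fragment_unit = 'rows';
    mysql> SET SESSION wsrep_trx_fragment_size = 10000;   -- replicate a fragment every 10,000 rows
    mysql> SET SESSION innodb_lock_wait_timeout = 600;    -- seconds; larger than the expected run time
    mysql> -- ... run the large transaction here ...
    mysql> SET SESSION wsrep_trx_fragment_size = 0;       -- disable streaming replication again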

    We recommend you read this previous blog about streaming replication in action.

    Replicating GRANTs Statements

    If you're using GRANTs, note that GRANT and related operations act on the MyISAM/Aria tables in the `mysql` database. The GRANT statements themselves will be replicated, but the underlying tables will not. This means that INSERT INTO mysql.user ... will not be replicated because the table is MyISAM.

    However, the above might no longer be true as of Percona XtraDB Cluster (PXC) 8.0 (currently experimental), as the mysql schema tables have been converted to InnoDB, whilst in MariaDB 10.4 some of the tables are still in Aria format and others are in CSV or InnoDB. You should determine which version and provider of Galera you have, but it's best to avoid using DML statements referencing the mysql schema. Otherwise, you might end up with unexpected results, unless you're sure that you are on PXC 8.0.

    XA Transactions, LOCK/UNLOCK TABLES, GET_LOCK/RELEASE_LOCK are Not Supported

    Galera Cluster does not support XA Transactions, since XA Transactions handle rollbacks and commits differently. LOCK/UNLOCK TABLES or GET_LOCK/RELEASE_LOCK statements are dangerous to use with Galera. You might experience a crash, or locks that are not killable and stay locked. For example,

    ---TRANSACTION 728448, ACTIVE (PREPARED) 13356 sec
    
    mysql tables in use 2, locked 2
    
    3 lock struct(s), heap size 1136, 5 row lock(s), undo log entries 5
    
    MySQL thread id 67, OS thread handle 140504353195776, query id 1798932 localhost root wsrep: write set replicated and certified (13)
    
    insert into sbtest1(k,c,pad) select k,c,pad from sbtest1_success limit 5

    This transaction has already been unlocked and even killed, but to no avail. We suggest that you redesign your application client and get rid of these functions when migrating to Galera Cluster.

    Network Stability is a MUST!!!

    Galera Cluster can work even with an inter-WAN or inter-geo topology without any issues (check this blog about implementing an inter-geo topology with Galera). However, if the network connectivity between the nodes is not stable or goes down intermittently for unpredictable periods, it can be problematic for the cluster. It's best to have the cluster running in a private, local network where the nodes are connected. When designating a node for disaster recovery, plan to create a separate cluster if the nodes are in a different region or geography. You may start reading our previous blog, Using MySQL Galera Cluster Replication to Create a Geo-Distributed Cluster: Part One, as this could help you decide on your Galera Cluster topology.

    Another thing to add about investing in your network hardware: it would be problematic if your network transfer rate is slow when rebuilding an instance during IST, or worse, during SST, especially if your data set is massive. The network transfer can take long hours, and that might affect the stability of your cluster, especially if you have a 3-node cluster where 2 nodes (the donor and the joiner) are unavailable. Take note that during the SST phase, the DONOR/JOINER nodes cannot be used until they are finally able to sync with the primary cluster.

    In previous versions of Galera, when it comes to donor node selection, the State Snapshot Transfer (SST) donor was selected at random. In Galera 4, this has been much improved: it now has the ability to choose the right donor within the cluster, as it will favour a donor that can provide an Incremental State Transfer (IST), or pick a donor in the same segment. Alternatively, you can set the wsrep_sst_donor variable to the donor you would like to always pick.
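
    A hedged my.cnf sketch of pinning a preferred donor (the node name db3 is only an example; the trailing comma lets Galera fall back to other donors if it is unavailable):

    [mysqld]
    wsrep_sst_donor = "db3,"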

    Backup Your Data and Do Rigid Testing During Migration and Before Production

    Once you are set up and have decided to try and migrate your data to Galera Cluster 4.0, make sure you always have a backup prepared. If you use ClusterControl, taking backups is easy.

    Ensure that you are migrating to the right version of InnoDB, and do not forget to run mysql_upgrade before doing the test. Ensure that all your tests produce the same desired results that MySQL Replication offers you. Most likely, there is no difference in the InnoDB storage engine you're using in a MySQL Replication cluster versus the MySQL Galera Cluster, as long as the recommendations and tips above have been applied and prepared beforehand.

    Conclusion

    Migrating to Galera Cluster 4.0 might not be the right database technology solution for everyone. However, nothing stops you from utilizing Galera Cluster 4.0 as long as its specific requirements can be prepared, set up, and provided. Galera Cluster 4.0 has become a very powerful and viable choice, especially for a highly-available platform and solution. We also suggest that you read these external blogs about Galera Caveats, the Limitations of Galera Cluster, or this manual from MariaDB.

    by Paul Namuag at November 15, 2019 07:34 PM

    November 14, 2019

    SeveralNines

    MySQL InnoDB Cluster 8.0 - A Complete Deployment Walk-Through: Part One

    MySQL InnoDB Cluster consists of 3 components:

    • MySQL Group Replication (a group of database server which replicates to each other with fault tolerance).
    • MySQL Router (query router to the healthy database nodes)
    • MySQL Shell (helper, client, configuration tool)

    In the first part of this walkthrough, we are going to deploy a MySQL InnoDB Cluster. There are a number of hands-on tutorials available online, but this walkthrough covers all the necessary steps/commands to install and run the cluster in one place. We will cover monitoring, management and scaling operations, as well as some gotchas when dealing with MySQL InnoDB Cluster, in the second part of this blog post.

    The following diagram illustrates our post-deployment architecture:

    We are going to deploy a total of 4 nodes: a three-node MySQL Group Replication setup and one MySQL Router node co-located with the application server. All servers are running on Ubuntu 18.04 Bionic.

    Installing MySQL

    The following steps should be performed on all database nodes db1, db2 and db3.

    Firstly, we have to do some host mapping. This is crucial if you want to use the hostname as the host identifier in InnoDB Cluster, and this is the recommended way to do it. Map all hosts as follows inside /etc/hosts:

    $ vi /etc/hosts
    192.168.10.40   router apps
    192.168.10.41   db1 db1.local
    192.168.10.42   db2 db2.local
    192.168.10.43   db3 db3.local
    127.0.0.1       localhost localhost.localdomain

    Stop and disable AppArmor:

    $ service apparmor stop
    $ service apparmor teardown
    $ systemctl disable apparmor

    Download the latest APT config repository from MySQL Ubuntu repository website at https://repo.mysql.com/apt/ubuntu/pool/mysql-apt-config/m/mysql-apt-config/. At the time of this writing, the latest one is dated 15-Oct-2019 which is mysql-apt-config_0.8.14-1_all.deb:

    $ wget https://repo.mysql.com/apt/ubuntu/pool/my